[computer-go] Lazy Evaluation in Monte Carlo/Alpha Beta Search
for Viking4
Don Dailey
drd at mit.edu
Sun Jul 23 06:21:41 PDT 2006
Magnus,
I have a pure alpha/beta searcher that uses monte/carlo as an evaluator
and I have been doing lazy evaluation for a long time. Here is how I
control it:
1. I have a single parameter I call FUDGE
2. Let's say I am doing 1000 simulations:
I tally up wins (-1 or 1 depending on win or loss)
I tally up games as I go.
I compute an upper bound by pretending that I played an
additional FUDGE games and won them all.
I computer a lower bound by pretending I LOST an additional
FUDGE games.
3. If the 2 bounds are outside (or equal to) alpha and beta, I
can stop evaluating.
This is a huge speedup. The single parameter FUDGE controls how
"lazy" I want to be. I lot of testing helps determine how to set FUDGE
to the correct values. The intuitions behind this is if I can lose N
games in a row and still get a beta cutoff, I am probably wasting my
time and should stop.
2 more general principles I discovered:
1. Less simulations doesn't always speed up the search because if
the evaluation isn't relatively stable, your move ordering
heuristics suffer. Same with more agressive FUDGE.
2. It appears that you can "get away" with fewer simulations at
deeper depths.
3. If you "cache" the evaluations, you can do multiple pass
iterations to good advantage (easier time-control management.)
For instance if you have time left over, but not enough to do
a complete iteration, you can do the same iteration over with
a higher "quality" evaluation function.
- Don
On Sun, 2006-07-23 at 13:30 +0200, Magnus Persson wrote:
> Hi!
>
> I once had a plan to write a paper about a simple but effective improvement I
> use in Viking4, but I have been lazy and is to busy doing other stuff so I
> think I just write it down here. It might also be the case other people already
> do this since it is such a simple idea. Also it may be possible to improve it,
> and then I would be thankful if you reported your insights here.
>
> Viking4 uses Monte Carlo simulations as en evaluation function. A lot of the
> playing strength comes from the simulations which are only random perhaps
> 80-90% of the time but the idea presented here should work with any Monte Carlo
> program using alpha beta search.
>
> Many chess programs uses lazy evaluation. The idea (not that I never wrote a
> chess program myself) is that if one can compute a quick evaluation (using
> material only for example) and has an upper or lower bound for how much the
> advanced full (and slow) evaluation would change the score one can safely
> terminate the evaluation process because it is not necessary to know the exact
> score if it is lower than alpha or higher than beta.
>
> A probabilistic version of lazy evaluation is possible for monte carlo
> simulation go programs. Assume for example that we make 1000 simulations for
> one evaluation. The score is the probability of winning. After 100 simulations
> we might have a low or very high score, either it is lower than alpha (so bad
> we do not want to play it) or it is much higher than beta (so good that the
> opponent has a better move to play earlier and we will never reach this
> position and we can stop search - a beta cut off). If we now stop search we
> have a speed up of 90% less evaluations!
>
> For Viking this method means about 1 ply deeper search on 9x9, depending on the
> position. It is effective already at the first ply, since if the best move is
> found early the remaining very bad moves will all be evaluated lazily. There is
> a cost though. Because of sampling errors one cannot be sure that the score
> (probability of winning) at 100 samples is reliable so lazy evaluation will
> make make some errors compared to full evaluation at a given search depth. But
> searching deeper is very beneficial in go, at least as deep as I have been able
> to test viking4. For fixed thinking time I have solid evidence that the program
> play better with lazy evaluation. Searching to fixed depth there is small
> decrement in playing strength. But the tradeoff is a benefit for playing
> strength of Viking.
>
> So far I just presented the idea. But the idea can probably be implemented in
> many ways. Here follows more in detail how I do it.
>
> I have three parameters:
> a) The number of maximum simulations used in full evaluation.
> b) The minimum simulations necessary before a lazy evaluation is allowed to
> terminate sampling. I use 40.
> c) A 95% confidence interval (computed 1.96*STDEV as in basic statistics). If
> the bounds of the lazy evaluation score is outside the alpha-beta interval then
> evaluation stops. I treat the scores for alpha and beta as absolute values. The
> 95% comes from systematic experimentation. Values less than 95% makes to many
> mistakes and higher than 95% does not lead to a speedup. Note that the
> parameters turned out to be what for example social scientists use as a rule of
> thumb.
>
> Some comments about the parameters:
>
> a) The amount of simulations one need for a full evaluation depends on the
> quality of the pseudo-random monte carlo simulation. A program that only avoids
> filling its own eyes will only have small differences between the best and the
> worst moves in any given position. Viking has a lot of blunder-avoidance go
> knowledge so it does not need so many samples to separate a good move from a
> bad. I have not tried this empirically but the speedup of lazy evaluation may
> be a function of how good the evaluation is.
>
> b) Early on I had too low values for the minimum samples allowed, and then the
> program often made blunders because of lazy evaluation although the speedup was
> amazing. With a minimum sample of 40 simulations the speedup is moderate and the
> errors few.
>
> c) The 95%-confidence interval is computed as if the score (probability of
> winning) was normally distributed. But is is not. It is actually binomially
> distributed - and as far as I understand one cannot cannot compute exact
> confidence intervals for binomial distributions. The approximations I have seen
> are also too complex to be used in viking4. There are two factors that makes
> this less of a problem. First, in all interesting positions the score is about
> 50% of winning/losing which is in the range where the normal distribution
> assumption is most accurate. Second, the second parameter (which one should
> set by thorough testing) makes the assumption more valid the higher it is.
> Nethertheless I think I have no real good solution for the cases where the
> score is close to 0 or 100% where any method of computing confidence intervals
> break down (I once played around with the bootstrap method of computing
> confidence intervals from binomial distributions I generated myself and it
> gets really weird when p is close to 0 or 1). In the end of the game this
> become important because then a game is either won or lost, but there might be
> some tactical traps that has to be avoided with little time left to search and
> I think viking4 has lost games because it does handle these things correctly
> and/or efficently. This is an area where I would be thankful if someone bright
> could come up with some deeper understanding that is easy to implement.
>
> I started out my Monte-Carlo project building move trees in memory from the
> simulations as discussed by several others. I think I was quite successful in
> this but not as successful as Crazystone is.
>
> Since I had experimented with alpha beta earlier I realized it would be easy for
> me to reuse the monte carlo program in an alpha beta search. I think that alpha
> beta/iterative deepening with lazy evaluation and good move ordering is better
> than building trees in memory. The reason is that random search will search a
> lot of games that alpha beta will simply cut off. But doing comparisons is not
> easy since alpha beta can be mean something between perfect move ordering and
> random move ordering and any implementation is somewhere in between. I think
> Viking4 has a reasonable good move ordering, and the efficiency of lazy
> evaluation depend on move ordering so if you try lazy evaluation and it does
> not work you might need to improve the move ordering. Some ways of measuring
> the efficiency of lazy evaluation is to count the number of simulations saved,
> and also compare the number of evals per second with and without lazy
> evaluations.
>
> Best
> MP
>
More information about the computer-go
mailing list