[computer-go] Lazy Evaluation in Monte Carlo/Alpha Beta Search for
Viking4
Magnus Persson
magnus.persson at phmp.se
Sun Jul 23 04:30:55 PDT 2006
Hi!
I once had a plan to write a paper about a simple but effective improvement I
use in Viking4, but I have been lazy and is to busy doing other stuff so I
think I just write it down here. It might also be the case other people already
do this since it is such a simple idea. Also it may be possible to improve it,
and then I would be thankful if you reported your insights here.
Viking4 uses Monte Carlo simulations as en evaluation function. A lot of the
playing strength comes from the simulations which are only random perhaps
80-90% of the time but the idea presented here should work with any Monte Carlo
program using alpha beta search.
Many chess programs uses lazy evaluation. The idea (not that I never wrote a
chess program myself) is that if one can compute a quick evaluation (using
material only for example) and has an upper or lower bound for how much the
advanced full (and slow) evaluation would change the score one can safely
terminate the evaluation process because it is not necessary to know the exact
score if it is lower than alpha or higher than beta.
A probabilistic version of lazy evaluation is possible for monte carlo
simulation go programs. Assume for example that we make 1000 simulations for
one evaluation. The score is the probability of winning. After 100 simulations
we might have a low or very high score, either it is lower than alpha (so bad
we do not want to play it) or it is much higher than beta (so good that the
opponent has a better move to play earlier and we will never reach this
position and we can stop search - a beta cut off). If we now stop search we
have a speed up of 90% less evaluations!
For Viking this method means about 1 ply deeper search on 9x9, depending on the
position. It is effective already at the first ply, since if the best move is
found early the remaining very bad moves will all be evaluated lazily. There is
a cost though. Because of sampling errors one cannot be sure that the score
(probability of winning) at 100 samples is reliable so lazy evaluation will
make make some errors compared to full evaluation at a given search depth. But
searching deeper is very beneficial in go, at least as deep as I have been able
to test viking4. For fixed thinking time I have solid evidence that the program
play better with lazy evaluation. Searching to fixed depth there is small
decrement in playing strength. But the tradeoff is a benefit for playing
strength of Viking.
So far I just presented the idea. But the idea can probably be implemented in
many ways. Here follows more in detail how I do it.
I have three parameters:
a) The number of maximum simulations used in full evaluation.
b) The minimum simulations necessary before a lazy evaluation is allowed to
terminate sampling. I use 40.
c) A 95% confidence interval (computed 1.96*STDEV as in basic statistics). If
the bounds of the lazy evaluation score is outside the alpha-beta interval then
evaluation stops. I treat the scores for alpha and beta as absolute values. The
95% comes from systematic experimentation. Values less than 95% makes to many
mistakes and higher than 95% does not lead to a speedup. Note that the
parameters turned out to be what for example social scientists use as a rule of
thumb.
Some comments about the parameters:
a) The amount of simulations one need for a full evaluation depends on the
quality of the pseudo-random monte carlo simulation. A program that only avoids
filling its own eyes will only have small differences between the best and the
worst moves in any given position. Viking has a lot of blunder-avoidance go
knowledge so it does not need so many samples to separate a good move from a
bad. I have not tried this empirically but the speedup of lazy evaluation may
be a function of how good the evaluation is.
b) Early on I had too low values for the minimum samples allowed, and then the
program often made blunders because of lazy evaluation although the speedup was
amazing. With a minimum sample of 40 simulations the speedup is moderate and the
errors few.
c) The 95%-confidence interval is computed as if the score (probability of
winning) was normally distributed. But is is not. It is actually binomially
distributed - and as far as I understand one cannot cannot compute exact
confidence intervals for binomial distributions. The approximations I have seen
are also too complex to be used in viking4. There are two factors that makes
this less of a problem. First, in all interesting positions the score is about
50% of winning/losing which is in the range where the normal distribution
assumption is most accurate. Second, the second parameter (which one should
set by thorough testing) makes the assumption more valid the higher it is.
Nethertheless I think I have no real good solution for the cases where the
score is close to 0 or 100% where any method of computing confidence intervals
break down (I once played around with the bootstrap method of computing
confidence intervals from binomial distributions I generated myself and it
gets really weird when p is close to 0 or 1). In the end of the game this
become important because then a game is either won or lost, but there might be
some tactical traps that has to be avoided with little time left to search and
I think viking4 has lost games because it does handle these things correctly
and/or efficently. This is an area where I would be thankful if someone bright
could come up with some deeper understanding that is easy to implement.
I started out my Monte-Carlo project building move trees in memory from the
simulations as discussed by several others. I think I was quite successful in
this but not as successful as Crazystone is.
Since I had experimented with alpha beta earlier I realized it would be easy for
me to reuse the monte carlo program in an alpha beta search. I think that alpha
beta/iterative deepening with lazy evaluation and good move ordering is better
than building trees in memory. The reason is that random search will search a
lot of games that alpha beta will simply cut off. But doing comparisons is not
easy since alpha beta can be mean something between perfect move ordering and
random move ordering and any implementation is somewhere in between. I think
Viking4 has a reasonable good move ordering, and the efficiency of lazy
evaluation depend on move ordering so if you try lazy evaluation and it does
not work you might need to improve the move ordering. Some ways of measuring
the efficiency of lazy evaluation is to count the number of simulations saved,
and also compare the number of evals per second with and without lazy
evaluations.
Best
MP
--
Magnus Persson, 2 Dan
Zapp at KGS
Author of the go program Viking
More information about the computer-go
mailing list