[computer-go] Lazy Evaluation in Monte Carlo/Alpha Beta Search
for Viking4
Don Dailey
drd at mit.edu
Sun Jul 23 06:46:37 PDT 2006
I wasn't very clear about one of these points (and Remi just posted a
clearer explanation of the same technique I use.)
In point 3, I should have said: If EITHER bound is outside the
alpha/beta window you should stop the evaluator.
- Don
On Sun, 2006-07-23 at 09:21 -0400, Don Dailey wrote:
> Magnus,
>
> I have a pure alpha/beta searcher that uses monte/carlo as an evaluator
> and I have been doing lazy evaluation for a long time. Here is how I
> control it:
>
> 1. I have a single parameter I call FUDGE
>
> 2. Let's say I am doing 1000 simulations:
> I tally up wins (-1 or 1 depending on win or loss)
>
> I tally up games as I go.
>
> I compute an upper bound by pretending that I played an
> additional FUDGE games and won them all.
>
> I computer a lower bound by pretending I LOST an additional
> FUDGE games.
>
> 3. If the 2 bounds are outside (or equal to) alpha and beta, I
> can stop evaluating.
>
> This is a huge speedup. The single parameter FUDGE controls how
> "lazy" I want to be. I lot of testing helps determine how to set FUDGE
> to the correct values. The intuitions behind this is if I can lose N
> games in a row and still get a beta cutoff, I am probably wasting my
> time and should stop.
>
> 2 more general principles I discovered:
>
> 1. Less simulations doesn't always speed up the search because if
> the evaluation isn't relatively stable, your move ordering
> heuristics suffer. Same with more agressive FUDGE.
>
> 2. It appears that you can "get away" with fewer simulations at
> deeper depths.
>
> 3. If you "cache" the evaluations, you can do multiple pass
> iterations to good advantage (easier time-control management.)
> For instance if you have time left over, but not enough to do
> a complete iteration, you can do the same iteration over with
> a higher "quality" evaluation function.
>
> - Don
>
>
>
>
>
> On Sun, 2006-07-23 at 13:30 +0200, Magnus Persson wrote:
> > Hi!
> >
> > I once had a plan to write a paper about a simple but effective improvement I
> > use in Viking4, but I have been lazy and is to busy doing other stuff so I
> > think I just write it down here. It might also be the case other people already
> > do this since it is such a simple idea. Also it may be possible to improve it,
> > and then I would be thankful if you reported your insights here.
> >
> > Viking4 uses Monte Carlo simulations as en evaluation function. A lot of the
> > playing strength comes from the simulations which are only random perhaps
> > 80-90% of the time but the idea presented here should work with any Monte Carlo
> > program using alpha beta search.
> >
> > Many chess programs uses lazy evaluation. The idea (not that I never wrote a
> > chess program myself) is that if one can compute a quick evaluation (using
> > material only for example) and has an upper or lower bound for how much the
> > advanced full (and slow) evaluation would change the score one can safely
> > terminate the evaluation process because it is not necessary to know the exact
> > score if it is lower than alpha or higher than beta.
> >
> > A probabilistic version of lazy evaluation is possible for monte carlo
> > simulation go programs. Assume for example that we make 1000 simulations for
> > one evaluation. The score is the probability of winning. After 100 simulations
> > we might have a low or very high score, either it is lower than alpha (so bad
> > we do not want to play it) or it is much higher than beta (so good that the
> > opponent has a better move to play earlier and we will never reach this
> > position and we can stop search - a beta cut off). If we now stop search we
> > have a speed up of 90% less evaluations!
> >
> > For Viking this method means about 1 ply deeper search on 9x9, depending on the
> > position. It is effective already at the first ply, since if the best move is
> > found early the remaining very bad moves will all be evaluated lazily. There is
> > a cost though. Because of sampling errors one cannot be sure that the score
> > (probability of winning) at 100 samples is reliable so lazy evaluation will
> > make make some errors compared to full evaluation at a given search depth. But
> > searching deeper is very beneficial in go, at least as deep as I have been able
> > to test viking4. For fixed thinking time I have solid evidence that the program
> > play better with lazy evaluation. Searching to fixed depth there is small
> > decrement in playing strength. But the tradeoff is a benefit for playing
> > strength of Viking.
> >
> > So far I just presented the idea. But the idea can probably be implemented in
> > many ways. Here follows more in detail how I do it.
> >
> > I have three parameters:
> > a) The number of maximum simulations used in full evaluation.
> > b) The minimum simulations necessary before a lazy evaluation is allowed to
> > terminate sampling. I use 40.
> > c) A 95% confidence interval (computed 1.96*STDEV as in basic statistics). If
> > the bounds of the lazy evaluation score is outside the alpha-beta interval then
> > evaluation stops. I treat the scores for alpha and beta as absolute values. The
> > 95% comes from systematic experimentation. Values less than 95% makes to many
> > mistakes and higher than 95% does not lead to a speedup. Note that the
> > parameters turned out to be what for example social scientists use as a rule of
> > thumb.
> >
> > Some comments about the parameters:
> >
> > a) The amount of simulations one need for a full evaluation depends on the
> > quality of the pseudo-random monte carlo simulation. A program that only avoids
> > filling its own eyes will only have small differences between the best and the
> > worst moves in any given position. Viking has a lot of blunder-avoidance go
> > knowledge so it does not need so many samples to separate a good move from a
> > bad. I have not tried this empirically but the speedup of lazy evaluation may
> > be a function of how good the evaluation is.
> >
> > b) Early on I had too low values for the minimum samples allowed, and then the
> > program often made blunders because of lazy evaluation although the speedup was
> > amazing. With a minimum sample of 40 simulations the speedup is moderate and the
> > errors few.
> >
> > c) The 95%-confidence interval is computed as if the score (probability of
> > winning) was normally distributed. But is is not. It is actually binomially
> > distributed - and as far as I understand one cannot cannot compute exact
> > confidence intervals for binomial distributions. The approximations I have seen
> > are also too complex to be used in viking4. There are two factors that makes
> > this less of a problem. First, in all interesting positions the score is about
> > 50% of winning/losing which is in the range where the normal distribution
> > assumption is most accurate. Second, the second parameter (which one should
> > set by thorough testing) makes the assumption more valid the higher it is.
> > Nethertheless I think I have no real good solution for the cases where the
> > score is close to 0 or 100% where any method of computing confidence intervals
> > break down (I once played around with the bootstrap method of computing
> > confidence intervals from binomial distributions I generated myself and it
> > gets really weird when p is close to 0 or 1). In the end of the game this
> > become important because then a game is either won or lost, but there might be
> > some tactical traps that has to be avoided with little time left to search and
> > I think viking4 has lost games because it does handle these things correctly
> > and/or efficently. This is an area where I would be thankful if someone bright
> > could come up with some deeper understanding that is easy to implement.
> >
> > I started out my Monte-Carlo project building move trees in memory from the
> > simulations as discussed by several others. I think I was quite successful in
> > this but not as successful as Crazystone is.
> >
> > Since I had experimented with alpha beta earlier I realized it would be easy for
> > me to reuse the monte carlo program in an alpha beta search. I think that alpha
> > beta/iterative deepening with lazy evaluation and good move ordering is better
> > than building trees in memory. The reason is that random search will search a
> > lot of games that alpha beta will simply cut off. But doing comparisons is not
> > easy since alpha beta can be mean something between perfect move ordering and
> > random move ordering and any implementation is somewhere in between. I think
> > Viking4 has a reasonable good move ordering, and the efficiency of lazy
> > evaluation depend on move ordering so if you try lazy evaluation and it does
> > not work you might need to improve the move ordering. Some ways of measuring
> > the efficiency of lazy evaluation is to count the number of simulations saved,
> > and also compare the number of evals per second with and without lazy
> > evaluations.
> >
> > Best
> > MP
> >
>
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
More information about the computer-go
mailing list