[computer-go] Lazy Evaluation in Monte Carlo/Alpha Beta Search for Viking4

Rémi Coulom Remi.Coulom at univ-lille3.fr
Sun Jul 23 06:26:03 PDT 2006


Magnus Persson wrote:
> A probabilistic version of lazy evaluation is possible for monte carlo
> simulation go programs.

Thanks for sharing your ideas.

> b) Early on I had too low values for the minimum samples allowed, and then the
> program often made blunders because of lazy evaluation although the speedup was
> amazing. With a minimum sample of 40 simulations the speedup is moderate and the
> errors few.

Instead of setting a fixed minimun for the number of simulations, it 
might be more efficient to add virtual wins to the lazy evaluation when 
deciding to be lazy. That is to say, if winning is +1 and losing -1, S 
the total score, N the number of simulations, estimate the lazy value as 
(S+V)/(N+V) instead of S/N, where V is a constant to be tuned. Of 
course, you should still use S/N as the value of the move if it turns 
out it is superior to alpha.

> 
> c) The 95%-confidence interval is computed as if the score (probability of
> winning) was normally distributed. But is is not. It is actually binomially
> distributed - and as far as I understand one cannot cannot compute exact
> confidence intervals for binomial distributions. The approximations I have seen
> are also too complex to be used in viking4. There are two factors that makes
> this less of a problem. First, in all interesting positions the score is about
> 50% of winning/losing which is in the range where the normal distribution
> assumption is most accurate. Second, the second parameter  (which one should
> set by thorough testing) makes the assumption more valid the higher it is.
> Nethertheless I think I have no real good solution for the cases where the
> score is close to 0 or 100% where any method of computing confidence intervals
> break down (I once played around with the bootstrap method of computing
> confidence intervals from binomial  distributions I generated myself and it
> gets really weird when p is close to 0 or 1). In the end of the game this
> become important because then a game is either won or lost, but there might be
> some tactical traps that has to be avoided with little time left to search and
> I think viking4  has lost games because it does handle these things correctly
> and/or efficently. This is an area where I would be thankful if someone bright
> could come up with some deeper understanding that is easy to implement.

I don't know whether it is one of the approximations you have seen, but 
I once bookmarked this page about confidence intervals for binomial 
distributions:
http://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm
These confidence bounds are rather simple, and should cure your problem.

> 
> I started out my Monte-Carlo project building move trees in memory from the
> simulations as discussed by several others. I think I was quite successful in
> this but not as successful as Crazystone is.

I think this is a really big question of Monte-Carlo tree search: does 
alpha-beta work better than min-max?

I tend to be a believer in min-max. I love having an algorithm that is 
anytime. Also it is very clever at following long lines of forced moves 
deeper than the rest, which is really efficient for ko fights. But I 
might be wrong.

Anyway, I am certain that neither of the two approaches will produce a 
super-strong player. Global tree search simply does not work. We have to 
find something else.

Rémi


More information about the computer-go mailing list