[computer-go] Lazy Evaluation in Monte Carlo/Alpha Beta Search
for Viking4
Rémi Coulom
Remi.Coulom at univ-lille3.fr
Sun Jul 23 06:26:03 PDT 2006
Magnus Persson wrote:
> A probabilistic version of lazy evaluation is possible for monte carlo
> simulation go programs.
Thanks for sharing your ideas.
> b) Early on I had too low values for the minimum samples allowed, and then the
> program often made blunders because of lazy evaluation although the speedup was
> amazing. With a minimum sample of 40 simulations the speedup is moderate and the
> errors few.
Instead of setting a fixed minimun for the number of simulations, it
might be more efficient to add virtual wins to the lazy evaluation when
deciding to be lazy. That is to say, if winning is +1 and losing -1, S
the total score, N the number of simulations, estimate the lazy value as
(S+V)/(N+V) instead of S/N, where V is a constant to be tuned. Of
course, you should still use S/N as the value of the move if it turns
out it is superior to alpha.
>
> c) The 95%-confidence interval is computed as if the score (probability of
> winning) was normally distributed. But is is not. It is actually binomially
> distributed - and as far as I understand one cannot cannot compute exact
> confidence intervals for binomial distributions. The approximations I have seen
> are also too complex to be used in viking4. There are two factors that makes
> this less of a problem. First, in all interesting positions the score is about
> 50% of winning/losing which is in the range where the normal distribution
> assumption is most accurate. Second, the second parameter (which one should
> set by thorough testing) makes the assumption more valid the higher it is.
> Nethertheless I think I have no real good solution for the cases where the
> score is close to 0 or 100% where any method of computing confidence intervals
> break down (I once played around with the bootstrap method of computing
> confidence intervals from binomial distributions I generated myself and it
> gets really weird when p is close to 0 or 1). In the end of the game this
> become important because then a game is either won or lost, but there might be
> some tactical traps that has to be avoided with little time left to search and
> I think viking4 has lost games because it does handle these things correctly
> and/or efficently. This is an area where I would be thankful if someone bright
> could come up with some deeper understanding that is easy to implement.
I don't know whether it is one of the approximations you have seen, but
I once bookmarked this page about confidence intervals for binomial
distributions:
http://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm
These confidence bounds are rather simple, and should cure your problem.
>
> I started out my Monte-Carlo project building move trees in memory from the
> simulations as discussed by several others. I think I was quite successful in
> this but not as successful as Crazystone is.
I think this is a really big question of Monte-Carlo tree search: does
alpha-beta work better than min-max?
I tend to be a believer in min-max. I love having an algorithm that is
anytime. Also it is very clever at following long lines of forced moves
deeper than the rest, which is really efficient for ko fights. But I
might be wrong.
Anyway, I am certain that neither of the two approaches will produce a
super-strong player. Global tree search simply does not work. We have to
find something else.
Rémi
More information about the computer-go
mailing list