[computer-go] Monte Carlo combined with minimax search

Peter Drake drake at lclark.edu
Sat Jul 22 15:53:47 PDT 2006


On Jul 22, 2006, at 6:53 AM, Chaslot G (MICC) wrote:

> In this context, I propose one algorithm for move-selection and for  
> back-up operation. It can be found in this paper:
>
> http://www.cs.unimaas.nl/g.chaslot/mcscg.pdf

Thanks -- I've printed it out and will have a look.
> Back up operation:
>
> Remi proposed a rule that makes some transition from the mean value  
> to the max. In this rule the weight of the mean only depends on the  
> number of simulations made.
> Indeed, it is clear that in the beginning the backed-up value  
> should be the mean, and that after a certain time, it should be the  
> max value.
>
> However, the number of simulations made is not always relevant to  
> build the transition from one to the other.
Yes.  My current cheap kludge is to use a * best + (1-a) * average,  
where a is a confidence measure between 0 and 1.  It is currently  
computed as the size (in nodes) of the smallest subtree divided by  
100 * the estimated number of moves left.  (This is capped at 1)   
Thus, if every subtree has been searched by roughly 100 games,  
confidence is 1.

Remember that I'm keeping the entire tree.  If that memory use  
doesn't appall you, maybe this will:  I'm backing up not just  
estimated scores, but estimated ownership of each point on the  
board.  I can sum these to get scores, but I also use them to  
generate move probabilities.  When making a random move, the  
probability of playing at point p is 1 / (own(p)^2), where own(p)  
ranges from -1 (strongly controlled by white) to 1 (strongly  
controlled by black).  Thus "border" points are tried more often.

This "knowledge-free" program plays pretty well on 6x6.  It opens on  
a center point and seems to have a sense of what's urgent.  Since I  
keep the relevant part of the tree from turn to turn, I can reuse  
some of my previous work.  Often the program has spent a third of its  
search nodes on the move it played, and half of those on my response  
-- in some sense it understands forcing moves.

I'm trying to beat JacquelineGo on 9x9, and my program is suffering  
from its execrable opening.  After that, though, it seems to know  
what's "important", although there are of course still many blunders.

One thing I'm going to try next is counting the score not as sum(own 
(p)) but as sum(signum(own(p))).  This way, it would be more  
interested in scoring points than in strengthening its hold on points  
it already has.  Of course, ideally there would be a continuum  
between these two, where the program would become more conservative  
when it was ahead.

> Another example is that if there are some very bad moves, they will  
> low down the mean value. Then it would be better to compute a  
> “mean” which considers only the good moves.
The signum approach just mentioned might fix this.


Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://computer-go.org/pipermail/computer-go/attachments/20060722/e0d0be2a/attachment.htm


More information about the computer-go mailing list