[computer-go] Monte Carlo combined with minimax search

Rémi Coulom Remi.Coulom at univ-lille3.fr
Tue Jul 18 10:22:31 PDT 2006


Peter Drake wrote:
> I've been working with Monte Carlo Go lately.  An idea has sprung to 
> mind, and I want to make sure it's original and not something I read in 
> a paper.  (It seems similar to some things Bouzy has done, but I can't 
> find anything specific.)
> 
> Here's the plan:
> 
> - Maintain a large minimax search tree in memory.  (For reasons that 
> will become clear, we can't use alpha-beta pruning.  I think we could 
> use a transposition table, making the tree a DAG.)
> 
> - At each node in the tree, there is a weight for each possible move.  
> Initially, all of these weights are set to some neutral value, say, 0.5.
> 
> - Perform a single Monte Carlo run, making random moves to the end of 
> the game.  At each node along this path to the end of the game, back up 
> the resulting score.  Also, adjust the weights for the moves made on 
> that path.  Thus, if a move led to a good outcome, it is more likely to 
> be played again.  (The change in weights has to be fairly gradual so 
> that a good move is not eliminated by one or two bad runs.)
> 
> - Keep performing runs until time (or some other resource) runs out.  
> The moves in these runs are random, but favor higher-weight moves.
> 
> Has this been done before?  Is it flawed in some obvious way?

Hi

Your idea seems very similar to that of Crazy Stone:
http://remi.coulom.free.fr/CG2006/
References 12, 19 and 22 in my paper also describe similar algorithms.

Also a new very interesting paper regarding this idea is:
http://zaphod.aml.sztaki.hu/papers/ecml06.pdf

MoGo, the new program by Yizao Wang that participated in the previous 
KGS tournament, uses this UCT algorithm, and seems to perform rather 
well despite its poor-quality random simulations. I am currently running 
experiments with UCT in Crazy Stone, and results are interesting.

Rémi


More information about the computer-go mailing list