[computer-go] Monte Carlo combined with minimax search
Rémi Coulom
Remi.Coulom at univ-lille3.fr
Tue Jul 18 10:22:31 PDT 2006
Peter Drake wrote:
> I've been working with Monte Carlo Go lately. An idea has sprung to
> mind, and I want to make sure it's original and not something I read in
> a paper. (It seems similar to some things Bouzy has done, but I can't
> find anything specific.)
>
> Here's the plan:
>
> - Maintain a large minimax search tree in memory. (For reasons that
> will become clear, we can't use alpha-beta pruning. I think we could
> use a transposition table, making the tree a DAG.)
>
> - At each node in the tree, there is a weight for each possible move.
> Initially, all of these weights are set to some neutral value, say, 0.5.
>
> - Perform a single Monte Carlo run, making random moves to the end of
> the game. At each node along this path to the end of the game, back up
> the resulting score. Also, adjust the weights for the moves made on
> that path. Thus, if a move led to a good outcome, it is more likely to
> be played again. (The change in weights has to be fairly gradual so
> that a good move is not eliminated by one or two bad runs.)
>
> - Keep performing runs until time (or some other resource) runs out.
> The moves in these runs are random, but favor higher-weight moves.
>
> Has this been done before? Is it flawed in some obvious way?
Hi
Your idea seems very similar to that of Crazy Stone:
http://remi.coulom.free.fr/CG2006/
References 12, 19 and 22 in my paper also describe similar algorithms.
Also a new very interesting paper regarding this idea is:
http://zaphod.aml.sztaki.hu/papers/ecml06.pdf
MoGo, the new program by Yizao Wang that participated in the previous
KGS tournament, uses this UCT algorithm, and seems to perform rather
well despite its poor-quality random simulations. I am currently running
experiments with UCT in Crazy Stone, and results are interesting.
Rémi
More information about the computer-go
mailing list