[computer-go] Thoughts about UCT (long with computer Go content)

Magnus Persson magnus.persson at phmp.se
Mon Aug 7 10:19:03 PDT 2006


I have just made a simple implementation of UCT, which Crazystone and the MoGo
uses with great success on boards up to 13x13 (I do not know how they play on
19x19).

UCT is a simple method for building search trees with MonteCarlo simulations.
Each node in the tree remembers the number of wins and number of visits. A
simulation starts at the root and finds a way down to a leaf where it picks a
move that has not been played yet and adds a new node in the tree. From there
on the simulation proceeds with random play without storing the moves, and the
nodes that were visited in the tree are updated with alternating wins and
losses.

The problem with building the trees is the difficult tradeoff between exploring
alternative moves and the current best moves and UCT appears to be the so far
best of doing this in a well balanced way.

I have not done any thorough testing of the new program yet, but I played a few
games on 7x7 and 9x9. The code for the simulations is what I have used for
Valkyria rated 1322 on CGOS (Valkyria uses elementary alphabeta search with
MC-evaluation). It appears to play really nice for a program that played its
first game.

Today I have been thinking about why it works so well and here is my
speculation. My first Monte Carlo project 2 years ago built trees as described
above but I could not come up with a good way of backing up the score and
select the best move. It turned out that backed up scores from all subtrees in
many positions oscillated violently and sometime a good move could be forgotten
completely if its score went down a lot and at least one other move had much
higher score than it at any given time.

Then I turned to alpha-beta search which uses iterative deepening where this is
not a problem anymore (Viking5). The score still varies a lot at different
search depths but since moves are compared at the same depth this is less of a
problem. There is no risk that a good move is forgotten.

UCT works well by selecting moves with the highest average win rate for the
entire search history, ignoring minmax of the children of the node. Rémi Coulom
wrote here earlier that he could not find a better backup scheme. This was odd
to me because he had great success with good backup schemes earlier.

I think here that complete search strategy described above solves the problem of
oscillating minmax win rates, by selecting moves that is robustly good. That is
they are not perhaps the best move but they are moves that are clearly better
than average and has had good win rate during most of the total time spent
searching.

To the observer it may seem that a program playing like this misses a lot of
opportunities. But in the defense of the program I would say it plays wisely,
it does not gamble with moves it is unable to analyze to a stable position.

So with UCT a program sticks to "slightly slow" moves that are often not the
best move but it rarely makes big blunders.

A minmax program of the kind I once tried to program will play a mix of
brilliant tactical moves and complete disasters.

If I am correct UCT programs may appears to play weak in test positions, but
still be able to beat programs that focused on solving such positions.

What makes this interesting is that if UCT makes a program stronger than alpha
beta search using equivalent evaluation, then it may not be so because it
searches more efficiently but because it has a different "philosophy" of
selecting the best move, that may be particular useful for Go.

/MP

-- 
Magnus Persson, 2 Dan
Zapp at KGS
Author of the go program Viking


More information about the computer-go mailing list