[computer-go] Proposed UCT / transposition table implementation

Peter Drake drake at lclark.edu
Mon Dec 4 09:59:49 PST 2006


I've noticed that Orego has done very poorly in the last couple of  
competitions. This is partly due to the improvements in others'  
programs, but I think the fact that Orego currently doesn't have a  
transposition table is crippling. Since I'm rewriting this stuff in  
Java, I'm thinking about how to handle this, and would like to offer  
up the following plan for all of you to poke holes in. I'm not sure  
if I'm reinventing the CrazyStone structure here, but I think this  
one is slightly different.

DATA STRUCTURE

The "tree" is a enormous, preallocated array of Node objects.  
(Because this array is also being used as a hash table, the size of  
the table should be a prime number.) This is really a directed  
acyclic graph (DAG) rather than a tree, because a Node may be a child  
of more than one parent.

Each Node contains the following information:

A+1 "pointers" (in some sense) to child Nodes, where A is the area of  
the board. (The extra pointer is for the pass move.) Some of these  
pointers (initially, all of them) may be the special value UNEXPLORED.

The number of runs through this Node so far.

The number black wins through this Node so far.

A Zobrist hash of the position represented by this Node. This must  
incorporate, in addition to the locations of stones on the board, the  
turn number (e.g., zero for the beginning of the game) and the number  
of consecutive passes.

The turn number, stored explicitly.

A forced leaf turn number, to be explained below.

(Note that, in contrast to the current Orego structure, we won't need  
tails or a free list for this one.)

GENERATING A MONTE CARLO RUN

I want to separate the process of generating a Monte Carlo run from  
the process of modifying the tree for two reasons. First, this will  
make multithreading easier. Second, I will be able to incorporate  
recorded games into the tree by pretending that the program had  
played them.

To generate a Monte Carlo run, I start at the root Node. Each move is  
chosen using UCT, based on UCB1-TUNED as described on p. 5 of the  
recent Gelly et al tech report on MoGo. As in the current Orego, this  
is done with a double-hashing scheme and in a way that always chooses  
an untried move if one exists.

There is one complication here. The number of runs through the  
children might exceed the number of runs through the parent because  
of transpositions. In Gelly's notation, this means Tj(n) may exceed  
n. I THINK I can ignore this, as it will just give "oversampled"  
moves unusually narrow confidence windows.

If we are not at a Node (because we're in an unexplored region of the  
search space), if the current Node has a forced leaf turn number  
greater than or equal to the root turn number, or if any child is  
found that is not at the correct turn number (because of a hash  
collision), the move is chosen randomly.

INCORPORATING A GAME INTO THE TREE

Again, we start at the root Node. We work down the tree, updating the  
run and black win count of each node encountered. If the child  
pointer we would be following is UNEXPLORED, we use the  
aforementioned Zobrist hash (modulo the table size) to find a Node to  
use as the child. This may be a Node that has already been  
initialized, a "fresh" Node (i.e., one that we can overwrite), or a  
Node that we can't overwrite.

If the child has the correct turn number, either we have been down  
this path before or we have found a transposition. In either case,  
continue down the tree.

If the child has a turn number that is lower than that of the root,  
it is an old node and can be overwritten. We reinitialize this one  
node, update its run and black win counts, and stop. Thus, every run  
adds at most one node to the tree.

If the offending child is not so old that it can be overwritten, we  
must leave it alone lest it mess up another part of the tree. At this  
point, the parent's forced leaf turn number is set to the child's  
turn number; all moves from that parent will be random until a later  
point in the game when it is safe to overwrite the offending child.


I welcome your input,

Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/





More information about the computer-go mailing list