[Computer-go] some UCT notes
jacques at dybot.com
Tue Oct 25 04:30:52 PDT 2011
What you wrote sounds like you re-discovered the importance of progressive
widening PW ;-)
(See: Coulom, Computing Elo Ratings of Move Patterns in the Game of Go,
Widening of the Monte-Carlo Search Tree)
In 19x19 when I implemented (about 1 year ago) RAVE a progressive widening I
had the first
wins against gnugo in 19x19 (I had some 9x9 wins before). But the reason PW
is so good is
somewhat different when you combine it with RAVE:
When the node has few visits, you only explore (say) 3 moves and those moves
are the 3 best
moves according to some a-priori heuristic, but when you widen the tree, you
do NOT include
the 4th move according to the same criterion, but the best non-explored RAVE
3 nodes are considered for UCT (in the beginning, of course) but ALL nodes
get RAVE updates.
And these RAVE updates are specific for the path in the tree leading to the
node. So all non
explored nodes get high quality RAVE information and when you widen the tree
candidate is a good move for whole board position represented by the node.
The way I implemented PW for the first time is the formula by Hiroshi
(I copy/paste from my notes. It is somewhere in the list.)
(1 - beta) * (win_rate + 0.31 * sqrt( ln(parent_visits) / child_visits)) +
beta (rave_win_rate * 0.31 * sqrt(
ln(rave_parent_visits) / rave_child_visits))
beta = sqrt(100 / (3 * child_visits + 100));
Aya uses Progressive Widening. High order N moves are only considerd.
PW_sort_N = ln(parent_visits/ 40.0) / ln(1.4) +2;
Moves are sorted sometimes by rave value, Criticality, and MC owners.
I also would like to know how to count rave.
UCT searches B(E5),W(D3),B(C5),W(F7), and in this position, playout searches
B(E7),W(E8),B(D8),W(F8),B(D7).. Black win.
In W(D3) positions, Aya updates RAVE and UCT,
I think "Updates C5(RAVE)" is strange, but I could not get good result
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Computer-go