[computer-go] An attempt to explain UCT for computer Go

Don Dailey drd at mit.edu
Thu Sep 21 11:48:38 PDT 2006


I'm trying something a bit different than UCT as you describe which gave
me better results in self play tests.   I achieved about 59% results
with my method over about 200 games.   I'm redoing the test to get a
bigger sample and to incorporate some other improvements. 

>From a given node,  I play the move that maximizes (r + f()) / (c + f())

  r is the sum of the results from this move where loss = -1, win = 1
  c is the number of times this move was played
  f() is a function calculated as  6 + pow(t, ALPHA) 
         t is the number of simulations at this node
     ALPHA is a constant where empirically 0.20 works very well.
         the constant 6 is arbitrary, perhaps not even needed.

Perhaps the constant 6 admits that the shape of my probability curve is
not 100% correct?

Simply put,  for selection purposes each move gets some number of wins
credited to it which is a function of how many times the parent node was
seen.   I think it has the same characteristics of UCT as you describe.
It tries just about everything equally at first but is guaranteed to
occasionally sample every move, no matter how weak the move is and f()
grows quickly at first then more slowly.   

I've made some other improvements to Lazarus,  so I'm redoing the match
with everything the same except UTC as Magnus just described and what I
just described above.  I'll report next week on the results, the test
runs pretty slowly and it will take a few days to get a lot of
results.   

I suspect there are a large number of ways to select the "right" move
probabilistically and it's probably a matter of tweaking to make many of
them work well.

- Don


On Thu, 2006-09-21 at 19:52 +0200, Magnus Persson wrote:
> Quoting Arthur Cater <arthur.cater at ucd.ie>:
> 
> > What is "UCT-search", where can I find out about it?
> 
> 
> You should also read this post to this list:
> 
> http://computer-go.org/pipermail/computer-go/2006-July/005737.html
> 
> This is a very short description of how Crazystone uses UCT. The paper others
> already mentioned presents UCT applied in a more general way without 
> references
> to go.
> 
> The author Rémi Coulom discovered that you have to change a constant, 
> because in
> game tree search the winrates measured at each node oscillate a lot, 
> which means
> that one must explore alternatives much more than for applications 
> where you do
> not have to take an opponent into account (who actively tries to lower the
> winrate).
> 
> It might be a little hard to follow so here is an attempt to present UCT in a
> more programmer friendly fashion and less mathematical. Thus I do not follow
> the exact terminology of those others, and I might in fact have misunderstood
> something, but nethertheless it works).
> 
> Basically the starting point is a program that plays random games and 
> evaluates
> the position where no more moves can be played and it searches for moves that
> has the highest chance of winning. To do so it  builds a search tree in memory
> by expanding it with one node for every simulation played. In each node the
> Winrate is updated (Wins/Visits). A naive way of searching would be to select
> the node with the highest Winrate (a practical problem is to keep track of
> black and white in alternating nodes amd assignes Wins and Losses correctly).
> This is what UCT does most of the time, but it also adds a number to the
> Winrate that goes down everytime the node has been visited. The number goes up
> a little every time the parent was visited and this means that at some point
> the WinRate + the value will become higher than for the best node how 
> will have
> a value that goes down everytime it is visited.
> 
> So here is how I calculate the values to pick the next node/move in the tree
> (moves are only random after the tree has been expanded with a new node.) If
> there are children to a node that has not been visited then one of those are
> picked at random and no UCTvalues has to be computed.
> 
> The UCTvalue for each candidate move is computed with
> 
> Winrate + sqrt((2*ln(ParentVisits))/(10*NodeVisits))
> 
> and the candidate move with highest UCTvalue is selected to be played next.
> 
> With UCT one can strike a good balance between searching the best move so far
> and exploring alternative moves. I think the innovation of UCT (to me 
> at least)
> is the logarithm of the number of visits to the parent node for the UCT-Value.
> This value goes up quickly early but then it becomes flat, but never stops
> rising. It goes to infinity although very very slowly. All moves will be
> searched no matter how bad winrates they have, so that UCT-search given close
> to infinite time and memory will find any winning no matter how misleading the
> winrates are initially.
> 
> Note that I wrote "a winning move". It will not pick the "best" move 
> becuase the
> algorithm does not backup score in a traditional way and there is no way of
> defining what best here would be. It rather picks a move that has a high
> winrate early in search and cannot be refuted as being a losing move. This is
> very evident on 7x7 where it often searches very deep in winning 
> positions, but
> the winning sequence is often a little odd to me but still most of time good
> enough to win with 0.5 points. The opponents losing move however are searched
> uniformly so the search approaches the ideal "1 x W x 1 x W x 1 x W x ..."
> search tree.
> 
> My first searching MC-program searched in a completely opposite way. It
> aggressively pruned bad moves in the tree and computed new Min/Max Winrates
> based on what was left in the tree. As a result the Winrates at each node
> oscillated so bad that sometimes the best move was pruned just because by
> chance all other candidates happened to peak at the same time as the best move
> was at the bottom and my attempts to fix that was really awful.
> 
> UCT search is different. Nothing is forgotten so the winrates change slowly as
> soon as the number of simulations has grown large.
> 
> -Magnus
> 
> 
> 
> 
> 
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/



More information about the computer-go mailing list