[computer-go] Depth dependent evaluation effects on monte carlo searches

Rémi Coulom Remi.Coulom at univ-lille3.fr
Sat Jun 9 04:29:43 PDT 2007


Jason House wrote:
> Darren Cook wrote:
>> Hi Jason,
>> In UCT the monte carlo searches (I find it clearer to call them "the
>> playouts") are always run to the end of the game. So they always
>> accurately (well, as accurate as a random playout can be!) take sente in
>> account. Therefore my understanding is that it does not matter whether
>> they start at an even or odd ply in the UCT part of the tree.
>>
>> Darren
>>   
> 
> Let's take a basic example of a leaf node in an MC search tree that 
> hasn't been expanded, but has 4 children.  Let's say that random 
> simulation through the children have winning percentages of {46%, 51%, 
> 47%, 48%}.  Assuming a uniform simulation policy, the winning percentage 
> would be the average of the four, or 48%... but when that node gets 
> expanded, it'll start discovering the winning percentages of the 
> children nodes.  Now when we look at the node that used to be our leaf 
> node and ask what it's winning percentage would be, we come to a 
> different conclusion...  The winning percentage is either 46% or 51% 
> depending on if the color to move.
> 
> In this example, the difference between the 0-ply and the 1-play is 3% 
> (51%-48%) or 2% (48%-46%) depending on what the color to move was.  Does 
> that help?

If I understand what you mean, backing-up the average of playouts from a 
position has a tendency to under-estimate that position from the point 
of view of the player to move, because if the playout would start by the 
best move instead of a random move, they would produce a better value. 
You fear that it may be a problem when evaluating positions at different 
depth parities. But it is not.

The consequence of this bias in UCT, is that when a move has been 
searched for only a few simulations, it tends to be over-evaluated. This 
is not a big problem, because if it is over-evaluated, then it means 
that it will be searched, and the over-evaluation will be corrected 
eventually.

Over-estimating a move is not a big problem in UCT, but under-estimating 
it would have very bad consequence. So, in practice it is better to have 
optimistic move evaluations, especially when they have not been searched 
a lot. I had not understood this when I wrote my Turin paper, where I 
tried to get unbiased estimates of position values. UCT works better 
because it gives optimistic move values.

If you think hard about it, you'll find that the bias in UCT evaluation 
helps the search, even though leaves at are different depths. Doing 
alpha-beta with Monte-Carlo evaluations at the leaves at different 
depths would fail badly (I vaguely remember Bruno Bouzy may have 
reported this phenomenon, already). But alpha-beta's back-up is min-max. 
With the smooth average backup of UCT, everything works very well.

Rémi


More information about the computer-go mailing list