[computer-go] Depth dependent evaluation effects on monte carlo
searches
Rémi Coulom
Remi.Coulom at univ-lille3.fr
Sat Jun 9 04:29:43 PDT 2007
Jason House wrote:
> Darren Cook wrote:
>> Hi Jason,
>> In UCT the monte carlo searches (I find it clearer to call them "the
>> playouts") are always run to the end of the game. So they always
>> accurately (well, as accurate as a random playout can be!) take sente in
>> account. Therefore my understanding is that it does not matter whether
>> they start at an even or odd ply in the UCT part of the tree.
>>
>> Darren
>>
>
> Let's take a basic example of a leaf node in an MC search tree that
> hasn't been expanded, but has 4 children. Let's say that random
> simulation through the children have winning percentages of {46%, 51%,
> 47%, 48%}. Assuming a uniform simulation policy, the winning percentage
> would be the average of the four, or 48%... but when that node gets
> expanded, it'll start discovering the winning percentages of the
> children nodes. Now when we look at the node that used to be our leaf
> node and ask what it's winning percentage would be, we come to a
> different conclusion... The winning percentage is either 46% or 51%
> depending on if the color to move.
>
> In this example, the difference between the 0-ply and the 1-play is 3%
> (51%-48%) or 2% (48%-46%) depending on what the color to move was. Does
> that help?
If I understand what you mean, backing-up the average of playouts from a
position has a tendency to under-estimate that position from the point
of view of the player to move, because if the playout would start by the
best move instead of a random move, they would produce a better value.
You fear that it may be a problem when evaluating positions at different
depth parities. But it is not.
The consequence of this bias in UCT, is that when a move has been
searched for only a few simulations, it tends to be over-evaluated. This
is not a big problem, because if it is over-evaluated, then it means
that it will be searched, and the over-evaluation will be corrected
eventually.
Over-estimating a move is not a big problem in UCT, but under-estimating
it would have very bad consequence. So, in practice it is better to have
optimistic move evaluations, especially when they have not been searched
a lot. I had not understood this when I wrote my Turin paper, where I
tried to get unbiased estimates of position values. UCT works better
because it gives optimistic move values.
If you think hard about it, you'll find that the bias in UCT evaluation
helps the search, even though leaves at are different depths. Doing
alpha-beta with Monte-Carlo evaluations at the leaves at different
depths would fail badly (I vaguely remember Bruno Bouzy may have
reported this phenomenon, already). But alpha-beta's back-up is min-max.
With the smooth average backup of UCT, everything works very well.
Rémi
More information about the computer-go
mailing list