[computer-go] Depth dependent evaluation effects on monte carlo searches

Jason House jason.james.house at gmail.com
Sat Jun 9 17:08:46 PDT 2007


Magnus Persson wrote:
> Quoting Jason House <jason.james.house at gmail.com>:
>
>> Does anyone have any data on just how optimistic or pessimistic the 
>> results
>> would be?  I'd like to use some heuristics that inherit winning 
>> percentages
>> from a parent node to bias the expected winning percentage of the 
>> children
>> nodes... and maybe pruning away portions of a search in a fixed depth 
>> monte
>> carlo search with iterative deepening.
>
> This is confusing to me. Are you asking how UCT behaves in order to 
> implement
> something which is not UCT?

I'd say that I want to understand the nature of monte carlo 
simulations... not specifically UCT.  You are absolutely correct that 
I'm trying to cook up something which is not UCT.


> Anyway, UCT scores has the property that the score of a node changes 
> very slowly
> when it is searched deep. 

For my theoretical analysis, one unfortunate property of UCT is that 
many playouts from a set starting position means that the search tree 
gets expanded and that the measured winning rate is some kind of average 
of elements deeper in the tree.  My original post asked about "monte 
carlo" in an attempt to avoid that issue. 

> But does this mean that sibling scores can be compared
> to each other? I would hesitate here because the scores change as a 
> function of
> the search. In difficult positions  the scores for all good moves are 
> often
> very similar. The scores often move up or down slowly together as 
> function of if
> the position is good or not.

Actually, "the scores moving up or down slowly together as function of 
if the position is good or not" is exactly the type of thing I'm trying 
to gauge.  Essentially, I'd like to predict the performance of an 
unexplored branch based on available data.  Not an attempt to prune it 
away, but rather to figure out when exploration of those branches should 
occur.

> I attach an sgf file where i added the principal variation of Valkyria 
> after the
> black opening move of the center point using 5 minutes of thinking 
> time. For
> each move I give the winning percentage and the number of playouts 
> that passed
> this node. For the second move of black I also give the winning 
> percentage of
> all siblings. The imortant thing here is that many of those move are only
> searched 1000 times whereas the few best move were searched 1000000 
> times, but
> there is still not much of a difference in the actual scores.

Thanks!  Taking a quick look, I see the following winning percentages 
for the principle moves:
B: 54.3%
W:         46%
B: 54.6%
W:         46.3%
B: 54.7%
W:         47.6%
B: 54.9%
W:         47.0%

Looking at a single color, the winning percentage seems to shift by 0.2 
to 0.4%... About what I'd expect to see.  What confuses me though is how 
to interpret the jump back and forth as the color changes (about 8%).  
Are the percentages always the winning percentage for black?  Or is it 
the winning percentage for the color to move?

If it's the winning percentage for the color to move, it seems really 
strange that it'd go up for both colors as the principle variation went on.

If it's the winning percentage for black, why does it vary so drastically?


More information about the computer-go mailing list