[computer-go] More UCT / Monte-Carlo questions

Magnus Persson magnus.persson at phmp.se
Tue Feb 5 09:47:30 PST 2008


I cannot comment on why Orego does things in a certain way, but you  
adress things I guess make everybody doing UCT/MC scratch their heads.

Quoting Mark Boon <tesujisoftware at gmail.com>:

>
> - It computes the UCT value in a completely different way. A comment
> in the code refers to a paper called "Modification of UCT with
> Patterns in Monte-Carlo Go". I haven't studied this yet, but whatever
> it does it apparently doesn't do wonders over the standard C * sqrt (
> (2*ln(N)) / (10*n) ) that I use.

Although MoGo may benefit from a modified UCT it does not mean that  
light playouts do. With light playouts one has to search wide to  
overcome strong bias in every position. With heavy playouts my theory  
right now is that one can go much more seletive. And it also depends  
on other tricks one is using. I am experimenting with Valkyria 3 right  
know where C is optimal as 0.001 right know (but it is biased to  
search wide initially by "assumed priors"-trick). In Valkyria 2 this  
constant is 0.8 and earlier when the playouts were lighter I think it  
was 1.8.

My guess is that what you are doing things right, and should be  
careful adding stuff without testing. But later on you might benefit  
from trying alternatives to UCT.

> - It only initialises the list of untried moves in the tree after a
> node had a minimum run-count of 81 (on 9x9). For the life of me I
> couldn't figure out what the effect of this was or what it actually
> does. I was wondering if this has an effect of what is counted as a
> 'run' but I'm not sure.

In Valkyria nodes are visited 7 times before expanding. Less and more  
than that makes the program weaker. I would like it to be more because  
it would save memory.

> Then I found a paragraph (4.2) in Remi Coulomn's paper about ELO
> raings in patterns. It briefly describes it as "As soon as a number  of
> simulations is equal to the number of points on the board, this  node
> is promoted to internal node, and pruning is applied." I can't  help
> feeling that the implementation in Orego is doing this. But I  can't
> figure out where it does any pruning or applying patterns of  any kind.
> Is there supposedly a general benefit to this even without  pruning or
> patterns? As stated before, at least it doesn't seem to  provide any
> benefit over my more primitive implementation. Maybe  Peter Drake or
> someone else familiar with Orego knows more about this?

I believe CrazyStone calculate "owner-at-end"statistics for each point  
on the board before applaying patterns and determine the move order  
for progressive widening. For the patterns it needs 64 runs to give  
priority to points on the board that is unstable (and with a bias  
towards defending own territory).

> Anyway, reading the same paragraph mentioned above again I was struck
> by another detail I thought surprising: after doing the required
> number of runs, the candidates are pruned to a certain number 'n'
> based on patterns. Does that mean from then on the win-ratio is
> ignored? What if the by far most successful move so far does not  match
> any pattern? Am I misunderstanding something here? The  paragraph is
> very brief and does not elaborate much detail.

There are two possible ways of implementing the tree. One is with  
lightweight nodes containg info only about the move and a pointer to  
all children. The other one is with supernodes that contains the  
information for all moves in a given position. In the former case the  
information for first runs through the node is lost (actually the  
children are random playout moves) in the latter case all information  
is recorded and one should not prune winning nodes. What I mean here  
is that there are two possible interpretations of what terminal node  
might mean, and if Orego does one thing and you something else you are  
bound to be confused.

In valkyria 3 I do move ordering as soon as a supernode is created.  
But as I wrote above move ordering might become even better if it is  
delayed about 50-100 runs if useful statistics can be collected from  
the playouts other than the winrate itself.


> On to my next step I introduced some very basic tactics to save  stones
> with one liberty, capture the opponent's stones with one  liberty and
> capturing the opponent's stones in a ladder. There are  many possible
> choices here. Just doing this near the last move and/or  over the whole
> board. Doing this in the simulation and/or during the  selection.

I think you should only capture 1 liberty stones if the ladder is  
broken. Likewise only save stones that can escape for sure. Perhaps  
you did that but it sounded as if you only applied the ladder reading  
code to capturing stones with two liberties.

My logic is that the program should only play deterministically when a  
move is almost sure to be better than random tenuki. Capturing a  
captured stone is in the worst case as bad as passing.

If you find out that I am wrong in the details here please tell me  
because then I have to make a lot of changes to Valkyria... some of  
these things I did out of mere conviction and I only know for sure  
that the tactics used in total improves the playing strength.

> Just doing this near the last move during simulation caused a slow-
> down of a factor 4 or 5 but improves play considerably. Also doing
> this near the last move during selection doesn't affect speed much  but
> deteriorated play! Doing this first near the last move and then  look
> for tactics over the whole board as a next step affected results
> negatively even more. Number of playouts are still in the same ball-
> park.
>
> Thinking it over, since I don't use this to prune the selection but
> just to order the candidates I could see that after many runs the
> ordering suggested by the tactics get overriden by the UCT selection.
> So I could see the effect of using this for selection reduced  steadily
> with the number of runs through a node. But still I didn't  expect a
> considerable reduction in strength. So what could be  happening here?

I think I missed how you are using move ordering in you program. Basic  
UCT will allocate one run to each child when a node is expanded. If it  
follows the pseudcode on the Sensei Wiki-page then the move ordering  
is random. If you just makes this part ordered then  it will have  
almost no effect on playing strength, because the move ordering will  
only apply to one run for each child move.

If you are using progressive widening or bias the UCT-values with the  
move order then you can have strong positive and/or negative effects.  
If you force the program to allocate a lot of runs to capture dead  
stones, then you lose a lot of runs for every dead block on the board.  
In the worst case scenario on 19x19 in the endgame your program would  
have tons of runs going to meaningless moves in the endgame.

I think my programs suffer from this to some extent I am aiming to  
rethink how move ordering is done. In the last 19x19 tournament on  
KGS, Remi pointed out that several of the weaker MC-programs seemed to  
play a lot weaker as soon as the board was full with possible bad  
captures.

> - I could have a bug.
> - I didn't run enough games (about 50)
> - Using knowledge to order the initial selection is counter- productive
> when not accompanied with pruning.

-Having a stronger program than Orego make me think you have no (less) bugs.
-50 games is very little unless you see a change of winrate over 15-20% or so.
But doing basic captures and saves should be a 30% increase in  
strength against Gnugo for example. Actually I would claim it to be  
the largest improvement you will ever see, because it is so  
fundamental to the game.
-Pruning cannot be bad as long as bad moves are pruned. Too bad there  
are so many moves that cannot easily be pruned...

> The last one I find very hard to believe. Did anyone else run into
> something like this?

Be glad that something works for you at least half of the time. I  
reached the point where 9 out oft 10 changes to the program makes it  
weaker...

> Finally, I also looked a bit at using more threads to make use of  more
> than one processor. I figure this can wait and it's better to  keep
> things simple at this early stage but still it's something I  want to
> keep in mind. When looking at what I need to do to enable  multiple
> threads during search it seems to me I'll be required to  lock
> substantial parts of the UCT-tree. This means traversing the  tree when
> looking for the best node to expand is going to be the main
> bottle-neck. Maybe not with just two to four processors, but I  foresee
> substantial diminishing returns after that. Is this correct?  Is there
> experience with many processors? Maybe a different expansion  algorithm
> will be required?

I think you are correct. What may make you happy is that every time  
you make playouts slower you will also make the bottle-neck less of a  
problem.

This also becomes an increasing problem with longer thinking times  
because the UCT-tree can get really deep at least on small boards.

There is also a fundamental problem with diminsihing returns even if  
the bottle-neck is neglible. With one thread the UCT algorithm picks  
the "best" path through the tree at each run. When the results from  
several threads are pending UCT will not be able to pick the "best"  
path anymore and I think it is unavoidable that at some point the  
benefit of adding more processors will not pay off the cost in $/Elo.

I am sure other programmers have better answers or different opinions  
but I liked the topics you raised so I hope I was not to confusing in  
my answer.

-Magnus




More information about the computer-go mailing list