[computer-go] More UCT / Monte-Carlo questions

Mark Boon tesujisoftware at gmail.com
Tue Feb 5 07:54:03 PST 2008


Although most of my time has been eaten up by implementing/improving  
some general framework parts I did get a chance to play a bit with a  
simple UCT search. Some things that I found puzzled me a bit and I  
hoped someone had an explanation or similar experiences.

I implemented a very basic UCT / MC program first using pseudo- 
liberties. I figured this should be the base-line against which I can  
test some ideas. To test if the program actually worked properly I  
first let it play against Orego. The speed of my playouts are similar  
to Orego so I figured the level of play should be similar. (I  
switched off pondering and multiple-threading in Orego to get an  
apples-to-apples comparison.)

To my surprise my program seemed to be winning the majority of the  
games (after a few dozen games). When looking at Orego's output I  
couldn't help noticing that at the start of the game it prints much  
smaller numbers of 'runs' than my program, whereas by the end of the  
game the numbers are similar. This may be the reason for my program  
performing better. When I looked at the code of Orego I noticed there  
are two main differences:

- It computes the UCT value in a completely different way. A comment  
in the code refers to a paper called "Modification of UCT with  
Patterns in Monte-Carlo Go". I haven't studied this yet, but whatever  
it does it apparently doesn't do wonders over the standard C * sqrt 
( (2*ln(N)) / (10*n) ) that I use.

- It only initialises the list of untried moves in the tree after a  
node had a minimum run-count of 81 (on 9x9). For the life of me I  
couldn't figure out what the effect of this was or what it actually  
does. I was wondering if this has an effect of what is counted as a  
'run' but I'm not sure.

Then I found a paragraph (4.2) in Remi Coulomn's paper about ELO  
raings in patterns. It briefly describes it as "As soon as a number  
of simulations is equal to the number of points on the board, this  
node is promoted to internal node, and pruning is applied." I can't  
help feeling that the implementation in Orego is doing this. But I  
can't figure out where it does any pruning or applying patterns of  
any kind. Is there supposedly a general benefit to this even without  
pruning or patterns? As stated before, at least it doesn't seem to  
provide any benefit over my more primitive implementation. Maybe  
Peter Drake or someone else familiar with Orego knows more about this?

Anyway, reading the same paragraph mentioned above again I was struck  
by another detail I thought surprising: after doing the required  
number of runs, the candidates are pruned to a certain number 'n'  
based on patterns. Does that mean from then on the win-ratio is  
ignored? What if the by far most successful move so far does not  
match any pattern? Am I misunderstanding something here? The  
paragraph is very brief and does not elaborate much detail.

On to my next step I introduced some very basic tactics to save  
stones with one liberty, capture the opponent's stones with one  
liberty and capturing the opponent's stones in a ladder. There are  
many possible choices here. Just doing this near the last move and/or  
over the whole board. Doing this in the simulation and/or during the  
selection.

Just doing this near the last move during simulation caused a slow- 
down of a factor 4 or 5 but improves play considerably. Also doing  
this near the last move during selection doesn't affect speed much  
but deteriorated play! Doing this first near the last move and then  
look for tactics over the whole board as a next step affected results  
negatively even more. Number of playouts are still in the same ball- 
park.

Thinking it over, since I don't use this to prune the selection but  
just to order the candidates I could see that after many runs the  
ordering suggested by the tactics get overriden by the UCT selection.  
So I could see the effect of using this for selection reduced  
steadily with the number of runs through a node. But still I didn't  
expect a considerable reduction in strength. So what could be  
happening here?

- I could have a bug.
- I didn't run enough games (about 50)
- Using knowledge to order the initial selection is counter- 
productive when not accompanied with pruning.

The last one I find very hard to believe. Did anyone else run into  
something like this?

Finally, I also looked a bit at using more threads to make use of  
more than one processor. I figure this can wait and it's better to  
keep things simple at this early stage but still it's something I  
want to keep in mind. When looking at what I need to do to enable  
multiple threads during search it seems to me I'll be required to  
lock substantial parts of the UCT-tree. This means traversing the  
tree when looking for the best node to expand is going to be the main  
bottle-neck. Maybe not with just two to four processors, but I  
foresee substantial diminishing returns after that. Is this correct?  
Is there experience with many processors? Maybe a different expansion  
algorithm will be required?

	Mark





-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://computer-go.org/pipermail/computer-go/attachments/20080205/df3f9268/attachment.htm


More information about the computer-go mailing list