[computer-go] Experiments with UCT

Don Dailey drd at mit.edu
Tue Jul 25 07:41:49 PDT 2006


The approach I am experimenting with for Lazarus is to always play the
"best" move where "best" is defined as a function of the number of
samples and the score.   The result of each game is -1 or 1.

function f = (r + N) / (s + N)  where r is the tallied up results, s is
the number of simulations and N is a constant.   At any given node in
the tree part of the search Lazarus plays the move with the highest f as
calculated by this function.

But I slowly "degrade" the samples as Lazarus searches.  I degrade the
number of simulation and the results in the same proportion by
multiplying by a fraction very close to 1.0.   So as soon as a move is
completed and the statistics accounted for,  I degrade all siblings
including the move that was played by multiplying by the degrade
constant.  

This allows new lines of play to be discovered much more quickly because
even the very weakest moves are guaranteed to get occasionally sampled.
If a move looks great initially,  but a better move is discovered the
"old" move will degrade gracefully and will not have to overcome a
previously huge sample - thus getting a much faster mini-max effect.  

The degrade operation has the effect of filtering out noise from
previously irrelevant lines that were not good.   Or another way to look
at it is that it emphasizes more recent history.  

The search will not return a move at the root until it is BOTH best and
has the highest "sample" of all the candidate siblings.   There is an
exception to this to handled cases near the end of the game where all
moves are nearly equal - I simply use the move after a long timeout that
has the largest sample.

I am running Lazarus on a slow computer (Celeron chip) but it seems to
be keeping up with the pack of programs in the low to mid 1700's such as
gnugo_3.7.4 and the new program MoGo which has recently improved
significantly.    I have done scalability tests which indicate that
Lazarus responds significantly to processing power, so I think what I am
doing is pretty reasonable although I don't have solid theory behind it,
at least not theory that I fully understand.

I don't claim this is the right way to do it,  I think that probably the
CrazyStone type of approach could be slightly better - something that
explicitly tries to use a more intelligent back-up rule.   

Here are some things I tried:

I tried using a different higher constant N for moves that seem urgent,
such as captures and atari moves.   I felt certain that I should focus
on resolving these moves and it would give an improvement.   But I could
not see any improvement.  I even constructed problems which should be
enhanced by this idea.   My theory about why this doesn't seem to help
is that there are a lot of atari and capture moves which are not
necessarily important and all I am doing is distracting the search from
what it does best - automatically determining which moves to explore and
which not to.

One version of Lazarus dynamically adjusted N in the above formula based
on the largest sampled sibling move.   In this version I did not degrade
samples.  The whole idea of this was to make sure that all moves got
sampled occasionally which doesn't happen if you FIX the constant.  

I think it's important to throw out moves that have no chance of being
more than equal to the best.   However I'm too weak as a go player to
understand how to do this.    I have ladder search routines that try to
be smart and fast and one thing I tried is to not place a stone that is
subject to a quick capture tactic.   It's hard (for me) to find the
right conditions that make this rule reliable.   Sometimes this is just
part of winning a surrounding group for example.   Once I start putting
too many conditions on it then it doesn't get used enough to be much
help.   But Lazarus loses a lot of games because is wastes time placing
a "threatening" stone where it can't survive and sometimes ends up
helping the opponent make an easy eye.

Some of these kinds of move produce serious horizon effects.   Sometimes
you can place a stone that is subject to capture but the opponent is
forced to respond or lose.   The move didn't hurt you even though the
opponent wins your stone because he had that territory anyway and you
forced him to waste a move.   But when it happens inside a tree search
it can be used as a device by the search to delay or mask the
inevitable.  That's why I believe it would be really useful to recognize
and ignore these moves.   I just haven't figured out how.

- Don



  
On Tue, 2006-07-25 at 15:22 +0200, Rémi Coulom wrote:
> Hi,
> 
> I mentioned UCT in one of my previous messages to the list:
> http://zaphod.aml.sztaki.hu/papers/ecml06.pdf
> 
> I tried it in Crazy Stone. I found that the algorithm described in the 
> paper does not work well, but I managed to improve it a lot with a small 
> change: I used 1/sqrt(20) instead of 1/sqrt(2) for the C_p constant. It 
> now seems to work very well.
> 
> Here is a summary of how it works:
>   - Use probability of winning as score, not territory
>   - Use the average outcome as position value
>   - Select the move that maximizes v + sqrt((2*log(t))/(10*n))
> 
> v is the value of the move (average outcome, between 0 and 1), n the 
> number of simulations of this move, and t the total number of 
> simulations at the current position. In case a move has n = 0, it is 
> selected first.
> 
> Here are experiment results with Crazy Stone. 170 games are played 
> against GNU Go 3.6 at level 10, from 85 different starting positions, 
> alternating colors, at various time control (time per game), 1 CPU at 
> 2.2 GHz.
> 
>          version 0005  UCT
>   2 min  40%           46.7%
>   4 min  48.2%         56.6%
>   8 min  52.9%         64.7%
> 16 min  57.4%         67.6%
> 32 min  66.6%         71.6%
> 
> I have tried hard to improve it, but it seems very difficult. Using a 
> more clever backup operator may help, but I have not managed to measure 
> a significant difference yet.
> 
> I thank Yizao for letting me know about UCT. His program, MoGo, seems to 
> be doing very well on CGOS. Maybe Yizao can tell us more about his 
> experiments.
> 
> Rémi
> 
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/



More information about the computer-go mailing list