[computer-go] Experiments with UCT

Rémi Coulom Remi.Coulom at free.fr
Tue Jul 25 06:22:39 PDT 2006


Hi,

I mentioned UCT in one of my previous messages to the list:
http://zaphod.aml.sztaki.hu/papers/ecml06.pdf

I tried it in Crazy Stone. I found that the algorithm described in the 
paper does not work well, but I managed to improve it a lot with a small 
change: I used 1/sqrt(20) instead of 1/sqrt(2) for the C_p constant. It 
now seems to work very well.

Here is a summary of how it works:
  - Use probability of winning as score, not territory
  - Use the average outcome as position value
  - Select the move that maximizes v + sqrt((2*log(t))/(10*n))

v is the value of the move (average outcome, between 0 and 1), n the 
number of simulations of this move, and t the total number of 
simulations at the current position. In case a move has n = 0, it is 
selected first.

Here are experiment results with Crazy Stone. 170 games are played 
against GNU Go 3.6 at level 10, from 85 different starting positions, 
alternating colors, at various time control (time per game), 1 CPU at 
2.2 GHz.

         version 0005  UCT
  2 min  40%           46.7%
  4 min  48.2%         56.6%
  8 min  52.9%         64.7%
16 min  57.4%         67.6%
32 min  66.6%         71.6%

I have tried hard to improve it, but it seems very difficult. Using a 
more clever backup operator may help, but I have not managed to measure 
a significant difference yet.

I thank Yizao for letting me know about UCT. His program, MoGo, seems to 
be doing very well on CGOS. Maybe Yizao can tell us more about his 
experiments.

Rémi



More information about the computer-go mailing list