[computer-go] Experiments with UCT
Rémi Coulom
Remi.Coulom at free.fr
Tue Jul 25 06:22:39 PDT 2006
Hi,
I mentioned UCT in one of my previous messages to the list:
http://zaphod.aml.sztaki.hu/papers/ecml06.pdf
I tried it in Crazy Stone. I found that the algorithm described in the
paper does not work well, but I managed to improve it a lot with a small
change: I used 1/sqrt(20) instead of 1/sqrt(2) for the C_p constant. It
now seems to work very well.
Here is a summary of how it works:
- Use probability of winning as score, not territory
- Use the average outcome as position value
- Select the move that maximizes v + sqrt((2*log(t))/(10*n))
v is the value of the move (average outcome, between 0 and 1), n the
number of simulations of this move, and t the total number of
simulations at the current position. In case a move has n = 0, it is
selected first.
Here are experiment results with Crazy Stone. 170 games are played
against GNU Go 3.6 at level 10, from 85 different starting positions,
alternating colors, at various time control (time per game), 1 CPU at
2.2 GHz.
version 0005 UCT
2 min 40% 46.7%
4 min 48.2% 56.6%
8 min 52.9% 64.7%
16 min 57.4% 67.6%
32 min 66.6% 71.6%
I have tried hard to improve it, but it seems very difficult. Using a
more clever backup operator may help, but I have not managed to measure
a significant difference yet.
I thank Yizao for letting me know about UCT. His program, MoGo, seems to
be doing very well on CGOS. Maybe Yizao can tell us more about his
experiments.
Rémi
More information about the computer-go
mailing list