[computer-go] Experiments with UCT
Wang Yizao
yizao.wang at polytechnique.edu
Wed Jul 26 01:44:09 PDT 2006
Rémi Coulom wrote:
>Yizao Wang wrote:
>
>
>>The test we tried are different from Remi's. (His is interesting: '170 games are played against GNU Go 3.6 at level 10, from 85 different starting positions', does it say that there are games beginning from the middle of 'one game'?) We set MoGo play against gnugo3.6 with default level, each version of MoGo(or with different parameters) plays at least 100 games. Instead of giving the total time as a parameter, we set a certain time for each move when MoGo is thinking.
>>
>>
>
>My starting positions have only one white stone and one black stone. I
>used them only to ensure variety in the games. I used to run matches
>from the empty board before, counting on the randomness of Crazy Stone
>and GNU Go to avoid repeating games. All the games were different, but I
>found that some sequences of moves were very frequent at the beginning.
>I believe that starting from random positions ensures more variety and
>may provides more significant data.
>
>
>
OK. But maybe in this case you should test more games?
>>We get also the results showing the improvement of level with more time given. Some results for MoGo 1.0.0, the version with which it played during the 16th KGS computer go tournament. (This version is very slow and obviously he is much weaker than CrazyStone using UCT.)
>>
>>5sec around 30%
>>20sec around 40%
>>60sec around 50%
>>
>>However, with 120sec for each move, we have not seen any improvement. I do not know if Remi has tried some experiments with more times.
>>
>>
>
>At 32 minutes per game, Crazy Stone takes about 2 minutes per move for
>the first moves. I have also run mathes at 128 minutes per game before
>Turin, and noticed that it improved over games in 32 minutes, but that
>was not with UCT.
>
>A point to take into consideration is that, with 100 games, uncertainty
>margins are rather wide. Even with my 170 games it is not rare to obtain
>an inferior result when doubling the thinking time. That is noise.
>
>
>
Yes, the noise is big, which prevents from evaluating precisely the
progress.
>>Three conjectures are discussed:
>>--120sec is not long enough to see another improvement of 60sec's behavior. Maybe... 5minutes?
>>--Bugs in MoGo.
>>--UCT's application to Go has some problems.
>>
>>
>
>UCT is extremely selective, and that may be a problem. sqrt(log(t)),
>which is the term that forces to select moves with a bad evaluation from
>time to time, grows extremely slowly. The n-armed bandit algorithm UCT
>is based on was designed for problems having stationary distributions,
>which is not the case when searching recursively.
>
>This being said, I found that the extreme selectivity of UCT works
>rather nicely. It helps to search some lines of play very deeply.
>
>By the way, I have though about the following idea: the log(t) term in
>the square root is not even necessary to ensure convergence to the
>optimal play. That's because we are using binary values and Go is
>deterministic. A move is either a win or a loss. So in case the move
>that is currently considered best is a loss, even if its estimated value
>is so high that other moves are not searched at all, its value will
>eventually go down, and a winning move will be searched, if there is
>one. I found that this actually happens when observing games of Crazy
>Stone. So, being extremely selective may be good.
>
>It was not the case in my CG2006 paper, when I was using territory as
>the evaluation. Using probability of winning instead of territory
>completely changes the nature of the search. I believe that the
>selectivity and backup operators I was using then were good for a
>territory-based evaluation, but they are not any more with a
>probability-of-winning-based evaluation, that allows a much higher
>selectivity.
>
>
>
>>The last one is interesting, since the algorithm of MoGo is slightly different from UCT at obtaining the final score(by random simulation), which might give a big bias that could be amplified during the update process in the tree from the deep level to the root.
>>
>>
>
>Do you mean that you use a backup operator a bit like the one I
>described in my CG2006 paper? My experiments with applying such an
>operator have not been very sucessful so far, but I am sure it can be done.
>
>
>
No, I mean, essentially UCT is a determinist algorithm, but in our
program we have a stochastic part(for giving a score at each leaf).
>I think an important aspect of designing a backup operator, that I had a
>little overlooked at the time of my CG2006 paper, is that it is much
>worse to over-estimate the value of a position than to under-estimate
>it. Backing-up the average under-estimates the position, but it is not
>so bad. Under-estimating the position means that the move that leads to
>this position will be over-estimated, which means it will be searched
>more, and the over-estimation will be eventually solved by searching
>more. Over-estimating a position means that the move that leads to the
>position is underestimated, so it may be searched less (almost not at
>all with UCT), so the over-estimation may never be corrected, which may
>lead to catastrophes.
>
>
Interesting.
>Don's idea of forgetting early simulations seems good, and may work with
>UCT to improve the backed up value. I will try it.
>
>Rémi
>
>
Yizao
>_______________________________________________
>computer-go mailing list
>computer-go at computer-go.org
>http://www.computer-go.org/mailman/listinfo/computer-go/
>
>
>
>
More information about the computer-go
mailing list