[computer-go] Improvement of UCT search algorithm

Łukasz Lew lukasz.lew at gmail.com
Thu Oct 5 05:04:39 PDT 2006


I guess MoGoxyz and Valkyria_UCTx are already doing it, but I would
like to propose one simple improvement for UCT.

In each node UCT maintains (implicitly or explicitly) average of
playout results.
An equation for updating it explicitly could be:

avg := (1 - a')*avg + a' * playout_result

where a' is learning rate:

a' = 1/n

One may obtain better results if he fixes a' at some point instead of
allowing it to converge:

'a = max (epsilon, 1/n)

Of course this way avg will not converge to any value, but it is the
way it should be because playout_results are not generated from
stationary distribution as in bandit problem.

This is especially important when some good refutation is found late
in the search and n is large so the conventional averaging will take a
lot time to "notice" that the refutation is found.

Of course there are better ways to deal with 'a, but firstly I would
like to hear Your opinion.

BTW
Can we hear what are the enhancements of Valkyria_UCT6 / new Mogo?

BTW2
CGOS pairing algorithm rarely pair strong programs against themselves,
especially when there is a bunch of very weak players urging to play
with them. Maybe we could shut down some of them (with rating < 800
and #games > 10000)?

Lukasz Lew


More information about the computer-go mailing list