[computer-go] Improvement of UCT search algorithm
Łukasz Lew
lukasz.lew at gmail.com
Thu Oct 5 05:04:39 PDT 2006
I guess MoGoxyz and Valkyria_UCTx are already doing it, but I would
like to propose one simple improvement for UCT.
In each node UCT maintains (implicitly or explicitly) average of
playout results.
An equation for updating it explicitly could be:
avg := (1 - a')*avg + a' * playout_result
where a' is learning rate:
a' = 1/n
One may obtain better results if he fixes a' at some point instead of
allowing it to converge:
'a = max (epsilon, 1/n)
Of course this way avg will not converge to any value, but it is the
way it should be because playout_results are not generated from
stationary distribution as in bandit problem.
This is especially important when some good refutation is found late
in the search and n is large so the conventional averaging will take a
lot time to "notice" that the refutation is found.
Of course there are better ways to deal with 'a, but firstly I would
like to hear Your opinion.
BTW
Can we hear what are the enhancements of Valkyria_UCT6 / new Mogo?
BTW2
CGOS pairing algorithm rarely pair strong programs against themselves,
especially when there is a bunch of very weak players urging to play
with them. Maybe we could shut down some of them (with rating < 800
and #games > 10000)?
Lukasz Lew
More information about the computer-go
mailing list