[computer-go] How to design the stronger playout policy?
Yamato
yamato_cg at yahoo.co.jp
Sat Jan 5 04:30:47 PST 2008
Gian-Carlo Pascutto wrote:
>What improvements did you try? The obvious ones I know are prioritizing
>saving and capturing moves by the size of the string.
>
>Zen appears quite strong on CGOS. Leela using the above system was
>certainly weaker.
I use a static ladder search in the playouts. For example, if a move that
matches a 3x3 pattern can be captured in a ladder, it is not treated as
interesting. Of course such a rule makes the program slower, but I believe
it is an improvement.
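
A minimal sketch of such a filter, assuming the playout already has a list
of pattern-matched candidate moves. The names choose_pattern_move and
is_ladder_capturable are hypothetical, and the ladder reader itself is not
shown; it is passed in as a predicate.

```python
import random
from typing import Callable, Iterable, Optional, Tuple

Move = Tuple[int, int]  # (row, col); a hypothetical move representation


def choose_pattern_move(
    pattern_moves: Iterable[Move],
    is_ladder_capturable: Callable[[Move], bool],
    rng: random.Random,
) -> Optional[Move]:
    """Pick a pattern-matched move that is not capturable in a ladder.

    Returns None when every candidate is rejected, so the caller can
    fall back to its next rule (for example a uniformly random move).
    """
    # Assumption: is_ladder_capturable(m) is True when the stone played
    # at m could be captured in a ladder. The reader is supplied by the
    # caller and is where the extra playout cost comes from.
    safe = [m for m in pattern_moves if not is_ladder_capturable(m)]
    return rng.choice(safe) if safe else None


# Usage with a stand-in predicate: (3, 3) is assumed to die in a ladder.
doomed = {(3, 3)}
move = choose_pattern_move([(3, 3), (3, 4)], lambda m: m in doomed,
                           random.Random(0))
```

Returning None rather than a rejected move keeps the filter separate from
whatever fallback rule the playout uses next.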
>I finally improved my playouts by using Remi's ELO system to learn a set
>of "interesting" patterns, and then just randomly fiddling with the
>probabilities (compressing/expanding) until something improved my
>program in self-play by about +25%. Not a very satisfying method or an
>exceptional result. There could be some other magic combination that is
>even better, or maybe not.
I have also implemented Remi's Minorization-Maximization algorithm, but I
could not work out how to use its results to improve playing strength.
Could you explain the details of your playout policy?
Do you use only 3x3 patterns?
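
For reference, one possible way to turn learned strengths into a playout
policy is sketched below. In the Bradley-Terry model that the
Minorization-Maximization algorithm fits, a move's strength is the product
of the gammas of its features, and a move is chosen with probability
proportional to its strength. The exponent parameter is only one reading
of "compressing/expanding" the probabilities and is an assumption, as are
all function and feature names here.

```python
import random
from math import prod
from typing import Dict, List


def move_probabilities(
    candidates: Dict[str, List[str]],
    gamma: Dict[str, float],
    exponent: float = 1.0,
) -> Dict[str, float]:
    """Map each candidate move to its selection probability.

    candidates maps a move to the list of features it matches; gamma
    maps a feature to its learned strength. exponent < 1 flattens the
    distribution toward uniform, exponent > 1 sharpens it.
    """
    weights = {
        move: prod(gamma[f] for f in features) ** exponent
        for move, features in candidates.items()
    }
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}


def sample_move(candidates, gamma, rng: random.Random, exponent=1.0) -> str:
    """Draw one move according to move_probabilities."""
    probs = move_probabilities(candidates, gamma, exponent)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


# Illustrative values only, not learned from any data.
gamma = {"capture": 4.0, "pattern_a": 1.0}
candidates = {"A": ["capture"], "B": ["pattern_a"]}
probs = move_probabilities(candidates, gamma)
flat = move_probabilities(candidates, gamma, exponent=0.5)
```

With these made-up gammas, move A gets probability 0.8 at exponent 1 and
2/3 at exponent 0.5.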
>What is so frustrating is that the playouts are essentially black magic.
>I know of no way to automatically determine what is good and not
>besides playing about 500 games between 2 strategies. The results are
>very often completely counterintuitive. There is no systematic way to
>improve.
Yes. In addition, a big problem is that testing policies is very time
consuming. I think at least 1000 games, each using 3000 or more playouts
per move, are needed to judge whether a change is good or bad.
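
A rough calculation of why so many games are needed, assuming independent
games without draws and the normal approximation to the binomial
distribution. The function names are made up for this sketch.

```python
from math import log10, sqrt


def win_rate_margin(games: int, win_rate: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the ~95% confidence interval on a measured win rate."""
    return z * sqrt(win_rate * (1.0 - win_rate) / games)


def elo_from_win_rate(p: float) -> float:
    """Elo difference implied by win rate p under the logistic Elo model."""
    return -400.0 * log10(1.0 / p - 1.0)


for n in (500, 1000):
    margin = win_rate_margin(n)
    print(f"{n} games: +/-{100 * margin:.1f} points of win rate, "
          f"about {elo_from_win_rate(0.5 + margin):.0f} Elo")
```

Near a 50% win rate the margin is about 4.4 percentage points after 500
games and about 3.1 after 1000, so a change worth less than roughly 20 Elo
cannot be separated from noise even with 1000 games.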
--
Yamato