[Computer-go] Evaluating improvements differently

Aja ajahuang at gmail.com
Thu Apr 7 21:54:57 PDT 2011

Currently, for Erica, I use two kinds of testing:
1. I have a lot of tactical positions, mainly collected from Erica’s lost games (KGS games against human players, or tournament games). Some of them I designed myself to cover many specific tactical situations. I let the new version of Erica run through these positions to measure its correct-answer rate.
2. The second is actual play, against an older version of Erica or against other programs. This step takes a lot of time since I have only a 4-core PC right now. Usually I first confirm that a change improves strength with a fixed number of playouts per move, then play with a fixed time per move to bring speed into the final consideration. Of course, if the improvement at fixed playouts per move is very large (such as 100 Elo; yes, I “encountered” 100 Elo SOMETIMES), it is confirmed directly without any further testing. For changes to the playout on 19x19, it usually takes 10k playouts per move (or more) to prove the effect. For changes to the tree, 3k playouts per move is usually enough.
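One way to judge whether a match result at fixed playouts per move is large enough to trust is to convert the win rate into an Elo difference with an approximate confidence interval. The sketch below is hypothetical (the helper names `elo_from_score` and `match_elo` are illustrative, not taken from Erica) and assumes a normal approximation to the binomial win rate with no draws, as with fractional komi.

```python
import math


def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p, with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    # Inverse of the logistic Elo curve: p = 1 / (1 + 10 ** (-d / 400)).
    return -400.0 * math.log10(1.0 / p - 1.0)


def match_elo(wins: int, losses: int, z: float = 1.96):
    """Return (estimate, lower, upper) Elo difference for a new version vs. an old one.

    Assumption: no draws, and the win rate is approximately normal
    (reasonable for a few hundred games). z = 1.96 gives roughly a 95% interval.
    """
    if wins <= 0 or losses <= 0:
        raise ValueError("need at least one win and one loss")
    n = wins + losses
    p = wins / n
    se = math.sqrt(p * (1.0 - p) / n)  # standard error of the win rate
    # Clamp the interval bounds so they stay inside (0, 1) before converting.
    lo = max(p - z * se, 1e-9)
    hi = min(p + z * se, 1.0 - 1e-9)
    return elo_from_score(p), elo_from_score(lo), elo_from_score(hi)


if __name__ == "__main__":
    est, lo, hi = match_elo(320, 180)
    print(f"Elo gain about {est:.0f} (95% CI {lo:.0f} to {hi:.0f})")
```

For example, a 64% win rate over 500 games corresponds to about +100 Elo, with an interval of roughly 70 to 130 Elo, which is why a gain that large can be accepted without further testing, while a small gain needs many more games before the interval clears zero.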
For now, I have put Erica on hold completely because I am busy with my thesis/papers (and looking for a job).
