[computer-go] Where and How to Test the Strong Programs?

Rémi Coulom Remi.Coulom at univ-lille3.fr
Thu Dec 13 11:13:37 PST 2007


Don Dailey wrote:
> It would be great if you would provide recommendations for a simple
> conversion formula when you are ready based on this study. Also,
> if you have any suggestions in general for CGOS ratings the
> cgos-developers would be willing to listen to your suggestions.
>
> - Don
My suggestion would be to tell programmers to use a different login each 
time they change version or hardware (most already do), and to use 
bayeselo to rank the programs.
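To illustrate the idea of rating every login from all games at once, here is a minimal sketch that fits Bradley-Terry (Elo-model) strengths by the MM iteration and converts them to Elo. This is not bayeselo's code: it ignores draws and the first-move advantage, and the prior (a few virtual wins and losses against a fixed reference opponent, which keeps ratings finite for logins that have only won or only lost) is an assumption made here for simplicity. The function name `fit_elo` and its parameters are hypothetical. Because each version or hardware change gets its own login, each login is simply treated as a separate player.

```python
import math
from collections import defaultdict


def fit_elo(games, prior=1.0, iters=1000, tol=1e-10):
    """Fit Bradley-Terry strengths to (winner, loser) pairs, return Elo ratings.

    Hypothetical sketch: every player also gets `prior` virtual wins and
    `prior` virtual losses against a fixed reference opponent of strength 1.
    Ratings are shifted so that their mean is 0.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(lambda: defaultdict(float))
    players = set()
    for winner, loser in games:
        players.update((winner, loser))
        wins[winner] += 1.0
        pair_counts[winner][loser] += 1.0
        pair_counts[loser][winner] += 1.0
    if not players:
        return {}

    gamma = {p: 1.0 for p in players}
    for _ in range(iters):
        new = {}
        for p in players:
            # MM update: gamma_p = W_p / sum_q N_pq / (gamma_p + gamma_q)
            num = wins[p] + prior
            den = 2.0 * prior / (gamma[p] + 1.0)  # virtual games vs reference
            for q, n in pair_counts[p].items():
                den += n / (gamma[p] + gamma[q])
            new[p] = num / den
        delta = max(abs(new[p] - gamma[p]) for p in players)
        gamma = new
        if delta < tol:
            break

    # Elo scale: a 400-point gap corresponds to 10:1 odds of winning.
    elo = {p: 400.0 * math.log10(g) for p, g in gamma.items()}
    mean = sum(elo.values()) / len(elo)
    return {p: r - mean for p, r in elo.items()}
```

For example, `fit_elo([("A", "B"), ("A", "B"), ("B", "C")])` ranks A above B above C. Fitting all games jointly, rather than updating ratings incrementally after each game, means the final ratings do not depend on the order in which the games were played.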

This would be best if combined with a mechanism to recognize that two 
logins are versions of the same program (for instance, if they use the 
same password), and avoid pairing them.
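The pairing rule could look something like the following minimal sketch. It assumes, hypothetically, that the server can derive a comparable key from each login's password (here an unsalted SHA-256 digest); the names `program_key` and `pair_logins` are illustrative and not part of CGOS. It shuffles the waiting logins and greedily pairs each one with the first remaining login that belongs to a different program.

```python
import hashlib
import random


def program_key(password):
    # Hypothetical rule: logins sharing a password are versions of one program.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def pair_logins(logins, passwords, rng=None):
    """Randomly pair waiting logins, never pairing two versions of one program.

    logins: login names waiting for a game.
    passwords: dict mapping login -> password (hypothetical server-side data).
    Returns (pairs, unpaired).
    """
    rng = rng or random.Random()
    pool = list(logins)
    rng.shuffle(pool)
    keys = {name: program_key(passwords[name]) for name in pool}
    pairs, unpaired = [], []
    while pool:
        a = pool.pop(0)
        # First remaining login that belongs to a different program, if any.
        partner = next((b for b in pool if keys[b] != keys[a]), None)
        if partner is None:
            unpaired.append(a)  # only versions of the same program are left
        else:
            pool.remove(partner)
            pairs.append((a, partner))
    return pairs, unpaired
```

Being greedy, this can occasionally leave two logins unpaired even when a complete pairing exists; a real scheduler might retry with a different shuffle or use a proper matching algorithm.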

Regarding the correspondence with human ranks and handicap values, I cannot 
tell yet. It is very clear to me that the Elo-rating model is very wrong 
for the game of Go, because strength is not one-dimensional, especially 
when mixing bots and humans. The best way to evaluate a bot in terms of 
human rating is to make it play against humans, on KGS for instance. 
Unfortunately, there is no 9x9 rating there. I will compute 9x9 ratings 
with the KGS data I have.

What I have observed with Crazy Stone is that gaining Elo points against 
humans is more difficult than gaining Elo points against GNU Go, which 
is more difficult than gaining Elo points against MC programs, which is 
more difficult than gaining Elo points against itself. But this is more of an 
intuition than a scientific study.

Rémi

