[computer-go] Is Rémi correct?
Don Dailey
drdailey at cox.net
Tue Feb 5 11:45:29 PST 2008
As promised, to answer Rémi, I did a study with Mogo vs Gnugo at various
levels. There is NO self-play involved; Gnugo-3.7.11 is the only
opponent for progressively higher-rated versions of Mogo.
Here are the raw results so far:
Rank Name           Elo    +    -  games  score  oppo.  draws
   1 Mogo_10       2319   72   60    500    95%   1800     0%
   2 Mogo_11       2284   94   74    259    94%   1800     0%
   3 Mogo_09       2234   57   49    500    92%   1800     0%
   4 Mogo_08       2124   43   39    500    87%   1800     0%
   5 Mogo_07       2016   35   33    500    78%   1800     0%
   6 Mogo_06       1961   32   30    500    72%   1800     0%
   7 Mogo_05       1814   28   28    500    52%   1800     0%
   8 Gnugo-3.7.11  1800   13   13   5259    44%   1823     0%
   9 Mogo_04       1711   29   29    500    37%   1800     0%
  10 Mogo_03       1534   35   38    500    18%   1800     0%
  11 Mogo_02       1281   60   72    500     5%   1800     0%
  12 Mogo_01       1004  115  178    500     1%   1800     0%
The issue is whether self-play results distort the ratings of programs.
In this case, we are only testing whether they distort the ratings of
Mogo, since no other programs were tested.
In the following table, I played up to 500 games between Gnugo and Mogo
at various levels. The levels are the exact levels that correspond to
the big scalability study. In the middle column I listed the
ratings as computed by bayeselo from games against ONLY Gnugo, with
the rating of Gnugo set to 1800, just as in the study.
Unfortunately, I used level 10 in the Gnugo-only games, but in the big
study we use level 8. It's my understanding there is little difference
between these two, but we can probably assume Mogo might be a little
better than indicated relative to the big scalability study.
It looks like there indeed is a lot of distortion at the low end of the
scale. Mogo seems much stronger at low levels than the larger
scalability study indicated.
At the higher levels, we also get a mismatch, where Mogo's rating
doesn't seem as high when playing only Gnugo. This is as Rémi
claims.
One thing to note is that at higher levels it's more difficult to get an
accurate rating. Mogo_10 is winning 95% of its games against Gnugo,
and an extra win or loss every few games can make a lot of difference.
However I am inclined to believe this is real since it seems to hold for
several upper levels. At level 7 it's only 42 ELO, but at levels
beyond this it's over 100 ELO.
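To put a rough number on how sensitive a rating is near a 95% score, here is
a minimal sketch using the plain logistic Elo formula. This is only an
approximation: bayeselo fits a Bayesian model with a prior, so its ratings
will not match exactly, and the function name is mine, not part of any tool.

```python
import math

def elo_diff_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1)
    under the standard logistic Elo model. Illustrative helper only;
    this is not how bayeselo computes its ratings."""
    return 400.0 * math.log10(p / (1.0 - p))

# Compare a near-even score with scores around 95%.
for pct in (52, 94, 95, 96):
    diff = elo_diff_from_score(pct / 100.0)
    print(f"{pct}% score -> about {diff:.0f} Elo above the opponent")
```

Under this formula a 95% score is a bit over 500 Elo above the opponent, and
moving one percentage point either way shifts that by something like 35 to
40 Elo, whereas near 50% a point is worth well under 10.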
I've never doubted that there is some intransitivity between programs, but
I am a little surprised that it is this much. Even if the comparison is
slightly unfair due to Mogo playing a stronger version of Gnugo in this
study, it still seems like it must be at least 100 ELO.
vers  vs Gnu  Study
----  ------  -----
 01     1004    688
 02     1281   1093
 03     1534   1331
 04     1711   1554
 05     1814   1751
 06     1961   1971
 07     2016   2058
 08     2124   2270
 09     2234   2347
 10     2319   2470
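For anyone who wants the gaps spelled out, here is a small sketch that
subtracts the "vs Gnu" column from the "Study" column. The numbers are
copied from the table above; the variable and function names are just for
illustration.

```python
# level: (rating vs Gnugo only, rating from the big scalability study)
ratings = {
    1: (1004, 688),
    2: (1281, 1093),
    3: (1534, 1331),
    4: (1711, 1554),
    5: (1814, 1751),
    6: (1961, 1971),
    7: (2016, 2058),
    8: (2124, 2270),
    9: (2234, 2347),
    10: (2319, 2470),
}

def rating_gaps(table):
    """Return study rating minus Gnugo-only rating for each level."""
    return {level: study - vs_gnu for level, (vs_gnu, study) in table.items()}

gaps = rating_gaps(ratings)
for level in sorted(gaps):
    print(f"{level:02d}  {gaps[level]:+5d}")
```

A negative gap means Mogo did better against Gnugo alone than the study
rating suggests, and a positive gap means it did worse. The sign flips at
level 6.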
My suggestion to improve this situation is to play a few thousand games
against a well-rated Gnugo and set up Mogo as a second anchor.
- Don