[computer-go] Re: Explanation to MoGo paper wanted.

chrilly c.donninger at wavenet.at
Thu Jul 5 23:23:34 PDT 2007


> I think one of the problems is in testing. Currently we have almost
> no way to judge whether an improvement is good or bad, other than
> playing a lot of games against GNU Go. It takes a very long time and
> seems inefficient. Moreover, it may not even be a very good method.
> GNU Go often cannot respond correctly to an obviously bad move, so
> pruning such moves decreases the winning rate.
>
This is THE problem in game programming: measuring progress. A typical
improvement is worth about 10 Elo, and it takes about 1000 games to
establish such an improvement with statistical significance. Usually one
does not play 1000 games; 100 games are already quite a lot. So one often
chooses not the best version but the luckiest one. If one version has an
especially good result, I rerun the test matches under different conditions
(time settings).
Only if the results are repeatable is the version considered the best.
If an improvement is worth 100 Elo, there is no need for extensive testing;
one sees it immediately. In fact, the smaller improvements are in the end
also chosen by intuition and feeling.
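
The arithmetic behind such game counts can be sketched roughly in Python.
The sketch assumes the standard logistic Elo formula, independent games
without draws, and a normal approximation at the 95% level (z = 1.96). The
function names and the choice of z are illustrative assumptions, not taken
from any particular testing tool. Under these strict assumptions a 10 Elo
edge needs a few thousand games, while a 100 Elo edge needs only a few dozen.

```python
import math


def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, z=1.96):
    """Games needed until the expected edge over 50% exceeds z standard errors.

    Assumption: independent games with no draws (simple binomial model).
    """
    p = expected_score(elo_diff)
    edge = p - 0.5
    if edge <= 0:
        raise ValueError("elo_diff must be positive")
    # Standard error of the score after n games is sqrt(p * (1 - p) / n).
    # Solve edge = z * sqrt(p * (1 - p) / n) for n and round up.
    return math.ceil((z * math.sqrt(p * (1.0 - p)) / edge) ** 2)


if __name__ == "__main__":
    for diff in (10, 100):
        print(diff, "Elo:", games_needed(diff), "games")
```

Draws, opening books, and correlated games all change the numbers, so this
is only an order-of-magnitude guide.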

In Go things are worse insofar as there is only one standard sparring
partner, GNU Go. This creates severe inbreeding effects. Chess had a
similar problem. There were more strong opponents around, but over the
years they became very similar. Then suddenly a new program appeared, Rybka,
which plays differently, and all the inbred programs have a lot of
difficulty with it.

I think there is no better way. One can do some pre-filtering with test 
positions. If a version is especially bad in these tests, one can ignore it. 
But being good in test positions and being good in games are different
things.

Erdstrahlen:
Jan Louwman was a fanatical tester. His small house was full of
board computers. He played 20 games at once by hand (this was in the pre-PC
era of computer chess).
He always reported spectacular results for the programs of Ed Schroeder.
But when the programs came to market, nobody could replicate Jan's results.
The programs were strong, but not spectacular. Thomas Mally of the Viennese
chess magazine Module explained this with the different natural radiation
(German "Erdstrahlen") in Rotterdam and elsewhere: Ed's programs were
optimized for these "Erdstrahlen". The "Erdstrahlen-Theorie" became a
running joke in the chess community. Whenever two testers reported quite
different results, it was "explained" by the different amount of
"Erdstrahlen".

It is impossible to play 1000 games by hand for each version. Jan usually
played at 30 seconds or 1 minute per move. It would have taken forever. His
spectacular version was just a very lucky one. If you play enough, you
always get one. But his testing was certainly a significant contribution to
the development of Rebel. And it was very good medicine for Jan. He would
have died much earlier without this testing.
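
How easily a lucky version turns up can be estimated directly. The sketch
below assumes that all versions are really of equal strength, so every game
is a fair coin flip with no draws. The function names and the example
numbers (20 versions, 100 games each, a 60-win threshold) are hypothetical
and chosen only for illustration. Under these assumptions the chance that
at least one version scores 60 or more out of 100 is a bit over 40%.

```python
import math


def tail_probability(games, wins, p=0.5):
    """Exact binomial P(X >= wins) over `games` independent games."""
    return sum(
        math.comb(games, k) * p ** k * (1.0 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


def chance_of_lucky_version(versions, games, wins, p=0.5):
    """Probability that at least one of `versions` equal versions reaches `wins`."""
    # Each version misses the threshold with probability (1 - tail);
    # all of them miss it with that probability raised to `versions`.
    return 1.0 - (1.0 - tail_probability(games, wins, p)) ** versions


if __name__ == "__main__":
    print(tail_probability(100, 60))
    print(chance_of_lucky_version(20, 100, 60))
```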

Chrilly


