[Computer-go] Assessing Improvements

Petr Baudis pasky at ucw.cz
Sat Feb 19 13:58:53 PST 2011

On Sat, Feb 19, 2011 at 11:14:16AM -0800, Steve Safarik wrote:
> Suppose I develop what I think is an improved feature, for example a better influence function or some other.  I'd like to hear people's thoughts on how to best & most quickly determine if it is in fact an improvement.  Do I just take my new function and replace the equivalent function in something like Fuego, then have the two engines start playing games?  My impression is that would be a rather slow way to get enough games to be of significance.  Is there a better way to compare two engines?  If that is indeed the method people generally use, how much time do you allow per move or game, and can you tell me your general experiences with doing this?  Thanks.

  Avoid self-play if possible. Sometimes, it can be useful, but you need
to be very careful when interpreting it.

  It is best to pick a reference opponent; gnugo is by far the most
popular. It is not too strong, but it has a significantly different
style and totally different weaknesses compared to MCTS programs, so
you can avoid many blind spots you would otherwise get caught in. Of
course, gnugo is much weaker than modern programs, but I find that
simply giving gnugo loads of komi keeps accuracy at a satisfactory level.
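
  As a rough illustration of comparing an old and a new version against
the same reference opponent, here is a minimal sketch of a two-proportion
z-test on their win counts. The function name and the win counts in the
usage note are hypothetical, not results from any real engine, and the
normal approximation is only reasonable with a few hundred games or more
per version.

```python
import math

def two_proportion_z(wins_a, games_a, wins_b, games_b):
    """Return (z, two-sided p-value) for the win-rate difference B - A.

    A and B are two versions of the same engine, each tested against the
    same reference opponent under the same settings.
    """
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    # Pooled win rate under the null hypothesis "both versions are equal".
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

  With made-up counts of 500/1000 wins for the old version and 550/1000
for the new one, `two_proportion_z(500, 1000, 550, 1000)` gives a z of
about 2.24 and a p-value of about 0.025, so even a 5% gain over 1000 games
each is only moderately convincing.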

  Another factor is time per game. Some people advise very fast games
and/or limiting by playout count. I do not think either is a good idea.
Your idea is no good if the program beats gnugo in 5% more games but
takes 20% longer per move. And I find that MCTS, at least in Pachi,
behaves *much* differently under very short time limits than under
reasonable ones. In general, with very short time limits, heuristics
have a much greater effect than tree search.

  Currently, I'm using 10:00 or 8:20 SD for 19x19 testing, giving Pachi
no komi as white. In the past, I was using something like 4:30 SD, but I
have found that I get a quite different picture in many cases when
testing with longer time limits. Yes, you need to wait much longer
and/or use much more hardware (I'm making use of idling department
workstations at night) to get statistically significant results, but
that's one of the burdens of a Go programmer. :-)
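
  To get a feel for how many games "statistically significant" implies,
here is a back-of-the-envelope sketch using the normal approximation to
the binomial. The function names are hypothetical, and the 20 minutes
per game and 10 machines in the usage note are assumptions for
illustration (a 10:00 SD game lasts at most about 20 minutes of thinking
time), not figures from the setup described above.

```python
import math

def games_needed(margin, win_rate=0.5, z=1.96):
    """Games required so the ~95% confidence interval on the measured
    win rate is within +/- margin (normal approximation)."""
    return math.ceil(z * z * win_rate * (1 - win_rate) / margin ** 2)

def wall_clock_hours(n_games, minutes_per_game, workers):
    """Rough wall-clock time if games run in parallel, one per worker."""
    return n_games * minutes_per_game / 60 / workers
```

  For example, `games_needed(0.05)` is 385 and `games_needed(0.025)` is
1537. At an assumed 20 minutes per game on 10 machines,
`wall_clock_hours(1537, 20, 10)` comes to roughly 51 hours, which is why
longer time limits quickly call for more hardware or more patience.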

				Petr "Pasky" Baudis
Computer science education cannot make an expert programmer any more
than studying brushes and pigment can make an expert painter. --esr
