[Computer-go] 7.0 Komi and weird deep search result

Don Dailey dailey.don at gmail.com
Thu Apr 7 12:06:11 PDT 2011

On Thu, Apr 7, 2011 at 2:01 PM, Brian Sheppard <sheppardco at aol.com> wrote:

> Measuring small differences is a big problem for me. I would like to have
> better tools here.

As far as I can tell, you MUST test in a way similar to the one I will
describe next in order to really make significant progress quickly.
 Anything else is just fishing around and leads to a lot of wasted effort.

Here is the general outline.  All the top chess program authors are
following similar schemes, and at the end I will give you my thoughts on
using this same method in computer go, along with the pitfalls.

I had to build my own tester for this, for use with chess, but the same
could easily be done for go.   I have a 6-core i7-980X and my tester is
non-graphical (a GUI is a huge slowdown) and can run any number of games
simultaneously.    For me this is an absolutely necessary tool for making
progress.

I also find I get more games per minute if I set my tester to run 12 games
at once (cpus = 12 in my config file), even though the machine only has 6
physical cores.   Pondering is a waste of time during testing (it can be
tuned separately) and so is MP (that should be tested and tuned
separately.)
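To make the idea concrete, here is a minimal sketch of such a headless
tester.  This is my own illustration, not Don's actual code: play_game is a
stub (with an arbitrary, assumed 55% win rate) standing in for a referee
that would really launch the two engines as subprocesses and arbitrate a
game, e.g. over GTP for go.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def play_game(engine_a, engine_b, seed):
    """Stub for one engine-vs-engine game.  A real tester would start the
    two engines as subprocesses and referee the game; here we simulate a
    result so the harness itself can run.  The 55% win rate for engine_a
    is an arbitrary assumption for this sketch."""
    rng = random.Random(seed)
    return engine_a if rng.random() < 0.55 else engine_b

def run_match(engine_a, engine_b, games, workers=12):
    """Run `games` games across `workers` concurrent referees and tally
    the wins.  Threads are enough for the referee because in a real
    tester the actual computation happens in the external engine
    processes; running 12 games on 6 physical cores is the
    oversubscription described in the text."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(play_game, [engine_a] * games,
                           [engine_b] * games, range(games))
    return Counter(results)

tally = run_match("candidate", "best", games=200)
print(tally["candidate"], tally["best"])
```

The point of the design is throughput: no GUI, no pondering, and enough
simultaneous games to keep every core busy.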

The other thing is that you have to test much faster than you would really
like to.   I want my program to play chess well at long time controls, but
due to the limitations of my ability to test, the majority of tests are
done at time controls where each side is given about 15 seconds for the
entire game (we actually use Fischer time controls most of the time).   In
reality that is more like 8 or 9 seconds, since I am pushing the number of
test instances to 12 on a 6-core machine.     So at this time control I can
get 24 games per minute.

Believe it or not, these results usually match longer time control results.
   However, once I have versions that I believe to be good based on fast
tests, I run a longer test, which takes a day or two to get the number of
samples I want.    I require a 10,000-game sample before making decisions,
although I will stop a test early when it's clear that the version in
question is not going to make it.

A lot of versions only win by 1 or 2 ELO.   For statistically valid
reasons, it's important not to make judgments based on intermediate
results: the full 10,000 games must be run before I accept a version as my
new best version. (The error margins mean something different if you don't
set the number of games in advance.   Obviously, it is wrong to stop the
test when you "like" what you see but keep it going when you don't; you
would be unconsciously manipulating the results.)
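To put rough numbers on those error margins, here is a quick sketch (my own
illustration, not part of the tester) of the 95% confidence margin, in ELO,
of a fixed-size match between two roughly equal programs, treating each
game as an independent win/loss trial (draws, which shrink the variance in
chess, are ignored here):

```python
import math

def elo(p):
    """Convert a win probability p into an ELO difference."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_margin(games, p=0.5, z=1.96):
    """Approximate 95% margin of error, in ELO, of a `games`-game match
    whose true score rate is `p`, using the normal approximation to the
    binomial for the score and converting the half-width to ELO."""
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    return elo(p + half_width) - elo(p)

for n in (100, 1000, 10000):
    print(n, round(elo_margin(n), 1))
```

Under these assumptions a 100-game match only resolves differences at
roughly the +/-69 ELO level, while 10,000 games narrow that to roughly
+/-7 ELO, which is why 1-or-2-ELO improvements demand such patience.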

Ok,  what about Go?

Let's assume we are talking about 19x19 Go.   You will by necessity have to
test much slower with Go than with chess, and you have to evaluate the
trade-offs for yourself.     In order to get the number of samples you
need, you will have to test much faster than you really want to.     We
have found that even in chess some of our ideas do not scale as well as
others; in extreme cases an idea tests very well on short searches but goes
the other way at long time controls.  This is probably going to be more of
a factor in Go.     But this is something that will require some time to
gain experience with, and over time you will learn which TYPES of changes
scale.

The question you have to constantly ask is whether to go for large samples
or high quality.   The answer is that you almost certainly have to do both.
  Unless you have large samples the results are not very valid, but they
might not be valid either if you don't run at long enough time controls, so
it's something you have to learn for your program.

Don't be too sure that really short time controls are not valid, however.
I was VERY surprised to find (at least in chess) that the correlation is
extremely high.   If it plays well at game in 5 seconds, it will play well
at game in 5 minutes.

I suspect that in 19x19 you will need to triple what I use in chess but I'll
bet that even game in 10 seconds will return more or less valid results.

One other thing is that you need to minimize overheads such as startup time
as much as reasonably possible.  If your program needs 5 seconds
to initialize before starting a game, you are paying that price over and
over again.

I know many will not like to hear this,  but you HAVE to run a lot of games
and it may require more patience than you are willing to give.    However,
it may not be so bad if you are in the phase of development where you are
making 20-50 ELO improvements at a time.    I use 10,000 games but 5,000 may
be a good compromise.    The ISSUE for ME is that I can come up with ideas
and implement them far faster than I can test them - so you have to be picky
about what to put through this process.

This IS the big secret to recent advancement in computer chess, and it will
work for Go too.     How many papers on computer go have you read where
some idea was tried, then tested against a version without the idea in a
100-game match?    I generally assume the paper authors are idiots when I
see that, unless they point out the flaw themselves or the result is so
lopsided that the small sample doesn't matter.
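For the kind of comparison Brian raises below (a rule that wins 60.5% with
it and 60.0% without), a standard two-proportion power calculation shows
just how staggering the required sample is.  This sketch is my own
illustration, using the usual normal-approximation formula rather than
anything from the original post:

```python
import math

def games_needed(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate games per version needed to distinguish true win
    rates p1 and p2 at 95% confidence with 80% power, using the normal
    approximation for the difference of two binomial proportions."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(games_needed(0.605, 0.600))
```

That comes out on the order of 150,000 games per version, which is exactly
why you have to be picky about which ideas you put through the pipeline.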


> For instance, I am trying to measure whether a particular rule is an
> improvement, where with the rule it wins 60.5%, and without 60.0%. You need
> a staggering number of games to establish confidence. Yet this is the small,
> 5 to 10 Elo gain that Don referred to.
> I hoped to isolate cases where the *move* differs between versions, and
> then analyze (perhaps using a standard oracle like Fuego) whether those
> moves are plusses or minuses. But this is MCTS, and the program does not
> always play the same way even in the same position.
> -----Original Message-----
> From: computer-go-bounces at dvandva.org [mailto:
> computer-go-bounces at dvandva.org] On Behalf Of "Ingo Althöfer"
> Sent: Thursday, April 07, 2011 12:25 PM
> To: Aja; computer-go at dvandva.org
> Subject: Re: [Computer-go] 7.0 Komi and weird deep search result
> Hello Aja,
> > ... Also, I will be more careful in measuring
> > improvement, as you illustrated at length in your description (my
> > supervisor Remi Coulom also repeatedly corrects me on this point).
> Remi was a strong computer chess programmer ("The Crazy Bishop")
> before he went to computer go. There is for instance a photo of
> him at the WCCC 2004, in the middle of the report
> http://www.chessbase.de/newsroom2.asp?newsid=3350
> Ingo.
> > Aja
> >   ----- Original Message -----
> >   From: Don Dailey
> >   To: Aja
> >   Cc: computer-go at dvandva.org
> >   Sent: Wednesday, April 06, 2011 9:51 PM
> >   Subject: Re: [Computer-go] 7.0 Komi and weird deep search result
> >
> >
> >
> >
> >   On Wed, Apr 6, 2011 at 3:32 AM, Aja <ajahuang at gmail.com> wrote:
> >
> >     Hi Don,
> >     Thanks for your penetrating ideas. Yes, I would like to reconsider my
> > feeling and hope that it doesn’t misguide anyone.
> >
> >     We both know the recent controversy between Fruit and Rybka (or
> > Fabien and Vasik), but of course it’s not the issue here right now.
> > Just want to mention in passing that Fabien said he might develop a Go
> > program in the next few years, so we can expect another strong
> > open-source program.
> >
> >
> >   I hope he does, but of course I did not bring this up to talk about
> > the controversy, just the reality that computer chess software is
> > marching on at a remarkable pace, and this was an excellent example to
> > illustrate that.
> >
> >
> >     It’s just my guess that it’s very hard for current MCTS to surpass
> > amateur 5d or 6d. One main reason is that it’s difficult to solve a lot
> > of different semeai and life-and-death instances at pro level, even if
> > the program is running on super big hardware (on this point I was
> > impressed by Olivier’s talk at a conference in Taiwan, in which he gave
> > an “easy” semeai example that Mogo cannot solve even with a very large
> > number of simulations).
> >
> >
> >   I want to point out that in computer chess this same exact thing was
> > often done not so many years ago.   A relatively simple position would
> > be presented that humans easily understood, but that seemed completely
> > out of reach for computer chess programs.   It was easy to see that
> > computers would need some ridiculous breakthrough to understand such
> > positions, and the conclusion was that computers would probably never
> > be close to the top humans in chess.
> >
> >
> >   It's my view that such illustrations tended to cause people to draw
> > the wrong conclusions and sent people off in the wrong direction,
> > looking for non-existent breakthroughs and concluding that incremental
> > progress was a completely foolish way to proceed.
> >
> >
> >   I believe we (as humans) lack a bit of imagination when it comes to
> > these sorts of things.   For example, the four-minute mile was
> > considered physiologically unattainable a few years before the first
> > one was run; in other words, it was hard to imagine it ever happening.
> > It's often difficult for us to imagine things that are too different
> > from what we are currently experiencing (especially once we decide
> > something is "hard.")   Maybe part of the problem is that we live in an
> > instant-gratification society and no longer think in terms of hard work
> > and gradual progress; we want an instant "breakthrough."
> >
> >
> >   Progress is a funny thing if you put numbers on it.  If you get 1%,
> > it hardly seems like anything. But if you add 1% to that, then 1%
> > again, it's like compound interest in a bank, and looking back over
> > just a few of these you are surprised by how much progress you have
> > made.
> >
> >
> >   I have been surprised that in chess the point of diminishing returns
> > is farther away than it seems, and I'm sure in Go it is even more so by
> > a large degree.   In other words, ELO progress in software has been
> > more or less steady, not slowing to a crawl.   Yes, it is punctuated
> > with small spikes, but seen over anything more than a couple of years
> > it's remarkably smooth.   As evidence of that, the recently released
> > program Houdini is at least 50 ELO over its nearest competitor, but you
> > can be sure that is only a temporary situation; it will look like a
> > weak program in 2 or 3 years.
> >
> >     Another aspect is that it’s extremely hard for MCTS to
> > consider/argue over a few points in the early stages on 19x19 (because
> > it only sees winning rate, and dynamic komi is far from enough to fix
> > it), and that is exactly what pros are very good at.
> >
> >
> >   The only thing you are telling me is that we picked a hard problem.
> > There is nothing here inherently unsolvable,  we are  just impatient and
> > cannot imagine (yet) how we are going to solve this.
> >
> >
> >   I have discovered that in computer chess (which I have been into for
> > decades) the "unsolvable" problems didn't really make that much
> > difference in the short term.     The solutions come at a natural rate,
> > and until programs get a lot better in other areas you will find that
> > some of these "glaring" weaknesses do not make much difference in terms
> > of how strong the program is at the moment, even when it seems like a
> > huge deal.    These weaknesses gradually start making a huge difference
> > when the program is really good, and we tend to judge programs more by
> > their weaknesses than their strengths.    So when we see something
> > "ugly" it makes us think the program cannot be as strong as it actually
> > has proved to be.    And computer programs have strengths and
> > weaknesses in different proportions than we do, so this tends to
> > distort our views of how well or badly they play.
> >
> >
> >   An example in computer chess is basic endgame knowledge.   It's
> > really ugly to see a chess program trade down from a won ending to a
> > draw because it doesn't understand that certain simple endings cannot
> > be won despite being a piece up. Years ago, after seeing glaring
> > examples of this horrible weakness, I took some time and implemented a
> > large number of scoring corrections to deal with it, as well as putting
> > in a king-and-pawn-versus-king perfect play database.   I patted myself
> > on the back and expected to see a decent ELO gain.    However, even in
> > modern programs this probably does not add more than 2 or 3 ELO, and
> > I'm being generous.     If you show a grandmaster some of these
> > glitches he might conclude that your program plays like an amateur (in
> > fact, when programs first became master strength, many strong human
> > players would see one of the "ugly" moves and conclude that the program
> > could not even be "expert" strength, let alone master strength.)
> >
> >
> >   I'm not saying these are not real problems in computer go, but the
> > point is that there are a large number of problems that altogether
> > define exactly where we stand right now, and we just have to start
> > making dents (which we actually have been doing to a remarkable degree,
> > if you would only look more carefully.)    The bigger problems are just
> > going to take longer to fix than the lesser ones.      Also, I believe
> > we have to get over the notion that we have to "fix it" completely.  We
> > probably will not fix it suddenly with a one-line program change, but
> > we can and will find ways to minimize the problems, and it may be
> > gradual and incremental.
> >
> >
> >   In your example you rightly note that programs do very well when in
> > their "sweet spot", when there are clearly defined goals that affect
> > winning percentages.     In computer chess it used to be believed that
> > no amount of searching could improve the programs' "horrible"
> > positional play, and that computers only played well if there were
> > immediate tactical considerations; otherwise they quickly went wrong.
> > That turned out not to be true; it was just not clearly understood at
> > the time, because we were looking at the problem through our own biased
> > eyes and seeing the ugly things.     The truth of the matter is that
> > tree search and playouts work well in all positions, some more than
> > others, and we will find ways to clearly improve the situation in the
> > future with incremental progress (not major breakthroughs.)    Also, we
> > have some clearly wrong things that we will fix (for example, the eye
> > definitions we have are approximations and are sometimes broken.)
> >
> >
> >   I'll say it again: I think computer go is still in its infancy.  We
> > are still looking for the big fixes and have not yet come to fully
> > appreciate the immense practical power of incremental improvements over
> > time.   When the problem looks big we feel like small improvements are
> > a waste of time, but nothing is further from the truth.
> >
> >
> >
> >
> >     The progress from hardware by Mogo, Fuego and pachi is well known
> > and impressive, so I don’t think the amazing progress in computer Go is
> > mainly due to software. Both hardware and software are important in
> > making a strong Go program for now, as far as I can see.
> >
> >
> >   I think the pattern is the same as what happened in computer chess.
> > But I personally believe that software will be a much bigger
> > contributor to progress in the future (even if you ignore the
> > "slowdown" of Moore's law.)
> >
> >
> >
> >
> >     I hope your prediction is right: “without anything really major
> > (but no doubt some new small ideas) we are going to see your KGS 5 and
> > 6 dan and much higher in 5 to 10 years.” If not, then we will have a
> > lot of “interesting” work to do, whether in testing methodology,
> > engineering, academics, etc.
> >
> >
> >   I am convinced I am right on this one.   I have absolutely nothing
> > against finding major breakthroughs, of course, but I think what we
> > will call "breakthroughs" will be things that add up to 50 ELO or less.
> > In chess it was things like check extensions, null move pruning,
> > futility pruning, LMR, etc.    We called null move pruning a "major
> > breakthrough," but when it was first used it added something like 40 or
> > 50 ELO to the strength of a chess program.    Is that a major
> > breakthrough?  You don't notice 50 ELO right away by watching it play,
> > because the program will still lose 43 percent of the games and thus
> > still get outplayed in many of them, but it depends on your definition
> > of "major," I guess.     In go that would be something like half a dan.
> > I do think there will be a few of these kinds of breakthroughs.    What
> > happens is that these good ideas need to get refined and improved too.
> > I think we get much more out of null move pruning than we used to.
> > LMR, when first implemented, gives a chess program hardly any gain
> > until it's done just right.   But when refined it's pretty huge.   My
> > first LMR implementation was worth only about 20 ELO; now it's like 100
> > ELO or more.
> >
> >
> >   Thanks for listening to me ramble on about this ...
> >
> >
> >   Don
> >
> >
> >
> >     Aja
> >
> _______________________________________________
> Computer-go mailing list
> Computer-go at dvandva.org
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
