[Computer-go] Evaluating improvements differently
dailey.don at gmail.com
Thu Apr 7 14:03:36 PDT 2011
On Thu, Apr 7, 2011 at 4:55 PM, Álvaro Begué <alvaro.begue at gmail.com> wrote:
> Just to be clear, the whole point of what I said is that it's easier
> to quantify the quality of the scores returned by the search than to
> quantify the quality of the moves. I know people have used problem
> sets before with little success, and what I am suggesting is
> different, even though it might sound similar.
Yes, and I agree with you. In fact I think this, if done right, will give
better results than just using problem sets.
The big question for me is whether it can replace massive testing by playing
actual games. It seems very unlikely to me that you can measure
something indirectly and get as good a result as if you measured it
directly. It's like trying to determine how far someone can shotput by
seeing how much weight they can bench-press and then extrapolating from that.
Of course it does not HAVE to be as good as automated game play if it's
almost as good and can be done MUCH faster - that is what we are hoping for.
> On Thu, Apr 7, 2011 at 4:48 PM, Don Dailey <dailey.don at gmail.com> wrote:
> > On Thu, Apr 7, 2011 at 4:27 PM, terry mcintyre <terrymcintyre at yahoo.com>
> > wrote:
> >> To address Don's points, here are a few ideas:
> >> One, pick situations where the choice of good moves is limited. These
> >> would probably be "all or nothing" battles. If you pick move A, you win.
> >> Otherwise, you lose, tough luck.
> > The test will have some value, but this does not simulate what actually
> > happens in games, only a subset of that. It may be, for example, that
> > positions where there are 10 plausible moves are where a lot of the ELO
> > comes from. I don't really know, this is just an example.
> > My own speculation on this is that by far the most relevant test is the
> > kinds of moves you need to AVOID playing. Go (and just about every
> > game) is more about avoiding stupid moves and errors than it is about
> > playing profound moves. You don't "produce" wins, you take them when there
> > is opportunity, and opportunity is provided by your opponent.
> > Of course the program has to be skilled enough to find good moves when
> > they exist, but the biggest problem in computer go is weak moves by the
> > program (where weak means giving the opponent opportunities.)
> > The computer goes wrong by far the most where there is not a specific goal
> > or target that it can understand - that's when it plays a "weak" move that
> > a strong player will pounce on.
> > In tennis, I try to think in terms of getting the ball over the net so
> > that it's the opponent's turn to make a mistake. This is an
> > oversimplification, but it is essentially how it works. Each point in
> > tennis is lost by one of the players, it's never "won" - but of course we
> > try to hit great shots so that the opponent has a much greater chance to
> > "lose" the point. And trying to hit those great shots is what causes you
> > to lose a lot of points :-)
> > Don
> >> Two, test for proper followup. It's not enough to determine A, then
> >> forget about the battle and play somewhere in another corner. If this is
> >> an all-or-nothing battle, the program must continue to play the proper
> >> response to every threat, until the job is done.
> >> For extra credit, test what happens if the opponent continues to thrash
> >> after the situation becomes hopeless - the program should not waste a move
> >> in that area if it is not needed, but should take sente elsewhere.
> >> In short, the program must master not only move A, but B, C, D, E, ... Z,
> >> and one should test different move orders for the opponent, including
> >> moves in other parts of the board.
> >> Having mastered a few such test cases, now introduce a ko or seki to make
> >> the analysis a bit more complex. Rinse and repeat.
> >> Terry McIntyre <terrymcintyre at yahoo.com>
> >> Unix/Linux Systems Administration
> >> Taking time to do it right saves having to do it twice.
> >> ________________________________
> >> From: Don Dailey <dailey.don at gmail.com>
> >> To: computer-go at dvandva.org
> >> Sent: Thu, April 7, 2011 3:33:36 PM
> >> Subject: Re: [Computer-go] Evaluating improvements differently
> >> On Thu, Apr 7, 2011 at 2:27 PM, Álvaro Begué <alvaro.begue at gmail.com>
> >> wrote:
> >>> Hi,
> >>> I haven't spent any time in go programming recently, but a few months
> >>> ago I thought of a method to evaluate proposed improvements that might
> >>> be much better than playing a gazillion games. A search results in two
> >>> things: A move and a probability of winning (or a score that can be
> >>> mapped into a probability of winning, but let's ignore that issue for
> >>> now). Evaluating whether the moves picked by a strategy are good is
> >>> really hard, but evaluating whether the estimate of a probability of
> >>> winning is a good estimate seems much easier.
> >>> For instance, take a database of games played by strong players.
> >>> Extract a few positions from each game. Run your engine for some fixed
> >>> amount of time on each position, and measure how well it predicted the
> >>> winner after each position (cross entropy is probably the correct
> >>> measure to use). Do this before and after the proposed modification
> >>> and compare.
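The measurement described above can be sketched in a few lines. This is a minimal illustration, not the author's actual code: the example probabilities are made up, and in practice the predictions would come from running the engine on positions from the database.

```python
import math

def cross_entropy(predictions, outcomes, eps=1e-9):
    """Mean cross entropy between predicted win probabilities for Black
    and the actual results (1 = Black won, 0 = White won).
    Lower is better; a perfect predictor approaches 0."""
    total = 0.0
    for p, won in zip(predictions, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(won * math.log(p) + (1 - won) * math.log(1.0 - p))
    return total / len(predictions)

# Compare two engine versions on the same positions: the version with the
# lower mean cross entropy estimates winning probabilities more accurately.
old = cross_entropy([0.7, 0.4, 0.9], [1, 0, 1])
new = cross_entropy([0.8, 0.2, 0.95], [1, 0, 1])
```

Both versions are scored on an identical position set, so the comparison isolates the quality of the probability estimates from everything else.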
> >>> Of course one has to be careful to pick reasonably well-played games
> >>> (games played by top engines with more time per move than you'll use
> >>> to evaluate your engine seems good enough, and will result in a much
> >>> cleaner database than collecting games played by humans) and to have a
> >>> large enough and varied enough set of positions. Also, one should
> >>> worry about over-fitting for those particular positions, but one could
> >>> use another set of positions for confirmation. These problems all seem
> >>> manageable to me.
> >>> It is possible that certain improvements can really only be measured
> >>> by playing games (time control comes to mind), but I would imagine
> >>> that for a large class of things this procedure can give you useful
> >>> results in much more reasonable times.
> >>> Your thoughts are appreciated.
> >> I think this should be tried, but I also believe you will be bitterly
> >> disappointed. The holy grail for me is to find some way to quickly test
> >> versions, and I have found that NO test does that better than playing
> >> games.
> >> The problem is that a real game is full of complex variables that
> >> interact in strange and wonderful ways.
> >> The reason a program wins or loses games is not as simple as just saying
> >> it played a bad move, or it failed to see this or that. It's rather
> >> like the stock market. Have you noticed that at the end of the day
> >> someone reports on the stock market and always gives a reason why this or
> >> that stock (or the entire nasdaq) rose or fell? They like to pretend
> >> they understand why, but in reality it's just a wild guess. It's the same
> >> way with why you win and lose games. The reasons are very complicated and
> >> you cannot just count the good moves and bad moves to get a global score,
> >> nor can you construct a good problem suite that will tell you how good a
> >> program is (unless you are willing to accept about 2 or 3 Dan worth of
> >> error.)
> >> I am always interested in ideas like what you propose here and I think it
> >> should be experimented with - I'm just very skeptical. Been there, done
> >> that.
> >> Years ago there were tests based on counting how many moves a player made
> >> that matched games of top grandmasters. You would hide the answer, pick a
> >> move, then see if it matched the move "Bobby Fischer" would play. Then
> >> your skill would be computed based on how many of these you got "right."
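The scoring in that old test reduces to a simple match rate over a set of reference positions. A minimal sketch (the move lists below are illustrative placeholders, not real game data):

```python
def match_rate(candidate_moves, reference_moves):
    """Fraction of positions where the tested player's move equals the
    reference (e.g. grandmaster) move. As the thread notes, this number
    reflects style at least as much as strength."""
    matches = sum(1 for c, r in zip(candidate_moves, reference_moves) if c == r)
    return matches / len(reference_moves)

# The tested player agrees with the reference on 2 of 4 positions.
rate = match_rate(["e4", "Nf3", "d4", "c4"], ["e4", "Nf3", "Bc4", "Nc3"])
```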
> >> However, recent studies show that 2 equally strong players (computers in
> >> this case) can vary significantly in choice of moves, and in fact a
> >> similarity test was constructed that showed that different, equally strong
> >> programs played much more differently than 2 versions of the SAME program
> >> where one version was older and significantly weaker. So the choice of
> >> move had a lot more to do with style than it did strength. That is the
> >> very opposite of most people's intuition.
> >> A program can play very well but be weak because of the occasional
> >> horrible move, or it can play very "average" without making major errors
> >> but still missing a lot of "brilliancies." There is a complex interaction
> >> here that determines the actual RESULTS it will get in competition.
> >> However I still like your idea, and if you (or anyone else) tries it I am
> >> very interested in your experiences with it. I am always hoping something
> >> better will come along ...
> >> By the way, there is also the notion of weighted problem sets. I believe
> >> your idea has similarities. You try to solve problems, but some problems
> >> get much different "scores" than others. This is an attempt to improve on
> >> naive problem testing. Another idea that has been tried is to test as many
> >> programs as possible of widely varying strength on a zillion problems and
> >> then fit a scoring method that predicts strength based on this.
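That fitting idea can be sketched as a least-squares problem: given per-problem pass/fail results for programs of known strength, fit a weight per problem so the weighted score predicts a rating. The data below is made up purely for illustration, and the ratings are hypothetical Elo-like numbers.

```python
import numpy as np

# Rows: programs of known strength; columns: problems (1 = solved, 0 = not).
results = np.array([
    [1, 0, 0, 1],   # weak program
    [1, 1, 0, 1],   # mid-strength program
    [1, 1, 1, 1],   # strong program
], dtype=float)
ratings = np.array([1500.0, 1800.0, 2100.0])  # known ratings

# Append a constant column and solve for per-problem weights plus a bias,
# so that predicted_rating = results @ weights + bias.
X = np.hstack([results, np.ones((len(ratings), 1))])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
weights, bias = coef[:-1], coef[-1]

# A new program's problem results can now be mapped to a strength estimate.
new_results = np.array([1, 1, 0, 1], dtype=float)
estimate = float(new_results @ weights + bias)
```

With many programs and many problems the fit would be done on a held-out set to guard against the over-fitting concern raised earlier in the thread.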
> >> Don
> >>> Álvaro.
> >>> _______________________________________________
> >>> Computer-go mailing list
> >>> Computer-go at dvandva.org
> >>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go