[computer-go] thoughts on 100,000 cgos games
Don Dailey
drd at mit.edu
Sun May 7 17:44:49 PDT 2006
Any kind of learning needs a "training signal" - some kind of way of
determining what to reward and what to punish.
One training signal is the result of the game. If white wins the game,
for instance, we have a little piece of useful information. The trick
is what to do with this information. What immediately comes to mind is
to consider that the moves of the winner are likely to be better on the
whole than the moves of the loser. Now we can set up some sort of
punishment/reward system to encourage the goal program to "play more
like a winner."
Monte Carlo style Go programs are examples of programs that do this,
although we don't think of them as learning programs. Play 10,000
random games, and reward the play of the 10,000 winners.
It's amazing to me how well this works. If you play a random game, it
doesn't seem like there should be much useful information contained in
that game - but there is! If you keep statistics on the moves played
by the 10,000 winning programs, and then simply play the move that was
played most often by the "winners", you get a quite reasonable playing
program (at least by the standards of programs playing on CGOS.)
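Here is a minimal sketch of that idea. To keep it short and runnable it
uses tic-tac-toe as a stand-in for Go, so the board, the names
(random_playout, pick_move) and the playout count are all assumptions
made for illustration, not code from any CGOS program. It only counts
the first move of each playout that the side to move went on to win.
Since first moves are chosen uniformly, the most frequent "winner" move
is also the one with the best observed win rate.

```python
import random
from collections import Counter

# Toy stand-in for Go: a 3x3 tic-tac-toe board stored as a list of 9 cells.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_playout(board, to_move, rng):
    """Play uniformly random moves to the end of the game.

    Returns (first move played, winner or None for a draw).
    """
    board = list(board)
    first = None
    player = to_move
    while winner(board) is None and '.' in board:
        move = rng.choice([i for i, cell in enumerate(board) if cell == '.'])
        if first is None:
            first = move
        board[move] = player
        player = 'O' if player == 'X' else 'X'
    return first, winner(board)

def pick_move(board, to_move, n_games=10000, seed=0):
    """Pick the first move seen most often in playouts that to_move won."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_games):
        first, won = random_playout(board, to_move, rng)
        if won == to_move and first is not None:
            wins[first] += 1
    if not wins:
        return None  # no playout was won, so there are no statistics to use
    return wins.most_common(1)[0][0]
```

For example, pick_move(list("XX.OO...."), 'X') should settle on square
2, the immediate win, because every playout that starts there is a win
for X.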
David is apparently envisioning a learning system where he rewards
SlugGo for winning and punishes SlugGo for losing.
Another way of doing the same is to not concern yourself with WHO won,
but punish losing behavior (as judged by the result of a game) and
reward winning behavior. If David could do this, he wouldn't need to
be concerned with WHO won or lost the game - he could just consider
SlugGo losses since they contain more information content from the point
of view of a training signal. (Since SlugGo is a strong program - any
losses are a goldmine of information.)
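A very rough sketch of that kind of bookkeeping, assuming each game is
available as a list of (color, move) pairs plus the color of the winner.
The function name, the record format and the flat +1/-1 scores are all
hypothetical, and keying the table on the bare move coordinate is far
too crude for a real program (you would key on some feature of the
position instead). Note that it never asks which program played which
color, only which color won.

```python
from collections import defaultdict

def update_weights(weights, game_moves, winner, reward=1.0, penalty=1.0):
    """Reward every move made by the winning color, punish the loser's.

    weights    -- dict-like table mapping a move key to a running score
    game_moves -- list of (color, move) pairs in the order played
    winner     -- color of the side that won the game
    """
    for color, move in game_moves:
        if color == winner:
            weights[move] += reward
        else:
            weights[move] -= penalty
    return weights

# Hypothetical three-move record in which White went on to win.
weights = defaultdict(float)
update_weights(weights, [('B', 'D4'), ('W', 'Q16'), ('B', 'C3')], winner='W')
```

After that single game the table scores Q16 at +1 and D4 and C3 at -1.
Any one game says very little, so the scores only start to mean
something once many games have been folded in.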
But David's original complaint was that he couldn't do training because
he didn't know who the opponent was (I don't understand why he thinks
this but he does - I can send him the games WITH opponent information in
the SGF files.)
- Don
On Mon, 2006-05-08 at 01:00 +0100, Aidan Karley wrote:
> In article <EC419E85-AF89-4060-ACFB-8EC0B8294B18 at mac.com>, David G
> Doshay wrote:
> > I can think of no solid way to benefit from an automated approach
> > given only the information that I lost all of these games.
> >
> Would concentrating on those lost games to process out
> additional metrics help? For example, classifying the lost games by
> number of groups (winner), ratio of #groups(winner) : #groups(loser),
> points of territory/group, number of eyes/group. Wouldn't that
> (potentially) give you more information about the types of error you're
> making, e.g. allowing the opponent to split your groups while
> connecting their groups (emphasised by the #groups ratio above), or
> allowing groups to be squeezed too much (reduced points of territory
> per group metric)?
> Has anyone tried defining such metrics, and proposed some styles
> of play that can be differentiated on the basis of such metrics? Would
> such metrics, if they work, be a useful contribution to the information
> theory aspects of computer go, rather than specifically assisting
> development of a particular program?
>
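The group-count metric suggested above is at least cheap to compute
from a final position. Below is a minimal sketch, assuming the board is
given as a list of equal-length strings using 'B', 'W' and '.' (that
format and the function names are assumptions for illustration). A
group here is just a set of same-colored stones connected horizontally
or vertically. Whether the resulting ratio actually separates styles of
play is an open question.

```python
def count_groups(board, color):
    """Count connected groups of `color` stones using a flood fill."""
    rows, cols = len(board), len(board[0])
    seen = set()
    groups = 0
    for r in range(rows):
        for c in range(cols):
            if board[r][c] != color or (r, c) in seen:
                continue
            # Found a stone of a group not visited yet: flood-fill it.
            groups += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and board[ny][nx] == color
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return groups

def group_ratio(board, winner, loser):
    """Return #groups(winner) / #groups(loser), or None if loser has none."""
    loser_groups = count_groups(board, loser)
    if loser_groups == 0:
        return None
    return count_groups(board, winner) / loser_groups
```

A ratio well below 1 on the lost games would suggest the loser was
being cut into many groups while the winner stayed connected.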