[Computer-go] Move evalution by expected value, as product of expected winrate and expected points?

Justin Gilmer jmgilmer at gmail.com
Tue Feb 23 07:41:55 PST 2016


Like Álvaro, I made an attempt at predicting final ownership. You can find
the code here: https://github.com/jmgilmer/GoCNN/. It is trained on about
15,000 professional games that were played to the end (that is, games that
did not end in resignation). It gets about 80.5% accuracy on a held-out
test set, although the accuracy varies greatly with how far the game has
progressed. I can't say how well it would work inside a Go-playing program.
-Justin
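
To make the "accuracy varies with game progress" point concrete, here is a
minimal sketch of how per-point ownership accuracy could be broken down by
move number. It is not taken from the GoCNN repository: the function name,
the sample format (move number, predicted labels, actual labels) and the
bucket size are all assumptions made for illustration.

```python
from collections import defaultdict


def ownership_accuracy_by_phase(samples, bucket_size=50):
    """Per-point ownership accuracy, grouped by move-number bucket.

    `samples` is an iterable of (move_number, predicted, actual), where
    `predicted` and `actual` are equal-length sequences of per-point owner
    labels (+1 for Black, -1 for White). This format is hypothetical.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for move_number, predicted, actual in samples:
        bucket = move_number // bucket_size
        for p, a in zip(predicted, actual):
            correct[bucket] += int(p == a)
            total[bucket] += 1
    # Key each bucket by the first move number it covers.
    return {b * bucket_size: correct[b] / total[b] for b in sorted(total)}


# Toy data on a 4-point "board": one early and one late position.
samples = [
    (10, [1, 1, -1, -1], [1, -1, -1, 1]),   # early: 2 of 4 points correct
    (180, [1, 1, -1, -1], [1, 1, -1, 1]),   # late: 3 of 4 points correct
]
accuracy = ownership_accuracy_by_phase(samples)
print(accuracy)  # {0: 0.5, 150: 0.75}
```

Reporting one number per bucket rather than a single overall figure shows
directly how much weaker the prediction is early in the game.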

On Tue, Feb 23, 2016 at 7:00 AM, <computer-go-request at computer-go.org>
wrote:

>
> Today's Topics:
>
>    1. Re: Congratulations to Zen! (Robert Jasiek)
>    2. Move evalution by expected value, as product of expected
>       winrate and expected points? (Michael Markefka)
>    3. Re: Move evalution by expected value, as product of expected
>       winrate and expected points? (Álvaro Begué)
>    4. Re: Move evalution by expected value, as product of expected
>       winrate and expected points? (Robert Jasiek)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 22 Feb 2016 19:13:20 +0100
> From: Robert Jasiek <jasiek at snafu.de>
> To: computer-go at computer-go.org
> Subject: Re: [Computer-go] Congratulations to Zen!
> Message-ID: <56CB4FC0.4010801 at snafu.de>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Aja, sorry to bother you with trivialities, but how does AlphaGo guard
> against power or network failures and similar incidents?
>
> --
> robert jasiek
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 23 Feb 2016 11:36:57 +0100
> From: Michael Markefka <michael.markefka at gmail.com>
> To: computer-go at computer-go.org
> Subject: [Computer-go] Move evalution by expected value, as product of
>         expected winrate and expected points?
> Message-ID:
>         <
> CAJg7PAPU_gbHvNy3Cv+D-p238_HkQkV5pOJxozjLy4nSqAsmPg at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hello everyone,
>
> in the wake of AlphaGo using a DCNN to predict the expected winrate of a
> move, I've been wondering whether one could train a DCNN to predict
> expected territory or points well enough to be of some use (leaving the
> issue of wins by resignation for a more in-depth discussion). I also
> wonder whether winrate and expected territory (or points) always run in
> parallel or whether there are moments where they diverge.
>
> Computer Go programs play what are considered slack or slow moves when
> ahead, sometimes being too conservative and giving away too much of
> their potential advantage. If expected points and expected winrate
> diverge, weighing both could be a way to make the programs play in a
> more natural way, even if there were no strength increase to be gained.
> Then again, some parameter configuration might yield an advantage, and
> perhaps this configuration would need to be dynamic, favoring winrate
> more heavily the further the game progresses.
>
>
> As a general example of the idea, let's assume we have the following
> potential moves generated by our program:
>
> #1: Winrate 55%, +5 expected final points
> #2: Winrate 53%, +15 expected final points
>
> Is the move with higher winrate always better? Or would there be some
> benefit to choosing #2? Would this differ depending on how far along
> the game is?
>
> If we knew the winrate prediction to be perfect, then going by that
> alone would probably result in the best overall performance. But given
> some uncertainty there, expected value could be interesting.
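
The two candidate moves above can be compared under a few selection rules.
The following is a rough sketch only: `product_score` is the "winrate times
points" idea from the subject line, while `blended_score`, its `progress`
parameter and the `point_scale` normalisation are hypothetical choices
invented here to illustrate a weighting that shifts toward winrate as the
game goes on.

```python
def product_score(winrate, points):
    """Expected value as the product of winrate and expected points.

    Note: this only behaves sensibly when `points` is positive.
    """
    return winrate * points


def blended_score(winrate, points, progress, point_scale=20.0):
    """Hypothetical blend: weight on winrate grows with game progress.

    `progress` runs from 0.0 (opening) to 1.0 (end of game); `point_scale`
    is an assumed constant that brings points onto a scale similar to
    winrate.
    """
    return progress * winrate + (1.0 - progress) * (points / point_scale)


candidates = [
    {"name": "#1", "winrate": 0.55, "points": 5.0},
    {"name": "#2", "winrate": 0.53, "points": 15.0},
]

best_by_winrate = max(candidates, key=lambda m: m["winrate"])
best_by_product = max(
    candidates, key=lambda m: product_score(m["winrate"], m["points"])
)
best_early = max(
    candidates, key=lambda m: blended_score(m["winrate"], m["points"], 0.2)
)
best_late = max(
    candidates, key=lambda m: blended_score(m["winrate"], m["points"], 1.0)
)

print(best_by_winrate["name"], best_by_product["name"],
      best_early["name"], best_late["name"])
```

With these numbers, pure winrate picks #1 and the product picks #2; the
blend picks #2 early and falls back to #1 at the end of the game. Whether
any such rule plays better than winrate alone is exactly the open question.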
>
>
> Any takers for some experiments?
>
>
> -Michael
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 23 Feb 2016 06:44:04 -0500
> From: Álvaro Begué <alvaro.begue at gmail.com>
> To: computer-go <computer-go at computer-go.org>
> Subject: Re: [Computer-go] Move evalution by expected value, as
>         product of expected winrate and expected points?
> Message-ID:
>         <CAF8dVMWLPQBhD-Q07YeLZwqV9M9JCW+_VbSRVp=
> evj9CN6WAKA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I have experimented with a CNN that predicts ownership, but I found it to
> be too weak to be useful. The main difference between what Google did and
> what I did is in the dataset used for training: I had tens of thousands of
> games (I did several different experiments) and I used all the positions
> from each game (which is known to be problematic); they used 30M positions
> from independent games. I expect you can learn a lot about ownership and
> expected number of points from a dataset like that. Unfortunately,
> generating such a dataset is infeasible with the resources most of us have.
>
> Here's an idea: Google could make the dataset publicly available for
> download, ideally with the final configurations of the board as well. There
> is a tradition of making interesting datasets for machine learning
> available, so I have some hope this may happen.
>
> The one experiment I would like to run along the lines of your post is to
> train a CNN to compute both the expected number of points and its standard
> deviation. If you assume the distribution of scores is well approximated by
> a normal distribution, maximizing winning probability can be achieved by
> maximizing (expected score) / (standard deviation of the score). I wonder
> if that results in stronger or more natural play than modeling winning
> probability directly, because you get to learn more about each position.
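
The reasoning behind the ratio can be sketched in a few lines. If the final
score S is assumed to be Normal(mu, sigma^2), then P(S > 0) = Phi(mu / sigma),
where Phi is the standard normal CDF. Phi is increasing, so ranking moves by
mu / sigma and ranking them by winning probability give the same order. The
function name and the example numbers below are made up for illustration.

```python
import math


def win_probability(mean_score, std_score):
    """P(final score > 0), assuming score ~ Normal(mean_score, std_score**2)."""
    z = mean_score / std_score
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical predictions: (expected score, standard deviation) per move.
moves = {
    "solid": (5.0, 10.0),    # small lead, low variance:  mu/sigma = 0.5
    "greedy": (15.0, 40.0),  # larger lead, high variance: mu/sigma = 0.375
}

best_by_ratio = max(moves, key=lambda name: moves[name][0] / moves[name][1])
best_by_winprob = max(moves, key=lambda name: win_probability(*moves[name]))

print(best_by_ratio, best_by_winprob)  # solid solid
```

Under this model the move with the smaller expected margin is preferred,
because its lower variance makes the lead more likely to hold.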
>
> Álvaro.
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 23 Feb 2016 12:54:22 +0100
> From: Robert Jasiek <jasiek at snafu.de>
> To: computer-go at computer-go.org
> Subject: Re: [Computer-go] Move evalution by expected value, as
>         product of expected winrate and expected points?
> Message-ID: <56CC486E.1030507 at snafu.de>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 23.02.2016 11:36, Michael Markefka wrote:
> > whether one could train a DCNN for expected territory
>
> First, some definition of territory must be chosen or stated. Second,
> you must decide if territory according to this definition can be
> determined by a neural net meaningfully at all. Third, if yes, do it.
>
> Note that there are very different definitions of territory. The most
> suitable definition for positional judgement (see Positional Judgement 1
> - Territory) is sophisticated and requires a combination of expert rules
> (specifying what to determine, and how to read to determine it) and
> reading.
>
> A weak definition could predict whether a particular intersection will
> be territory in the scoring position at the end of the game. This can be
> fast for MC or NN, and maybe it is good enough as a very rough
> approximation for programs. For humans, it is very bad because it
> neglects different degrees of safety of (potential) territory and the
> strategic concepts of sacrifice and exchange.
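
For the Monte Carlo case, the weak definition amounts to averaging who owns
each intersection over the final positions of many playouts. Below is a
minimal sketch under assumed conventions: each final position is a flat list
with +1 for Black, -1 for White and 0 for neutral points, and the function
names are hypothetical. A real program would generate these positions from
playouts; here they are written out by hand.

```python
def ownership_map(final_positions):
    """Average per-point ownership over a set of playout end positions.

    Each position is a list of +1 (Black), -1 (White) or 0 (neutral).
    The result lies in [-1, 1] per point; values near 0 are contested.
    """
    n = len(final_positions)
    size = len(final_positions[0])
    return [sum(pos[i] for pos in final_positions) / n for i in range(size)]


def expected_margin(ownership, komi=0.0):
    """Expected score margin for Black implied by the ownership map."""
    return sum(ownership) - komi


# Four hand-written "playout results" on a 4-point toy board.
playouts = [
    [1, 1, -1, 0],
    [1, -1, -1, 0],
    [1, 1, -1, 1],
    [1, 1, -1, -1],
]
ownership = ownership_map(playouts)
print(ownership)                   # [1.0, 0.5, -1.0, 0.0]
print(expected_margin(ownership))  # 0.5
```

As noted above, an average like this says nothing about how safe a given
area is or about sacrifice and exchange; it is only a rough approximation.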
>
> I have also suggested other definitions, but IMO they are less
> attractive for NN.
>
> --
> robert jasiek
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
> ------------------------------
>
> End of Computer-go Digest, Vol 73, Issue 42
> *******************************************
>

