[Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?

"Ingo Althöfer" 3-Hirn-Verlag at gmx.de
Tue Feb 23 07:56:22 PST 2016


My 1.5 cent:

David Fotland has a nice score-estimator in his (old) ManyFaces bot.
The score estimator is still from the days before the Monte Carlo
version.

Perhaps David can improve this estimator with the help of
CNNs.

Ingo.
 
 

Sent: Tuesday, 23 February 2016 at 16:41
From: "Justin .Gilmer" <jmgilmer at gmail.com>
To: computer-go at computer-go.org
Subject: Re: [Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?

I made an attempt similar to Álvaro's to predict final ownership. You can find the code here: https://github.com/jmgilmer/GoCNN/. It is trained to predict final ownership on about 15,000 professional games that were played to the end (i.e., did not end in resignation). It gets about 80.5% accuracy on a held-out test set, although the accuracy varies greatly depending on how far into the game the position is. I can't say how well it would work inside a Go-playing program.
-Justin
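A per-intersection accuracy figure like the one above, and its dependence on game phase, might be computed along these lines. This is a minimal sketch, not code from the GoCNN repository: it assumes predictions in [-1, 1] with positive meaning Black, final-ownership labels of +1 (Black) or -1 (White) with no neutral points, and a hypothetical bucketing of positions by move number.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ownership_accuracy(pred: Sequence[float], final_owner: Sequence[int]) -> float:
    """Fraction of intersections whose predicted owner matches the final owner.

    pred: one score per intersection in [-1, 1]; > 0 is read as Black, otherwise White.
    final_owner: +1 (Black) or -1 (White) per intersection at the end of the game.
    """
    if len(pred) != len(final_owner):
        raise ValueError("prediction and label sizes differ")
    correct = sum(1 for p, y in zip(pred, final_owner) if (1 if p > 0 else -1) == y)
    return correct / len(final_owner)


def accuracy_by_phase(samples: List[Tuple[int, Sequence[float], Sequence[int]]],
                      bucket_size: int = 50) -> Dict[int, float]:
    """Mean accuracy per bucket of move numbers (key = first move of the bucket)."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for move_number, pred, owner in samples:
        start = (move_number // bucket_size) * bucket_size
        buckets[start].append(ownership_accuracy(pred, owner))
    return {start: sum(accs) / len(accs) for start, accs in sorted(buckets.items())}
```

Reporting accuracy per bucket rather than as one number makes the "varies with game phase" effect visible: early positions are expected to score much lower than late ones.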
 
On Tue, Feb 23, 2016 at 7:00 AM, <computer-go-request at computer-go.org> wrote:
Send Computer-go mailing list submissions to
        computer-go at computer-go.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://computer-go.org/mailman/listinfo/computer-go
or, via email, send a message with subject or body 'help' to
        computer-go-request at computer-go.org

You can reach the person managing the list at
        computer-go-owner at computer-go.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Computer-go digest..."


Today's Topics:

   1. Re: Congratulations to Zen! (Robert Jasiek)
   2. Move evaluation by expected value, as product of expected
      winrate and expected points? (Michael Markefka)
   3. Re: Move evaluation by expected value, as product of expected
      winrate and expected points? (Álvaro Begué)
   4. Re: Move evaluation by expected value, as product of expected
      winrate and expected points? (Robert Jasiek)


----------------------------------------------------------------------

Message: 1
Date: Mon, 22 Feb 2016 19:13:20 +0100
From: Robert Jasiek <jasiek at snafu.de>
To: computer-go at computer-go.org
Subject: Re: [Computer-go] Congratulations to Zen!
Message-ID: <56CB4FC0.4010801 at snafu.de>
Content-Type: text/plain; charset=UTF-8; format=flowed

Aja, sorry to bother you with trivialities, but how does AlphaGo protect
against power failures, network failures and similar incidents?

--
robert jasiek


------------------------------

Message: 2
Date: Tue, 23 Feb 2016 11:36:57 +0100
From: Michael Markefka <michael.markefka at gmail.com>
To: computer-go at computer-go.org
Subject: [Computer-go] Move evaluation by expected value, as product of
        expected winrate and expected points?
Message-ID:
        <CAJg7PAPU_gbHvNy3Cv+D-p238_HkQkV5pOJxozjLy4nSqAsmPg at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hello everyone,

in the wake of AlphaGo using a DCNN to predict the expected winrate of a
move, I've been wondering whether one could train a DCNN to predict expected
territory or points well enough to be of some use (leaving the
issue of wins by resignation for a more in-depth discussion), and
whether winrate and expected territory (or points) always run in
parallel or whether there are moments where they diverge.

Computer Go programs play what are considered slack or slow moves when
ahead, sometimes being too conservative and giving away too much of
their potential advantage. If expected points and expected winrate
diverge, taking expected points into account could make the programs
play in a more natural way, even if there were no strength increase to
be gained. Then again, there might be a parameter configuration that
yields some advantage, and perhaps this configuration would need to be
dynamic, favoring winrate more heavily as the game progresses.
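One way to read the "dynamic configuration" idea is a blended score whose weight shifts from expected points toward winrate as the game goes on. The sketch below is purely illustrative: the linear schedule, the 300-move expected game length, the starting weight of 0.5 and the logistic `points_scale` used to squash points into (0, 1) are all hypothetical choices, not values proposed in this thread.

```python
import math


def points_weight(move_number: int, expected_length: int = 300,
                  start_weight: float = 0.5) -> float:
    """Hypothetical schedule: weight on points decays linearly to 0 by expected_length."""
    progress = min(max(move_number / expected_length, 0.0), 1.0)
    return start_weight * (1.0 - progress)


def blended_score(winrate: float, expected_points: float, move_number: int,
                  points_scale: float = 10.0) -> float:
    """Mix winrate (0..1) with expected points squashed into (0, 1) by a logistic."""
    squashed = 1.0 / (1.0 + math.exp(-expected_points / points_scale))
    w = points_weight(move_number)
    return (1.0 - w) * winrate + w * squashed
```

With these made-up settings, a move with 53% and +15 points outranks one with 55% and +5 points at move 30, while the order flips by move 280, and past the expected game length the score is just the winrate.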


As a general example for the idea, let's assume we have the following
potential moves generated by our program:

#1: Winrate 55%, +5 expected final points
#2: Winrate 53%, +15 expected final points

Is the move with higher winrate always better? Or would there be some
benefit to choosing #2? Would this differ depending on how far along
the game is?
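Read literally, the subject line suggests scoring each candidate by winrate times expected points. A minimal sketch applied to the two moves above (the `Candidate` type is hypothetical). Note that a plain product misbehaves when expected points are negative, since a higher winrate then lowers the score, so this is a starting point for experiments rather than a recommendation.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    winrate: float          # estimated probability of winning, 0..1
    expected_points: float  # estimated final margin in points (positive = ahead)


def expected_value(c: Candidate) -> float:
    """Score a move as winrate * expected points (the product from the subject line)."""
    return c.winrate * c.expected_points


moves = [Candidate("#1", 0.55, 5.0), Candidate("#2", 0.53, 15.0)]
best_by_winrate = max(moves, key=lambda c: c.winrate)
best_by_product = max(moves, key=expected_value)
for c in moves:
    print(c.name, round(expected_value(c), 2))
```

Here the product prefers #2 (0.53 × 15 = 7.95 versus 0.55 × 5 = 2.75), while ranking by winrate alone prefers #1.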

If we knew the winrate prediction to be perfect, then going by that
alone would probably result in the best overall performance. But given
some uncertainty there, expected value could be interesting.


Any takers for some experiments?


-Michael


------------------------------

Message: 3
Date: Tue, 23 Feb 2016 06:44:04 -0500
From: Álvaro Begué <alvaro.begue at gmail.com>
To: computer-go <computer-go at computer-go.org>
Subject: Re: [Computer-go] Move evaluation by expected value, as
        product of expected winrate and expected points?
Message-ID:
        <CAF8dVMWLPQBhD-Q07YeLZwqV9M9JCW+_VbSRVp=evj9CN6WAKA at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I have experimented with a CNN that predicts ownership, but I found it to
be too weak to be useful. The main difference between what Google did and
what I did is in the dataset used for training: I had tens of thousands of
games (I did several different experiments) and I used all the positions
from each game (which is known to be problematic, because successive
positions from the same game are strongly correlated and share the same
final result); they used 30M positions from independent games. I expect
you can learn a lot about ownership and expected number of points from a
dataset like that. Unfortunately, generating such a dataset is infeasible
with the resources most of us have.
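The distinction between the two datasets can be sketched like this: rather than using every position of every game, sample a single position per game so that no two training examples share a game and its final result. This is a minimal sketch that assumes games are already parsed; the `Game` type and its string encodings are hypothetical placeholders.

```python
import random
from typing import List, NamedTuple, Sequence, Tuple


class Game(NamedTuple):
    positions: Sequence[str]  # hypothetical encoding of each board position, in order
    final_board: str          # hypothetical encoding of the final (scored) board


def all_positions(games: List[Game]) -> List[Tuple[str, str]]:
    """Every position of every game: many strongly correlated samples per game."""
    return [(p, g.final_board) for g in games for p in g.positions]


def one_position_per_game(games: List[Game], seed: int = 0) -> List[Tuple[str, str]]:
    """One randomly chosen position per game, so samples come from independent games."""
    rng = random.Random(seed)
    return [(rng.choice(list(g.positions)), g.final_board) for g in games if g.positions]


games = [Game(["a1", "a2", "a3"], "A-final"), Game(["b1", "b2"], "B-final")]
print(len(all_positions(games)), len(one_position_per_game(games)))
```

The price of independence is that each game contributes only one example, which is why this approach needs a very large number of games.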

Here's an idea: Google could make the dataset publicly available for
download, ideally with the final configurations of the board as well. There
is a tradition of making interesting datasets for machine learning
available, so I have some hope this may happen.

The one experiment I would like to make along the lines of your post is to
train a CNN to compute both the expected number of points and its standard
deviation. If you assume the distribution of scores (measured as the margin
after komi, so that a win means a positive score) is well approximated by
a normal distribution, maximizing winning probability can be achieved by
maximizing (expected score) / (standard deviation of the score). I wonder
if that results in stronger or more natural play than making a direct model
for winning probability, because you get to learn more about each position.

Álvaro.
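The reasoning behind the ratio: if the score S (margin after komi) is normally distributed with mean μ and standard deviation σ, then P(win) = P(S > 0) = Φ(μ/σ), where Φ is the standard normal CDF, and since Φ is increasing, maximizing μ/σ maximizes the winning chance. A minimal sketch with made-up (μ, σ) values, not outputs of any trained network:

```python
import math


def win_probability(mean_score: float, std_score: float) -> float:
    """P(S > 0) for S ~ Normal(mean, std^2), i.e. Phi(mean / std)."""
    if std_score <= 0:
        raise ValueError("standard deviation must be positive")
    z = mean_score / std_score
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical (expected score after komi, standard deviation) for two moves.
candidates = {"A": (5.0, 20.0), "B": (15.0, 100.0)}

# Ranking by mean / std gives the same order as ranking by win probability.
best = max(candidates, key=lambda m: candidates[m][0] / candidates[m][1])
for move, (mu, sigma) in candidates.items():
    print(move, round(win_probability(mu, sigma), 3))
```

With these hypothetical numbers, B has the larger expected score but A has the better winning chance (about 0.60 versus 0.56), because its outcome is much less uncertain.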




------------------------------

Message: 4
Date: Tue, 23 Feb 2016 12:54:22 +0100
From: Robert Jasiek <jasiek at snafu.de>
To: computer-go at computer-go.org
Subject: Re: [Computer-go] Move evaluation by expected value, as
        product of expected winrate and expected points?
Message-ID: <56CC486E.1030507 at snafu.de>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 23.02.2016 11:36, Michael Markefka wrote:
> whether one could train a DCNN for expected territory

First, some definition of territory must be chosen or stated. Second,
you must decide whether territory according to this definition can
meaningfully be determined by a neural net at all. Third, if yes, do it.

Note that there are very different definitions of territory. The most
suitable definition for positional judgement (see Positional Judgement 1 -
Territory) is sophisticated and requires a combination of expert rules
(specifying what to determine, and how to read to determine it) and
reading.

A weak definition asks whether a particular intersection will be
territory in the scoring position at the end of the game. Predicting
this can be fast for MC or NN, and may be good enough as a very rough
approximation for programs. For humans, such a definition is very bad
because it neglects the different degrees of safety of (potential)
territory and the strategic concepts of sacrifice and exchange.
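Labels for this weak definition might be extracted from a final scoring position roughly as follows: flood-fill each empty region and assign it to a colour only if it borders stones of that colour alone. This is a minimal sketch under loud assumptions: dead stones have already been removed (deciding life and death is the hard part), the board uses a hypothetical string encoding, and it is not the sophisticated definition recommended above for positional judgement.

```python
from collections import deque
from typing import List


def territory_labels(board: List[str]) -> List[List[int]]:
    """Label empty points of a final position as Black (+1) or White (-1) territory.

    board: rows of 'X' (Black stone), 'O' (White stone), '.' (empty); dead stones
    are assumed to be removed already. Stones and neutral points are labelled 0.
    """
    rows, cols = len(board), len(board[0])
    labels = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if board[r0][c0] != "." or seen[r0][c0]:
                continue
            # Flood-fill this empty region, recording which colours border it.
            region, borders = [], set()
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if board[nr][nc] == ".":
                        if not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                    else:
                        borders.add(board[nr][nc])
            # A region touching only one colour is that colour's territory.
            if borders in ({"X"}, {"O"}):
                value = 1 if borders == {"X"} else -1
                for r, c in region:
                    labels[r][c] = value
    return labels
```

Such labels say nothing about how safe a territory was during the game, which is exactly the weakness pointed out above.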

I have also suggested other definitions, but IMO they are less
attractive for NNs.

--
robert jasiek


------------------------------

Subject: Digest Footer

_______________________________________________
Computer-go mailing list
Computer-go at computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

------------------------------

End of Computer-go Digest, Vol 73, Issue 42
*******************************************


