[Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?
dave.devos at planet.nl
Tue Feb 23 12:36:27 PST 2016
If you accumulate the end scores of playout results, you can make a histogram by plotting the frequency f(s) of each score s as a function of the score. The winrate is sum(f(s)) over s > 0, divided by sum(f(s)) over all s. The average score is sum(s * f(s)) / sum(f(s)), summed over all s.
When the distribution can be approximated by a normal distribution, it may not matter much whether you choose to maximize winrate or average score.
But in general, the distribution could be multimodal (in fact I think it always is, unless you solved the game à la Conway). In that case, the average score may not be a very reliable representation of the situation. For example, having a 99% chance of losing by 0.5 points combined with a 1% chance of winning by 100 points might give you the impression that you are winning by about 0.5 points (which would be the average score), while in reality you have only a 1% chance of winning (which would be the winrate).
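A minimal sketch of that calculation in Python, using the 99%/1% example as data. The function name `winrate_and_mean` and the 100-playout sample are assumptions made for illustration only; both quantities are normalized by the total number of playouts.

```python
from collections import Counter

def winrate_and_mean(scores):
    """Return (winrate, average score) from a list of playout end scores."""
    hist = Counter(scores)          # f(s): how often each end score s occurred
    total = sum(hist.values())      # sum of f(s) over all s
    winrate = sum(f for s, f in hist.items() if s > 0) / total
    mean = sum(s * f for s, f in hist.items()) / total
    return winrate, mean

# Hypothetical sample: 99 playouts lost by 0.5 points, 1 playout won by 100.
scores = [-0.5] * 99 + [100.0]
winrate, mean = winrate_and_mean(scores)
print(f"winrate = {winrate:.3f}, average score = {mean:+.3f}")
```

The average comes out slightly positive even though only one playout in a hundred is a win, which is the mismatch described above.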
Dave de Vos
From : alvaro.begue at gmail.com
Date : 23/02/2016 12:44
To : computer-go at computer-go.org
Subject : Re: [Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?
I have experimented with a CNN that predicts ownership, but I found it to be too weak to be useful. The main difference between what Google did and what I did is in the dataset used for training: I had tens of thousands of games (I did several different experiments) and I used all the positions from each game (which is known to be problematic); they used 30M positions from independent games. I expect you can learn a lot about ownership and expected number of points from a dataset like that. Unfortunately, generating such a dataset is infeasible with the resources most of us have.
Here's an idea: Google could make the dataset publicly available for download, ideally with the final configurations of the board as well. There is a tradition of making interesting datasets for machine learning available, so I have some hope this may happen.
The one experiment I would like to make along the lines of your post is to train a CNN to compute both the expected number of points and its standard deviation. If you assume the distribution of scores is well approximated by a normal distribution, maximizing winning probability can be achieved by maximizing (expected score) / (standard deviation of the score). I wonder if that results in stronger or more natural play than making a direct model for winning probability, because you get to learn more about each position.
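Under that normal assumption, the winning probability is P(score > 0) = Phi(mean / std), where Phi is the standard normal CDF. Phi is increasing, so ranking moves by mean / std gives the same order as ranking by winning probability. A rough sketch in Python; the three candidate moves and their (mean, std) values are made up for illustration and do not come from any trained network:

```python
import math

def win_probability(mean, std):
    """P(score > 0) when the score is normal with the given mean and std."""
    return 0.5 * (1.0 + math.erf(mean / (std * math.sqrt(2.0))))

# Hypothetical (expected score, standard deviation) per candidate move.
candidates = {"a": (5.0, 10.0), "b": (15.0, 40.0), "c": (2.0, 3.0)}

by_ratio = sorted(
    candidates, key=lambda m: candidates[m][0] / candidates[m][1], reverse=True
)
by_winprob = sorted(
    candidates, key=lambda m: win_probability(*candidates[m]), reverse=True
)
print(by_ratio, by_winprob)
```

Note that move "b" has the largest expected score but the lowest ratio, so it ranks last under both criteria. The equivalence only holds as far as the normal approximation does.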
On Tue, Feb 23, 2016 at 5:36 AM, Michael Markefka <michael.markefka at gmail.com> wrote:
In the wake of AlphaGo using a DCNN to predict expected winrate of a
move, I've been wondering whether one could train a DCNN for expected
territory or points successfully enough to be of some use (leaving the
issue of win by resignation for a more in-depth discussion). And,
whether winrate and expected territory (or points) always run in
parallel or whether there are diverging moments.
Computer Go programs play what are considered slack or slow moves when
ahead, sometimes being too conservative and giving away too much of
their potential advantage. If expected points and expected winrate
diverge, this could be a way to make the programs play in a more
natural way, even if there were no strength increase to be gained.
Then again there might be a parameter configuration that might yield
some advantage and perhaps this configuration would need to be
dynamic, favoring winrate the further the game progresses.
As a general example for the idea, let's assume we have the following
potential moves generated by our program:
#1: Winrate 55%, +5 expected final points
#2: Winrate 53%, +15 expected final points
Is the move with higher winrate always better? Or would there be some
benefit to choosing #2? Would this differ depending on how far along
the game is?
If we knew the winrate prediction to be perfect, then going by that
alone would probably result in the best overall performance. But given
some uncertainty there, expected value could be interesting.
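As a rough illustration of how the two criteria can disagree on the example moves, here is a small Python sketch. The "product" score (winrate times expected points) is only one possible reading of "expected value" taken from the subject line, not an established method, and the helper names are hypothetical.

```python
# (name, winrate, expected final points) for the two example moves.
moves = [("#1", 0.55, 5.0), ("#2", 0.53, 15.0)]

def by_winrate(move):
    return move[1]

def by_product(move):
    # Hypothetical "expected value": winrate times expected points.
    return move[1] * move[2]

best_by_winrate = max(moves, key=by_winrate)[0]
best_by_product = max(moves, key=by_product)[0]
print(best_by_winrate, best_by_product)
```

With these numbers, winrate alone picks #1 while the product picks #2. The product only makes sense while expected points are positive: once they turn negative, a higher winrate makes the product smaller.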
Any takers for some experiments?