[Computer-go] Move evaluation by expected value, as product of expected winrate and expected points?
michael.markefka at gmail.com
Tue Feb 23 02:36:57 PST 2016
In the wake of AlphaGo using a DCNN to predict the expected winrate of a
move, I've been wondering whether one could train a DCNN to predict expected
territory or points well enough to be of some use (leaving the
issue of win by resignation for a more in-depth discussion). I've also
been wondering whether winrate and expected territory (or points) always
run in parallel, or whether there are moments where they diverge.
Computer Go programs play what are considered slack or slow moves when
ahead, sometimes being too conservative and giving away too much of
their potential advantage. If expected points and expected winrate
diverge, factoring in expected points could be a way to make the programs
play in a more natural way, even if there were no strength increase to be
gained.
Then again, there might be a parameter configuration that yields
some advantage, and perhaps this configuration would need to be
dynamic, favoring winrate more the further the game progresses.
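One possible shape for such a dynamic configuration is sketched below. It is only an illustration under assumptions that are not in the original post: the names winrate_weight and blended_score, the linear schedule from 0.5 to 1.0, the 250-move game length, and the 20-point scale used to squash the margin are all hypothetical choices.

```python
import math


def winrate_weight(move_number: int, total_moves: int = 250) -> float:
    """Weight given to winrate, rising linearly from 0.5 to 1.0.

    The linear schedule and the 250-move game length are assumptions
    made for illustration only.
    """
    progress = min(max(move_number / total_moves, 0.0), 1.0)
    return 0.5 + 0.5 * progress


def blended_score(winrate: float, expected_points: float,
                  move_number: int, point_scale: float = 20.0) -> float:
    """Blend winrate with a squashed point margin.

    The margin is passed through a logistic function so that it lands
    in 0..1 and is comparable with winrate; point_scale is an assumed
    tuning knob, not a known good value.
    """
    margin_term = 1.0 / (1.0 + math.exp(-expected_points / point_scale))
    w = winrate_weight(move_number)
    return w * winrate + (1.0 - w) * margin_term
```

With this schedule the weight reaches 1.0 at the assumed end of the game, so the score reduces to winrate alone there, while early in the game the point margin contributes up to half of the score.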
As a general example of the idea, let's assume we have the following
potential moves generated by our program:
#1: Winrate 55%, +5 expected final points
#2: Winrate 53%, +15 expected final points
Is the move with higher winrate always better? Or would there be some
benefit to choosing #2? Would this differ depending on how far along
the game is?
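Taking the subject line literally, the two candidates can be compared by the product of winrate and expected points. The sketch below does only that; the Candidate class and product_score function are hypothetical names, and the inputs are the two example moves above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    winrate: float          # predicted probability of winning, 0..1
    expected_points: float  # predicted final margin in points


def product_score(c: Candidate) -> float:
    # Literal reading of the subject line: winrate times expected margin.
    return c.winrate * c.expected_points


candidates = [
    Candidate("#1", 0.55, 5.0),
    Candidate("#2", 0.53, 15.0),
]

# Pick the candidate with the highest product score.
best = max(candidates, key=product_score)

for c in candidates:
    print(f"{c.name}: {product_score(c):.2f}")
print("preferred:", best.name)
```

Under this scoring #2 comes out ahead (roughly 7.95 against 2.75), because the point margin dominates the small winrate difference. A plain product also misbehaves when the expected margin is negative, since a higher winrate then makes the score worse, so a real experiment would likely need a different combination.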
If we knew the winrate prediction to be perfect, then going by that
alone would probably result in the best overall performance. But given
some uncertainty there, expected value could be interesting.
Any takers for some experiments?