[computer-go] Abstract analysis of Monte Carlo playout
Erik van der Werf
erikvanderwerf at gmail.com
Sat Jul 28 02:23:08 PDT 2007
Hi Antti,
I had a quick look at your numbers. Maybe I misunderstood something,
but at first glance there appears to be a parity effect (an even
number of 100% blunder moves always get it right).
How do the statistics look if the game length is odd?
If it matters, maybe you should sample over a reasonable distribution
of game lengths or otherwise just average odd and even.
Erik
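
One way to check the parity effect without rerunning a simulation is to propagate the probability exactly, ply by ply. This is only a sketch, not Antti's C program: it assumes the model described in the quoted message below (only the side that is currently winning can blunder), and the names prob_black_still_winning and flat, as well as the 0-based ply counter with Black moving first, are made up for illustration.

```python
def prob_black_still_winning(correct_prob, game_length):
    """Exact probability that the position is still a Black win after game_length plies.

    correct_prob(k, n) is the chance of a correct move at ply k (0-based, assumed).
    """
    b = 1.0  # the start position is a game-theoretic win for Black
    for k in range(game_length):
        c = correct_prob(k, game_length)
        if k % 2 == 0:
            # Black to move: Black can only keep the win (prob c) or blunder it away.
            b = b * c
        else:
            # White to move: White can only keep a White win (prob c) or blunder it back.
            b = 1.0 - (1.0 - b) * c
    return b


def flat(p):
    """Hypothetical helper: the same correct-move probability p at every ply."""
    return lambda k, n: p


for p in (0.0, 0.5, 1.0):
    even = prob_black_still_winning(flat(p), 100)
    odd = prob_black_still_winning(flat(p), 99)
    print(f"{p:.0%} flat: length 100 -> {even:.4f}, length 99 -> {odd:.4f}, "
          f"mean -> {(even + odd) / 2:.4f}")
```

Under these assumptions a 50% flat agent comes out near 2/3 for length 100 but near 1/3 for length 99, so averaging odd and even lengths gives about 1/2.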
On 7/28/07, Antti Huima <antti.huima at conformiq.com> wrote:
>
> Hi,
>
> some time ago there was a discussion about whether it pays off to improve the
> quality of an MC play-out agent or not, and how important it is to keep it
> "balanced", so I performed the following abstract experiment:
>
> Assume that we start from a position that is a game-theoretic win for Black.
> If we play out moves from this position--say for instance 100--then every
> move can either switch the game-theoretic value of the present position
> (blunder) or not (in general a correct move). Of course the only way to
> switch the game-theoretic value by a player is by blundering in a position
> that is won for the player and ending up in a position that is lost for the
> same player: there is no way to "blunder" a lost position into a won one.
>
> I implemented a simple C program that calculates the probability of ending
> up with the correct game-theoretic value at the end of the simulation when the
> probability of blundering, when possible, is given as a function of the move
> number. Here are some results (explanations below):
>
> Game length 100, simulations 1000000
> 0% flat | 100.00%
> 1% flat | 99.01%
> 2% flat | 98.04%
> 5% flat | 95.27%
> 10% flat | 90.97%
> 20% flat | 83.28%
> 50% flat | 66.74%
> 80% flat | 55.59%
> 90% flat | 52.65%
> 95% flat | 51.58%
> 98% flat | 57.02%
> 99% flat | 68.53%
> 99.5% flat | 80.37%
> 99.8% flat | 90.97%
> 99.9% flat | 95.21%
> 100% flat | 100.00%
> Linear ramp up | 50.17%
> Linear ramp down | 99.03%
> Squared ramp up | 50.17%
> Squared ramp down | 99.99%
> Squared ramp up, inverted | 98.09%
> Squared ramp down, inverted | 49.97%
> Spike | 0.00%
> Spike with 10%/10% noise | 52.30%
> Spike with 10%/0% noise | 9.95%
> Spike with 0%/10% noise | 52.34%
>
> Each row represents one million play-outs. The left column is the
> probability function (how probable a correct move is, per the descriptions
> below; a blunder has the complementary probability) and the right column is
> the probability that we get the "correct" result at the end of a play-out.
> Here are the descriptions of the functions:
>
> - N% flat means that the move is correct with probability N% and a blunder
> with probability (1-N%), when possible (you can't blunder if you are in a
> lost position)
> - Linear ramp up means that the probability is 100% * (k/N) where k is the
> move number, i.e. moves tend to get better and better by the end of the game
> - Linear ramp down is 100% * (1-k/N), i.e. inverted
> - Squared ramp up is 100% * (k/N)^2
> - Squared ramp down is 100% * (1 - (k/N)^2)
> - Squared ramp up and down inverted are obtained by 100% - X where X is the
> squared ramp
> - Spike means that Black makes one blunder in the middle but all other
> moves are correct
> - Spike 10%/10% noise is a 10% chance of a correct move at the middle move
> and 90% elsewhere
> - Spike 10%/0% noise is a 10% chance of a correct move in the middle and
> 100% elsewhere
> - Spike 0%/10% noise is a 0% chance of a correct move in the middle and 90%
> elsewhere
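
The experiment described above can be approximated with a few lines of Python. This is a minimal sketch, not the original C program: the 0-based ply index k, the choice of k == N/2 as the "middle" move, and all function names are assumptions made here for illustration. Each function returns the probability of a correct move, following the descriptions in the list above.

```python
import random


def flat(p):
    return lambda k, n: p


def linear_ramp_up(k, n):
    return k / n


def linear_ramp_down(k, n):
    return 1.0 - k / n


def spike(k, n):
    # Assumed: the single forced blunder sits at ply n // 2 (a Black move for n = 100).
    return 0.0 if k == n // 2 else 1.0


def playout(correct_prob, game_length, rng):
    """Play one random game; return True if the position is still a Black win."""
    black_winning = True
    for k in range(game_length):
        black_to_move = (k % 2 == 0)
        mover_winning = (black_winning == black_to_move)
        # Only the side that is currently winning can blunder.
        if mover_winning and rng.random() >= correct_prob(k, game_length):
            black_winning = not black_winning
    return black_winning


def estimate(correct_prob, game_length=100, simulations=5000, seed=0):
    """Fraction of playouts that end with the correct (Black wins) value."""
    rng = random.Random(seed)
    wins = sum(playout(correct_prob, game_length, rng) for _ in range(simulations))
    return wins / simulations


for name, fn in [("50% flat", flat(0.5)), ("Linear ramp up", linear_ramp_up),
                 ("Linear ramp down", linear_ramp_down), ("Spike", spike)]:
    print(f"{name:18s} | {estimate(fn):7.2%}")
```

With far fewer play-outs than the one million per row used above, the estimates are noisier, but the overall pattern should be comparable if the assumptions match the original setup.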
>
> And here is some analysis:
>
> - Obviously a move generator that always blunders with probability 1/2 when
> possible is a great basis for MC analysis because it ends up with the correct
> game-theoretic value with 67% probability
>
> - It is also obvious that of the ones sampled above, the worst probability
> patterns are rising ramps, i.e. playout agents that play badly in the
> beginning but get better and better towards the end of the game. For these
> agents the end result is basically just random noise. The reason is, I
> believe, that at first both players blunder all the time and the game-theoretic
> value always remains won for Black (two blunders --> Black still winning),
> but when the blunder probability starts to drop, first the result becomes
> more or less random, and then the dropping probability "locks" the
> game-theoretic value to the random value.
>
> Finally, for those who question these numbers, here is an intuitive
> explanation of the underlying mechanics:
>
> Suppose you play correctly with probability 50% and you start with Black's
> move from a position that is a win for Black.
>
> With probability 50% you play correctly, White answers whatever, but you
> still have a won position (White cannot turn a lost position into a won one
> by playing a move.)
>
> With probability 50% you play incorrectly, and the position is now won for
> White. But White also blunders now with probability 50%, so you get another
> 25% probability of having a won position after the two plies.
>
> So even though the playout agent has only a 50% probability of playing
> correctly, the probability that after 2 plies the position is still won is
> 75%!
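
The two-ply argument extends to a short recurrence. As a sketch (the symbols c and w_n are introduced here and do not appear in the original message): let c be the probability of a correct move, and let w_n be the probability that the position after n plies is won for the side to move, with w_0 = 1. The side to move is winning either because the previous mover blundered a won position, or because the previous mover was already lost.

```latex
\begin{align*}
w_{n+1} &= (1-c)\,w_n + (1 - w_n) = 1 - c\,w_n, \\
w_1 &= 1 - c, \qquad w_2 = 1 - c\,(1 - c) = 0.75 \quad \text{for } c = 0.5, \\
w_n &= \frac{1 + (-1)^n\, c^{\,n+1}}{1 + c}.
\end{align*}
```

For even n it is Black's turn, so w_n is the probability that the position is still a Black win. For c = 0.5 and n = 100 this is essentially 2/3, in line with the simulated 66.74% for "50% flat".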
>
> All the best,
>
>
> --
> Antti Huima