Re: [computer-go] Simplified MC evaluator ¿explained?

Jason House jason.james.house at gmail.com
Sat Apr 7 08:20:52 PDT 2007


Jacques Basaldúa wrote:
> Daniel Liu wrote:
>
>> An imperfect evaluation has errors. Is the exact value of the error 
>> known? No.
>
> I have an idea on that, which I will try to explain:
>
> Given any finite combinatorial game where the ending nodes
> have two possible values: win/loss, any node has a "winning rate" (I 
> don't know if there is a better name for that) defined
> as: (# of subpaths ending in win)/(# of total subpaths)
> Let's call it W.
>
> (This winning rate can be a simplified version of what
> Erik calls "underlying ground-truths".)
>
> Assuming two simplifying hypotheses:
>
> 1. The playouts are uniformly random.
> 2. Both players have the same number of legal moves (or any unbalanced 
> numbers compensate in the long term).
>
> The unknown probability p of winning a random playout is
> a function of W.
>
> The _observed_ proportion after n random experiments, p-hat
> is an _unbiased_ estimator of p whose confidence intervals
> are the intervals for a binomial proportion as stated earlier
> in this list.
>
> Abusing language, we can call p an estimator of W. (It is
> not really an estimator because it cannot be computed directly.)
>
> Now, the most interesting part: p is a _biased_ "estimator" of W
> and it is biased towards 1/2 as long as the expected value of the 
> noise is zero (= playouts are not "biased"). The higher the
> noise (or _the longer the playout_) the more biased it is.
>
> In short:
> 1. We measure p-hat which is an unbiased estimator of p with
> known error distribution.
>
> 2. p is an estimator of W that is biased towards 1/2. Knowing the
> variance (or replacing it by an estimator measured from experiments),
> we can model the bias.
>
> To see, in a simplified way, why it is biased toward 1/2,
> add random noise distributed as N(0, e) to p.
>
> With small noise:  p + N(0, 0.01) gives very similar results to p
>
> With big noise:  p + N(0, 100) gives very similar results to 1/2 for 
> any p in [0,1]
>
> This is a rather theoretical post of small practical use, but
> it helps explain the effect of the longer playouts => higher 
> variance => more bias towards 1/2.
>
> Jacques.
>
I don't understand your post.
bias = E(p_hat) - p = E(p+N) - p = E(p)+E(N) - p = p + 0 - p = 0
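
A quick numerical check of that line, as a minimal sketch. It assumes N is Gaussian with mean zero and that p+N is used as is, with no clipping; the function name and its parameters are made up for illustration.

```python
import random

def mean_of_noisy_p(p, sigma, samples=200_000, seed=0):
    """Average of p + N over many draws, with N ~ Normal(0, sigma).

    Assumption: the noise is zero-mean Gaussian and nothing is clipped.
    """
    rng = random.Random(seed)
    return sum(p + rng.gauss(0.0, sigma) for _ in range(samples)) / samples

if __name__ == "__main__":
    # The average should stay close to p, whatever the noise level.
    for sigma in (0.01, 0.1, 1.0):
        print(sigma, mean_of_noisy_p(0.8, sigma))
```

Under these assumptions the average stays near p for every noise level tried, so zero-mean additive noise alone should not pull the estimate toward 1/2.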

I'm thinking that maybe you're clipping p+N to [0,1]?  Maybe my biggest 
confusion is how you're actually arriving at p+N in a meaningful way.  A 
single MC playout corresponds to a Bernoulli trial with probability p.  
Even with many trials, the noise becomes binomial, and asymptotically 
approaches the normal distribution.
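
Both points can be tried numerically. The sketch below is only an illustration under my own assumptions: the noise is Gaussian, "clipping" means truncating p+N to [0,1], and every function name and parameter is hypothetical. The first function averages the clipped value, the second averages p-hat over many batches of Bernoulli playouts.

```python
import random

def clipped_mean(p, sigma, samples=100_000, seed=0):
    """Average of clip(p + N, 0, 1) with N ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = p + rng.gauss(0.0, sigma)
        total += min(1.0, max(0.0, x))  # truncate to [0, 1]
    return total / samples

def mean_p_hat(p, n, repeats=2_000, seed=0):
    """Average of p-hat over many batches of n Bernoulli(p) playouts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        wins = sum(1 for _ in range(n) if rng.random() < p)
        total += wins / n  # p-hat for this batch
    return total / repeats

if __name__ == "__main__":
    print("clipped, small noise:", clipped_mean(0.8, 0.01))
    print("clipped, big noise:  ", clipped_mean(0.8, 100.0))
    print("mean of p-hat, n=100:", mean_p_hat(0.8, 100))
```

With clipping, small noise leaves the average near p while very large noise drives it toward 1/2, which would match the behaviour described in the quoted post. The Bernoulli average stays near p, consistent with p-hat being unbiased.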

