[computer-go] MC - Estimating a moves true probability of winning

Jason House jason.james.house at gmail.com
Thu Mar 1 17:53:34 PST 2007


I respond to various items below.  Sections of the original e-mail that 
I'm not responding to were completely deleted.

Jacques Basaldúa wrote:
> Hello Jason
>
> I think what you are trying to do can be done more easily.

I guess the key question is "what am I trying to do?".

In UCT, the next move to simulate is chosen based off of an estimated 
probability of winning.  Correcting bias in that estimate should lead to 
better sampling.

If one abruptly stops all simulations and picks the "best" move based on 
this estimated probability, I think this may give the optimal answer... 
choosing the move with the highest expectation of winning the game.  
It's important to note that with a peaked distribution of moves' 
probabilities of winning, the estimated probability of winning will rise 
slowly.  That means that moves with only a few simulations will never be 
chosen over moves with (a good win rate and) a lot of simulations.

> C. To compare if a move is better than another, you have
> to compare _confidence intervals_. I.e. the interval in which
> p (the unknown probability) lies computed from your
> observed p-hat, n and a desired confidence level, say 95%.
> These intervals can be computed with methods you can
> find searching for "Confidence Interval for a Binomial
> Proportion". The most used are Wilson and Agresti-Coull
> intervals. These intervals include continuity correction
> as you mention in your post. Other ways of comparing
> proportions are: The difference between proportions, the
> relative risk and the odds ratio to name a few. My "Bible"
> for this is a book called "Categorical Data Analysis" from
> Alan Agresti published by Wiley & Sons.

With lots and lots of simulations, this could lead to a prediction such 
as move a is better than move b with 95% confidence.  If a bot wants to 
prove with high confidence that the move that it has selected is better 
than all others, I suspect it may have to do lots and lots of 
simulations and would be impractical.  I think that was the same point 
you made later in your e-mail.  I welcome someone proving us wrong ;)

An alternate but related approach is "move a is better than move b with 
a p-value of xx%".  Of course, I'm also not too sure on how to use that 
result.

> > To use these results, you must make some assumption
> > about the underlying distribution of a move's probability
> > of winning.
>
> That's the good news. You don't. There is no need to
> understand what complex mechanism produces p. Only
> that: same position == same p.

If you take a good look at your tests, they will make very specific null 
hypothesis which in effect make at least some assumption about the 
underlying distributions (or try to wash away all effects with the 
central limit theorem).

>
> *Do not* expect a sound statistical analysis to tell you
> the best move, unless it is very obvious, n is immense or
> your confidence level is extremely low. But, if you are
> lucky, it will tell you what moves are clearly bad and can
> be safely (= with a given confidence) pruned out.
>

I agree with that.


More information about the computer-go mailing list