[computer-go] MC - Estimating a moves true probability of winning
Jason House
jason.james.house at gmail.com
Thu Mar 1 17:53:34 PST 2007
I respond to various items below. Sections of the original e-mail that
I'm not responding to were completely deleted.
Jacques Basaldúa wrote:
> Hello Jason
>
> I think what you are trying to do can be done more easily.
I guess the key question is "what am I trying to do?".
In UCT, the next move to simulate is chosen based off of an estimated
probability of winning. Correcting bias in that estimate should lead to
better sampling.
If one abruptly stops all simulations and picks the "best" move based on
this estimated probability, I think this may give the optimal answer...
choosing the move with the highest expectation of winning the game.
It's important to note that with a peaked distribution of moves'
probabilities of winning, the estimated probability of winning will rise
slowly. That means that moves with only a few simulations will never be
chosen over moves with (a good win rate and) a lot of simulations.
> C. To compare if a move is better than another, you have
> to compare _confidence intervals_. I.e. the interval in which
> p (the unknown probability) lies computed from your
> observed p-hat, n and a desired confidence level, say 95%.
> These intervals can be computed with methods you can
> find searching for "Confidence Interval for a Binomial
> Proportion". The most used are Wilson and Agresti-Coull
> intervals. These intervals include continuity correction
> as you mention in your post. Other ways of comparing
> proportions are: The difference between proportions, the
> relative risk and the odds ratio to name a few. My "Bible"
> for this is a book called "Categorical Data Analysis" from
> Alan Agresti published by Wiley & Sons.
With lots and lots of simulations, this could lead to a prediction such
as move a is better than move b with 95% confidence. If a bot wants to
prove with high confidence that the move that it has selected is better
than all others, I suspect it may have to do lots and lots of
simulations and would be impractical. I think that was the same point
you made later in your e-mail. I welcome someone proving us wrong ;)
An alternate but related approach is "move a is better than move b with
a p-value of xx%". Of course, I'm also not too sure on how to use that
result.
> > To use these results, you must make some assumption
> > about the underlying distribution of a move's probability
> > of winning.
>
> That's the good news. You don't. There is no need to
> understand what complex mechanism produces p. Only
> that: same position == same p.
If you take a good look at your tests, they will make very specific null
hypothesis which in effect make at least some assumption about the
underlying distributions (or try to wash away all effects with the
central limit theorem).
>
> *Do not* expect a sound statistical analysis to tell you
> the best move, unless it is very obvious, n is immense or
> your confidence level is extremely low. But, if you are
> lucky, it will tell you what moves are clearly bad and can
> be safely (= with a given confidence) pruned out.
>
I agree with that.
More information about the computer-go
mailing list