[Computer-go] Monte-carlo simulations vs. MCMC
Kahn Jonas
jonas.kahn at math.u-psud.fr
Sat Nov 2 04:07:56 PDT 2013
As RĂ©mi said, MCTS has little to do with MCMC, except in the trivial
sense where iid sampling is a degenerate Markov chain.
What is done in MCTS is that the target distribution is directly
sampled. The playout (a Markov chain, but with a role that has nothing
to do with MCMC) is just a somewhat complicated way to get this exact
sample.
In MCMC, we have no (easy) way to sample the target distribution $p$.
But we can build a Markov chain whose stationary distribution is $p$.
Typically we know the likelihood ratios P(A)/P(B) for different outcomes
(like we are three times more likely to win than to draw), but we do not
know the normalisation (does this mean I have 10% chance draw, 30% win,
or 20% draw, 60% win?).
We then run the Markov chain for long enough, and we hope that we arrive
at a point that is distributed almost as if we had sampled it directly
from $p$.
Of course this is something we will never do when the state space has
only three outcomes (loss, draw, win), since it is very easy in that
case to get the normalisation.
Maybe it helps to notice one major difference between the Markov chains
in MCTS and MCMC:
in MCMC the Markov chain lives in the same space as the final
distribution. The possible states are only loss, draw, win. In MCTS the
Markov chain lives on the whole board, and transitions are the moves...
Jonas
More information about the Computer-go
mailing list