[Computer-go] Geiringer-like Theorem for MCTS
pasky at ucw.cz
Tue Nov 8 15:37:14 PST 2011
Recently, I have noticed an interesting fresh submission on arxiv.org:
* It makes an analogy between a population of genetic algorithm and a set
of simulations performed by MCTS; gene code of an individual corresponds
to sequence of moves made in a simulation
* Translates genetic schemes to wildcards on sequences of moves (could
be used to e.g. match all simulations contributing a given RAVE value)
* Applies a crossover genetic operator on MCTS simulations (basically
move permutation and combination of moves from two simulations; e.g.
the RAVE value is described by a set of individuals with cross-over
applied such that the move is played now instead of anytime later).
Some general arguments on the validity of crossover in MCTS context
(that didn't entirely convince me) are given
* It then establishes a "Greiringer-like theorem", first considered
in the context of genetic algorithms, that gives a bound on the limit
frequency of occurence of a given scheme in "multiplied population"
(with new crossover-based simulations) based on an initial population
* This might be useful when trying to collect more complicated
move/action schemes from simulations while avoiding combinatorial
I don't think the paper is written very clearly, and this is an early
pre-print. Still, it sounds quite interesting - AIUI, it could give some
base for rigorous examination of RAVE, perhaps to introduce something
like UCB for RAVE, and the authors also promise some original
algorithms, perhaps using even more general information sharing than
RAVE (the RAVE analogies I have drawn in this mail are entirely my own
and may not be correct).
The paper looks daunting, with 54 pages pretty heavy on formalism,
but only smaller part of the paper is required to understand the result
(just not if it's correct :). I admit that I'm still chewing through the
middle of section 4, but IMHO people with strong background in machine
learning and/or theoretic analysis of genetic algorithms will have much
So if anyone managed to really understand the paper, I have a couple of
* Do you think the results feel plausible?
* Do you think it contributes anything actually immediately useful?
* How tight the bound actually is? I did not get far enough in the
paper yet to actually understand the formula.
* Is the RAVE interpretation that I outlined above correct at all?
* Any application to Go that occurs to you?
Petr "Pasky" Baudis
The goal of Computer Science is to build something that will
last at least until we've finished building it.
More information about the Computer-go