[Computer-go] Monte-Carlo Tree Search in other games
Olivier Teytaud
olivier.teytaud at lri.fr
Wed Nov 3 04:08:38 PDT 2010
> Do you have an example of such games, and how to use MCTS for it? I have
> trouble picturing it.
>
I hope I can write it clearly :-)
For the application on a specific game, as I am not the only one involved in
that,
I'll wait for the "official" publication of this (probably soon) - there's
such a game in which humans are stronger than computers
and for which our MCTS performs well (not yet compared against the best
humans, but such games will come soon). The game
is an internet game with ELO scale, unfortunately it's less famous than Go
(by far!), but it's a nice challenge (and mathematically,
it is a good opportunity as an example of game with simultaneous actiosn -
or, equivalently, as an example of game in which there
is hidden information which becomes visible after a small number of time
steps; but maybe I should not discuss too many things in the same post :-)
). Whereas games with hidden information lead to huge complexity classes,
games with hidden information which becomes periodically visible (with small
period...), are tractable (I do not write here the complexity classes
involved, as it depends on whether we consider succinct representations or
not, but let's forget computational complexity for the moment, it's
sufficiently complicated without that :-) )
For the use of MCTS in games with simultaneous actions:
- I guess you know what is a Nash equilibrium
- it works for matrix games, but also for stochastic games with finitely
many strategies (Nash equilibria do not necessarily exist
in the case of infinitely many strategies, even in the zero-sum case)
- you can use the concept of Nash for a one-player game (just consider that
the opponent has only one strategy)
- then in a game represented as a tree, you can have
* nodes in which one player takes a decison
* random nodes (not a problem for MCTS / UCT)
* nodes in which the two players take a decision (simultaneously,
privately)
- and:
* simple MCTS rules (argmax of a simple score like
(nbwins+1)/(nbSims+2) ) if the rewards are discrete (e.g. 0, 0.5 and 1)
and the optimal strategies lead to deterministic rewards;
in my humble opinion it's better than UCB1 in such cases
* UCB1 finds a Nash of a single player game and is better than rules
above (I think, not intensively tested...) in cases with noise
(whatever maybe the source of this noise - stochastic game,
or mixed optimal strategies as one might meet in games with
simultaneous actions or partial information even if the game
is deterministic).
* EXP3 finds a Nash of a two player game (and INF is a bit faster,
removes the log term in the number of iterations,
see Audibert&Bubeck's paper) ==> here the Nash is usually
mixed and EXP3-like algorithms find it (see the paper
by Grigoriadis et al in 1994 also for such a bandit
algorithm).
==> so you can build a MCTS with
- UCB1 as a bandit for nodes which are above nodes with noise (either
noise from the game, if the game
is stochastic, or stochasticity due to simultaneous actions later in
the game (with simultaneous actions Nash equilibria are usually mixed)),
- EXP3 or INF for nodes with simultaneous actions (and here, you should
NOT use the maximum reward for choosing your move - your
solution should be stochastic, with proba proportional to empirical
frequencies, as in fictitious play algorithms)
(variants are possible for games with hidden information also, but it's more
complicated and I'm not aware of tests, I'm on the simultaneous case for the
moment :-) )
I hope this is nearly (or, even better, completely :-) ) clear...
Best regards,
Olivier
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://computer-go.org/pipermail/computer-go/attachments/20101103/4377c63e/attachment.html>
More information about the Computer-go
mailing list