[computer-go] producing a good probability distribution over
legal moves
Don Dailey
drd at mit.edu
Thu May 17 12:07:02 PDT 2007
On Thu, 2007-05-17 at 12:17 -0400, George Dahl wrote:
> Imagine if you had a monte carlo program that took almost no time to
> run. You would use it to do "heavy" playouts for another monte carlo
> program to make it even stronger.
I tried something like this as a test with simple monte carlo. I
called it recursive monte carlo.
This was not UCT, it was just the simple statistic monte/carlo where you
choose the single move that had the most success in totally random
play-outs.
This started when I noticed that even running 1 or 2 simulations gave
far better play than random. So my idea was that instead of choosing
moves randomly in the play-outs, I would choose each move in the
playout by doing another set of play-outs - applying the same simulation
idea recursively.
I didn't expect it to be fast enough to be practical, but I thought that
at least I could see if the idea was feasible. If I'm doing 1000
play-outs, the version that does the sub-play-outs should be far
stronger than the one that didn't even if it's a lot slower.
Instead, it was much slower and played worse. I didn't pursue this any
farther, but I suspect that it should have worked. I think the problem
is that I wasn't getting fair representation of each move. The
sub-playouts were causing some moves not to get sampled much for
instance.
I am guessing that if I used such a technique for the play-out portion
of the monte-carlo search, it would be incredibly strong for a given
level if I only considered the number of play-outs of the outer-level.
- Don
More information about the computer-go
mailing list