[computer-go] Monte-Carlo for Tactical Search

Magnus Persson magnus.persson at phmp.se
Fri Sep 1 01:49:17 PDT 2006


Quoting Darren Cook <darren at dcook.org>:

>> Global search is fine, as long as it does something more than just
>> playing out games until the end - of course, even that might be a
>> misconception of mine.
>> And while others are working to improve the search in general, I'd like
>> to think about the things needed in an evaluation function, ...
>
> This is interesting because my viewpoint is very different: I think
> global search should always search to the end of the game.

Here we might have problem with terminology. I would like to keep apart global
search from the simulations in monte carlo go. Global search searches 
positions
systematically from the root. A position is a leaf in the global search 
when all
further moves are part of independent random searches i.e the simulations.

> The reason why you *have* to search to the end of the game is that
> evaluation before then is *too* difficult.

I agree. Even if you have a program that makes a lot of local tactical search
and other traditional evaluation things correctly it is very hard to make a
correct evaluation of the global situation because 1 + 1 does not equal 2 in
go. I tried to do that for about 15 years.

Here is my "proof" why Monte Carlo go work. If a move is good against perfect
play in go, then it is probably also pretty good against a random player even
if I myself is a random player. But sometimes a bad move is good against a
random player so one cannot expect pure MC-evaluation 1ply to be solution
itself, that is the evaluation is only positively correlated with perfect
evaluation. (actually the best move against random play is almost always a bad
move, but since MC lets both players play randomly this is not a big problem.
Well perhaps can one increase the strength of an mc program with having better
moves for the opponent but still play random for the programs color. If a move
is good against a better than random player even I myslf play randomly 
then the
move must be really good...)

But global search of any evaluation-function is stronger than 1-ply search.
So global search with MC-evaluation is a good thing.

The practical problem then how good it is and how good it can become.

Right now I am thinking on: Exactly where in the global search tree should the
next simulation be played? What do we do with the information we gain 
from each
simulation to change the state of the global search. Can information gained in
the global search help to improve the simulations? At all parts of search go
knowledge can probably be applied in many ways. But how and what?

Sofar the UCT-method is very simple and elegant, but I have not yet tried to
improve it. That is Valkyria reached a rating of 1700 with a global 
search that
only prunes the moves that are pruned in the simulation, and otherwise expands
the tree without using any knowledge at all.

The go knowledge that goes into my MC-programs is certainly the same 
knowledge I
would want to have in an traditional evaluation function. Tactical move 
ordering
is the same after the last random move, as it is in traditional local tactical
search. The difference is that the are often completely different trade-offs,
so the algorithms are different.

Finally I think it is possible to make a MC-program that plays stronger 
than any
current program on 19x19, because global search does not need to be deep to be
effective *if* the evaluation part is good enough. A MC-program that goes just
1-ply deep often behaves *as if* it was searching much deeper, that is the
beauty of it.

Yet another long post but I am going offline now for the weekend.

-Magnus


More information about the computer-go mailing list