[computer-go] Monte Carlo combined with minimax search

Chaslot G (MICC) G.Chaslot at MICC.unimaas.nl
Sat Jul 22 06:53:25 PDT 2006


> It sounds like there are two major details to address: how are values
backed up the tree and how do you choose a move (balancing exploration
and exploitation)?

 

Hi Peter,

 

In this context, I propose one algorithm for move-selection and for
back-up operation. It can be found in this paper:

http://www.cs.unimaas.nl/g.chaslot/mcscg.pdf

 

Back up operation:

 

Remi proposed a rule that makes some transition from the mean value to
the max. In this rule the weight of the mean only depends on the number
of simulations made.  

Indeed, it is clear that in the beginning the backed-up value should be
the mean, and that after a certain time, it should be the max value.

However, the number of simulations made is not always relevant to build
the transition from one to the other.

For instance, as soon as a move is proved to be better than all the
other moves, the back-up rule should be equivalent to a max, whatever
the number of simulations made. 

Another example is that if there are some very bad moves, they will low
down the mean value. Then it would be better to compute a "mean" which
considers only the good moves.

The algorithm that I propose realizes a transition between the mean and
the max, which takes into account the number of games played, and also a
"confidence" value for each move. In particular, the rule I propose
deals well with the two examples above. 

 

Move-Selection: 

The selection algorithm proposed seems to be more exploitative than
Remi's, but more explorative than UCT. I will do some comparisons soon,
on the same kind of test bed proposed in the paper, but for a deeper
depth.

Given a selection algorithm, one can easily make it more or less
exploitative, for instance by using a softmax on top of it (several
other methods are possible). But until which point this method would
enhance the level of the program?

 

I am currently working on the integration of domain-dependant knowledge
into the Monte-Carlo games (patterns mostly, and also some Atari
heuristic like these proposed by Remi). After that I will do some
experiments on increasing or decreasing the exploration-exploitation of
different move-selection algorithms. I will also be working on improving
the "confidence value" used by the back up strategy.

 

Best,

 

Guillaume

 

 

-----Original Message-----
From: computer-go-bounces at computer-go.org
[mailto:computer-go-bounces at computer-go.org] On Behalf Of Peter Drake
Sent: Tuesday, July 18, 2006 8:55 PM
To: computer-go
Subject: Re: [computer-go] Monte Carlo combined with minimax search

 

It sounds like there are two major details to address: how are values
backed up the tree and how do you choose a move (balancing exploration
and exploitation)?

As Coulom notes, a reasonable answer to the first question is to take
the average of the mean and the max of the node's children. (Mean alone
ignores the intelligent choice to be made, while max alone chooses the
luckiest move.) If more and more runs are spent on the best move, this
approaches max as confidence improves.

The second problem is a variant of the k-armed bandit problem. Several
relatively simple approaches exist, including softmax. There is a
question of what value to give unexplored branches-perhaps the average
of the explored branches?

Peter Drake

Assistant Professor of Computer Science

Lewis & Clark College

http://www.lclark.edu/~drake/

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://computer-go.org/pipermail/computer-go/attachments/20060722/1b237c56/attachment.htm


More information about the computer-go mailing list