[computer-go] Monte Carlo combined with minimax search

Peter Drake drake at lclark.edu
Tue Jul 18 11:54:52 PDT 2006


It sounds like there are two major details to address: how are values  
backed up the tree and how do you choose a move (balancing  
exploration and exploitation)?

As Coulom notes, a reasonable answer to the first question is to take  
the average of the mean and the max of the node’s children. (Mean  
alone ignores the intelligent choice to be made, while max alone  
chooses the luckiest move.) If more and more runs are spent on the  
best move, this approaches max as confidence improves.

The second problem is a variant of the k-armed bandit problem.  
Several relatively simple approaches exist, including softmax. There  
is a question of what value to give unexplored branches—perhaps the  
average of the explored branches?

Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://computer-go.org/pipermail/computer-go/attachments/20060718/56b350f5/attachment.htm


More information about the computer-go mailing list