[computer-go] Monte Carlo combined with minimax search
Peter Drake
drake at lclark.edu
Tue Jul 18 11:54:52 PDT 2006
It sounds like there are two major details to address: how are values
backed up the tree and how do you choose a move (balancing
exploration and exploitation)?
As Coulom notes, a reasonable answer to the first question is to take
the average of the mean and the max of the node’s children. (Mean
alone ignores the intelligent choice to be made, while max alone
chooses the luckiest move.) If more and more runs are spent on the
best move, this approaches max as confidence improves.
The second problem is a variant of the k-armed bandit problem.
Several relatively simple approaches exist, including softmax. There
is a question of what value to give unexplored branches—perhaps the
average of the explored branches?
Peter Drake
Assistant Professor of Computer Science
Lewis & Clark College
http://www.lclark.edu/~drake/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://computer-go.org/pipermail/computer-go/attachments/20060718/56b350f5/attachment.htm
More information about the computer-go
mailing list