[computer-go] Heuristics for MC/UCT with all-or-nothing payouts

Peter Drake drake at lclark.edu
Sun Jun 10 09:57:33 PDT 2007


On Jun 10, 2007, at 7:00 AM, Eric Boesch wrote:

> The UCT heuristic of trying every child of a node once before trying
> any child twice is reasonable when the payoff distribution is unknown.
> Why try the lever that paid $5 a second time if there might be another
> lever that pays $1,000,000? But when the set of possible payoffs is
> known to be {1, -1} -- you either win or you lose -- it makes no sense
> to abandon a move with a perfect winning record.

I believe you automatically stick with always-winning moves if you  
give untried moves a value of 1.0, as Gelly et al did in their 2006  
paper (table 7). An always-winning move has a UCT value of 1.0 plus  
the radius of the confidence interval, so it will be retried before  
trying a new move.

> Before doing that, it might make sense to expend some effort on
> reexamining the most promising candidates so far, to sort the really
> good candidates from the ones that started out on a lucky streak and
> to sort the robust moves from the ones that only work against inferior
> replies. As a dead-simple heuristic, one might, when there some

Some degree of this comes for free by not creating a child for a node  
until it has a certain number of runs (e.g., the board area).

> It is also easy to imagine why MC programmers in games with {1, -1}
> payoff sets would discover that the choose-best-average-payoff
> heuristic is not as good as the choose-the-move-with-the-most-wins
> heuristic when selecting the best move to play. The older Mogo paper
> mentioned some plausible reasons, and it's easy to imagine others. For
> typical distributions, a 10-2 record is better than a 6-1 record,
> which is better than a 1-0 record.

If I recall old experimental results properly, this doesn't work as  
well as one might expect. The problem is that 5-0 looks worse than  
51-49.

I (and others) have come up with a lot of the same ideas you have  
discussed here. It's frustrating that just about every clever idea  
either (a) has already been explored and incorporated into the strong  
programs or (b) doesn't help because UCT automatically does it to  
some degree.

Peter Drake
http://www.lclark.edu/~drake/




More information about the computer-go mailing list