[computer-go] Heuristics for MC/UCT with all-or-nothing payouts

Eric Boesch ericboesch at gmail.com
Sun Jun 10 17:29:34 PDT 2007


On 6/10/07, Peter Drake <drake at lclark.edu> wrote:
> On Jun 10, 2007, at 7:00 AM, Eric Boesch wrote:
[..snipped...]
> > It is also easy to imagine why MC programmers in games with {1, -1}
> > payoff sets would discover that the choose-best-average-payoff
> > heuristic is not as good as the choose-the-move-with-the-most-wins
> > heuristic when selecting the best move to play. The older Mogo paper
> > mentioned some plausible reasons, and it's easy to imagine others. For
> > typical distributions, a 10-2 record is better than a 6-1 record,
> > which is better than a 1-0 record.
>
> If I recall old experimental results properly, this doesn't work as
> well as one might expect. The problem is that 5-0 looks worse than
> 51-49.

If you ever get 5-0 and 51-49 as siblings, that's a potential problem.
With vanilla MC/UCT, that will never happen (after the first of the 49
losses, that child would be ignored until every other child had lost
once), which is why the most-visited was good enough for Mogo and why
I settled for it. But now that you mention it, there's no reason to be
lazy (the heuristic only gets applied once per possible move per turn,
so you don't need simplicity for speed's sake), and with variants, you
really could get 5-0 and 51-49 as siblings. Maybe a better compromise
between most-wins and highest-winning-percentage would be to create
lower confidence bounds by negating the confidence interval term of
the UCB1 formula. Hopefully that would lead to comparisons that are at
least defensible in just about every case -- 10-1 > 2-0, 25-0 > 70-30,
20-1 > 22-2, etc., some of which are comparisons that might arise even
in vanilla MC/UCT.

> I (and others) have come up with a lot of the same ideas you have
> discussed here. It's frustrating that just about every clever idea
> either (a) has already been explored and incorporated into the strong
> programs or (b) doesn't help because UCT automatically does it to
> some degree.

As I said, I'm not claiming to have invented anything here; it's just
hard for me to keep track of where these things originated and what
the usual terms for these things are. If the repeat-winners approach
was first used by the Mogo team, then thanks for clarifying that. If
you want to claim any of the other notions I mentioned, go ahead.

As for experiencing limited improvements by tweaking the formulas, it
would hardly be surprising if there are limits to how much an MC/UCT
(or BAST or whatever) type program can be improved by just swapping
out one formula and replacing it with another one using the same
variables, since the formulas are pretty good to begin with, and the
more glaring weaknesses of vanilla MC/UCT, which the strongest MC
programs already address in other ways (though I think there is still
room for improvement in their search localization in particular, not
that I have any clear notion of how the improvement could be achieved
without jeopardizing the programs' existing strengths), can't be fixed
by just tweaking formulas.


More information about the computer-go mailing list