[Computer-go] PUCT formula

Gian-Carlo Pascutto gcp at sjeng.org
Fri Mar 9 01:47:52 PST 2018


On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall the
> result. What is the formula that AGZ actually uses?

The one mentioned in their paper, I assume.

I investigated both that and the original from the referenced paper, but
after tuning I saw little meaningful strength difference.

One thing of note is that (IIRC) the AGZ formula keeps scaling the
exploration term by the policy prior forever. In the original formula,
the prior's contribution is a diminishing term.
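
A minimal sketch of that difference, under stated assumptions. The
AGZ bonus below is the one written in the AlphaGo Zero paper. The
"original" is assumed here to be Rosin's PUCB, and its form and
constants are written from memory, so check them against that paper
before relying on them. The function names, the c_puct value and the
example priors and visit counts are all made up for illustration.

```python
import math

def agz_bonus(prior, parent_visits, child_visits, c_puct=1.0):
    # AGZ paper: U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
    # c_puct=1.0 is an arbitrary illustrative value, not a tuned constant.
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def pucb_bonus(prior, parent_visits, child_visits):
    # ASSUMED form, recalled from Rosin's PUCB (valid for n > 1, s >= 1):
    #   sqrt(3 ln n / (2 s)) - (2 / M) * sqrt(ln n / n)
    # where n = parent visits, s = child visits, M = prior weight.
    n, s = parent_visits, child_visits
    explore = math.sqrt(3 * math.log(n) / (2 * s))
    prior_penalty = (2 / prior) * math.sqrt(math.log(n) / n)
    return explore - prior_penalty

def prior_gap(bonus, parent_visits, child_visits=10, high=0.6, low=0.1):
    # Bonus difference between a high-prior and a low-prior move that
    # have the same number of visits, i.e. the part due to the prior alone.
    return (bonus(high, parent_visits, child_visits)
            - bonus(low, parent_visits, child_visits))

# Hold the child visit count fixed and let the parent visit count grow.
for n in (100, 10_000, 1_000_000):
    print(n, prior_gap(agz_bonus, n), prior_gap(pucb_bonus, n))
```

With the child visits held fixed, the AGZ gap grows with the parent
visit count because the prior multiplies the whole exploration term,
while the assumed PUCB gap shrinks toward zero because the prior only
enters through a penalty that decays like sqrt(ln n / n).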

-- 
GCP

