[Computer-go] AlphaGo Zero
alvaro.begue at gmail.com
Wed Oct 18 17:37:34 PDT 2017
It might be a mistake, but on page 30 the paper has a formula for Elo that
is off by a factor of ln(10) ≈ 2.3026 with respect to the standard
formula, which would mean their Elo differences are inflated by that factor.
But I suspect they just wrote "exp" where they meant "10^" in the paper, and
they probably computed Elo correctly.
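
To make the size of the discrepancy concrete, here is a minimal sketch in
Python. It assumes the formula as printed is the usual logistic curve with
"exp" in place of "10^" and the same 1/400 scale constant. The function
names are mine, for illustration only, and do not come from the paper.

```python
import math

# Scale constant: the standard Elo formula divides the rating difference by 400.
# Assumption: the formula as printed keeps this constant and only swaps the base.
C_ELO = 1.0 / 400.0


def p_win_standard(diff):
    """Win probability for a rating advantage `diff`, standard base-10 Elo."""
    return 1.0 / (1.0 + 10.0 ** (-diff * C_ELO))


def p_win_exp(diff):
    """Win probability for the same advantage if "exp" replaces "10^"."""
    return 1.0 / (1.0 + math.exp(-diff * C_ELO))


def diff_from_p_standard(p):
    """Invert the standard formula: rating difference implied by win rate p."""
    return 400.0 * math.log10(p / (1.0 - p))


def diff_from_p_exp(p):
    """Invert the exp variant: rating difference implied by win rate p."""
    return 400.0 * math.log(p / (1.0 - p))


def inflation_factor(p):
    """Ratio of exp-based to standard rating difference for win rate p."""
    return diff_from_p_exp(p) / diff_from_p_standard(p)


if __name__ == "__main__":
    for p in (0.6, 0.75, 0.9):
        print(p, diff_from_p_standard(p), diff_from_p_exp(p), inflation_factor(p))
```

The ratio is ln(10) for every win rate other than exactly 50%, so if the
ratings had really been fitted with the "exp" form, every reported Elo gap
would be about 2.3 times what the standard scale gives.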
On Wed, Oct 18, 2017 at 5:39 PM, Gian-Carlo Pascutto <gcp at sjeng.org> wrote:
> On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> > This paper is required reading. When I read this team’s papers, I think
> > to myself “Wow, this is brilliant! And I think I see the next step.”
> > When I read their next paper, they show me the next *three* steps.
> Hmm, interesting way of seeing it. Once they had Lee Sedol AlphaGo, it
> was somewhat obvious that just self-playing that should lead to an
> improved policy and value net.
> And before someone accuses me of Captain Hindsighting here, this was
> pointed out on this list:
> It looks to me like the real devil is in the details. Don't use a
> residual stack? -600 Elo. Don't combine the networks? -600 Elo.
> Bootstrap the learning? -300 Elo
> We made 3 perfectly reasonable choices and somehow lost 1500 Elo along
> the way. I can't get over that number, actually.
> Getting the details right makes a difference. And they're getting them
> right, either because they're smart, because of experience from other
> domains, or because they're trying a ton of them. I'm betting on all 3.