[Computer-go] AlphaGo Zero

Gian-Carlo Pascutto gcp at sjeng.org
Wed Oct 18 14:39:58 PDT 2017

On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> This paper is required reading. When I read this team’s papers, I think
> to myself “Wow, this is brilliant! And I think I see the next step.”
> When I read their next paper, they show me the next *three* steps.

Hmm, interesting way of seeing it. Once they had the Lee Sedol version of
AlphaGo, it was somewhat obvious that just running self-play from that
should lead to an improved policy and value net.

And before someone accuses me of Captain Hindsighting here, this was
pointed out on this list:

It looks to me like the real devil is in the details. Don't use a
residual stack? -600 Elo. Don't combine the networks? -600 Elo.
Bootstrap the learning? -300 Elo.

We made 3 perfectly reasonable choices and somehow lost 1500 Elo along
the way. I can't get over that number, actually.
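
To put that number in perspective, here is a rough sketch of what those
gaps mean as expected scores. It assumes the standard logistic Elo model
(400-point scale) and that the three losses simply add up, which is how I
summed them above; the paper's Elo scale may not map onto this exactly.
The names elo_loss and expected_score are just made up for illustration.

```python
# Elo deltas as quoted above (design choice -> Elo change).
elo_loss = {
    "no residual stack": -600,
    "separate policy/value networks": -600,
    "bootstrapped learning": -300,
}


def expected_score(elo_diff: float) -> float:
    """Expected score for a player rated elo_diff relative to the opponent.

    Assumption: standard logistic Elo model with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Assumption: the individual losses stack additively.
total = sum(elo_loss.values())

for name, delta in elo_loss.items():
    print(f"{name}: {delta} Elo -> expected score {expected_score(delta):.3f}")
print(f"combined: {total} Elo -> expected score {expected_score(total):.5f}")
```

Under those assumptions a 1500 Elo deficit corresponds to an expected
score well below one point per thousand games.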

Getting the details right makes a difference. And they're getting them
right, either because they're smart, because of experience from other
domains, or because they're trying a ton of them. I'm betting on all 3.

