[Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Brian Sheppard sheppardco at aol.com
Thu Dec 7 04:20:52 PST 2017

The conversation on Stockfish's mailing list focused on how the match was imbalanced.

- AZ's TPU hardware was estimated at several times (7 times?) the computational power of Stockfish's.
- Stockfish's transposition table size (1 GB) was considered much too small for a 64 core machine.
- Stockfish's opening book was disabled, whereas AZ has, in effect, memorized a huge opening book.
- The match was against SF 8 (one year old) rather than the latest dev version.

To this I would add that the Stockfish losses I played through seemed largely self-similar. It is possible that Stockfish has a relatively small number of weaknesses that AZ does not share, and that the format of the match amplifies them.

So the attitude among the SF core is pretty competitive, which is great news for continued development.

My concern about many of these points of comparison is that they presume we know how AZ scales with hardware. In the absence of data, I would guess that AZ gains much less from hardware than SF. I am basing this guess on two known facts. First, AZ did not lose a game, so the upper bound on its strength is perfection. Second, AZ is a knowledge-intensive program, so it relies on judgement to a larger degree.

But I could be wrong. Maybe AZ falls apart tactically without 80K pops. There is no data, so all WAGs are valid.

-----Original Message-----
From: Computer-go [mailto:computer-go-bounces at computer-go.org] On Behalf Of Gian-Carlo Pascutto
Sent: Thursday, December 7, 2017 4:13 AM
To: computer-go at computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

On 06-12-17 22:29, Brian Sheppard via Computer-go wrote:
> The chess result is 64-36: a 100 rating point edge! I think the
> Stockfish open source project improved Stockfish by ~20 rating points in
> the last year.

It's about 40-45 Elo FWIW.
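
For reference, the 64-36 score quoted above does work out to roughly 100 Elo under the standard logistic Elo model, treating 0.64 as the expected score per game. A minimal Python sketch of that conversion follows; the function names are illustrative and are not taken from any chess tool.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by an expected per-game score (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(elo_diff: float) -> float:
    """Inverse mapping: expected per-game score for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    # 64 points out of 100 games, as in the quoted match result.
    print(round(elo_diff_from_score(0.64)))  # prints 100
```

The same functions can be used the other way around, e.g. expected_score(40) gives the per-game score that a 40 Elo edge would predict.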

> AZ would dominate the current TCEC. 

I don't think you'll get to 80 knps with a regular 22 core machine or
whatever they use. Remember that AZ hardware is about 16 x 1080 Ti's.
You'll lose that (70 - 40 = 30 Elo) advantage very, very quickly.

IMHO this makes it all the more clear how silly it is that so much
attention is given to TCEC with its completely arbitrary hardware choice.

> The Stockfish team will have some self-examination going forward for
> sure. I wonder what they will decide to do.

Probably the same thing the Zen team did. Ignore a large part of the result
because people's actual computers - let alone mobile phones - can't run
a neural net at TPU speeds.

The question is whether resizing the network makes the resulting program more
competitive, enough to overcome the speed difference. And, aha, in which
direction are you going to try to resize? Bigger or smaller?
