[Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Brian Sheppard sheppardco at aol.com
Thu Dec 7 18:38:16 PST 2017


AZ scalability looks good in that diagram, and it is certainly a good start, but it only goes out through 10 sec/move. Also, if the hardware is 7x more powerful for AZ than for SF, should we elongate the curve for AZ by 7x? Or compress the curve for SF by 7x? Or some combination? Or take the data at face value?
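To make the 7x question concrete: crediting one side with a hardware factor just translates its scaling curve along the time-per-move axis. A minimal sketch (the Elo numbers below are made up for illustration, not taken from the paper):

```python
# Hypothetical (time_per_move_sec, Elo) points for one engine -- illustrative only.
curve = [(0.1, 3000), (1.0, 3100), (10.0, 3200)]

def adjust_for_hardware(curve, factor):
    """Re-express a scaling curve as if run on hardware `factor` times
    faster: each point's effective thinking time is scaled by `factor`."""
    return [(t * factor, elo) for t, elo in curve]

# "Elongate the curve for AZ by 7x" vs. "compress the curve for SF by 7x":
az_adjusted = adjust_for_hardware(curve, 7.0)      # credit AZ with 7x compute
sf_adjusted = adjust_for_hardware(curve, 1 / 7.0)  # or penalize SF instead
```

Either adjustment shifts the two curves by the same relative amount on a log-time axis, so the choice only matters for where the curves are read off, not for their relative slopes.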

I just noticed that AZ has some losses when the opening was forced into specific variations, as in Table 2. So we know that AZ is not perfect, but 19 losses in 1200 games is a thin basis for extrapolation. (Curious: SF was a net winner over AZ with White in a B40 Sicilian, the only position/color combination out of 24 in which SF had an edge.)
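One way to see how thin 19-in-1200 is: put a binomial confidence interval around the observed loss rate. A quick Wilson-score sketch (standard textbook formula, nothing AZ-specific):

```python
import math

def wilson_interval(losses, games, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = losses / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games
                         + z * z / (4 * games * games)) / denom
    return center - half, center + half

lo, hi = wilson_interval(19, 1200)
# roughly (0.010, 0.025): the true per-game loss rate could plausibly be
# anywhere from about 1% to 2.5% on this sample
```

So the data pin down the loss rate only within about a factor of 2.5, which is why extrapolating from it is risky.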

-----Original Message-----
From: Computer-go [mailto:computer-go-bounces at computer-go.org] On Behalf Of Rémi Coulom
Sent: Thursday, December 7, 2017 11:51 AM
To: computer-go at computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

>My concern about many of these points of comparison is that they presume how AZ scales. In the absence of data, I would guess that AZ gains much less from hardware than SF. I am basing this guess on two known facts. First is that AZ did not lose a game, so the upper bound on its strength is perfection. Second, AZ is a knowledge-intensive program, so it is counting on judgement to a larger degree.

Doesn't Figure 2 in the paper indicate convincingly that AZ scales better than Stockfish?

Rémi
_______________________________________________
Computer-go mailing list
Computer-go at computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
