[Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Brian Sheppard sheppardco at aol.com
Wed Dec 6 13:29:13 PST 2017

The chess result is 64-36: a 100 rating point edge! I think the Stockfish open source project improved Stockfish by ~20 rating points in the last year. Given the number of people/computers involved, Stockfish’s annual effort level seems comparable to the AZ effort.
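For anyone checking the arithmetic: the 100-point figure follows from the standard logistic Elo model, where an expected score s corresponds to a rating difference of 400 * log10(s / (1 - s)). A minimal sketch, assuming that model; the function name elo_diff is just for illustration and does not come from the paper:

```python
import math


def elo_diff(score: float) -> float:
    """Rating difference implied by an expected score (logistic Elo model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# 64 points out of 100 games
print(round(elo_diff(0.64)))  # about 100
```

A 100-game sample is small, so the estimate carries a wide error bar; the sketch only converts the observed score into a rating difference.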


Stockfish is really, really tweaked out to do exactly what it does. It is very hard to improve anything about Stockfish. To be clear: I am not disparaging the code or people or project in any way. The code is great, people are great, project is great. It is really easy to work on Stockfish, but very hard to make progress given the extraordinarily fine balance of resources that already exists. I tried hard for about 6 months last year without any success. I ran dozens of experiments (maybe 100?), including several that were motivated by automated tuning or automated searching for opportunities. No luck.


AZ would dominate the current TCEC. Stockfish didn’t lose a game in the semi-final, failing to make the final because of too many draws against the weaker players.


The Stockfish team will have some self-examination going forward for sure. I wonder what they will decide to do.


I hope this isn’t the last we see of these DeepMind programs.


From: Computer-go [mailto:computer-go-bounces at computer-go.org] On Behalf Of Richard Lorentz
Sent: Wednesday, December 6, 2017 12:50 PM
To: computer-go at computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm


One chess result stood out for me, namely, just how much easier it was for AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather than with black (3 wins, 47 draws, 0 losses).
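To make the split concrete, here is a small sketch that turns the win/draw/loss counts quoted above into points per color (1 per win, 0.5 per draw). The helper name points is only for illustration:

```python
def points(wins: int, draws: int) -> float:
    """Points scored: 1 per win, 0.5 per draw, 0 per loss."""
    return wins + 0.5 * draws


# (wins, draws, losses) for AlphaZero, as quoted in this thread
results = {"white": (25, 25, 0), "black": (3, 47, 0)}

for color, (w, d, l) in results.items():
    games = w + d + l
    print(f"{color}: {points(w, d)}/{games} = {points(w, d) / games:.0%}")

total = sum(points(w, d) for w, d, _ in results.values())
print(f"total: {total}/100")
```

That gives 75% with white and 53% with black, which together make up the 64-36 overall result.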

Maybe we should not give up on the idea of White to play and win in chess!

On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:


DeepMind has made the strongest chess and shogi programs using the AlphaGo Zero method.

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm 
https://arxiv.org/pdf/1712.01815.pdf

AlphaZero(Chess) outperformed Stockfish after 4 hours, 
AlphaZero(Shogi) outperformed elmo after 2 hours. 

Search is MCTS. 
AlphaZero(Chess) searches     80,000 positions/sec. 
Stockfish        searches 70,000,000 positions/sec. 
AlphaZero(Shogi) searches     40,000 positions/sec. 
elmo             searches 35,000,000 positions/sec. 
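As a quick sanity check on those figures, the sketch below computes how many times more positions per second the conventional engine searches in each game. The dictionary layout and the name speed_ratio are only illustrative:

```python
# positions/sec as listed above: (AlphaZero, conventional engine)
speeds = {
    "chess": (80_000, 70_000_000),   # AlphaZero vs Stockfish
    "shogi": (40_000, 35_000_000),   # AlphaZero vs elmo
}


def speed_ratio(alphazero_nps: int, engine_nps: int) -> float:
    """How many times more positions/sec the conventional engine searches."""
    return engine_nps / alphazero_nps


for game, (az, engine) in speeds.items():
    print(f"{game}: {speed_ratio(az, engine):.0f}x")
```

Both ratios come out to 875, so in each game AlphaZero looks at roughly three orders of magnitude fewer positions per second.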

Hiroshi Yamashita 

Computer-go mailing list 
Computer-go at computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

