[Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Brian Sheppard sheppardco at aol.com
Wed Dec 6 14:46:51 PST 2017


Requiring a margin > 55% is a defense against a random result. A 55% score in a 400-game match is 2 sigma.
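
The 2-sigma figure can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch, assuming independent games, no draws, and two equally strong networks as the null hypothesis; the function names are illustrative and do not come from either paper.

```python
import math

def match_sigma(games: int, p: float = 0.5) -> float:
    """Standard deviation of the observed score fraction over `games`
    independent win/loss games (assumption: no draws)."""
    return math.sqrt(p * (1.0 - p) / games)

def z_score(score: float, games: int, p: float = 0.5) -> float:
    """How many standard deviations `score` lies above the expected score `p`."""
    return (score - p) / match_sigma(games, p)

def one_sided_tail(z: float) -> float:
    """Normal-approximation probability of scoring at least z sigma above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

sigma = match_sigma(400)    # 0.025, i.e. 2.5 percentage points
z = z_score(0.55, 400)      # about 2.0
p_luck = one_sided_tail(z)  # chance that an equal-strength net reaches 55% by luck
print(f"sigma={sigma:.4f}  z={z:.2f}  tail={p_luck:.4f}")
```

Under these assumptions, an equally strong candidate would clear the 55% bar by luck only about 2% of the time.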

But I like the AZ policy better, because it does not require arbitrary parameters. It also improves more fluidly by always drawing training examples from the current probability distribution, and when the program is close to perfect you would be able to capture the last 5% of skill.
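
The difference between the two policies can be sketched as follows. This is only an illustration under stated assumptions: the function names and the generic `Net` type are hypothetical, and the 55% threshold is the one discussed in this thread, not code from either paper.

```python
from typing import TypeVar

Net = TypeVar("Net")  # stand-in for whatever object represents a network

def gated_selfplay_net(best: Net, candidate: Net, wins: float, games: int,
                       threshold: float = 0.55) -> Net:
    """AGZ-style gating: the candidate generates self-play data only if its
    evaluation-match score exceeds `threshold` (an arbitrary parameter)."""
    return candidate if wins / games > threshold else best

def continuous_selfplay_net(latest: Net) -> Net:
    """AZ-style: no evaluation match and no threshold; self-play always
    uses the most recently trained network."""
    return latest

# A candidate scoring 212/400 (53%) is rejected by the gate, 230/400 (57.5%) passes.
print(gated_selfplay_net("old", "new", 212, 400))  # old
print(gated_selfplay_net("old", "new", 230, 400))  # new
print(continuous_selfplay_net("new"))              # new
```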

I am not sure what to make of the AZ vs AGZ result. Mathematically, there should be a degree of training sufficient for AZ to exceed any fixed level of skill, such as AGZ's 40/40 (40-block, 40-day) level. So there must be a reason why DeepMind did not report such a result, but it is unclear what that is.

-----Original Message-----
From: Computer-go [mailto:computer-go-bounces at computer-go.org] On Behalf Of Darren Cook
Sent: Wednesday, December 6, 2017 12:58 PM
To: computer-go at computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

> Mastering Chess and Shogi by Self-Play with a General Reinforcement 
> Learning Algorithm https://arxiv.org/pdf/1712.01815.pdf

One of the changes they made (bottom of p.3) was to continuously update the neural net, rather than require a new network to beat it 55% of the time to be used. (That struck me as strange at the time, when reading the AlphaGoZero paper - why not just >50%?)

The AlphaZero paper shows it out-performs AlphaGoZero, but they are comparing to the 20-block, 3-day version, not the 40-block, 40-day version that was even stronger.

As papers rarely show failures, can we take it to mean they couldn't out-perform their best go bot, do you think? If so, I wonder how hard they tried?

In other words, do you think the changes they made from AlphaGo Zero to AlphaZero have made it weaker (when viewed purely from the point of view of making the strongest possible go program)?

Darren


