[Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Darren Cook darren at dcook.org
Wed Dec 6 09:57:42 PST 2017


> Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> https://arxiv.org/pdf/1712.01815.pdf

One of the changes they made (bottom of p.3) was to continuously update
the neural net, rather than requiring a candidate network to beat the
current best one 55% of the time before being adopted. (That 55%
threshold struck me as strange at the time, when reading the AlphaGo
Zero paper - why not just >50%?)

The AlphaZero paper shows it out-performing AlphaGo Zero, but they are
comparing against the 20-block, 3-day version, not the 40-block, 40-day
version that was even stronger.

As papers rarely show failures, can we take it to mean they couldn't
out-perform their best go bot, do you think? If so, I wonder how hard
they tried?

In other words, do you think the changes they made from AlphaGo Zero to
AlphaZero have made it weaker (when viewed purely from the point of
view of making the strongest possible go program)?

Darren
