[Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Petr Baudis pasky at ucw.cz
Wed Dec 6 12:19:52 PST 2017


On Wed, Dec 06, 2017 at 09:57:42AM -0800, Darren Cook wrote:
> > Mastering Chess and Shogi by Self-Play with a General Reinforcement
> > Learning Algorithm
> > https://arxiv.org/pdf/1712.01815.pdf
> 
> One of the changes they made (bottom of p.3) was to continuously update
> the neural net, rather than require a new network to beat it 55% of the
> time to be used. (That struck me as strange at the time, when reading
> the AlphaGoZero paper - why not just >50%?)

  Yes, that also struck me.  I think it's good news for the community to
see it reported that this works, as it makes the training process much
more straightforward.  They also use just 800 simulations per move,
which is more good news.  (Both were among the first tradeoffs I made in
Nochi.)
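  To make the contrast concrete, here is a minimal sketch of the two
loop structures being discussed.  All names are hypothetical stubs
(network, self-play, training, and evaluation are placeholders); only
the control flow reflects the papers:

```python
# Hypothetical sketch: AlphaZero's continuous update vs. AlphaGo Zero's
# gated update.  The "network" is just an integer counter so the loop
# logic is visible; nothing here is DeepMind's actual code.

def self_play(net):
    """Stub: generate training examples with the given network.
    (AlphaZero runs ~800 MCTS simulations per move at this step.)"""
    return [(net, "example")]

def train(net, examples):
    """Stub gradient step: returns an updated 'network'."""
    return net + 1

def evaluate(candidate, incumbent):
    """Stub evaluator: fraction of head-to-head games the candidate
    wins against the incumbent.  Placeholder constant result."""
    return 0.5

def alphazero_loop(net=0, iters=5):
    """AlphaZero style: always self-play with the latest network;
    there is no evaluation gate."""
    for _ in range(iters):
        net = train(net, self_play(net))
    return net

def alphagozero_loop(best=0, iters=5, gate=0.55):
    """AlphaGo Zero style: a candidate replaces the self-play network
    only after winning more than `gate` of evaluation games."""
    for _ in range(iters):
        candidate = train(best, self_play(best))
        if evaluate(candidate, best) > gate:
            best = candidate
    return best
```

With the gate, a candidate that merely matches the incumbent (50%
here) is discarded every iteration, which is exactly the bookkeeping
the continuous-update scheme removes.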

  Another interesting tidbit: they use the TPUs to also generate the
selfplay games.

> The AlphaZero paper shows it out-performs AlphaGoZero, but they are
> comparing to the 20-block, 3-day version. Not the 40-block, 40-day
> version that was even stronger.
> 
> As papers rarely show failures, can we take it to mean they couldn't
> out-perform their best go bot, do you think? If so, I wonder how hard
> they tried?

  IMHO the most likely explanation is that this research has been going
on for a while, and when they started in this direction, that early
version was their state-of-the-art baseline.  This kind of chronology,
with the 40-block version being almost "a last-minute addition", is
IMHO apparent even in the text of the Nature paper.

  Also, the 3-day version simply had roughly the same training time
available as AlphaZero did.

-- 
					Petr Baudis, Rossum
	Run before you walk! Fly before you crawl! Keep moving forward!
	If we fail, I'd rather fail really hugely.  -- Moist von Lipwig