[Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Gian-Carlo Pascutto gcp at sjeng.org
Thu Dec 7 01:04:42 PST 2017

On 06-12-17 21:19, Petr Baudis wrote:
> Yes, that also struck me.  I think it's good news for the community
> to see it reported that this works, as it makes the training process
> much more straightforward.  They also use just 800 simulations,
> which is more good news.  (Both were among the first tradeoffs I made
> in Nochi.)

The 800 simulations are a setting that works across all 3 games. It's not
necessarily as good for 19x19 Go (more legal moves than the other games,
so shallower trees for the same number of simulations).
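
A rough way to see the branching-factor effect is to ask how deep a
uniformly expanded tree gets with a fixed node budget. This is only a
sketch: the branching factors below are commonly quoted ballpark figures
and not numbers from the paper, and a real policy-guided search is far
from uniform, so only the ordering between games is meaningful.

```python
import math

# Assumed, approximate average branching factors (illustrative only).
BRANCHING = {"chess": 35, "shogi": 80, "go19": 250}


def uniform_depth(simulations: int, branching: int) -> float:
    """Depth d such that branching**d == simulations (uniform tree)."""
    return math.log(simulations) / math.log(branching)


if __name__ == "__main__":
    for game, b in BRANCHING.items():
        print(f"{game}: about {uniform_depth(800, b):.2f} plies at 800 simulations")
```

With the same 800 simulations, the game with the most legal moves ends
up with the shallowest uniform tree.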

As for both the lack of testing and this parameter, someone has remarked
on GitHub that the DeepMind hardware is fixed, so this also represents a
balance tuned between the speed of the learning machine and the speed of
the self-play machines.

In my experience, just continuing to train the network further (when no
new data is batched in) often regresses the performance by 200 or more
Elo. So it's not clear this step is *entirely* ignorable unless you have
already tuned the speed of the other two aspects.
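
For scale, a 200 Elo gap can be converted into an expected score with
the standard logistic Elo formula. This is a minimal sketch; the
function name is made up for illustration.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    # A network that regressed by 200 Elo faces this expected score
    # from its earlier, stronger self.
    print(f"{expected_score(200):.3f}")
```

Under that model a 200 Elo regression means the earlier network would
be expected to score roughly 0.76 against the later one.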

> Another interesting tidbit: they use the TPUs to also generate the 
> selfplay games.

I think this was already known.

