[Computer-go] Zero performance

Gian-Carlo Pascutto gcp at sjeng.org
Fri Oct 20 23:53:32 PDT 2017

On 20/10/2017 22:48, fotland at smart-games.com wrote:
> The paper describes 20 and 40 block networks, but the section on
> comparison says AlphaGo Zero uses 20 blocks. I think your protobuf
> describes a 40 block network. That's a factor of two 😊

They compared with both, the final 5180 Elo number is for the 40 block
one. For the 20 block one, the numbers stop around 4300 Elo.
See for example:


A factor of 2 isn't much, but sure, it seems sensible to start with the
smaller one, given how intractable the problem looks right now.

> Your time looks reasonable when calculating the time to generate the
> 29M games at about 10 seconds per move. This is only the time to
> generate the input data. Do you have an estimate of the additional
> time it takes to do the training? It's probably small in comparison,
> but it might not be.

So far I've assumed that it's zero, because it can happen in parallel
and the time to generate the self-play games dominates. From the revised
hardware estimates, we can also see that the training machines used 64
GPUs, which is a lot smaller than the 1500+ TPU estimate for the
self-play machines.

Training on the GTX 1080 Ti does 4 batches of 32 positions per second.
They use 2048 position batches, and train for 1000 batches before
checkpointing. So the GTX can produce a checkpoint every 4.5 hours [1].
Testing that over 400 games takes 8.5 days (400 x 200 x 9.3s).

So again, it totally bottlenecks on playing games, not on training. At
least, if the improvement is big, one needn't play the 400 games out,
but SPRT termination can be used.

[1] To be honest, this seems very fast - even starting from 0 such a big
network barely advances in 1000 iterations (or I misinterpreted a
training parameter). But I guess it's important to have a very fast -
learn knowledge - use new knowledge - feedback cycle.


More information about the Computer-go mailing list