[Computer-go] Zero performance
gcp at sjeng.org
Fri Oct 20 23:53:32 PDT 2017
On 20/10/2017 22:48, fotland at smart-games.com wrote:
> The paper describes 20 and 40 block networks, but the section on
> comparison says AlphaGo Zero uses 20 blocks. I think your protobuf
> describes a 40 block network. That's a factor of two 😊
They compared with both; the final 5185 Elo number is for the 40-block
one. For the 20-block one, the numbers stop around 4300 Elo.
See, for example, the Elo-over-training plots in the paper.
A factor of 2 isn't much, but sure, it seems sensible to start with the
smaller one, given how intractable the problem looks right now.
> Your time looks reasonable when calculating the time to generate the
> 29M games at about 10 seconds per move. This is only the time to
> generate the input data. Do you have an estimate of the additional
> time it takes to do the training? It's probably small in comparison,
> but it might not be.
So far I've assumed that it's zero, because the training can happen in
parallel and the time to generate the self-play games dominates. From
the revised hardware estimates, we can also see that the training
machines used 64 GPUs, which is a lot smaller than the 1500+ TPU
estimate for the self-play machines.
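To make the "in parallel" point concrete, here's the producer/consumer
shape I have in mind, as a minimal Python sketch. play_one_game,
train_step and the thread counts are stand-ins for illustration, not
actual Leela code:

    import queue, threading, time

    games = queue.Queue(maxsize=10000)     # finished self-play games

    def play_one_game(net):
        time.sleep(0.01)                   # stand-in for a full MCTS self-play game
        return ("positions", "policies", "outcome")

    def train_step(batch):
        time.sleep(0.001)                  # stand-in for one gradient step

    def selfplay_worker(net):
        while True:
            games.put(play_one_game(net))

    def trainer():
        while True:
            batch = [games.get() for _ in range(8)]
            train_step(batch)

    for _ in range(4):                     # many producers...
        threading.Thread(target=selfplay_worker, args=(None,), daemon=True).start()
    threading.Thread(target=trainer, daemon=True).start()  # ...one consumer
    time.sleep(1)

With enough self-play workers the trainer spends most of its time
waiting, which is why I count its wall-clock cost as roughly zero.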
Training on the GTX 1080 Ti does 4 batches of 32 positions per second.
They use 2048-position batches and train for 1000 batches before
checkpointing, so the GTX can produce a checkpoint every ~4.5 hours.
Testing that over 400 games takes about 8.6 days (400 games x 200 moves
x 9.3 s per move).
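For anyone who wants to redo the arithmetic, a few lines of Python
reproduce both numbers from the estimates quoted above:

    positions_per_sec = 4 * 32              # GTX 1080 Ti: 4 batches of 32/s
    checkpoint_positions = 2048 * 1000      # 1000 batches of 2048 positions
    print(checkpoint_positions / positions_per_sec / 3600)   # ~4.4 hours

    test_seconds = 400 * 200 * 9.3          # games * moves/game * s/move
    print(test_seconds / 86400)             # ~8.6 days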
So again, it bottlenecks entirely on playing games, not on training. At
least, if the improvement is big, one needn't play all 400 games out;
SPRT early termination can be used instead.
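For reference, here is a minimal SPRT sketch for win/loss match
results; the Elo bounds (0 and 35) and error rates are my own choices
for illustration, not numbers from the paper, and there's no draw
model:

    import math

    def elo_to_winprob(elo):
        # Logistic Elo model: expected score against the old network.
        return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

    def sprt(wins, losses, elo0=0.0, elo1=35.0, alpha=0.05, beta=0.05):
        # H0: strength gain = elo0, H1: gain = elo1.
        # Returns 'H1' (accept new net), 'H0' (reject), or None (keep playing).
        p0, p1 = elo_to_winprob(elo0), elo_to_winprob(elo1)
        llr = (wins * math.log(p1 / p0)
               + losses * math.log((1 - p1) / (1 - p0)))
        if llr >= math.log((1 - beta) / alpha):
            return 'H1'
        if llr <= math.log(beta / (1 - alpha)):
            return 'H0'
        return None

    print(sprt(wins=70, losses=30))    # lopsided result -> 'H1' after 100 games

A clearly better (or clearly equal) network terminates the match long
before 400 games; only the borderline cases need the full run.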
To be honest, this seems very fast: even starting from scratch, such a
big network barely advances in 1000 iterations (or I misinterpreted a
training parameter). But I guess it's important to have a very fast
learn-knowledge, use-new-knowledge feedback cycle.