[Computer-go] Zero is weaker than Master!?
sligocki at gmail.com
Tue Oct 24 17:14:57 PDT 2017
Also (if I'm understanding the paper correctly) 20 blocks ~= 40 layers
because each "block" has two convolution layers:
Each residual block applies the following modules sequentially to its input:
> (1) A convolution of 256 filters of kernel size 3×3 with stride 1
> (2) Batch normalization
> (3) A rectifier nonlinearity
> (4) A convolution of 256 filters of kernel size 3×3 with stride 1
> (5) Batch normalization
> (6) A skip connection that adds the input to the block
> (7) A rectifier nonlinearity
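A minimal sketch (plain Python, not the DeepMind code) of the per-block module sequence quoted above, just to make the arithmetic explicit: each residual block contains exactly two convolution layers, so a 20-block tower contributes 40 convolution layers.

```python
# Module sequence inside one residual block, as listed in the paper.
RESIDUAL_BLOCK = [
    "conv 3x3, 256 filters, stride 1",   # (1)
    "batch norm",                        # (2)
    "ReLU",                              # (3)
    "conv 3x3, 256 filters, stride 1",   # (4)
    "batch norm",                        # (5)
    "skip connection (add block input)", # (6)
    "ReLU",                              # (7)
]

def conv_layers(num_blocks: int) -> int:
    """Count convolution layers contributed by a stack of residual blocks."""
    convs_per_block = sum(1 for m in RESIDUAL_BLOCK if m.startswith("conv"))
    return num_blocks * convs_per_block

print(conv_layers(20))  # 20 blocks x 2 convs per block = 40 conv layers
```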
On Tue, Oct 24, 2017 at 5:10 PM, Xavier Combelle <xavier.combelle at gmail.com> wrote:
> How is it a fair comparison if Zero had only 3 days of training?
> Master had longer training, no? Moreover, Zero has a bootstrapping
> problem because, unlike Master, it doesn't learn from expert games,
> which means it is likely to be weaker with little training.
> On 24/10/2017 at 20:20, Hideki Kato wrote:
> > In May, David Silver said that Master used a 40-layer network.
> > According to the new paper, Master used the same architecture
> > as Zero, so Master used a 20-block ResNet.
> > The first instance of Zero, the 20-block ResNet version, is
> > weaker than Master (after 3 days of training). So, with the
> > same number of layers (a fair comparison), Zero is weaker than
> > Master.
> > Hideki
> Computer-go mailing list
> Computer-go at computer-go.org