[Computer-go] Creating the playout NN

Petr Baudis pasky at ucw.cz
Sun Jun 12 01:39:03 PDT 2016


On Sun, Jun 12, 2016 at 10:51:37AM +0300, Petri Pitkanen wrote:
> 2016-06-11 23:06 GMT+03:00 Stefan Kaitschick <stefan.kaitschick at hamburg.de>:
> 
> > If I understood it right, the playout NN in AlphaGo was created by using
> > the same training set as the one used for the large NN that is used in the
> > tree. There would be an alternative though. I don't know if this is the
> > best source, but here is one example: https://arxiv.org/pdf/1312.6184.pdf
> > The idea is to teach a shallow NN to mimic the outputs of a deeper net.
> > For one thing, this seems to give better results than direct training on
> > the same set. But also, more importantly, this could be done after the
> > large NN has been improved with selfplay.
> > And after that, the selfplay could be restarted with the new playout NN.
> > So it seems to me, there is real room for improvement here.
> 
> Would the expected improvement be reduced training time or improved
> accuracy?

Neither. The gain would be a faster move scoring procedure at runtime,
i.e. more board positions scored throughout the game, plus reduced
latency: board scoring becomes available sooner after a move is
expanded, so fewer playouts are made without NN scoring in the last
few moves.
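
To make the mimic idea from Stefan's mail concrete, here is a minimal
sketch in plain Python, under loud assumptions: teacher_probs() is a
made-up stand-in for the large net (not AlphaGo's policy network), the
"positions" are random 3-number feature vectors, and the student is a
tiny linear softmax model. The student is fitted to the teacher's
output distribution by cross-entropy, which is one common way to do
this and not necessarily the exact loss used in the linked paper.

```python
import math
import random

N_MOVES = 3      # hypothetical: number of candidate moves scored per position
N_FEATURES = 3   # hypothetical: number of features describing a position


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_probs(x):
    """Made-up stand-in for the large (slow) net: nonlinear move scores."""
    logits = [
        math.tanh(2.0 * x[0] - x[1]),
        math.tanh(x[1] + x[2]),
        math.tanh(x[0] * x[2]),
    ]
    return softmax(logits)


def student_probs(w, x):
    """Small (fast) student: linear softmax with a bias term."""
    xb = x + [1.0]
    logits = [sum(wi * xi for wi, xi in zip(row, xb)) for row in w]
    return softmax(logits)


def cross_entropy(p, q):
    """Cross-entropy of student distribution q against teacher targets p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


def mean_loss(w, data):
    """Average cross-entropy between teacher and student over the data."""
    total = sum(cross_entropy(teacher_probs(x), student_probs(w, x)) for x in data)
    return total / len(data)


def train_student(data, epochs=100, lr=0.1):
    """Fit the student to the teacher's soft outputs with plain SGD."""
    w = [[0.0] * (N_FEATURES + 1) for _ in range(N_MOVES)]
    for _ in range(epochs):
        for x in data:
            p = teacher_probs(x)      # soft targets from the teacher
            q = student_probs(w, x)   # current student prediction
            xb = x + [1.0]
            for k in range(N_MOVES):
                grad_logit = q[k] - p[k]  # d(cross-entropy)/d(logit k)
                for j in range(N_FEATURES + 1):
                    w[k][j] -= lr * grad_logit * xb[j]
    return w


rng = random.Random(0)
positions = [[rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)] for _ in range(200)]

untrained = [[0.0] * (N_FEATURES + 1) for _ in range(N_MOVES)]
student = train_student(positions)

loss_before = mean_loss(untrained, positions)
loss_after = mean_loss(student, positions)
print("loss before:", loss_before, "loss after:", loss_after)
```

The point for playouts is that once the teacher has been improved,
e.g. by selfplay, the same loop can be rerun against the new teacher
to refresh the cheap student without going back to the original
training set.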

-- 
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton


