[Computer-go] Creating the playout NN

Michael Markefka satorian at gmail.com
Sun Jun 12 04:22:09 PDT 2016


I don't remember the content of the paper and currently can't look at the
PDF, but one possible explanation could be that a simple model trained
directly may regularize differently from one trained on the best-fit,
pre-smoothed output of a deeper net. The latter could perhaps offer better
local optimization and regularization at higher accuracy with an equal
parameter count.
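
Not from the thread or the paper, just a hedged illustration: mimic training in the sense of the linked Ba & Caruana paper can be sketched in a few lines of numpy, fitting a shallow "student" to the soft probability outputs of a deeper "teacher" instead of to hard labels. Everything here (the simulated teacher, the shapes, the learning rate) is a hypothetical stand-in, not anything AlphaGo actually did:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy "positions": 200 samples, 10 features, 3 candidate "moves".
X = rng.normal(size=(200, 10))

# Stand-in for the deep teacher: a fixed random nonlinear map that
# yields smoothed (soft) move probabilities for each position.
W1 = rng.normal(size=(10, 16))
W2 = rng.normal(size=(16, 3))
teacher_probs = softmax(np.tanh(X @ W1) @ W2)

# Shallow student: a single linear layer trained by gradient descent
# on cross-entropy against the teacher's SOFT targets (the mimic idea),
# rather than against hard 0/1 labels.
W = np.zeros((10, 3))
lr = 0.5
losses = []
for step in range(300):
    P = softmax(X @ W)
    loss = -np.mean(np.sum(teacher_probs * np.log(P + 1e-12), axis=1))
    losses.append(loss)
    grad = X.T @ (P - teacher_probs) / len(X)  # softmax cross-entropy gradient
    W -= lr * grad

print(f"mimic loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The point of the soft targets is that they carry the teacher's relative preferences over all moves, not just the argmax, which is the extra information direct training on the original set never sees.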
On 12.06.2016 13:05, "Álvaro Begué" <alvaro.begue at gmail.com> wrote:

> I don't understand the point of using the deeper network to train the
> shallower one. If you have enough data to train a model with many
> parameters, you have enough to train a model with fewer parameters.
>
> Álvaro.
>
>
> On Sun, Jun 12, 2016 at 5:52 AM, Michael Markefka <
> michael.markefka at gmail.com> wrote:
>
>> Might be worthwhile to try the faster, shallower policy network as an
>> MCTS replacement, if it were fast enough to support sufficient breadth.
>> It could cut down on some of the scoring variations that confuse rather
>> than inform the score expectation.
>>
>> On Sun, Jun 12, 2016 at 10:56 AM, Stefan Kaitschick
>> <skaitschick at gmail.com> wrote:
>> > I don't know how the added training compares to direct training of the
>> > shallow network. It's probably not so important, because both should be
>> > much faster than the training of the deep NN. Accuracy should be
>> > slightly improved.
>> >
>> > Together, that might not justify the effort. But I think the fact that
>> > you can create the mimicking NN after the deep NN has been refined with
>> > self-play is important.
>> >
>> > On Sun, Jun 12, 2016 at 9:51 AM, Petri Pitkanen <
>> petri.t.pitkanen at gmail.com>
>> > wrote:
>> >>
>> >> Would the expected improvement be reduced training time or improved
>> >> accuracy?
>> >>
>> >>
>> >> 2016-06-11 23:06 GMT+03:00 Stefan Kaitschick
>> >> <stefan.kaitschick at hamburg.de>:
>> >>>
>> >>> If I understood it right, the playout NN in AlphaGo was created by
>> >>> using the same training set as the one used for the large NN that is
>> >>> used in the tree. There would be an alternative, though. I don't know
>> >>> if this is the best source, but here is one example:
>> >>> https://arxiv.org/pdf/1312.6184.pdf
>> >>> The idea is to teach a shallow NN to mimic the outputs of a deeper
>> >>> net. For one thing, this seems to give better results than direct
>> >>> training on the same set. But also, more importantly, this could be
>> >>> done after the large NN has been improved with self-play. And after
>> >>> that, the self-play could be restarted with the new playout NN. So it
>> >>> seems to me there is real room for improvement here.
>> >>>
>> >>> Stefan
>> >>>
>> >>> _______________________________________________
>> >>> Computer-go mailing list
>> >>> Computer-go at computer-go.org
>> >>> http://computer-go.org/mailman/listinfo/computer-go