[Computer-go] Creating the playout NN

Jim O'Flaherty jim.oflaherty.jr at gmail.com
Sun Jun 12 13:55:37 PDT 2016


The purpose is to see whether the complex functions that have emerged in
the weights admit some sort of "simplification". It is a typical
reductionist strategy, especially where there is an attempt to converge on
human conceptualization. Given the complexity of the nuances in Go, my
intuition says that it will show excellent improvement in short-term play
at the cost of nuance in longer-term play.

On Sun, Jun 12, 2016 at 6:05 AM, Álvaro Begué <alvaro.begue at gmail.com>
wrote:

> I don't understand the point of using the deeper network to train the
> shallower one. If you had enough data to be able to train a model with many
> parameters, you have enough to train a model with fewer parameters.
>
> Álvaro.
>
>
> On Sun, Jun 12, 2016 at 5:52 AM, Michael Markefka <
> michael.markefka at gmail.com> wrote:
>
>> Might be worthwhile to try the faster, shallower policy network as a
>> MCTS replacement if it were fast enough to support enough breadth.
>> Could cut down on some of the scoring variations that confuse rather
>> than inform the score expectation.
>>
>> On Sun, Jun 12, 2016 at 10:56 AM, Stefan Kaitschick
>> <skaitschick at gmail.com> wrote:
>> > I don't know how the added training compares to direct training of the
>> > shallow network.
>> > It's probably not so important, because both should be much faster
>> > than the training of the deep NN.
>> > Accuracy should be slightly improved.
>> >
>> > Together, that might not justify the effort. But I think the fact that
>> > you can create the mimicking NN after the deep NN has been refined
>> > with self-play is important.
>> >
>> > On Sun, Jun 12, 2016 at 9:51 AM, Petri Pitkanen
>> > <petri.t.pitkanen at gmail.com> wrote:
>> >>
>> >> Would the expected improvement be reduced training time or improved
>> >> accuracy?
>> >>
>> >>
>> >> 2016-06-11 23:06 GMT+03:00 Stefan Kaitschick
>> >> <stefan.kaitschick at hamburg.de>:
>> >>>
>> >>> If I understood it right, the playout NN in AlphaGo was created by
>> >>> using the same training set as the one used for the large NN that
>> >>> is used in the tree. There would be an alternative though. I don't
>> >>> know if this is the best source, but here is one example:
>> >>> https://arxiv.org/pdf/1312.6184.pdf
>> >>> The idea is to teach a shallow NN to mimic the outputs of a deeper
>> >>> net. For one thing, this seems to give better results than direct
>> >>> training on the same set. But also, more importantly, this could be
>> >>> done after the large NN has been improved with self-play.
>> >>> And after that, the self-play could be restarted with the new
>> >>> playout NN.
>> >>> So it seems to me, there is real room for improvement here.
>> >>>
>> >>> Stefan
>> >>>
>> >>> _______________________________________________
>> >>> Computer-go mailing list
>> >>> Computer-go at computer-go.org
>> >>> http://computer-go.org/mailman/listinfo/computer-go
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>
>
>
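
For readers of the thread above, the idea Stefan describes (teach a
shallow NN to mimic the outputs of a deeper net) can be sketched roughly
as follows. This is only an illustration under stated assumptions, not
AlphaGo's method and not necessarily the loss used in the linked paper:
the "teacher" here is a small fixed random network standing in for the
large NN, the "student" is a single linear layer, and the training signal
is cross-entropy against the teacher's output probabilities. All names
(teacher_probs, student_probs, mimic_loss, train) are hypothetical.

```python
import math
import random

random.seed(0)
N_IN, N_HID, N_OUT = 2, 8, 3


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumption: a fixed random one-hidden-layer net plays the role of the
# already-trained deep NN. In practice this would be the real large net.
T_V = [[random.uniform(-2, 2) for _ in range(N_IN)] for _ in range(N_HID)]
T_W = [[random.uniform(-2, 2) for _ in range(N_HID)] for _ in range(N_OUT)]


def teacher_probs(x):
    hidden = [math.tanh(sum(v * xi for v, xi in zip(row, x))) for row in T_V]
    return softmax([sum(w * h for w, h in zip(row, hidden)) for row in T_W])


def student_probs(params, x):
    # Shallow student: one linear layer (weights plus bias), then softmax.
    return softmax([sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1]
                    for row in params])


def mimic_loss(params, inputs, targets):
    # Mean cross-entropy between teacher soft targets and student outputs.
    total = 0.0
    for x, t in zip(inputs, targets):
        p = student_probs(params, x)
        total -= sum(ti * math.log(max(pi, 1e-12)) for ti, pi in zip(t, p))
    return total / len(inputs)


def train(params, inputs, targets, steps=200, lr=0.5):
    # Plain full-batch gradient descent on the mimic loss.
    n = len(inputs)
    for _ in range(steps):
        grads = [[0.0] * (N_IN + 1) for _ in range(N_OUT)]
        for x, t in zip(inputs, targets):
            p = student_probs(params, x)
            for k in range(N_OUT):
                err = p[k] - t[k]  # d(loss)/d(logit k) for soft targets
                for j in range(N_IN):
                    grads[k][j] += err * x[j]
                grads[k][N_IN] += err
        for k in range(N_OUT):
            for j in range(N_IN + 1):
                params[k][j] -= lr * grads[k][j] / n
    return params


# No human labels are needed: the teacher labels arbitrary inputs.
inputs = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(200)]
targets = [teacher_probs(x) for x in inputs]
student = [[0.0] * (N_IN + 1) for _ in range(N_OUT)]

loss_before = mimic_loss(student, inputs, targets)
train(student, inputs, targets)
loss_after = mimic_loss(student, inputs, targets)
print(f"mimic loss: {loss_before:.3f} -> {loss_after:.3f}")
```

The student only ever sees the teacher's outputs, so it can be retrained
whenever the teacher changes, for example after the large NN has been
refined with self-play. That is the property the thread singles out as
important.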