[Computer-go] AGZ Policy Head

David Wu lightvector at gmail.com
Fri Dec 29 09:27:23 PST 2017


As for a purely convolutional approach, I think you *can* do better by
adding some global connectivity.

Generally speaking, there should be some value in global connectivity for
things like upweighting the probability of playing ko threats anywhere on
the board when there is an active ko anywhere else on the board. If you
made the whole neural net purely convolutional, then of course with enough
convolutional layers the neural net could still learn to distribute the
"there is an important ko on the board" property everywhere, but it
would take many more layers.

I've actually experimented with this recently in training my own policy net
- for example, one approach is to have a special residual block just before
the policy head:
* Compute a convolution (1x1 or 3x3) of the trunk with C channels for a
small C, result shape 19x19xC.
* Average-pool the results down to 1x1xC.
* Multiply by a CxN matrix to turn that into 1x1xN, where N is the number of
channels in the main trunk of the resnet, broadcast up to 19x19xN, and add
back into the main trunk (i.e. a skip connection).
Apply your favorite activation function at appropriate points in the above.
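
The steps above can be sketched in NumPy roughly as follows. This is only a
minimal illustration, not the exact block I trained: the function name
global_pooling_block, the weight names w_conv and w_fc, the choice of a 1x1
convolution, the sizes N=64 and C=8, and the placement of the single ReLU are
all assumptions made for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def global_pooling_block(trunk, w_conv, w_fc):
    """Add a board-wide summary back into every point of the trunk.

    trunk:  (19, 19, N) trunk activations
    w_conv: (N, C) weights of a 1x1 convolution (assumed; 3x3 also possible)
    w_fc:   (C, N) matrix mapping pooled features back to trunk channels
    """
    g = relu(trunk @ w_conv)       # 1x1 convolution, shape (19, 19, C)
    pooled = g.mean(axis=(0, 1))   # average-pool over the board, shape (C,)
    bias = pooled @ w_fc           # shape (N,)
    return trunk + bias            # broadcast to (19, 19, N), skip connection

# Hypothetical sizes, chosen only for illustration.
N, C = 64, 8
rng = np.random.default_rng(0)
trunk = rng.standard_normal((19, 19, N))
w_conv = rng.standard_normal((N, C)) * 0.1
w_fc = rng.standard_normal((C, N)) * 0.1

out = global_pooling_block(trunk, w_conv, w_fc)
print(out.shape)  # (19, 19, 64)
```

Every board point receives the same N-dimensional offset, which is how a
single pooled channel can carry a whole-board property to all points at once.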

There are other possible architectures for this block too; I actually did
something a bit more complicated but still pretty similar. Anyway, it
turns out that when I visualize the activations on example game situations,
I find that the neural net actually does use one of the C channels for
"is there a ko fight" which makes it predict ko threats elsewhere on the
board! Some of the other average-pooled channels appear to be used for
things like detecting game phase (how full is the board?), and detecting
who is ahead (perhaps to decide when to play risky or safe - it's
interesting that the neural net has decided this is important given that
it's a pure policy net and is trained to predict only moves, not values).

In AGZ's case, it seems weird to have only 2 filters feeding into the fully
connected layer; that seems like too few to encode much useful logic
like this. I'm also mystified by this architecture.
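
For reference, the weight counts from the quoted messages below can be checked
with a few lines of Python. The last function is only one possible reading of
the purely convolutional alternative (a single 1x1 filter applied directly to
the trunk, plus a lone bias for pass); the name conv_policy_head_params and
the trunk width of 256 used with it are assumptions for the example.

```python
BOARD = 19 * 19       # 361 board points
MOVES = BOARD + 1     # 362 policy outputs: 361 points plus pass

# Policy head: two 1x1 filters feeding a fully connected layer.
policy_fc_weights = 2 * BOARD * MOVES
# Value head: one 1x1 filter feeding a hidden layer of 256.
value_fc_weights = 1 * BOARD * 256

def conv_policy_head_params(trunk_channels):
    """Hypothetical purely convolutional head: one 1x1 filter over the trunk
    (trunk_channels weights, conv bias ignored) plus a single pass bias."""
    return trunk_channels + 1

print(policy_fc_weights)             # 261364
print(value_fc_weights)              # 92416
print(conv_policy_head_params(256))  # 257, assuming a 256-channel trunk
```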


On Fri, Dec 29, 2017 at 7:50 AM, Rémi Coulom <remi.coulom at free.fr> wrote:

> I also wonder about this. A purely convolutional approach would save a lot
> of weights. The output for pass can be set to be a single bias parameter,
> connected to nothing. Setting pass to a constant might work, too. I don't
> understand the reason for such a complication.
>
> ----- Original Message -----
> From: "Andy" <andy.olsen.tx at gmail.com>
> To: "computer-go" <computer-go at computer-go.org>
> Sent: Friday, December 29, 2017 06:47:06
> Subject: [Computer-go] AGZ Policy Head
>
>
>
> Is there some particular reason AGZ uses two 1x1 filters for the policy
> head instead of one?
>
>
> They could also have allowed more, but I guess that would be expensive? I
> calculate that the fully connected layer has 2*361*362 weights, where 2 is
> the number of filters.
>
>
> By comparison the value head has only a single 1x1 filter, but it goes to
> a hidden layer of 256. That gives 1*361*256 weights. Why not use two 1x1
> filters here? Maybe since the final output is only a single scalar it's not
> needed?
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go