[Computer-go] mini-max with Policy and Value network
valkyria at phmp.se
Tue May 23 09:05:54 PDT 2017
>> (3) CNN cannot learn the exclusive-or function due to the ReLU
>> activation function, as opposed to the traditional sigmoid
>> (hyperbolic tangent). CNN is good at approximating continuous
>> (analog) functions but not Boolean (digital) ones.
> Are you sure about that? I can imagine using two ReLU units to
> construct a sigmoid-like step function, so I'd think a multi-layer net
> should be fine (just like with ordinary perceptrons).
No, this is incorrect. A perceptron (a single-layer neural network)
cannot do XOR.
The whole point of 2+ layer networks was to overcome this basic
weakness. A two-layer network with an unbounded number of neurons in
the hidden layer can approximate any continuous function.
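
To make this concrete, here is a minimal sketch (my own illustration,
with hand-picked weights) of a network with one hidden layer of two
ReLU units that computes XOR exactly:

    def relu(x):
        return max(0.0, x)

    def xor_net(x1, x2):
        # Hidden layer: two ReLU units, both with input weights (1, 1)
        # but different biases.
        h1 = relu(x1 + x2)        # 0, 1, 1, 2 on the four inputs
        h2 = relu(x1 + x2 - 1.0)  # 0, 0, 0, 1
        # Linear output layer: h1 - 2*h2.
        return h1 - 2.0 * h2      # 0, 1, 1, 0  == XOR

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor_net(a, b))

So two ReLU units in a single hidden layer already suffice for XOR.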
But early on it turned out that learning was unstable and/or extremely
slow for multilayer networks, so the theoretical capacity was not
realized in practice.
Now with deep learning we know that with correct training, a lot of
data, and hardware (or patience), neural networks can learn almost
anything.
It is probably correct that smooth functions are easier to approximate
with a neural network than high-dimensional discontinuous functions.
I am training my networks on a single CPU thread so I have the benefit
of following the learning process of NNOdin slowly. I have seen a lot of
problems with the network but after some weeks of training they go away.
It is interesting to see how its playing style changes. For a while it
would rigidly play very local shapes, but now it seems to be starting
to take life and death of large groups into account. Or maybe it lets
the MC playouts have more impact on the decisions made, by searching
more effectively. Some weeks ago it would barely win against gnugo, and
it won by just playing standard shapes until it got lucky. In the last
couple of days it seems to surround and cut off gnugo's groups and kill
them on a large scale, as a strong player would.
So what do I want to say? So far I have learned that the policy network
will blindly play whatever shapes it finds good and ignore most
alternative moves. So there is indeed a huge problem of "holes" in the
policy function. But for Odin at least I do not know which holes will
remain a problem as the network matures with more training. My plan is
then to fix the holes by making the MC evaluation strong.
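
For concreteness, here is a sketch of that idea using a PUCT-style
selection rule of the kind used in AlphaGo-like searches (the names and
constants are made up for illustration; this is not Odin's actual
code):

    import math

    def puct_score(q, prior, n_parent, n_child, c_puct=1.5):
        # q: mean MC playout value for the move
        # prior: policy network probability for the move
        # The exploration term shrinks as n_child grows, so once a
        # move has enough playouts its MC value q dominates the
        # policy prior. A move the policy underrates (a "hole") can
        # therefore still be discovered and promoted by the search.
        return q + c_puct * prior * math.sqrt(n_parent) / (1.0 + n_child)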