[Computer-go] mini-max with Policy and Value network

valkyria at phmp.se
Tue May 23 09:05:54 PDT 2017


>> (3) CNN cannot learn exclusive-or function due to the ReLU
>> activation function, instead of traditional sigmoid (tangent
>> hyperbolic).  CNN is good at approximating continuous (analog)
>> functions but Boolean (digital) ones.
> 
> Are you sure about that? I can imagine using two ReLU units to
> construct a sigmoid-like step function, so I'd think a multi-layer net
> should be fine (just like with ordinary perceptrons).

No, this is incorrect. A perceptron (a single-layer neural network) 
cannot do XOR.
The whole point of networks with two or more layers was to overcome 
this basic weakness. A two-layer network with enough neurons in the 
hidden layer can approximate any continuous function arbitrarily well.
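For concreteness, here is a minimal sketch (my own illustration, not 
from the thread; numpy and the fixed weights are just my choice) of the 
classic construction: two ReLU units on top of the same weighted sum 
are enough to compute XOR exactly, so a network with one hidden ReLU 
layer certainly has the capacity for it.

import numpy as np

# Hidden layer: two ReLU units over the same sum s = x1 + x2,
# with biases 0 and -1:  h1 = ReLU(s),  h2 = ReLU(s - 1)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])

# Output layer: XOR(x1, x2) = h1 - 2*h2
W2 = np.array([1.0, -2.0])

def xor_net(x):
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
    return h @ W2

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x, dtype=float)))  # -> 0.0, 1.0, 1.0, 0.0

Whether gradient descent actually finds such weights on a given dataset 
is a separate question, which is what the next paragraph is about.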

But early on it turned out that learning was unstable and/or extremely 
slow for multilayer networks, so this theoretical capacity was not 
practical.

Now, with deep learning, we know that with correct training, a lot of 
data, and hardware (or patience), neural networks can learn almost 
anything.

It is probably correct that smooth functions are easier to approximate 
with a neural network than high-dimensional, discontinuous ones.

I am training my networks on a single CPU thread, so I have the benefit 
of slowly following the learning process of NNOdin. I have seen a lot of 
problems with the network, but after some weeks of training they go 
away. It is interesting to see how its playing style changes. For a 
while it would rigidly play very local shapes, but now it seems to start 
to take life and death of large groups into account. Or maybe it lets 
the MC playouts have more impact on the decisions made, by searching 
more effectively. Some weeks ago it would barely win against gnugo, and 
it won by just playing standard shapes until it got lucky. In the last 
couple of days it seems to surround and cut off gnugo's groups and kill 
them on a large scale, as a strong player would.

So what do I want to say? So far I have learned that the policy network 
will blindly play whatever shapes it finds good and ignore most 
alternative moves. So there is indeed a huge problem of "holes" in the 
policy function. But for Odin at least, I do not know which holes will 
be a problem as the network matures with more learning. My plan is then 
to fix the holes by making the MC evaluation strong.
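To make that last point concrete, here is a rough sketch of how a 
prior-weighted selection rule lets the Monte Carlo evaluation compensate 
for holes in the policy. This is my illustration, not Odin's actual 
code; the Child structure, c_puct and the PUCT-style formula are 
assumptions. A move the policy under-rates starts with a small bonus, 
but once the search tries it and its measured value turns out high, it 
keeps getting selected.

import math
from dataclasses import dataclass

@dataclass
class Child:
    move: tuple             # board coordinate, e.g. (row, col)
    prior: float            # policy-network probability for this move
    visits: int = 0
    value_sum: float = 0.0  # accumulated Monte Carlo evaluation results

def select_child(children, c_puct=1.5):
    # PUCT-style score: Q (mean MC value) plus a prior-weighted bonus U.
    # As visits grow, Q dominates, so a good move the policy missed can
    # still rise to the top on the strength of its playout results.
    total = sum(c.visits for c in children)
    def score(c):
        q = c.value_sum / c.visits if c.visits else 0.0
        u = c_puct * c.prior * math.sqrt(total + 1) / (1 + c.visits)
        return q + u
    return max(children, key=score)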

Best
Magnus

