[Computer-go] mini-max with Policy and Value network
Erik van der Werf
erikvanderwerf at gmail.com
Tue May 23 04:47:59 PDT 2017
On Tue, May 23, 2017 at 10:51 AM, Hideki Kato <hideki_katoh at ybb.ne.jp> wrote:
> (1) To solve L&D, some search is necessary in practice. So, the
> value net alone cannot solve some of them.
> (2) The number of possible positions (input of the value net) in
> real games is at least 10^30 (10^170 in theory). Can the value
> net recognize all of them? L&Ds depend on very small differences
> in the placement of stones or liberties. Can we provide the
> necessary amount of training data? Does the network have enough
> capacity? The answer is almost obvious from the theory of
> function approximation. (An ANN is just a non-linear function
> approximator.)
A similar argument can be made for natural neural nets, yet we know humans
are able to come up with reasonable solutions. A pure neural-net approach
would probably require some form of recursion, but when the net is combined
with a search, and the decision process is rolled out to some sufficiently
large number of steps, that apparently matters much less. Also, I suspect
that nearly all of those positions can only be reached in real games by
inferior moves from both sides. All that may be needed is some crude means
to steer away from chaos (and even if one were to start in chaos, humans
probably wouldn't do well either).
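The "search plus value net" combination above can be sketched as a
depth-limited negamax that trusts the net at the search frontier. This is
only an illustration: `legal_moves`, `play`, and `value_net` are
hypothetical stand-ins (here the "game" is a toy take-1-2-or-3 subtraction
game, not Go, so the sketch is self-contained and runnable):

```python
# Sketch: depth-limited negamax with a value network at the leaves.
# All names below are illustrative, not any real Go engine's API.

def legal_moves(n):
    """Moves: take 1, 2, or 3 stones (toy stand-in for Go moves)."""
    return [m for m in (1, 2, 3) if m <= n]

def play(n, m):
    """Apply a move; in a Go engine this would return a new board."""
    return n - m

def value_net(n):
    """Stand-in for a trained value net: a crude heuristic score for
    the side to move (+1 looks winning, -1 looks losing)."""
    return 1.0 if n % 4 != 0 else -1.0

def negamax(n, depth):
    """Roll the decision process out a fixed number of steps, then
    fall back on the (approximate) value net at the frontier."""
    if n == 0:
        return -1.0          # side to move took no last stone: loss
    if depth == 0:
        return value_net(n)  # truncate search; trust the net here
    return max(-negamax(play(n, m), depth - 1) for m in legal_moves(n))

best = max(legal_moves(10), key=lambda m: -negamax(play(10, m), 3))
print(best)  # taking 2 leaves 8 (a multiple of 4), a lost position
```

The point of the sketch is only that the net need not solve L&D by itself:
a few plies of explicit search resolve the tactical part, and the net only
has to be roughly right at the truncation depth.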
> (3) CNN cannot learn the exclusive-or function due to the ReLU
> activation function, used instead of the traditional sigmoid
> (hyperbolic tangent). CNNs are good at approximating continuous
> (analog) functions but not Boolean (digital) ones.
Are you sure about that? I can imagine using two ReLU units to construct a
sigmoid-like step function, so I'd expect a multi-layer net to be fine
(just as with ordinary multi-layer perceptrons).
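The claim is easy to check concretely. A minimal sketch with hand-picked
(not learned) weights shows a two-unit ReLU hidden layer computing XOR
exactly:

```python
# Two ReLU hidden units suffice for XOR. Weights are chosen by hand
# for illustration, not obtained by training.

def relu(x):
    return max(0.0, x)

def xor_net(a, b):
    # Hidden layer: h1 fires when at least one input is on,
    # h2 fires only when both are on.
    h1 = relu(a + b)          # weights (1, 1), bias 0
    h2 = relu(a + b - 1.0)    # weights (1, 1), bias -1
    # Output: subtract the "both on" case twice to cancel it.
    return h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
# prints 0 for (0,0) and (1,1), 1 for (0,1) and (1,0)
```

So the XOR limitation applies to a single-layer (perceptron-style) model,
not to a multi-layer ReLU network.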