[Computer-go] mini-max with Policy and Value network
gcp at sjeng.org
Tue May 23 03:56:53 PDT 2017
On 23-05-17 10:51, Hideki Kato wrote:
> (2) The number of possible positions (input of the value net) in
> real games is at least 10^30 (10^170 in theory). Can the value
> net recognize them all? L&Ds depend on very small differences in
> the placement of stones or liberties. Can we provide the necessary
> amount of training data? Does the network have enough capacity?
> The answer is almost obvious by the theory of function
> approximation. (ANN is just a non-linear function
DCNNs clearly have some ability to generalize from their training data
and perform reasonably even on unseen positions, so I don't find this a
very compelling argument. It's not as if Monte Carlo playouts handle
all sequences correctly either.
Evaluations are heuristic guidance for the search, and a fallback when
the search terminates in an unresolved position. Having multiple
independent ones improves the accuracy of the heuristic - a basic
ensemble.
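To illustrate the ensemble point, here is a toy sketch (names and the 50/50 mixing weight are my own assumptions, loosely modeled on AlphaGo's blend of value-network and rollout evaluations): averaging two independent, equally noisy estimates of the same position value roughly halves the mean squared error of the heuristic.

```python
import random

def ensemble_eval(value_net_score, playout_winrate, mix=0.5):
    # Hypothetical blend of two independent evaluations of one position.
    return (1 - mix) * value_net_score + mix * playout_winrate

# Toy demonstration: two independent noisy estimators of the same true
# winrate, compared against the averaged (ensembled) estimator.
random.seed(0)
true_value = 0.6
err_single = 0.0
err_mixed = 0.0
trials = 20000
for _ in range(trials):
    a = true_value + random.gauss(0, 0.1)   # stand-in for a value net
    b = true_value + random.gauss(0, 0.1)   # stand-in for playouts
    err_single += (a - true_value) ** 2
    err_mixed += (ensemble_eval(a, b) - true_value) ** 2

# The ratio of mean squared errors is close to 0.5 for independent,
# equal-variance estimators.
print(err_mixed / err_single)
```

The same logic is why combining a policy/value net with playouts can beat either alone, as long as their errors are not perfectly correlated.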
> (3) CNN cannot learn the exclusive-or function due to the ReLU
> activation function, instead of the traditional sigmoid (tangent
> hyperbolic). CNN is good at approximating continuous (analog)
> functions but not Boolean (digital) ones.
Are you sure this is correct? Especially if we allow leaky ReLU?
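For what it's worth, a two-unit hidden layer with plain ReLU activations can represent XOR exactly, which suggests the quoted claim is too strong. A minimal hand-constructed sketch (the weights are my own, not from any trained network): xor(x1, x2) = relu(x1 + x2) - 2 * relu(x1 + x2 - 1).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hand-picked weights: both hidden units compute x1 + x2, with biases
# 0 and -1; the output layer combines them as relu(s) - 2 * relu(s - 1).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])

def xor_net(x):
    return float(w2 @ relu(x @ W1 + b1))

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), xor_net(np.array([x1, x2], dtype=float)))
# Outputs 0, 1, 1, 0 -- exactly XOR.
```

Whether gradient descent reliably *finds* such weights is a separate question from representational capacity, but the function itself is clearly within reach of a ReLU network.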