[Computer-go] mini-max with Policy and Value network

Gian-Carlo Pascutto gcp at sjeng.org
Tue May 23 03:56:53 PDT 2017


On 23-05-17 10:51, Hideki Kato wrote:
> (2) The number of possible positions (input of the value net) in 
> real games is at least 10^30 (10^170 in theory).  Can the value 
> net recognize them all?  L&Ds (life-and-death statuses) depend on 
> very small differences in the placement of stones or liberties.  
> Can we provide the necessary amount of training data?  Does the 
> network have enough capacity?  The answer is almost obvious from 
> the theory of function approximation.  (An ANN is just a 
> non-linear function approximator.)

DCNNs clearly have some ability to generalize from learned data and
perform reasonably even on unseen examples, so I don't find this a very
compelling argument. It's not as if Monte Carlo playouts handle all
sequences correctly either.

Evaluations are heuristic guidance for the search, and a fallback when
the search terminates in an unresolved position. Having multiple
independent ones improves the accuracy of the heuristic - a basic
ensemble.
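A quick numerical sketch of the ensemble point (my own toy numbers, not anything from a real engine): if two evaluators are unbiased and their errors are independent with equal variance, averaging them halves the mean squared error of the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 0.3   # hypothetical win probability of some position
sigma = 0.1        # assumed noise level of each heuristic evaluator
n = 100_000        # number of simulated evaluations

# Two independent, unbiased evaluators (e.g. value net and playouts).
eval_a = true_value + rng.normal(0.0, sigma, n)
eval_b = true_value + rng.normal(0.0, sigma, n)

# Basic ensemble: average the two estimates.
ensemble = (eval_a + eval_b) / 2.0

mse_single = np.mean((eval_a - true_value) ** 2)
mse_ensemble = np.mean((ensemble - true_value) ** 2)
print(mse_single, mse_ensemble)
```

With independent errors the ensemble MSE comes out at roughly half the single-evaluator MSE; in practice a value net and playouts are correlated, so the gain is smaller but usually still real.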

> (3) CNNs cannot learn the exclusive-or function due to the ReLU 
> activation function, as opposed to the traditional sigmoid 
> (hyperbolic tangent).  CNNs are good at approximating continuous 
> (analog) functions but not Boolean (digital) ones.

Are you sure this is correct? Especially if we allow leaky ReLU?
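For what it's worth, even plain ReLU suffices: a two-hidden-unit network with hand-picked weights computes XOR exactly, since XOR(a, b) = ReLU(a + b) - 2·ReLU(a + b - 1). A minimal check (my own illustration, not from the thread):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hidden units: h1 = ReLU(a + b), h2 = ReLU(a + b - 1)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # (2 inputs, 2 hidden units)
b1 = np.array([0.0, -1.0])
# Output: y = h1 - 2*h2
w2 = np.array([1.0, -2.0])

def xor_net(a, b):
    h = relu(np.array([a, b], dtype=float) @ W1 + b1)
    return float(h @ w2)

# Truth table: (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

So representation is not the problem; whether gradient descent reliably finds such weights in a given architecture is a separate question.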

-- 
GCP
