[Computer-go] mini-max with Policy and Value network

Hideki Kato hideki_katoh at ybb.ne.jp
Tue May 23 08:19:01 PDT 2017


Gian-Carlo Pascutto: <a0b16b16-6591-a195-1f93-93dbe88332d3 at sjeng.org>:
>On 23-05-17 10:51, Hideki Kato wrote:
>> (2) The number of possible positions (input of the value net) in 
>> real games is at least 10^30 (10^170 in theory).  Can the value 
>> net recognize them all?  L&D (life and death) depends on very small 
>> differences in the placement of stones or liberties.  Can we provide 
>> the necessary amount of training data?  Does the network have enough 
>> capacity?  The answer is almost obvious from the theory of function 
>> approximation.  (An ANN is just a non-linear function 
>> approximator.)
>
>DCNN clearly have some ability to generalize from learned data and
>perform OK even with unseen examples. So I don't find this a very
>compelling argument. It's not like Monte Carlo playouts are going to
>handle all sequences correctly either.

A CNN can generalize when global shapes can be built from smaller 
local shapes.  L&D of a large group is an exception because it is 
too sensitive to the details of the position (ie, it can be very 
global).  We can't expect much from such generalization in L&D. 

In our experiments, the value net judges a group to be alive if it has 
a large enough space.  That's all. 
#Actually, it is the opposite.  The value net judges a group to be dead 
if and only if it is short of liberties.  Some nakade shapes can be 
solved if the outer liberties are almost filled.

Additionally, the value net frequently mistakes false eyes for true 
ones, especially on the first line.  (This problem can also be very 
global and is very hard to solve without search.)

The value net by itself cannot handle L&D correctly, but it allows 
such a deep search that this problem is hidden (ie, hard to notice).

>Evaluations are heuristic guidance for the search, and a help when the
>search terminates in an unresolved position. Having multiple independent
>ones improves the accuracy of the heuristic - a basic ensemble.

The value net approximates the "true" value function of Go very 
coarsely.  Rollouts (MC simulations) fill in the details.  This could 
be the best ensemble.
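
As a rough illustration of such an ensemble, the sketch below blends a 
value-net estimate with the mean result of some rollouts at a leaf.  The 
function name, the [-1, 1] value scale and the default weight lam=0.5 are 
all assumptions made for this example, not taken from any particular 
program.

```python
from statistics import mean


def mixed_leaf_value(value_net_estimate, rollout_outcomes, lam=0.5):
    """Blend a value-net estimate with the mean rollout outcome.

    All values are from the same player's point of view, in [-1, 1].
    lam = 0 trusts only the value net; lam = 1 trusts only the rollouts.
    (Hypothetical helper; the weighting scheme is an assumption.)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if not rollout_outcomes:
        # No rollouts were played: fall back to the value net alone.
        return value_net_estimate
    rollout_value = mean(rollout_outcomes)
    return (1.0 - lam) * value_net_estimate + lam * rollout_value


# Hypothetical leaf: the value net likes the position (0.6),
# but three of the four rollouts end in a loss.
print(mixed_leaf_value(0.6, [-1, -1, -1, 1]))
```

With equal weights the optimistic value-net estimate is pulled down 
almost to zero by the rollouts, which is the sense in which the rollouts 
can correct a coarse evaluation.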

>> (3) CNN cannot learn the exclusive-or function due to the ReLU 
>> activation function, used instead of the traditional sigmoid 
>> (hyperbolic tangent).  CNN is good at approximating continuous 
>> (analog) functions but not Boolean (digital) ones.
>
>Are you sure this is correct? Especially if we allow leaky ReLU?

Do you know that the success of "DEEP" CNN comes from the use of 
ReLU?  With sigmoid the gradient easily vanishes, while with ReLU it 
does not.  However, ReLU cannot represent sharp edges while sigmoid 
can.  DCNN (with ReLU) approximates functions in a piecewise-linear 
style.
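
The gradient and piecewise-linear points can be checked with a small 
sketch.  It assumes a 20-layer chain with all weights set to 1 and the 
same pre-activation at every layer, which isolates the effect of the 
activation function.  tiny_relu_net is a hand-written toy, not a 
trained network.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # never larger than 0.25


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # exactly 1 on the active side


def chained_grad(grad_fn, pre_activations):
    """Product of activation derivatives along one path through the layers.

    Weights are taken as 1 (an assumption) so only the activation matters.
    """
    g = 1.0
    for z in pre_activations:
        g *= grad_fn(z)
    return g


depth = 20
zs = [1.0] * depth                      # assumed pre-activation per layer
print(chained_grad(sigmoid_grad, zs))   # tiny: each factor is below 0.25
print(chained_grad(relu_grad, zs))      # 1.0


def tiny_relu_net(x):
    """Toy one-hidden-layer ReLU net on a scalar input (a 'tent' shape)."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)


# Between the kinks at 0, 1 and 2 the output is a straight line.
for x in (-1.0, 0.5, 1.0, 1.5, 3.0):
    print(x, tiny_relu_net(x))
```

The sigmoid product shrinks geometrically with depth, while the ReLU 
product stays at 1 on the active path.  The toy net is built only from 
straight segments joined at kinks.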

Hideki
-- 
Hideki Kato <mailto:hideki_katoh at ybb.ne.jp>