[Computer-go] CNN for winrate and territory

Detlef Schmicker ds2 at physik.de
Sun Feb 8 02:22:29 PST 2015


Hi,

I am working on a CNN for winrate and territory:

Approach:
  - input: 2 layers, one for black and one for white stones
  - 1st output: 1 layer of territory (0.0 for owned by white, 1.0 for owned
    by black; I used SIGMOID because I missed TANH in the first place)
  - 2nd output: a label from -60 to +60 for the territory lead of black
The losses of both outputs are trained.
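
A minimal sketch of how the two losses described above could be combined, written in plain NumPy rather than in the training framework actually used. The function names, the clamping of leads outside -60..+60, and the unweighted sum of the two losses are assumptions made for illustration; the post does not specify them.

```python
import numpy as np

NUM_CLASSES = 121  # one class per territory lead from -60 to +60


def margin_to_class(margin):
    """Map black's territory lead to a class index 0..120.

    Assumption: leads outside [-60, 60] are clamped to the end classes.
    """
    return int(np.clip(round(margin), -60, 60)) + 60


def territory_loss(logits, target):
    """Mean sigmoid cross-entropy over all board points.

    target holds values in [0, 1]: 0.0 = owned by white, 1.0 = owned by black.
    """
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(target * np.log(p + eps)
                          + (1.0 - target) * np.log(1.0 - p + eps)))


def margin_loss(logits, cls):
    """Softmax cross-entropy for a single margin class."""
    z = logits - np.max(logits)  # shift for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))
    return float(-log_probs[cls])


def combined_loss(terr_logits, terr_target, margin_logits, margin):
    """Sum of both losses (equal weighting is an assumption)."""
    return (territory_loss(terr_logits, terr_target)
            + margin_loss(margin_logits, margin_to_class(margin)))
```

With untrained (all-zero) outputs, the combined loss is log(2) + log(121), which is a handy sanity check for the starting value.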

The idea is that this way I do not have to put komi into the input, and I
can derive the winrate from the distribution over the trained label:

e.g. for komi 6.5 I sum the probabilities from +7 to +60 and get something
like a winrate.
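
The summation can be sketched as follows, assuming the 121 label probabilities arrive as a vector ordered from -60 to +60. The function name `black_winrate` is hypothetical.

```python
import numpy as np


def black_winrate(margin_probs, komi):
    """Sum the probability of every territory lead that beats komi.

    margin_probs: 121 probabilities for leads -60..+60 (black's view).
    """
    probs = np.asarray(margin_probs, dtype=float)
    margins = np.arange(-60, 61)
    # For komi 6.5 this selects the classes +7 .. +60.
    return float(probs[margins > komi].sum())
```

Because komi only enters through the comparison, the same network output can be re-evaluated for any komi without retraining.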

I trained with 800000 positions, with territory information from 500
oakfoam playouts, which I symmetrized with the 8 board transformations,
leading to >6000000 positions. (It is expensive to produce the positions
due to the playouts...)
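
A sketch of the 8-fold symmetrization, assuming each position is stored as a NumPy array of shape (layers, N, N). The function name `eight_symmetries` is hypothetical and this is not the code used to build the training set.

```python
import numpy as np


def eight_symmetries(planes):
    """Return the 8 rotations/reflections of a (layers, N, N) array."""
    out = []
    for k in range(4):
        rot = np.rot90(planes, k, axes=(1, 2))  # rotate the board by k * 90 degrees
        out.append(rot)
        out.append(np.flip(rot, axis=2))        # mirrored copy of that rotation
    return out
```

The territory target has to be passed through the same transformation as the input layers, while the -60 to +60 label stays unchanged.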

The layers are the same as the large network from Christopher Clark and
Amos Storkey: http://arxiv.org/abs/1412.3409


I get reasonable territory predictions from this network (compared to
500 playouts of oakfoam), but the winrates seem to be overestimated.
Anyway, it looks like it is worth doing some more work on it.

The idea is that I can eventually get the equivalent of, let's say, 1000
playouts from one call to the CNN, for the cost of 2 playouts...


Now I am trying to make a soft transition from conventional playouts to
CNN-predicted winrates within the MC framework.

I do have some ideas, but I am not happy with them.

Maybe you have better ones :)


Thanks a lot

Detlef
