[Computer-go] Value Network

Hiroshi Yamashita yss at bd.mbn.or.jp
Fri Mar 4 07:23:00 PST 2016


I tried to make a Value network.

"Policy network + Value network"  vs  "Policy network"
Winrate  Wins/Games
 70.7%    322 / 455,    1000 playouts/move
 76.6%    141 / 184,   10000 playouts/move
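
As a rough check on how much these two results can be trusted, here is a
minimal sketch that puts a 95% interval around each winrate. It uses the
normal approximation to the binomial, which is my assumption and not
something Aya or CGOS computes; winrate_interval is a hypothetical helper.

```python
import math

def winrate_interval(wins, games, z=1.96):
    """Return (winrate, low, high) for a z-sigma normal-approximation interval."""
    p = wins / games
    # Standard error of a binomial proportion.
    se = math.sqrt(p * (1.0 - p) / games)
    return p, p - z * se, p + z * se

results = [
    ("1000 playouts/move", 322, 455),
    ("10000 playouts/move", 141, 184),
]

for label, wins, games in results:
    p, low, high = winrate_interval(wins, games)
    print(f"{label}: {100 * p:.1f}% "
          f"(95% interval {100 * low:.1f}% to {100 * high:.1f}%)")
```

With these counts the two intervals overlap, so the difference between
1000 and 10000 playouts/move is not yet statistically clear.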

It seems that the more playouts there are, the more effective the
 Value network is, though there are not enough games to be sure. The
 search is similar to AlphaGo's. The mixing parameter lambda is 0.5.
 The search is synchronous and uses one GTX 980.
At 10000 playouts/move, the Policy network is called 175 times and the
 Value network is called 786 times. The node expansion threshold is 33.
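
The mixing step can be sketched as follows, assuming Aya combines the two
evaluations in the same way as the AlphaGo paper,
V = (1 - lambda) * v + lambda * z, where v is the Value network output and
z is the playout result, both in -1 to +1. mixed_leaf_value is a
hypothetical name, not a function from Aya.

```python
def mixed_leaf_value(value_net_output, playout_result, lam=0.5):
    """Blend the Value network output with a playout result.

    Both inputs are assumed to be on the same -1 (loss) to +1 (win) scale.
    lam = 0 uses only the Value network, lam = 1 uses only the playout.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    return (1.0 - lam) * value_net_output + lam * playout_result

# Example: the network slightly favours the side to move and the playout is a win.
print(mixed_leaf_value(0.4, 1.0))
```

With lambda = 0.5 the two evaluations are simply averaged.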

The Value network has
  13 layers, 128 filters. (5x5_128, 3x3_128 x10, 1x1_1, fully connected, tanh)
The Policy network has
  12 layers, 256 filters. (5x5_256, 3x3_256 x10, 3x3_1), Accuracy is 50.1%

For the Value network, I collected 15804400 positions from 987775 games.
The games are from
  tygem 9d,          22477 games http://baduk.sourceforge.net/TygemAmateur.7z
  KGS 4d and over, 1450946 games http://www.u-go.net/gamerecords-4d/
  (excluding handicap games).
I select 16 positions randomly from each game: the game is divided
 into 16 stages, and one position is selected from each stage. The 1st
 and 9th positions are rotated with the same symmetry. Then Aya searches
 each position with 500 playouts, using the Policy network, and stores
 the winrate (-1 to +1). Komi is 7.5.
 Aya with 500 playouts is around 2730 BayesElo on CGOS.
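
The per-game sampling could look roughly like this. It is only a sketch:
sample_positions is a hypothetical helper, and the way stage boundaries
are rounded and short games are handled are my assumptions, not Aya's
actual code.

```python
import random

def sample_positions(num_moves, stages=16, rng=random):
    """Pick one random move index from each of `stages` consecutive game stages."""
    picks = []
    for i in range(stages):
        # Integer boundaries of stage i within the move list.
        start = i * num_moves // stages
        end = (i + 1) * num_moves // stages
        if end > start:  # a stage is empty only when the game has fewer moves than stages
            picks.append(rng.randrange(start, end))
    return picks

rng = random.Random(0)  # fixed seed so the example is reproducible
print(sample_positions(200, rng=rng))

# 16 positions per game over 987775 games gives the position count in the post.
print(987775 * 16)
```

Taking one position per stage spreads the samples over the opening,
middle game and endgame instead of clustering them.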

I did some of this on 11 Amazon EC2 g2.2xlarge instances. It took
 2 days and cost $54. Spot instances are reasonably priced. However,
 g2.2xlarge (GRID K520) is about 3x slower than the GTX 980. My Policy
 network (12L 256F) takes 5.37ms on the GTX 980 and 15.0ms on g2.2xlarge.
Test and training losses are 0.00923 and 0.00778. I think there is
 no big overfitting.
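
The speed and cost figures above work out as follows. The per-hour price
assumes that all 11 instances ran for the full 2 days, which the post
does not state, so treat it as a rough estimate.

```python
gtx980_ms = 5.37   # Policy network time on GTX 980
k520_ms = 15.0     # Policy network time on g2.2xlarge (GRID K520)
slowdown = k520_ms / gtx980_ms

instances = 11
days = 2
total_cost_usd = 54.0
# Assumption: every instance was running for the whole 2 days.
instance_hours = instances * days * 24
cost_per_instance_hour = total_cost_usd / instance_hours

print(f"slowdown: {slowdown:.2f}x")
print(f"instance-hours: {instance_hours}")
print(f"cost: ${cost_per_instance_hour:.3f} per instance-hour")
```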

The Value network is effective, but Aya still has a fatal semeai weakness.

Hiroshi Yamashita
