[Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13

Hiroshi Yamashita yss at bd.mbn.or.jp
Thu Nov 17 13:38:51 PST 2016


Aya reaches pro level on GoQuest 9x9 and 13x13.
Aya now has the highest rating in 9x9, and the highest best rating in 13x13.
GoQuest is a Go app for Android, iPhone and web browsers.

In 9x9 and 13x13, Aya uses a Policy network and a Value network.
The Policy net is the same one used for 19x19.
It was trained on 78,000 GoGoD games, using 8 symmetries, 120,000,000 positions.
Training took one month on a GTX 980. Accuracy is 51.0%.
12 Layers, 128 Filters.
128 5x5 x1, 128 3x3 x10, 128 3x3 x1
Features are 49 channels.
The network is fully convolutional, so it can also be used on 9x9 and 13x13.
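
A minimal sketch of why the same weights can be reused on smaller boards: the number of convolution weights depends only on the channel counts and kernel sizes, not on the board size. The layer list is taken from the description above. The function names are hypothetical, and biases and any output layer are ignored because the post does not describe them.

```python
# Layer spec from the post: (filters, kernel size, repeat count).
POLICY_LAYERS = [(128, 5, 1), (128, 3, 10), (128, 3, 1)]
INPUT_CHANNELS = 49  # feature planes fed to the first layer


def layer_count(layers):
    """Total number of convolution layers in the spec."""
    return sum(repeat for _, _, repeat in layers)


def conv_weight_count(layers, in_channels):
    """Count convolution weights (biases ignored).

    Board size never appears here, which is why a fully
    convolutional net trained on 19x19 can run on 9x9 or 13x13.
    """
    total = 0
    channels = in_channels
    for filters, kernel, repeat in layers:
        for _ in range(repeat):
            total += channels * filters * kernel * kernel
            channels = filters
    return total


print(layer_count(POLICY_LAYERS))                         # 12
print(conv_weight_count(POLICY_LAYERS, INPUT_CHANNELS))   # 1778816
```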

DCNN without search is +580 (19x19), +448 (13x13) and +393 (9x9) stronger than
 GNU Go. (CGOS BayesElo)

9x9
DCNN_AyaF128a510x1  2193
Gnugo-3.7.10-a1     1800

13x13
DCNN_AyaF128a510x1  2248
Gnugo-3.7.10-a1     1800

19x19
DCNN_AyaF128a510x1  2380
Gnugo-3.7.10-a1     1800
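
To get a feel for these rating gaps, the sketch below converts an Elo difference into an expected score with the standard logistic Elo formula. This is only an approximation: BayesElo fits extra parameters (such as a draw term) that are ignored here, and the function name is hypothetical.

```python
def expected_score(elo_diff):
    """Expected score of the stronger side under the standard Elo curve."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Rating gaps over GNU Go quoted in the post.
for board, diff in [("9x9", 393), ("13x13", 448), ("19x19", 580)]:
    print(board, round(expected_score(diff), 3))
```

Under this approximation, all three gaps correspond to the net alone winning roughly nine games out of ten or more against GNU Go.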

The Value net has 32 Filters, 14 Layers.
32 5x5 x1, 32 3x3 x11, 32 1x1 x1, fully connected 256, fully connected tanh 1
Features are 50 channels.
Training positions are made from Aya's selfplay: 2,200,000 games for 9x9 and
 1,000,000 games for 13x13. 16 positions are selected from each game.
9x9   is 2000 playouts/move. komi 7.0. (CGOS 2290).
13x13 is  500 playouts/move. The Policy net is used only at the root node. komi 7.5. (CGOS 2433).
In 9x9, an opening book built from 8607 GoQuest games is used.
In 13x13, the first 16 moves are chosen according to the Policy net probabilities.
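
The two selection steps above can be sketched as follows. Both readings are assumptions: that an opening move is drawn with probability proportional to the Policy net output, and that the 16 training positions are drawn uniformly at random from a game (the post does not say how they are picked). The function names and the dict-based policy are hypothetical.

```python
import random


def sample_move(policy, rng):
    """Pick one move with probability proportional to its policy weight.

    policy: dict mapping a move (any hashable) to a non-negative weight.
    """
    moves = list(policy)
    weights = [policy[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


def pick_training_positions(num_moves, count, rng):
    """Choose `count` distinct move numbers from one game, ascending.

    Uniform sampling is an assumption, not the post's stated method.
    """
    count = min(count, num_moves)
    return sorted(rng.sample(range(num_moves), count))


rng = random.Random(0)
toy_policy = {"D4": 0.6, "C3": 0.3, "E5": 0.1, "A1": 0.0}
print(sample_move(toy_policy, rng))
print(pick_training_positions(120, 16, rng))
```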

At first, I used the playout winrate as training data: if Black's winrate at
 move 24 is 59%, the target is set to 0.59. But this is weaker than using the
 game result, 0 or 1.

Policy + Value vs Policy, 1000 playouts/move, 1000 games. 9x9, komi 7.0
0.634  using game result. 0 or 1
0.552  using game result. Cubic approximation.
0.625  using game result. Linear approximation.
0.641  using game result. 0 or 1, dropout, half, all layers
0.554  using playout winrate

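Linear approximation means the target moves linearly from 0.5 at the start of
 the game to the result at the end. If a game ends at move 60 and the result
 is a W win (0.0), then the value of the position at move 30 is 0.25.
Linear approximation does reduce training loss, though. (From 0.37 to 0.08,
 on 19x19, with B win +1.0, W win -1.0.)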
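
The linear approximation can be written as a short interpolation. This sketch uses the 0-to-1 scale of the example above (W win 0.0, B win 1.0); the function name is hypothetical.

```python
def linear_target(move_number, total_moves, result):
    """Interpolate from 0.5 at move 0 to `result` at the final move.

    result: 0.0 for a White win, 1.0 for a Black win.
    """
    if total_moves <= 0:
        raise ValueError("total_moves must be positive")
    progress = move_number / total_moves
    return 0.5 + (result - 0.5) * progress


# The example from the post: 60-move game, White wins, position at move 30.
print(linear_target(30, 60, 0.0))  # 0.25
```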

Policy + Value vs Policy, 1000 playouts/move, 13x13, komi 7.5
0.735 1000 playouts/move, 994 games

Compared with 9x9, it seems that stronger selfplay makes a stronger value net.

I also made a 19x19 Value net. The 19x19 training positions come from KGS 4d
 and above, GoGoD, Tygem and 500 playouts/move selfplay: 990,255 games.
 32 positions are selected from each game. Following Detlef's idea, I also
 use the game result. I trust B+R and W+R games with komi 5.5, 6.5 and 7.5.
 For other games, if the result is B+ and 1000 playouts at the final position
 give over +0.60, I use it.
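
One possible reading of this filtering rule, as a sketch. The function name, the SGF-style result strings, and the winrate argument are hypothetical. The White-win branch is an assumption that mirrors the Black rule; the post only states the B+ case.

```python
TRUSTED_KOMI = {5.5, 6.5, 7.5}


def keep_game(result, komi, black_playout_winrate=None):
    """Return True if the game's result is trusted as a training label.

    result: SGF-style string such as "B+R", "W+R" or "B+3.5".
    black_playout_winrate: Black's winrate from playouts at the final
        position (0.0 to 1.0), or None if it was not computed.
    """
    # Resignation games at a normal komi are trusted as they are.
    if result in ("B+R", "W+R") and komi in TRUSTED_KOMI:
        return True
    if black_playout_winrate is None:
        return False
    # Other games: playouts at the final position must agree with the result.
    if result.startswith("B+"):
        return black_playout_winrate > 0.60
    if result.startswith("W+"):
        # Assumption: symmetric threshold; not stated in the post.
        return black_playout_winrate < 0.40
    return False


print(keep_game("B+R", 6.5))          # True
print(keep_game("B+2.5", 0.5, 0.55))  # False
```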

Policy + Value vs Policy, 19x19, komi 7.5, Filter  32, Layer 14
0.640  1000 playouts/move, 995 games
0.654  1000 playouts/move, 500 games, explicit symmetry ensemble(Value net only)
0.635  1000 playouts/move, 818 games, Linear approximation
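
A sketch of what an explicit symmetry ensemble could look like, assuming it means averaging the Value net output over the 8 rotations and reflections of the board. `value_fn` is a hypothetical stand-in for the network, and the board is a plain list of rows.

```python
def rotate90(board):
    """Rotate a square board (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def mirror(board):
    """Flip a board left to right."""
    return [list(row[::-1]) for row in board]


def symmetries(board):
    """Return the 8 symmetric variants: 4 rotations, each with its mirror."""
    result = []
    current = [list(row) for row in board]
    for _ in range(4):
        result.append(current)
        result.append(mirror(current))
        current = rotate90(current)
    return result


def ensemble_value(board, value_fn):
    """Average value_fn over all 8 symmetries of the board."""
    boards = symmetries(board)
    return sum(value_fn(b) for b in boards) / len(boards)


# Toy "network" that only looks at the top-left point.
toy_board = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(ensemble_value(toy_board, lambda b: b[0][0]))  # 5.0
```

The ensemble costs 8 network evaluations per position, which may explain why it is applied to the Value net only.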

Policy + Value vs Policy, 19x19, komi 7.5, Filter 128, Layer 14
0.667   500 playouts/move, 501 games.
0.664  2000 playouts/move, 530 games.

Policy + Value vs Policy, 19x19, komi 7.5, Filter 128, Layer 14, using 2000 playouts winrate
0.694  1000 playouts/move, 572 games
0.771 10000 playouts/move, 332 games
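
Since the game counts differ a lot between these runs, a rough error bar helps when comparing winrates. The sketch below uses the normal approximation to the binomial; it is a generic statistical check, not something from the post, and the function name is hypothetical.

```python
import math


def winrate_interval(winrate, games, z=1.96):
    """Approximate 95% interval for a winrate measured over `games` games."""
    se = math.sqrt(winrate * (1.0 - winrate) / games)
    return winrate - z * se, winrate + z * se


# Results quoted above: (winrate, games).
for winrate, games in [(0.640, 995), (0.654, 500), (0.771, 332)]:
    low, high = winrate_interval(winrate, games)
    print(f"{winrate:.3f} over {games} games: {low:.3f} to {high:.3f}")
```

With a few hundred games the interval is several percent wide on each side, so small differences such as 0.640 vs 0.654 are within noise.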

Recently I found that the Black winrate is low in KGS games. This is because
 there are many komi 0.5 games, and with komi 0.5 White tends to win. Maybe
 I need to remove some White-win games.

19x19 Black winrate 0.418, komi 7.5,  30,840,000 positions, GoGoD, KGS 4d, Tygem
13x13 Black winrate 0.485, komi 7.5,  16,790,000 positions, selfplay, 500 playout/move
9x9   Black winrate 0.514, komi 7.0,  33,760,000 positions, selfplay, 2000 playout/move, draw is 0.5
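
As a back-of-the-envelope check on how many White-win positions would have to go, the sketch below computes the fraction of White-win samples to keep so that the Black winrate reaches a target. It assumes samples are dropped uniformly at random, which the post does not specify; the function name is hypothetical.

```python
def white_keep_fraction(black_winrate, target=0.5):
    """Fraction of White-win samples to keep to reach `target` Black winrate.

    Solves  b / (b + k * (1 - b)) = target  for k, where b is the
    current Black winrate.
    """
    b = black_winrate
    keep = b * (1.0 - target) / (target * (1.0 - b))
    return min(keep, 1.0)


# 19x19 figure from the table above.
print(round(white_keep_fraction(0.418), 3))
```

For the 19x19 set this means keeping roughly 72% of the White-win positions, at the cost of a smaller training set.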

Using Policy + Value (Filter 32), Aya reaches 7d on KGS.
The machine is a W3680 3.3GHz, 6 cores, with a GTX 980.
  AyaMC 4d
  AyaMC 6d  with Policy
  AyaMC 7d  with Policy and Value, handicaps <= 3, no dynamic komi.

GoQuest ranking. Bots are not listed. "spaceman" is OHASHI Hirofumi 6p.
13x13   http://wars.fm/go13#users/0
  9x9   http://wars.fm/go9#users/0
AyaZBot http://wars.fm/go9#user/:ayazbot

Aya's GoQuest rating
                9x9    13x13
:AyaXBot       2322     2407   10000 playout/move, only root node is Policy
:AyaZBot       2466     2361   year 2014
:AyaZBot       2647     2711   Policy+Value, W3680 3.3GHz, 6 core, a GTX 980
:CrazyStoneBot 2592            year 2014

  The GoQuest time setting is 5 minutes plus 3 sec/move in 13x13.
  Computers have an advantage with this setting.

I wrote an article on how to make Policy and Value networks.
I'm afraid it is in Japanese, but some of the links may be useful.
It includes Aya's network definition.

Open source
Ray-nn, Ray with Policy and Value net, CGOS 2900
Dark Forest
DeltaGo, which follows AlphaGo's SL Policy net. Accuracy 54%. DCNN is calculated on CPU.

Hiroshi Yamashita
