[Computer-go] Aya reaches pro level on GoQuest 9x9 and 13x13
yss at bd.mbn.or.jp
Thu Nov 17 13:38:51 PST 2016
Aya reaches pro level on GoQuest 9x9 and 13x13.
Aya got the highest rating in 9x9 and the highest best rating in 13x13.
GoQuest is a Go app for Android, iPhone and browsers.
In 9x9 and 13x13, Aya uses a Policy network and a Value network.
The Policy net is the same as the one for 19x19.
It is trained on 78,000 GoGoD games using 8 symmetries (120,000,000 positions).
Training took one month on a GTX 980. Accuracy is 51.0%.
12 layers, 128 filters:
128 5x5 x1, 128 3x3 x10, 128 3x3 x1 (filters, kernel size, number of layers)
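To make the layer notation concrete, here is a small Python sketch that expands it and counts convolution parameters. It reads "128 5x5 x1" as (filters, kernel size, number of layers) and assumes one bias per filter; any output projection to a single move plane is not listed in the post and is not counted. The helper names are hypothetical, not Aya's code.

```python
# Layer spec from the post: (filters, kernel size, number of layers).
POLICY_LAYERS = [(128, 5, 1), (128, 3, 10), (128, 3, 1)]
INPUT_CHANNELS = 49  # feature planes, as stated in the post


def expand_layers(spec):
    """Turn (filters, kernel, count) triples into one (filters, kernel) pair per layer."""
    return [(filters, kernel) for filters, kernel, count in spec for _ in range(count)]


def conv_param_count(spec, in_channels):
    """Count weights plus one bias per filter (the bias term is an assumption)."""
    total = 0
    channels = in_channels
    for filters, kernel in expand_layers(spec):
        total += channels * filters * kernel * kernel + filters
        channels = filters
    return total


print(len(expand_layers(POLICY_LAYERS)))                # 12
print(conv_param_count(POLICY_LAYERS, INPUT_CHANNELS))  # 1780352
```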
The input features have 49 channels.
The network is fully convolutional, so it can also be used on 9x9 and 13x13.
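Why a fully convolutional net transfers across board sizes can be shown with a toy NumPy sketch: with zero ("same") padding, every layer keeps the board's height and width, so one set of weights yields a move map for 9x9, 13x13 or 19x19. The two-layer network and its sizes here are hypothetical and far smaller than Aya's.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' 2-D cross-correlation, as used by most DL frameworks.

    x: (C_in, H, W) feature planes; w: (C_out, C_in, k, k) weights with odd k.
    The output keeps the input's H x W, whatever the board size is.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zeros outside the board
    h, wd = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) at every board point at once.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def toy_policy_logits(planes, w1, w2):
    """Two-layer toy stand-in for a fully convolutional policy net."""
    hidden = np.maximum(conv2d_same(planes, w1), 0.0)  # ReLU
    return conv2d_same(hidden, w2)[0]  # one logit per board point


rng = np.random.default_rng(0)
w1 = rng.standard_normal((8, 49, 5, 5))  # hypothetical, much smaller than Aya's net
w2 = rng.standard_normal((1, 8, 3, 3))
for size in (9, 13, 19):
    logits = toy_policy_logits(rng.standard_normal((49, size, size)), w1, w2)
    print(size, logits.shape)  # the same weights give a size x size output
```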
Without search, the DCNN is +580 (19x19), +448 (13x13) and +393 (9x9) Elo stronger than
GNU Go (CGOS BayesElo).
The Value net has 32 filters and 14 layers.
32 5x5 x1, 32 3x3 x11, 32 1x1 x1, fully connected 256, fully connected tanh 1
The input features have 50 channels.
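Unlike the Policy net, this Value net ends in fully connected layers, so the width of the first fully connected layer depends on the board size. The sketch below traces tensor shapes through the listed layers; it reads "32 1x1 x1" with the same notation as the Policy net (32 filters), which is an assumption, and the layer labels are illustrative.

```python
def value_net_shapes(board_size, in_channels=50, filters=32):
    """Trace tensor shapes through the Value net layers listed in the post.

    Only the channel count changes in the convolutional part; the flattening
    before the first fully connected layer is where the board size enters.
    """
    n = board_size
    shapes = [("input", (in_channels, n, n)), ("conv 5x5", (filters, n, n))]
    shapes += [("conv 3x3", (filters, n, n))] * 11
    shapes += [("conv 1x1", (filters, n, n)), ("flatten", (filters * n * n,))]
    shapes += [("fc 256", (256,)), ("fc tanh", (1,))]
    return shapes


for size in (9, 13, 19):
    flat_width = value_net_shapes(size)[-3][1][0]
    # Weights in the first fully connected layer (biases not counted).
    print(size, flat_width, flat_width * 256)
```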
Training positions are generated by Aya's selfplay: 2,200,000 games for 9x9
and 1,000,000 games for 13x13. 16 positions are selected from each game.
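A minimal sketch of turning one selfplay game into value-net samples follows. It assumes positions are picked uniformly at random without replacement and that every sample is labelled with the final result from Black's side (1.0 Black win, 0.0 White win, 0.5 draw); the post only says that 16 positions are selected per game, and the function name is hypothetical.

```python
import random


def sample_training_positions(moves, result, k=16, rng=random):
    """Pick k distinct positions from one self-play game and label them all with
    the final result from Black's side: 1.0 Black win, 0.0 White win, 0.5 draw.

    Uniform sampling without replacement is an assumption; the post only says
    that 16 positions are selected per game.
    """
    picks = sorted(rng.sample(range(len(moves) + 1), min(k, len(moves) + 1)))
    # Position i is the board after the first i moves.
    return [(moves[:i], result) for i in picks]
```

Taking the same label for all 16 positions is what keeps positions from one game correlated, which is presumably why only a few are taken per game.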
9x9 selfplay uses 2000 playouts/move and komi 7.0 (CGOS 2290).
13x13 selfplay uses 500 playouts/move and komi 7.5, and the Policy net is used only at the root node (CGOS 2433).
In 9x9, an opening book built from 8607 GoQuest games is used.
In 13x13, the first 16 moves are sampled from the Policy net probabilities.
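Sampling the 13x13 opening from the Policy net could look like the sketch below: for the first 16 moves a move is drawn in proportion to the policy probabilities, and afterwards the search's choice is used. The legality mask, the 0-based move numbering and the function names are assumptions for illustration.

```python
import numpy as np

RANDOM_OPENING_MOVES = 16  # 13x13 self-play: first 16 moves from the Policy net


def sample_policy_move(probs, legal, rng):
    """Sample a move index with probability proportional to the Policy net output,
    restricted to legal moves (the legality mask is an assumption)."""
    p = np.where(legal, probs, 0.0)
    p = p / p.sum()
    return int(rng.choice(len(p), p=p))


def selfplay_move(move_number, probs, legal, search_move, rng):
    """Moves 0..15 are sampled from the policy; later moves come from the search."""
    if move_number < RANDOM_OPENING_MOVES:
        return sample_policy_move(probs, legal, rng)
    return search_move
```

Sampling instead of always taking the top policy move gives selfplay games more varied openings.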
At first, I used the playout winrate as training data: if Black's winrate at move 24
is 59%, the target is 0.59. But this is weaker than using the game result (0 or 1).
Policy + Value vs Policy (win rate of Policy + Value), 1000 playouts/move, 1000 games, 9x9, komi 7.0
0.634 game result, 0 or 1
0.552 game result, cubic approximation
0.625 game result, linear approximation
0.641 game result, 0 or 1, dropout 0.5 on all layers
0.554 playout winrate
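To read these win rates as strength differences, one can convert them to Elo and attach a rough error bar. This sketch is not from the post: it uses the standard logistic Elo formula and a normal approximation to the binomial, and it ignores draws (possible at komi 7.0), which would have to be counted as half a win.

```python
import math


def elo_difference(win_rate):
    """Elo gap implied by a win rate under the logistic Elo model
    (expected score = 1 / (1 + 10 ** (-d / 400)))."""
    return -400.0 * math.log10(1.0 / win_rate - 1.0)


def win_rate_interval(win_rate, games, z=1.96):
    """Approximate 95% confidence interval (normal approximation to the binomial)."""
    half_width = z * math.sqrt(win_rate * (1.0 - win_rate) / games)
    return win_rate - half_width, win_rate + half_width


print(round(elo_difference(0.634)))  # 95
low, high = win_rate_interval(0.634, 1000)
print(round(low, 3), round(high, 3))
```

With 1000 games the interval is about plus or minus 0.03, which is useful when comparing close rows such as 0.634 and 0.641.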
Linear approximation means: if the game ends at move 60 and White wins (0.0),
then the value of the position at move 30 is 0.25.
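The linear approximation fits in one formula. The starting value of 0.5 at move 0 is an assumption, but it is the straight line through the example above (White win at move 60 gives 0.25 at move 30). The cubic variant's exact form is not given in the post, so it is not sketched here. The helper `to_tanh_scale` maps targets onto the +1.0/-1.0 convention the post uses for 19x19; both names are hypothetical.

```python
def linear_target(move, game_length, result):
    """Value target that moves linearly from 0.5 at move 0 to the final result
    (1.0 = Black win, 0.0 = White win) at the last move."""
    return 0.5 + (result - 0.5) * move / game_length


def to_tanh_scale(target):
    """Map a 0..1 target onto the -1..+1 scale (B win +1.0, W win -1.0)."""
    return 2.0 * target - 1.0


print(linear_target(30, 60, 0.0))  # 0.25, the example above
```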
Linear approximation does reduce the training loss, though (from 0.37 to 0.08;
19x19, B win +1.0, W win -1.0).
Policy + Value vs Policy, 1000 playouts/move, 13x13, komi 7.5
0.735 1000 playouts/move, 994 games
Compared with 9x9, it seems that stronger selfplay makes a stronger value net.
I also made a 19x19 Value net. The 19x19 training positions come from KGS 4d and above,
GoGoD, Tygem and 500 playouts/move selfplay: 990,255 games, with 32 positions
selected from each game. Following Detlef's idea, I also use the game result.
I trust B+R and W+R games with komi 5.5, 6.5 and 7.5. For other games,
if the result is B+ and the 1000-playout winrate at the final position is over +0.60, I use it.
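The filtering rule can be sketched as a small predicate. Two points are assumptions: the winrate is taken as Black's winrate on a 0..1 scale, and W+ games use the mirror of the B+ rule, since the post spells out only the B+ case. The result strings follow the SGF style ("B+R", "W+3.5"); the function name is hypothetical.

```python
TRUSTED_KOMI = {5.5, 6.5, 7.5}


def keep_game(result, komi, black_winrate, threshold=0.60):
    """Decide whether a 19x19 game's result is trusted as a training label.

    result: SGF-style result such as "B+R", "W+R" or "B+3.5".
    black_winrate: Black's winrate from 1000 playouts on the final position,
    assumed here to be on a 0..1 scale.
    """
    if result in ("B+R", "W+R") and komi in TRUSTED_KOMI:
        return True  # resignation with a standard komi
    if result.startswith("B+"):
        return black_winrate > threshold
    if result.startswith("W+"):
        # Assumption: mirror of the B+ rule, which is the only case the post spells out.
        return black_winrate < 1.0 - threshold
    return False  # e.g. jigo, void or unknown results
```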
Policy + Value vs Policy, 19x19, komi 7.5, Filter 32, Layer 14
0.640 1000 playouts/move, 995 games
0.654 1000 playouts/move, 500 games, explicit symmetry ensemble (Value net only)
0.635 1000 playouts/move, 818 games, Linear approximation
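An explicit symmetry ensemble for the Value net can be sketched as averaging its output over the 8 rotations and reflections of the input planes. Because the value is a single scalar, nothing has to be transformed back, unlike a policy map. `value_fn` stands for any value network and is a hypothetical placeholder.

```python
import numpy as np


def dihedral_transforms(planes):
    """Yield the 8 rotations/reflections of (C, N, N) feature planes."""
    for k in range(4):
        rotated = np.rot90(planes, k, axes=(1, 2))
        yield rotated
        yield np.flip(rotated, axis=2)


def symmetric_value(value_fn, planes):
    """Average the value net's output over all 8 board symmetries."""
    return float(np.mean([value_fn(p) for p in dihedral_transforms(planes)]))
```

This costs 8 network evaluations per position instead of 1.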
Policy + Value vs Policy, 19x19, komi 7.5, Filter 128, Layer 14
0.667 500 playouts/move, 501 games.
0.664 2000 playouts/move, 530 games.
Policy + Value vs Policy, 19x19, komi 7.5, Filter 128, Layer 14, trained on the 2000-playout winrate
0.694 1000 playouts/move, 572 games
0.771 10000 playouts/move, 332 games
Recently I found that the Black winrate is low in the KGS games. This is because there are
many komi 0.5 games, and with komi 0.5, White tends to win. Maybe I need
to remove some White-win games.
19x19 Black winrate 0.418, komi 7.5, 30,840,000 positions, GoGoD, KGS 4d, Tygem
13x13 Black winrate 0.485, komi 7.5, 16,790,000 positions, selfplay, 500 playouts/move
9x9 Black winrate 0.514, komi 7.0, 33,760,000 positions, selfplay, 2000 playouts/move, draws count as 0.5
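The Black winrate figures and the idea of removing some White-win games can be sketched as two small helpers. Counting a draw as 0.5 follows the 9x9 line above; the target Black share and the choice to drop White wins (rather than reweight them) are hypothetical, since the post does not say how many games to remove.

```python
def black_winrate(results):
    """results: "B", "W" or "D" per game; a draw counts as 0.5 (as in the 9x9 data)."""
    score = {"B": 1.0, "W": 0.0, "D": 0.5}
    return sum(score[r] for r in results) / len(results)


def white_wins_to_keep(black_wins, target):
    """Number of White wins to keep so that Black wins make up `target` of the
    decisive games: solve b / (b + w) = target for w."""
    return round(black_wins * (1.0 - target) / target)
```

For example, with 418 Black wins and a hypothetical target of 0.5, `white_wins_to_keep(418, 0.5)` keeps 418 White wins and the rest would be dropped.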
Using Policy + Value (Filter 32), Aya reaches 7d on KGS.
The machine is a W3680 3.3GHz (6 cores) with a GTX 980.
AyaMC 6d with Policy
AyaMC 7d with Policy and Value, handicaps <= 3, no dynamic komi.
In the GoQuest ranking, bots are not listed. "spaceman" is OHASHI Hirofumi 6p.
Aya's GoQuest rating
:AyaXBot 2322 2407 10000 playouts/move, Policy net only at the root node
:AyaZBot 2466 2361 year 2014
:AyaZBot 2647 2711 Policy+Value, W3680 3.3GHz, 6 cores, a GTX 980
:CrazyStoneBot 2592 year 2014
The GoQuest time setting for 13x13 is 5 minutes plus 3 seconds per move.
Computers have an advantage with this setting.
I wrote an article on how to make Policy and Value networks.
I'm afraid it is in Japanese, but some of the links may be useful.
It includes Aya's network definition.
Ray-nn, Ray with Policy and Value net, CGOS 2900
DeltaGo, a replication of AlphaGo's SL Policy net. Accuracy 54%. The DCNN is computed on the CPU.