[Computer-go] Converging to 57%

Robert Waite winstonwaite at gmail.com
Mon Aug 22 23:40:25 PDT 2016


I had subscribed to this mailing list back in the MoGo days... and remember
probably arguing that the game of go wasn't going to be beaten for years and
years. I am a little late to the game now, but was curious if anyone here
has worked with supervised learning networks like the one in the AlphaGo paper.

I have been training some networks along the lines of the AlphaGo paper and
the DarkForest paper... and a couple of others... and am working with a single
GTX 660. I know... laugh... but it's a fair benchmark and I'm being cheap for
the moment.

Breaking 50% accuracy is quite challenging... I have attempted many
permutations of learning algorithms... and can hit 40% accuracy in perhaps
4-12 hours, depending on the parameters. Some things I have tried: the
default AlphaGo setup but with 128 filters, a minibatch size of 32 and a
0.01 learning rate; changing the optimizer from vanilla SGD to Adam or
RMSProp; and changing the batching to match the DarkForest style (making
sure that each minibatch contains samples from all game phases... for
example beginning, middle and end-game). Pretty much everything seems to
converge at a rate that will really stretch out the training time. I am
planning on picking a line and going with it for an extended training run,
but was wondering if anyone has ever gotten close to the convergence rates
implied by the DarkForest paper.
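
To be concrete about what I mean by DarkForest-style batching, it is
something like the Python sketch below. The phase boundaries and helper
names are just mine, not from the paper.

import random

# Buckets of (state, action) pairs keyed by game phase. The move-number
# boundaries here are my own guess, not DarkForest's.
PHASES = ("opening", "middle", "endgame")

def phase_of(move_number):
    if move_number <= 60:
        return "opening"
    if move_number <= 160:
        return "middle"
    return "endgame"

def sample_minibatch(buckets, batch_size=32):
    # Draw a minibatch that mixes all game phases roughly evenly, instead
    # of letting one phase dominate by chance.
    per_phase = batch_size // len(PHASES)
    batch = []
    for phase in PHASES:
        batch.extend(random.sample(buckets[phase], per_phase))
    # Top up if batch_size is not divisible by the number of phases.
    while len(batch) < batch_size:
        phase = random.choice(PHASES)
        batch.append(random.choice(buckets[phase]))
    random.shuffle(batch)
    return batch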

For comparison... the Google team had 50 GPUs, spent 3 weeks... and processed
5440M state/action pairs. The FB team had 4 GPUs, spent 2 weeks and
processed 150M-200M state/action pairs. Both seemed to get to around 57%
accuracy with their networks.
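
Back-of-the-envelope (my own arithmetic from the figures above, taking 3
weeks as 21 days and 2 weeks as 14):

# Rough comparison; only as good as the figures quoted above.
google_pairs = 5440e6                    # 50 GPUs, ~3 weeks
fb_pairs = (150e6 + 200e6) / 2           # midpoint of 150M-200M, 4 GPUs, ~2 weeks
print(google_pairs / fb_pairs)           # ~31x fewer pairs for DarkForest at ~57%
print(google_pairs / (50 * 21))          # ~5.2M pairs per GPU-day (Google)
print(fb_pairs / (4 * 14))               # ~3.1M pairs per GPU-day (FB)

So if the two papers are measuring accuracy comparably, the DarkForest net
reached roughly the same accuracy on something like 1/30th of the data.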

I have also been testing the networks against GnuGo as a baseline... and find
that GnuGo can be beaten rather easily with very little network training.
My eye is on Pachi... but I think I have to break 50% accuracy to even worry
about that.

I have also played with the reinforcement learning phase... I started with a
learning rate of 0.01... which I think was too high... and it takes quite a
bit of time on my machine... so I haven't played with it much yet.
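
For anyone curious, the update I am doing there is the AlphaGo-style policy
gradient (REINFORCE with the game outcome as the reward). A toy sketch of
just the update step is below; the linear policy and the fake self-play game
are stand-ins so the snippet runs on its own, and 0.01 is the step size I
suspect was too high.

import numpy as np

np.random.seed(0)

N_POINTS = 361
N_FEATURES = 48
theta = np.zeros((N_FEATURES, N_POINTS))   # toy linear softmax policy

def policy_probs(features):
    logits = features @ theta
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

def fake_self_play_game(n_moves=50):
    # Stand-in for a real self-play game: random features, random moves,
    # and a random +1/-1 outcome.
    states = np.random.randn(n_moves, N_FEATURES)
    moves = np.random.randint(0, N_POINTS, size=n_moves)
    outcome = np.random.choice([-1.0, 1.0])
    return states, moves, outcome

def reinforce_update(states, moves, outcome, lr=0.01):
    # Nudge log p(move|state) up for games won, down for games lost.
    global theta
    for s, a in zip(states, moves):
        p = policy_probs(s)
        grad_logits = -p
        grad_logits[a] += 1.0              # d log p(a|s) / d logits
        theta += lr * outcome * np.outer(s, grad_logits)

states, moves, outcome = fake_self_play_game()
reinforce_update(states, moves, outcome)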

Anyway... does anyone have any tales of how long it took to break 50%?
What is the magic bullet that will help me converge there quickly?

Here is a long-view graph of various attempts:

https://drive.google.com/file/d/0B0BbrXeL6VyCUFRkMlNPbzV2QTQ/view

The red and blue lines are from another member who ran a minibatch of 32, a
0.01 learning rate, and 128 filters in the middle layers vs. 192. They had
4 K40 GPUs, I believe. They also used 40000 training pairs to 40000
validation pairs... so I imagine that is why they had such a spread. The
jump in accuracy is when the learning rate was decreased to 0.001, I
believe.
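
That jump is just a step schedule on the learning rate, i.e. something like
the following (the epoch at which it drops is only illustrative; in their
run it was presumably wherever accuracy plateaued):

def learning_rate(epoch, base_lr=0.01, drop_to=0.001, drop_epoch=20):
    # Step schedule: train at base_lr, then drop once progress stalls.
    return base_lr if epoch < drop_epoch else drop_to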

Closer shot:

https://drive.google.com/file/d/0B0BbrXeL6VyCRVUxUFJaWVJBdEE/view

Most runs stay between those lines... but looking at both graphs makes me
wonder whether any of them are approaching the convergence rate of DarkForest.
My gut tells me they were onto something... and I am rather curious about the
playing strength of the DarkForest SL network versus the AG SL network.

Also... here is a picture of the network's view of a position... this one was
trained to 41% accuracy and was playing itself greedily.

https://drive.google.com/file/d/0B0BbrXeL6VyCNkRmVDBIYldraWs/view
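
"Played itself greedily" just means the raw policy with no search: mask out
illegal points and take the argmax of the network output. Roughly this
(numpy only; the probability vector and legality mask come from my own code):

import numpy as np

def greedy_move(policy_probs, legal_mask):
    # policy_probs: flat array of 361 move probabilities from the SL net.
    # legal_mask: boolean array of the same shape, True where a move is legal.
    masked = np.where(legal_mask, policy_probs, 0.0)
    if masked.max() <= 0.0:
        return None                        # nothing sensible left: pass
    return int(masked.argmax())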

Oh... and another thing... AG used KGS amateur data, while the FB networks
and mine have been trained on pro games only. At one point I tested the 41%
network in the image (trained on pro data) and a 44% network trained on
amateur KGS games against GnuGo... the pro-data network soundly won... and
the amateur network soundly lost... so I have stuck with pro data since. Not
sure if the end result is the same... and I am kind of glad the AG team used
amateur data, as that removes the argument that it somehow learned Lee
Sedol's style.