[Computer-go] Replicating AlphaGo results
jim.oflaherty.jr at gmail.com
Thu Jan 28 08:29:29 PST 2016
I think the first goal was and is to find a pathway that clearly works to
reach into the upper echelons of human strength, even if the first version
used a huge amount of resources. Once found, then the approach can be
explored for efficiencies from both directions, top down (take this away
and see what we lose, if anything) and bottom up (efficiently reoriginate a
reflection of a larger pattern in a much more constrained environment).
From what I can see in the chess community, this is essentially what
happened following Deep Blue's win against Kasparov. And now there are
solutions on single desktops that can best what Deep Blue did with far more
On Thu, Jan 28, 2016 at 10:07 AM, Petr Baudis <pasky at ucw.cz> wrote:
> Since I didn't say that yet, congratulations to DeepMind!
> (I guess I'm a bit disappointed that no really new ML models had to be
> invented for this though, I was wondering e.g. about capsule networks or
> training simple iterative evaluation subroutines (for semeai etc.) by
> NTM-based approaches. Just like everyone else, color me very awed by
> such an astonishing result with just what was presented.)
> On Wed, Jan 27, 2016 at 11:15:59PM -0800, David Fotland wrote:
> > Google’s breakthrough is just as impactful as the invention of MCTS.
> > Congratulations to the team. It’s a huge leap for computer go, but more
> > importantly it shows that DNN can be applied to many other difficult
> > I just added an answer. I don’t think anyone will try to exactly
> > replicate it, but a year from now there should be several strong programs
> > using very similar techniques, with similar strength.
> > An interesting question is, who has integrated or is integrating a DNN
> > into their go program? I’m working on it. I know there are several others.
> > David
> > From: Computer-go [mailto:computer-go-bounces at computer-go.org] On
> > Behalf Of Jason Li
> > Sent: Wednesday, January 27, 2016 3:14 PM
> > To: computer-go at computer-go.org
> > Subject: Re: [Computer-go] Mastering the Game of Go with Deep Neural
> > Networks and Tree Search
> > Congratulations to Aja!
> > A question to the community. Is anyone going to replicate the
> > experimental results?
> A perfect question, I think - what can we do to replicate this,
> without Google's computational power?
> I probably couldn't have resisted giving it a try myself (especially
> given that a lot of what I do nowadays is deep NNs, though on NLP),
> but thankfully I have two deadlines coming... ;-)
> I'd propose these as the major technical points to consider when
> bringing an existing Go program (or a new one) up to an AlphaGo analog:
> * Asynchronous integration of DNN evaluation with fast MCTS. I'm
> curious about this, as I thought this would be a much bigger problem
> than it apparently is, based on old results with batch parallelization.
> I guess virtual loss makes a lot of difference? Is 1 lost playout
> I wonder if Detlef has already solved this sufficiently well in
> What's the typical lag of getting the GPU evaluation (in, I guess,
> #playouts) in oakfoam and is the throughput sufficient to score all
> expanded leaf nodes (what's the #visits?)? Sorry if this has been
> answered before.
> * Are RL Policy Networks essential? AIUI by quick reading, they are
> actually used only for RL of the value networks, and based on Fig. 4
> the value network didn't use the policy network for training but still
> got quite a bit stronger than zen/crazystone? Aside from extra work, this'd
> save us 50 GPU-days.
> (My intuition is that RL policy networks are the part that allows
> embedding knowledge about common tsumego/semeai situations in the
> value networks, because they probably have enough capacity to learn
> them. Does that make sense?)
> * Seems like the push for SL Policy Network prediction accuracy from
> 50% to 60% is really important for real-world strength (Fig. 2).
> I think right now the top open source solution has prediction
> accuracy 50%? IDK if there's any other factor (features, dataset
> size, training procedure) involved in this than "Updates were
> applied asynchronously on 50 GPUs using DistBelief 60; gradients older
> than 100 steps were discarded. Training took around 3 weeks for 340
> million training steps."
> * Value Networks require (i) 30 million self-play games (!); (ii) 50
> GPU-weeks to train the weights. This seems rather troublesome, even
> 1/10 of that is a bit problematic for individual programmers. It'd
> be interesting to see how much of that is diminishing returns and
> if a much smaller network on smaller data (+ some compromises like
> sampling the same game a few times, or adding the 8 million tygem
> corpus to the mix) could do something interesting too.
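
A rough illustration of the virtual-loss idea from the first bullet above,
as a minimal Python sketch. It is not AlphaGo's or oakfoam's actual code:
the Node class, the PUCT-style score() and the c_puct constant are all
hypothetical names chosen for this example. Each in-flight network
evaluation is counted as a temporary lost visit, so a second descent
started before the first result arrives is steered to a different leaf.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node; stats are from the viewpoint of the player who moved here."""
    prior: float = 1.0        # hypothetical policy prior for this move
    visits: int = 0           # completed evaluations backed up through here
    value_sum: float = 0.0    # sum of backed-up values in [-1, 1]
    virtual_losses: int = 0   # evaluations currently in flight through here
    children: dict = field(default_factory=dict)


def score(parent: Node, child: Node, c_puct: float = 1.0) -> float:
    """PUCT-style score where every in-flight evaluation counts as a loss."""
    n = child.visits + child.virtual_losses
    q = (child.value_sum - child.virtual_losses) / n if n else 0.0
    parent_n = parent.visits + parent.virtual_losses
    u = c_puct * child.prior * math.sqrt(parent_n + 1) / (1 + n)
    return q + u


def select_leaf(root: Node) -> list:
    """Descend to a leaf, adding one virtual loss to every node on the path."""
    root.virtual_losses += 1
    path = [root]
    node = root
    while node.children:
        parent = node
        node = max(parent.children.values(), key=lambda ch: score(parent, ch))
        node.virtual_losses += 1
        path.append(node)
    return path


def backup(path: list, value: float) -> None:
    """Swap the virtual loss for the real result once the evaluation returns.

    `value` is from the viewpoint of the player who moved into the leaf,
    and the sign flips at each step up the tree.
    """
    for node in reversed(path):
        node.virtual_losses -= 1
        node.visits += 1
        node.value_sum += value
        value = -value
```

With two equally weighted children and no results backed up yet, two
consecutive select_leaf() calls return different leaves; without the
virtual loss both would pick the same one. Whether one temporarily
"lost" playout is the right penalty is exactly the open question in the
bullet, so treat the constant as a knob.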
> In summary, it seems to me that a big part of why this approach was so
> successful is the huge computational resources applied to it, which
> is of course an obstacle (except for the big IT companies).
> I think the next main avenue of research is exploring solutions that
> are much less resource-hungry. The main problem here is the hunger at
> training time, not play time. Well, the strength of this NN running on
> a normal single-GPU machine is another big question mark, of course.
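
To put the training-time figures quoted above side by side, here is a
small back-of-the-envelope script. It only adds up the numbers already
mentioned in this thread (50 GPUs for about 3 weeks, 50 GPU-days,
50 GPU-weeks). It assumes perfect linear scaling and a GPU comparable to
the ones used, and it leaves out the cost of generating the 30 million
self-play games, so read the result as a loose lower bound rather than a
measurement.

```python
# Training cost per stage in GPU-days, taken from the figures quoted in
# the message above. The stage labels are just names for this example.
GPU_DAYS = {
    "SL policy network": 50 * 21,  # 50 GPUs for "around 3 weeks"
    "RL policy network": 50,       # "50 GPU-days"
    "value network": 50 * 7,       # "50 GPU-weeks"
}


def total_gpu_days(stages: dict) -> int:
    """Sum the per-stage training cost."""
    return sum(stages.values())


def wall_clock_years(gpu_days: float, local_gpus: int = 1) -> float:
    """Wall-clock years on `local_gpus` GPUs, assuming perfect scaling."""
    return gpu_days / local_gpus / 365


if __name__ == "__main__":
    total = total_gpu_days(GPU_DAYS)
    for name, days in GPU_DAYS.items():
        print(f"{name:20s} {days:5d} GPU-days")
    print(f"{'total':20s} {total:5d} GPU-days")
    print(f"single GPU: about {wall_clock_years(total):.1f} years")
    # The "1/10 of that" case for the value network alone:
    print(f"value network at 1/10: {GPU_DAYS['value network'] / 10:.0f} GPU-days")
```

Under these assumptions the three stages come to 1450 GPU-days, roughly
four years on a single GPU, which is the scale of the obstacle described
above.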
> Petr Baudis