[Computer-go] Mastering the Game of Go with Deep Neural Networks and Tree Search
muupan at gmail.com
Tue Feb 23 20:52:38 PST 2016
Congratulations, people at DeepMind! Your paper is very interesting to read.
I have a question about the paper. In the section on policy network training it says
> On the first pass through the training pipeline, the baseline was set to
> zero; on the second pass we used the value network vθ(s) as a baseline;
but I cannot find any other description of this "second pass". What is it?
It uses vθ(s), so it must happen at least after vθ(s) has been trained. Is it
that, after completing the whole training pipeline depicted in Fig. 1, only
the RL policy network training part is repeated? Or is training vθ(s) also
repeated? Is the second pass the last pass, or are there more passes? Sorry
if I just missed the relevant part of the paper.
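For what it's worth, the update the paper describes is standard REINFORCE with a baseline: the policy parameters move along ∂log p(a|s)/∂ρ scaled by (z − baseline), where z is the game outcome. The sketch below is not DeepMind's code, just a minimal illustration (all names are made up) of why subtracting vθ(s) on the second pass reduces the size, and hence the variance, of the update without changing its expected direction:

```python
import numpy as np

# Illustrative only: a softmax policy over 4 moves with parameters theta.
theta = np.zeros(4)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_update(action, outcome, baseline, lr=0.1):
    """One REINFORCE step: theta += lr * (z - b) * d log pi(a)/d theta.
    baseline=0 corresponds to the paper's first pass,
    baseline=v(s) to the second pass."""
    probs = softmax(theta)
    grad_logp = -probs
    grad_logp[action] += 1.0          # gradient of log pi(action | s)
    return theta + lr * (outcome - baseline) * grad_logp

# First pass: baseline 0, a win (z = +1) gives a full-size update.
theta_pass1 = reinforce_update(action=2, outcome=+1.0, baseline=0.0)

# Second pass: with a value baseline close to the outcome, the same win
# produces a much smaller (lower-variance) update.
theta_pass2 = reinforce_update(action=2, outcome=+1.0, baseline=0.9)
```

So the second pass is not a different algorithm, only a better baseline plugged into the same update rule, which is presumably why the paper mentions it only in passing.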
2016-02-13 12:21 GMT+09:00 John Tromp <john.tromp at gmail.com>:
> On Wed, Jan 27, 2016 at 1:46 PM, Aja Huang <ajahuang at google.com> wrote:
> > We are very excited to announce that our Go program, AlphaGo, has beaten
> > a professional player for the first time. AlphaGo beat the European
> > champion Fan Hui by 5 games to 0.
> It's interesting to go back nearly a decade and read a 2007 article in
> which Feng-Hsiung Hsu, Deep Blue's lead developer, made this prediction:
> "Nevertheless, I believe that a world-champion-level Go machine can be
> built within 10 years"
> Which now appears to be spot on. March 9 cannot come soon enough...
> The remainder of his prediction rings less true though:
> ", based on the same method of intensive analysis—brute force,
> basically—that Deep Blue employed for chess".