[Computer-go] action-value Q for unexpanded nodes
ajahuang at gmail.com
Wed Dec 6 02:47:01 PST 2017
2017-12-06 9:23 GMT+00:00 Gian-Carlo Pascutto <gcp at sjeng.org>:
> On 03-12-17 17:57, Rémi Coulom wrote:
> > They have a Q(s,a) term in their node-selection formula, but they
> > don't tell what value they give to an action that has not yet been
> > visited. Maybe Aja can tell us.
> FWIW I already asked Aja this exact question a bit after the paper came
> out and he told me he cannot answer questions about unpublished details.
Yes, I did ask my manager if I could answer your question but he
specifically said no. All I can say is that first-play urgency is not a
significant technical detail, and that's why we didn't specify it in the
paper.
> This is not very promising regarding reproducibility considering the AZ
> paper is even lighter on them.
> Another issue which is up in the air is whether the choice of the number
> of playouts for the MCTS part represents an implicit balancing between
> self-play and training speed. This is particularly relevant if the
> evaluation step is removed. But it's possible even DeepMind doesn't know
> the answer for sure. They had a setup, and they optimized it. It's not
> clear which parts generalize.
> (Usually one wonders about such things in terms of algorithms, but here
> one wonders about it in terms of hardware!)
> Computer-go mailing list
> Computer-go at computer-go.org