[Computer-go] action-value Q for unexpanded nodes

Aja Huang ajahuang at gmail.com
Wed Dec 6 02:47:01 PST 2017


2017-12-06 9:23 GMT+00:00 Gian-Carlo Pascutto <gcp at sjeng.org>:

> On 03-12-17 17:57, Rémi Coulom wrote:
> > They have a Q(s,a) term in their node-selection formula, but they
> > don't tell what value they give to an action that has not yet been
> > visited. Maybe Aja can tell us.
>
> FWIW I already asked Aja this exact question a bit after the paper came
> out and he told me he cannot answer questions about unpublished details.
>

Yes, I did ask my manager if I could answer your question, but he
specifically said no. All I can say is that first-play urgency is not a
significant technical detail, and that's why we didn't specify it in the
paper.

Aja
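
For readers following along, the question is about the PUCT selection rule
from the AlphaGo Zero paper, a = argmax_a [Q(s,a) + U(s,a)] with
U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)). The paper does
not say what Q(s,a) is for an action with N(s,a) = 0. A minimal sketch below
uses an assumed first-play-urgency value of 0 for unvisited actions (other
engines have tried parent-value-based reductions instead); both FPU_Q and
C_PUCT here are illustrative guesses, not the published details:

```python
import math

# Assumed first-play-urgency value for an unvisited action.
# This is NOT the (unpublished) AlphaGo Zero value, just one common choice.
FPU_Q = 0.0
C_PUCT = 1.5  # exploration constant; also an assumed value


def puct_select(children, c_puct=C_PUCT):
    """Pick the child maximizing Q + U.

    children: list of dicts with prior P, visit count N, total value W.
    """
    total_n = sum(ch["N"] for ch in children)
    best, best_score = None, -float("inf")
    for ch in children:
        # Mean action value; fall back to the FPU constant when unvisited.
        q = ch["W"] / ch["N"] if ch["N"] > 0 else FPU_Q
        u = c_puct * ch["P"] * math.sqrt(total_n) / (1 + ch["N"])
        if q + u > best_score:
            best, best_score = ch, q + u
    return best


children = [
    {"P": 0.6, "N": 10, "W": 5.0},  # visited: Q = 0.5
    {"P": 0.4, "N": 0, "W": 0.0},   # unvisited: Q falls back to FPU_Q
]
print(puct_select(children)["P"])  # → 0.4 (unvisited child wins on U term)
```

With FPU_Q = 0 the unvisited child is still selected here because its
exploration term U dominates; a more pessimistic FPU would delay that first
visit, which is exactly why the choice matters in practice.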



> This is not very promising regarding reproducibility considering the AZ
> paper is even lighter on them.
>
> Another issue which is up in the air is whether the choice of the number
> of playouts for the MCTS part represents an implicit balancing between
> self-play and training speed. This is particularly relevant if the
> evaluation step is removed. But it's possible even DeepMind doesn't know
> the answer for sure. They had a setup, and they optimized it. It's not
> clear which parts generalize.
>
> (Usually one wonders about such things in terms of algorithms, but here
> one wonders about it in terms of hardware!)
>
> --
> GCP
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>