[Computer-go] action-value Q for unexpanded nodes
alvaro.begue at gmail.com
Sun Dec 3 07:44:00 PST 2017
I am not sure where in the paper you think they use Q(s,a) for a node s
that hasn't been expanded yet. Q(s,a) is a property of an edge of the
graph. At a leaf they only use the `value' output of the neural network.
If this doesn't match your understanding of the paper, please point to the
specific paragraph that you are having trouble with.
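To make the "Q lives on edges" point concrete, here is a minimal sketch in Python. It assumes each edge stores N(s,a), W(s,a) and P(s,a), and that Q(s,a) is reported as 0.0 while N(s,a) = 0. My reading is that this matches the initialization the AGZ Methods section gives for newly expanded edges, but treat it as an assumption here. The leaf itself never consults a Q. It is expanded with the policy priors, and the value-head output is backed up through the edges on the path. The names Edge, Node, expand and backup, and the sign flip between alternating players, are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    prior: float              # P(s,a), from the policy head
    visits: int = 0           # N(s,a)
    total_value: float = 0.0  # W(s,a)

    @property
    def q(self) -> float:
        # Q(s,a) = W(s,a) / N(s,a); an unvisited edge reports 0.0 (assumed convention)
        return self.total_value / self.visits if self.visits else 0.0


@dataclass
class Node:
    edges: dict = field(default_factory=dict)  # move -> Edge (hypothetical layout)


def expand(node: Node, priors: dict) -> None:
    # A freshly expanded leaf gets one edge per move, each with N = W = 0.
    for move, p in priors.items():
        node.edges[move] = Edge(prior=p)


def backup(path: list, leaf_value: float) -> None:
    # path: edges from root to leaf; leaf_value: value-head output at the leaf,
    # from the point of view of the player to move there.
    # Assumed sign convention: each edge stores value for the player who chose it,
    # so the sign flips at every ply on the way up.
    value = leaf_value
    for edge in reversed(path):
        value = -value
        edge.visits += 1
        edge.total_value += value
```

For example, expand the root with {"D4": 0.7, "Q16": 0.3}, then back up a leaf value of 0.5 along [root.edges["D4"]]. The D4 edge ends up with N = 1 and Q = -0.5, and Q16 still reports 0.0.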
On Sun, Dec 3, 2017 at 9:53 AM, Andy <andy.olsen.tx at gmail.com> wrote:
> I don't see the AGZ paper explain what the mean action-value Q(s,a) should
> be for a node that hasn't been expanded yet. The equation for Q(s,a) has
> the term 1/N(s,a) in it because it's supposed to average over N(s,a)
> visits. But in this case N(s,a)=0 so that won't work.
> Does anyone know how this is supposed to work? Or is it another detail AGZ
> didn't spell out?
> Computer-go mailing list
> Computer-go at computer-go.org
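The place where Q of an unvisited edge actually gets read is the selection step, a = argmax over a of Q(s,a) + U(s,a). Here U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)). The following is a minimal sketch, assuming Q(s,a) is taken as 0.0 when N(s,a) = 0, which avoids the 1/N(s,a) division. The (P, N, W) tuple layout and c_puct = 1.5 are arbitrary illustrative choices.

```python
import math


def select_action(edges: dict, c_puct: float = 1.5):
    # edges: move -> (P, N, W), a hypothetical layout for one expanded node
    total_n = sum(n for _, n, _ in edges.values())
    sqrt_total = math.sqrt(total_n)
    best_move, best_score = None, -math.inf
    for move, (p, n, w) in edges.items():
        q = w / n if n > 0 else 0.0            # no division by N(s,a) = 0
        u = c_puct * p * sqrt_total / (1 + n)  # exploration bonus, largest for unvisited edges
        score = q + u
        if score > best_score:                 # ties keep the earlier move
            best_move, best_score = move, score
    return best_move
```

With {"a": (0.5, 10, 5.0), "b": (0.5, 0, 0.0)}, the unvisited move "b" is chosen because its U term dominates. Note that when no child has been visited yet, sqrt_total is 0, so every score is 0 and the tie goes to the first move in iteration order. Some programs add a small constant under the square root so the prior still decides that first pick.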