[Computer-go] action-value Q for unexpanded nodes
andy.olsen.tx at gmail.com
Sun Dec 3 06:53:02 PST 2017
I don't see where the AGZ (AlphaGo Zero) paper explains what the mean
action-value Q(s,a) should be for a node that hasn't been expanded yet.
The equation for Q(s,a) contains the factor 1/N(s,a) because it averages
over the N(s,a) visits to that action. But for an unexpanded node
N(s,a)=0, so the average is a division by zero and Q(s,a) is undefined.
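
To make the problem concrete, here is a minimal sketch in Python of the
averaging step. The names mean_action_value, total_value and q_unvisited
are my own placeholders, not from the paper. In particular q_unvisited is
a hypothetical fallback, and what value it should take is exactly the
open question here.

```python
from typing import Optional


def mean_action_value(total_value: float, visits: int,
                      q_unvisited: Optional[float] = None) -> float:
    """Return Q(s,a) as the average of the values seen over N(s,a) visits.

    total_value is the sum of the evaluations backed up through (s,a);
    visits is N(s,a). q_unvisited is a hypothetical fallback for
    N(s,a) == 0 (an assumption for illustration, not taken from the paper).
    """
    if visits > 0:
        return total_value / visits
    if q_unvisited is None:
        # With no visits the 1/N(s,a) factor has nothing to average over.
        raise ZeroDivisionError("Q(s,a) is undefined when N(s,a) == 0")
    return q_unvisited
```

With visits > 0 this is just the average. With visits == 0 it either
fails or returns whatever fallback the caller supplies, and that choice
affects which unvisited actions the search tries first.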
Does anyone know how this is supposed to work? Or is it another detail AGZ
didn't spell out?