[Computer-go] action-value Q for unexpanded nodes

Andy andy.olsen.tx at gmail.com
Sun Dec 3 06:53:02 PST 2017


I don't see where the AlphaGo Zero (AGZ) paper explains what the mean
action-value Q(s,a) should be for a node that hasn't been expanded yet. The
equation for Q(s,a) contains the factor 1/N(s,a) because it averages over
N(s,a) visits. For an unexpanded node N(s,a)=0, so the expression is undefined.

Does anyone know how this is supposed to work? Or is it another detail AGZ
didn't spell out?
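
To make the question concrete, here is a minimal Python sketch of the running average as I understand it. The names Edge, backup, and q_init are my own and not from the paper. Returning q_init for an unvisited edge is only a placeholder assumption, which is exactly the value I am asking about.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one (s, a) edge in the search tree (hypothetical layout)."""
    n: int = 0      # N(s,a): number of visits
    w: float = 0.0  # W(s,a): sum of values backed up through this edge

    def backup(self, v: float) -> None:
        # Record one more visit and accumulate its value.
        self.n += 1
        self.w += v

    def q(self, q_init: float = 0.0) -> float:
        # Q(s,a) = W(s,a) / N(s,a) is only defined once N(s,a) > 0.
        # ASSUMPTION: fall back to q_init for an unvisited edge; what this
        # value should be is the open question.
        if self.n == 0:
            return q_init
        return self.w / self.n


edge = Edge()
unvisited_q = edge.q()   # falls back to q_init because n == 0
edge.backup(0.5)
edge.backup(-1.0)
visited_q = edge.q()     # plain average of the two backed-up values
```

Whatever q_init is chosen matters in practice: a pessimistic value makes the search stick with moves it has already visited, while an optimistic one pushes it to try every move at least once.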