[Computer-go] action-value Q for unexpanded nodes

Gian-Carlo Pascutto gcp at sjeng.org
Thu Dec 7 01:15:44 PST 2017


On 03-12-17 21:39, Brian Lee wrote:
> It should default to the Q of the parent node. Otherwise, let's say that
> the root node is a losing position. Upon choosing a followup move, the Q
> will be updated to a very negative value, and that node won't get
> explored again - at least until all 362 top-level children have been
> explored and revealed to have negative values. So without initializing Q
> to the parent's Q, you would end up wasting 362 MCTS iterations.

Note that the same argument could be made for making it 0, which some
people think the AGZ paper implies, so the above can't be the entire
explanation.

That said, empirical testing indicates that initializing Q(s, a) to the
parent's Q is indeed a well-performing setting for both strong and weak
policy networks.
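
To make the choice concrete, here is a minimal sketch of PUCT child
selection in which the Q used for unvisited children is a parameter. It is
an illustration only, not code from any particular engine. Assumptions:
values lie in [-1, 1] and are stored from the point of view of the player
to move at the parent, the exploration term is the one from the AGZ paper,
and the names Child, select_child, unvisited_q and c_puct are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # policy prior P(s, a)
    visits: int = 0         # visit count N(s, a)
    value_sum: float = 0.0  # sum of backed-up values, parent's point of view


def select_child(children, unvisited_q, c_puct=1.0):
    """Return the index of the child maximising Q + U.

    unvisited_q is the Q assumed for children with zero visits,
    e.g. the parent's Q or a constant such as 0.0.
    """
    sqrt_total = math.sqrt(sum(c.visits for c in children))

    def score(c):
        q = c.value_sum / c.visits if c.visits > 0 else unvisited_q
        u = c_puct * c.prior * sqrt_total / (1 + c.visits)
        return q + u

    return max(range(len(children)), key=lambda i: score(children[i]))


# Toy losing position (assumed numbers): the parent's Q is about -0.8.
parent_q = -0.8
children = [
    Child(prior=0.70, visits=20, value_sum=-16.0),  # mean Q = -0.80
    Child(prior=0.28, visits=5, value_sum=-4.25),   # mean Q = -0.85
    Child(prior=0.02),                              # never visited
]

print(select_child(children, unvisited_q=parent_q))  # 1: stays on a visited move
print(select_child(children, unvisited_q=0.0))       # 2: jumps to the unvisited move
```

On this toy position, and under the [-1, 1] assumption, the parent's Q keeps
the search on the moves the policy prefers, while a constant 0 makes the
low-prior unvisited move look best. With a different value range or sign
convention the constant behaves differently, so treat the numbers as
illustrative only.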

-- 
GCP

