[Computer-go] action-value Q for unexpanded nodes

Álvaro Begué alvaro.begue at gmail.com
Sun Dec 3 13:39:52 PST 2017


The initial value of Q is not very important because Q+U is dominated by
the U term when the number of visits is small.
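
A minimal sketch of why, assuming the PUCT form of the exploration term
from the AlphaGo Zero paper, U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b))
/ (1 + N(s,a)). The function name puct_u, the constant c_puct = 1.5 and the
example numbers are illustrative assumptions, not values from the paper.

```python
import math


def puct_u(prior, parent_visits, edge_visits, c_puct=1.5):
    """Exploration bonus U(s,a); c_puct=1.5 is an assumed constant."""
    return c_puct * prior * math.sqrt(parent_visits) / (1 + edge_visits)


# Hypothetical edge: prior 0.5, parent visited 16 times in total.
for n in (0, 1, 3, 15):
    print(n, puct_u(prior=0.5, parent_visits=16, edge_visits=n))
```

With these numbers U falls from 3.0 at N = 0 to 0.1875 at N = 15, while Q
always lies in [-1, 1]. How strongly U dominates in practice depends on
c_puct and on the prior P(s,a).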

On Sun, Dec 3, 2017 at 3:39 PM, Brian Lee <brian.kihoon.lee at gmail.com>
wrote:

> It should default to the Q of the parent node. Otherwise, suppose the root
> node is a losing position and unvisited children start at a neutral Q such
> as 0. After a follow-up move is chosen, its Q is updated to a very negative
> value, and that node won't get explored again, at least until all 362
> top-level children have been explored and revealed to have negative values.
> So without initializing Q to the parent's Q, you would end up wasting 362
> MCTS iterations.
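
The effect described above can be reproduced with a small sketch. It is a
toy model, not the AlphaGo Zero implementation: a single root with a
uniform prior, fixed made-up child evaluations, c_puct = 1.5, and a
hypothetical helper distinct_children_visited that counts how many
different moves get tried.

```python
import math

NUM_MOVES = 362  # 361 board points plus pass on a 19x19 board


def distinct_children_visited(init_q, child_values, simulations, c_puct=1.5):
    """Run PUCT selection at a single root and count the children visited.

    init_q is the Q assumed for edges with N = 0; child_values[a] is the
    fixed, hypothetical evaluation returned every time move a is tried.
    """
    n_moves = len(child_values)
    prior = 1.0 / n_moves          # uniform prior (an assumption)
    visits = [0] * n_moves         # N(s, a)
    total_value = [0.0] * n_moves  # W(s, a)

    for _ in range(simulations):
        sqrt_total = math.sqrt(sum(visits))

        def score(a):
            q = total_value[a] / visits[a] if visits[a] else init_q
            u = c_puct * prior * sqrt_total / (1 + visits[a])
            return q + u

        best = max(range(n_moves), key=score)  # ties go to the lowest index
        visits[best] += 1
        total_value[best] += child_values[best]

    return sum(1 for n in visits if n > 0)


# Losing root: the best move is worth -0.8, every other move -0.95.
values = [-0.8] + [-0.95] * (NUM_MOVES - 1)
print(distinct_children_visited(0.0, values, NUM_MOVES))   # unvisited Q = 0
print(distinct_children_visited(-0.9, values, NUM_MOVES))  # unvisited Q = assumed parent Q
```

With these made-up numbers, the first call spends its 362 simulations on
362 different children, while the second stays on the single best move.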
>
> Brian
>
> On Sun, Dec 3, 2017 at 3:25 PM <computer-go-request at computer-go.org>
> wrote:
>
>> Send Computer-go mailing list submissions to
>>         computer-go at computer-go.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         http://computer-go.org/mailman/listinfo/computer-go
>> or, via email, send a message with subject or body 'help' to
>>         computer-go-request at computer-go.org
>>
>> You can reach the person managing the list at
>>         computer-go-owner at computer-go.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Computer-go digest..."
>>
>>
>> Today's Topics:
>>
>>    1. action-value Q for unexpanded nodes (Andy)
>>    2. Re: action-value Q for unexpanded nodes (Álvaro Begué)
>>    3. Re: action-value Q for unexpanded nodes (Andy)
>>    4. Re: action-value Q for unexpanded nodes (Rémi Coulom)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 3 Dec 2017 08:53:02 -0600
>> From: Andy <andy.olsen.tx at gmail.com>
>> To: computer-go <computer-go at computer-go.org>
>> Subject: [Computer-go] action-value Q for unexpanded nodes
>> Message-ID:
>>         <CAAtbd5Cguzt4arbSuM8-d91J31zNQ+2TKzpbXV4U5fxThHd3BQ at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I don't see the AGZ paper explain what the mean action-value Q(s,a) should
>> be for a node that hasn't been expanded yet. The equation for Q(s,a) has
>> the term 1/N(s,a) in it because it's supposed to average over N(s,a)
>> visits. But in this case N(s,a)=0 so that won't work.
>>
>> Does anyone know how this is supposed to work? Or is it another detail AGZ
>> didn't spell out?
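
For reference, the two terms under discussion can be written as follows.
This is a restatement of the paper's selection rule, with the notation
assumed from the thread.

```latex
Q(s,a) = \frac{1}{N(s,a)} \sum_{s' \mid s,a \to s'} V(s'),
\qquad
U(s,a) = c_{\mathrm{puct}} \, P(s,a) \, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)}
```

The average defining Q is 0/0 when N(s,a) = 0, whereas U stays finite there
because its denominator is 1 + N(s,a).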
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Sun, 3 Dec 2017 10:44:00 -0500
>> From: Álvaro Begué <alvaro.begue at gmail.com>
>> To: computer-go <computer-go at computer-go.org>
>> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
>> Message-ID:
>>         <CAF8dVMU_F0ue2YyKvBwVKrcSUY93WN-X9M8TgMcz+dqfbe4AaA at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I am not sure where in the paper you think they use Q(s,a) for a node s
>> that hasn't been expanded yet. Q(s,a) is a property of an edge of the
>> graph. At a leaf they only use the 'value' output of the neural network.
>>
>> If this doesn't match your understanding of the paper, please point to the
>> specific paragraph that you are having trouble with.
>>
>> Álvaro.
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Sun, 3 Dec 2017 10:27:16 -0600
>> From: Andy <andy.olsen.tx at gmail.com>
>> To: computer-go <computer-go at computer-go.org>
>> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
>> Message-ID:
>>         <CAAtbd5CBDTsJ7wHjm9MybrTDBzLhqduJiTOSN49Ce8kUT5_vXw at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>>
>> Figure 2a shows two bolded Q+U max values. The second one leads to a leaf
>> that doesn't exist yet, i.e. one that has not been expanded. Where do they
>> get that Q value from?
>>
>> The associated text doesn't clarify the situation: "Figure 2: Monte-Carlo
>> tree search in AlphaGo Zero. a Each simulation traverses the tree by
>> selecting the edge with maximum action-value Q, plus an upper confidence
>> bound U that depends on a stored prior probability P and visit count N for
>> that edge (which is incremented once traversed). b The leaf node is
>> expanded..."
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Sun, 3 Dec 2017 17:57:51 +0100 (CET)
>> From: Rémi Coulom <remi.coulom at free.fr>
>> To: computer-go at computer-go.org
>> Subject: Re: [Computer-go] action-value Q for unexpanded nodes
>> Message-ID:
>>         <1885878373.291683317.1512320271343.JavaMail.root at spooler6-g27>
>> Content-Type: text/plain; charset=utf-8
>>
>> They have a Q(s,a) term in their node-selection formula, but they don't
>> say what value they give to an action that has not yet been visited. Maybe
>> Aja can tell us.
>>
>>
>> ------------------------------
>>
>>
>> End of Computer-go Digest, Vol 95, Issue 5
>> ******************************************
>>
>
> _______________________________________________
> Computer-go mailing list
> Computer-go at computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>