[computer-go] Rapid action value estimation

Benjamin Teuber benjamin.teuber at web.de
Fri Nov 2 14:28:21 PDT 2007


I don't think anything is different at different depths in the tree.
To update RAVE after a simulation, go through every node you visited during
that simulation and, for each child of that node, update the child's RAVE
statistics if the move leading to it was played at any later point in the
simulation (up to the end of the playout).
Then, whenever you calculate the UCT value, you combine it with the RAVE
value using the weighted-average formula from [1] to give the final score.
Of course, you need to be careful with signs :-)
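
The update described above can be sketched in Python roughly as follows.
This is only a minimal sketch under assumptions that are not in this post:
the Node class, its field names and the (color, point) move encoding are
hypothetical, RAVE wins are stored from the point of view of the player who
makes the move, and only later moves by the same color are counted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical layout, chosen only for illustration.
    to_move: str                 # "B" or "W": the player to move at this node
    visits: int = 0              # UCT simulation count
    wins: float = 0.0            # UCT wins for the player who moved into this node
    rave_visits: int = 0         # RAVE sample count
    rave_wins: float = 0.0       # RAVE wins for the player who moved into this node
    children: dict = field(default_factory=dict)  # point -> Node


def update_rave(path, moves, winner):
    """Update RAVE statistics after one simulation.

    path[i] is the node reached after moves[:i]; moves holds the whole
    simulation (tree part plus playout) as (color, point) pairs.
    """
    for depth, node in enumerate(path):
        # Points played by the side to move here, from this position to the
        # end of the playout. This includes the move actually chosen here.
        later = {pt for color, pt in moves[depth:] if color == node.to_move}
        for point, child in node.children.items():
            if point in later:
                child.rave_visits += 1
                # Sign care: credit the win to the player making this move.
                if winner == node.to_move:
                    child.rave_wins += 1
```

The same loop runs at every depth of the visited path, which is why nothing
special is needed higher up in the tree.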

Btw, I don't really see the point of calculating and adding a confidence
bound for RAVE as well, since all moves will have been played almost equally
often, so I dropped that term.
Maybe Sylvain or someone else can comment on this.
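
As a rough illustration of scoring a child this way, here is a minimal
Python sketch that blends the two means and adds an exploration bonus based
on the UCT counts only, with no separate bound for RAVE. The function name,
the beta schedule sqrt(k / (3n + k)), the constants k and c, the 0.5 default
for unsampled means and the +1 smoothing are all assumptions made for the
example, not something stated in this post.

```python
import math


def combined_score(wins, visits, rave_wins, rave_visits, parent_visits,
                   k=1000.0, c=1.0):
    """Weighted average of the UCT mean and the RAVE mean, plus a UCT bonus.

    k must be positive. All statistics are from the point of view of the
    player choosing the move.
    """
    # Assumed schedule: trust RAVE early, shift to the UCT mean as the
    # parent's visit count grows.
    beta = math.sqrt(k / (3.0 * parent_visits + k))
    uct_mean = wins / visits if visits else 0.5          # assumed default
    rave_mean = rave_wins / rave_visits if rave_visits else 0.5
    value = beta * rave_mean + (1.0 - beta) * uct_mean
    # Exploration term from the UCT counts only; the RAVE bound is dropped.
    bonus = c * math.sqrt(math.log(parent_visits + 1) / (visits + 1))
    return value + bonus
```

With few visits the score is dominated by the RAVE mean, and as
parent_visits grows beta shrinks so the plain UCT mean takes over.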

Another thing: at first I didn't believe that you need to do RAVE separately
for both colors (i.e. that you should only consider later moves on the point
by the same color), as e.g. Peter Drake mentioned in a paper of his. But
after some experiments I changed my mind and think he is right =)
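
To make the difference between the two variants concrete, here is a small
Python sketch. The helper name later_points, the same_color_only flag and
the (color, point) move encoding are made up for this example.

```python
def later_points(moves, start, color, same_color_only=True):
    """Points played from index `start` on that may update RAVE for `color`."""
    return {pt for c, pt in moves[start:]
            if c == color or not same_color_only}


moves = [("B", "D4"), ("W", "Q16"), ("B", "Q4"), ("W", "D16")]
per_color = later_points(moves, 0, "B")                           # Black's own points
color_blind = later_points(moves, 0, "B", same_color_only=False)  # every point played
```

In the per-color variant a Black child at Q16 gets no RAVE update from this
simulation, because only White played there. In the color-blind variant it
would.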

Cheers,
Benjamin

On 11/2/07, Jason House <jason.james.house at gmail.com> wrote:
>
> I'd like to implement RAVE as described in [1].  I believe I have a very
> clear understanding of how to do this at the leaves of the UCT search tree.
> What I'm not sure about is how to apply RAVE results higher in the UCT
> tree.  Does anyone have any experience with this that they're willing to
> share?
>
>
> [1] http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf