[computer-go] Rapid action value estimation
Benjamin Teuber
benjamin.teuber at web.de
Sat Nov 3 15:25:57 PDT 2007
On 11/3/07, Jason House <jason.james.house at gmail.com> wrote:
>
>
>
> On Fri, 2007-11-02 at 22:28 +0100, Benjamin Teuber wrote:
> > I don't think there's something different at different depths in the
> > tree..
> > To update RAVE after a simulation, for each child of a node you
> > visited during that simulation, you update if the move leading to the
> > child was played later (until the end of the playout).
>
> I start each new simulation at the root of the search tree. That could
> make every node in the tree a child (grandchild, etc...) of a node that
> was visited. While traversing the entire tree to update values could be
> done it seems complex and seems like it may bias results too much.
>
> Do you stop at just the children of nodes that are visited and not
> extend to grandchildren?
Sure, I was only referring to direct children. For each node n you visited
during this simulation, and each move m played later in that simulation by
the player to move in position n, you update the node you would reach from
n by playing m, provided m is legal in n.
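
To make the update rule above concrete, here is a minimal sketch. The names (Node, update_rave, path, playout) are hypothetical and not taken from any real program. It also makes two assumptions not stated above: the RAVE statistics are stored in the parent, keyed by move, so that no child node has to be created, and a point played several times by the same player counts only once.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    to_move: str                     # "B" or "W": the player to move in this position
    legal_moves: frozenset           # points that are legal for to_move here
    rave_visits: dict = field(default_factory=dict)  # move -> RAVE sample count
    rave_wins: dict = field(default_factory=dict)    # move -> wins for to_move


def update_rave(path, playout, winner):
    """Update RAVE statistics after one simulation.

    path[i] is the tree node in which playout[i] was played.
    playout is the full list of (color, point) moves, tree part plus
    random part, in the order they were played.
    """
    for depth, node in enumerate(path):
        # Score from the point of view of the player to move in this node,
        # which takes care of the sign issue.
        reward = 1.0 if winner == node.to_move else 0.0
        seen = set()
        # Only moves played later than this node's own move are considered.
        for color, point in playout[depth + 1:]:
            if color != node.to_move or point in seen:
                continue  # other player's move, or already counted (assumption)
            seen.add(point)
            if point not in node.legal_moves:
                continue  # m must be legal in n
            node.rave_visits[point] = node.rave_visits.get(point, 0) + 1
            node.rave_wins[point] = node.rave_wins.get(point, 0.0) + reward
```

Only nodes on the visited path are touched, so the cost per simulation is roughly the path length times the playout length, and nothing below the direct children is updated.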
> > Then, always when you calculate the UCT value, you combine that with
> > the RAVE value with that weighted average formula to give the final
> > score.
> > Of course, you need to be careful with signs :-)
> >
> > Btw, I don't really see a point in calculating and adding the
> > confidence bound for RAVE as well, as all moves will have been played
> > almost equally often - thus I dropped the term..
> > Maybe Sylvain or someone else can comment on this..
>
> I'll experiment with this after I get the initial formula to work.
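
For reference, a rough sketch of the weighted average with the RAVE confidence bound dropped, as described in the quoted text. The weight schedule sqrt(k / (3n + k)), the constants k and c, and the function names are assumptions made for illustration, so check them against the paper you are following. Both means must be from the point of view of the player to move in the parent.

```python
import math


def rave_weight(visits, k=1000.0):
    """Weight given to the RAVE mean; 1.0 with no visits, shrinking towards 0.

    The schedule and the default k are assumptions, not tuned values.
    """
    return math.sqrt(k / (3.0 * visits + k))


def combined_value(uct_mean, rave_mean, child_visits, parent_visits,
                   k=1000.0, c=1.0):
    """Blend the UCT and RAVE estimates for one child.

    Requires child_visits >= 1 and parent_visits >= 1.
    The exploration bonus is added to the UCT part only.
    """
    beta = rave_weight(parent_visits, k)
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return beta * rave_mean + (1.0 - beta) * (uct_mean + exploration)
```

With few visits the RAVE mean dominates, and as the visit count grows the score approaches the plain UCT value.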
>
>
> > Another thing - I didn't believe that you need to do RAVE separately
> > for both colors (i.e. you should only consider later moves on the
> > point by the same color), as e.g. Peter Drake mentioned in a paper of
> > his. But after some experiments I changed my mind and think he is
> > right =)
>
>
> Do you have a link to the paper?
>
No, it seems his homepage is being reorganized.
The title was something like "Heuristics for Monte-Carlo Go"; it should be
easy to find once his site is back online.