[computer-go] Rapid action value estimation
Jason House
jason.james.house at gmail.com
Wed Nov 7 15:10:32 PST 2007
On Wed, 2007-11-07 at 14:34 -0800, Christoph Birk wrote:
> On Mon, 5 Nov 2007, Jason House wrote:
> > I implemented this yesterday. In doing so, I realized I didn't know the
> > proper way to initialize new leaves in the UCT tree. MoGo papers seem to
> > talk about a progression from always picking an unexplored leaf (AKA using
> > infinity for the upper confidence bound), to "first play urgency" (using a
> > fixed ucb for new leaves), to using patterns.
>
> What did you decide on?
> What is the difference between 'hb-678-UCTRAVE-10k' and 'hb-675-UCT-10k'?
When I coded it, I intended to use an upper confidence bound of 110%
winning rate, but it looks like I simply take the first move with no
RAVE sims. A move that has RAVE sims but no actual sims will use 110%
for the non-RAVE portion.
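
To make that rule concrete, here is a minimal sketch of the selection step as described above. It is not the actual bot code: the names `MoveStats`, `blended_value`, `select_move` and the fixed mixing weight `beta` are assumptions made for illustration, and the exploration term of the confidence bound is left out to keep the initialization logic visible.

```python
from dataclasses import dataclass

# The "110% winning rate" used for moves that have no real simulations yet.
FIRST_PLAY_VALUE = 1.10


@dataclass
class MoveStats:
    """Per-move counters at one tree node (hypothetical layout)."""
    wins: float = 0.0       # wins from real simulations through this move
    sims: int = 0           # real simulations through this move
    rave_wins: float = 0.0  # wins credited by RAVE / AMAF
    rave_sims: int = 0      # simulations credited by RAVE / AMAF


def blended_value(m: MoveStats, beta: float) -> float:
    """Mix the RAVE estimate with the plain estimate.

    beta is the weight on the RAVE part. A fixed beta is an assumption
    for this sketch; it only requires m.rave_sims > 0.
    """
    # Non-RAVE portion: fall back to 110% when there are no real sims.
    plain = m.wins / m.sims if m.sims > 0 else FIRST_PLAY_VALUE
    rave = m.rave_wins / m.rave_sims
    return beta * rave + (1.0 - beta) * plain


def select_move(moves: list[MoveStats], beta: float = 0.5) -> int:
    """Return the index of the move to descend into."""
    # A move with no RAVE sims at all is simply taken first.
    for i, m in enumerate(moves):
        if m.rave_sims == 0:
            return i
    # Otherwise pick the best blended value.
    return max(range(len(moves)), key=lambda i: blended_value(moves[i], beta))
```

With this layout, a move that has RAVE sims but no real sims is scored with 1.10 standing in for its plain winning rate, so it tends to outrank siblings with a similar RAVE estimate until it gets its first real simulation.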
The rest should match the MoGo paper, except that I may have done a very
brain-dead AMAF implementation. In the very near term, I should upgrade
it to do what Don Dailey previously suggested for AMAF.