[computer-go] analysis of UCT and BAST

Remi Munos remi.munos at inria.fr
Thu May 31 01:03:06 PDT 2007


Thanks Lukasz for the link.
I'm not sure to understand precisely the formulas. For example, for 
ego_bast_sqrt, you mention that the bast value of a node is the min of the 
max of the bast values of the children and the own value of the node plus the 
confidence interval term sqrt(explore_rat * sqrt(p) / n) where p and n are 
the number of visits of the parent and current nodes.
Do you add to the own value, an addition smoothness sequence (the delta term 
in the bast paper)? Here I think a such smoothness term could be 
k gamma^d /sqrt(n) 
(for some well chosen k and gamma constants, with a large k and gamma<1)
where d if the depth of the current node.
This would take into account the time needed for the bias (difference between 
the true value of a node and the empirical average of the obtained rewards) 
to decrease to 0 (since we know that from the sqrt sequence above, the bias 
will decrease with rate O(1/sqrt(n)).
From your formula, it does not seem that you add such an additional smoothness 
sequence, so I would say that this bound would be very close to a ego_sqrt 
(because the argument in the min above will most always be the second 
argument).
Tell me if I misunderstood the formulas of your student.
Remi



On Wednesday 30 May 2007 21:05:15 Łukasz Lew wrote:
> I'm not sure whether You have noticed, but my student made an
> empirical comparison
> between BAST, UCT and other formulas.
>
> It can  be found here:
> http://students.mimuw.edu.pl/~fg219435/Go/
>
> Best Regards,
> Lukasz Lew
>
> On 5/30/07, Remi Munos <remi.munos at inria.fr> wrote:
> > I have updated the BAST paper, providing additional comparison with UCT,
> > as suggested by one person in the list. See:
> > https://hal.inria.fr/inria-00150207
> >
> > Basically, in the considered examples, BAST with an appropriate
> > smoothness sequence performs always better than both UCT and Flat-UCB.
> > Now, the best smoothness coefficient is always smaller than the constant
> > corresponding to the true smoothnes of the tree, and sometimes very close
> > to 0 (which corresponds to UCT).
> > Since go is way more complex than the problem of optimizing a simple 1-d
> > function, I would say that no theoretical work could predict whether BAST
> > would do better than UCT or not, in Monte-Carlo-go.
> >
> > Best, Remi
> >
> > _______________________________________________
> > computer-go mailing list
> > computer-go at computer-go.org
> > http://www.computer-go.org/mailman/listinfo/computer-go/
>
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/



-- 
Rémi Munos
Directeur de recherche INRIA
Sequential Learning project
http://www.grappa.univ-lille3.fr/cgi-bin/twiki/view/Sequel/Munos


More information about the computer-go mailing list