[computer-go] analysis of UCT and BAST
Łukasz Lew
lukasz.lew at gmail.com
Thu May 31 08:10:28 PDT 2007
On 5/31/07, Remi Munos <remi.munos at inria.fr> wrote:
> Thanks Lukasz for the link.
> I'm not sure to understand precisely the formulas. For example, for
> ego_bast_sqrt, you mention that the bast value of a node is the min of the
As I said previously this is not my experiment.
You can reach the author - Filip Gruszczynski - using
gruszczy (at) gmail (dot) com
> max of the bast values of the children and the own value of the node plus the
> confidence interval term sqrt(explore_rat * sqrt(p) / n) where p and n are
> the number of visits of the parent and current nodes.
> Do you add to the own value, an addition smoothness sequence (the delta term
> in the bast paper)? Here I think a such smoothness term could be
> k gamma^d /sqrt(n)
> (for some well chosen k and gamma constants, with a large k and gamma<1)
> where d if the depth of the current node.
> This would take into account the time needed for the bias (difference between
> the true value of a node and the empirical average of the obtained rewards)
> to decrease to 0 (since we know that from the sqrt sequence above, the bias
> will decrease with rate O(1/sqrt(n)).
> From your formula, it does not seem that you add such an additional smoothness
> sequence, so I would say that this bound would be very close to a ego_sqrt
> (because the argument in the min above will most always be the second
> argument).
> Tell me if I misunderstood the formulas of your student.
The simplest and most definite way is to look at the source code.
Best Regards,
Lukasz
PS
Filip noted that more complicated versions of BAST were worse.
>
>
> On Wednesday 30 May 2007 21:05:15 Łukasz Lew wrote:
> > I'm not sure whether You have noticed, but my student made an
> > empirical comparison
> > between BAST, UCT and other formulas.
> >
> > It can be found here:
> > http://students.mimuw.edu.pl/~fg219435/Go/
> >
> > Best Regards,
> > Lukasz Lew
> >
> > On 5/30/07, Remi Munos <remi.munos at inria.fr> wrote:
> > > I have updated the BAST paper, providing additional comparison with UCT,
> > > as suggested by one person in the list. See:
> > > https://hal.inria.fr/inria-00150207
> > >
> > > Basically, in the considered examples, BAST with an appropriate
> > > smoothness sequence performs always better than both UCT and Flat-UCB.
> > > Now, the best smoothness coefficient is always smaller than the constant
> > > corresponding to the true smoothnes of the tree, and sometimes very close
> > > to 0 (which corresponds to UCT).
> > > Since go is way more complex than the problem of optimizing a simple 1-d
> > > function, I would say that no theoretical work could predict whether BAST
> > > would do better than UCT or not, in Monte-Carlo-go.
> > >
> > > Best, Remi
> > >
> > > _______________________________________________
> > > computer-go mailing list
> > > computer-go at computer-go.org
> > > http://www.computer-go.org/mailman/listinfo/computer-go/
> >
> > _______________________________________________
> > computer-go mailing list
> > computer-go at computer-go.org
> > http://www.computer-go.org/mailman/listinfo/computer-go/
>
>
>
> --
> Rémi Munos
> Directeur de recherche INRIA
> Sequential Learning project
> http://www.grappa.univ-lille3.fr/cgi-bin/twiki/view/Sequel/Munos
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>
More information about the computer-go
mailing list