[computer-go] Experiments with UCT

Chris Fant chrisfant at gmail.com
Tue Jul 25 14:46:57 PDT 2006


Are you experimenting with these parameters for 5x5, 6x6, 7x7 to get
an idea of how they scale with boardsize so that you can make a good
guess at what they should be on a board that's harder to optimize,
like 9x9?


On 7/25/06, Don Dailey <drd at mit.edu> wrote:
> This is very interesting.   Of course I realize that my program does not
> do "pure" mini-max because there are always some scores "mixed" in from
> sibling nodes to parent nodes, no matter how deep I search.
>
> I previously believed that my algorithm is guaranteed to converge on a
> winning move if one is available, but now I'm not as sure.   I reasoned
> that even if a lousy line get's degraded to the point where a few lucky
> continuations make it look good,  it should become increasingly more
> difficult for this to happen since the subtree's also carry information
> about the proper way to continue (and these get strengthened with
> depth.)
>
> However, what you are saying makes more sense.  I believe I should be
> doing something like what you propose, increasing my constant as a
> function of the sample size.
>
> I'm not sure it will matter in practice, but it might.
>
> My constant is 0.999996
> and my value for N, the number of pseudo wins (or FUDGE) is still being
> experimented with, but I'm trying values as low as 12 to as high as 100.
> It's hard to tell how much this really matters.   Right this minute I am
> using 24.
>
> - Don
>
>
>
>
>
>
>
> On Tue, 2006-07-25 at 18:56 +0200, Rémi Coulom wrote:
> > Don Dailey wrote:
> > > But I slowly "degrade" the samples as Lazarus searches.  I degrade the
> > > number of simulation and the results in the same proportion by
> > > multiplying by a fraction very close to 1.0.   So as soon as a move is
> > > completed and the statistics accounted for,  I degrade all siblings
> > > including the move that was played by multiplying by the degrade
> > > constant.
> >
> > That's interesting.
> >
> > Is your degradation constant really constant? If you keep it constant,
> > losing moves are guaranteed to be searched a fraction of the time. I
> > guess this would make it impossible to prove that your algorithm
> > converges to the optimal action. It is a little annoying not to have
> > this theoretical guarantee.
> >
> > If you make the degradation "constant" go to 1, then your algorithm
> > converges to the best move. I would use a value of the kind x = 1 - C/t,
> > where C is a constant, and t is the number of real (not degraded)
> > simulations than went through this position. Besides the theoretical
> > aspect, my intuition tells me that it might be better in practice too.
> >
> > It would be very interesting to compare your method to UCT. It would be
> > great if you could implement the UCT formula I gave in my previous
> > message, and report the result. Also, if you give your values for N and
> > the degradation constant, I would try your algorithm in Crazy Stone.
> >
> > I believe your ideas and UCT can be combined. I will try that too.
> >
> > Thank you very much for your very interesting message.
> >
> > Rémi
>
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>


More information about the computer-go mailing list