[computer-go] Experiments with UCT
Don Dailey
drd at mit.edu
Tue Jul 25 15:24:54 PDT 2006
On Tue, 2006-07-25 at 17:46 -0400, Chris Fant wrote:
> Are you experimenting with these parameters for 5x5, 6x6, 7x7 to get
> an idea of how they scale with boardsize so that you can make a good
> guess at what they should be on a board that's harder to optimize,
> like 9x9?
>
No. I'm only trying to optimize for 9x9 right now. My program can
play on any square board but I have no idea what would happen on bigger
or smaller boards.
Maybe we set up a 7x7 CGOS server some time and see what happens?
- Don
>
> On 7/25/06, Don Dailey <drd at mit.edu> wrote:
> > This is very interesting. Of course I realize that my program does not
> > do "pure" mini-max because there are always some scores "mixed" in from
> > sibling nodes to parent nodes, no matter how deep I search.
> >
> > I previously believed that my algorithm is guaranteed to converge on a
> > winning move if one is available, but now I'm not as sure. I reasoned
> > that even if a lousy line get's degraded to the point where a few lucky
> > continuations make it look good, it should become increasingly more
> > difficult for this to happen since the subtree's also carry information
> > about the proper way to continue (and these get strengthened with
> > depth.)
> >
> > However, what you are saying makes more sense. I believe I should be
> > doing something like what you propose, increasing my constant as a
> > function of the sample size.
> >
> > I'm not sure it will matter in practice, but it might.
> >
> > My constant is 0.999996
> > and my value for N, the number of pseudo wins (or FUDGE) is still being
> > experimented with, but I'm trying values as low as 12 to as high as 100.
> > It's hard to tell how much this really matters. Right this minute I am
> > using 24.
> >
> > - Don
> >
> >
> >
> >
> >
> >
> >
> > On Tue, 2006-07-25 at 18:56 +0200, Rémi Coulom wrote:
> > > Don Dailey wrote:
> > > > But I slowly "degrade" the samples as Lazarus searches. I degrade the
> > > > number of simulation and the results in the same proportion by
> > > > multiplying by a fraction very close to 1.0. So as soon as a move is
> > > > completed and the statistics accounted for, I degrade all siblings
> > > > including the move that was played by multiplying by the degrade
> > > > constant.
> > >
> > > That's interesting.
> > >
> > > Is your degradation constant really constant? If you keep it constant,
> > > losing moves are guaranteed to be searched a fraction of the time. I
> > > guess this would make it impossible to prove that your algorithm
> > > converges to the optimal action. It is a little annoying not to have
> > > this theoretical guarantee.
> > >
> > > If you make the degradation "constant" go to 1, then your algorithm
> > > converges to the best move. I would use a value of the kind x = 1 - C/t,
> > > where C is a constant, and t is the number of real (not degraded)
> > > simulations than went through this position. Besides the theoretical
> > > aspect, my intuition tells me that it might be better in practice too.
> > >
> > > It would be very interesting to compare your method to UCT. It would be
> > > great if you could implement the UCT formula I gave in my previous
> > > message, and report the result. Also, if you give your values for N and
> > > the degradation constant, I would try your algorithm in Crazy Stone.
> > >
> > > I believe your ideas and UCT can be combined. I will try that too.
> > >
> > > Thank you very much for your very interesting message.
> > >
> > > Rémi
> >
> > _______________________________________________
> > computer-go mailing list
> > computer-go at computer-go.org
> > http://www.computer-go.org/mailman/listinfo/computer-go/
> >
> _______________________________________________
> computer-go mailing list
> computer-go at computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
More information about the computer-go
mailing list