[computer-go] Experiments with UCT

Don Dailey drd at mit.edu
Tue Jul 25 14:42:32 PDT 2006


This is very interesting.   Of course I realize that my program does not
do "pure" mini-max because there are always some scores "mixed" in from
sibling nodes to parent nodes, no matter how deep I search.  

I previously believed that my algorithm is guaranteed to converge on a
winning move if one is available, but now I'm not as sure.   I reasoned
that even if a lousy line get's degraded to the point where a few lucky
continuations make it look good,  it should become increasingly more
difficult for this to happen since the subtree's also carry information
about the proper way to continue (and these get strengthened with
depth.)

However, what you are saying makes more sense.  I believe I should be
doing something like what you propose, increasing my constant as a
function of the sample size.

I'm not sure it will matter in practice, but it might.

My constant is 0.999996
and my value for N, the number of pseudo wins (or FUDGE) is still being
experimented with, but I'm trying values as low as 12 to as high as 100.
It's hard to tell how much this really matters.   Right this minute I am
using 24.

- Don







On Tue, 2006-07-25 at 18:56 +0200, Rémi Coulom wrote:
> Don Dailey wrote:
> > But I slowly "degrade" the samples as Lazarus searches.  I degrade the
> > number of simulation and the results in the same proportion by
> > multiplying by a fraction very close to 1.0.   So as soon as a move is
> > completed and the statistics accounted for,  I degrade all siblings
> > including the move that was played by multiplying by the degrade
> > constant.  
> 
> That's interesting.
> 
> Is your degradation constant really constant? If you keep it constant, 
> losing moves are guaranteed to be searched a fraction of the time. I 
> guess this would make it impossible to prove that your algorithm 
> converges to the optimal action. It is a little annoying not to have 
> this theoretical guarantee.
> 
> If you make the degradation "constant" go to 1, then your algorithm 
> converges to the best move. I would use a value of the kind x = 1 - C/t, 
> where C is a constant, and t is the number of real (not degraded) 
> simulations than went through this position. Besides the theoretical 
> aspect, my intuition tells me that it might be better in practice too.
> 
> It would be very interesting to compare your method to UCT. It would be 
> great if you could implement the UCT formula I gave in my previous 
> message, and report the result. Also, if you give your values for N and 
> the degradation constant, I would try your algorithm in Crazy Stone.
> 
> I believe your ideas and UCT can be combined. I will try that too.
> 
> Thank you very much for your very interesting message.
> 
> Rémi



More information about the computer-go mailing list