[computer-go] Experiments with UCT
Rémi Coulom
Remi.Coulom at univ-lille3.fr
Tue Jul 25 09:56:58 PDT 2006
Don Dailey wrote:
> But I slowly "degrade" the samples as Lazarus searches. I degrade the
> number of simulation and the results in the same proportion by
> multiplying by a fraction very close to 1.0. So as soon as a move is
> completed and the statistics accounted for, I degrade all siblings
> including the move that was played by multiplying by the degrade
> constant.
That's interesting.
Is your degradation constant really constant? If you keep it constant,
losing moves are guaranteed to be searched a fraction of the time. I
guess this would make it impossible to prove that your algorithm
converges to the optimal action. It is a little annoying not to have
this theoretical guarantee.
If you make the degradation "constant" go to 1, then your algorithm
converges to the best move. I would use a value of the kind x = 1 - C/t,
where C is a constant, and t is the number of real (not degraded)
simulations than went through this position. Besides the theoretical
aspect, my intuition tells me that it might be better in practice too.
It would be very interesting to compare your method to UCT. It would be
great if you could implement the UCT formula I gave in my previous
message, and report the result. Also, if you give your values for N and
the degradation constant, I would try your algorithm in Crazy Stone.
I believe your ideas and UCT can be combined. I will try that too.
Thank you very much for your very interesting message.
Rémi
More information about the computer-go
mailing list