[Computer-go] Question about exploration in CLOP
Remi.Coulom at free.fr
Tue Nov 15 09:50:45 PST 2011
My implementation is very basic (and inefficient). I use Gibbs sampling (i.e., Metropolis-Hastings applied one dimension at a time, which scales better to higher dimensions), with candidates drawn uniformly over the parameter range. The details of the implementation are in CSPWeight.cpp. I have found it to be good enough in practice, but I will improve it. It should be easy to use the quadratic regression to define a candidate distribution that is much better than uniform.
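For readers who want something concrete, here is a minimal sketch of that kind of sampler (often called Metropolis-within-Gibbs): one coordinate is resampled at a time, with the candidate value drawn uniformly over that coordinate's range. It is not the code in CSPWeight.cpp. The names `metropolis_within_gibbs`, `weight`, `lower`, `upper` and `example_weight` are hypothetical, and the Gaussian-shaped weight is only a stand-in for whatever weight function is actually being sampled.

```python
import math
import random


def metropolis_within_gibbs(weight, lower, upper, n_sweeps, rng=None):
    """Draw points with density proportional to weight(x) inside a box.

    weight       -- hypothetical callable: list of floats -> non-negative float
    lower, upper -- per-dimension bounds of the parameter range
    n_sweeps     -- number of full passes over all coordinates
    """
    rng = rng or random.Random()
    dim = len(lower)
    # Start from a uniformly random point inside the box.
    x = [rng.uniform(lower[i], upper[i]) for i in range(dim)]
    w = weight(x)
    samples = []
    for _ in range(n_sweeps):
        for i in range(dim):
            # Change coordinate i only; the candidate value is uniform
            # over the whole range of that coordinate.
            candidate = list(x)
            candidate[i] = rng.uniform(lower[i], upper[i])
            w_new = weight(candidate)
            # The uniform proposal does not depend on the current point,
            # so the acceptance probability is min(1, w_new / w).
            if w_new >= w or rng.random() * w < w_new:
                x, w = candidate, w_new
        samples.append(list(x))
    return samples


def example_weight(x):
    """Illustrative weight only: a Gaussian bump centred at (0.3, 0.7)."""
    return math.exp(-((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2) / (2 * 0.1 ** 2))


samples = metropolis_within_gibbs(
    example_weight, [0.0, 0.0], [1.0, 1.0], n_sweeps=2000, rng=random.Random(1)
)
```

With uniform candidates, most proposals land in low-weight regions and are rejected once the weight becomes concentrated, which is the inefficiency mentioned above. A candidate distribution shaped by the regression would replace the `rng.uniform` draw, and the acceptance test would then also need the ratio of proposal densities.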
Also, an unrelated note about priors: it is a good idea to use a pessimistic prior for the mean/LCB estimation, and a more optimistic prior for the regression. I did not mention this in the paper. It prevents the algorithm from iterating forever until the winning rate is close to 100%. It is not critical for performance, but it may help a bit.
On 15 Nov 2011, at 17:59, Brian Sheppard wrote:
> I would like to know more about the exploration methods that you tested in
> CLOP. Let's start with Metropolis-Hastings.
> I understand Metropolis-Hastings as having a current point P, which has a
> weight Wp, and randomly sampling a point Q, which has weight Wq. Then your
> next point will be Q if Wq >= Wp, or if Wq < Wp then move to Q with
> probability Wq/Wp, and keep P otherwise. Do I have that right?
> My question concerns the space over which Q is sampled. Is it just random
> over the whole domain? Or a radius around P?