[Computer-go] FYI KL-UCB

Hideki Kato hideki_katoh at ybb.ne.jp
Mon Jul 22 23:50:07 PDT 2013


Thanks, Lukasz, for introducing such an interesting paper.

I have a question, though.  The second algorithm in Figures 1, 2 and 3 
is labeled UCB2 but is apparently called MOSS in Section 5 (and in 
Section 1).  Do you know which algorithm was actually used in the 
numerical experiments?

BTW, I guess that for MC Go programs the least "risky" algorithm may be 
the best in practice, don't you think?

Hideki

Łukasz Lew: <CAPXT8E4pMwmvkiiTuyHHpBVavgeUPGQLNnODyJoAmFGo0uOo_g at mail.gmail.com>:
>KL-UCB algorithm
>http://arxiv.org/pdf/1102.2490v4.pdf
>
>"Thus, KL-UCB is optimal for Bernoulli distributions and strictly dominates
>α-UCB for any bounded reward distributions."
>http://www.princeton.edu/~sbubeck/SurveyBCB12.pdf (page 18)
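
For readers who want to see what the quoted claim refers to, here is a
minimal sketch of the KL-UCB index for Bernoulli (win/loss) rewards, as I
understand the linked paper: the index of an arm is the largest q such that
pulls * kl(mean, q) <= log(t) + c * log(log(t)).  The function names
bernoulli_kl and kl_ucb_index, the bisection depth, the clipping constant
and the default c = 0 are my own assumptions for illustration, not code
taken from the paper or from any Go program.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q).

    Both arguments are clipped away from 0 and 1 (an assumption made here
    only to keep the logarithms finite).
    """
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, t, c=0.0, iters=50):
    """Upper confidence bound for one arm, found by bisection.

    mean  : empirical mean reward of the arm, in [0, 1]
    pulls : number of times the arm has been played
    t     : total number of plays so far (t >= 1)
    c     : weight of the log(log(t)) term (hypothetical default of 0)
    """
    if pulls == 0:
        return 1.0  # an unplayed arm gets the largest possible index
    budget = math.log(t)
    if c > 0.0 and t > 1:
        budget += c * math.log(math.log(t))
    level = max(budget, 0.0) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        # kl(mean, q) is increasing in q for q >= mean, so bisection applies.
        if bernoulli_kl(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Compare with a Hoeffding-style bound of the same exploration budget.
    mean, pulls, t = 0.5, 10, 100
    print("KL-UCB index:   ", kl_ucb_index(mean, pulls, t))
    print("Hoeffding bound:", mean + math.sqrt(math.log(t) / (2 * pulls)))
```

By Pinsker's inequality, kl(p, q) >= 2 (p - q)^2, so with c = 0 this index
never exceeds mean + sqrt(log(t) / (2 * pulls)), and it becomes noticeably
tighter when the empirical mean is close to 0 or 1.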
-- 
Hideki Kato <mailto:hideki_katoh at ybb.ne.jp>


