[Computer-go] FYI KL-UCB
hideki_katoh at ybb.ne.jp
Mon Jul 22 23:50:07 PDT 2013
Thank you for introducing such an interesting paper.
I have a question, though. The second algorithm in Figures 1, 2 and 3
is termed UCB2 but is apparently called MOSS in Sections 5 (and 1). Do
you know which algorithm is actually used in the numerical experiments?
BTW, I guess that for MC Go programs the least "risky" algorithm would
possibly be the best in practice, wouldn't it?
Łukasz Lew: <CAPXT8E4pMwmvkiiTuyHHpBVavgeUPGQLNnODyJoAmFGo0uOo_g at mail.gmail.com>:
>"Thus, KL-UCB is optimal for Bernoulli distributions and strictly dominates
>a-UCB for any
>bounded reward distributions."
>http://www.princeton.edu/~sbubeck/SurveyBCB12.pdf (page 18)
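
For readers who have not seen the index the quote refers to, here is a
minimal sketch of a KL-UCB upper bound for Bernoulli rewards. It is not
taken from the paper's code: the function names, the bisection solver and
the plain log(t) exploration term (no log log t correction) are my own
assumptions for illustration.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_ucb_index(mean, pulls, t, iters=50):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t).

    Hypothetical helper: the exploration budget log(t) is an assumed,
    simplified choice; the bound is found by bisection because KL(mean, q)
    is increasing in q for q >= mean.
    """
    if pulls == 0:
        return 1.0  # an unvisited arm gets the most optimistic value
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid still satisfies the constraint, move up
        else:
            hi = mid  # mid is too optimistic, move down
    return lo

# Example: a move with a 50% win rate after 10 and after 100 playouts.
few = kl_ucb_index(0.5, 10, 100)
many = kl_ucb_index(0.5, 100, 100)
```

At each step one would play the arm with the largest index. By Pinsker's
inequality the index is never above mean + sqrt(log(t) / (2 * pulls)), so
it is at least as tight as a Hoeffding-style bound with the same budget.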
Hideki Kato <mailto:hideki_katoh at ybb.ne.jp>