[Computer-go] Multi-armed bandit problem theory

Petr Baudis pasky at ucw.cz
Wed Oct 26 02:23:54 PDT 2011


  Does anyone have a good source for understanding the theory behind
the multi-armed bandit problem, e.g. the proofs of the bounds on how
often each arm is played? My only source so far is Auer et al., 2002:
Finite-time Analysis of the Multiarmed Bandit Problem - but I suspect
its description of the original bound is incomplete and/or simplified
with some implicit assumptions (e.g. in the case of the optimal arm,
wouldn't the bound involve division by zero?).
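
  To make the concern concrete, the bound in question has roughly the
following form. This is a sketch in my own notation, so the symbols
T_j(n) (number of plays of arm j in the first n plays), p_j (reward
distribution of arm j), p^* (reward distribution of the best arm) and
D (Kullback-Leibler divergence) are assumptions here, not a quotation:

    \mathbb{E}[T_j(n)] \le \left( \frac{1}{D(p_j \,\|\, p^*)} + o(1) \right) \ln n

If j were the optimal arm, p_j = p^* would give D(p_j || p^*) = 0 and
the right-hand side would be undefined, which is the division by zero
mentioned above.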

  Everyone refers to Lai & Robbins, 1985 and Agrawal, 1995, but I'm
unable to find these papers anywhere (my university's JSTOR subscription
somehow magically doesn't seem to cover Agrawal, 1995). I'm hoping
that I could grasp the details if I read those. Does anyone have
a copy?


				Petr "Pasky" Baudis
We live on an island surrounded by a sea of ignorance. As our island
of knowledge grows, so does the shore of our ignorance. -- J. A. Wheeler
