[Computer-go] Multi-armed bandit problem theory

"Ingo Althöfer" 3-Hirn-Verlag at gmx.de
Wed Oct 26 04:56:03 PDT 2011


Not a direct answer, but a bit of information:
Bandit theory was started in the early 1950s by Herbert Robbins
(the same Robbins as in the 1985 paper). However, he did
not prove the best possible bounds in that seminal paper.

Ingo.




-------- Original Message --------
> Date: Wed, 26 Oct 2011 11:23:54 +0200
> From: Petr Baudis <pasky at ucw.cz>
> To: computer-go at computer-go.org
> Subject: [Computer-go] Multi-armed bandit problem theory

>   Hi!
> 
>   Does anyone have a good source for understanding the theory behind
> the multi-armed bandit problem, i.e. the proof behind the exponential
> arm play bounds etc.? My only source so far is Auer et al., 2002:
> Finite-time Analysis of the Multiarmed Bandit Problem - but I suspect
> its description of the original bound is incomplete and/or simplified
> with some implicit assumptions (e.g. in the case of the optimal arm,
> the bound would involve division by zero?).
> 
>   Everyone refers to Lai & Robbins, 1985 and Agrawal, 1995, but I'm
> unable to find these papers anywhere (my university JSTOR subscription
> somehow magically doesn't seem to cover Agrawal, 1995). I'm hoping
> that maybe I could grasp the details if I read those, does anyone have
> a copy?
> 
>   Thanks,
> 
> -- 
> 				Petr "Pasky" Baudis
> We live on an island surrounded by a sea of ignorance. As our island
> of knowledge grows, so does the shore of our ignorance. -- J. A. Wheeler
> _______________________________________________
> Computer-go mailing list
> Computer-go at dvandva.org
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go



