A simple multi-armed bandit algorithm with optimal variation-bounded regret

Elad Hazan, Satyen Kale
Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:817-820, 2011.

Abstract

We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.

Cite this Paper


BibTeX
@InProceedings{pmlr-v19-hazan11b, title = {A simple multi-armed bandit algorithm with optimal variation-bounded regret}, author = {Hazan, Elad and Kale, Satyen}, booktitle = {Proceedings of the 24th Annual Conference on Learning Theory}, pages = {817--820}, year = {2011}, editor = {Kakade, Sham M. and von Luxburg, Ulrike}, volume = {19}, series = {Proceedings of Machine Learning Research}, address = {Budapest, Hungary}, month = {09--11 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v19/hazan11b/hazan11b.pdf}, url = {https://proceedings.mlr.press/v19/hazan11b.html}, abstract = {We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.} }
Endnote
%0 Conference Paper %T A simple multi-armed bandit algorithm with optimal variation-bounded regret %A Elad Hazan %A Satyen Kale %B Proceedings of the 24th Annual Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2011 %E Sham M. Kakade %E Ulrike von Luxburg %F pmlr-v19-hazan11b %I PMLR %P 817--820 %U https://proceedings.mlr.press/v19/hazan11b.html %V 19 %X We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.
RIS
TY - CPAPER TI - A simple multi-armed bandit algorithm with optimal variation-bounded regret AU - Elad Hazan AU - Satyen Kale BT - Proceedings of the 24th Annual Conference on Learning Theory DA - 2011/12/21 ED - Sham M. Kakade ED - Ulrike von Luxburg ID - pmlr-v19-hazan11b PB - PMLR DP - Proceedings of Machine Learning Research VL - 19 SP - 817 EP - 820 L1 - http://proceedings.mlr.press/v19/hazan11b/hazan11b.pdf UR - https://proceedings.mlr.press/v19/hazan11b.html AB - We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms. ER -
APA
Hazan, E. & Kale, S.. (2011). A simple multi-armed bandit algorithm with optimal variation-bounded regret. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:817-820 Available from https://proceedings.mlr.press/v19/hazan11b.html.

Related Material