Almost Optimal Exploration in Multi-Armed Bandits

Zohar Karnin, Tomer Koren, Oren Somekh
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1238-1246, 2013.

Abstract

We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in today's large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithms outperform the state-of-the-art and scale better as the size of the problem increases.
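The fixed-budget algorithm the abstract refers to proceeds by repeatedly halving the set of candidate arms, spending an equal share of the pull budget on each surviving arm in each round. Below is a minimal, hedged sketch of that halving scheme; the `pull` callback interface and the exact budget split per round are illustrative assumptions, not the paper's verbatim pseudocode.

```python
import math
import random

def sequential_halving(pull, n_arms, budget):
    """Fixed-budget best-arm identification by repeated halving.

    pull(i) returns one stochastic reward sample for arm i
    (hypothetical interface chosen for this sketch).
    """
    arms = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(rounds):
        if len(arms) == 1:
            break
        # Split the budget evenly across rounds and surviving arms.
        pulls = max(1, budget // (len(arms) * rounds))
        means = {i: sum(pull(i) for _ in range(pulls)) / pulls for i in arms}
        # Keep the empirically better half of the arms.
        arms.sort(key=lambda i: means[i], reverse=True)
        arms = arms[: math.ceil(len(arms) / 2)]
    return arms[0]
```

With Bernoulli arms and a budget of a few thousand pulls, the survivor is the arm with the highest empirical mean after the final round; each arm is pulled roughly `budget / (|S_r| * log2(n))` times in round r, which is what keeps the total pull count within the budget.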

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-karnin13,
  title     = {Almost Optimal Exploration in Multi-Armed Bandits},
  author    = {Karnin, Zohar and Koren, Tomer and Somekh, Oren},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {1238--1246},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {3},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/karnin13.pdf},
  url       = {https://proceedings.mlr.press/v28/karnin13.html},
  abstract  = {We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases.}
}
Endnote
%0 Conference Paper
%T Almost Optimal Exploration in Multi-Armed Bandits
%A Zohar Karnin
%A Tomer Koren
%A Oren Somekh
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-karnin13
%I PMLR
%P 1238--1246
%U https://proceedings.mlr.press/v28/karnin13.html
%V 28
%N 3
%X We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases.
RIS
TY  - CPAPER
TI  - Almost Optimal Exploration in Multi-Armed Bandits
AU  - Zohar Karnin
AU  - Tomer Koren
AU  - Oren Somekh
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-karnin13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 1238
EP  - 1246
L1  - http://proceedings.mlr.press/v28/karnin13.pdf
UR  - https://proceedings.mlr.press/v28/karnin13.html
AB  - We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases.
ER  - 
APA
Karnin, Z., Koren, T. & Somekh, O. (2013). Almost Optimal Exploration in Multi-Armed Bandits. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):1238-1246. Available from https://proceedings.mlr.press/v28/karnin13.html.
