Multi-Player Bandits – a Musical Chairs Approach

Jonathan Rosenski, Ohad Shamir, Liran Szlak
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:155-163, 2016.

Abstract

We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees.

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-rosenski16, title = {Multi-Player Bandits -- a Musical Chairs Approach}, author = {Rosenski, Jonathan and Shamir, Ohad and Szlak, Liran}, booktitle = {Proceedings of The 33rd International Conference on Machine Learning}, pages = {155--163}, year = {2016}, editor = {Balcan, Maria Florina and Weinberger, Kilian Q.}, volume = {48}, series = {Proceedings of Machine Learning Research}, address = {New York, New York, USA}, month = {20--22 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v48/rosenski16.pdf}, url = {https://proceedings.mlr.press/v48/rosenski16.html}, abstract = {We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees.} }
Endnote
%0 Conference Paper %T Multi-Player Bandits – a Musical Chairs Approach %A Jonathan Rosenski %A Ohad Shamir %A Liran Szlak %B Proceedings of The 33rd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2016 %E Maria Florina Balcan %E Kilian Q. Weinberger %F pmlr-v48-rosenski16 %I PMLR %P 155--163 %U https://proceedings.mlr.press/v48/rosenski16.html %V 48 %X We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees.
RIS
TY - CPAPER TI - Multi-Player Bandits – a Musical Chairs Approach AU - Jonathan Rosenski AU - Ohad Shamir AU - Liran Szlak BT - Proceedings of The 33rd International Conference on Machine Learning DA - 2016/06/11 ED - Maria Florina Balcan ED - Kilian Q. Weinberger ID - pmlr-v48-rosenski16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 48 SP - 155 EP - 163 L1 - http://proceedings.mlr.press/v48/rosenski16.pdf UR - https://proceedings.mlr.press/v48/rosenski16.html AB - We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees. ER -
APA
Rosenski, J., Shamir, O. & Szlak, L.. (2016). Multi-Player Bandits – a Musical Chairs Approach. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:155-163 Available from https://proceedings.mlr.press/v48/rosenski16.html.

Related Material