A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Pratik Gajane, Tanguy Urvoy, Fabrice Clérot
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:218-227, 2015.

Abstract

We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-gajane15, title = {A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits}, author = {Gajane, Pratik and Urvoy, Tanguy and Clérot, Fabrice}, booktitle = {Proceedings of the 32nd International Conference on Machine Learning}, pages = {218--227}, year = {2015}, editor = {Bach, Francis and Blei, David}, volume = {37}, series = {Proceedings of Machine Learning Research}, address = {Lille, France}, month = {07--09 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v37/gajane15.pdf}, url = {https://proceedings.mlr.press/v37/gajane15.html}, abstract = {We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.} }
Endnote
%0 Conference Paper %T A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits %A Pratik Gajane %A Tanguy Urvoy %A Fabrice Clérot %B Proceedings of the 32nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2015 %E Francis Bach %E David Blei %F pmlr-v37-gajane15 %I PMLR %P 218--227 %U https://proceedings.mlr.press/v37/gajane15.html %V 37 %X We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.
RIS
TY - CPAPER TI - A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits AU - Pratik Gajane AU - Tanguy Urvoy AU - Fabrice Clérot BT - Proceedings of the 32nd International Conference on Machine Learning DA - 2015/06/01 ED - Francis Bach ED - David Blei ID - pmlr-v37-gajane15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 37 SP - 218 EP - 227 L1 - http://proceedings.mlr.press/v37/gajane15.pdf UR - https://proceedings.mlr.press/v37/gajane15.html AB - We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications. ER -
APA
Gajane, P., Urvoy, T. & Clérot, F.. (2015). A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:218-227 Available from https://proceedings.mlr.press/v37/gajane15.html.

Related Material