Online Learning with Composite Loss Functions

Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres
Proceedings of The 27th Conference on Learning Theory, PMLR 35:1214-1231, 2014.

Abstract

We study a new class of online learning problems where each of the online algorithm's actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. This class includes problems where the algorithm's loss is the minimum over the recent adversarial values, the maximum over the recent values, or a linear combination of the recent values. We analyze the minimax regret of this class of problems when the algorithm receives bandit feedback, and prove that when the minimum or maximum functions are used, the minimax regret is Ω̃(T^{2/3}) (so-called hard online learning problems), and when a linear function is used, the minimax regret is Õ(√T) (so-called easy learning problems). Previously, the only online learning problem that was known to be provably hard was the multi-armed bandit with switching costs.
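To make the setup concrete, here is a minimal sketch of the composite-loss model described above: an adversary assigns a value to each action at each step, and the learner's loss at step t is a fixed function (minimum, maximum, or linear combination) of the values of its last m chosen actions. All names, the memory size m, and the random play sequence are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, m = 1000, 3, 2          # horizon, number of actions, memory size (illustrative)
values = rng.random((T, k))   # adversarial value of each action at each step

def composite_losses(actions, combine):
    """Loss at step t combines the values of the learner's last m actions."""
    losses = []
    for t in range(m - 1, T):
        recent = [values[s, actions[s]] for s in range(t - m + 1, t + 1)]
        losses.append(combine(recent))
    return losses

actions = rng.integers(0, k, size=T)  # an arbitrary (random) play sequence

# "Hard" composite loss: minimum over the window (minimax regret ~ T^(2/3))
min_loss = sum(composite_losses(actions, min))
# "Easy" composite loss: linear combination, here the window average (~ sqrt(T))
linear_loss = sum(composite_losses(actions, lambda r: sum(r) / m))
```

Under bandit feedback the learner would observe only its own loss values, not the full `values` matrix; the sketch just illustrates how the loss of a play sequence is composed from recent adversarial values.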

Cite this Paper


BibTeX
@InProceedings{pmlr-v35-dekel14,
  title     = {Online Learning with Composite Loss Functions},
  author    = {Dekel, Ofer and Ding, Jian and Koren, Tomer and Peres, Yuval},
  booktitle = {Proceedings of The 27th Conference on Learning Theory},
  pages     = {1214--1231},
  year      = {2014},
  editor    = {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba},
  volume    = {35},
  series    = {Proceedings of Machine Learning Research},
  address   = {Barcelona, Spain},
  month     = {13--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v35/dekel14.pdf},
  url       = {https://proceedings.mlr.press/v35/dekel14.html},
  abstract  = {We study a new class of online learning problems where each of the online algorithm's actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. This class includes problems where the algorithm's loss is the \emph{minimum} over the recent adversarial values, the \emph{maximum} over the recent values, or a \emph{linear combination} of the recent values. We analyze the minimax regret of this class of problems when the algorithm receives bandit feedback, and prove that when the \emph{minimum} or \emph{maximum} functions are used, the minimax regret is $\widetilde{\Omega}(T^{2/3})$ (so-called \emph{hard} online learning problems), and when a linear function is used, the minimax regret is $\widetilde{O}(\sqrt{T})$ (so-called \emph{easy} learning problems). Previously, the only online learning problem that was known to be provably hard was the multi-armed bandit with switching costs.}
}
Endnote
%0 Conference Paper
%T Online Learning with Composite Loss Functions
%A Ofer Dekel
%A Jian Ding
%A Tomer Koren
%A Yuval Peres
%B Proceedings of The 27th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2014
%E Maria Florina Balcan
%E Vitaly Feldman
%E Csaba Szepesvári
%F pmlr-v35-dekel14
%I PMLR
%P 1214--1231
%U https://proceedings.mlr.press/v35/dekel14.html
%V 35
%X We study a new class of online learning problems where each of the online algorithm's actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. This class includes problems where the algorithm's loss is the minimum over the recent adversarial values, the maximum over the recent values, or a linear combination of the recent values. We analyze the minimax regret of this class of problems when the algorithm receives bandit feedback, and prove that when the minimum or maximum functions are used, the minimax regret is Ω̃(T^{2/3}) (so-called hard online learning problems), and when a linear function is used, the minimax regret is Õ(√T) (so-called easy learning problems). Previously, the only online learning problem that was known to be provably hard was the multi-armed bandit with switching costs.
RIS
TY  - CPAPER
TI  - Online Learning with Composite Loss Functions
AU  - Ofer Dekel
AU  - Jian Ding
AU  - Tomer Koren
AU  - Yuval Peres
BT  - Proceedings of The 27th Conference on Learning Theory
DA  - 2014/05/29
ED  - Maria Florina Balcan
ED  - Vitaly Feldman
ED  - Csaba Szepesvári
ID  - pmlr-v35-dekel14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 35
SP  - 1214
EP  - 1231
L1  - http://proceedings.mlr.press/v35/dekel14.pdf
UR  - https://proceedings.mlr.press/v35/dekel14.html
AB  - We study a new class of online learning problems where each of the online algorithm's actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. This class includes problems where the algorithm's loss is the minimum over the recent adversarial values, the maximum over the recent values, or a linear combination of the recent values. We analyze the minimax regret of this class of problems when the algorithm receives bandit feedback, and prove that when the minimum or maximum functions are used, the minimax regret is Ω̃(T^{2/3}) (so-called hard online learning problems), and when a linear function is used, the minimax regret is Õ(√T) (so-called easy learning problems). Previously, the only online learning problem that was known to be provably hard was the multi-armed bandit with switching costs.
ER  -
APA
Dekel, O., Ding, J., Koren, T. & Peres, Y. (2014). Online Learning with Composite Loss Functions. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:1214-1231. Available from https://proceedings.mlr.press/v35/dekel14.html.