Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback

Ankan Saha, Ambuj Tewari; JMLR W&CP 15:636-642, 2011.

Abstract

The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and Flaxman et al. (2005). Such a setting models a decision maker who has to make decisions in the face of adversarially chosen convex loss functions. Moreover, the only feedback the decision maker receives is the loss incurred; the identities of the loss functions themselves are not revealed. In this setting, we reduce the gap between the best known lower and upper bounds for the class of smooth convex functions, i.e., convex functions with a Lipschitz continuous gradient. Building upon existing work on self-concordant regularizers and one-point gradient estimation, we give the first algorithm whose expected regret, ignoring constant and logarithmic factors, is O(T^{2/3}).

[pdf]
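
The abstract mentions one-point gradient estimation, the bandit-feedback ingredient introduced by Flaxman et al. (2005): querying the loss once at a randomly perturbed point gives an unbiased estimate of the gradient of a smoothed version of the loss. The Python sketch below illustrates that estimator only; the function names and the quadratic test loss are illustrative assumptions, and it does not reproduce the paper's full algorithm (which also uses a self-concordant barrier regularizer).

    import numpy as np

    def one_point_gradient_estimate(loss, x, delta, rng):
        # Draw a direction u uniformly from the unit sphere.
        u = rng.standard_normal(x.shape[0])
        u /= np.linalg.norm(u)
        # Query the loss once at the perturbed point: this single scalar is the
        # only feedback available in the bandit setting.
        value = loss(x + delta * u)
        # (d / delta) * value * u is an unbiased estimate of the gradient of the
        # delta-smoothed loss (Flaxman et al., 2005).
        return (x.shape[0] / delta) * value * u

    # Illustrative usage on a fixed quadratic loss (an assumption for this demo):
    # averaging many estimates recovers the smoothed-loss gradient, here equal to x.
    rng = np.random.default_rng(0)
    loss = lambda z: 0.5 * np.dot(z, z)
    x = np.ones(5)
    estimates = [one_point_gradient_estimate(loss, x, 0.5, rng) for _ in range(20000)]
    print(np.mean(estimates, axis=0))  # approximately (1, 1, 1, 1, 1)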


