Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization

Brendan McMahan

Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization

Brendan McMahan

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:525-533, 2011.

Abstract

We prove that many mirror descent algorithms for online convex optimization (such as online gradient descent) have an equivalent interpretation as follow-the-regularized-leader (FTRL) algorithms. This observation makes the relationships between many commonly used algorithms explicit, and provides theoretical insight on previous experimental observations. In particular, even though the FOBOS composite mirror descent algorithm handles $L_1$ regularization explicitly, it has been observed that the FTRL-style Regularized Dual Averaging (RDA) algorithm is even more effective at producing sparsity. Our results demonstrate that the key difference between these algorithms is how they handle the cumulative $L_1$ penalty. While FOBOS handles the $L_1$ term exactly on any given update, we show that it is effectively using subgradient approximations to the $L_1$ penalty from previous rounds, leading to less sparsity than RDA, which handles the cumulative penalty in closed form. The FTRL-Proximal algorithm, which we introduce, can be seen as a hybrid of these two algorithms, and significantly outperforms both on a large, real-world dataset.

Cite this Paper

BibTeX


@InProceedings{pmlr-v15-mcmahan11b,
  title = 	 {Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization},
  author = 	 {McMahan, Brendan},
  booktitle = 	 {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {525--533},
  year = 	 {2011},
  editor = 	 {Gordon, Geoffrey and Dunson, David and Dudík, Miroslav},
  volume = 	 {15},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Fort Lauderdale, FL, USA},
  month = 	 {11--13 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v15/mcmahan11b/mcmahan11b.pdf},
  url = 	 {https://proceedings.mlr.press/v15/mcmahan11b.html},
  abstract = 	 {We prove that many mirror descent algorithms for online convex optimization (such as online gradient descent) have an equivalent interpretation as follow-the-regularized-leader (FTRL) algorithms. This observation makes the relationships between many commonly used algorithms explicit, and provides theoretical insight on previous experimental observations.  In particular, even though the FOBOS composite mirror descent algorithm handles $L_1$ regularization explicitly, it has been observed that the FTRL-style Regularized Dual Averaging (RDA) algorithm is even more effective at producing sparsity.  Our results demonstrate that the key difference between these algorithms is how they handle the cumulative $L_1$ penalty. While FOBOS handles the $L_1$ term exactly on any given update, we show that it is effectively using subgradient approximations to the $L_1$ penalty from previous rounds, leading to less sparsity than RDA, which handles the cumulative penalty in closed form.  The FTRL-Proximal algorithm, which we introduce, can be seen as a hybrid of these two algorithms, and significantly outperforms both on a large, real-world dataset.}
}

Endnote

%0 Conference Paper
%T Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization
%A Brendan McMahan
%B Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2011
%E Geoffrey Gordon
%E David Dunson
%E Miroslav Dudík	
%F pmlr-v15-mcmahan11b
%I PMLR
%P 525--533
%U https://proceedings.mlr.press/v15/mcmahan11b.html
%V 15
%X We prove that many mirror descent algorithms for online convex optimization (such as online gradient descent) have an equivalent interpretation as follow-the-regularized-leader (FTRL) algorithms. This observation makes the relationships between many commonly used algorithms explicit, and provides theoretical insight on previous experimental observations.  In particular, even though the FOBOS composite mirror descent algorithm handles $L_1$ regularization explicitly, it has been observed that the FTRL-style Regularized Dual Averaging (RDA) algorithm is even more effective at producing sparsity.  Our results demonstrate that the key difference between these algorithms is how they handle the cumulative $L_1$ penalty. While FOBOS handles the $L_1$ term exactly on any given update, we show that it is effectively using subgradient approximations to the $L_1$ penalty from previous rounds, leading to less sparsity than RDA, which handles the cumulative penalty in closed form.  The FTRL-Proximal algorithm, which we introduce, can be seen as a hybrid of these two algorithms, and significantly outperforms both on a large, real-world dataset.

RIS


TY  - CPAPER
TI  - Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization
AU  - Brendan McMahan
BT  - Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
DA  - 2011/06/14
ED  - Geoffrey Gordon
ED  - David Dunson
ED  - Miroslav Dudík	
ID  - pmlr-v15-mcmahan11b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 15
SP  - 525
EP  - 533
L1  - http://proceedings.mlr.press/v15/mcmahan11b/mcmahan11b.pdf
UR  - https://proceedings.mlr.press/v15/mcmahan11b.html
AB  - We prove that many mirror descent algorithms for online convex optimization (such as online gradient descent) have an equivalent interpretation as follow-the-regularized-leader (FTRL) algorithms. This observation makes the relationships between many commonly used algorithms explicit, and provides theoretical insight on previous experimental observations.  In particular, even though the FOBOS composite mirror descent algorithm handles $L_1$ regularization explicitly, it has been observed that the FTRL-style Regularized Dual Averaging (RDA) algorithm is even more effective at producing sparsity.  Our results demonstrate that the key difference between these algorithms is how they handle the cumulative $L_1$ penalty. While FOBOS handles the $L_1$ term exactly on any given update, we show that it is effectively using subgradient approximations to the $L_1$ penalty from previous rounds, leading to less sparsity than RDA, which handles the cumulative penalty in closed form.  The FTRL-Proximal algorithm, which we introduce, can be seen as a hybrid of these two algorithms, and significantly outperforms both on a large, real-world dataset.
ER  -

APA


McMahan, B.. (2011). Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 15:525-533 Available from https://proceedings.mlr.press/v15/mcmahan11b.html.

Related Material

Download PDF