A Light Touch for Heavily Constrained SGD

Andrew Cotter, Maya Gupta, Jan Pfeifer
29th Annual Conference on Learning Theory, PMLR 49:729-771, 2016.

Abstract

Minimizing empirical risk subject to a set of constraints can be a useful strategy for learning restricted classes of functions, such as monotonic functions, submodular functions, classifiers that guarantee a certain class label for some subset of examples, etc. However, these restrictions may result in a very large number of constraints. Projected stochastic gradient descent (SGD) is often the default choice for large-scale optimization in machine learning, but requires a projection after each update. For heavily-constrained objectives, we propose an efficient extension of SGD that stays close to the feasible region while only applying constraints probabilistically at each iteration. Theoretical analysis shows a compelling trade-off between per-iteration work and the number of iterations needed on problems with a large number of constraints.
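The abstract describes the method only at a high level. As a rough illustration of the general idea (a sketch of "probabilistic constraint enforcement" in SGD, not the authors' actual algorithm from the paper), the snippet below runs plain SGD and, with a small probability at each iteration, projects the iterate onto one randomly chosen linear constraint, deferring an approximate full projection to the end. The helper names (`loss_grad`, `constraints`, `check_prob`) and the restriction to half-space constraints of the form a·w ≤ b are assumptions made for this example.

```python
import numpy as np

def light_touch_sgd(loss_grad, constraints, w0, steps=1000,
                    lr=0.1, check_prob=0.1, rng=None):
    """Illustrative sketch only: SGD that, with probability `check_prob`
    per iteration, projects onto one randomly sampled half-space
    constraint (a, b) meaning a.w <= b, then approximately projects the
    final iterate onto the full feasible region."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * loss_grad(w, rng)              # stochastic gradient step
        if rng.random() < check_prob:            # "light touch": occasional check
            a, b = constraints[rng.integers(len(constraints))]
            violation = a @ w - b
            if violation > 0:                    # project onto that half-space
                w -= violation * a / (a @ a)
    # approximate full projection at the end via cyclic projections,
    # so the returned point is (nearly) feasible for all constraints
    for _ in range(100):
        for a, b in constraints:
            violation = a @ w - b
            if violation > 0:
                w -= violation * a / (a @ a)
    return w
```

For instance, `loss_grad` could return a minibatch gradient of a hinge or squared loss, and `constraints` could encode many monotonicity restrictions as half-spaces; the point of the sketch is only that the per-iteration cost does not scale with the number of constraints.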

Cite this Paper


BibTeX
@InProceedings{pmlr-v49-cotter16,
  title     = {A Light Touch for Heavily Constrained SGD},
  author    = {Cotter, Andrew and Gupta, Maya and Pfeifer, Jan},
  booktitle = {29th Annual Conference on Learning Theory},
  pages     = {729--771},
  year      = {2016},
  editor    = {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume    = {49},
  series    = {Proceedings of Machine Learning Research},
  address   = {Columbia University, New York, New York, USA},
  month     = {23--26 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v49/cotter16.pdf},
  url       = {https://proceedings.mlr.press/v49/cotter16.html},
  abstract  = {Minimizing empirical risk subject to a set of constraints can be a useful strategy for learning restricted classes of functions, such as monotonic functions, submodular functions, classifiers that guarantee a certain class label for some subset of examples, etc. However, these restrictions may result in a very large number of constraints. Projected stochastic gradient descent (SGD) is often the default choice for large-scale optimization in machine learning, but requires a projection after each update. For heavily-constrained objectives, we propose an efficient extension of SGD that stays close to the feasible region while only applying constraints probabilistically at each iteration. Theoretical analysis shows a compelling trade-off between per-iteration work and the number of iterations needed on problems with a large number of constraints.}
}
APA
Cotter, A., Gupta, M. & Pfeifer, J. (2016). A Light Touch for Heavily Constrained SGD. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:729-771. Available from https://proceedings.mlr.press/v49/cotter16.html.
