Early Stopping as Nonparametric Variational Inference

David Duvenaud, Dougal Maclaurin, Ryan Adams
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1070-1077, 2016.

Abstract

We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy of this distribution during optimization, we give a scalable, unbiased estimate of a variational lower bound on the log marginal likelihood. This bound can be used to optimize hyperparameters instead of cross-validation. This Bayesian interpretation of SGD also suggests new overfitting-resistant optimization procedures, and gives a theoretical foundation for early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
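To make the bound described above concrete, here is a minimal illustrative sketch (not the authors' code) on a toy Bayesian linear-regression model, where the Hessian of the negative log-joint is constant and the exact log marginal likelihood is available for comparison. It assumes full-batch gradient descent rather than SGD, and computes the per-step entropy change exactly via a log-determinant instead of the scalable stochastic estimator developed in the paper; names such as noise_var, prior_var, and init_var are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X w_true + Gaussian noise.
N, D = 50, 3
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
noise_var, prior_var = 0.1, 1.0          # illustrative hyperparameters
y = X @ w_true + rng.normal(scale=np.sqrt(noise_var), size=N)

def log_joint(w):
    """log p(y | X, w) + log p(w) for a single weight vector w."""
    ll = -0.5 * N * np.log(2 * np.pi * noise_var) \
         - 0.5 * np.sum((y - X @ w) ** 2) / noise_var
    lp = -0.5 * D * np.log(2 * np.pi * prior_var) \
         - 0.5 * np.sum(w ** 2) / prior_var
    return ll + lp

# Hessian of the negative log-joint (constant for this Gaussian model).
H = X.T @ X / noise_var + np.eye(D) / prior_var

# Initial distribution q_0 = N(0, init_var * I) has a known entropy.
init_var = 1.0
entropy = 0.5 * D * (1.0 + np.log(2 * np.pi * init_var))

# Push samples of q_0 through T full-batch gradient steps. Each step is the
# affine map w -> w - eta * grad, whose Jacobian is I - eta * H, so the
# entropy of the implicitly defined distribution grows by log|det(I - eta H)|.
eta, T, S = 5e-4, 200, 200
samples = rng.normal(scale=np.sqrt(init_var), size=(S, D))
step_logdet = np.linalg.slogdet(np.eye(D) - eta * H)[1]
for _ in range(T):
    grad = (samples @ X.T - y) @ X / noise_var + samples / prior_var
    samples -= eta * grad
    entropy += step_logdet

# Variational lower bound: E_{q_T}[log p(y, w)] + H[q_T].
elbo = np.mean([log_joint(w) for w in samples]) + entropy

# Exact log marginal likelihood of this Gaussian model, for comparison.
cov_y = prior_var * (X @ X.T) + noise_var * np.eye(N)
_, logdet_cov = np.linalg.slogdet(cov_y)
log_ml = -0.5 * (N * np.log(2 * np.pi) + logdet_cov + y @ np.linalg.solve(cov_y, y))

print(f"estimated lower bound: {elbo:.2f}   exact log marginal likelihood: {log_ml:.2f}")
```

The running entropy term plus the average log-joint of the particles gives an unbiased estimate of a quantity that lower-bounds the exact log marginal likelihood printed alongside it, regardless of how many optimization steps have been taken.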

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-duvenaud16,
  title     = {Early Stopping as Nonparametric Variational Inference},
  author    = {Duvenaud, David and Maclaurin, Dougal and Adams, Ryan},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {1070--1077},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/duvenaud16.pdf},
  url       = {https://proceedings.mlr.press/v51/duvenaud16.html},
  abstract  = {We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy of this distribution during optimization, we give a scalable, unbiased estimate of a variational lower bound on the log marginal likelihood. This bound can be used to optimize hyperparameters instead of cross-validation. This Bayesian interpretation of SGD also suggests new overfitting-resistant optimization procedures, and gives a theoretical foundation for early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.}
}
APA
Duvenaud, D., Maclaurin, D. & Adams, R. (2016). Early Stopping as Nonparametric Variational Inference. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1070-1077. Available from https://proceedings.mlr.press/v51/duvenaud16.html.
