Why Does Unsupervised Pre-training Help Deep Learning?

Dumitru Erhan; Aaron Courville; Yoshua Bengio; Pascal Vincent

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:201-208, 2010.

Abstract

Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pre-training phase. The main question investigated here is the following: why does unsupervised pre-training work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature, including its action as a regularizer (Erhan et al. 2009) and as an aid to optimization (Bengio et al. 2007). Our results build on the work of Erhan et al. 2009, showing that unsupervised pre-training appears to play predominantly a regularization role in subsequent supervised training. However, our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pre-training effect.

Cite this Paper

BibTeX


@InProceedings{pmlr-v9-erhan10a,
  title     = {Why Does Unsupervised Pre-training Help Deep Learning?},
  author    = {Erhan, Dumitru and Courville, Aaron and Bengio, Yoshua and Vincent, Pascal},
  booktitle = {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages     = {201--208},
  year      = {2010},
  editor    = {Teh, Yee Whye and Titterington, Mike},
  volume    = {9},
  series    = {Proceedings of Machine Learning Research},
  address   = {Chia Laguna Resort, Sardinia, Italy},
  month     = {13--15 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v9/erhan10a/erhan10a.pdf},
  url       = {https://proceedings.mlr.press/v9/erhan10a.html},
  abstract  = {Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pre-training phase. The main question investigated here is the following: why does unsupervised pre-training work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature including its action as a regularizer (Erhan et al. 2009) and as an aid to optimization (Bengio et al. 2007). Our results build on the work of Erhan et al. 2009, showing that unsupervised pre-training appears to play predominantly a regularization role in subsequent supervised training. However our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pre-training effect.}
}

Endnote

%0 Conference Paper
%T Why Does Unsupervised Pre-training Help Deep Learning?
%A Dumitru Erhan
%A Aaron Courville
%A Yoshua Bengio
%A Pascal Vincent
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington
%F pmlr-v9-erhan10a
%I PMLR
%P 201--208
%U https://proceedings.mlr.press/v9/erhan10a.html
%V 9
%X Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pre-training phase. The main question investigated here is the following: why does unsupervised pre-training work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature including its action as a regularizer (Erhan et al. 2009) and as an aid to optimization (Bengio et al. 2007). Our results build on the work of Erhan et al. 2009, showing that unsupervised pre-training appears to play predominantly a regularization role in subsequent supervised training. However our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pre-training effect.

RIS


TY  - CPAPER
TI  - Why Does Unsupervised Pre-training Help Deep Learning?
AU  - Dumitru Erhan
AU  - Aaron Courville
AU  - Yoshua Bengio
AU  - Pascal Vincent
BT  - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA  - 2010/03/31
ED  - Yee Whye Teh
ED  - Mike Titterington
ID  - pmlr-v9-erhan10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 9
SP  - 201
EP  - 208
L1  - http://proceedings.mlr.press/v9/erhan10a/erhan10a.pdf
UR  - https://proceedings.mlr.press/v9/erhan10a.html
AB  - Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants with impressive results being obtained in several areas, mostly on vision and language datasets. The best results obtained on supervised learning tasks often involve an unsupervised learning component, usually in an unsupervised pre-training phase. The main question investigated here is the following: why does unsupervised pre-training work so well? Through extensive experimentation, we explore several possible explanations discussed in the literature including its action as a regularizer (Erhan et al. 2009) and as an aid to optimization (Bengio et al. 2007). Our results build on the work of Erhan et al. 2009, showing that unsupervised pre-training appears to play predominantly a regularization role in subsequent supervised training. However our results in an online setting, with a virtually unlimited data stream, point to a somewhat more nuanced interpretation of the roles of optimization and regularization in the unsupervised pre-training effect.
ER  -

APA


Erhan, D., Courville, A., Bengio, Y., & Vincent, P. (2010). Why Does Unsupervised Pre-training Help Deep Learning? Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:201-208. Available from https://proceedings.mlr.press/v9/erhan10a.html.
