Divide and Conquer Kernel Ridge Regression

Yuchen Zhang, John Duchi, Martin Wainwright
Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:592-617, 2013.

Abstract

We study a decomposition-based scalable approach to performing kernel ridge regression. The method is simply described: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset, then averages the local solutions into a global predictor. This partitioning leads to a substantial reduction in computation time versus the standard approach of performing kernel ridge regression on all N samples. Our main theorem establishes that despite the computational speed-up, statistical optimality is retained: so long as m is not too large, the partition-based estimate achieves optimal rates of convergence for the full sample size N. As concrete examples, our theory guarantees that m may grow polynomially in N for Sobolev spaces, and nearly linearly for finite-rank kernels and Gaussian kernels. We conclude with simulations complementing our theoretical results and exhibiting the computational and statistical benefits of our approach.
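
To make the procedure concrete, here is a minimal sketch (not the authors' code) of the divide-and-conquer estimator in Python with NumPy, using a Gaussian kernel: the N samples are randomly split into m equal-size subsets, an ordinary kernel ridge regression problem is solved on each subset, and the returned predictor averages the m local fits. The names dc_krr, krr_fit, and rbf_kernel, as well as the bandwidth and regularization values in the toy usage, are illustrative assumptions rather than choices taken from the paper.

import numpy as np

# Illustrative sketch of divide-and-conquer kernel ridge regression (KRR).
# All function names and hyperparameter values are assumptions for illustration.

def rbf_kernel(A, B, bandwidth):
    # Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 * bandwidth^2)).
    sq_dists = (np.sum(A ** 2, axis=1)[:, None]
                + np.sum(B ** 2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def krr_fit(X, y, lam, bandwidth):
    # Standard KRR on one subset: solve (K + n * lam * I) alpha = y.
    n = X.shape[0]
    K = rbf_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return X, alpha

def dc_krr(X, y, m, lam, bandwidth, seed=None):
    # Randomly partition the N samples into m subsets of (roughly) equal size,
    # fit an independent KRR estimator on each subset, and return a predictor
    # that averages the m local estimates.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    local_fits = [krr_fit(X[idx], y[idx], lam, bandwidth)
                  for idx in np.array_split(perm, m)]

    def predict(X_new):
        # Average of the m local KRR predictions at the query points.
        preds = [rbf_kernel(X_new, Xi, bandwidth) @ ai for Xi, ai in local_fits]
        return np.mean(preds, axis=0)

    return predict

# Toy usage: noisy sine regression with N = 2000 points split into m = 10 subsets,
# so each local solve handles only a 200 x 200 kernel matrix instead of 2000 x 2000.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(2000, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(2000)
    f_hat = dc_krr(X, y, m=10, lam=1e-3, bandwidth=0.1, seed=1)
    print(f_hat(np.linspace(0.0, 1.0, 5)[:, None]))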

Cite this Paper


BibTeX
@InProceedings{pmlr-v30-Zhang13,
  title     = {Divide and Conquer Kernel Ridge Regression},
  author    = {Zhang, Yuchen and Duchi, John and Wainwright, Martin},
  booktitle = {Proceedings of the 26th Annual Conference on Learning Theory},
  pages     = {592--617},
  year      = {2013},
  editor    = {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume    = {30},
  series    = {Proceedings of Machine Learning Research},
  address   = {Princeton, NJ, USA},
  month     = {12--14 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v30/Zhang13.pdf},
  url       = {https://proceedings.mlr.press/v30/Zhang13.html},
  abstract  = {We study a decomposition-based scalable approach to performing kernel ridge regression. The method is simply described: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset, then averages the local solutions into a global predictor. This partitioning leads to a substantial reduction in computation time versus the standard approach of performing kernel ridge regression on all N samples. Our main theorem establishes that despite the computational speed-up, statistical optimality is retained: that so long as m is not too large, the partition-based estimate achieves optimal rates of convergence for the full sample size N. As concrete examples, our theory guarantees that m may grow polynomially in N for Sobolev spaces, and nearly linearly for finite-rank kernels and Gaussian kernels. We conclude with simulations complementing our theoretical results and exhibiting the computational and statistical benefits of our approach.}
}
Endnote
%0 Conference Paper
%T Divide and Conquer Kernel Ridge Regression
%A Yuchen Zhang
%A John Duchi
%A Martin Wainwright
%B Proceedings of the 26th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2013
%E Shai Shalev-Shwartz
%E Ingo Steinwart
%F pmlr-v30-Zhang13
%I PMLR
%P 592--617
%U https://proceedings.mlr.press/v30/Zhang13.html
%V 30
%X We study a decomposition-based scalable approach to performing kernel ridge regression. The method is simply described: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset, then averages the local solutions into a global predictor. This partitioning leads to a substantial reduction in computation time versus the standard approach of performing kernel ridge regression on all N samples. Our main theorem establishes that despite the computational speed-up, statistical optimality is retained: that so long as m is not too large, the partition-based estimate achieves optimal rates of convergence for the full sample size N. As concrete examples, our theory guarantees that m may grow polynomially in N for Sobolev spaces, and nearly linearly for finite-rank kernels and Gaussian kernels. We conclude with simulations complementing our theoretical results and exhibiting the computational and statistical benefits of our approach.
RIS
TY - CPAPER
TI - Divide and Conquer Kernel Ridge Regression
AU - Yuchen Zhang
AU - John Duchi
AU - Martin Wainwright
BT - Proceedings of the 26th Annual Conference on Learning Theory
DA - 2013/06/13
ED - Shai Shalev-Shwartz
ED - Ingo Steinwart
ID - pmlr-v30-Zhang13
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 30
SP - 592
EP - 617
L1 - http://proceedings.mlr.press/v30/Zhang13.pdf
UR - https://proceedings.mlr.press/v30/Zhang13.html
AB - We study a decomposition-based scalable approach to performing kernel ridge regression. The method is simply described: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset, then averages the local solutions into a global predictor. This partitioning leads to a substantial reduction in computation time versus the standard approach of performing kernel ridge regression on all N samples. Our main theorem establishes that despite the computational speed-up, statistical optimality is retained: that so long as m is not too large, the partition-based estimate achieves optimal rates of convergence for the full sample size N. As concrete examples, our theory guarantees that m may grow polynomially in N for Sobolev spaces, and nearly linearly for finite-rank kernels and Gaussian kernels. We conclude with simulations complementing our theoretical results and exhibiting the computational and statistical benefits of our approach.
ER -
APA
Zhang, Y., Duchi, J. & Wainwright, M. (2013). Divide and Conquer Kernel Ridge Regression. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:592-617. Available from https://proceedings.mlr.press/v30/Zhang13.html.