Memory Efficient Kernel Approximation

Si Si, Cho-Jui Hsieh, Inderjit Dhillon
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):701-709, 2014.

Abstract

The scalability of kernel machines is a big challenge when facing millions of samples due to storage and computation issues for large kernel matrices, that are usually dense. Recently, many papers have suggested tackling this problem by using a low rank approximation of the kernel matrix. In this paper, we first make the observation that the structure of shift-invariant kernels changes from low-rank to block-diagonal (without any low-rank structure) when varying the scale parameter. Based on this observation, we propose a new kernel approximation algorithm – Memory Efficient Kernel Approximation (MEKA), which considers both low-rank and clustering structure of the kernel matrix. We show that the resulting algorithm outperforms state-of-the-art low-rank kernel approximation methods in terms of speed, approximation error, and memory usage. As an example, on the MNIST2M dataset with two-million samples, our method takes 550 seconds on a single machine using less than 500 MBytes memory to achieve 0.2313 test RMSE for kernel ridge regression, while standard Nyström approximation takes more than 2700 seconds and uses more than 2 GBytes memory on the same problem to achieve 0.2318 test RMSE.

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-si14, title = {Memory Efficient Kernel Approximation}, author = {Si, Si and Hsieh, Cho-Jui and Dhillon, Inderjit}, booktitle = {Proceedings of the 31st International Conference on Machine Learning}, pages = {701--709}, year = {2014}, editor = {Xing, Eric P. and Jebara, Tony}, volume = {32}, number = {1}, series = {Proceedings of Machine Learning Research}, address = {Bejing, China}, month = {22--24 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v32/si14.pdf}, url = {https://proceedings.mlr.press/v32/si14.html}, abstract = {The scalability of kernel machines is a big challenge when facing millions of samples due to storage and computation issues for large kernel matrices, that are usually dense. Recently, many papers have suggested tackling this problem by using a low rank approximation of the kernel matrix. In this paper, we first make the observation that the structure of shift-invariant kernels changes from low-rank to block-diagonal (without any low-rank structure) when varying the scale parameter. Based on this observation, we propose a new kernel approximation algorithm – Memory Efficient Kernel Approximation (MEKA), which considers both low-rank and clustering structure of the kernel matrix. We show that the resulting algorithm outperforms state-of-the-art low-rank kernel approximation methods in terms of speed, approximation error, and memory usage. As an example, on the MNIST2M dataset with two-million samples, our method takes 550 seconds on a single machine using less than 500 MBytes memory to achieve 0.2313 test RMSE for kernel ridge regression, while standard Nyström approximation takes more than 2700 seconds and uses more than 2 GBytes memory on the same problem to achieve 0.2318 test RMSE.} }
Endnote
%0 Conference Paper %T Memory Efficient Kernel Approximation %A Si Si %A Cho-Jui Hsieh %A Inderjit Dhillon %B Proceedings of the 31st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2014 %E Eric P. Xing %E Tony Jebara %F pmlr-v32-si14 %I PMLR %P 701--709 %U https://proceedings.mlr.press/v32/si14.html %V 32 %N 1 %X The scalability of kernel machines is a big challenge when facing millions of samples due to storage and computation issues for large kernel matrices, that are usually dense. Recently, many papers have suggested tackling this problem by using a low rank approximation of the kernel matrix. In this paper, we first make the observation that the structure of shift-invariant kernels changes from low-rank to block-diagonal (without any low-rank structure) when varying the scale parameter. Based on this observation, we propose a new kernel approximation algorithm – Memory Efficient Kernel Approximation (MEKA), which considers both low-rank and clustering structure of the kernel matrix. We show that the resulting algorithm outperforms state-of-the-art low-rank kernel approximation methods in terms of speed, approximation error, and memory usage. As an example, on the MNIST2M dataset with two-million samples, our method takes 550 seconds on a single machine using less than 500 MBytes memory to achieve 0.2313 test RMSE for kernel ridge regression, while standard Nyström approximation takes more than 2700 seconds and uses more than 2 GBytes memory on the same problem to achieve 0.2318 test RMSE.
RIS
TY - CPAPER TI - Memory Efficient Kernel Approximation AU - Si Si AU - Cho-Jui Hsieh AU - Inderjit Dhillon BT - Proceedings of the 31st International Conference on Machine Learning DA - 2014/01/27 ED - Eric P. Xing ED - Tony Jebara ID - pmlr-v32-si14 PB - PMLR DP - Proceedings of Machine Learning Research VL - 32 IS - 1 SP - 701 EP - 709 L1 - http://proceedings.mlr.press/v32/si14.pdf UR - https://proceedings.mlr.press/v32/si14.html AB - The scalability of kernel machines is a big challenge when facing millions of samples due to storage and computation issues for large kernel matrices, that are usually dense. Recently, many papers have suggested tackling this problem by using a low rank approximation of the kernel matrix. In this paper, we first make the observation that the structure of shift-invariant kernels changes from low-rank to block-diagonal (without any low-rank structure) when varying the scale parameter. Based on this observation, we propose a new kernel approximation algorithm – Memory Efficient Kernel Approximation (MEKA), which considers both low-rank and clustering structure of the kernel matrix. We show that the resulting algorithm outperforms state-of-the-art low-rank kernel approximation methods in terms of speed, approximation error, and memory usage. As an example, on the MNIST2M dataset with two-million samples, our method takes 550 seconds on a single machine using less than 500 MBytes memory to achieve 0.2313 test RMSE for kernel ridge regression, while standard Nyström approximation takes more than 2700 seconds and uses more than 2 GBytes memory on the same problem to achieve 0.2318 test RMSE. ER -
APA
Si, S., Hsieh, C. & Dhillon, I.. (2014). Memory Efficient Kernel Approximation. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):701-709 Available from https://proceedings.mlr.press/v32/si14.html.

Related Material