On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection

Pratik Jawanpuria, Manik Varma, Saketha Nath
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):118-126, 2014.

Abstract

Our objective is to develop formulations and algorithms for efficiently computing the feature selection path – i.e., the variation in classification accuracy as the fraction of selected features is varied from null to unity. Multiple Kernel Learning subject to l_p (p ≥ 1) regularization (l_p-MKL) has been demonstrated to be one of the most effective techniques for non-linear feature selection. However, state-of-the-art l_p-MKL algorithms are too computationally expensive to be invoked thousands of times to determine the entire path. We propose a novel conjecture which states that, for certain l_p-MKL formulations, the number of features selected in the optimal solution monotonically decreases as p is decreased from an initial value to unity. We prove the conjecture for a generic family of kernel target alignment based formulations, and show that the feature weights themselves decay (grow) monotonically once they are below (above) a certain threshold at optimality. This allows us to develop a path following algorithm that systematically generates optimal feature sets of decreasing size. The proposed algorithm sets certain feature weights directly to zero for potentially large intervals of p, thereby reducing optimization costs while simultaneously providing approximation guarantees. We empirically demonstrate that our formulation can lead to classification accuracies which are as much as 10% higher on benchmark data sets compared not only to other l_p-MKL formulations and uniform kernel baselines but also to leading feature selection methods. We further demonstrate that our algorithm reduces training time significantly over other path following algorithms and state-of-the-art l_p-MKL optimizers such as SMO-MKL. In particular, we generate the entire feature selection path for data sets with a hundred thousand features in approximately half an hour on standard hardware.
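
To make the monotonicity property concrete, consider a toy version of the alignment problem. The sketch below is our own illustration, not the authors' code or exact formulation: for the simplified objective max { sum_j a_j eta_j : ||eta||_p <= 1, eta >= 0 }, Hoelder's inequality gives the maximizer eta_j proportional to a_j^(1/(p-1)). As p decreases towards 1, the weights of kernels with below-maximum alignment decay monotonically towards zero, so freezing a weight at zero once it falls below a threshold yields the nested, shrinking feature sets the abstract describes. The rank-1 per-feature kernels, the function names, and the threshold tol are all illustrative assumptions.

    import numpy as np

    def kernel_alignments(X, y):
        """Alignment a_j = <K_j, y y^T>_F / ||K_j||_F for a rank-1 linear
        kernel K_j = x_j x_j^T per feature; a simplified stand-in for the
        paper's alignment terms. Note <K_j, y y^T>_F = (x_j . y)^2 >= 0."""
        a = np.empty(X.shape[1])
        for j in range(X.shape[1]):
            xj = X[:, j]
            a[j] = (xj @ y) ** 2 / (xj @ xj + 1e-12)
        return a

    def lp_weights(a, p):
        """Closed-form maximizer of a . eta s.t. ||eta||_p <= 1, eta >= 0:
        by Hoelder, eta_j is proportional to a_j ** (1 / (p - 1))."""
        w = (a / a.max()) ** (1.0 / (p - 1.0))  # rescale to avoid overflow
        return w / np.linalg.norm(w, ord=p)     # so that ||eta||_p = 1

    def selection_path(a, ps, tol=1e-3):
        """Sweep p downward towards 1; a weight frozen at zero never
        re-enters, so the active sets are nested and shrink monotonically
        (a crude proxy for the paper's interval-wise guarantees)."""
        active = np.ones(a.shape, dtype=bool)
        path = []
        for p in ps:
            eta = np.zeros_like(a)
            eta[active] = lp_weights(a[active], p)
            active &= eta > tol                 # once dropped, stays dropped
            path.append((p, int(active.sum())))
        return path

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 50))
    y = np.sign(X[:, :5] @ rng.standard_normal(5))  # 5 informative features
    for p, n in selection_path(kernel_alignments(X, y), np.linspace(1.9, 1.05, 8)):
        print(f"p = {p:.2f}: {n} features still active")

In the actual path following algorithm, the intervals of p over which a weight can safely be held at zero come with approximation guarantees; the fixed threshold above only mimics that behaviour.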

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-jawanpuria14,
  title     = {On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection},
  author    = {Jawanpuria, Pratik and Varma, Manik and Nath, Saketha},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {118--126},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/jawanpuria14.pdf},
  url       = {https://proceedings.mlr.press/v32/jawanpuria14.html}
}
Endnote
%0 Conference Paper
%T On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection
%A Pratik Jawanpuria
%A Manik Varma
%A Saketha Nath
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-jawanpuria14
%I PMLR
%P 118--126
%U https://proceedings.mlr.press/v32/jawanpuria14.html
%V 32
%N 2
RIS
TY - CPAPER
TI - On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection
AU - Pratik Jawanpuria
AU - Manik Varma
AU - Saketha Nath
BT - Proceedings of the 31st International Conference on Machine Learning
DA - 2014/06/18
ED - Eric P. Xing
ED - Tony Jebara
ID - pmlr-v32-jawanpuria14
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 32
IS - 2
SP - 118
EP - 126
L1 - http://proceedings.mlr.press/v32/jawanpuria14.pdf
UR - https://proceedings.mlr.press/v32/jawanpuria14.html
ER -
APA
Jawanpuria, P., Varma, M. & Nath, S. (2014). On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):118-126. Available from https://proceedings.mlr.press/v32/jawanpuria14.html.

Related Material

Download PDF: http://proceedings.mlr.press/v32/jawanpuria14.pdf