Stochastic Neighbor Compression

Matt Kusner, Stephen Tyree, Kilian Weinberger, Kunal Agrawal
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):622-630, 2014.

Abstract

We present Stochastic Neighbor Compression (SNC), an algorithm that compresses a dataset for the purpose of k-nearest neighbor (kNN) classification. Given training data, SNC learns a much smaller synthetic dataset that minimizes the stochastic 1-nearest neighbor classification error on the training data. This approach has several appealing properties: due to its small size, the compressed set speeds up kNN testing drastically (by up to several orders of magnitude in our experiments); it makes the kNN classifier substantially more robust to label noise; on 4 of 7 datasets it yields lower test error than kNN on the entire training set, even at compression ratios as low as 2%; and finally, SNC yields impressive speed-ups over kNN even when both methods are used with ball-tree data structures, hashing, and LMNN dimensionality reduction, demonstrating that SNC is complementary to existing state-of-the-art techniques for accelerating kNN classification and leads to substantial further improvements.
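
To make the learning objective concrete, below is a minimal NumPy sketch of stochastic 1-nearest-neighbor compression as described in the abstract. It treats the compressed set Z as free parameters with fixed labels, models the probability that each training point's stochastic nearest neighbor shares its label under a Gaussian soft assignment with width parameter gamma, and minimizes the resulting negative log-likelihood. The function name snc_compress, the per-class random initialization, the choice of gamma, and the plain gradient-descent optimizer are illustrative assumptions, not the authors' exact procedure.

import numpy as np

def snc_compress(X, y, m_per_class, gamma=1.0, lr=0.1, iters=200, seed=0):
    """Learn a small synthetic reference set (Z, zy) for 1-NN classification.

    Sketch: Z is initialized from random training points and moved so as to
    maximize the probability that each training point's stochastic nearest
    neighbor in Z carries the correct label.
    """
    rng = np.random.default_rng(seed)
    Z, zy = [], []
    for c in np.unique(y):  # m_per_class random training points per class
        idx = rng.choice(np.flatnonzero(y == c), size=m_per_class, replace=False)
        Z.append(X[idx])
        zy.append(np.full(m_per_class, c))
    Z, zy = np.concatenate(Z).astype(float), np.concatenate(zy)
    same = (y[:, None] == zy[None, :])  # n x m mask: reference j shares x_i's label

    for _ in range(iters):
        diff = X[:, None, :] - Z[None, :, :]            # n x m x d
        W = np.exp(-gamma**2 * (diff ** 2).sum(-1))     # Gaussian soft assignments
        S = W.sum(axis=1, keepdims=True)                # total mass per point
        Sp = (W * same).sum(axis=1, keepdims=True)      # mass on same-label refs
        # p_i = Sp_i / S_i; the objective is -sum_i log p_i. Its gradient pulls
        # same-label references toward x_i and pushes wrong-label ones away.
        coef = (same / (Sp + 1e-12) - 1.0 / (S + 1e-12)) * W   # n x m
        grad = -2 * gamma**2 * (coef[:, :, None] * diff).sum(axis=0)  # m x d
        Z -= lr * grad

    return Z, zy

At test time one would run ordinary 1-NN (or kNN) search against the returned (Z, zy) instead of the full training set, e.g. Z, zy = snc_compress(X_train, y_train, m_per_class=20); the speed-up comes from the search cost scaling with the compressed size rather than with n.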

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-kusner14,
  title     = {Stochastic Neighbor Compression},
  author    = {Kusner, Matt and Tyree, Stephen and Weinberger, Kilian and Agrawal, Kunal},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {622--630},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/kusner14.pdf},
  url       = {https://proceedings.mlr.press/v32/kusner14.html}
}
Endnote
%0 Conference Paper
%T Stochastic Neighbor Compression
%A Matt Kusner
%A Stephen Tyree
%A Kilian Weinberger
%A Kunal Agrawal
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-kusner14
%I PMLR
%P 622--630
%U https://proceedings.mlr.press/v32/kusner14.html
%V 32
%N 2
RIS
TY  - CPAPER
TI  - Stochastic Neighbor Compression
AU  - Matt Kusner
AU  - Stephen Tyree
AU  - Kilian Weinberger
AU  - Kunal Agrawal
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-kusner14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 622
EP  - 630
L1  - http://proceedings.mlr.press/v32/kusner14.pdf
UR  - https://proceedings.mlr.press/v32/kusner14.html
ER  -
APA
Kusner, M., Tyree, S., Weinberger, K. & Agrawal, K. (2014). Stochastic Neighbor Compression. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):622-630. Available from https://proceedings.mlr.press/v32/kusner14.html.