Faster cover trees

Mike Izbicki, Christian Shelton
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1162-1170, 2015.

Abstract

The cover tree data structure speeds up exact nearest neighbor queries over arbitrary metric spaces. This paper makes cover trees even faster. In particular, we provide (1) a simpler definition of the cover tree that reduces the number of nodes from O(n) to exactly n, (2) an additional invariant that makes queries faster in practice, (3) algorithms for constructing and querying the tree in parallel on multiprocessor systems, and (4) a more cache efficient memory layout. On standard benchmark datasets, we reduce the number of distance computations by 10–50%. On a large-scale bioinformatics dataset, we reduce the number of distance computations by 71%. On a large-scale image dataset, our parallel algorithm with 16 cores reduces tree construction time from 3.5 hours to 12 minutes.

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-izbicki15, title = {Faster cover trees}, author = {Izbicki, Mike and Shelton, Christian}, booktitle = {Proceedings of the 32nd International Conference on Machine Learning}, pages = {1162--1170}, year = {2015}, editor = {Bach, Francis and Blei, David}, volume = {37}, series = {Proceedings of Machine Learning Research}, address = {Lille, France}, month = {07--09 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v37/izbicki15.pdf}, url = {https://proceedings.mlr.press/v37/izbicki15.html}, abstract = {The cover tree data structure speeds up exact nearest neighbor queries over arbitrary metric spaces. This paper makes cover trees even faster. In particular, we provide (1) a simpler definition of the cover tree that reduces the number of nodes from O(n) to exactly n, (2) an additional invariant that makes queries faster in practice, (3) algorithms for constructing and querying the tree in parallel on multiprocessor systems, and (4) a more cache efficient memory layout. On standard benchmark datasets, we reduce the number of distance computations by 10–50%. On a large-scale bioinformatics dataset, we reduce the number of distance computations by 71%. On a large-scale image dataset, our parallel algorithm with 16 cores reduces tree construction time from 3.5 hours to 12 minutes.} }
Endnote
%0 Conference Paper %T Faster cover trees %A Mike Izbicki %A Christian Shelton %B Proceedings of the 32nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2015 %E Francis Bach %E David Blei %F pmlr-v37-izbicki15 %I PMLR %P 1162--1170 %U https://proceedings.mlr.press/v37/izbicki15.html %V 37 %X The cover tree data structure speeds up exact nearest neighbor queries over arbitrary metric spaces. This paper makes cover trees even faster. In particular, we provide (1) a simpler definition of the cover tree that reduces the number of nodes from O(n) to exactly n, (2) an additional invariant that makes queries faster in practice, (3) algorithms for constructing and querying the tree in parallel on multiprocessor systems, and (4) a more cache efficient memory layout. On standard benchmark datasets, we reduce the number of distance computations by 10–50%. On a large-scale bioinformatics dataset, we reduce the number of distance computations by 71%. On a large-scale image dataset, our parallel algorithm with 16 cores reduces tree construction time from 3.5 hours to 12 minutes.
RIS
TY - CPAPER TI - Faster cover trees AU - Mike Izbicki AU - Christian Shelton BT - Proceedings of the 32nd International Conference on Machine Learning DA - 2015/06/01 ED - Francis Bach ED - David Blei ID - pmlr-v37-izbicki15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 37 SP - 1162 EP - 1170 L1 - http://proceedings.mlr.press/v37/izbicki15.pdf UR - https://proceedings.mlr.press/v37/izbicki15.html AB - The cover tree data structure speeds up exact nearest neighbor queries over arbitrary metric spaces. This paper makes cover trees even faster. In particular, we provide (1) a simpler definition of the cover tree that reduces the number of nodes from O(n) to exactly n, (2) an additional invariant that makes queries faster in practice, (3) algorithms for constructing and querying the tree in parallel on multiprocessor systems, and (4) a more cache efficient memory layout. On standard benchmark datasets, we reduce the number of distance computations by 10–50%. On a large-scale bioinformatics dataset, we reduce the number of distance computations by 71%. On a large-scale image dataset, our parallel algorithm with 16 cores reduces tree construction time from 3.5 hours to 12 minutes. ER -
APA
Izbicki, M. & Shelton, C.. (2015). Faster cover trees. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1162-1170 Available from https://proceedings.mlr.press/v37/izbicki15.html.

Related Material