Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search

Anshumali Shrivastava, Ping Li
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):557-565, 2014.

Abstract

The query complexity of locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size (Indyk and Motwani, 1998). In industrial applications such as search, where the data are often high-dimensional and binary (e.g., text n-grams), minwise hashing is widely adopted; it requires applying a large number of permutations to the data, which is costly in both computation and energy consumption. In this paper, we propose a hashing technique which generates all the hash evaluations needed for similarity search using one single permutation. The heart of the proposed hash function is a “rotation” scheme which densifies the sparse sketches of one permutation hashing (Li et al., 2012) in an unbiased fashion, thereby maintaining the LSH property and making the resulting sketches suitable for hash table construction. The rotation idea presented in this paper could be of independent interest for densifying other types of sparse sketches. Using the proposed hashing method, the query time of a (K,L)-parameterized LSH is reduced from the typical O(dKL) complexity to merely O(KL+dL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. Our experimental evaluation on real data confirms that the proposed scheme significantly reduces query processing time over minwise hashing without loss in retrieval accuracy.
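The rotation scheme described in the abstract is easy to convey in code. Below is a minimal, illustrative Python sketch of the densify-by-rotation idea, not the authors' reference implementation: a single random permutation is applied to the nonzero indices, the permuted range is split into k bins, each bin keeps its minimum (one permutation hashing), and every empty bin borrows from the nearest non-empty bin to its circular right with an offset of t*C per hop. The function name doph_sketch, the seed parameter, and the specific choice C = D/k + 1 are assumptions made for the example.

    import random

    def doph_sketch(nonzeros, D, k, seed=0):
        """Minimal sketch of densified one permutation hashing.

        nonzeros : indices of the 1s in a D-dimensional binary vector
        D        : data dimensionality (assumed divisible by k for simplicity)
        k        : number of bins = number of hash values produced (k = K*L for LSH)
        """
        nonzeros = list(nonzeros)
        if not nonzeros:
            raise ValueError("rotation needs at least one non-empty bin")
        rng = random.Random(seed)
        perm = list(range(D))
        rng.shuffle(perm)                     # the one single permutation

        bin_size = D // k
        bins = [None] * k                     # None marks an empty bin
        for i in nonzeros:
            j, offset = divmod(perm[i], bin_size)  # bin index, within-bin rank
            if bins[j] is None or offset < bins[j]:
                bins[j] = offset              # one permutation hashing: min per bin

        # "Rotation": each empty bin borrows the value of the nearest non-empty
        # bin to its circular right, shifted by t*C per hop (t = hop count,
        # C > bin_size), so borrowed values occupy ranges disjoint from
        # ordinary ones and the collision probability remains unbiased.
        C = bin_size + 1
        hashes = []
        for j in range(k):
            t = 0
            while bins[(j + t) % k] is None:
                t += 1
            hashes.append(bins[(j + t) % k] + t * C)
        return hashes

Under this scheme, two binary vectors agree on each of the k values with probability equal to their resemblance (Jaccard similarity), so the k = K x L values can be split into L groups of K to index the L hash tables. Since all KL values come from a single permutation rather than KL independent ones, this is what drives the query-time hashing cost down from O(dKL) to O(KL+dL).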

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-shrivastava14,
  title     = {Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search},
  author    = {Shrivastava, Anshumali and Li, Ping},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {557--565},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {1},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/shrivastava14.pdf},
  url       = {https://proceedings.mlr.press/v32/shrivastava14.html},
  abstract  = {The query complexity of {\em locality sensitive hashing} (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size \cite{Proc:Indyk_STOC98}. In industrial applications such as search, where the data are often high-dimensional and binary (e.g., text n-grams), {\em minwise hashing} is widely adopted; it requires applying a large number of permutations to the data, which is costly in both computation and energy consumption. In this paper, we propose a hashing technique which generates all the hash evaluations needed for similarity search using one single permutation. The heart of the proposed hash function is a ``rotation'' scheme which densifies the sparse sketches of {\em one permutation hashing} \cite{Proc:Li_Owen_Zhang_NIPS12} in an unbiased fashion, thereby maintaining the LSH property and making the resulting sketches suitable for hash table construction. The rotation idea presented in this paper could be of independent interest for densifying other types of sparse sketches. Using the proposed hashing method, the query time of a (K,L)-parameterized LSH is reduced from the typical O(dKL) complexity to merely O(KL+dL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. Our experimental evaluation on real data confirms that the proposed scheme significantly reduces query processing time over minwise hashing without loss in retrieval accuracy.}
}
Endnote
%0 Conference Paper
%T Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search
%A Anshumali Shrivastava
%A Ping Li
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-shrivastava14
%I PMLR
%P 557--565
%U https://proceedings.mlr.press/v32/shrivastava14.html
%V 32
%N 1
%X The query complexity of locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size (Indyk and Motwani, 1998). In industrial applications such as search, where the data are often high-dimensional and binary (e.g., text n-grams), minwise hashing is widely adopted; it requires applying a large number of permutations to the data, which is costly in both computation and energy consumption. In this paper, we propose a hashing technique which generates all the hash evaluations needed for similarity search using one single permutation. The heart of the proposed hash function is a “rotation” scheme which densifies the sparse sketches of one permutation hashing (Li et al., 2012) in an unbiased fashion, thereby maintaining the LSH property and making the resulting sketches suitable for hash table construction. The rotation idea presented in this paper could be of independent interest for densifying other types of sparse sketches. Using the proposed hashing method, the query time of a (K,L)-parameterized LSH is reduced from the typical O(dKL) complexity to merely O(KL+dL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. Our experimental evaluation on real data confirms that the proposed scheme significantly reduces query processing time over minwise hashing without loss in retrieval accuracy.
RIS
TY - CPAPER
TI - Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search
AU - Anshumali Shrivastava
AU - Ping Li
BT - Proceedings of the 31st International Conference on Machine Learning
DA - 2014/01/27
ED - Eric P. Xing
ED - Tony Jebara
ID - pmlr-v32-shrivastava14
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 32
IS - 1
SP - 557
EP - 565
L1 - http://proceedings.mlr.press/v32/shrivastava14.pdf
UR - https://proceedings.mlr.press/v32/shrivastava14.html
AB - The query complexity of locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size (Indyk and Motwani, 1998). In industrial applications such as search, where the data are often high-dimensional and binary (e.g., text n-grams), minwise hashing is widely adopted; it requires applying a large number of permutations to the data, which is costly in both computation and energy consumption. In this paper, we propose a hashing technique which generates all the hash evaluations needed for similarity search using one single permutation. The heart of the proposed hash function is a “rotation” scheme which densifies the sparse sketches of one permutation hashing (Li et al., 2012) in an unbiased fashion, thereby maintaining the LSH property and making the resulting sketches suitable for hash table construction. The rotation idea presented in this paper could be of independent interest for densifying other types of sparse sketches. Using the proposed hashing method, the query time of a (K,L)-parameterized LSH is reduced from the typical O(dKL) complexity to merely O(KL+dL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. Our experimental evaluation on real data confirms that the proposed scheme significantly reduces query processing time over minwise hashing without loss in retrieval accuracy.
ER -
APA
Shrivastava, A. & Li, P. (2014). Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):557-565. Available from https://proceedings.mlr.press/v32/shrivastava14.html.