SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series

Chen Luo, Anshumali Shrivastava
Proceedings of the Time Series Workshop at NIPS 2016, PMLR 55:38-58, 2017.

Abstract

Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as they are usually misaligned. Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound based pruning strategies are adopted. However, branch and bound based pruning are only useful for very short queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequence and prunes around 95% candidates, leading to the massive speedup in search with DTW. Empirical results on two large-scale benchmark time series data show that our proposed method can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.

Cite this Paper


BibTeX
@InProceedings{pmlr-v55-luo16, title = {{SSH} (Sketch, Shingle, \& Hash) for Indexing Massive-Scale Time Series}, author = {Luo, Chen and Shrivastava, Anshumali}, booktitle = {Proceedings of the Time Series Workshop at NIPS 2016}, pages = {38--58}, year = {2017}, editor = {Anava, Oren and Khaleghi, Azadeh and Cuturi, Marco and Kuznetsov, Vitaly and Rakhlin, Alexander}, volume = {55}, series = {Proceedings of Machine Learning Research}, address = {Barcelona, Spain}, month = {09 Dec}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v55/luo16.pdf}, url = {https://proceedings.mlr.press/v55/luo16.html}, abstract = {Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as they are usually misaligned. Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound based pruning strategies are adopted. However, branch and bound based pruning are only useful for very short queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequence and prunes around 95% candidates, leading to the massive speedup in search with DTW. Empirical results on two large-scale benchmark time series data show that our proposed method can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy. } }
Endnote
%0 Conference Paper %T SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series %A Chen Luo %A Anshumali Shrivastava %B Proceedings of the Time Series Workshop at NIPS 2016 %C Proceedings of Machine Learning Research %D 2017 %E Oren Anava %E Azadeh Khaleghi %E Marco Cuturi %E Vitaly Kuznetsov %E Alexander Rakhlin %F pmlr-v55-luo16 %I PMLR %P 38--58 %U https://proceedings.mlr.press/v55/luo16.html %V 55 %X Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as they are usually misaligned. Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound based pruning strategies are adopted. However, branch and bound based pruning are only useful for very short queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequence and prunes around 95% candidates, leading to the massive speedup in search with DTW. Empirical results on two large-scale benchmark time series data show that our proposed method can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.
RIS
TY - CPAPER TI - SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series AU - Chen Luo AU - Anshumali Shrivastava BT - Proceedings of the Time Series Workshop at NIPS 2016 DA - 2017/02/16 ED - Oren Anava ED - Azadeh Khaleghi ED - Marco Cuturi ED - Vitaly Kuznetsov ED - Alexander Rakhlin ID - pmlr-v55-luo16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 55 SP - 38 EP - 58 L1 - http://proceedings.mlr.press/v55/luo16.pdf UR - https://proceedings.mlr.press/v55/luo16.html AB - Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as they are usually misaligned. Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound based pruning strategies are adopted. However, branch and bound based pruning are only useful for very short queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequence and prunes around 95% candidates, leading to the massive speedup in search with DTW. Empirical results on two large-scale benchmark time series data show that our proposed method can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy. ER -
APA
Luo, C. & Shrivastava, A.. (2017). SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series. Proceedings of the Time Series Workshop at NIPS 2016, in Proceedings of Machine Learning Research 55:38-58 Available from https://proceedings.mlr.press/v55/luo16.html.

Related Material