Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree

Zhinoos Razavi Hesabi, Timos Sellis, Xiuzhen Zhang
Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, PMLR 41:19-32, 2015.

Abstract

With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated. Clustering multiple data streams is challenging as the requirement of clustering at anytime becomes more critical. We aim to cluster multiple data streams concurrently and in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a single stream. It uses a hierarchical tree structure to index micro-clusters, which are summary statistics for streaming data objects. We design a dynamic, concurrent indexing tree structure that extends the ClusTree structure to achieve more granular micro clusters (summaries) of multiple streams at any time. We devised algorithms to search, expand and update the hierarchical tree structure of storing micro clusters concurrently, along with an algorithm for anytime concurrent clustering of multiple streams. As this is work in progress, we plan to test our proposed algorithms, on sensor data sets, and evaluate the space and time complexity of creating and accessing micro-clusters. We will also evaluate the quality of clustering in terms of number of created clusters and compare our technique with other approaches.

Cite this Paper


BibTeX
@InProceedings{pmlr-v41-razavi15, title = {Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree}, author = {Razavi Hesabi, Zhinoos and Sellis, Timos and Zhang, Xiuzhen}, booktitle = {Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications}, pages = {19--32}, year = {2015}, editor = {Fan, Wei and Bifet, Albert and Yang, Qiang and Yu, Philip S.}, volume = {41}, series = {Proceedings of Machine Learning Research}, month = {10 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v41/razavi15.pdf}, url = {https://proceedings.mlr.press/v41/razavi15.html}, abstract = {With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated. Clustering multiple data streams is challenging as the requirement of clustering at anytime becomes more critical. We aim to cluster multiple data streams concurrently and in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a single stream. It uses a hierarchical tree structure to index micro-clusters, which are summary statistics for streaming data objects. We design a dynamic, concurrent indexing tree structure that extends the ClusTree structure to achieve more granular micro clusters (summaries) of multiple streams at any time. We devised algorithms to search, expand and update the hierarchical tree structure of storing micro clusters concurrently, along with an algorithm for anytime concurrent clustering of multiple streams. As this is work in progress, we plan to test our proposed algorithms, on sensor data sets, and evaluate the space and time complexity of creating and accessing micro-clusters. We will also evaluate the quality of clustering in terms of number of created clusters and compare our technique with other approaches. } }
Endnote
%0 Conference Paper %T Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree %A Zhinoos Razavi Hesabi %A Timos Sellis %A Xiuzhen Zhang %B Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications %C Proceedings of Machine Learning Research %D 2015 %E Wei Fan %E Albert Bifet %E Qiang Yang %E Philip S. Yu %F pmlr-v41-razavi15 %I PMLR %P 19--32 %U https://proceedings.mlr.press/v41/razavi15.html %V 41 %X With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated. Clustering multiple data streams is challenging as the requirement of clustering at anytime becomes more critical. We aim to cluster multiple data streams concurrently and in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a single stream. It uses a hierarchical tree structure to index micro-clusters, which are summary statistics for streaming data objects. We design a dynamic, concurrent indexing tree structure that extends the ClusTree structure to achieve more granular micro clusters (summaries) of multiple streams at any time. We devised algorithms to search, expand and update the hierarchical tree structure of storing micro clusters concurrently, along with an algorithm for anytime concurrent clustering of multiple streams. As this is work in progress, we plan to test our proposed algorithms, on sensor data sets, and evaluate the space and time complexity of creating and accessing micro-clusters. We will also evaluate the quality of clustering in terms of number of created clusters and compare our technique with other approaches.
RIS
TY - CPAPER TI - Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree AU - Zhinoos Razavi Hesabi AU - Timos Sellis AU - Xiuzhen Zhang BT - Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications DA - 2015/08/31 ED - Wei Fan ED - Albert Bifet ED - Qiang Yang ED - Philip S. Yu ID - pmlr-v41-razavi15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 41 SP - 19 EP - 32 L1 - http://proceedings.mlr.press/v41/razavi15.pdf UR - https://proceedings.mlr.press/v41/razavi15.html AB - With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated. Clustering multiple data streams is challenging as the requirement of clustering at anytime becomes more critical. We aim to cluster multiple data streams concurrently and in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a single stream. It uses a hierarchical tree structure to index micro-clusters, which are summary statistics for streaming data objects. We design a dynamic, concurrent indexing tree structure that extends the ClusTree structure to achieve more granular micro clusters (summaries) of multiple streams at any time. We devised algorithms to search, expand and update the hierarchical tree structure of storing micro clusters concurrently, along with an algorithm for anytime concurrent clustering of multiple streams. As this is work in progress, we plan to test our proposed algorithms, on sensor data sets, and evaluate the space and time complexity of creating and accessing micro-clusters. We will also evaluate the quality of clustering in terms of number of created clusters and compare our technique with other approaches. ER -
APA
Razavi Hesabi, Z., Sellis, T. & Zhang, X.. (2015). Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree. Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, in Proceedings of Machine Learning Research 41:19-32 Available from https://proceedings.mlr.press/v41/razavi15.html.

Related Material