Clusterability: A Theoretical Study

Margareta Ackerman; Shai Ben-David

Clusterability: A Theoretical Study

Margareta Ackerman, Shai Ben-David

Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, PMLR 5:1-8, 2009.

Abstract

We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model. We survey several notions of clusterability that have been discussed in the literature, as well as propose a new notion of data clusterability. Our comparison of these notions reveals that, although they all attempt to evaluate the same intuitive property, they are pairwise inconsistent. Our analysis discovers an interesting phenomenon; Although most of the common clustering tasks are NP-hard, finding a close-to-optimal clustering for well-clusterable data sets is easy (computationally). We prove instances of this general claim with respect to the various clusterability notions that we discuss. Finally, we investigate how hard it is to determine the clusterability value of a given data set. In most cases, it turns out that this is an NP-hard problem.

Cite this Paper

BibTeX


@InProceedings{pmlr-v5-ackerman09a,
  title = 	 {Clusterability: A Theoretical Study},
  author = 	 {Ackerman, Margareta and Ben-David, Shai},
  booktitle = 	 {Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1--8},
  year = 	 {2009},
  editor = 	 {van Dyk, David and Welling, Max},
  volume = 	 {5},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v5/ackerman09a/ackerman09a.pdf},
  url = 	 {https://proceedings.mlr.press/v5/ackerman09a.html},
  abstract = 	 {We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation  model.    We survey several notions of clusterability that have been discussed in the literature, as well as propose a new notion of data clusterability.    Our comparison of these notions reveals that, although they all attempt to evaluate the same intuitive property, they are pairwise inconsistent.    Our analysis discovers an interesting phenomenon; Although most of the common clustering tasks are NP-hard, finding a close-to-optimal clustering for well-clusterable data sets is easy (computationally). We prove instances of this general claim with respect to the various clusterability notions that we discuss.    Finally, we investigate how hard it is to determine the clusterability value of a given data set. In most cases, it turns out that this is an NP-hard problem.}
}

Endnote

%0 Conference Paper
%T Clusterability: A Theoretical Study
%A Margareta Ackerman
%A Shai Ben-David
%B Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2009
%E David van Dyk
%E Max Welling	
%F pmlr-v5-ackerman09a
%I PMLR
%P 1--8
%U https://proceedings.mlr.press/v5/ackerman09a.html
%V 5
%X We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation  model.    We survey several notions of clusterability that have been discussed in the literature, as well as propose a new notion of data clusterability.    Our comparison of these notions reveals that, although they all attempt to evaluate the same intuitive property, they are pairwise inconsistent.    Our analysis discovers an interesting phenomenon; Although most of the common clustering tasks are NP-hard, finding a close-to-optimal clustering for well-clusterable data sets is easy (computationally). We prove instances of this general claim with respect to the various clusterability notions that we discuss.    Finally, we investigate how hard it is to determine the clusterability value of a given data set. In most cases, it turns out that this is an NP-hard problem.

RIS


TY  - CPAPER
TI  - Clusterability: A Theoretical Study
AU  - Margareta Ackerman
AU  - Shai Ben-David
BT  - Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics
DA  - 2009/04/15
ED  - David van Dyk
ED  - Max Welling	
ID  - pmlr-v5-ackerman09a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 5
SP  - 1
EP  - 8
L1  - http://proceedings.mlr.press/v5/ackerman09a/ackerman09a.pdf
UR  - https://proceedings.mlr.press/v5/ackerman09a.html
AB  - We investigate measures of the clusterability of data sets. Namely, ways to define how ‘strong’ or ‘conclusive’ is the clustering structure of a given data set. We address this issue with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation  model.    We survey several notions of clusterability that have been discussed in the literature, as well as propose a new notion of data clusterability.    Our comparison of these notions reveals that, although they all attempt to evaluate the same intuitive property, they are pairwise inconsistent.    Our analysis discovers an interesting phenomenon; Although most of the common clustering tasks are NP-hard, finding a close-to-optimal clustering for well-clusterable data sets is easy (computationally). We prove instances of this general claim with respect to the various clusterability notions that we discuss.    Finally, we investigate how hard it is to determine the clusterability value of a given data set. In most cases, it turns out that this is an NP-hard problem.
ER  -

APA


Ackerman, M. & Ben-David, S.. (2009). Clusterability: A Theoretical Study. Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 5:1-8 Available from https://proceedings.mlr.press/v5/ackerman09a.html.

Related Material

Download PDF