Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment

Jason Chuang, Sonal Gupta, Christopher Manning, Jeffrey Heer
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):612-620, 2013.

Abstract

The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure they are meaningful. We introduce a framework to support large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.
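
The four misalignment types named above can be illustrated with a small sketch. It is not the paper's method: it assumes the correspondence has already been reduced to a hypothetical boolean matrix `match[c][t]` (reference concept `c` matches latent topic `t`), and it reads the four terms in their plainest sense: a junk topic matches no concept, a fused topic matches several concepts, a missing concept matches no topic, and a repeated concept matches several topics. The function name and the example data are invented for illustration.

```python
from typing import Dict, List, Sequence


def misalignment_types(match: Sequence[Sequence[bool]]) -> Dict[str, List[int]]:
    """Classify topics and concepts from a boolean concept-by-topic match matrix.

    match[c][t] is True when reference concept c is judged to match latent
    topic t. How that judgment is made is outside this sketch (assumption).
    """
    n_concepts = len(match)
    n_topics = len(match[0]) if n_concepts else 0

    # Number of topics matched by each concept (row sums).
    topics_per_concept = [sum(row) for row in match]
    # Number of concepts matched by each topic (column sums).
    concepts_per_topic = [
        sum(match[c][t] for c in range(n_concepts)) for t in range(n_topics)
    ]

    return {
        "junk": [t for t, k in enumerate(concepts_per_topic) if k == 0],
        "fused": [t for t, k in enumerate(concepts_per_topic) if k > 1],
        "missing": [c for c, k in enumerate(topics_per_concept) if k == 0],
        "repeated": [c for c, k in enumerate(topics_per_concept) if k > 1],
    }


# Toy data: 4 reference concepts (rows) against 4 latent topics (columns).
example = [
    [True, True, False, False],    # concept 0 is covered by topics 0 and 1
    [False, False, True, False],   # concept 1 is covered by topic 2
    [False, False, True, False],   # concept 2 is also covered by topic 2
    [False, False, False, False],  # concept 3 is covered by no topic
]
result = misalignment_types(example)
```

In this toy case topic 3 is junk, topic 2 is fused, concept 3 is missing, and concept 0 is repeated. Counting each list across many model variants gives the kind of per-model summary the abstract describes, although the real framework may weight matches rather than threshold them.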

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-chuang13,
  title = {Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment},
  author = {Chuang, Jason and Gupta, Sonal and Manning, Christopher and Heer, Jeffrey},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages = {612--620},
  year = {2013},
  editor = {Dasgupta, Sanjoy and McAllester, David},
  volume = {28},
  number = {3},
  series = {Proceedings of Machine Learning Research},
  address = {Atlanta, Georgia, USA},
  month = {17--19 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v28/chuang13.pdf},
  url = {https://proceedings.mlr.press/v28/chuang13.html},
  abstract = {The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure they are meaningful. We introduce a framework to support large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.}
}
Endnote
%0 Conference Paper
%T Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment
%A Jason Chuang
%A Sonal Gupta
%A Christopher Manning
%A Jeffrey Heer
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-chuang13
%I PMLR
%P 612--620
%U https://proceedings.mlr.press/v28/chuang13.html
%V 28
%N 3
%X The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure they are meaningful. We introduce a framework to support large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.
RIS
TY - CPAPER
TI - Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment
AU - Jason Chuang
AU - Sonal Gupta
AU - Christopher Manning
AU - Jeffrey Heer
BT - Proceedings of the 30th International Conference on Machine Learning
DA - 2013/05/26
ED - Sanjoy Dasgupta
ED - David McAllester
ID - pmlr-v28-chuang13
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 28
IS - 3
SP - 612
EP - 620
L1 - http://proceedings.mlr.press/v28/chuang13.pdf
UR - https://proceedings.mlr.press/v28/chuang13.html
AB - The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure they are meaningful. We introduce a framework to support large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.
ER -
APA
Chuang, J., Gupta, S., Manning, C. & Heer, J. (2013). Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):612-620. Available from https://proceedings.mlr.press/v28/chuang13.html.
