On Estimation and Selection for Topic Models

Matt Taddy
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:1184-1193, 2012.

Abstract

This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.

Cite this Paper


BibTeX
@InProceedings{pmlr-v22-taddy12,
  title     = {On Estimation and Selection for Topic Models},
  author    = {Taddy, Matt},
  booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages     = {1184--1193},
  year      = {2012},
  editor    = {Lawrence, Neil D. and Girolami, Mark},
  volume    = {22},
  series    = {Proceedings of Machine Learning Research},
  address   = {La Palma, Canary Islands},
  month     = {21--23 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v22/taddy12/taddy12.pdf},
  url       = {https://proceedings.mlr.press/v22/taddy12.html},
  abstract  = {This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.}
}
Endnote
%0 Conference Paper
%T On Estimation and Selection for Topic Models
%A Matt Taddy
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-taddy12
%I PMLR
%P 1184--1193
%U https://proceedings.mlr.press/v22/taddy12.html
%V 22
%X This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.
RIS
TY  - CPAPER
TI  - On Estimation and Selection for Topic Models
AU  - Matt Taddy
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA  - 2012/04/21
ED  - Neil D. Lawrence
ED  - Mark Girolami
ID  - pmlr-v22-taddy12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 22
SP  - 1184
EP  - 1193
L1  - http://proceedings.mlr.press/v22/taddy12/taddy12.pdf
UR  - https://proceedings.mlr.press/v22/taddy12.html
AB  - This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.
ER  -
APA
Taddy, M. (2012). On Estimation and Selection for Topic Models. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:1184-1193. Available from https://proceedings.mlr.press/v22/taddy12.html.