Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model

Xianghua Fu, Ting Wang, Jing Li, Chong Yu, Wangwang Liu
Proceedings of The 8th Asian Conference on Machine Learning, PMLR 63:190-205, 2016.

Abstract

We propose a Word-Topic Mixture (WTM) model that improves word representations and the topic model simultaneously. First, it introduces initial external word embeddings into the Topical Word Embeddings (TWE) model, which is based on the Latent Dirichlet Allocation (LDA) model, to learn word embeddings and topic vectors. Then the results learned by TWE are integrated into LDA by defining a probability distribution over topic vectors and word embeddings, following the idea of the latent feature model with LDA (LFLDA), while minimizing the KL divergence between the new topic-word distribution and the original one. The experimental results show that the WTM model performs better on word representation and topic detection than several state-of-the-art models.
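For readers unfamiliar with the latent-feature-with-LDA construction the abstract refers to, the sketch below illustrates the general shape of such a mixed topic-to-word distribution together with a KL regularizer tying it to the original LDA distribution. The symbols used (mixture weight \lambda, LDA topic-word multinomial \phi_t, topic vector \tau_t, word embedding \omega_w, vocabulary V) are illustrative assumptions based on the standard LFLDA formulation; the paper's exact objective and notation may differ.

% A hedged sketch (requires amsmath); WTM's exact formulation may differ.
\begin{align}
  % Topic-to-word distribution: a mixture of the standard LDA multinomial
  % component \phi_{t,w} and a latent-feature component scored by the dot
  % product of topic vector \tau_t and word embedding \omega_w (softmax over V).
  P(w \mid t) &= (1-\lambda)\,\phi_{t,w}
      + \lambda\,\frac{\exp(\tau_t \cdot \omega_w)}
                      {\sum_{w' \in V} \exp(\tau_t \cdot \omega_{w'})}, \\
  % Regularizer suggested by the abstract: keep the new topic-word
  % distribution close to the original LDA topic-word distribution.
  \mathcal{L}_{\mathrm{KL}} &= \sum_{t=1}^{K}
      \mathrm{KL}\!\left( P(\cdot \mid t) \;\middle\|\; \phi_t \right).
\end{align}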

Cite this Paper


BibTeX
@InProceedings{pmlr-v63-Fu60,
  title     = {Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model},
  author    = {Fu, Xianghua and Wang, Ting and Li, Jing and Yu, Chong and Liu, Wangwang},
  booktitle = {Proceedings of The 8th Asian Conference on Machine Learning},
  pages     = {190--205},
  year      = {2016},
  editor    = {Durrant, Robert J. and Kim, Kee-Eung},
  volume    = {63},
  series    = {Proceedings of Machine Learning Research},
  address   = {The University of Waikato, Hamilton, New Zealand},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v63/Fu60.pdf},
  url       = {https://proceedings.mlr.press/v63/Fu60.html},
  abstract  = {We propose a Word-Topic Mixture(WTM) model to improve word representation and topic model simultaneously. Firstly, it introduces the initial external word embeddings into the Topical Word Embeddings(TWE) model based on Latent Dirichlet Allocation(LDA) model to learn word embeddings and topic vectors. Then the results learned from TWE are integrated in the LDA by defining the probability distribution of topic vectors-word embeddings according to the idea of latent feature model with LDA (LFLDA), meanwhile minimizing the KL divergence of the new topic-word distribution function and the original one. The experimental results prove that the WTM model performs better on word representation and topic detection compared with some state-of-the-art models.}
}
Endnote
%0 Conference Paper
%T Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model
%A Xianghua Fu
%A Ting Wang
%A Jing Li
%A Chong Yu
%A Wangwang Liu
%B Proceedings of The 8th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Robert J. Durrant
%E Kee-Eung Kim
%F pmlr-v63-Fu60
%I PMLR
%P 190--205
%U https://proceedings.mlr.press/v63/Fu60.html
%V 63
%X We propose a Word-Topic Mixture(WTM) model to improve word representation and topic model simultaneously. Firstly, it introduces the initial external word embeddings into the Topical Word Embeddings(TWE) model based on Latent Dirichlet Allocation(LDA) model to learn word embeddings and topic vectors. Then the results learned from TWE are integrated in the LDA by defining the probability distribution of topic vectors-word embeddings according to the idea of latent feature model with LDA (LFLDA), meanwhile minimizing the KL divergence of the new topic-word distribution function and the original one. The experimental results prove that the WTM model performs better on word representation and topic detection compared with some state-of-the-art models.
RIS
TY - CPAPER
TI - Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model
AU - Xianghua Fu
AU - Ting Wang
AU - Jing Li
AU - Chong Yu
AU - Wangwang Liu
BT - Proceedings of The 8th Asian Conference on Machine Learning
DA - 2016/11/20
ED - Robert J. Durrant
ED - Kee-Eung Kim
ID - pmlr-v63-Fu60
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 63
SP - 190
EP - 205
L1 - http://proceedings.mlr.press/v63/Fu60.pdf
UR - https://proceedings.mlr.press/v63/Fu60.html
AB - We propose a Word-Topic Mixture(WTM) model to improve word representation and topic model simultaneously. Firstly, it introduces the initial external word embeddings into the Topical Word Embeddings(TWE) model based on Latent Dirichlet Allocation(LDA) model to learn word embeddings and topic vectors. Then the results learned from TWE are integrated in the LDA by defining the probability distribution of topic vectors-word embeddings according to the idea of latent feature model with LDA (LFLDA), meanwhile minimizing the KL divergence of the new topic-word distribution function and the original one. The experimental results prove that the WTM model performs better on word representation and topic detection compared with some state-of-the-art models.
ER -
APA
Fu, X., Wang, T., Li, J., Yu, C., & Liu, W. (2016). Improving Distributed Word Representation and Topic Model by Word-Topic Mixture Model. Proceedings of The 8th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 63:190-205. Available from https://proceedings.mlr.press/v63/Fu60.html.
