Exponential Stochastic Cellular Automata for Massively Parallel Inference

Manzil Zaheer, Michael Wick, Jean-Baptiste Tristan, Alex Smola, Guy Steele
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:966-975, 2016.

Abstract

We propose an embarrassingly parallel, memory-efficient inference algorithm for latent variable models in which the complete data likelihood is in the exponential family. The algorithm is a stochastic cellular automaton and converges to a valid maximum a posteriori fixed point. Applied to latent Dirichlet allocation we find that our algorithm is over an order of magnitude faster than the fastest current approaches. A simple C++/MPI implementation on a 20-node Amazon EC2 cluster samples at more than 1 billion tokens per second. We process 3 billion documents and achieve predictive power competitive with collapsed Gibbs sampling and variational inference.
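
To make the sweep structure concrete, below is a minimal single-machine sketch in NumPy of the synchronous, embarrassingly parallel update the abstract describes, applied to LDA. The function name, the double-buffered count arrays, and the simple uncollapsed-style conditional are illustrative assumptions; the paper's actual C++/MPI implementation is substantially more involved.

```python
import numpy as np

def sca_sweep(docs, n_dk, n_kv, n_k, alpha, beta, rng):
    """One stochastic-cellular-automaton-style sweep for LDA (a sketch).

    Every token's topic is drawn from counts frozen at the start of the
    sweep, so all draws are mutually independent; new counts go into a
    separate write buffer that becomes the read buffer for the next sweep.
    """
    K = n_dk.shape[1]                 # number of topics
    V = n_kv.shape[1]                 # vocabulary size
    new_dk = np.zeros_like(n_dk)      # write buffers (double buffering)
    new_kv = np.zeros_like(n_kv)
    new_k = np.zeros_like(n_k)
    for d, words in enumerate(docs):              # each (document, token) cell is independent
        for v in words:
            p = (n_dk[d] + alpha) * (n_kv[:, v] + beta) / (n_k + V * beta)
            z = rng.choice(K, p=p / p.sum())      # topic drawn from the frozen counts
            new_dk[d, z] += 1                     # accumulate into the write buffer only
            new_kv[z, v] += 1
            new_k[z] += 1
    return new_dk, new_kv, new_k
```

Because every draw reads only counts frozen at the start of the sweep, the loops over documents and tokens can be partitioned across threads or machines without locking, and the per-worker write buffers are simply summed once per sweep; this is the sense in which such a scheme is embarrassingly parallel.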

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-zaheer16,
  title     = {Exponential Stochastic Cellular Automata for Massively Parallel Inference},
  author    = {Zaheer, Manzil and Wick, Michael and Tristan, Jean-Baptiste and Smola, Alex and Steele, Guy},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {966--975},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/zaheer16.pdf},
  url       = {https://proceedings.mlr.press/v51/zaheer16.html},
  abstract  = {We propose an embarrassingly parallel, memory-efficient inference algorithm for latent variable models in which the complete data likelihood is in the exponential family. The algorithm is a stochastic cellular automaton and converges to a valid maximum a posteriori fixed point. Applied to latent Dirichlet allocation we find that our algorithm is over an order of magnitude faster than the fastest current approaches. A simple C++/MPI implementation on a 20-node Amazon EC2 cluster samples at more than 1 billion tokens per second. We process 3 billion documents and achieve predictive power competitive with collapsed Gibbs sampling and variational inference.}
}
APA
Zaheer, M., Wick, M., Tristan, J., Smola, A. & Steele, G. (2016). Exponential Stochastic Cellular Automata for Massively Parallel Inference. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:966-975. Available from https://proceedings.mlr.press/v51/zaheer16.html.
