Scalable SDE Filtering and Inference with Apache Spark

Harish S. Bhat, R. W. M. A. Madushani, Shagun Rawat
Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, PMLR 53:18-34, 2016.

Abstract

In this paper, we consider the problem of Bayesian filtering and inference for time series data modeled as noisy, discrete-time observations of a stochastic differential equation (SDE) with undetermined parameters. We develop a Metropolis algorithm to sample from the high-dimensional joint posterior density of all SDE parameters and state time series. Our approach relies on an innovative density tracking by quadrature (DTQ) method to compute the likelihood of the SDE, the part of the posterior that requires the most computational effort to evaluate. As we show, the DTQ method lends itself to a natural implementation using Scala and Apache Spark, an open source framework for scalable data mining. We study the performance and scalability of our algorithm on filtering and inference problems for both regularly and irregularly spaced time series.
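The abstract does not spell out how density tracking by quadrature works, so the following is only a minimal sketch under stated assumptions. It assumes a scalar SDE dX = f(X) dt + g(X) dW, an Euler-Maruyama Gaussian transition density over a time step h, and a trapezoidal-rule discretization of the Chapman-Kolmogorov integral on an equispaced grid with spacing k. It is written in plain Python rather than the Scala and Spark implementation the paper describes, and every function and parameter name is hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def make_grid(half_width, k):
    """Equispaced grid on [-half_width, half_width] with spacing k (assumed layout)."""
    m = int(round(half_width / k))
    return [i * k for i in range(-m, m + 1)], k


def dtq_step(density, grid, k, h, drift, diffusion):
    """Advance the density by one time step h.

    Approximates p_new(x_i) = integral of G(x_i, y) p_old(y) dy by the sum
    k * sum_j G(x_i, y_j) p_old(y_j), where G is the Euler-Maruyama kernel
    with mean y + f(y) h and variance g(y)^2 h. Assumes g(y) != 0 on the grid.
    """
    new_density = []
    for x in grid:
        total = 0.0
        for y, p in zip(grid, density):
            if p == 0.0:
                continue
            total += gaussian_pdf(x, y + drift(y) * h, diffusion(y) ** 2 * h) * p
        new_density.append(k * total)
    return new_density


def dtq_density(x0, t_final, n_steps, grid, k, drift, diffusion):
    """Approximate the density of X(t_final) given X(0) = x0."""
    h = t_final / n_steps
    # The first step starts from a point mass, so the kernel itself is the density.
    density = [gaussian_pdf(x, x0 + drift(x0) * h, diffusion(x0) ** 2 * h) for x in grid]
    for _ in range(n_steps - 1):
        density = dtq_step(density, grid, k, h, drift, diffusion)
    return density


# Illustrative use on an Ornstein-Uhlenbeck process dX = -X dt + dW.
grid, k = make_grid(5.0, 0.05)
density = dtq_density(1.0, 1.0, 10, grid, k, lambda x: -x, lambda x: 1.0)
total_mass = k * sum(density)
mean = k * sum(x * p for x, p in zip(grid, density))
```

In a filtering setting, evaluating such a density at the next observed state would give one factor of the SDE likelihood. Each grid point's sum is independent of the others, which is one plausible reason a method like this maps naturally onto a data-parallel framework.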

Cite this Paper


BibTeX
@InProceedings{pmlr-v53-bhat16,
  title = {Scalable SDE Filtering and Inference with Apache Spark},
  author = {Bhat, Harish S. and Madushani, R. W. M. A. and Rawat, Shagun},
  booktitle = {Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016},
  pages = {18--34},
  year = {2016},
  editor = {Fan, Wei and Bifet, Albert and Read, Jesse and Yang, Qiang and Yu, Philip S.},
  volume = {53},
  series = {Proceedings of Machine Learning Research},
  address = {San Francisco, California, USA},
  month = {14 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v53/bhat16.pdf},
  url = {https://proceedings.mlr.press/v53/bhat16.html},
  abstract = {In this paper, we consider the problem of Bayesian filtering and inference for time series data modeled as noisy, discrete-time observations of a stochastic differential equation (SDE) with undetermined parameters. We develop a Metropolis algorithm to sample from the high-dimensional joint posterior density of all SDE parameters and state time series. Our approach relies on an innovative density tracking by quadrature (DTQ) method to compute the likelihood of the SDE, the part of the posterior that requires the most computational effort to evaluate. As we show, the DTQ method lends itself to a natural implementation using Scala and Apache Spark, an open source framework for scalable data mining. We study the performance and scalability of our algorithm on filtering and inference problems for both regularly and irregularly spaced time series.}
}
Endnote
%0 Conference Paper
%T Scalable SDE Filtering and Inference with Apache Spark
%A Harish S. Bhat
%A R. W. M. A. Madushani
%A Shagun Rawat
%B Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016
%C Proceedings of Machine Learning Research
%D 2016
%E Wei Fan
%E Albert Bifet
%E Jesse Read
%E Qiang Yang
%E Philip S. Yu
%F pmlr-v53-bhat16
%I PMLR
%P 18--34
%U https://proceedings.mlr.press/v53/bhat16.html
%V 53
%X In this paper, we consider the problem of Bayesian filtering and inference for time series data modeled as noisy, discrete-time observations of a stochastic differential equation (SDE) with undetermined parameters. We develop a Metropolis algorithm to sample from the high-dimensional joint posterior density of all SDE parameters and state time series. Our approach relies on an innovative density tracking by quadrature (DTQ) method to compute the likelihood of the SDE, the part of the posterior that requires the most computational effort to evaluate. As we show, the DTQ method lends itself to a natural implementation using Scala and Apache Spark, an open source framework for scalable data mining. We study the performance and scalability of our algorithm on filtering and inference problems for both regularly and irregularly spaced time series.
RIS
TY  - CPAPER
TI  - Scalable SDE Filtering and Inference with Apache Spark
AU  - Harish S. Bhat
AU  - R. W. M. A. Madushani
AU  - Shagun Rawat
BT  - Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016
DA  - 2016/12/06
ED  - Wei Fan
ED  - Albert Bifet
ED  - Jesse Read
ED  - Qiang Yang
ED  - Philip S. Yu
ID  - pmlr-v53-bhat16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 53
SP  - 18
EP  - 34
L1  - http://proceedings.mlr.press/v53/bhat16.pdf
UR  - https://proceedings.mlr.press/v53/bhat16.html
AB  - In this paper, we consider the problem of Bayesian filtering and inference for time series data modeled as noisy, discrete-time observations of a stochastic differential equation (SDE) with undetermined parameters. We develop a Metropolis algorithm to sample from the high-dimensional joint posterior density of all SDE parameters and state time series. Our approach relies on an innovative density tracking by quadrature (DTQ) method to compute the likelihood of the SDE, the part of the posterior that requires the most computational effort to evaluate. As we show, the DTQ method lends itself to a natural implementation using Scala and Apache Spark, an open source framework for scalable data mining. We study the performance and scalability of our algorithm on filtering and inference problems for both regularly and irregularly spaced time series.
ER  -
APA
Bhat, H.S., Madushani, R.W.M.A. & Rawat, S. (2016). Scalable SDE Filtering and Inference with Apache Spark. Proceedings of the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications at KDD 2016, in Proceedings of Machine Learning Research 53:18-34. Available from https://proceedings.mlr.press/v53/bhat16.html.
