Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs

Jon Parker, Hans Engler
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:743-750, 2014.

Abstract

Sampling from a probability mass function (PMF) has many applications in modern computing. This paper presents a novel lossy compression method, intended for large (O(10^5)) dense PMFs, that speeds up the sampling process and guarantees high-fidelity sampling. The compression method closely approximates an input PMF P with another PMF Q that is easy to store and sample from; all samples are then drawn from Q instead of the original input distribution P. We say that Q “spoofs” P when this switch is difficult to detect with a statistical test. The lifetime of Q is the sample size required to detect the switch from P to Q. We show how to compute a single PMF’s lifetime and present numeric examples demonstrating compression rates ranging from 62% to 75% when the input PMF is not sorted and 88% to 99% when the input is already sorted. These examples achieve speedups ranging from 1.47x to 2.82x over binary search sampling.
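The binary search sampling baseline that the reported speedups are measured against is standard inverse-CDF sampling: precompute the cumulative sums of the PMF, draw a uniform random number, and binary-search for the matching index. A minimal sketch (the function names are ours, not from the paper):

```python
import bisect
import itertools
import random

def build_cdf(pmf):
    """Cumulative sums of the PMF; the final entry should be ~1.0."""
    return list(itertools.accumulate(pmf))

def sample(cdf, rng=random):
    """Draw one index by binary search over the CDF (inverse-CDF sampling)."""
    u = rng.random()
    # Clamp guards against u landing past the last cumulative sum
    # when floating-point rounding leaves cdf[-1] slightly below 1.0.
    return min(bisect.bisect_right(cdf, u), len(cdf) - 1)

# Example: a small unnormalized PMF, normalized before use.
weights = [5, 1, 3, 1]
total = sum(weights)
pmf = [w / total for w in weights]
cdf = build_cdf(pmf)

counts = [0] * len(pmf)
for _ in range(10_000):
    counts[sample(cdf)] += 1
```

Each draw costs O(log n) comparisons and the table costs O(n) memory, which is exactly the storage the paper's spoofing PMF Q is designed to compress.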

Cite this Paper


BibTeX
@InProceedings{pmlr-v33-parker14,
  title     = {{Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs}},
  author    = {Parker, Jon and Engler, Hans},
  booktitle = {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages     = {743--750},
  year      = {2014},
  editor    = {Kaski, Samuel and Corander, Jukka},
  volume    = {33},
  series    = {Proceedings of Machine Learning Research},
  address   = {Reykjavik, Iceland},
  month     = {22--25 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v33/parker14.pdf},
  url       = {https://proceedings.mlr.press/v33/parker14.html}
}
Endnote
%0 Conference Paper
%T Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs
%A Jon Parker
%A Hans Engler
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-parker14
%I PMLR
%P 743--750
%U https://proceedings.mlr.press/v33/parker14.html
%V 33
RIS
TY  - CPAPER
TI  - Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs
AU  - Jon Parker
AU  - Hans Engler
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander
ID  - pmlr-v33-parker14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 743
EP  - 750
L1  - http://proceedings.mlr.press/v33/parker14.pdf
UR  - https://proceedings.mlr.press/v33/parker14.html
ER  -
APA
Parker, J., & Engler, H. (2014). Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:743-750. Available from https://proceedings.mlr.press/v33/parker14.html.
