Learning Efficient Anomaly Detectors from K-NN Graphs

Jonathan Root, Jing Qian, Venkatesh Saligrama
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:790-799, 2015.

Abstract

We propose a non-parametric anomaly detection algorithm for high-dimensional data. We score each data point by its average K-NN distance and rank the points accordingly. We then train limited-complexity models to imitate these scores based on the max-margin learning-to-rank framework. A test point is declared an anomaly at the α false-alarm level if its predicted score falls in the α-percentile. The resulting anomaly detector is shown to be asymptotically optimal in that, for any false alarm rate α, its decision region converges to the α-percentile minimum volume level set of the unknown underlying density. In addition, we test both the statistical performance and the computational efficiency of our algorithm in a number of synthetic and real-data experiments. Our results demonstrate the superiority of our algorithm over existing K-NN based anomaly detection algorithms, with significant computational savings.
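The two-stage procedure described above is easy to picture with a small sketch. The fragment below is not the authors' implementation: it only illustrates the scoring side of the idea, computing each point's average K-NN distance and flagging a test point as anomalous when its score exceeds the (1 - α) empirical quantile of the training scores. The learning-to-rank surrogate model that the paper trains on top of these scores is omitted, and all names (avg_knn_scores, the toy Gaussian data) are illustrative.

    # Minimal sketch of average K-NN distance scoring with an alpha
    # false-alarm threshold (illustrative only; not the paper's method).
    import numpy as np

    def avg_knn_scores(train, query, k=5, exclude_self=False):
        """Average Euclidean distance from each query point to its k nearest training points."""
        # Pairwise distances, shape (n_query, n_train)
        d = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=-1)
        d.sort(axis=1)
        start = 1 if exclude_self else 0  # skip the zero self-distance when scoring training data
        return d[:, start:start + k].mean(axis=1)

    rng = np.random.default_rng(0)
    nominal = rng.normal(size=(500, 2))                     # nominal (training) sample
    test = np.vstack([rng.normal(size=(5, 2)),              # likely nominal points
                      rng.normal(5.0, 1.0, size=(5, 2))])   # likely anomalous points

    alpha = 0.05
    train_scores = avg_knn_scores(nominal, nominal, k=5, exclude_self=True)
    threshold = np.quantile(train_scores, 1 - alpha)        # alpha false-alarm level
    test_scores = avg_knn_scores(nominal, test, k=5)
    print(test_scores > threshold)                          # True marks a declared anomaly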

Cite this Paper


BibTeX
@InProceedings{pmlr-v38-root15,
  title     = {{Learning Efficient Anomaly Detectors from K-NN Graphs}},
  author    = {Root, Jonathan and Qian, Jing and Saligrama, Venkatesh},
  booktitle = {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages     = {790--799},
  year      = {2015},
  editor    = {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume    = {38},
  series    = {Proceedings of Machine Learning Research},
  address   = {San Diego, California, USA},
  month     = {09--12 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v38/root15.pdf},
  url       = {https://proceedings.mlr.press/v38/root15.html},
  abstract  = {We propose a non-parametric anomaly detection algorithm for high dimensional data. We score each datapoint by its average K-NN distance, and rank them accordingly. We then train limited complexity models to imitate these scores based on the max-margin learning-to-rank framework. A test-point is declared as an anomaly at α-false alarm level if the predicted score is in the α-percentile. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate α, its decision region converges to the α-percentile minimum volume level set of the unknown underlying density. In addition, we test both the statistical performance and computational efficiency of our algorithm on a number of synthetic and real-data experiments. Our results demonstrate the superiority of our algorithm over existing K-NN based anomaly detection algorithms, with significant computational savings.}
}
Endnote
%0 Conference Paper
%T Learning Efficient Anomaly Detectors from K-NN Graphs
%A Jonathan Root
%A Jing Qian
%A Venkatesh Saligrama
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-root15
%I PMLR
%P 790--799
%U https://proceedings.mlr.press/v38/root15.html
%V 38
%X We propose a non-parametric anomaly detection algorithm for high dimensional data. We score each datapoint by its average K-NN distance, and rank them accordingly. We then train limited complexity models to imitate these scores based on the max-margin learning-to-rank framework. A test-point is declared as an anomaly at α-false alarm level if the predicted score is in the α-percentile. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate α, its decision region converges to the α-percentile minimum volume level set of the unknown underlying density. In addition, we test both the statistical performance and computational efficiency of our algorithm on a number of synthetic and real-data experiments. Our results demonstrate the superiority of our algorithm over existing K-NN based anomaly detection algorithms, with significant computational savings.
RIS
TY - CPAPER
TI - Learning Efficient Anomaly Detectors from K-NN Graphs
AU - Jonathan Root
AU - Jing Qian
AU - Venkatesh Saligrama
BT - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA - 2015/02/21
ED - Guy Lebanon
ED - S. V. N. Vishwanathan
ID - pmlr-v38-root15
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 38
SP - 790
EP - 799
L1 - http://proceedings.mlr.press/v38/root15.pdf
UR - https://proceedings.mlr.press/v38/root15.html
AB - We propose a non-parametric anomaly detection algorithm for high dimensional data. We score each datapoint by its average K-NN distance, and rank them accordingly. We then train limited complexity models to imitate these scores based on the max-margin learning-to-rank framework. A test-point is declared as an anomaly at α-false alarm level if the predicted score is in the α-percentile. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate α, its decision region converges to the α-percentile minimum volume level set of the unknown underlying density. In addition, we test both the statistical performance and computational efficiency of our algorithm on a number of synthetic and real-data experiments. Our results demonstrate the superiority of our algorithm over existing K-NN based anomaly detection algorithms, with significant computational savings.
ER -
APA
Root, J., Qian, J., & Saligrama, V. (2015). Learning Efficient Anomaly Detectors from K-NN Graphs. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:790-799. Available from https://proceedings.mlr.press/v38/root15.html.
