Scalable Optimization of Neighbor Embedding for Visualization

Zhirong Yang, Jaakko Peltonen, Samuel Kaski
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(2):127-135, 2013.

Abstract

Neighbor embedding (NE) methods have found their use in data visualization but are limited in big data analysis tasks due to their O(n^2) complexity for n data samples. We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n). The technique is based on realizing that in visualization the embedding space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations developed for n-body force calculations can be applied. In gradient-based NE algorithms the gradient for an individual point decomposes into “forces” exerted by the other points. The contributions of close-by points need to be computed individually but far-away points can be approximated by their “center of mass”, rapidly computable by applying a recursive decomposition of the visualization space into quadrants. The new algorithm brings a significant speed-up for medium-size data, and brings “big data” within reach of visualization.
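The quadrant-decomposition idea in the abstract is the classic Barnes-Hut scheme: nearby points contribute to the gradient individually, while a whole far-away cell is replaced by its center of mass. The sketch below is an illustrative assumption, not the paper's exact formulation: it builds a quadtree over 2D points and approximates a t-SNE-like repulsive term sum_q (p - q) / (1 + ||p - q||^2), recursing only when a cell fails the size/distance criterion. All names (`QuadNode`, `approx_repulsion`, `theta`) are hypothetical.

```python
import numpy as np

class QuadNode:
    """One square cell of a quadtree over 2D embedding points."""
    def __init__(self, center, half):
        self.center = center       # center of this square cell
        self.half = half           # half the side length
        self.mass = 0              # number of points inside the cell
        self.com = np.zeros(2)     # running center of mass of those points
        self.point = None          # stored point, if this is a leaf
        self.children = None       # four sub-quadrants, once split

    def insert(self, p):
        # Update the running center of mass, then push the point down.
        self.com = (self.com * self.mass + p) / (self.mass + 1)
        self.mass += 1
        if self.children is None and self.point is None:
            self.point = p
            return
        if self.children is None:
            self._split()
        self._child(p).insert(p)

    def _split(self):
        # Create the four quadrants and move the stored point into one.
        h = self.half / 2
        self.children = [QuadNode(self.center + np.array([dx, dy]) * h, h)
                         for dx in (-1, 1) for dy in (1, -1)]
        old, self.point = self.point, None
        self._child(old).insert(old)

    def _child(self, p):
        # Index: 0 left-top, 1 left-bottom, 2 right-top, 3 right-bottom.
        idx = 2 * (p[0] >= self.center[0]) + (p[1] < self.center[1])
        return self.children[idx]

def approx_repulsion(node, p, theta=0.5):
    """Approximate sum over points q of (p - q) / (1 + ||p - q||^2).
    If a cell is far enough (cell size / distance < theta), treat all its
    points as one mass at the cell's center of mass; otherwise recurse."""
    if node.mass == 0:
        return np.zeros(2)
    d = p - node.com
    dist = np.linalg.norm(d)
    if node.children is None or (dist > 0 and 2 * node.half / dist < theta):
        return node.mass * d / (1 + dist ** 2)
    return sum(approx_repulsion(c, p, theta) for c in node.children)
```

With `theta = 0` the recursion always reaches the leaves and the result equals the exact O(n) per-point sum; larger `theta` trades accuracy for the O(log n) per-point cost that yields the overall O(n log n) the abstract describes.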

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-yang13b,
  title     = {Scalable Optimization of Neighbor Embedding for Visualization},
  author    = {Yang, Zhirong and Peltonen, Jaakko and Kaski, Samuel},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {127--135},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/yang13b.pdf},
  url       = {https://proceedings.mlr.press/v28/yang13b.html},
  abstract  = {Neighbor embedding (NE) methods have found their use in data visualization but are limited in big data analysis tasks due to their O(n^2) complexity for n data samples. We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n). The technique is based on realizing that in visualization the embedding space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations developed for n-body force calculations can be applied. In gradient-based NE algorithms the gradient for an individual point decomposes into “forces” exerted by the other points. The contributions of close-by points need to be computed individually but far-away points can be approximated by their “center of mass”, rapidly computable by applying a recursive decomposition of the visualization space into quadrants. The new algorithm brings a significant speed-up for medium-size data, and brings “big data” within reach of visualization.}
}
Endnote
%0 Conference Paper
%T Scalable Optimization of Neighbor Embedding for Visualization
%A Zhirong Yang
%A Jaakko Peltonen
%A Samuel Kaski
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-yang13b
%I PMLR
%P 127--135
%U https://proceedings.mlr.press/v28/yang13b.html
%V 28
%N 2
%X Neighbor embedding (NE) methods have found their use in data visualization but are limited in big data analysis tasks due to their O(n^2) complexity for n data samples. We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n). The technique is based on realizing that in visualization the embedding space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations developed for n-body force calculations can be applied. In gradient-based NE algorithms the gradient for an individual point decomposes into “forces” exerted by the other points. The contributions of close-by points need to be computed individually but far-away points can be approximated by their “center of mass”, rapidly computable by applying a recursive decomposition of the visualization space into quadrants. The new algorithm brings a significant speed-up for medium-size data, and brings “big data” within reach of visualization.
RIS
TY - CPAPER
TI - Scalable Optimization of Neighbor Embedding for Visualization
AU - Zhirong Yang
AU - Jaakko Peltonen
AU - Samuel Kaski
BT - Proceedings of the 30th International Conference on Machine Learning
DA - 2013/05/13
ED - Sanjoy Dasgupta
ED - David McAllester
ID - pmlr-v28-yang13b
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 28
IS - 2
SP - 127
EP - 135
L1 - http://proceedings.mlr.press/v28/yang13b.pdf
UR - https://proceedings.mlr.press/v28/yang13b.html
AB - Neighbor embedding (NE) methods have found their use in data visualization but are limited in big data analysis tasks due to their O(n^2) complexity for n data samples. We demonstrate that the obvious approach of subsampling produces inferior results and propose a generic approximated optimization technique that reduces the NE optimization cost to O(n log n). The technique is based on realizing that in visualization the embedding space is necessarily very low-dimensional (2D or 3D), and hence efficient approximations developed for n-body force calculations can be applied. In gradient-based NE algorithms the gradient for an individual point decomposes into “forces” exerted by the other points. The contributions of close-by points need to be computed individually but far-away points can be approximated by their “center of mass”, rapidly computable by applying a recursive decomposition of the visualization space into quadrants. The new algorithm brings a significant speed-up for medium-size data, and brings “big data” within reach of visualization.
ER -
APA
Yang, Z., Peltonen, J. & Kaski, S. (2013). Scalable Optimization of Neighbor Embedding for Visualization. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(2):127-135. Available from https://proceedings.mlr.press/v28/yang13b.html.