Submatrix localization via message passing

Bruce Hajek, Yihong Wu, Jiaming Xu; 18(186):1−52, 2018.

Abstract

The principal submatrix localization problem deals with recovering a $K\times K$ principal submatrix of elevated mean $\mu$ in a large $n\times n$ symmetric matrix subject to additive standard Gaussian noise, or more generally, mean zero, variance one, subgaussian noise. This problem serves as a prototypical example for community detection, in which the community corresponds to the support of the submatrix. The main result of this paper is that in the regime $\Omega(\sqrt{n}) \leq K \leq o(n)$, the support of the submatrix can be weakly recovered (with $o(K)$ misclassification errors on average) by an optimized message passing algorithm if $\lambda = \mu^2K^2/n$, the signal-to-noise ratio, exceeds $1/e$. This extends a result by Deshpande and Montanari previously obtained for $K=\Theta(\sqrt{n})$ and $\mu=\Theta(1).$ In addition, the algorithm can be combined with a voting procedure to achieve the information-theoretic limit of exact recovery with sharp constants for all $K \geq \frac{n}{\log n} (\frac{1}{8e} + o(1))$. The total running time of the algorithm is $O(n^2\log n)$. Another version of the submatrix localization problem, known as noisy biclustering, aims to recover a $K_1\times K_2$ submatrix of elevated mean $\mu$ in a large $n_1\times n_2$ Gaussian matrix. The optimized message passing algorithm and its analysis are adapted to the bicluster problem assuming $\Omega(\sqrt{n_i}) \leq K_i \leq o(n_i)$ and $K_1\asymp K_2.$ A sharp information-theoretic condition for the weak recovery of both clusters is also identified.
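To make the message-passing idea concrete, below is a minimal NumPy sketch for the symmetric case: an AMP-style recursion that alternates a matrix-vector product with an Onsager correction and a posterior-mean-like denoiser, then reports the $K$ highest-scoring coordinates as the estimated support. This is a simplified stand-in under assumed scalings, not the paper's optimized nonlinearity; the function name amp_weak_recovery, the denoiser, and the initialization are illustrative choices only.

```python
import numpy as np

def amp_weak_recovery(A, K, mu, n_iter=30, seed=None):
    """AMP-style sketch for weakly recovering a K x K elevated-mean submatrix.

    A is an n x n symmetric matrix with mean mu on a hidden K x K principal
    block plus unit-variance noise. This is a generic AMP iteration with an
    Onsager correction and a simple logistic denoiser -- a hypothetical,
    simplified stand-in for the optimized message passing in the paper.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Y = A / np.sqrt(n)                 # rescale so noise entries are O(1/sqrt(n))
    s = mu * K / np.sqrt(n)            # per-coordinate signal strength (lambda = s^2 * ... )
    p = K / n                          # prior probability that a coordinate is in the support

    x = 1e-3 * rng.standard_normal(n)  # small random initialization
    f_prev = np.zeros(n)

    for _ in range(n_iter):
        # Posterior-mean-like denoiser: estimated membership probability per coordinate.
        w = p * np.exp(np.clip(s * x, -40.0, 40.0))
        f = w / (w + (1.0 - p))
        f_deriv = s * f * (1.0 - f)    # derivative of the denoiser

        b = f_deriv.mean()             # Onsager correction coefficient
        x_new = Y @ f - b * f_prev     # AMP update
        f_prev, x = f, x_new

    # Weak recovery: return the K coordinates with the largest scores.
    return np.argsort(-x)[:K]
```

The Onsager term is what distinguishes this recursion from plain power iteration: it cancels the feedback from previous iterates so the effective noise in each coordinate stays approximately Gaussian, which is the property underlying the state-evolution analysis and the $\lambda > 1/e$ weak-recovery threshold stated above. A voting-style cleanup step, as described in the abstract, would then be run on top of this estimate to reach exact recovery.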

© JMLR 2018.