Do semidefinite relaxations solve sparse PCA up to the information limit?

06/16/2013
by Robert Krauthgamer, et al.

Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms have been developed for this sparse PCA problem, ranging from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is under which conditions such algorithms can recover the sparse principal components. We study this question for a single-spike model with an ℓ_0-sparse eigenvector, in the asymptotic regime where both the dimension p and the sample size n tend to infinity. Amini and Wainwright [Ann. Statist. 37 (2009) 2877-2921] proved that for sparsity levels k ≥ Ω(n/log p), no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for k ≤ O(√(n/log p)), diagonal thresholding is consistent. It was further conjectured that an SDP approach may close this gap between the computational and information limits. We prove that when k ≥ Ω(√n), the proposed SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally efficient algorithm can recover a spike of ℓ_0-sparsity k ≥ Ω(√n). Finally, we present empirical results suggesting that up to sparsity levels k = O(√n), recovery is possible by a simple covariance thresholding algorithm.
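As a rough illustration of the setting, the sketch below simulates the single-spike model (population covariance I + β vv^T with a k-sparse unit vector v) and runs a simple covariance thresholding estimate. This is a minimal sketch, not the paper's exact procedure: the flat k-sparse spike, the soft-threshold scale τ/√n, and all function names are illustrative assumptions.

```python
import numpy as np

def single_spike_sample(n, p, k, beta, rng):
    """Draw n samples x_i = sqrt(beta) * u_i * v + z_i, where v is a
    k-sparse unit vector, u_i ~ N(0,1), z_i ~ N(0, I_p), so the
    population covariance is I + beta * v v^T (single-spike model)."""
    v = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    v[support] = 1.0 / np.sqrt(k)          # flat k-sparse spike (an assumption)
    u = rng.standard_normal(n)
    Z = rng.standard_normal((n, p))
    X = np.sqrt(beta) * np.outer(u, v) + Z
    return X, v

def covariance_thresholding(X, tau):
    """Soft-threshold the entries of the sample covariance at level
    tau/sqrt(n), keep the diagonal, and return the leading eigenvector
    of the thresholded matrix (a simple covariance thresholding variant)."""
    n, p = X.shape
    S = X.T @ X / n
    t = tau / np.sqrt(n)
    T = np.sign(S) * np.maximum(np.abs(S) - t, 0.0)
    np.fill_diagonal(T, np.diag(S))        # leave diagonal entries untouched
    w, V = np.linalg.eigh(T)               # eigenvalues in ascending order
    return V[:, -1]                        # eigenvector of the top eigenvalue

rng = np.random.default_rng(0)
X, v = single_spike_sample(n=2000, p=1000, k=40, beta=3.0, rng=rng)
v_hat = covariance_thresholding(X, tau=4.0)
print("overlap |<v_hat, v>| =", abs(v_hat @ v))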
