Sparse Principal Component Analysis and Iterative Thresholding

Ma, Zongming

Files

euclid.aos.1368018173.pdf (372.02 KB)

Penn collection

Statistics Papers

Subject

dimension reduction
high-dimensional statistics
principal component analysis
principal subspace
sparsity
spiked covariance model
thresholding
Statistics and Probability

Permalink

https://repository.upenn.edu/handle/20.500.14332/47666

Abstract

Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features p is comparable to, or even much larger than, the sample size n. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
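
The abstract names the idea but does not spell out the algorithm, so the following is only a minimal sketch of one generic scheme that fits the description: orthogonal (power) iteration on the sample covariance matrix with an entrywise thresholding step before each re-orthonormalization. The function names, the soft-thresholding rule, the threshold value, the initialization from ordinary PCA, and the simulated single-spike data are all assumptions made for illustration; they should not be read as the paper's exact procedure or tuning.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink every entry toward zero by t; entries with |x| <= t become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def iterative_thresholding_pca(S, k, threshold, n_iter=50):
    """Hypothetical sketch: orthogonal iteration with a thresholding step.

    S         : (p, p) sample covariance matrix
    k         : dimension of the principal subspace to estimate
    threshold : entrywise soft-threshold level (an assumed tuning parameter)
    Returns a (p, k) matrix with orthonormal columns.
    """
    # Assumed initialization: the top-k eigenvectors of S (ordinary PCA).
    # eigh returns eigenvalues in ascending order, so take the last k columns.
    _, eigvecs = np.linalg.eigh(S)
    Q = eigvecs[:, -k:][:, ::-1]
    for _ in range(n_iter):
        T = soft_threshold(S @ Q, threshold)  # multiply, then sparsify
        if not np.any(T):
            raise ValueError("threshold removed every entry; lower it")
        Q, _ = np.linalg.qr(T)                # re-orthonormalize the columns
    return Q


def simulate_spiked(n, p, support, spike, seed=0):
    """Draw n samples with covariance I + spike * v v^T, where v is sparse."""
    rng = np.random.default_rng(seed)
    v = np.zeros(p)
    v[:support] = 1.0 / np.sqrt(support)      # unit vector on `support` coords
    scores = np.sqrt(spike) * rng.standard_normal((n, 1))
    X = scores @ v[None, :] + rng.standard_normal((n, p))
    return X, v


# Illustrative setting with more features than samples (p > n).
X, v = simulate_spiked(n=200, p=500, support=10, spike=20.0)
S = X.T @ X / X.shape[0]
Q = iterative_thresholding_pca(S, k=1, threshold=1.5)

# |cosine| between the estimate and the true sparse eigenvector.
alignment = abs(float(Q[:, 0] @ v))
print("alignment:", alignment)
print("nonzero loadings:", np.count_nonzero(Q[:, 0]))
```

In this sketch the thresholding step is what exploits sparsity: coordinates whose entries in `S @ Q` stay below the threshold are set to zero, so noise from the many irrelevant features does not accumulate across iterations. The demonstration uses `k=1` because for a single column the QR step only rescales the vector and the zeros are preserved exactly.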

Publication date

2013-01-01

Journal title

Annals of Statistics

Collection

Articles