Sparse CCA: Adaptive Estimation and Computational Barriers

Penn collection
Statistics Papers
Discipline
Subject
Convex programming
group-Lasso
Minimax rates
Computational complexity
Planted Clique
Sparse CCA (SCCA)
Sparse PCA (SPCA)
Physical Sciences and Mathematics
Author
Gao, Chao
Ma, Zongming
Zhou, Harrison
Abstract

Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of variables. It has applications in many fields, including genomics and imaging, both for extracting meaningful features and for using those features in subsequent analysis. This paper considers adaptive and computationally tractable estimation of leading sparse canonical directions when the ambient dimensions are high. Three intrinsically related problems are studied to fully address the topic. First, we establish the minimax rates of the problem under prediction loss. Separate minimax rates are obtained for the canonical directions of each set of random variables under mild conditions. No structural assumption is needed on the marginal covariance matrices as long as they are well conditioned. Second, we propose a computationally feasible two-stage estimation procedure, consisting of a convex-programming-based initialization stage and a group-Lasso-based refinement stage, which attains the minimax rates under an additional sample size condition. Finally, we provide evidence that the additional sample size condition is essentially necessary for any randomized polynomial-time estimator to be consistent, assuming hardness of the Planted Clique detection problem. The computational lower bound is faithful to the Gaussian models used in the paper; this is achieved by a novel construction of the reduction scheme and an asymptotic equivalence theory for Gaussian discretization, which is necessary for computational complexity to be well defined. As a byproduct, we also obtain a computational lower bound for the sparse PCA problem under the Gaussian spiked covariance model. This bridges a gap in the sparse PCA literature.
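
For readers unfamiliar with the setup, the leading canonical directions mentioned in the abstract can be written as the solution of the standard CCA optimization problem. The display below is a textbook formulation, not notation taken from the paper: the symbols $\Sigma_x$, $\Sigma_y$, $\Sigma_{xy}$ (marginal and cross-covariance matrices), $a$, $b$ (canonical directions), and the sparsity levels $s_a$, $s_b$ are all assumptions introduced here for illustration.

```latex
% Leading canonical pair for random vectors X and Y (standard formulation).
% \Sigma_x = Cov(X), \Sigma_y = Cov(Y), \Sigma_{xy} = Cov(X, Y).
\[
  (a_1, b_1) \;=\; \operatorname*{arg\,max}_{a,\, b} \; a^{\top} \Sigma_{xy}\, b
  \quad \text{subject to} \quad
  a^{\top} \Sigma_x\, a = 1, \qquad b^{\top} \Sigma_y\, b = 1 .
\]
% One common way to express sparsity (assumed here, for illustration only):
% each direction has a bounded number of nonzero coordinates.
\[
  \lvert \operatorname{supp}(a_1) \rvert \le s_a ,
  \qquad
  \lvert \operatorname{supp}(b_1) \rvert \le s_b .
\]
```

The unit-variance constraints are why the marginal covariance matrices matter: conditions on $\Sigma_x$ and $\Sigma_y$, such as being well conditioned, affect how hard the directions are to estimate.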

Publication date
2016-01-01
Journal title
The Annals of Statistics