Sparse CCA: Adaptive Estimation and Computational Barriers
Discipline
Physical Sciences and Mathematics
Subject
group-Lasso
Minimax rates
Computational complexity
Planted Clique
Sparse CCA (SCCA)
Sparse PCA (SPCA)
Abstract
Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of variables. It has applications in many fields, including genomics and imaging, where it is used both to extract meaningful features and to supply those features to subsequent analysis. This paper considers adaptive and computationally tractable estimation of leading sparse canonical directions when the ambient dimensions are high. Three intrinsically related problems are studied to fully address the topic. First, we establish the minimax rates of the problem under prediction loss. Separate minimax rates are obtained for the canonical directions of each set of random variables under mild conditions. No structural assumption is needed on the marginal covariance matrices as long as they are well conditioned. Second, we propose a computationally feasible two-stage estimation procedure, consisting of a convex-programming-based initialization stage and a group-Lasso-based refinement stage, that attains the minimax rates under an additional sample size condition. Finally, we provide evidence that the additional sample size condition is essentially necessary for any randomized polynomial-time estimator to be consistent, assuming hardness of the Planted Clique detection problem. The computational lower bound is faithful to the Gaussian models used in the paper; this is achieved by a novel construction of the reduction scheme and an asymptotic equivalence theory for Gaussian discretization, which is necessary for computational complexity to be well defined. As a byproduct, we also obtain a computational lower bound for the sparse PCA problem under the Gaussian spiked covariance model. This bridges a gap in the sparse PCA literature.
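For readers unfamiliar with the object being estimated, the following is a minimal sketch of classical (non-sparse) CCA computed by whitening the sample covariances and taking a singular value decomposition. It is not the two-stage sparse procedure described in the abstract, and it is only sensible when the sample size comfortably exceeds both dimensions. The function names `inv_sqrt`, `classical_cca`, and `demo`, as well as the simulated data, are assumptions made purely for illustration.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T


def classical_cca(X, Y, r=1):
    """Leading r sample canonical directions and correlations.

    X is n-by-p and Y is n-by-q, with rows as observations.
    Assumes n is large enough for both sample covariances to be invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx = Xc.T @ Xc / n    # sample covariance of X
    Sy = Yc.T @ Yc / n    # sample covariance of Y
    Sxy = Xc.T @ Yc / n   # sample cross-covariance
    Wx, Wy = inv_sqrt(Sx), inv_sqrt(Sy)
    # Singular values of the whitened cross-covariance are the
    # sample canonical correlations.
    A, rho, Bt = np.linalg.svd(Wx @ Sxy @ Wy)
    U = Wx @ A[:, :r]     # canonical directions for X
    V = Wy @ Bt.T[:, :r]  # canonical directions for Y
    return U, V, rho[:r]


def demo(seed=0, n=2000, p=5, q=4):
    """Simulated example: only the first coordinate of each set shares signal."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    X[:, 0] += z
    Y[:, 0] += z
    return classical_cca(X, Y, r=1)
```

In the simulated example the population canonical directions are sparse (supported on the first coordinate of each set), which is the structure a sparse CCA estimator would aim to exploit when the dimensions are comparable to or larger than the sample size.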