High-dimensional Statistical Inference: from Vector to Matrix

Zhang, Anru

Files

Zhang_upenngdas_0175C_11592.pdf (1.53 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Applied Mathematics

Subject

Constrained l_1 minimization
Constrained nuclear norm minimization
Genomic data integration
Low-rank matrix recovery
Optimal rate of convergence
Sparse signal recovery
Applied Mathematics
Statistics and Probability

Copyright date

2015-07-20T20:15:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/27957

Author

Zhang, Anru

Abstract

Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications and has attracted considerable recent attention in many fields, including statistics, applied mathematics, and electrical engineering. In this thesis, we consider several problems, including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool that represents points in a polytope as convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, any one of the conditions $\delta_k^A<1/3$, $\delta_k^A+\theta_{k,k}^A<1$, or $\delta_{tk}^A<\sqrt{(t-1)/t}$ for any given constant $t\ge 4/3$ guarantees the exact recovery of all $k$-sparse signals in the noiseless case through constrained $\ell_1$ minimization, and similarly, in affine rank minimization, any one of $\delta_r^{\mathcal{M}}<1/3$, $\delta_r^{\mathcal{M}}+\theta_{r,r}^{\mathcal{M}}<1$, or $\delta_{tr}^{\mathcal{M}}<\sqrt{(t-1)/t}$ ensures the exact reconstruction of all matrices of rank at most $r$ in the noiseless case via constrained nuclear norm minimization. Moreover, for any $\epsilon>0$, the conditions $\delta_k^A<1/3+\epsilon$, $\delta_k^A+\theta_{k,k}^A<1+\epsilon$, or $\delta_{tk}^A<\sqrt{(t-1)/t}+\epsilon$ are not sufficient to guarantee the exact recovery of all $k$-sparse signals for large $k$. A similar result also holds for matrix recovery.
In addition, the conditions $\delta_k^A<1/3$, $\delta_k^A+\theta_{k,k}^A<1$, $\delta_{tk}^A<\sqrt{(t-1)/t}$ and $\delta_r^{\mathcal{M}}<1/3$, $\delta_r^{\mathcal{M}}+\theta_{r,r}^{\mathcal{M}}<1$, $\delta_{tr}^{\mathcal{M}}<\sqrt{(t-1)/t}$ are also shown to be sufficient, respectively, for stable recovery of approximately sparse signals and low-rank matrices in the noisy case. In the second part of the thesis, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in the chapter also have implications for other related statistical problems. An application to the estimation of spiked covariance matrices from one-dimensional random projections is considered. The results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections. In the third part of the thesis, we consider another setting of low-rank matrix completion. The current literature on matrix completion focuses primarily on independent sampling models, under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed.
We provide theoretical justification for the proposed SMC method and derive a lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extents of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.
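
For readers unfamiliar with the constrained $\ell_1$ minimization named in the abstract, the noiseless problem (minimize $\|\beta\|_1$ subject to $A\beta = y$) can be written as a linear program. The sketch below is illustrative only and is not taken from the thesis: the function name `l1_minimize`, the random Gaussian design, and the problem sizes are assumptions chosen for a small demonstration.

```python
import numpy as np
from scipy.optimize import linprog


def l1_minimize(A, y):
    """Solve min ||beta||_1 subject to A @ beta = y as a linear program.

    Hypothetical helper for illustration; not code from the thesis.
    """
    n, p = A.shape
    # Split beta = u - v with u, v >= 0, so that sum(u) + sum(v)
    # equals ||beta||_1 at the optimum.
    c = np.ones(2 * p)
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]


# Assumed toy setup: a k-sparse signal observed through a Gaussian matrix.
rng = np.random.default_rng(0)
n, p, k = 40, 100, 5
A = rng.standard_normal((n, p)) / np.sqrt(n)
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.standard_normal(k)
y = A @ beta  # noiseless measurements

beta_hat = l1_minimize(A, y)
recovery_error = np.linalg.norm(beta_hat - beta)
```

In this assumed setup the number of measurements is large relative to the sparsity, which is the regime where exact recovery is typically observed. The example does not verify the restricted isometry conditions stated in the abstract.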

Advisor

T. Tony Cai

Date of degree

2015-01-01

Collection

Dissertations and Theses