Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group


First Advisor

Dean P. Foster

Second Advisor

Zongming Ma


Classical decision theory evaluates an estimator mostly by its statistical properties, either the closeness to the underlying truth or the predictive ability for new observations. The goal is to find estimators to achieve statistical optimality. Modern "Big Data" applications, however, necessitate efficient processing of large-scale ("big-n-big-p'") datasets, which poses great challenge to classical decision-theoretic framework which seldom takes into account the scalability of estimation procedures. On the one hand, statistically optimal estimators could be computationally intensive and on the other hand, fast estimation procedures might suffer from a loss of statistical efficiency. So the challenge is to kill two birds with one stone. This thesis brings together statistical and computational perspectives to study canonical correlation analysis (CCA) and network data modeling, where we investigate both the optimality and the scalability of the estimators. Interestingly, in both cases, we find iterative estimation procedures based on non-convex optimization can significantly reduce the computational cost and meanwhile achieve desirable statistical properties.

In the first part of the thesis, motivated by the recent success of using CCA to learn low-dimensional feature representations of high-dimensional objects, we propose novel metrics which quantify the estimation loss of CCA by the excess prediction loss defined through a prediction-after-dimension-reduction framework. These new metrics have rich statistical and geometric interpretations, which suggest viewing CCA estimation as estimating the subspaces spanned by the canonical variates.

We characterize, with minimal assumptions, the non-asymptotic minimax rates under the proposed error metrics, especially how the minimax rates depend on the key quantities including the dimensions, the condition number of the covariance matrices and the canonical correlations. Finally, by formulating sample CCA as a non-convex optimization problem, we propose an efficient (stochastic) first order algorithm which scales to large datasets.

In the second part of the thesis, we propose two universal fitting algorithms for networks (possibly with edge covariates) under latent space models: one based on finding the exact maximizer of a convex surrogate of the non-convex likelihood function and the other based on finding an approximate optimizer of the original non-convex objective. Both algorithms are motivated by a special class of inner-product models but are shown to work for a much wider range of latent space models which allow the latent vectors to determine the connection probability of the edges in flexible ways. We derive the statistical rates of convergence of both algorithms and characterize the basin-of-attraction of the non-convex approach. The effectiveness and efficiency of the non-convex procedure is demonstrated by extensive simulations and real-data experiments.