Date of Award
Doctor of Philosophy (PhD)
Multivariate statistics concerns the study of dependence relations among multiple variables of interest. Distinct from widely studied regression problems where one of the variables is singled out as a response, in multivariate analysis all variables are treated symmetrically and the dependency structures are examined, either for interest in its own right or for further analyses such as regressions. This thesis includes the study of three independent research problems in multivariate statistics.
The first part of the thesis studies additive principal components (APCs for short), a nonlinear method useful for exploring additive relationships among a set of variables. We propose a shrinkage regularization approach for estimating APC transformations by casting the problem in the framework of reproducing kernel Hilbert spaces. To formulate the kernel APC problem, we introduce the Null Comparison Principle, a principle that ties the constraint in a multivariate problem to its criterion in a way that makes the goal of the multivariate method under study transparent. In addition to providing a detailed formulation and exposition of the kernel APC problem, we study asymptotic theory of kernel APCs. Our theory also motivates an iterative algorithm for computing kernel APCs.
The second part of the thesis investigates the estimation of precision matrices in high dimensions when the data is corrupted in a cellwise manner and the uncontaminated data follows a multivariate normal distribution. It is known that in the setting of Gaussian graphical models, the conditional independence relations among variables is captured by the precision matrix of a multivariate normal distribution, and estimating the support of the precision matrix is equivalent to graphical model selection. In this work, we analyze the theoretical properties of robust estimators for precision matrices in high dimensions. The estimators we analyze are formed by plugging appropriately chosen robust covariance matrix estimators into the graphical Lasso and CLIME, two existing methods for high-dimensional precision matrix estimation. We establish error bounds for the precision matrix estimators that reveal the interplay between the dimensionality of the problem and the degree of contamination permitted in the observed distribution, and also analyze the breakdown point of both estimators. We also discuss implications of our work for Gaussian graphical model estimation in the presence of cellwise contamination.
The third part of the thesis studies the problem of optimal estimation of a quadratic functional under the Gaussian two-sequence model. Quadratic functional estimation has been well studied under the Gaussian sequence model, and close connections between the problem of quadratic functional estimation and that of signal detection have been noted. Focusing on the estimation problem in the Gaussian two-sequence model, in this work we propose optimal estimators of the quadratic functional for different regimes and establish the minimax rates of convergence over a family of parameter spaces. The optimal rates exhibit interesting phase transition in this family. We also discuss the implications of our estimation results on the associated simultaneous signal detection problem.
Tan, Xin Lu, "Topics In Multivariate Statistics" (2017). Publicly Accessible Penn Dissertations. 2602.