Date of Award
Doctor of Philosophy (PhD)
The age of “Big Data” features massive, high-dimensional datasets, leading to the rapid emergence of new algorithms as well as new concerns such as privacy and fairness. To compare algorithms with or without these new constraints, minimax decision theory provides a principled framework for quantifying the optimality of algorithms and investigating the fundamental difficulty of statistical problems. Under the framework of minimax theory, this thesis aims to address the following four problems:
1. The first part of this thesis aims to develop an optimality theory for linear discriminant analysis in the high-dimensional setting. In addition, we consider classification with incomplete data under the missing completely at random (MCR) model.
2. In the second part, we study high-dimensional sparse Quadratic Discriminant Analysis (QDA) and aim to establish the optimal convergence rates.
3. In the third part, we study the optimality of high-dimensional clustering in the unsupervised setting under the Gaussian mixture model. We propose an EM-based procedure that attains the optimal rate of convergence for the excess mis-clustering error.
4. In the fourth part, we investigate the minimax optimality under the privacy constraint for mean estimation and linear regression models, under both the classical low-dimensional and modern high-dimensional settings.
Zhang, Linjun, "Minimax Optimality In High-Dimensional Classification, Clustering, And Privacy" (2019). Publicly Accessible Penn Dissertations. 3274.