## Publicly Accessible Penn Dissertations

2021

Dissertation

#### Degree Name

Doctor of Philosophy (PhD)

Applied Mathematics

Weijie J. Su

Qi Long

#### Abstract

The era of machine learning features large datasets with high-dimensional features. This has led to the emergence of various algorithms that learn efficiently from such high-dimensional datasets, as well as the need to analyze these algorithms from both the prediction and the statistical inference viewpoints. More specifically, an ideal model is expected to predict accurately on unseen data and to provide valid inference so as to harness the uncertainty in the model. Unfortunately, the high dimension of the features poses a great challenge to the analysis of many prevalent models, rendering them either inapplicable or difficult to study.

This thesis leverages the approximate message passing (AMP) algorithm, optimization theory, and Sorted L-One Penalized Estimation (SLOPE) to study several important problems in sparse models.

The first chapter introduces various $\ell_1$ penalties in general machine learning models, including but not limited to SLOPE, a relatively new convex optimization procedure based on the sorted $\ell_1$ penalty. We then focus on linear models and demonstrate some basic properties of SLOPE, especially its advantages over the Lasso. Next, we cover the AMP algorithm in terms of its convergence behavior and asymptotic statistical characterization.

The second chapter extends the AMP algorithms from Lasso to SLOPE and provides an asymptotically tight characterization of the SLOPE solution. Note that SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted $\ell_1$ penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. We develop an asymptotically exact characterization of the SLOPE solution under Gaussian random designs through solving the SLOPE problem using approximate message passing (AMP). This algorithmic approach allows us to approximate the SLOPE solution via the much more amenable AMP iterates. Explicitly, we characterize the asymptotic dynamics of the AMP iterates relying on a recently developed state evolution analysis for non-separable penalties, thereby overcoming the difficulty caused by the sorted $\ell_1$ penalty. Moreover, we prove that the AMP iterates converge to the SLOPE solution in an asymptotic sense, and numerical simulations show that the convergence is surprisingly fast. Our proof rests on a novel technique that specifically leverages the SLOPE problem. In contrast to prior literature, our work not only yields an asymptotically sharp analysis but also offers an algorithmic, flexible, and constructive approach to understanding the SLOPE problem.
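
As an illustration of the sorted $\ell_1$ penalty described above, the following minimal sketch evaluates $\sum_i \lambda_i |\beta|_{(i)}$, where $|\beta|_{(1)} \ge |\beta|_{(2)} \ge \dots$ are the coefficient magnitudes in decreasing order and $\lambda_1 \ge \lambda_2 \ge \dots \ge 0$. The function name `sorted_l1_penalty` and its argument names are assumptions made for this example and are not taken from the thesis.

```python
def sorted_l1_penalty(beta, lam):
    """Evaluate the sorted L1 (SLOPE) penalty sum_i lam[i] * |beta|_(i).

    `lam` is assumed to be non-increasing and non-negative, so the largest
    coefficient magnitude is paired with the largest penalty weight.
    """
    if len(beta) != len(lam):
        raise ValueError("beta and lam must have the same length")
    if any(a < b for a, b in zip(lam, lam[1:])) or (lam and lam[-1] < 0):
        raise ValueError("lam must be non-increasing and non-negative")
    # Sort magnitudes in decreasing order; this is what makes the penalty
    # non-separable: each weight depends on the rank of the coefficient.
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lam, magnitudes))


# Largest magnitude (3.0) is paired with the largest weight (3.0).
print(sorted_l1_penalty([1.0, -3.0, 2.0], [3.0, 2.0, 1.0]))
# A constant sequence recovers the ordinary L1 (Lasso) penalty.
print(sorted_l1_penalty([1.0, -3.0, 2.0], [1.0, 1.0, 1.0]))
```

With a constant penalty sequence the function reduces to the usual $\ell_1$ norm, which is why the Lasso can be viewed as a special case of SLOPE.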

The third chapter builds on the asymptotic characterization of SLOPE to study the trade-off between true positive proportion (TPP) and false discovery proportion (FDP) or, equivalently, between measures of type I error and power. Assuming a regime of linear sparsity and working under Gaussian random designs, we obtain an upper bound on the optimal trade-off for SLOPE, showing its capability of breaking the Donoho--Tanner power limit. To put it into perspective, this limit is the highest possible power that the Lasso, which is perhaps the most popular $\ell_1$-based method, can achieve even with arbitrarily strong effect sizes. Next, we derive a tight lower bound that delineates the fundamental limit of sorted $\ell_1$ regularization in optimally trading the FDP off for the TPP. Finally, we show that on any problem instance, SLOPE with a certain regularization sequence outperforms the Lasso, in the sense of having a smaller FDP, larger TPP, and smaller $\ell_2$ estimation risk simultaneously. Our proofs are based on a novel technique that reduces a calculus of variations problem to a class of infinite-dimensional convex optimization problems and a very recent result from approximate message passing theory.
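
For concreteness, the two quantities being traded off can be computed from an estimate and the true coefficient vector as in the sketch below. It uses the common finite-sample definitions, with TPP as the fraction of true signals that are selected and FDP as the fraction of selected variables that are not true signals; the thesis itself works with their asymptotic limits. The function name `tpp_fdp` and the example vectors are assumptions made for illustration.

```python
def tpp_fdp(beta_hat, beta_true):
    """Return (TPP, FDP) for an estimate `beta_hat` of `beta_true`.

    A variable counts as selected when its estimated coefficient is nonzero
    and as a true signal when its true coefficient is nonzero.
    """
    if len(beta_hat) != len(beta_true):
        raise ValueError("beta_hat and beta_true must have the same length")
    selected = [b != 0 for b in beta_hat]
    signal = [b != 0 for b in beta_true]
    true_pos = sum(1 for s, t in zip(selected, signal) if s and t)
    false_pos = sum(1 for s, t in zip(selected, signal) if s and not t)
    # max(..., 1) avoids division by zero when nothing is selected
    # or when there are no true signals.
    tpp = true_pos / max(sum(signal), 1)
    fdp = false_pos / max(sum(selected), 1)
    return tpp, fdp


# Hypothetical example: 2 true signals, 3 selections, 1 of them correct.
tpp, fdp = tpp_fdp([1.5, 0.0, -0.2, 0.7], [2.0, 1.0, 0.0, 0.0])
print(tpp, fdp)
```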

The fourth chapter addresses the practical application of SLOPE by efficiently designing the SLOPE penalty sequence in finite dimensions, restricting the number of unique values in the penalty sequence to be small. SLOPE's magnitude-dependent regularization requires a penalty sequence $\boldsymbol{\lambda}$ as input, instead of a scalar penalty as in the Lasso case, which makes the design computationally expensive. We propose two efficient algorithms to design the possibly high-dimensional SLOPE penalty so as to minimize the mean squared error. For Gaussian data matrices, we propose a first-order Projected Gradient Descent (PGD) under the Approximate Message Passing regime. For general data matrices, we present a zeroth-order Coordinate Descent (CD) to design a sub-class of SLOPE, referred to as the $k$-level SLOPE. Our CD allows a useful trade-off between accuracy and computation speed. We demonstrate the performance of SLOPE with our designs via extensive experiments on synthetic data and real-world datasets.
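
To illustrate what a penalty sequence with few unique values looks like, the sketch below expands $k$ penalty levels into a full non-increasing sequence of length $p$. The parameterization by `levels` and `counts`, and the function name `k_level_penalty`, are assumptions made for this example; the thesis may parameterize the $k$-level SLOPE differently, and the design algorithms (PGD and CD) are not reproduced here.

```python
def k_level_penalty(levels, counts):
    """Expand k penalty levels into a full SLOPE penalty sequence.

    `levels[j]` is repeated `counts[j]` times, so the result has
    sum(counts) entries but at most len(levels) unique values.
    """
    if len(levels) != len(counts):
        raise ValueError("levels and counts must have the same length")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    if any(a < b for a, b in zip(levels, levels[1:])) or (levels and levels[-1] < 0):
        raise ValueError("levels must be non-increasing and non-negative")
    lam = []
    for level, count in zip(levels, counts):
        lam.extend([level] * count)
    return lam


# A 2-level sequence for p = 5: only two numbers need to be tuned
# (plus the split point), rather than five separate penalty values.
print(k_level_penalty([2.0, 0.5], [2, 3]))
```

Under this parameterization, a single level corresponds to the Lasso's scalar penalty, and increasing $k$ enlarges the search space of the design problem.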
