STATISTICAL AND HIGH-DIMENSIONAL PERSPECTIVES ON MACHINE LEARNING
Graduate group
Discipline
Statistics and Probability
Computer Sciences
Subject
machine learning theory
statistics
Abstract
In the first chapter, we consider the problem of calibration. While the accuracy of modern machine learning techniques continues to improve, many models exhibit mis-calibration: the probability scores they produce fail to align with the actual frequencies of the labels. This discrepancy can lead to unreliable predictions and hinder the practical deployment of these models. To address this issue, we frame the detection of mis-calibration as a hypothesis testing problem. Drawing inspiration from nonparametric hypothesis testing, we propose T-Cal, a minimax optimal test for calibration based on a debiased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE). T-Cal offers a principled and statistically sound approach to assessing the calibration of machine learning models.

The second chapter focuses on out-of-distribution performance estimation. Evaluating model performance under distribution shift is particularly challenging when only unlabeled data from the target domain are available. Recent work suggests that disagreement, the degree to which two models trained with different randomness differ on the same input, can serve as a proxy for accuracy. We establish a theoretical foundation for analyzing disagreement in high-dimensional random features regression. Our analysis reveals a linear relationship between source and target disagreement, which we leverage to estimate out-of-distribution performance.

The third chapter studies feature learning in two-layer neural networks, widely considered one of the fundamental reasons behind the success of deep neural networks. Despite its significance, existing theoretical frameworks do not fully explain the mechanism of feature learning, even in the simplest case of two-layer networks. In this work, we enrich our understanding of feature learning by considering a general setting in which the learning rate grows with the sample size. Under this setting, we demonstrate that a single step of gradient descent introduces multiple rank-one components (spikes) into the feature matrix, each corresponding to a specific polynomial feature. Furthermore, we prove that the limiting training and test errors of the updated network are fully characterized by these spikes. By precisely analyzing the improvement in the training and test errors, we illustrate how these non-linear features enhance the learning process. Through this comprehensive analysis, we shed light on the intricate dynamics of feature learning and its crucial role in the performance of neural networks.
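To make the quantity tested in the first chapter concrete, the following is a minimal sketch of the naive binned plug-in estimate of the squared $\ell_2$-ECE for binary prediction. This is not the debiased estimator used by T-Cal, and the function name, binning scheme, and number of bins are illustrative assumptions, not from the thesis.

```python
import numpy as np

def plugin_l2_ece(scores, labels, n_bins=15):
    """Naive binned plug-in estimate of the squared l2-ECE (illustrative).

    scores: predicted probabilities for the positive class, shape (n,)
    labels: binary outcomes in {0, 1}, shape (n,)
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each score to a bin; clip so that score == 1.0 lands in the last bin.
    idx = np.clip(np.digitize(scores, bins) - 1, 0, n_bins - 1)
    n = len(scores)
    ece_sq = 0.0
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue
        conf = scores[mask].mean()   # average predicted probability in the bin
        acc = labels[mask].mean()    # empirical frequency of the label in the bin
        ece_sq += (mask.sum() / n) * (conf - acc) ** 2
    return ece_sq
```

A perfectly calibrated predictor drives this quantity to zero as the sample grows, whereas systematic over- or under-confidence leaves a positive gap; the plug-in version is upward biased in small bins, which is precisely what a debiased estimator corrects for.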
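The disagreement proxy studied in the second chapter can be sketched as follows: given the predicted labels of two models trained with different randomness, disagreement is simply the fraction of inputs on which they differ. The function name below is a hypothetical illustration; the thesis's contribution is the theoretical analysis showing that source and target disagreement are linearly related, which lets one translate disagreement measured on unlabeled target data into a performance estimate.

```python
import numpy as np

def disagreement_rate(preds_a, preds_b):
    """Fraction of inputs on which two independently trained models
    predict different labels (requires no ground-truth labels)."""
    preds_a = np.asarray(preds_a)
    preds_b = np.asarray(preds_b)
    return float(np.mean(preds_a != preds_b))
```

Because this statistic needs only model outputs, it can be evaluated on an unlabeled target domain where accuracy itself cannot be measured directly.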
Advisor
Hassani, Hamed