Doctor of Philosophy (PhD)
Data analysis is inherently adaptive: previous results may influence which tests are carried out next on a single dataset as part of a series of exploratory analyses. Unfortunately, classical statistical tools break down once the choice of analysis may depend on the dataset itself, which leads to overfitting and spurious conclusions. In this dissertation we put constraints on the types of analyses that can be used adaptively on the same dataset in order to ensure that valid conclusions are drawn. Following a line of work initiated by Dwork et al., we focus on extending the connection between differential privacy and adaptive data analysis.
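To make that connection concrete, the canonical example of a differentially private procedure is the Laplace mechanism: answer a numeric query with noise calibrated to the query's sensitivity. The sketch below is illustrative only, not code from the dissertation; the function names and the bounded-mean query are assumptions chosen for the example.

```python
import math
import random

def laplace_noise(scale):
    # Sample Laplace(0, scale) by inverse-CDF from a uniform on [-0.5, 0.5).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_mean(data, epsilon, lo=0.0, hi=1.0):
    # Epsilon-DP release of the mean of records clipped to [lo, hi].
    # Changing one of n records moves the clipped mean by at most
    # (hi - lo) / n, so that is the sensitivity we calibrate noise to.
    n = len(data)
    clipped = [min(max(x, lo), hi) for x in data]
    true_mean = sum(clipped) / n
    sensitivity = (hi - lo) / n
    return true_mean + laplace_noise(sensitivity / epsilon)
```

The generalization guarantees discussed in the dissertation say, roughly, that answers released through such a mechanism remain close to their population values even when the queries are chosen adaptively.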
Our first contribution follows work presented in Rogers et al. We generalize and unify previous works in the area by showing that the generalization properties of (approximately) differentially private algorithms can be used to give valid p-value corrections in adaptive hypothesis testing, while recovering results for statistical and low-sensitivity queries.

One of the main benefits of differential privacy is that it composes: the combination of several differentially private algorithms is itself differentially private, and the privacy parameters degrade sublinearly. However, the composition theorems apply only when the privacy parameters are all fixed up front. Our second contribution presents a framework for obtaining composition theorems when the privacy parameters, along with the number of procedures to be used, need not be fixed up front and can instead be adjusted adaptively (Rogers et al.).

These contributions are useful only if there exist differentially private procedures that a data analyst would actually want to use. Hence, we present differentially private hypothesis tests for categorical data based on the classical chi-square hypothesis tests (Gaboardi et al., Kifer and Rogers).
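The composition behavior mentioned above can be illustrated with the two standard (non-adaptive) composition theorems, which assume all parameters are fixed in advance; the adaptive setting studied in the dissertation relaxes exactly this assumption. The formulas below are the well-known basic and advanced composition bounds, written as a sketch with illustrative function names.

```python
import math

def basic_composition(eps_list, delta_list):
    # Basic composition: running mechanisms with parameters (eps_i, delta_i)
    # in sequence yields (sum eps_i, sum delta_i)-differential privacy.
    return sum(eps_list), sum(delta_list)

def advanced_composition(eps, delta, k, delta_prime):
    # Advanced composition (Dwork, Rothblum, Vadhan): k-fold composition of
    # (eps, delta)-DP mechanisms is (eps', k*delta + delta_prime)-DP with
    #   eps' = eps * sqrt(2k ln(1/delta')) + k * eps * (e^eps - 1),
    # i.e. the epsilon degrades roughly like sqrt(k) rather than k.
    eps_prime = (eps * math.sqrt(2.0 * k * math.log(1.0 / delta_prime))
                 + k * eps * (math.exp(eps) - 1.0))
    return eps_prime, k * delta + delta_prime
```

For many small mechanisms the advanced bound is much tighter: composing one hundred (0.01, 0)-DP steps costs epsilon 1.0 under basic composition but roughly 0.54 under the advanced bound (at delta' = 1e-6).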
Rogers, Ryan Michael, "Leveraging Privacy In Data Analysis" (2017). Publicly Accessible Penn Dissertations. 2554.