Doctor of Philosophy (PhD)
Data analysis is inherently adaptive: previous results may influence which tests are carried out next on a single dataset as part of a series of exploratory analyses. Unfortunately, classical statistical tools break down once the choice of analysis may depend on the dataset itself, which leads to overfitting and spurious conclusions. In this dissertation we put constraints on the types of analyses that can be used adaptively on the same dataset in order to ensure that valid conclusions are drawn. Following a line of work initiated by Dwork et al., we focus on extending the connection between differential privacy and adaptive data analysis.
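To make that connection concrete, the canonical example of a differentially private procedure is the Laplace mechanism: answer a numeric query with noise calibrated to the query's sensitivity. The sketch below is illustrative only, not code from the dissertation; the function names and the bounded-mean query are assumptions chosen for the example.

```python
import math
import random

def laplace_noise(scale):
    # Sample Laplace(0, scale) by inverse-CDF from a uniform on [-0.5, 0.5).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_mean(data, epsilon, lo=0.0, hi=1.0):
    # Epsilon-DP release of the mean of records clipped to [lo, hi].
    # Changing one of n records moves the clipped mean by at most
    # (hi - lo) / n, so that is the sensitivity we calibrate noise to.
    n = len(data)
    clipped = [min(max(x, lo), hi) for x in data]
    true_mean = sum(clipped) / n
    sensitivity = (hi - lo) / n
    return true_mean + laplace_noise(sensitivity / epsilon)
```

The generalization guarantees discussed in the dissertation say, roughly, that answers released through such a mechanism remain close to their population values even when the queries are chosen adaptively.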
Our first contribution follows work presented in Rogers et al. We generalize and unify previous works in the area by showing that the generalization properties of (approximately) differentially private algorithms can be used to give valid p-value corrections in adaptive hypothesis testing, while recovering results for statistical and low-sensitivity queries.

One of the main benefits of differential privacy is that it composes: the combination of several differentially private algorithms is itself differentially private, and the privacy parameters degrade sublinearly. However, the composition theorems apply only when the privacy parameters are all fixed up front. Our second contribution presents a framework for obtaining composition theorems when the privacy parameters, along with the number of procedures to be used, need not be fixed up front and can instead be adjusted adaptively (Rogers et al.).

These contributions are useful only if there exist differentially private procedures that a data analyst would actually want to use. Hence, we present differentially private hypothesis tests for categorical data based on the classical chi-square hypothesis tests (Gaboardi et al., Kifer and Rogers).
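The composition behavior mentioned above can be illustrated with the two standard (non-adaptive) composition theorems, which assume all parameters are fixed in advance; the adaptive setting studied in the dissertation relaxes exactly this assumption. The formulas below are the well-known basic and advanced composition bounds, written as a sketch with illustrative function names.

```python
import math

def basic_composition(eps_list, delta_list):
    # Basic composition: running mechanisms with parameters (eps_i, delta_i)
    # in sequence yields (sum eps_i, sum delta_i)-differential privacy.
    return sum(eps_list), sum(delta_list)

def advanced_composition(eps, delta, k, delta_prime):
    # Advanced composition (Dwork, Rothblum, Vadhan): k-fold composition of
    # (eps, delta)-DP mechanisms is (eps', k*delta + delta_prime)-DP with
    #   eps' = eps * sqrt(2k ln(1/delta')) + k * eps * (e^eps - 1),
    # i.e. the epsilon degrades roughly like sqrt(k) rather than k.
    eps_prime = (eps * math.sqrt(2.0 * k * math.log(1.0 / delta_prime))
                 + k * eps * (math.exp(eps) - 1.0))
    return eps_prime, k * delta + delta_prime
```

For many small mechanisms the advanced bound is much tighter: composing one hundred (0.01, 0)-DP steps costs epsilon 1.0 under basic composition but roughly 0.54 under the advanced bound (at delta' = 1e-6).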
Rogers, Ryan Michael, "Leveraging Privacy In Data Analysis" (2017). Publicly Accessible Penn Dissertations. 2554.