Leveraging Privacy In Data Analysis

Degree type
Doctor of Philosophy (PhD)
Graduate group
Applied Mathematics
Discipline
Subject
Adaptive Data Analysis
Differential Privacy
Statistics
Computer Sciences
Statistics and Probability
Copyright date
2018-02-23T20:17:00-08:00
Abstract

Data analysis is inherently adaptive: previous results may influence which tests are carried out on a single dataset as part of a series of exploratory analyses. Unfortunately, classical statistical tools break down once the choice of analysis may depend on the dataset, which leads to overfitting and spurious conclusions. In this dissertation we put constraints on what types of analyses can be used adaptively on the same dataset in order to ensure that valid conclusions are made. Following a line of work initiated by Dwork et al. [2015], we focus on extending the connection between differential privacy and adaptive data analysis. Our first contribution follows work presented in Rogers et al. [2016]. We generalize and unify previous works in the area by showing that the generalization properties of (approximately) differentially private algorithms can be used to give valid p-value corrections in adaptive hypothesis testing while recovering results for statistical and low-sensitivity queries. One of the main benefits of differential privacy is that it composes, i.e., the combination of several differentially private algorithms is itself differentially private and the privacy parameters degrade sublinearly. However, we can only apply the composition theorems when the privacy parameters are all fixed up front. Our second contribution then presents a framework for obtaining composition theorems when the privacy parameters, along with the number of procedures that are to be used, need not be fixed up front and can be adjusted adaptively (Rogers et al. [2016]). These contributions are only useful if there actually exist differentially private procedures that a data analyst would want to use. Hence, we present differentially private hypothesis tests for categorical data based on the classical chi-square hypothesis tests (Gaboardi et al. [2016], Kifer and Rogers [2017]).
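
To illustrate the remark that privacy parameters degrade sublinearly under composition, the following is a minimal sketch and is not taken from the dissertation. It assumes the standard advanced composition bound of Dwork, Rothblum and Vadhan, under which k runs of an (eps, delta)-differentially private algorithm are jointly (eps', k*delta + delta')-differentially private with eps' = sqrt(2k ln(1/delta')) * eps + k * eps * (e^eps - 1). The function names and the example parameter values are hypothetical, chosen only for illustration. Both bounds require eps, delta and k to be fixed in advance, which is the limitation the second contribution addresses.

```python
import math


def basic_composition(eps, delta, k):
    """Basic composition: privacy parameters add up linearly over k runs."""
    return k * eps, k * delta


def advanced_composition(eps, delta, k, delta_prime):
    """Advanced composition (assumed standard form, not the dissertation's own bound).

    The leading term grows like sqrt(k) instead of k, at the cost of an
    extra additive slack delta_prime in the delta parameter.
    """
    eps_total = (
        math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
        + k * eps * (math.exp(eps) - 1)
    )
    return eps_total, k * delta + delta_prime


if __name__ == "__main__":
    # Hypothetical per-run parameters, fixed up front for every run.
    eps, delta, delta_prime = 0.01, 0.0, 1e-6
    for k in (10, 100, 1000, 10000):
        basic_eps, _ = basic_composition(eps, delta, k)
        adv_eps, _ = advanced_composition(eps, delta, k, delta_prime)
        print(f"k={k:>5}  basic eps={basic_eps:.3f}  advanced eps={adv_eps:.3f}")
```

For a small number of runs the basic bound can be the tighter of the two; the advantage of the square-root scaling only shows up as k grows.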

Advisor
Michael Kearns
Aaron Roth
Date of degree
2017-01-01