Discrete Methods in Statistics: Feature Selection and Fairness-Aware Data Mining

Degree type
Doctor of Philosophy (PhD)
Graduate group
Statistics
Discipline
Subject
Fairness Aware Data Mining
Feature Selection
Forward Stepwise
Post-Selection Inference
Sequential Testing
Submodular
Statistics and Probability
Copyright date
2016-11-29
Abstract

This dissertation is a detailed investigation of issues that arise in models that change discretely. Models are often constructed by either including or excluding features based on some criterion. These discrete changes are challenging to analyze because features are correlated. Feature selection is the problem of identifying an appropriate set of features to include in a model, while fairness-aware data mining is the problem of removing the influence of protected features from a model. This dissertation provides frameworks for understanding each problem and algorithms for accomplishing the desired goal. The feature selection problem is addressed through the framework of sequential hypothesis testing. We elucidate the statistical challenges in repeatedly using inference in this domain and demonstrate how current methods fail to address them. Our algorithms build on classically motivated multiple-testing procedures to control measures of false rejections when using hypothesis testing during forward stepwise regression. Furthermore, these methods have much higher power than recent proposals from the conditional inference literature. The fairness-aware data mining community is grappling with fundamental questions concerning fairness in statistical modeling. Tension exists between identifying explainable differences between groups and discriminatory ones. We provide a framework for understanding the connections between fairness and the use of protected information in modeling. With this discussion in hand, generating fair estimates is straightforward.
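To make the idea of hypothesis testing during forward stepwise regression concrete, the following is a minimal sketch and not the dissertation's algorithm. It assumes a generic greedy search in which the best remaining feature enters only if a Bonferroni-adjusted p-value (adjusted over the candidates still in play) falls below a level alpha, and the search stops at the first failure to reject. The function name forward_stepwise, the normal approximation to the t-distribution, and the choice of Bonferroni as the multiple-testing correction are all assumptions made for illustration.

```python
import math
import numpy as np


def forward_stepwise(X, y, alpha=0.05):
    """Greedy forward selection with a Bonferroni-adjusted entry test.

    Illustrative only: a hypothetical helper, not the dissertation's procedure.
    Returns the list of selected column indices in order of entry.
    """
    n, p = X.shape
    selected = []
    remaining = list(range(p))
    while remaining:
        # Current model: intercept plus the features selected so far.
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in selected])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ beta  # residual of the response given the current model
        df = n - Z.shape[1] - 1  # residual degrees of freedom after adding one feature
        if df <= 0:
            break
        best_j, best_t = None, 0.0
        for j in remaining:
            # Residualize the candidate on the current model so that
            # correlation with already selected features is accounted for.
            g, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
            xj = X[:, j] - Z @ g
            ss = float(xj @ xj)
            if ss < 1e-12:
                continue  # candidate is (numerically) collinear with the model
            b = float(xj @ r) / ss
            resid = r - b * xj
            sigma2 = max(float(resid @ resid) / df, 1e-300)
            t = abs(b) / math.sqrt(sigma2 / ss)
            if t > best_t:
                best_j, best_t = j, t
        if best_j is None:
            break
        # Two-sided p-value under a normal approximation (an assumption here),
        # then a Bonferroni adjustment for having picked the best candidate.
        p_value = math.erfc(best_t / math.sqrt(2.0))
        p_adjusted = min(1.0, len(remaining) * p_value)
        if p_adjusted > alpha:
            break  # stop at the first failure to reject
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Calling forward_stepwise(X, y) on a design matrix X and response y returns the indices of the features that passed the entry test, in order of entry. The Bonferroni step is a deliberately conservative stand-in for the selection effect that the abstract describes; it is used here only because it is simple to state.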

Advisor
Robert A. Stine
Dean P. Foster
Date of degree
2016-01-01