Degree Name

Doctor of Philosophy (PhD)

First Advisor

Robert A. Stine

Second Advisor

Dean P. Foster


This dissertation is a detailed investigation of issues that arise in models that change discretely. Models are often constructed by including or excluding features according to some criterion. These discrete changes are challenging to analyze because of correlation among features. Feature selection is the problem of identifying an appropriate set of features to include in a model, while fairness-aware data mining is the problem of removing the influence of protected features from a model. This dissertation provides frameworks for understanding each problem and algorithms for accomplishing the desired goals.

The feature selection problem is addressed through the framework of sequential hypothesis testing. We elucidate the statistical challenges of repeatedly using inference in this domain and demonstrate how current methods fail to address them. Our algorithms build on classically motivated multiple testing procedures to control measures of false rejections when hypothesis tests are used during forward stepwise regression. Furthermore, these methods have much higher power than recent proposals from the conditional inference literature.

The fairness-aware data mining community is grappling with fundamental questions concerning fairness in statistical modeling. There is a tension between identifying explainable differences between groups and identifying discriminatory ones. We provide a framework for understanding the connections between fairness and the use of protected information in modeling. With this framework in hand, generating fair estimates is straightforward.