Date of Award
Doctor of Philosophy (PhD)
Lawrence D. Brown
The development of the classical inferential theory of mathematical statistics is
based on the philosophy that all the models to fit, all the hypotheses to test, and all the
parameters to infer are fixed prior to the collection of data. Interestingly,
and more worryingly, this is not how statistics is practiced. The practice
of statistics often explores (if not tortures) the data to find the "right" model to fit
to the data, the "right" hypothesis to test, and so on. Quoting Tullock (2001, page 205):
As Ronald Coase says, "if you torture the data long enough it will confess".
The young researcher, convinced he knows the truth will make changes
in his specifications and very likely produce significant results. In some
cases this is correct; his original specification was wrong and his new
one is right. Nevertheless, this procedure reduces the significance of the
Once the data is explored to find the hypothesis or model, the classical theory is
(bluntly speaking) useless for inference and can, in fact, be very misleading.
The current thesis focuses on the problem of providing Valid Inference after Data
Exploration (VIDE). Although a unified framework is provided for such a goal, the framework is explained through the problem of inference with the ordinary least
squares linear regression estimator when the data is explored to find the "right"
subset of covariates to be used in the regression model.
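
To make the invalidity concrete, the following is a minimal simulation sketch; it is not taken from the thesis. It assumes Gaussian data in which the response is unrelated to every covariate, and a deliberately simple exploration rule: keep the single covariate most correlated with the response, then apply the classical t-test for its slope. The function names, the sample sizes, and the critical value 2.0106 (approximately the 97.5% quantile of the t distribution with 48 degrees of freedom) are illustrative assumptions.

```python
import math
import random


def sample_corr(x, y):
    """Pearson sample correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_stat(r, n):
    """Classical t-statistic for the slope in a simple regression with
    intercept, written in terms of the sample correlation r."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def rejection_rates(n=50, p=20, reps=400, crit=2.0106, seed=0):
    """Return (fixed_rate, selected_rate) under the global null.

    fixed_rate:    how often the classical test rejects for a covariate
                   chosen before seeing the data (covariate 0).
    selected_rate: how often it rejects for the covariate chosen after
                   exploration (largest absolute correlation with y).
    crit is an assumed critical value matching n - 2 = 48 degrees of freedom.
    """
    rng = random.Random(seed)
    fixed = 0
    selected = 0
    for _ in range(reps):
        # Response and covariates are independent: no covariate matters.
        y = [rng.gauss(0, 1) for _ in range(n)]
        corrs = [
            sample_corr([rng.gauss(0, 1) for _ in range(n)], y)
            for _ in range(p)
        ]
        if abs(t_stat(corrs[0], n)) > crit:
            fixed += 1
        best = max(corrs, key=abs)  # data-driven choice of covariate
        if abs(t_stat(best, n)) > crit:
            selected += 1
    return fixed / reps, selected / reps


if __name__ == "__main__":
    fixed_rate, selected_rate = rejection_rates()
    print("fixed covariate rejection rate:   ", fixed_rate)
    print("selected covariate rejection rate:", selected_rate)
```

Under these assumptions the test for the pre-specified covariate should reject at roughly the nominal 5% level, while the same test applied to the selected covariate should reject far more often (close to 1 - 0.95^20, about 0.64, for 20 independent covariates), even though no covariate is related to the response.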
Valid post-selection inference has been a topic of research interest since at least the
1960s but has received increasing attention in recent times. The invalidity of classical
inference in post-selection problems may not only be due to the selection but also
due to misspecification of the model. Misspecification is a very natural outcome of model
selection since the selected model cannot always be guaranteed to match the truth.
If such a guarantee exists, then the post-selection problem does not require further
study. Most of the literature on valid post-selection inference has concentrated on
the assumption of a true parametric model.
In this thesis, valid post-selection inference is provided under no parametric assumptions. The simplest setting in this thesis is when the observations are independent and satisfy certain moment restrictions (with no further model/distributional
assumptions). Extensions to various dependent settings are also given. Throughout,
the total number of covariates available is allowed to grow with the sample size and
can be almost exponential in the sample size.
Kuchibhotla, Arun Kumar, "Unified Framework For Post-Selection Inference" (2020). Publicly Accessible Penn Dissertations. 4251.