Valid Post-Selection Inference

Penn collection
Statistics Papers
Discipline
Subject
linear regression
model selection
multiple comparison
family-wise error
high-dimensional inference
sphere packing
Statistics and Probability
Author
Berk, Richard A
Brown, Lawrence D
Buja, Andreas
Zhang, Kai
Zhao, Linda
Abstract

It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid “post-selection inference” by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing “simultaneity insurance” for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffé protection. Importantly, it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results.
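
The simultaneous-inference idea in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' implementation. It assumes a full-rank design matrix `X` with few enough columns that all submodels can be enumerated, Gaussian errors, and an error-variance estimate that is independent of the coefficient estimates with `df` degrees of freedom (or a known variance when `df` is `None`). The function names `posi_directions` and `posi_constant` are hypothetical. The sketch estimates a constant K such that intervals of the form "estimate ± K × standard error" cover simultaneously for every coefficient in every submodel.

```python
import itertools
import numpy as np


def posi_directions(X):
    """Return unit vectors u with beta_hat_{j.M} proportional to u^T y,
    one for every submodel M and every coefficient j in M."""
    n, p = X.shape
    dirs = []
    for size in range(1, p + 1):
        for M in itertools.combinations(range(p), size):
            # Rows of the pseudo-inverse map y to the least-squares
            # coefficients of submodel M (assumes X[:, M] has full rank).
            L = np.linalg.pinv(X[:, list(M)])
            dirs.append(L / np.linalg.norm(L, axis=1, keepdims=True))
    return np.vstack(dirs)


def posi_constant(X, alpha=0.05, df=None, n_sim=20000, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of the maximum
    absolute t-statistic over all submodels and all their coefficients."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    U = posi_directions(X)
    # All directions lie in the column space of X, so it suffices to
    # simulate in p dimensions using an orthonormal basis Q of that space.
    Q, _ = np.linalg.qr(X)
    Uq = U @ Q
    Z = rng.standard_normal((n_sim, p))
    max_abs = np.abs(Z @ Uq.T).max(axis=1)
    if df is not None:
        # Assumption: variance estimate independent of the coefficients,
        # distributed as sigma^2 * chi2_df / df.
        max_abs = max_abs / np.sqrt(rng.chisquare(df, size=n_sim) / df)
    return float(np.quantile(max_abs, 1 - alpha))
```

For a given `X`, calling `posi_constant(X)` gives a multiplier that can replace the usual normal or t quantile when forming intervals after selection. In line with the abstract, the result should be no smaller than the conventional quantile and no larger than the Scheffé constant for the same design. The enumeration covers every non-empty subset of columns, so the sketch is only practical for a small number of predictors.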

Publication date
2013-01-01
Journal title
Annals of Statistics