Journal of Statistical Computation and Simulation
Streaming feature selection is a greedy approach to variable selection that evaluates potential explanatory variables sequentially. It selects significant features as soon as they are discovered rather than testing them all and picking the best one. Because it is so greedy, streaming selection can rapidly explore large collections of features. If significance is defined by an alpha investing protocol, then the rate of false discoveries will be controlled. The focus of attention in variable selection, however, should be on fit rather than hypothesis testing, and little is known about the risk of estimators produced by streaming selection or about how the configuration of these estimators influences that risk. To meet these needs, we provide a computational framework based on stochastic dynamic programming that allows fast calculation of the minimax risk of a sequential estimator relative to an alternative. The alternative can be data driven or derived from an oracle. This framework allows us to compute and contrast the risk inflation of sequential estimators derived from various alpha investing rules. We find that a universal investing rule performs well over a variety of models and that estimators allowed to have larger than conventional rates of false discoveries generally produce smaller risk.
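The sequential testing described in the abstract can be illustrated with a minimal Python sketch of an alpha-investing rule applied to a stream of p-values. The wealth update follows the general form of alpha investing: a rejection earns a payout, and a failed test costs alpha / (1 - alpha). The function name, the default wealth and payout, and the constant-fraction bidding rule are assumptions made for illustration; this is not the universal investing rule or the risk computation studied in the article.

```python
def alpha_investing(p_values, initial_wealth=0.05, payout=0.05, fraction=0.5):
    """Return the indices of the p-values rejected by a simple alpha-investing rule.

    Hypothetical sketch: the bidding rule (a constant fraction of the largest
    affordable level) is chosen for simplicity, not taken from the article.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")

    wealth = initial_wealth
    selected = []
    for j, p in enumerate(p_values):
        if wealth <= 0.0:
            # No alpha-wealth left, so no further tests can be made.
            break
        # The largest affordable level is wealth / (1 + wealth): at that level
        # a failed test costs exactly the current wealth.
        alpha = fraction * wealth / (1.0 + wealth)
        if p <= alpha:
            # Select the feature immediately and earn the payout.
            selected.append(j)
            wealth += payout
        else:
            # Pay for the failed test.
            wealth -= alpha / (1.0 - alpha)
    return selected


if __name__ == "__main__":
    # Features arrive one at a time; each is tested once and never revisited.
    print(alpha_investing([0.001, 0.5, 0.0001, 0.9]))
```

Each feature is tested once, in arrival order, which is what lets a streaming procedure move quickly through a large collection. Raising `payout` or `initial_wealth` in this sketch makes the rule more willing to select features, at the cost of admitting more false discoveries.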
This is an Accepted Manuscript of an article published by Taylor & Francis in the Journal of Statistical Computation and Simulation on 17 Dec 2014, available online: http://dx.doi.org/10.1080/00949655.2014.990454
stochastic dynamic programming, testimator, variable selection
Foster, D. P., & Stine, R. A. (2015). Risk Inflation of Sequential Tests Controlled by Alpha Investing. Journal of Statistical Computation and Simulation, 85(18), 3613-3627. http://dx.doi.org/10.1080/00949655.2014.990454
Date Posted: 25 October 2018
This document has been peer reviewed.