Statistics Papers

Document Type

Technical Report

Date of this Version

2015

Publication Source

Journal of Statistical Computation and Simulation

Volume

85

Issue

18

Start Page

3613

Last Page

3627

DOI

10.1080/00949655.2014.990454

Abstract

Streaming feature selection is a greedy approach to variable selection that evaluates potential explanatory variables sequentially. It selects significant features as soon as they are discovered rather than testing them all and picking the best one. Because it is so greedy, streaming selection can rapidly explore large collections of features. If significance is defined by an alpha investing protocol, then the rate of false discoveries will be controlled. The focus of attention in variable selection, however, should be on fit rather than hypothesis testing. Little is known, however, about the risk of estimators produced by streaming selection and how the configuration of these estimators influences the risk. To meet these needs, we provide a computational framework based on stochastic dynamic programming that allows fast calculation of the minimax risk of a sequential estimator relative to an alternative. The alternative can be data driven or derived from an oracle. This framework allows us to compute and contrast the risk inflation of sequential estimators derived from various alpha investing rules. We find that a universal investing rule performs well over a variety of models and that estimators allowed to have larger than conventional rates of false discoveries produce generally smaller risk.
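
The abstract does not spell out an alpha investing rule, so the following is only a minimal sketch of streaming selection under one commonly described form of alpha investing. It is not the paper's estimator or its dynamic-programming framework. The function name `alpha_investing` and the parameters `initial_wealth`, `payout`, and `spend_fraction` are hypothetical, as is the choice to bid a fixed fraction of the current alpha-wealth on each test. Features are examined once, in order. A feature is kept as soon as its p-value falls below the current level, which earns a payout. A failed test costs alpha / (1 - alpha).

```python
def alpha_investing(p_values, initial_wealth=0.05, payout=0.05, spend_fraction=0.5):
    """Return indices of features kept by a simple alpha-investing rule.

    Assumed (illustrative) rule: bid a fixed fraction of the current
    alpha-wealth on each test, in the order the features arrive.
    """
    wealth = initial_wealth
    selected = []
    for j, p in enumerate(p_values):
        if wealth <= 0:
            break  # no alpha-wealth left to fund further tests
        bid = spend_fraction * wealth
        # Choose the test level so that a failed test costs exactly `bid`:
        # alpha / (1 - alpha) == bid  <=>  alpha == bid / (1 + bid)
        alpha = bid / (1 + bid)
        if p <= alpha:
            selected.append(j)   # accept the feature immediately (greedy)
            wealth += payout     # a discovery earns additional alpha-wealth
        else:
            wealth -= bid        # a failed test spends alpha / (1 - alpha)
    return selected


if __name__ == "__main__":
    # Hypothetical p-values for five candidate features, in arrival order.
    print(alpha_investing([0.001, 0.5, 0.5, 0.004, 0.9]))
```

Raising `payout` or `spend_fraction` makes the rule more permissive. That is one way to explore the abstract's observation that allowing more false discoveries than usual can reduce risk.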

Copyright/Permission Statement

This is an Accepted Manuscript of an article published by Taylor & Francis in the Journal of Statistical Computation and Simulation on 17 Dec 2014, available online: http://dx.doi.org/10.1080/00949655.2014.990454

Keywords

stochastic dynamic programming, testimator, variable selection

Date Posted: 25 October 2018

This document has been peer reviewed.