Permuted Inclusion Criterion: A Variable Selection Technique

Degree type
Doctor of Philosophy (PhD)
Graduate group
Statistics
Discipline
Applied Statistics
Multivariate Analysis
Statistical Methodology
Statistical Models
Subject
variable selection
linear regression
permutation
CART
model selection
Abstract

We introduce a new variable selection technique called the Permuted Inclusion Criterion (PIC), based on augmenting the predictor space X with a row-permuted version denoted Xπ. We adopt the linear regression setup with n observations on p variables, so the augmented space contains p real predictors and p permuted predictors. This construction has many desirable properties for variable selection: it preserves relations among variables (e.g., squares and interactions) and equates the moments and covariance structure of X and Xπ. More importantly, Xπ scales with the size of X. We motivate the idea with forward selection: the first time we select a predictor from Xπ, we stop. Because this stopping point depends on the permutation, we simulate many permutations and obtain a distribution of models and stopping points, which has the added benefit of quantifying how certain we are about stopping. Variable selection typically penalizes each additional variable by a prespecified amount; our method instead uses a data-adaptive penalty. We apply the method to simulated data and compare its predictive performance to other widely used criteria such as Cp, RIC, and the Lasso. Viewing PIC as a selection scheme for greedy algorithms, we extend it to generalized linear models (GLMs) and classification and regression trees (CART).
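The following is a minimal sketch of the forward-selection version of PIC as described in the abstract, not the thesis's actual implementation. It assumes a simple correlation-with-residual rule as the greedy selection criterion (a stand-in for the usual reduction-in-RSS rule); all function and variable names are illustrative.

```python
# Sketch of one PIC pass: augment X with a row-permuted copy, run greedy forward
# selection on the augmented design, and stop the first time a permuted column wins.
# Illustrative only; names and the correlation-based greedy rule are assumptions.
import numpy as np

def forward_select_until_permuted(X, y, rng):
    """Return indices of real predictors chosen before the first permuted predictor."""
    n, p = X.shape
    X_pi = X[rng.permutation(n), :]        # row permutation: same moments/covariance as X
    Z = np.hstack([X, X_pi])               # augmented design: p real + p permuted predictors
    selected = []
    remaining = list(range(2 * p))
    residual = y - y.mean()
    for _ in range(p):
        # greedy step: pick the candidate most correlated with the current residual
        scores = [abs(np.corrcoef(Z[:, j], residual)[0, 1]) for j in remaining]
        j_best = remaining[int(np.argmax(scores))]
        if j_best >= p:                    # a permuted predictor entered first: stop
            break
        selected.append(j_best)
        remaining.remove(j_best)
        # refit least squares on the selected real predictors and update the residual
        Xs = np.column_stack([np.ones(n), X[:, selected]])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        residual = y - Xs @ beta
    return selected

# Repeating over many permutations yields a distribution of models and stopping points.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)
models = [tuple(forward_select_until_permuted(X, y, rng)) for _ in range(200)]
```

Because each permutation can stop the search at a different point, the collection of selected models across repetitions gives the data-adaptive stopping distribution the abstract refers to.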

Advisor
Andreas Buja
Date of degree
2009-08-14