Permuted Inclusion Criterion: A Variable Selection Technique

Lysen, Shaun

Files

Dissertation___Shaun_Lysen.pdf (805.14 KB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Statistics

Subject

variable selection
linear regression
permutation
CART
model selection
Applied Statistics
Multivariate Analysis
Statistical Methodology
Statistical Models

Permalink

https://repository.upenn.edu/handle/20.500.14332/29754

Author

Lysen, Shaun

Abstract

We introduce a new variable selection technique, the Permuted Inclusion Criterion (PIC), based on augmenting the predictor space X with a row-permuted version denoted Xpi. We adopt the linear regression setup with n observations on p variables, so the augmented space has p real predictors and p permuted predictors. This construction has many desirable properties for variable selection. For example, it preserves relations between variables (e.g. squares and interactions) and equates the moments and covariance structure of X and Xpi. More importantly, Xpi scales with the size of X. We motivate the idea with forward selection: the first time we select a predictor from Xpi, we stop. Because the stopping point depends on the permutation, we simulate many permutations and create a distribution of models and stopping points. This has the added benefit of quantifying how certain we are about stopping. Variable selection typically penalizes each additional variable by a prespecified amount; our method instead uses a data-adaptive penalty. We apply the method to simulated data and compare its predictive performance to other widely used criteria such as Cp, RIC, and the Lasso. Viewing PIC as a selection scheme for greedy algorithms, we extend it to generalized linear models (GLM) and classification and regression trees (CART).
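
The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `pic_forward_once` and `pic`, the parameter `n_perm`, and the choice of an orthogonalized forward step (picking the column that most reduces the residual sum of squares) are assumptions made for illustration only.

```python
import numpy as np

def pic_forward_once(X, y, rng):
    """One run: forward selection over [X, Xpi], stopping at the first permuted pick.

    Hypothetical helper; returns the indices of the real predictors selected
    before a permuted predictor would have entered the model.
    """
    n, p = X.shape
    X_perm = X[rng.permutation(n), :]      # row-permuted copy of X (Xpi)
    Z = np.hstack([X, X_perm])             # augmented space: p real + p permuted
    Z = Z - Z.mean(axis=0)                 # center columns, so no intercept is needed
    r = y - y.mean()                       # current residual
    selected = []
    available = np.ones(2 * p, dtype=bool)
    for _ in range(min(p, n - 1)):
        norms = np.linalg.norm(Z, axis=0)
        usable = available & (norms > 1e-10)
        if not usable.any():
            break
        # Score = |correlation-like inner product| with the residual; after
        # orthogonalization this ranks columns by reduction in residual sum of squares.
        score = np.full(2 * p, -np.inf)
        score[usable] = np.abs(Z[:, usable].T @ r) / norms[usable]
        j = int(np.argmax(score))
        if j >= p:                         # a permuted predictor won: stop here
            break
        selected.append(j)
        available[j] = False
        q = Z[:, j] / norms[j]
        r = r - q * (q @ r)                # remove the chosen direction from the residual
        Z = Z - np.outer(q, q @ Z)         # and from every remaining column
    return selected

def pic(X, y, n_perm=200, seed=0):
    """Repeat over many permutations; return model sizes and inclusion frequencies."""
    rng = np.random.default_rng(seed)
    runs = [pic_forward_once(X, y, rng) for _ in range(n_perm)]
    sizes = np.array([len(s) for s in runs])
    freq = np.zeros(X.shape[1])
    for s in runs:
        for j in s:
            freq[j] += 1
    return sizes, freq / n_perm
```

Calling `pic(X, y)` on an n-by-p NumPy array `X` and a response vector `y` returns one stopping point per permutation together with the fraction of runs in which each real predictor was selected. The spread of the stopping points is one way to read the abstract's remark about quantifying how certain we are about stopping.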

Advisor

Andreas Buja

Date of degree

2009-08-14

Collection

Dissertations and Theses