Date of Award
Doctor of Philosophy (PhD)
We introduce a new variable selection technique, the Permuted Inclusion Criterion (PIC), based on augmenting the predictor space X with a row-permuted version, denoted Xpi. We adopt the linear regression setup with n observations on p variables, so the augmented space has p real predictors and p permuted predictors. This construction has several desirable properties for variable selection. For example, permuting rows preserves relations between variables, such as squares and interactions, and equates the moments and covariance structure of X and Xpi. More importantly, Xpi scales with the size of X. We motivate the idea with forward selection: the first time a predictor from Xpi is selected, we stop. Because the stopping point depends on the permutation, we repeat the procedure over many permutations to obtain a distribution of models and stopping points, which has the added benefit of quantifying how certain we are about where to stop. Variable selection typically penalizes each additional variable by a prespecified amount; our method instead uses a data-adaptive penalty. We apply the method to simulated data and compare its predictive performance with that of other widely used criteria such as Cp, RIC, and the Lasso. Viewing PIC as a selection scheme for greedy algorithms, we extend it to generalized linear models (GLM) and classification and regression trees (CART).
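The forward-selection version of the idea described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the dissertation's implementation: the function names `pic_forward_once` and `pic`, the parameters `n_perm` and `seed`, the numerical tolerance, the use of centering in place of an explicit intercept, and the choice to score candidates by reduction in residual sum of squares are all assumptions introduced here.

```python
import numpy as np

def pic_forward_once(X, y, rng):
    """One run: forward selection on [X, Xpi], stopping at the first permuted pick.

    Hypothetical helper; scoring by RSS reduction is an assumed form of forward selection.
    """
    n, p = X.shape
    X_pi = X[rng.permutation(n)]             # permute whole rows, so relations among columns survive
    Z = np.hstack([X, X_pi]).astype(float)   # columns 0..p-1 are real, p..2p-1 are permuted
    Z -= Z.mean(axis=0)                      # centering stands in for an intercept (assumption)
    r = y - y.mean()                         # current residual
    active = np.ones(2 * p, dtype=bool)
    selected = []
    for _ in range(min(n - 1, 2 * p)):
        norms = np.einsum("ij,ij->j", Z, Z)  # squared norm of each (orthogonalized) column
        usable = active & (norms > 1e-10)    # skip columns already explained (assumed tolerance)
        if not usable.any():
            break
        scores = np.full(2 * p, -np.inf)
        scores[usable] = (Z[:, usable].T @ r) ** 2 / norms[usable]  # RSS reduction per candidate
        j = int(np.argmax(scores))
        if j >= p:                           # first permuted predictor chosen: stop
            break
        selected.append(j)
        active[j] = False
        z = Z[:, j] / np.sqrt(norms[j])
        r = r - z * (z @ r)                  # remove the chosen direction from the residual
        Z = Z - np.outer(z, z @ Z)           # and from every remaining column
    return selected

def pic(X, y, n_perm=200, seed=0):
    """Repeat over many permutations; return model sizes and inclusion frequencies."""
    rng = np.random.default_rng(seed)
    runs = [pic_forward_once(X, y, rng) for _ in range(n_perm)]
    sizes = np.array([len(s) for s in runs])
    inclusion = np.zeros(X.shape[1])
    for s in runs:
        for j in s:
            inclusion[j] += 1
    return sizes, inclusion / n_perm

# Simulated example: 3 strong signals among 10 predictors (made-up data for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, :3].sum(axis=1) + rng.normal(size=200)
sizes, inclusion = pic(X, y, n_perm=50)
```

Here `sizes` is the empirical distribution of stopping points across permutations, and `inclusion` gives the fraction of runs in which each real predictor entered the model before any permuted one did. The spread of `sizes` is one way to read off how certain the procedure is about where to stop.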
Lysen, Shaun, "Permuted Inclusion Criterion: A Variable Selection Technique" (2009). Publicly Accessible Penn Dissertations. 28.