Efficient Feature Selection in the Presence of Multiple Feature Classes

Dhillon, Paramveer Singh; Foster, Dean P; Ungar, Lyle H

Files

Dhillon_2008.pdf (257.62 KB)

Penn collection

Departmental Papers (CIS)

Subject

feature extraction
pattern classification
feature selection
features extraction
gene expression data
information theoretic approach
multiple feature classes
word sense disambiguation
Minimum Description Length Coding

Permalink

https://repository.upenn.edu/handle/20.500.14332/6462

Abstract

We present an information-theoretic approach to feature selection when the data possesses feature classes. Feature classes are pervasive in real data. For example, in gene expression data, the genes that serve as features may be divided into classes based on their membership in gene families or pathways. In word sense disambiguation or named entity extraction, features fall into classes including adjacent words, their parts of speech, and the topic and venue of the document the word is in. When predictive features occur predominantly in a small number of feature classes, our information-theoretic approach significantly improves feature selection. Experiments on real and synthetic data demonstrate substantial improvement in predictive accuracy over the standard L0 penalty-based stepwise and streamwise feature selection methods, as well as over Lasso and Elastic Nets, all of which are oblivious to the existence of feature classes.
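
To illustrate why class membership can matter for a description-length criterion, the following is a minimal sketch in Python. It is not the coding scheme from the paper, which the abstract does not specify. It assumes a hypothetical code in which each selected feature is named by a one-bit flag (new class or already-used class), then a class identifier, then an index within that class. The function names `flat_cost_bits` and `class_aware_cost_bits` and the example sizes are illustrative assumptions.

```python
import math


def flat_cost_bits(n_features_total, n_selected):
    """Class-oblivious cost: each selected feature is named among all features."""
    return n_selected * math.log2(n_features_total)


def class_aware_cost_bits(class_sizes, selected_classes):
    """Hypothetical class-aware cost (an assumption, not the paper's scheme).

    class_sizes: number of features in each class.
    selected_classes: class id of each selected feature, in selection order.
    """
    n_classes = len(class_sizes)
    used = set()
    total = 0.0
    for cls in selected_classes:
        total += 1.0  # flag bit: does this feature come from a new class?
        if cls in used:
            # Name one of the classes already in the model (0 bits if only one).
            total += math.log2(len(used))
        else:
            # Name the class among all classes, then remember it.
            total += math.log2(n_classes)
            used.add(cls)
        # Name the feature within its class.
        total += math.log2(class_sizes[cls])
    return total


# Ten classes of 100 features each; four features are selected.
sizes = [100] * 10
flat = flat_cost_bits(sum(sizes), 4)
clustered = class_aware_cost_bits(sizes, [0, 0, 0, 0])  # all in one class
spread = class_aware_cost_bits(sizes, [0, 1, 2, 3])     # four different classes
print(f"flat={flat:.2f} clustered={clustered:.2f} spread={spread:.2f}")
```

Under these assumptions, the class-aware code is cheaper than the flat code when the selected features cluster in one class and more expensive when they are spread across classes, which matches the condition stated in the abstract.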

Date of presentation

2008-12-15

Conference name

Eighth IEEE International Conference on Data Mining (ICDM '08)

Conference dates

2008-12-15 to 2008-12-19

Comments

Copyright 2008. Reprinted from: Dhillon, P.S.; Foster, D.; Ungar, L.H., "Efficient Feature Selection in the Presence of Multiple Feature Classes," Eighth IEEE International Conference on Data Mining (ICDM '08), pp. 779-784, 15-19 Dec. 2008. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4781178&isnumber=4781078 This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

Collection

Presentations