Cluster-based Concept Invention for Statistical Relational Learning

Penn collection
Departmental Papers (CIS)
Subject
Artificial Intelligence
Algorithms
Relational Learning
Clustering
Feature Generation
Author
Popescul, Alexandrin
Abstract

We use clustering to derive new relations that augment the database schema used for the automatic generation of predictive features in statistical relational learning. Entities derived from clusters increase the expressivity of feature spaces by creating new first-class concepts, which in turn contribute to new features. For example, in CiteSeer, papers can be clustered by their words or citations, giving "topics", and authors can be clustered by the documents they co-author, giving "communities". Such cluster-derived concepts become part of more complex feature expressions. Of the large number of generated features, those that improve predictive accuracy are kept in the model, as decided by statistical feature selection criteria. We present results demonstrating improved accuracy on two tasks, venue prediction and link prediction, using CiteSeer data.
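
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy data, the names `PAPER_WORDS`, `PUBLISHED_IN`, `leader_cluster`, and `topic_venue_counts`, the similarity threshold, and the choice of single-pass leader clustering with Jaccard similarity are all assumptions made for illustration. The sketch clusters papers by their words to obtain a new "topic" relation, then uses that relation in a candidate feature for venue prediction. The statistical feature selection step is not shown.

```python
from collections import defaultdict

# Hypothetical toy data (not from CiteSeer): paper id -> set of words.
PAPER_WORDS = {
    "p1": {"relational", "learning", "logic"},
    "p2": {"relational", "learning", "features"},
    "p3": {"kernel", "svm", "margin"},
    "p4": {"kernel", "svm", "classification"},
}
# Hypothetical base relation published_in(paper, venue); p4 is unlabeled.
PUBLISHED_IN = {"p1": "ILP", "p2": "ILP", "p3": "NIPS"}


def jaccard(a, b):
    """Jaccard similarity between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(items, threshold=0.3):
    """Single-pass leader clustering (an assumed stand-in for whatever
    clustering method is used): an item joins the first cluster whose
    leader is at least `threshold` similar, else it starts a new cluster."""
    leaders = []      # list of (cluster_id, leader_word_set)
    assignment = {}   # item id -> cluster id
    for item in sorted(items):
        words = items[item]
        for cid, leader_words in leaders:
            if jaccard(words, leader_words) >= threshold:
                assignment[item] = cid
                break
        else:
            cid = f"topic{len(leaders)}"
            leaders.append((cid, words))
            assignment[item] = cid
    return assignment


def topic_venue_counts(paper, topic_of, published_in):
    """Candidate feature: for each venue, the number of OTHER papers in
    the same topic cluster that are known to be published there."""
    counts = defaultdict(int)
    for other, venue in published_in.items():
        if other != paper and topic_of[other] == topic_of[paper]:
            counts[venue] += 1
    return dict(counts)


# The cluster-derived relation topic_of(paper, topic) augments the schema.
TOPIC_OF = leader_cluster(PAPER_WORDS)
```

Calling `topic_venue_counts("p4", TOPIC_OF, PUBLISHED_IN)` gives the venue counts for papers that share a topic with the unlabeled paper `p4`. In a full system, many such candidate features would be generated, and only those that pass a feature selection criterion would be kept in the model.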

Date of presentation
2004-08-22
Conference name
Departmental Papers (CIS)
Conference dates
2023-05-16T22:27:58.000
Comments
Postprint version. Copyright ACM, 2004. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2004), pages 665-670. Publisher URL: http://doi.acm.org/10.1145/1014052.1014137