Departmental Papers (ESE)

Abstract

Many different high dimensional data sets are characterized by the same underlying modes of variability. When these modes of variability are continuous and few in number, they can be viewed as parameterizing a low dimensional manifold. The manifold provides a compact shared representation of the data, suggesting correspondences between the high dimensional examples from different data sets. These correspondences, though naturally induced by the underlying manifold, are difficult to learn using traditional methods in supervised learning. In this paper, we generalize three methods in unsupervised learning—principal components analysis, factor analysis, and locally linear embedding—to discover subspaces and manifolds that provide common low dimensional representations of different high dimensional data sets. We use the shared representations discovered by these algorithms to put high dimensional examples from different data sets into correspondence. Finally, we show that a notion of "self-correspondence" between examples in the same data set can be used to improve the performance of these algorithms on small data sets. The algorithms are demonstrated on images and text.
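As a rough illustration of the general idea only (not the paper's algorithms, which are its own generalizations of principal components analysis, factor analysis, and locally linear embedding), the sketch below uses ordinary canonical correlation analysis, a standard linear method, to embed two paired data sets into a shared low dimensional space and then put examples from the two sets into correspondence by nearest neighbor in that space. The data, dimensions, and function names are all illustrative assumptions, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired data sets driven by the same 2-dimensional latent
# variable, standing in for two high dimensional collections (e.g. images
# and text) that share underlying modes of variability.
n, d = 300, 2
Z = rng.normal(size=(n, d))                       # shared latent coordinates
A = rng.normal(size=(40, d))
B = rng.normal(size=(60, d))
X = Z @ A.T + 0.05 * rng.normal(size=(n, 40))     # first high dimensional set
Y = Z @ B.T + 0.05 * rng.normal(size=(n, 60))     # second high dimensional set

def cca(X, Y, d, reg=1e-6):
    """Return projections (Wx, Wy) of each data set onto a d-dim shared space."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    m = len(Xc)
    Cxx = Xc.T @ Xc / m + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / m + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / m

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :d], Ky @ Vt[:d].T

Wx, Wy = cca(X, Y, d)

# Embed both data sets into the shared space, then match each example in X
# to its nearest neighbor among the examples in Y.
Ex = (X - X.mean(0)) @ Wx
Ey = (Y - Y.mean(0)) @ Wy
dists = ((Ex[:, None, :] - Ey[None, :, :]) ** 2).sum(-1)
match = dists.argmin(axis=1)
print("fraction matched to true partner:", (match == np.arange(n)).mean())

Canonical correlation analysis is chosen here because its two projections are aligned by construction (the canonical variates are maximally correlated across data sets), so nearest-neighbor matching in the shared space is meaningful; a plain PCA on each data set separately would produce embeddings in unrelated coordinate systems.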

Document Type

Conference Paper

Subject Area

GRASP

Date of this Version

August 2003

Comments

Presented at the 20th International Conference on Machine Learning (ICML 2003) Workshop: The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, held 21-24 August 2003 in Washington, DC.

Date Posted: 04 August 2005