
Departmental Papers (ESE)
Document Type
Conference Paper
Date of this Version
August 2003
Abstract
Many different high dimensional data sets are characterized by the same underlying modes of variability. When these modes of variability are continuous and few in number, they can be viewed as parameterizing a low dimensional manifold. The manifold provides a compact shared representation of the data, suggesting correspondences between the high dimensional examples from different data sets. These correspondences, though naturally induced by the underlying manifold, are difficult to learn using traditional methods in supervised learning. In this paper, we generalize three methods in unsupervised learning—principal components analysis, factor analysis, and locally linear embedding—to discover subspaces and manifolds that provide common low dimensional representations of different high dimensional data sets. We use the shared representations discovered by these algorithms to put high dimensional examples from different data sets into correspondence. Finally, we show that a notion of "self-correspondence" between examples in the same data set can be used to improve the performance of these algorithms on small data sets. The algorithms are demonstrated on images and text.
Date Posted: 04 August 2005

Comments
Presented at the 20th International Conference on Machine Learning (ICML 2003) Workshop: The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, held 21-24 August 2003 in Washington, DC.