Departmental Papers (ESE)

Abstract

Many different high dimensional data sets are characterized by the same underlying modes of variability. When these modes of variability are continuous and few in number, they can be viewed as parameterizing a low dimensional manifold. The manifold provides a compact shared representation of the data, suggesting correspondences between the high dimensional examples from different data sets. These correspondences, though naturally induced by the underlying manifold, are difficult to learn using traditional methods in supervised learning. In this paper, we generalize three methods in unsupervised learning—principal components analysis, factor analysis, and locally linear embedding—to discover subspaces and manifolds that provide common low dimensional representations of different high dimensional data sets. We use the shared representations discovered by these algorithms to put high dimensional examples from different data sets into correspondence. Finally, we show that a notion of "self-correspondence" between examples in the same data set can be used to improve the performance of these algorithms on small data sets. The algorithms are demonstrated on images and text.
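As a rough illustration of the general idea only (not the paper's algorithms, which are its own generalizations of principal components analysis, factor analysis, and locally linear embedding), the sketch below uses ordinary canonical correlation analysis, a standard linear method, to embed two paired data sets into a shared low dimensional space and then put examples from the two sets into correspondence by nearest neighbor in that space. The data, dimensions, and function names are all illustrative assumptions, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired data sets driven by the same 2-dimensional latent
# variable, standing in for two high dimensional collections (e.g. images
# and text) that share underlying modes of variability.
n, d = 300, 2
Z = rng.normal(size=(n, d))                       # shared latent coordinates
A = rng.normal(size=(40, d))
B = rng.normal(size=(60, d))
X = Z @ A.T + 0.05 * rng.normal(size=(n, 40))     # first high dimensional set
Y = Z @ B.T + 0.05 * rng.normal(size=(n, 60))     # second high dimensional set

def cca(X, Y, d, reg=1e-6):
    """Return projections (Wx, Wy) of each data set onto a d-dim shared space."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    m = len(Xc)
    Cxx = Xc.T @ Xc / m + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / m + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / m

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :d], Ky @ Vt[:d].T

Wx, Wy = cca(X, Y, d)

# Embed both data sets into the shared space, then match each example in X
# to its nearest neighbor among the examples in Y.
Ex = (X - X.mean(0)) @ Wx
Ey = (Y - Y.mean(0)) @ Wy
dists = ((Ex[:, None, :] - Ey[None, :, :]) ** 2).sum(-1)
match = dists.argmin(axis=1)
print("fraction matched to true partner:", (match == np.arange(n)).mean())

Canonical correlation analysis is chosen here because its two projections are aligned by construction (the canonical variates are maximally correlated across data sets), so nearest-neighbor matching in the shared space is meaningful; a plain PCA on each data set separately would produce embeddings in unrelated coordinate systems.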

Document Type

Conference Paper

Subject Area

GRASP

Date of this Version

August 2003

Comments

Presented at the 20th International Conference on Machine Learning (ICML 2003) Workshop: The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, held 21-24 August 2003 in Washington, DC.

Date Posted: 04 August 2005