Multi-View Clustering via Canonical Correlation Analysis

Chaudhuri, Kamalika; Kakade, Sham M; Livescu, Karen; Sridharan, Karthik

Files

ttic_tr_2008_5.pdf (394.29 KB)

Penn collection

Statistics Papers

Subject

Statistics and Probability
Theory and Algorithms

Permalink

https://repository.upenn.edu/handle/20.500.14332/47552

Abstract

Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Such techniques typically impose stringent requirements on the separation between the cluster means in order for the algorithm to be successful. Here, we show how using multiple views of the data can relax these requirements. We use Canonical Correlation Analysis (CCA) to project the data in each view to a lower-dimensional subspace. Under the assumption that the views are uncorrelated when conditioned on the cluster label, we show that the separation conditions required for the algorithm to be successful are rather mild (significantly weaker than those of prior results in the literature). We provide results for mixtures of Gaussians, mixtures of log-concave distributions, and mixtures of product distributions.
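
The abstract's general idea (project with CCA, then cluster in the low-dimensional subspace) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given on this page. It assumes synthetic two-view Gaussian data with two clusters and independent noise in each view given the label, it projects only the first view, and it swaps in plain Lloyd-style k-means as the clustering step. The names `cca_projection`, `kmeans`, and `run_demo` are hypothetical.

```python
import numpy as np

def cca_projection(X1, X2, dim, reg=1e-6):
    """Top `dim` canonical directions for view 1, plus canonical correlations."""
    n = X1.shape[0]
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    # Regularized within-view covariances and the cross-view covariance.
    C11 = X1c.T @ X1c / n + reg * np.eye(X1.shape[1])
    C22 = X2c.T @ X2c / n + reg * np.eye(X2.shape[1])
    C12 = X1c.T @ X2c / n

    def inv_sqrt(C):
        # Inverse symmetric square root, used to whiten a view.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    W1, W2 = inv_sqrt(C11), inv_sqrt(C22)
    # Singular vectors of the whitened cross-covariance give the CCA basis.
    U, s, _ = np.linalg.svd(W1 @ C12 @ W2)
    return W1 @ U[:, :dim], s[:dim]

def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd iterations (an assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def run_demo(n=2000, d=20, seed=0):
    """Cluster synthetic two-view data; returns accuracy up to label swap."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)      # hidden cluster label
    sign = 2 * z - 1                    # cluster means are +m and -m
    m1 = rng.normal(size=d)
    m1 *= 3.0 / np.linalg.norm(m1)
    m2 = rng.normal(size=d)
    m2 *= 3.0 / np.linalg.norm(m2)
    # Noise is drawn independently per view, so the views are
    # uncorrelated once the label is fixed (the abstract's assumption).
    X1 = sign[:, None] * m1 + rng.normal(size=(n, d))
    X2 = sign[:, None] * m2 + rng.normal(size=(n, d))
    # With k = 2 clusters, one canonical direction (k - 1) is kept here.
    directions, _ = cca_projection(X1, X2, dim=1)
    Z = (X1 - X1.mean(axis=0)) @ directions
    labels = kmeans(Z, k=2, seed=seed)
    agree = np.mean(labels == z)
    return max(agree, 1.0 - agree)

print("clustering accuracy:", run_demo())
```

In this toy setting the only cross-view correlation comes from the shared cluster label, so the leading canonical direction tends to align with the difference of the cluster means. The regularization constant `reg` and the choice of one retained direction are illustrative assumptions.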

Publication date

2009-01-01

Journal title

Proceedings of the 26th Annual International Conference on Machine Learning

Comments

At the time of publication, author Sham M. Kakade was affiliated with Toyota Technological Institute at Chicago. Currently, he is a faculty member at the Statistics Department at the University of Pennsylvania.

Collection

Articles