Unsupervised Distance Metric Learning Using Predictability

Gupta, Abhishek A.; Foster, Dean P.; Ungar, Lyle H.

Penn collection

Technical Reports (CIS)

Permalink

https://repository.upenn.edu/handle/20.500.14332/7850

Author

Gupta, Abhishek A.

Foster, Dean P.

Ungar, Lyle H.

Abstract

Distance-based learning methods, such as clustering and SVMs, depend on good distance metrics. This paper addresses unsupervised metric learning in the context of clustering. We seek transformations of the data that yield clean, well-separated clusters, where a clean cluster is one whose membership can be accurately predicted. The transformation (and hence the distance metric) is obtained by minimizing the blur ratio, defined as the within-cluster variance divided by the total data variance in the transformed space. To minimize it, we propose an iterative procedure, Clustering Predictions of Cluster Membership (CPCM). CPCM alternately (a) predicts cluster memberships (e.g., using linear regression) and (b) clusters these predictions (e.g., using k-means). With linear regression and k-means, this algorithm is guaranteed to converge to a fixed point. The resulting clusters are invariant to linear transformations of the original features, and the procedure tends to eliminate noise features by driving their weights to zero.
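The CPCM loop and the blur ratio described above can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`one_hot`, `blur_ratio`, `centroids`, `kmeans`, `cpcm`) and the toy data are hypothetical, step (a) is assumed to be ordinary least squares with an intercept, step (b) is assumed to be Lloyd's k-means seeded with the centroids of the current memberships, the regression predictions are taken to be the transformed space, and the loop stops once memberships no longer change.

```python
import numpy as np


def one_hot(labels, k):
    """n x k indicator matrix of cluster memberships."""
    Y = np.zeros((labels.size, k))
    Y[np.arange(labels.size), labels] = 1.0
    return Y


def blur_ratio(Z, labels):
    """Within-cluster sum of squares divided by total sum of squares of Z."""
    total = np.sum((Z - Z.mean(axis=0)) ** 2)
    within = sum(np.sum((Z[labels == c] - Z[labels == c].mean(axis=0)) ** 2)
                 for c in np.unique(labels))
    return within / total


def centroids(Z, labels, k, previous=None):
    """Cluster means; an empty cluster keeps its previous centre (or row c of Z)."""
    out = np.empty((k, Z.shape[1]))
    for c in range(k):
        members = Z[labels == c]
        if len(members):
            out[c] = members.mean(axis=0)
        elif previous is not None:
            out[c] = previous[c]
        else:
            out[c] = Z[c]  # arbitrary fallback for an initially empty cluster
    return out


def kmeans(Z, labels, k, n_iter=100):
    """Lloyd's k-means on Z, seeded with the centroids of the given labels."""
    centers = centroids(Z, labels, k)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        centers = centroids(Z, labels, k, previous=centers)
    return labels


def cpcm(X, k, init_labels, max_iter=50):
    """Hypothetical CPCM sketch: alternate (a) least-squares prediction of
    memberships and (b) k-means on the predictions until memberships stop changing."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # features plus intercept column
    labels = np.asarray(init_labels).copy()
    for _ in range(max_iter):
        # (a) predict the membership indicators from the features
        B, *_ = np.linalg.lstsq(X1, one_hot(labels, k), rcond=None)
        Y_hat = X1 @ B  # assumed transformed space: predicted memberships
        # (b) cluster the predictions
        new_labels = kmeans(Y_hat, labels, k)
        if np.array_equal(new_labels, labels):
            break  # memberships unchanged: fixed point reached
        labels = new_labels
    return labels, B, Y_hat


# Toy data (hypothetical): one informative feature with two groups, plus 3 noise features.
rng = np.random.default_rng(0)
n = 200
signal = np.concatenate([rng.normal(-3, 1, n // 2), rng.normal(3, 1, n // 2)])
X = np.column_stack([signal, rng.normal(size=(n, 3))])
init = rng.integers(0, 2, size=n)  # random initial memberships
labels, B, Y_hat = cpcm(X, 2, init)
print("blur ratio:", blur_ratio(Y_hat, labels))
print("weights for cluster 1 (features only):", np.round(B[:-1, 1], 3))
```

Because least-squares predictions depend on the features only through their column span, replacing `X` by `X @ A` for an invertible matrix `A` should leave the clusters unchanged up to floating-point rounding, which gives one way to check the invariance claim in this sketch. The printed weights show how strongly each feature contributes to the predicted membership.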

Publication date

2008-06-13

Comments

University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-08-23.

Collection

Reports