Learning from Partial Labels

Penn collection
Departmental Papers (CIS)
Computer Sciences
Cour, Timothee
Sapp, Benjamin
Taskar, Ben

We address the problem of partially-labeled multiclass classification, where instead of a single label per instance, the algorithm is given a candidate set of labels, only one of which is correct. Our setting is motivated by a common scenario in many image and video collections, where only partial access to labels is available. The goal is to learn a classifier that can disambiguate the partially-labeled training instances, and generalize to unseen data. We define an intuitive property of the data distribution that sharply characterizes the ability to learn in this setting and show that effective learning is possible even when all the data is only partially labeled. Exploiting this property of the data, we propose a convex learning formulation based on minimization of a loss function appropriate for the partial label setting. We analyze the conditions under which our loss function is asymptotically consistent, as well as its generalization and transductive performance. We apply our framework to identifying faces culled from web news sources and to naming characters in TV series and movies; in particular, we annotated and experimented on a very large video data set and achieved 6% error for character naming on 16 episodes of the TV series Lost.

Cour, T., Sapp, B., & Taskar, B. (2011). Learning from Partial Labels. Journal of Machine Learning Research, 1225-1261. ©2011 Timothee Cour, Ben Sapp and Ben Taskar.