Talking Pictures: Temporal Grouping and Dialog-Supervised Person Recognition

Penn collection
Departmental Papers (CIS)
Subject
Computer Sciences
Author
Cour, Timothee
Sapp, Benjamin
Nagle, Akash
Abstract

We address the character identification problem in movies and television videos: assigning names to faces on the screen. Most prior work on person recognition in video assumes some supervised data such as a screenplay or hand-labeled faces. In this paper, our only source of ‘supervision’ is the dialog cues: first-, second- and third-person references (such as “I’m Jack”, “Hey, Jack!” and “Jack left”). While this kind of supervision is sparse and indirect, we exploit multiple modalities and their interactions (appearance, dialog, mouth movement, synchrony, continuity-editing cues) to effectively resolve identities through local temporal grouping followed by global weakly supervised recognition. We propose a novel temporal grouping model that partitions face tracks across multiple shots while respecting appearance, geometric and film-editing cues and constraints. In this model, states represent partitions of the k most recent face tracks, and transitions represent compatibility of consecutive partitions. We present dynamic programming inference and discriminative learning for the model. The individual face tracks are subsequently assigned a name by learning a classifier from partial label constraints. The weakly supervised classifier incorporates multiple-instance constraints from dialog cues as well as soft grouping constraints from our temporal grouping. We evaluate both the temporal grouping and final character naming on several hours of TV and movies.
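
The state space described in the abstract (partitions of the k most recent face tracks, with transitions allowed only between compatible consecutive partitions) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the pairwise scoring function `pair_score`, and the rule used to merge window partitions into global group labels are all assumptions made for illustration, and the learned appearance, geometric and film-editing features of the paper are replaced by a user-supplied score.

```python
def partitions(k):
    """All set partitions of k items, as canonical label tuples."""
    result = []

    def extend(prefix, max_label):
        if len(prefix) == k:
            result.append(tuple(prefix))
            return
        # An item may join any existing group or open exactly one new group.
        for label in range(max_label + 2):
            extend(prefix + [label], max(max_label, label))

    extend([], -1)
    return result


def canonical(labels):
    """Relabel groups in order of first occurrence, e.g. (1, 1, 0) -> (0, 0, 1)."""
    mapping = {}
    return tuple(mapping.setdefault(label, len(mapping)) for label in labels)


def compatible(prev, curr):
    """Consecutive windows share k-1 tracks; their partitions must agree there."""
    return canonical(prev[1:]) == canonical(curr[:-1])


def group_tracks(n, k, pair_score):
    """Viterbi-style dynamic programming over window partitions (needs n >= k >= 2).

    pair_score(i, j, same) is a hypothetical score for placing tracks i < j
    in the same group (same=True) or in different groups (same=False).
    Returns one group label per track.
    """
    states = partitions(k)

    def local(t, s, first):
        # The window ending at track t covers tracks t-k+1 .. t.
        start = t - k + 1
        if first:   # first window: score every pair once
            pairs = [(a, b) for b in range(k) for a in range(b)]
        else:       # later windows: only pairs involving the newest track
            pairs = [(a, k - 1) for a in range(k - 1)]
        return sum(pair_score(start + a, start + b, s[a] == s[b]) for a, b in pairs)

    best = {s: local(k - 1, s, True) for s in states}
    back = []
    for t in range(k, n):
        new_best, pointers = {}, {}
        for s in states:
            score, p = max((best[p], p) for p in states if compatible(p, s))
            new_best[s] = score + local(t, s, False)
            pointers[s] = p
        best = new_best
        back.append(pointers)

    # Backtrack the best sequence of window partitions.
    s = max(best, key=best.get)
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    path.reverse()

    # Merge window partitions into global labels (an illustrative choice).
    labels = list(path[0])
    for window in path[1:]:
        start = len(labels) - (k - 1)
        match = [labels[start + a] for a in range(k - 1) if window[a] == window[-1]]
        labels.append(match[0] if match else max(labels) + 1)
    return labels
```

In this sketch two tracks can only be linked directly when they fall inside the same window of k tracks, so a character who reappears after a longer gap receives a fresh group label. Increasing k widens that horizon, at the cost of a state space that grows with the number of partitions of k items (5 for k = 3, 15 for k = 4).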

Date of presentation
2010-06-01
Conference name
Departmental Papers (CIS)
Conference dates
2023-05-17T07:09:40.000
Comments
Talking Pictures: Temporal Grouping and Dialog-Supervised Person Recognition (http://www.cis.upenn.edu/%7Etaskar/pubs/dialog_cvpr10.pdf), T. Cour (http://www.seas.upenn.edu/%7Etimothee/), B. Sapp (http://www.seas.upenn.edu/%7Ebensapp), A. Nagle, and B. Taskar, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (http://cvl.umiacs.umd.edu/conferences/cvpr2010/), San Francisco, CA, June 2010. ©2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.