Efficient Human Pose Estimation with Image-dependent Interactions

Sapp, Benjamin John

Files

Sapp_upenngdas_0175C_10378.pdf (62.66 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Subject

computer vision
convex optimization
graphical models
machine learning
statistical inference
Applied Mathematics
Computer Sciences
Statistics and Probability

Copyright date

2014-08-20T20:12:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/32459

Author

Sapp, Benjamin John

Abstract

Human pose estimation from 2D images is one of the most challenging and computationally demanding problems in computer vision. Standard models such as Pictorial Structures consider interactions between kinematically connected joints or limbs, leading to inference cost that is quadratic in the number of pixels. As a result, researchers and practitioners have restricted themselves to simple models that measure the quality of limb-pair possibilities only by their 2D geometric plausibility. In this dissertation, we propose novel methods which allow for efficient inference in richer models with data-dependent interactions. First, we introduce structured prediction cascades, a structured analog of binary cascaded classifiers, which learn to focus computational effort where it is needed, filtering out many states cheaply while ensuring the correct output is not filtered out. Second, we propose a way to decompose models of human pose with cyclic dependencies into a collection of tree models, and provide novel methods to impose model agreement. Finally, we develop a local linear approach that learns bases centered around modes in the training data, giving us image-dependent local models which are fast and accurate. These techniques allow for sparse and efficient inference on the order of minutes or seconds per image. As a result, we can afford to model pairwise interaction potentials much more richly with data-dependent features such as contour continuity, segmentation alignment, color consistency, optical flow, and multiple modes. We show empirically that these richer models are worthwhile, obtaining significantly more accurate pose estimation on popular datasets.

Advisor

Ben Taskar

Date of degree

2012-01-01

Collection

Dissertations and Theses