Shape Representations for Object Recognition

The problem of object recognition has been at the forefront of computer vision research over the last decade. The most successful approaches have relied mainly on edge- or texture-based representations. The shape of the object outline, although widely used for pre-segmented objects, has found limited applicability to the detection problem in real images. Because shape is a holistic, global percept, background structure and interior object contours can easily clutter a global shape descriptor and render it unusable. Figure-ground organization, which segments the object of interest and removes the cluttering contours, is therefore of paramount importance. However, purely bottom-up segmentation rarely provides an object outline good enough for shape-based detection. In this thesis, we study a novel shape representation, called a chordiogram, which allows us to address these challenges. The chordiogram is a holistic shape descriptor that captures global geometric relationships between object boundaries. Based on the chordiogram, we introduce a boundary structure segmentation model which efficiently integrates region and boundary grouping principles with shape-based matching. This method uses holistic shape for simultaneous object segmentation and detection in highly cluttered scenes. We apply it to established recognition benchmarks and achieve state-of-the-art results. Further, we study the applicability of shape to object detection in videos. We show that shape-based representations can be used not only to robustly detect moving objects but also to provide a rough estimate of their pose. For this purpose, we utilize freely available large datasets of 3D synthetic models. Beyond linking shape matching with perceptual grouping, we study the interplay between feature matching and perceptual grouping. We introduce co-salient regions, which are coherent, corresponding segments in two or more images, and describe two algorithms for their detection. Co-salient regions are applied to two problems: wide-baseline stereo and motion segmentation. In the former, we show how to estimate correspondences between regions and improve feature matches; in the latter, segments representing the same object parts are tracked across multiple frames in a video.

Advisor

Kostas Daniilidis
Ben Taskar
Jianbo Shi

Date of degree

2011-05-16

Collection

Dissertations and Theses