Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Computer and Information Science

First Advisor

Jianbo Shi

Second Advisor

Stella X. Yu


How to equip machines with the ability to understand an image and explain everything in it has a long history in computer vision, motivating tasks from image recognition, detection, and segmentation towards holistic scene understanding. Researchers have been tackling an array of related structured prediction tasks to approach this problem: low-level and high-level image segmentation, monocular image depth estimation, surface normal prediction, etc. These tasks focus on different aspects of holistic image understanding yet intertwine with one another to unveil underlying image structures. This dissertation presents our efforts in this direction, with an emphasis on learning to model pixel relationships.

We identify three major problems in image understanding and segmentation: (1) The pixel-wise classification based approaches fundamentally lack the sense of object structures and spatial layouts. (2) The lack of explainability prevents us from understanding the mistakes and from proposing remedies accordingly. (3) The definition of image segmentation is ambiguous and changing over time. These constant arguments reveal the complexity of holistic image understanding and motivate us to propose unsupervised feature learning for image segmentation.

To tackle these problems one by one, we first propose Adaptive Affinity Fields (AAF) for semantic segmentation, which capture and match the semantic relations between neighboring pixels in the label space. Furthermore, we generalize the affinity fields in AAF as another neural network and propose Adversarial Structure Matching (ASM) for structured prediction tasks. ASM extends the structure learning concept from segmentation to monocular depth estimation and surface normal prediction. To tackle the second and third problems, we turn attention to metric learning and propose Segment Sorting (SegSort) for semantic segmentation. Most excitingly, SegSort is the first attempt using deep learning for unsupervised semantic segmentation with interpretable results. Last but not least, we extend our proposed SegSort to image parsing as a first attempt to unify semantic and instance segmentation.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."