Learning Image Segmentation With Relation-Centric Loss And Representation
Degree type
Graduate group
Discipline
Subject
Computer Vision
Deep Learning
Image Segmentation
Machine Learning
Metric Learning
Computer Sciences
Funder
Grant number
License
Copyright date
Distributor
Related resources
Author
Contributor
Abstract
How to equip machines with the ability to understand an image and explain everything in it has a long history in computer vision, motivating tasks from image recognition, detection, and segmentation towards holistic scene understanding. Researchers have been tackling an array of related structured prediction tasks to approach this problem: low-level and high-level image segmentation, monocular image depth estimation, surface normal prediction, etc. These tasks focus on different aspects of holistic image understanding yet intertwine with one another to unveil underlying image structures. This dissertation presents our efforts in this direction, with an emphasis on learning to model pixel relationships. We identify three major problems in image understanding and segmentation: (1) The pixel-wise classification based approaches fundamentally lack the sense of object structures and spatial layouts. (2) The lack of explainability prevents us from understanding the mistakes and from proposing remedies accordingly. (3) The definition of image segmentation is ambiguous and changing over time. These constant arguments reveal the complexity of holistic image understanding and motivate us to propose unsupervised feature learning for image segmentation. To tackle these problems one by one, we first propose Adaptive Affinity Fields (AAF) for semantic segmentation, which capture and match the semantic relations between neighboring pixels in the label space. Furthermore, we generalize the affinity fields in AAF as another neural network and propose Adversarial Structure Matching (ASM) for structured prediction tasks. ASM extends the structure learning concept from segmentation to monocular depth estimation and surface normal prediction. To tackle the second and third problems, we turn attention to metric learning and propose Segment Sorting (SegSort) for semantic segmentation. Most excitingly, SegSort is the first attempt using deep learning for unsupervised semantic segmentation with interpretable results. Last but not least, we extend our proposed SegSort to image parsing as a first attempt to unify semantic and instance segmentation.
Advisor
Stella X. Yu