Learning Image Segmentation With Relation-Centric Loss And Representation

Hwang, Jyh-Jing

Files

Hwang_upenngdas_0175C_14210.pdf (94.39 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Subject

Artificial Intelligence
Computer Vision
Deep Learning
Image Segmentation
Machine Learning
Metric Learning
Computer Sciences

Copyright date

2021-08-31T20:20:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/30775

Author

Hwang, Jyh-Jing

Abstract

Equipping machines with the ability to understand an image and explain everything in it is a long-standing goal in computer vision, motivating tasks from image recognition, detection, and segmentation toward holistic scene understanding. Researchers have approached this goal through an array of related structured prediction tasks: low-level and high-level image segmentation, monocular depth estimation, surface normal prediction, and others. These tasks focus on different aspects of holistic image understanding, yet they intertwine with one another to reveal underlying image structures. This dissertation presents our efforts in this direction, with an emphasis on learning to model pixel relationships. We identify three major problems in image understanding and segmentation: (1) Approaches based on pixel-wise classification fundamentally lack a sense of object structure and spatial layout. (2) The lack of explainability prevents us from understanding mistakes and proposing remedies accordingly. (3) The definition of image segmentation is ambiguous and changes over time. The ongoing debate over this definition reveals the complexity of holistic image understanding and motivates us to propose unsupervised feature learning for image segmentation. To tackle the first problem, we propose Adaptive Affinity Fields (AAF) for semantic segmentation, which capture and match the semantic relations between neighboring pixels in the label space. We then generalize the affinity fields in AAF as another neural network and propose Adversarial Structure Matching (ASM) for structured prediction tasks. ASM extends the structure learning concept from segmentation to monocular depth estimation and surface normal prediction. To tackle the second and third problems, we turn to metric learning and propose Segment Sorting (SegSort) for semantic segmentation. Most notably, SegSort is the first attempt to use deep learning for unsupervised semantic segmentation with interpretable results. Finally, we extend SegSort to image parsing as a first attempt to unify semantic and instance segmentation.

Advisor

Jianbo Shi
Stella X. Yu

Date of degree

2020-01-01

Collection

Dissertations and Theses