Learning Representations For Matching

Stephen Phillips, University of Pennsylvania

Abstract

Matching is an old and fundamental problem in Computer Vision. Its applications range from low-level feature matching for extracting the geometry of a scene to high-level semantic matching for scene understanding. However, challenges such as noise and outliers make the problem especially difficult. Recent work has shown that using multiple images improves matching performance over pairwise matching. In recent years, deep learning has also shown great promise in Computer Vision, achieving state-of-the-art results in object detection, segmentation, and image generation. Deep learning techniques excel at feature learning, and their implicit learning of prior distributions helps them reach this level of performance. We hope to leverage this power to learn better representations for matching problems. In this work, we propose to use deep learning techniques to compute better matches by learning better feature representations for matching. We use graph neural networks to handle the sparse structure of many matching problems, with multi-image cycle-consistency and geometric-consistency losses to learn robust representations. We also propose a framework for handling outlier rejection when training the deep neural networks, based on primal-dual optimization. We will apply these techniques to Structure from Motion sub-problems (such as two-view and multi-view matching) and to shape and point cloud matching.