Learning, Moving, And Predicting With Global Motion Representations

Jaegle, Andrew Coulter

Files

Jaegle_upenngdas_0175C_13234.pdf (15.96 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Neuroscience

Subject

Computational neuroscience
Computer vision
Deep learning
Machine learning
Motion
Vision
Artificial Intelligence and Robotics
Computer Sciences
Neuroscience and Neurobiology

Copyright date

2018-09-27T20:18:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/29840

Abstract

To respond to and influence the world they inhabit, animals and other intelligent agents must understand and predict the state of the world and its dynamics. An agent that can characterize how the world moves is better equipped to engage with it. Current methods of motion computation rely on local representations of motion (such as optical flow) or simple, rigid global representations (such as camera motion). These representations are useful, but they are difficult to estimate reliably and are of limited applicability in real-world settings, where agents must frequently reason about complex, highly nonrigid motion over long time horizons. In this dissertation, I present methods developed with the goal of building the more flexible and powerful notions of motion needed by agents facing the challenges of a dynamic, nonrigid world. This work is organized around a view of motion as a global phenomenon that is not adequately addressed by local or low-level descriptions and is best understood when analyzed at the level of whole images and scenes. I develop methods to: (i) robustly estimate camera motion from noisy optical flow estimates by exploiting the global, statistical relationship between the optical flow field and camera motion under projective geometry; (ii) learn representations of visual motion directly from unlabeled image sequences using learning rules derived from a formulation of image transformation in terms of its group properties; and (iii) predict future frames of a video by learning a joint representation of the instantaneous state of the visual world and its motion, using a view of motion as transformations of world state. I situate this work in the broader context of ongoing computational and biological investigations into the problem of estimating motion for intelligent perception and action.

Advisor

Kostas Daniilidis

Date of degree

2018-01-01

Collection

Dissertations and Theses