Leveraging Symmetric Structure For Improved Learning In Convolutional Neural Networks
Subject
Physical priors
Computer Sciences
Abstract
The resurgence of convolutional neural network (CNN) models for prediction has set new benchmarks in speech recognition, natural language processing, and computer vision. In computer vision, the success of these models is often attributed to the combination of a highly nonlinear cascaded processing scheme and the equivariance of planar convolution to translations of the input. This thesis introduces methods that extend the equivariance of CNNs to linear Lie groups that describe the motion of objects, the structure of the Euclidean world, and the formation of images. The first approach introduces a framework for joint estimation of image and motion representations. The linear Lie group structure is enforced through a bilinear motion model, which transforms an image representation by a linear combination of motion generators. This allows image sequences to be extrapolated by linearly extrapolating the transformation coefficients. In the second approach, 3D rotationally equivariant representations are learned by convolution of spherical functions with respect to the 3D rotation group. Methods are described for the convolution of functions on both the two- and three-spheres. The final approach enforces equivariance of representations to 2D dilated-rotations by preprocessing the input with a change of coordinates.
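The idea of transforming a representation by a linear combination of generators, and extrapolating by scaling the coefficients, can be illustrated with a small sketch. This is not the thesis's learned model. It assumes two hand-picked 2x2 generators (rotation and isotropic scaling), a two-dimensional stand-in for the image representation, and the matrix exponential as the map from coefficients to a transformation. All function and variable names here are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(a, terms=30):
    """Truncated Taylor series exp(A) = sum_k A^k / k! (fine for small norms)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # After this step, term holds A^k / k!.
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


def combine(generators, coeffs):
    """Linear combination sum_i c_i * G_i of the generators."""
    n = len(generators[0])
    return [[sum(c * g[i][j] for c, g in zip(coeffs, generators))
             for j in range(n)] for i in range(n)]


def transform(z, generators, coeffs):
    """Apply exp(sum_i c_i * G_i) to the representation vector z."""
    t = mat_exp(combine(generators, coeffs))
    return [sum(t[i][j] * z[j] for j in range(len(z))) for i in range(len(z))]


# Assumed generators (illustrative only): planar rotation and isotropic scaling.
G_ROT = [[0.0, -1.0], [1.0, 0.0]]
G_SCALE = [[1.0, 0.0], [0.0, 1.0]]
GENERATORS = [G_ROT, G_SCALE]

z0 = [1.0, 0.0]                # stand-in for an image representation
coeffs = [math.pi / 6, 0.1]    # one time step: rotate 30 degrees, scale by e^0.1

z1 = transform(z0, GENERATORS, coeffs)
# Extrapolate one more step by doubling the coefficients ...
z2_extrapolated = transform(z0, GENERATORS, [2 * c for c in coeffs])
# ... which agrees with applying the one-step transformation twice.
z2_stepped = transform(z1, GENERATORS, coeffs)
```

Under these assumptions, doubling the coefficients and applying the one-step transformation twice give the same result, because exp(2A) = exp(A)exp(A) for any single matrix A. A learned model would estimate both the generators and the coefficients from data rather than fixing them by hand.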