AN INFORMATION GEOMETRIC PICTURE OF THE SPACE OF TASKS

Degree type
Doctor of Philosophy (PhD)
Graduate group
Applied Mathematics and Computational Science
Discipline
Data Science
Subject
Information Geometry
Machine Learning
Representation Learning
Copyright date
2023
Author
Gao, Yansong
Abstract

This dissertation seeks to understand why deep learning models can be effectively applied to a wide range of downstream tasks. To shed light on this question, we characterize the space of tasks using techniques from information geometry. The first result of this thesis uses ideas from variational inference and thermodynamics to formalize a free energy principle that identifies the reconstruction of the input data as a canonical task for transfer learning. This principle suggests that maintaining some redundant information about the source task in the representation is necessary to transfer the representation to other tasks effectively. It also motivates new algorithms that transfer a pre-trained model by interpolating between the source and target tasks. The second result of this thesis defines a distance on the space of tasks. This distance can be thought of as the length of the shortest path that a model needs to travel on the manifold of tasks in order to adapt from a source task to a target task. The procedure for calculating this distance can be thought of as the optimal way to transfer a pre-trained representation to a new task. The third result of this thesis uses a concept from Bayesian statistics called a reference prior to show that the optimal way to use unlabeled data to pre-train a representation is to maximize the mutual information of the weights with respect to the finite number of samples available for inference. We show how this amounts to constructing a set of diverse tasks on the unlabeled data and discuss how many existing empirically successful techniques in semi- and self-supervised learning can be understood as implementing parts of the reference prior objective. We conclude the thesis by discussing a new class of models known as foundation models, which are trained on a large amount of diverse data to adapt to a wide range of downstream tasks.
Using classical results in statistical learning theory and the new findings in this thesis as a backdrop, we argue instead for building foundation priors, which are supported on a set of representative expert models that span the space of tasks and can be combined to create a model for a new task.

Advisor
Chaudhari, Pratik
Date of degree
2023