The Principles of Learning on Multiple Tasks

Degree type
PhD
Graduate group
Computer and Information Science
Discipline
Data Science
Computer Sciences
Subject
Deep learning
Learning from multiple tasks
Self-supervised learning
Statistical learning theory
Copyright date
01/01/2025
Author
Ramesh, Rahul
Abstract

Deep networks are increasingly trained on data from multiple tasks, with the goal of sharing synergistic information across related tasks. Vision models, for example, are trained on over a billion images for tasks such as object recognition, depth prediction, and semantic segmentation. With this motivation, this dissertation studies the principles behind how to optimally train representations on multiple tasks, and asks why we are able to learn representations that are shared across many tasks.

In the first part of the dissertation, we develop theories for training representations on multiple tasks using labeled or unlabeled data. We challenge the notion that a single pretrained representation is optimal for all tasks and show that it is instead optimal to train an ensemble of models that spans the space of tasks. For labeled data, we use the lens of statistical learning theory to discuss how to: (i) split the capacity of the learner among related tasks; (ii) reweight the objectives of different tasks; and (iii) handle tasks that change over time. For unlabeled data, we: (i) develop a theory of self-supervised learning for training an ensemble of models that spans the space of tasks; and (ii) show how masked autoencoders can be adapted to different tasks by changing the scale of the noise.

The second part of the dissertation characterizes the nature of typical tasks, with the goal of understanding why representation learning works. The striking result is that many typical tasks are highly redundant functions of the input, i.e., the input subspaces that vary the most and those that vary the least are both highly predictive of the outputs. We believe that this redundancy is key to understanding why we can generalize to many tasks, not just in machines but also in organisms.
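As a rough illustration of the multi-task setup the abstract refers to, the sketch below shows a shared representation feeding per-task heads, with the per-task objectives combined using task-specific weights; the task names, dimensions, and weights are hypothetical and are not taken from the dissertation, which studies how such choices should be made.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """A shared encoder with one head per task (illustrative only)."""

    def __init__(self, in_dim: int, rep_dim: int, task_dims: dict[str, int]):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, rep_dim), nn.ReLU())
        self.heads = nn.ModuleDict(
            {task: nn.Linear(rep_dim, out_dim) for task, out_dim in task_dims.items()}
        )

    def forward(self, x: torch.Tensor) -> dict[str, torch.Tensor]:
        z = self.encoder(x)  # representation shared across all tasks
        return {task: head(z) for task, head in self.heads.items()}


def weighted_multitask_loss(outputs, targets, criteria, weights):
    # Reweight each task's objective before summing; how to choose these
    # weights (and how to split the learner's capacity across tasks) is one
    # of the questions studied in the first part of the dissertation.
    return sum(weights[t] * criteria[t](outputs[t], targets[t]) for t in outputs)


if __name__ == "__main__":
    # Hypothetical example: a 10-way classification task and a scalar regression task.
    model = MultiTaskModel(in_dim=128, rep_dim=64, task_dims={"classify": 10, "regress": 1})
    x = torch.randn(32, 128)
    targets = {"classify": torch.randint(0, 10, (32,)), "regress": torch.randn(32, 1)}
    criteria = {"classify": nn.CrossEntropyLoss(), "regress": nn.MSELoss()}
    weights = {"classify": 1.0, "regress": 0.5}
    loss = weighted_multitask_loss(model(x), targets, criteria, weights)
    loss.backward()
```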

Advisor
Chaudhari, Pratik
Date of degree
2025