The Principles of Learning on Multiple Tasks


Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Discipline

Data Science
Computer Sciences

Subject

Deep learning
Learning from multiple tasks
Self-supervised learning
Statistical learning theory

Copyright date

2025

Abstract

Deep networks are increasingly trained on data from multiple tasks with the goal of sharing synergistic information across related tasks. Vision models, for example, are trained on over a billion images for tasks like object recognition, depth prediction, and semantic segmentation. With this motivation, this dissertation studies the principles behind how to optimally train representations on multiple tasks and attempts to answer why we are able to learn representations shared across many tasks. In the first part of the dissertation, we develop theories for training representations on multiple tasks using labeled or unlabeled data. We challenge the notion that a single pretrained representation is optimal for all tasks and show that it is instead optimal to train an ensemble of models that span the space of tasks. For labeled data, we use the lens of statistical learning theory to discuss how to: (i) split the capacity of the learner amongst related tasks; (ii) reweight the objectives of different tasks; (iii) handle tasks that change over time. For unlabeled data, we: (i) develop a theory for self-supervised learning to train an ensemble of models that span the space of tasks; (ii) show how masked autoencoders can be adapted to different tasks by changing the scale of the noise. The second part of this dissertation is dedicated to characterizing the nature of typical tasks, with the goal of understanding why representation learning works. The shocking result is that many typical tasks are highly redundant functions of the input, i.e., projections of the input onto the subspaces that vary the most and onto those that vary the least are both highly predictive of the outputs. We believe that this redundancy is key to understanding why we can generalize to many tasks, not just in machines, but also in organisms.
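The redundancy claim above can be probed with a simple linear sketch. This is purely illustrative and not the dissertation's own experiment (which uses deep networks on natural-image datasets); here scikit-learn's small `digits` dataset stands in, and the probe is a linear classifier on the top-k versus bottom-k principal subspaces of the input.

```python
# Sketch: check how predictive the highest- and lowest-variance
# principal subspaces of the input are for a classification task.
# Illustrative stand-in for the dissertation's deep-network experiments.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit a full PCA basis on the training inputs (64 pixel features).
pca = PCA().fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

k = 15

def probe(cols):
    """Accuracy of a linear classifier restricted to the given PCA columns."""
    clf = LogisticRegression(max_iter=5000).fit(Z_tr[:, cols], y_tr)
    return clf.score(Z_te[:, cols], y_te)

acc_top = probe(slice(0, k))         # k directions of largest variance
acc_bottom = probe(slice(-k, None))  # k directions of smallest variance
print(f"top-{k} subspace accuracy:    {acc_top:.3f}")
print(f"bottom-{k} subspace accuracy: {acc_bottom:.3f}")
```

On natural data the dissertation reports that both subspaces remain highly predictive; on a toy dataset like this the gap between the two accuracies can be larger, so the numbers should be read as a methodology demo rather than a replication.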

Date of degree

2025
