The Principles of Learning on Multiple Tasks


Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Discipline

Data Science
Computer Sciences

Subject

Deep learning
Learning from multiple tasks
Self-supervised learning
Statistical learning theory

Copyright date

2025

Abstract

Deep networks are increasingly trained on data from multiple tasks with the goal of sharing synergistic information across related tasks. Vision models, for example, are trained on over a billion images for tasks like object recognition, depth prediction, and semantic segmentation. With this motivation, this dissertation studies the principles behind how to optimally train representations on multiple tasks and attempts to answer why we are able to learn representations shared across many tasks. In the first part of the dissertation, we develop theories for training representations on multiple tasks using labeled or unlabeled data. We challenge the notion that a single pretrained representation is optimal for all tasks and show that it is instead optimal to train an ensemble of models that span the space of tasks. For labeled data, we use the lens of statistical learning theory to discuss how to: (i) split the capacity of the learner amongst related tasks; (ii) reweight the objectives of different tasks; (iii) handle tasks that change over time. For unlabeled data, we: (i) develop a theory for self-supervised learning to train an ensemble of models that span the space of tasks; (ii) show how masked autoencoders can be adapted to different tasks by changing the scale of the noise. The second part of this dissertation is dedicated to characterizing the nature of typical tasks, with the goal of understanding why representation learning works. The shocking result is that many typical tasks are highly redundant functions of the input, i.e., projections of the input onto the subspaces that vary the most and onto those that vary the least are both highly predictive of the outputs. We believe that this redundancy is key to understanding why we can generalize to many tasks, not just in machines, but also in organisms.
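The redundancy claim above can be probed with a simple linear sketch. This is purely illustrative and not the dissertation's own experiment (which uses deep networks on natural-image datasets); here scikit-learn's small `digits` dataset stands in, and the probe is a linear classifier on the top-k versus bottom-k principal subspaces of the input.

```python
# Sketch: check how predictive the highest- and lowest-variance
# principal subspaces of the input are for a classification task.
# Illustrative stand-in for the dissertation's deep-network experiments.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit a full PCA basis on the training inputs (64 pixel features).
pca = PCA().fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

k = 15

def probe(cols):
    """Accuracy of a linear classifier restricted to the given PCA columns."""
    clf = LogisticRegression(max_iter=5000).fit(Z_tr[:, cols], y_tr)
    return clf.score(Z_te[:, cols], y_te)

acc_top = probe(slice(0, k))         # k directions of largest variance
acc_bottom = probe(slice(-k, None))  # k directions of smallest variance
print(f"top-{k} subspace accuracy:    {acc_top:.3f}")
print(f"bottom-{k} subspace accuracy: {acc_bottom:.3f}")
```

On natural data the dissertation reports that both subspaces remain highly predictive; on a toy dataset like this the gap between the two accuracies can be larger, so the numbers should be read as a methodology demo rather than a replication.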

Date of degree

2025
