Rethinking the Inductive Bias of Optimization for Learning Under Distribution Shift

Penn collection
Interdisciplinary Centers, Units and Projects::Center for Undergraduate Research and Fellowships (CURF)::Fall Research Expo
Discipline
Applied Mathematics
Electrical Engineering
Computer Sciences
Subject
Optimization for Deep Learning
Copyright date
2025-09-15
Author
Shah, Alok
Zhang, Thomas
Zhang, Vincent
Matni, Nikolai
Abstract

Deep learning architectures and optimizers have co-evolved with the IID train/test paradigm. In deep generative modeling, sequential decision making, and language modeling, however, distribution shift is inevitable: at test time the model drives itself towards inputs different from those seen during training. We hypothesize that state-of-the-art architectures and optimizers over-fixate on loss minimization, aggressively descending along sharp curvature directions of the loss landscape that often correspond to brittle feature learning. We present three case studies that elucidate the inductive bias of standard deep learning optimizers under distribution shift, and we propose layerwise preconditioning as a simple correction.
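For illustration only, a minimal sketch of what a layerwise preconditioner can look like is given below. The Shampoo/K-FAC-style per-layer statistic, the function name layerwise_preconditioned_step, and all hyperparameters are assumptions for exposition, not the exact algorithm studied in this work.

import numpy as np

def layerwise_preconditioned_step(weights, grads, stats, lr=1e-2, beta=0.99, eps=1e-6):
    # One gradient step in which each layer's gradient is preconditioned
    # independently by the inverse square root of a running second-moment
    # statistic (a generic Shampoo/K-FAC-style construction; illustrative only).
    new_weights = []
    for i, (W, G) in enumerate(zip(weights, grads)):
        # Running per-layer covariance of the gradient over its output dimension.
        stats[i] = beta * stats[i] + (1.0 - beta) * (G @ G.T)
        # The inverse square root of the statistic acts as the preconditioner,
        # damping sharp (high-curvature) directions relative to flat ones.
        vals, vecs = np.linalg.eigh(stats[i] + eps * np.eye(stats[i].shape[0]))
        P = vecs @ np.diag(vals ** -0.5) @ vecs.T
        new_weights.append(W - lr * (P @ G))
    return new_weights

# Example usage with two hypothetical dense layers.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 128)), rng.standard_normal((10, 64))]
grads = [rng.standard_normal(W.shape) for W in weights]
stats = [np.zeros((W.shape[0], W.shape[0])) for W in weights]
weights = layerwise_preconditioned_step(weights, grads, stats)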

Date of presentation
2025-10-06
Comments
This project was supported with funding from the Class of 1971 Robert J. Holtz Fund.