Rethinking the Inductive Bias of Optimization for Learning Under Distribution Shift
Discipline
Electrical Engineering
Computer Sciences
Abstract
Deep learning architectures and optimizers have co-evolved with the IID train/test paradigm. In deep generative modeling, sequential decision making, and language modeling, however, distribution shift is inevitable: at test time the model drives itself toward inputs unlike those seen during training. We hypothesize that state-of-the-art architectures and optimizers over-fixate on loss minimization, aggressively descending along sharp curvature directions in the loss landscape that often correspond to brittle feature learning. We present three case studies that elucidate the inductive bias of standard deep learning optimizers under shift, and we propose layerwise preconditioning as a simple correction.
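The abstract names layerwise preconditioning as the proposed correction. As a rough illustration of the general idea (not the dissertation's exact algorithm), the sketch below rescales each layer's gradient by the inverse square root of a running per-layer gradient covariance, which damps steps along sharp curvature directions; the function name, hyperparameters, and NumPy formulation are assumptions made for illustration.

```python
import numpy as np

def preconditioned_step(weights, grads, covs, lr=1e-2, eps=1e-6, beta=0.99):
    """One layerwise-preconditioned update (illustrative sketch).

    weights, grads: lists of 2-D per-layer weight/gradient matrices
    covs: list of running gradient-covariance estimates, shaped like G @ G.T
    """
    new_weights = []
    for W, G, C in zip(weights, grads, covs):
        # Update the running estimate of this layer's gradient covariance.
        C *= beta
        C += (1 - beta) * G @ G.T
        # Inverse square root of the (regularized) covariance via eigendecomposition.
        vals, vecs = np.linalg.eigh(C + eps * np.eye(C.shape[0]))
        inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
        # Precondition the gradient layer by layer, then take a gradient step.
        new_weights.append(W - lr * inv_sqrt @ G)
    return new_weights
```

The per-layer structure is the key design choice: curvature is estimated and inverted independently for each weight matrix, keeping the cost far below that of a full-network preconditioner.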