Foundation Reward Models for Robot Learning
Subject
Reinforcement Learning
Robot Learning
Abstract
Learning-based algorithms for robotic control have achieved remarkable success in recent years. However, a fundamental bottleneck in existing approaches is their heavy reliance on human supervision, whether through expert demonstrations in imitation learning or carefully engineered reward functions in reinforcement learning. This dependency limits scalability, as it is infeasible to collect demonstrations or design reward functions for every possible task and environment. This thesis proposes a viable path towards scaling robotics by introducing foundation reward models: models capable of generating dense reward labels for robot states, sensory observations, and actions across a wide range of tasks and embodiments. With foundation reward models, robots can train on more diverse, mixed-quality data and learn from data they gather themselves, bypassing the bottleneck of human supervision. The key technical challenge, however, is the scarcity of robot data with which to train such models to generalize. To address this challenge, we present two classes of approaches for training foundation reward models entirely without robot data: (1) a novel offline reinforcement learning algorithm that learns goal-conditioned value functions from unstructured human videos, enabling zero-shot reward generation for unseen robot tasks specified in image or language modalities, and (2) a framework combining large language models with search to automatically design programmatic reward functions for robot simulation environments, enabling sim-to-real transfer of novel skills such as a quadruped robot dog balancing on a yoga ball.
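The first class of approaches turns a goal-conditioned value function into a dense reward signal. The snippet below is a purely illustrative sketch, not the thesis's actual implementation: it assumes a hypothetical pretrained `value_fn(observation, goal)` and labels each transition of a trajectory with the change in estimated value toward the goal.

```python
import numpy as np

def dense_rewards_from_value(observations, goal, value_fn):
    """Label a trajectory with dense rewards using a goal-conditioned
    value function: r_t = V(o_{t+1}, g) - V(o_t, g), so progress toward
    the goal yields positive reward."""
    values = np.array([value_fn(obs, goal) for obs in observations])
    return values[1:] - values[:-1]

# Toy stand-in for a pretrained value model: negative distance to the goal.
toy_value_fn = lambda obs, goal: -np.linalg.norm(obs - goal)

trajectory = np.linspace([0.0, 0.0], [1.0, 1.0], num=5)  # observations o_0..o_4
goal = np.array([1.0, 1.0])                               # goal specification
print(dense_rewards_from_value(trajectory, goal, toy_value_fn))
```

Here `toy_value_fn` and `dense_rewards_from_value` are hypothetical placeholders; in the setting described above, the value function would be learned from unstructured human videos and the observations and goal would be images or language embeddings rather than 2D points.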
Advisor
Jayaraman, Dinesh