Foundation Reward Models for Robot Learning

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Computer Sciences
Subject
Foundation Models
Reinforcement Learning
Robot Learning
Copyright date
2025
Author
Ma, Yecheng
Abstract

Learning-based algorithms for robotic control have achieved remarkable success in recent years. However, a fundamental bottleneck in existing approaches is their heavy reliance on human supervision, whether through expert demonstrations in imitation learning or carefully engineered reward functions in reinforcement learning. This dependency limits scalability, as it is infeasible to collect demonstrations or design reward functions for every possible task and environment. This thesis proposes a viable path towards scaling robotics by introducing foundation reward models: models capable of generating dense reward labels for robot states, sensory observations, and actions across a wide range of tasks and embodiments. With foundation reward models, robots can train on more diverse, mixed-quality data and learn from data they gather themselves, bypassing the bottleneck of human supervision. The key technical challenge, however, is the lack of robot data available to train such models to generalize. To address this challenge, we present two classes of approaches for training foundation reward models entirely without robot data: (1) a novel offline reinforcement learning algorithm that learns goal-conditioned value functions from unstructured human videos, enabling zero-shot reward generation for unseen robot tasks specified in image or language modalities, and (2) a framework combining large language models with search to automatically design programmatic reward functions for robot simulation environments, enabling sim-to-real transfer of novel skills such as a quadruped robot dog balancing on a yoga ball.
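A minimal sketch of the first class of approaches, under assumptions not spelled out in the abstract: a goal-conditioned value function V(s, g) learned from video can label rewards for unseen tasks as the value improvement toward a goal image. The tiny encoder and all names here are illustrative stand-ins, not the thesis's actual architecture.

```python
# Illustrative sketch: dense reward from a goal-conditioned value function.
# embed() stands in for a learned visual encoder; in practice V(s, g) is
# trained on large-scale human video, not hand-coded.
import numpy as np


def embed(obs):
    """Stand-in for a learned visual encoder: project a flattened
    observation to a small feature vector with a fixed linear map."""
    W = np.linspace(-1.0, 1.0, obs.size * 4).reshape(obs.size, 4)
    return np.tanh(obs @ W)


def value(obs, goal):
    """Goal-conditioned value proxy: higher when the observation's
    embedding is closer to the goal's embedding."""
    return -np.linalg.norm(embed(obs) - embed(goal))


def dense_reward(obs, next_obs, goal):
    """Reward as value improvement toward the goal, so any transition
    that makes progress receives a positive label."""
    return value(next_obs, goal) - value(obs, goal)


goal = np.ones(8)              # goal image (flattened), illustrative
obs = np.zeros(8)              # current observation
next_obs = 0.5 * np.ones(8)    # observation after one step toward the goal
print(dense_reward(obs, next_obs, goal) > 0)  # progress => positive reward
```

Because the reward is a difference of values, it densely labels every transition in mixed-quality data, which is what lets a robot learn from data it gathered itself.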
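The second class of approaches can be sketched as a propose-evaluate-select loop. Here a mock proposer stands in for the large language model (a real system would generate and mutate reward-function source code), and candidates are scored on a toy one-dimensional task; every name and the scoring proxy are assumptions for illustration only.

```python
# Illustrative sketch: searching over candidate programmatic reward
# functions. A mock propose() replaces the LLM; rollout_score() replaces
# evaluation in a robot simulator.
import random

random.seed(0)


def rollout_score(reward_fn):
    """Score a candidate reward on a toy task: count how often it
    strictly increases as states approach the target at s = 1.0."""
    states = [i / 10 for i in range(11)]
    rewards = [reward_fn(s) for s in states]
    return sum(rewards[i + 1] > rewards[i] for i in range(10))


def propose(best_weight):
    """Mock LLM proposal: perturb the best candidate's parameter.
    A real system would emit new reward code conditioned on feedback."""
    w = best_weight + random.uniform(-0.5, 0.5)
    return w, (lambda s, w=w: -w * abs(s - 1.0))


best_w, best_fn = 1.0, (lambda s: 0.0)   # start from a trivial reward
best_score = rollout_score(best_fn)
for _ in range(20):                      # search loop: propose, evaluate, keep best
    w, fn = propose(best_w)
    score = rollout_score(fn)
    if score > best_score:
        best_w, best_fn, best_score = w, fn, score

print(best_score)  # 10: the kept reward is monotone toward the target
```

The key design point is that candidates are ordinary programs, so the best one can be inspected, edited, and reused for sim-to-real transfer.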

Advisor
Bastani, Osbert
Jayaraman, Dinesh
Date of degree
2025