Foundation Reward Models for Robot Learning
Subject
Reinforcement Learning
Robot Learning
Abstract
Learning-based algorithms for robotic control have achieved remarkable success in recent years. However, a fundamental bottleneck in existing approaches is their heavy reliance on human supervision, whether through expert demonstrations in imitation learning or carefully engineered reward functions in reinforcement learning. This dependency limits scalability, as it is infeasible to collect demonstrations or design reward functions for every possible task and environment. This thesis proposes a viable path towards scaling robotics by introducing foundation reward models: models capable of generating dense reward labels for robot states, sensory observations, and actions across a wide range of tasks and embodiments. With foundation reward models, robots can train on more diverse, mixed-quality data and learn from data they gather themselves, bypassing the bottleneck of human supervision. The key technical challenge, however, is the scarcity of robot data with which to train such models to generalize. To address this challenge, we present two classes of approaches for training foundation reward models entirely without robot data: (1) a novel offline reinforcement learning algorithm that learns goal-conditioned value functions from unstructured human videos, enabling zero-shot reward generation for unseen robot tasks specified in image or language modalities, and (2) a framework combining large language models with search to automatically design programmatic reward functions for robot simulation environments, enabling sim-to-real transfer of novel skills such as a quadruped robot dog balancing on a yoga ball.
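The first class of approaches turns a goal-conditioned value function into a dense reward signal. The snippet below is a purely illustrative sketch, not the thesis's actual implementation: it assumes a hypothetical pretrained `value_fn(observation, goal)` and labels each transition of a trajectory with the change in estimated value toward the goal.

```python
import numpy as np

def dense_rewards_from_value(observations, goal, value_fn):
    """Label a trajectory with dense rewards using a goal-conditioned
    value function: r_t = V(o_{t+1}, g) - V(o_t, g), so progress toward
    the goal yields positive reward."""
    values = np.array([value_fn(obs, goal) for obs in observations])
    return values[1:] - values[:-1]

# Toy stand-in for a pretrained value model: negative distance to the goal.
toy_value_fn = lambda obs, goal: -np.linalg.norm(obs - goal)

trajectory = np.linspace([0.0, 0.0], [1.0, 1.0], num=5)  # observations o_0..o_4
goal = np.array([1.0, 1.0])                               # goal specification
print(dense_rewards_from_value(trajectory, goal, toy_value_fn))
```

Here `toy_value_fn` and `dense_rewards_from_value` are hypothetical placeholders; in the setting described above, the value function would be learned from unstructured human videos and the observations and goal would be images or language embeddings rather than 2D points.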
Advisor
Jayaraman, Dinesh