LEARNING TO ACT FROM DIVERSE DATA SOURCES VIA WORLD MODELS
The next frontier in learning to act is generalization: the ability of an agent to operate in a diverse set of environments and to solve a diverse set of tasks. How can we learn such generalist agents? I argue that learning world models that predict the future outcomes of actions directly from image observations is a uniquely suitable approach for training generalizable agents. I will discuss how world models enable powerful unsupervised exploration and how a single world model can be used to learn a diverse range of tasks. An implementation of this agent learns to solve a variety of kitchen and sorting tasks with a simulated robotic arm, and to achieve arbitrary poses with biped and quadruped agents, without any reward or demonstration supervision. I will further discuss how to scale world model training with datasets of passive videos, along with other improvements to world models and planning. An agent that possesses a world model can use it to learn general knowledge from diverse datasets and to solve diverse tasks; I believe world models offer a principled and promising path toward increasingly general-purpose machines.
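To make the core idea concrete, here is a minimal, purely illustrative sketch of what "learning a world model and using it for imagined rollouts" means. It is a toy assumption on my part, not the implementation discussed in the talk: real world models (e.g., the Dreamer family) learn image encoders and recurrent latent dynamics, whereas this sketch fits a linear dynamics model to a hypothetical 1-D environment whose true transition is `next_obs = obs + action`, then rolls the learned model forward without touching the environment.

```python
import random

def true_step(obs, action):
    # Hypothetical toy environment: the true (unknown to the agent) dynamics.
    return obs + action

def train_world_model(steps=2000, lr=0.01, seed=0):
    """Fit predicted next observation pred = a*obs + b*action from interaction data."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0  # learnable parameters of the dynamics model
    for _ in range(steps):
        obs = rng.uniform(-1, 1)
        action = rng.uniform(-1, 1)
        target = true_step(obs, action)
        err = (a * obs + b * action) - target
        # gradient descent on the squared one-step prediction error
        a -= lr * 2 * err * obs
        b -= lr * 2 * err * action
    return a, b

def imagine_rollout(a, b, obs, actions):
    """Roll the learned model forward in 'imagination', with no environment access."""
    trajectory = [obs]
    for act in actions:
        obs = a * obs + b * act
        trajectory.append(obs)
    return trajectory

if __name__ == "__main__":
    a, b = train_world_model()
    print(round(a, 2), round(b, 2))  # parameters should approach the true dynamics
    print(imagine_rollout(a, b, 0.0, [0.5, 0.5]))
```

The key property the sketch illustrates is that once the dynamics are learned, planning and task learning can happen entirely inside the model, which is what makes a single world model reusable across many downstream tasks.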