Active Perception for 3D Scene Representations in Robotics from an Information Theoretic Perspective
Subject
Active Learning
Computer Vision
Machine Learning
Robotics
Abstract
How much do we know about the environment? This is one of the first questions to ask when we want to act more effectively in it. If we can answer it, we can tackle a further question: where should we observe the environment next to help perform the task? These questions relate to active learning, which addresses an ill-posed problem: estimating the information an observation will provide before that observation is made. This issue is critical in robotics because every observation comes at a cost in time, energy, and risk. This dissertation explores the development of active robot perception systems utilizing radiance field representations, particularly 3D Gaussian Splatting, from an information-theoretic perspective.
We first address the challenge of quantifying observed information in 3D scene representations. By leveraging the Fisher Information Matrix (FIM), we compute the expected information gain (EIG) without requiring ground-truth observations of candidate viewpoints.
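To illustrate why no ground-truth observation is needed, the following is a minimal sketch, not the dissertation's implementation. It assumes a Gaussian (Laplace) approximation over the scene parameters and additive Gaussian observation noise, under which the expected information gain of a candidate view depends only on the Jacobian of the predicted observation with respect to the parameters. The function names, the toy prior, and the toy Jacobians are hypothetical; in a real system the Jacobians would come from differentiating the renderer at each candidate pose.

```python
import numpy as np

def expected_information_gain(H_prior, J, noise_var=1.0):
    """Expected information gain of one candidate view (assumed Gaussian model).

    H_prior   : (d, d) accumulated Fisher information of the current parameters.
    J         : (m, d) Jacobian of the predicted observation w.r.t. the parameters.
    noise_var : assumed observation noise variance.

    Returns 0.5 * [logdet(H_prior + J^T J / noise_var) - logdet(H_prior)],
    which never touches the actual (unknown) observation values.
    """
    H_post = H_prior + J.T @ J / noise_var
    _, logdet_post = np.linalg.slogdet(H_post)
    _, logdet_prior = np.linalg.slogdet(H_prior)
    return 0.5 * (logdet_post - logdet_prior)

def select_next_best_view(H_prior, candidate_jacobians, noise_var=1.0):
    """Return the index of the candidate with the largest gain, plus all gains."""
    gains = [expected_information_gain(H_prior, J, noise_var)
             for J in candidate_jacobians]
    return int(np.argmax(gains)), gains

# Toy example: three parameters, the third one is poorly constrained so far.
H_prior = np.diag([10.0, 10.0, 0.1])
J_a = np.array([[1.0, 0.0, 0.0]])  # candidate view sensitive to parameter 0
J_b = np.array([[0.0, 0.0, 1.0]])  # candidate view sensitive to parameter 2

best, gains = select_next_best_view(H_prior, [J_a, J_b])
print(best, gains)
```

In this toy setup the second candidate is preferred, because it constrains the parameter about which the current model holds the least information.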
Additionally, we tackle Active Simultaneous Localization and Mapping (SLAM) by incorporating localization uncertainty into the 3D Gaussian representation.
Our active SLAM system balances the risk of localization errors with the reward of exploring new regions. We also employ off-the-shelf multimodal language models for long-term planning to incorporate prior knowledge of the environment for exploration.
Finally, we extend our approach to task-driven perception, focusing on improving representations for specific robotic tasks such as grasping in cluttered environments.
By exploiting energy-based models to define a log-likelihood for the grasping task, we adapt our next-best-view selection to be task-driven, optimizing perception both for scene reconstruction and for task performance.