Active Perception for 3D Scene Representations in Robotics from an Information Theoretic Perspective


Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Discipline

Computer Sciences

Subject

3D reconstruction
Active Learning
Computer Vision
Machine Learning
Robotics

Copyright date

2025

Abstract

How much do we know about the environment? This is one of the primary questions we ask when we want to act more effectively in an environment. If we can answer it, we can tackle a further question: where should we observe the environment to best support the task? These questions relate to active learning, which addresses an ill-posed problem: approximating the information gained from an observation before it is made. This issue is critical in robotics because taking an observation comes with a cost: time, energy, and risk. This dissertation explores the development of active robot perception systems built on radiance field representations, particularly 3D Gaussian Splatting, from an information-theoretic perspective. We first address the challenge of quantifying observed information in 3D scene representations. By leveraging the Fisher Information Matrix (FIM), we compute the expected information gain (EIG) without requiring ground-truth observations of candidate viewpoints.
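The FIM-based EIG idea can be sketched in a few lines under a Gaussian pixel-noise assumption: the Fisher information of a view depends only on the Jacobian of the rendering function, not on a ground-truth image. The function names, toy dimensions, and the log-det gain criterion below are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def fisher_information(jacobian, noise_sigma=1.0):
    """FIM under a Gaussian pixel-noise model.

    jacobian: (num_pixels, num_params) Jacobian of rendered pixels
    w.r.t. scene parameters (e.g. Gaussian-splat attributes).
    """
    return jacobian.T @ jacobian / noise_sigma**2

def expected_info_gain(view_jacobian, prior_fim, eps=1e-6):
    """EIG as half the log-det ratio of posterior to prior information.

    Needs only the rendering Jacobian at the candidate view, so no
    ground-truth observation of that view is required.
    """
    n = prior_fim.shape[0]
    post_fim = prior_fim + fisher_information(view_jacobian)
    _, logdet_post = np.linalg.slogdet(post_fim + eps * np.eye(n))
    _, logdet_prior = np.linalg.slogdet(prior_fim + eps * np.eye(n))
    return 0.5 * (logdet_post - logdet_prior)

# Toy example: 3 scene parameters, two candidate views.
rng = np.random.default_rng(0)
prior = fisher_information(rng.normal(size=(50, 3)))          # already-observed pixels
candidates = [rng.normal(size=(20, 3)),                       # informative view
              rng.normal(size=(5, 3)) * 0.1]                  # weak, low-sensitivity view
gains = [expected_info_gain(J, prior) for J in candidates]
best = int(np.argmax(gains))                                  # select the next-best view
```

The view whose Jacobian excites more parameter directions yields the larger gain, so greedy next-best-view selection reduces to an argmax over candidate scores.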
Additionally, we tackle Active Simultaneous Localization and Mapping (SLAM) by incorporating localization uncertainty into the 3D Gaussian representation. Our active SLAM system balances the risk of localization errors against the reward of exploring new regions. We also employ off-the-shelf multimodal language models for long-term planning, incorporating prior knowledge of the environment into exploration. Finally, we extend our approach to task-driven perception, focusing on improving representations for specific robotic tasks such as grasping in cluttered environments. By exploiting energy-based models to define a log-likelihood for the grasping task, we make next-best-view selection task-driven, optimizing perception not only for scene reconstruction but also for task performance.
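One way the task-driven extension can be read: an energy-based model gives a grasp log-likelihood (negative energy, up to a constant), which is added to the information-gain score so that view selection favors regions relevant to the task. The additive scoring rule, weight, and numbers below are a hypothetical sketch, not the dissertation's formulation.

```python
import numpy as np

def task_log_likelihood(grasp_energy):
    # Energy-based model: log-likelihood is the negative energy,
    # up to a normalizing constant shared across candidate views.
    return -np.asarray(grasp_energy, dtype=float)

def task_driven_nbv(eig_scores, grasp_energies, weight=1.0):
    """Pick the view maximizing information gain plus task reward."""
    scores = np.asarray(eig_scores, dtype=float) + weight * task_log_likelihood(grasp_energies)
    return int(np.argmax(scores))

# Hypothetical scores for three candidate views.
eig = [0.8, 1.2, 0.5]        # reconstruction information gain per view
energies = [2.0, 0.1, 1.5]   # lower energy = more promising grasp region
best = task_driven_nbv(eig, energies)
```

With the weight set to zero this reduces to pure reconstruction-driven next-best-view selection; increasing it biases observations toward viewpoints that help the grasping task.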

Date of degree

2025
