Active Perception for 3D Scene Representations in Robotics from an Information Theoretic Perspective


Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Discipline

Computer Sciences

Subject

3D reconstruction
Active Learning
Computer Vision
Machine Learning
Robotics

Copyright date

2025

Abstract

How much do we know about the environment? This is one of the primary questions we ask when we want to act more effectively in an environment. If we can answer it, we can tackle a further question: where should we observe the environment to best support the task? These questions relate to active learning, which addresses an ill-posed problem: approximating the information gained from an observation before it is made. This issue is critical in robotics because taking an observation comes with a cost: time, energy, and risk. This dissertation explores the development of active robot perception systems built on radiance field representations, particularly 3D Gaussian Splatting, from an information-theoretic perspective. We first address the challenge of quantifying observed information in 3D scene representations. By leveraging the Fisher Information Matrix (FIM), we compute the expected information gain (EIG) without requiring ground-truth observations of candidate viewpoints.
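The FIM-based EIG idea can be sketched in a few lines under a Gaussian pixel-noise assumption: the Fisher information of a view depends only on the Jacobian of the rendering function, not on a ground-truth image. The function names, toy dimensions, and the log-det gain criterion below are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def fisher_information(jacobian, noise_sigma=1.0):
    """FIM under a Gaussian pixel-noise model.

    jacobian: (num_pixels, num_params) Jacobian of rendered pixels
    w.r.t. scene parameters (e.g. Gaussian-splat attributes).
    """
    return jacobian.T @ jacobian / noise_sigma**2

def expected_info_gain(view_jacobian, prior_fim, eps=1e-6):
    """EIG as half the log-det ratio of posterior to prior information.

    Needs only the rendering Jacobian at the candidate view, so no
    ground-truth observation of that view is required.
    """
    n = prior_fim.shape[0]
    post_fim = prior_fim + fisher_information(view_jacobian)
    _, logdet_post = np.linalg.slogdet(post_fim + eps * np.eye(n))
    _, logdet_prior = np.linalg.slogdet(prior_fim + eps * np.eye(n))
    return 0.5 * (logdet_post - logdet_prior)

# Toy example: 3 scene parameters, two candidate views.
rng = np.random.default_rng(0)
prior = fisher_information(rng.normal(size=(50, 3)))          # already-observed pixels
candidates = [rng.normal(size=(20, 3)),                       # informative view
              rng.normal(size=(5, 3)) * 0.1]                  # weak, low-sensitivity view
gains = [expected_info_gain(J, prior) for J in candidates]
best = int(np.argmax(gains))                                  # select the next-best view
```

The view whose Jacobian excites more parameter directions yields the larger gain, so greedy next-best-view selection reduces to an argmax over candidate scores.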
Additionally, we tackle Active Simultaneous Localization and Mapping (SLAM) by incorporating localization uncertainty into the 3D Gaussian representation. Our active SLAM system balances the risk of localization errors against the reward of exploring new regions. We also employ off-the-shelf multimodal language models for long-term planning, incorporating prior knowledge of the environment into exploration. Finally, we extend our approach to task-driven perception, focusing on improving representations for specific robotic tasks such as grasping in cluttered environments. By exploiting energy-based models to define a log-likelihood for the grasping task, we make next-best-view selection task-driven, optimizing perception not only for scene reconstruction but also for task performance.
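One way the task-driven extension can be read: an energy-based model gives a grasp log-likelihood (negative energy, up to a constant), which is added to the information-gain score so that view selection favors regions relevant to the task. The additive scoring rule, weight, and numbers below are a hypothetical sketch, not the dissertation's formulation.

```python
import numpy as np

def task_log_likelihood(grasp_energy):
    # Energy-based model: log-likelihood is the negative energy,
    # up to a normalizing constant shared across candidate views.
    return -np.asarray(grasp_energy, dtype=float)

def task_driven_nbv(eig_scores, grasp_energies, weight=1.0):
    """Pick the view maximizing information gain plus task reward."""
    scores = np.asarray(eig_scores, dtype=float) + weight * task_log_likelihood(grasp_energies)
    return int(np.argmax(scores))

# Hypothetical scores for three candidate views.
eig = [0.8, 1.2, 0.5]        # reconstruction information gain per view
energies = [2.0, 0.1, 1.5]   # lower energy = more promising grasp region
best = task_driven_nbv(eig, energies)
```

With the weight set to zero this reduces to pure reconstruction-driven next-best-view selection; increasing it biases observations toward viewpoints that help the grasping task.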

Date of degree

2025
