Bridging Visual Recognition and Synthesis

Zhang, Lingzhi

Files

Zhang_upenngdas_0175C_16003.pdf (190.47 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Discipline

Computer Sciences

Subject

computer vision

Copyright date

2023

Permalink

https://repository.upenn.edu/handle/20.500.14332/59305

Author

Zhang, Lingzhi

Abstract

Human beings possess an innate ability to fluidly engage with unfamiliar environments and execute imaginative interactions with visual objects. It can be postulated that this prowess stems from the human capacity to not only identify visual objects but also to conceptualize their potential interactions and anticipate possible future scenarios. The dynamic interplay between recognition and imagination capabilities culminates in highly adaptable and creative problem-solving skills. Do modern AI vision systems possess similar synergistic capabilities in recognition and imagination (or synthesis)? While there has been significant advancement in visual recognition and synthesis, the interplay between the two domains appears to be relatively under-explored. In this thesis, our objective is to delve into the reciprocal relationship between visual recognition and synthesis. Initially, we investigate how recognition models can be harnessed to enhance visual synthesis through: 1) employing recognition models to provide semantic and geometric information that explicitly guides visual synthesis; and 2) utilizing segmentation-based models to pinpoint and refine perceptual artifacts within generated images. Conversely, we examine the ways in which visual synthesis can be employed to enhance visual recognition by: 1) utilizing generative models to predict a spectrum of hypotheses for highly ambiguous recognition scenarios; and 2) generating novel yet plausible training images through synthesis to counterbalance biases in training data and boost recognition accuracy. Finally, we delve into two emerging directions that are inspired by our previous examination of the symbiosis between visual recognition and synthesis. First, we demonstrate that by utilizing generative features extracted from diffusion outpainting, we can build dynamic and meaningful associative representations that cluster objects based on similarities in interactions or functionalities. Second, we put forward the concept of constructing a data flywheel system by integrating recognition and synthesis models into a closed-loop system, with the objective of enabling AI models to evolve autonomously through progressively refined data with minimal human intervention.

Advisor

Shi, Jianbo

Date of degree

2023

Collection

Dissertations and Theses