Bridging Visual Recognition and Synthesis
Abstract
Human beings possess an innate ability to engage fluidly with unfamiliar environments and execute imaginative interactions with visual objects. This ability plausibly stems from the human capacity not only to identify visual objects but also to conceptualize their potential interactions and anticipate possible future scenarios. The dynamic interplay between recognition and imagination culminates in highly adaptable and creative problem-solving skills. Do modern AI vision systems possess similar synergistic capabilities in recognition and imagination (or synthesis)? While visual recognition and visual synthesis have each advanced significantly, the interplay between the two domains remains relatively under-explored. In this thesis, we study the reciprocal relationship between visual recognition and synthesis. First, we investigate how recognition models can be harnessed to enhance visual synthesis by 1) employing recognition models to provide semantic and geometric information that explicitly guides visual synthesis; and 2) utilizing segmentation-based models to pinpoint and refine perceptual artifacts within generated images. Conversely, we examine how visual synthesis can be employed to enhance visual recognition by 1) utilizing generative models to predict a spectrum of hypotheses for highly ambiguous recognition scenarios; and 2) generating novel yet plausible training images through synthesis to counterbalance biases in training data and boost recognition accuracy. Finally, we explore two emerging directions inspired by our examination of the symbiosis between visual recognition and synthesis. First, we demonstrate that generative features extracted from diffusion outpainting can be used to build dynamic and meaningful associative representations that cluster objects based on similarities in interactions or functionalities. Second, we propose constructing a data flywheel by integrating recognition and synthesis models into a closed-loop system, with the objective of enabling AI models to evolve autonomously through progressively refined data with minimal human intervention.