Bridging Visual Recognition and Synthesis

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Computer Sciences
Subject
computer vision
Copyright date
2023
Author
Zhang, Lingzhi
Abstract

Human beings possess an innate ability to fluidly engage with unfamiliar environments and execute imaginative interactions with visual objects. It can be postulated that this prowess stems from the human capacity not only to identify visual objects but also to conceptualize their potential interactions and anticipate possible future scenarios. The dynamic interplay between recognition and imagination culminates in highly adaptable and creative problem-solving skills. Do modern AI vision systems possess similar synergistic capabilities in recognition and imagination (or synthesis)? While there has been significant advancement in both visual recognition and synthesis, the interplay between the two domains remains relatively under-explored. In this thesis, our objective is to investigate the reciprocal relationship between visual recognition and synthesis. First, we investigate how recognition models can be harnessed to enhance visual synthesis by: 1) employing recognition models to provide semantic and geometric information that explicitly guides visual synthesis; and 2) utilizing segmentation-based models to pinpoint and refine perceptual artifacts within generated images. Conversely, we examine how visual synthesis can be employed to enhance visual recognition by: 1) utilizing generative models to predict a spectrum of hypotheses for highly ambiguous recognition scenarios; and 2) generating novel yet plausible training images through synthesis to counterbalance biases in training data and boost recognition accuracy. Finally, we explore two emerging directions inspired by our examination of the symbiosis between visual recognition and synthesis. First, we demonstrate that by utilizing generative features extracted from diffusion outpainting, we can build dynamic and meaningful associative representations that cluster objects based on similarities in interactions or functionalities. Second, we put forward the concept of constructing a data flywheel by integrating recognition and synthesis models into a closed-loop system, with the objective of enabling AI models to evolve autonomously through progressively refined data with minimal human intervention.

Advisor
Shi, Jianbo, JS
Date of degree
2023