Learning Visual Concepts
Discipline: Computer Sciences; Electrical Engineering
Subject: Computer Vision; Machine Perception; Object Detection; Object Recognition
Abstract
We propose a framework that extends off-the-shelf pre-trained object detection models to unseen datasets with little to no modification of the original architecture and only a few additional components in the overall pipeline. Motivated by the role of attributes in zero-shot learning paradigms, we define conceptual groups retroactively through positive and negative exemplars, and evaluate the feasibility of recognizing a variety of these proposed conceptual groups in a corpus of previously unseen data, including unseen categories. We conduct experiments with networks trained on the COCO dataset and use Open Images V7 as our held-out unseen dataset. Our analysis suggests that existing off-the-shelf object detection networks such as Faster R-CNN can be leveraged to extract useful information beyond the scope of a straightforward category-prediction framework. This information can be used to operationalize concept learning through a set of positive and negative exemplars and a simple linear SVM operating on the features produced by the deep network. We compare this approach to vision-enabled large language models such as LLaVA, CogVLM, and GPT-4V, and show strong baseline performance with lower resource requirements. We further show that this method scales to a larger set of concepts by validating it on the LVIS dataset. We illustrate several approaches to better understand the semantic topology of the detectors' learned feature space, measure the feasibility of using these features to identify the proposed conceptual groups, and propose strategies to leverage this information to predict these conceptual groups on previously unseen samples containing unseen class categories.
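The core recipe described above (frozen detector features plus a linear SVM over positive and negative exemplars) can be sketched roughly as follows. This is a minimal illustration, not the thesis pipeline: it assumes torchvision's COCO-pretrained Faster R-CNN and scikit-learn's LinearSVC, uses globally pooled backbone features as a stand-in for whatever detector features the experiments actually use, and the exemplar file names are placeholders.

import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)
from sklearn.svm import LinearSVC

# COCO-pretrained detector, used only as a frozen feature extractor.
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()

@torch.no_grad()
def image_feature(path: str) -> torch.Tensor:
    """Globally pooled FPN features for one image (a simple stand-in
    for per-detection features)."""
    img = preprocess(read_image(path))          # decoded uint8 (C, H, W) -> float tensor
    image_list, _ = model.transform([img])      # detector's own resize/normalize
    fmaps = model.backbone(image_list.tensors)  # OrderedDict of FPN levels
    pooled = [f.mean(dim=(2, 3)) for f in fmaps.values()]
    return torch.cat(pooled, dim=1).squeeze(0)

# Hypothetical exemplar lists defining one conceptual group.
concept_pos = ["pos_01.jpg", "pos_02.jpg"]
concept_neg = ["neg_01.jpg", "neg_02.jpg"]

X = torch.stack([image_feature(p) for p in concept_pos + concept_neg]).numpy()
y = [1] * len(concept_pos) + [0] * len(concept_neg)

clf = LinearSVC(C=1.0).fit(X, y)   # linear concept classifier
scores = clf.decision_function(X)  # margin scores on the exemplars

Scoring a previously unseen image is then just clf.decision_function(image_feature(path)[None].numpy()); the full experiments would additionally use per-detection features and the vision-enabled LLM baselines discussed in the abstract.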