Learning Visual Concepts

Shivakumar, Shreyas Skandan

Files

Shivakumar_upenngdas_0175C_16540.pdf (31.32 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Discipline

Data Science
Computer Sciences
Electrical Engineering

Subject

Classification
Computer Vision
Machine Perception
Object Detection
Object Recognition

Copyright date

01/01/2024

Permalink

https://repository.upenn.edu/handle/20.500.14332/60385

Author

Shivakumar, Shreyas Skandan

Abstract

We propose a framework that extends off-the-shelf pre-trained object detection models to unseen datasets with little to no modification of the original architecture, adding only a few components to the overall pipeline. Motivated by the role of attributes in zero-shot learning paradigms, we define conceptual groups retroactively using positive and negative exemplars, and evaluate the feasibility of recognizing a variety of these conceptual groups in a corpus of previously unseen data, including unseen categories. We conduct experiments with networks trained on the COCO dataset and use Open-Images-V7 as our held-out unseen dataset. Our analysis suggests that existing off-the-shelf object detection networks such as Faster-RCNN can be leveraged to extract useful information beyond the scope of a straightforward category prediction framework. This information can be used to operationalize concept learning through a set of positive and negative exemplars and a simple linear SVM operating on the features produced by the deep network. We compare this approach to vision-enabled large language models such as LLaVA, CogVLM and GPT4V, and show strong baseline performance with lower resource requirements. Additionally, we illustrate that this method can be scaled to larger concept sets by validating it on a larger set of concepts in the LVIS dataset. We illustrate a few approaches to better understand the semantic topology of the networks' learned feature space, and we measure the feasibility of using these features to identify the proposed conceptual groups. We propose strategies to leverage this information to predict these conceptual groups on previously unseen samples containing unseen class categories.
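
The exemplar-based step described in the abstract, a linear SVM trained on positive and negative exemplars, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the feature vectors are synthetic Gaussian stand-ins for features a detector such as Faster-RCNN might produce, and the helper names (make_exemplars, train_linear_svm, predict), the feature dimension, and all hyperparameters are hypothetical. The SVM is trained with plain stochastic subgradient descent on the L2-regularized hinge loss so that the sketch needs only the standard library.

```python
import random


def make_exemplars(n, center, dim, rng):
    """Synthetic stand-ins for detector features (assumption, not real data)."""
    return [[rng.gauss(center, 0.5) for _ in range(dim)] for _ in range(n)]


def train_linear_svm(features, labels, lam=0.01, lr=0.01, epochs=20, seed=0):
    """Fit a linear SVM by subgradient descent on the regularized hinge loss.

    features: list of equal-length float lists.
    labels: +1 for positive exemplars, -1 for negative exemplars.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # The hinge loss contributes a gradient only inside the margin.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 if x falls on the concept side of the hyperplane, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


rng = random.Random(42)
DIM = 8  # hypothetical feature dimension, far smaller than real detector features

# Positive and negative exemplars that define one conceptual group.
train_x = make_exemplars(40, 1.0, DIM, rng) + make_exemplars(40, -1.0, DIM, rng)
train_y = [1] * 40 + [-1] * 40

# Held-out samples drawn from the same synthetic distributions.
test_x = make_exemplars(20, 1.0, DIM, rng) + make_exemplars(20, -1.0, DIM, rng)
test_y = [1] * 20 + [-1] * 20

w, b = train_linear_svm(train_x, train_y)
accuracy = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

In a real pipeline the synthetic vectors would be replaced by features extracted from the pre-trained detector, and one such classifier would be trained per conceptual group.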

Advisor

Taylor, Camillo J.

Date of degree

2024

Collection

Dissertations and Theses