Neurosymbolic Programming in Scallop: Design, Implementation, and Applications

Li, Ziyang

Files

Li_upenngdas_0175C_17236.pdf (25.2 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Discipline

Computer Sciences

Subject

Machine learning
Neurosymbolic methods
Programming languages

Copyright date

2025

Permalink

https://repository.upenn.edu/handle/20.500.14332/61776

Abstract

Neurosymbolic programming combines the otherwise complementary worlds of deep learning and symbolic reasoning. It thereby enables more accurate, interpretable, and domain-aware AI solutions that surpass purely neural or symbolic approaches. While significant advances have been made in domain-specific neurosymbolic methods, the field lacks a unified programming system for general neurosymbolic applications. This dissertation proposes Scallop, a language for neurosymbolic programming. Scallop is relational and declarative, offering expressive reasoning capabilities such as recursion, negation, and aggregation. Scallop supports discrete, probabilistic, and differentiable modes of reasoning, allowing for seamless integration with diverse neurosymbolic pipelines. Scallop employs a provenance framework, which supports numerous reasoning back-ends that balance reasoning accuracy and scalability. Additionally, Scallop offers extensive tooling to integrate with PyTorch and a foreign interface for incorporating modern foundation models. Beyond presenting the design and implementation of Scallop, this dissertation demonstrates its versatility through applications in the domains of computer vision, natural language processing, security, program analysis, planning, and bioinformatics. These applications span natural language reasoning, image and video scene graph generation, program vulnerability detection, and RNA secondary structure prediction. Through extensive empirical studies, we demonstrate that Scallop-based neurosymbolic solutions achieve superior accuracy, interpretability, and data efficiency.

Advisor

Naik, Mayur

Date of degree

2025

Collection

Dissertations and Theses