Learning-Aided Program Synthesis And Verification

Si, Xujie

Learning-Aided Program Synthesis And Verification

Files

Si_upenngdas_0175C_14311.pdf (2.19 MB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Computer and Information Science

Subject

deep learning
machine learning
program analysis
program synthesis
program verification
reinforcement learning
Computer Sciences

Copyright date

2021-08-31T20:20:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/30846

View all metadata

Author

Si, Xujie

Abstract

The enormous rise in the scale, scope, and complexity of software projects has created a thriving marketplace for program reasoning tools. Despite broad adoption by industry, developing such tools remains challenging. For each project, specialized heuristics or analysis rules have to be carefully designed and customized, which requires non-trivial expertise. Recently machine learning, especially deep learning, achieved remarkable successes in many challenging areas such as image recognition and strategy game playing. Inspired by these successes, this thesis is concerned with the following question: can program reasoning be effectively learned and automatically improved over time? This thesis demonstrates that learning-based techniques can be a new driving force for tackling fundamental program reasoning challenges, particularly, program synthesis and program verification. First, this thesis presents a scalable inductive logic programming (ILP) framework, Difflog, which can synthesize a rich set of logical rules used in various important domains like program analysis, relational query and knowledge discovery. Unlike classic program synthesis techniques, which heavily rely on manually designed heuristics or symbolic constraint solvers, Difflog leverages efficient gradient-based approaches, which is possible due to a novel numerical relaxation of logical rules. Second, this thesis presents an end-to-end deep learning framework for program verification, Code2Inv, which directly maps a piece of source code to its related proof without requiring any annotations from human experts. Code2Inv is inspired by the recent AI breakthrough, AlphaGo; however, unlike the two-dimensional game board, programs have sophisticated structures and correct proofs are extremely rare, posing unique challenges on representation learning and reinforcement learning. To address these challenges, we leverage advances of graph neural networks and develop a counterexample-based smooth reward mechanism. Code2Inv outperforms state-of-the-art approaches that are based on manually designed heuristics or decision tree learning, and the learned policy by Code2Inv can generalize to unseen programs. Furthermore, Code2Inv can be flexibly customized as a Constrained Horn Clause (CHC) solver as well as a meta-solver for syntax-guided program synthesis tasks.

Advisor

Mayur Naik

Date of degree

2020-01-01

Collection

Dissertations and Theses