Date of Award
Doctor of Philosophy (PhD)
Computer and Information Science
The enormous rise in the scale, scope, and complexity of software projects has created a thriving marketplace for program reasoning tools. Despite broad adoption by industry, developing such tools remains challenging. For each project, specialized heuristics or analysis rules have to be carefully designed and customized, which requires non-trivial expertise. Recently machine learning, especially deep learning, achieved remarkable successes in many challenging areas such as image recognition and strategy game playing. Inspired by these successes, this thesis is concerned with the following question: can program reasoning be effectively learned and automatically improved over time?
This thesis demonstrates that learning-based techniques can be a new driving force for tackling fundamental program reasoning challenges, particularly, program synthesis and program verification. First, this thesis presents a scalable inductive logic programming (ILP) framework, Difflog, which can synthesize a rich set of logical rules used in various important domains like program analysis, relational query and knowledge discovery. Unlike classic program synthesis techniques, which heavily rely on manually designed heuristics or symbolic constraint solvers, Difflog leverages efficient gradient-based approaches, which is possible due to a novel numerical relaxation of logical rules. Second, this thesis presents an end-to-end deep learning framework for program verification, Code2Inv, which directly maps a piece of source code to its related proof without requiring any annotations from human experts. Code2Inv is inspired by the recent AI breakthrough, AlphaGo; however, unlike the two-dimensional game board, programs have sophisticated structures and correct proofs are extremely rare, posing unique challenges on representation learning and reinforcement learning. To address these challenges, we leverage advances of graph neural networks and develop a counterexample-based smooth reward mechanism. Code2Inv outperforms state-of-the-art approaches that are based on manually designed heuristics or decision tree learning, and the learned policy by Code2Inv can generalize to unseen programs. Furthermore, Code2Inv can be flexibly customized as a Constrained Horn Clause (CHC) solver as well as a meta-solver for syntax-guided program synthesis tasks.
Si, Xujie, "Learning-Aided Program Synthesis And Verification" (2020). Publicly Accessible Penn Dissertations. 3788.