Effective Program Reasoning Using Bayesian Inference
Despite their widespread adoption, program analysis tools that statically find bugs in software still report a deluge of false alarms. This is because they must necessarily make approximations in order to scale to large and complex programs. The focus of this dissertation is to make static program analyses more effective by guiding them towards true bugs and away from false alarms. We do this by augmenting logical program reasoning with probabilistic reasoning. We seek to overcome the incompleteness of a static analysis by associating each alarm it produces with a probability that it is a true alarm. We compute alarm probabilities by performing Bayesian inference on a probabilistic model derived from the execution of the analysis. Moreover, the probabilistic model allows us to recompute the probabilities by conditioning them on new evidence, which lets us tailor the analysis to individual codebases and user needs. The alarms are ranked by the computed probabilities to mitigate the burden of inspecting false alarms.
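To make the conditioning step concrete, the following is a minimal sketch on a toy model, not the model derived by the analysis in this dissertation. It assumes a hypothetical structure in which each alarm depends on a single hidden analysis fact, and all names and numbers (FACT_PRIOR, ALARM_FACT, P_TRUE_GIVEN_FACT, alarm_probabilities, rank) are illustrative assumptions. Inference is done by brute-force enumeration, which is only feasible for tiny models.

```python
from itertools import product

# Hypothetical toy model (all values are assumptions for illustration).
# Each analysis fact holds with some prior probability, and each alarm
# is derived from exactly one fact.
FACT_PRIOR = {"f1": 0.5, "f2": 0.5}
ALARM_FACT = {"a1": "f1", "a2": "f1", "a3": "f2"}
# P(alarm is a true alarm | its fact holds / does not hold)
P_TRUE_GIVEN_FACT = {True: 0.9, False: 0.1}


def alarm_probabilities(evidence=None):
    """Return P(alarm is true | evidence) for every alarm.

    `evidence` maps alarm names to observed labels (True = true alarm,
    False = false alarm). Uses exhaustive enumeration of all worlds.
    """
    evidence = evidence or {}
    facts, alarms = list(FACT_PRIOR), list(ALARM_FACT)
    total = 0.0
    mass = {a: 0.0 for a in alarms}
    for values in product((True, False), repeat=len(facts) + len(alarms)):
        world = dict(zip(facts + alarms, values))
        # Skip worlds that contradict the evidence.
        if any(world[a] != label for a, label in evidence.items()):
            continue
        p = 1.0
        for f in facts:
            p *= FACT_PRIOR[f] if world[f] else 1 - FACT_PRIOR[f]
        for a in alarms:
            q = P_TRUE_GIVEN_FACT[world[ALARM_FACT[a]]]
            p *= q if world[a] else 1 - q
        total += p
        for a in alarms:
            if world[a]:
                mass[a] += p
    return {a: mass[a] / total for a in alarms}


def rank(evidence=None):
    """Order alarms from most to least probable given the evidence."""
    probs = alarm_probabilities(evidence)
    return sorted(probs, key=probs.get, reverse=True)


if __name__ == "__main__":
    print(alarm_probabilities())             # no evidence yet
    print(alarm_probabilities({"a1": False}))  # a1 labeled a false alarm
    print(rank({"a1": False}))
```

In this toy model, labeling a1 as a false alarm lowers the probability of a2, which shares the same underlying fact, while leaving a3 unchanged. This is the sense in which evidence about one alarm generalizes to others.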
We demonstrate the effectiveness of our approach in two practical systems. In the first system, we leverage user feedback to iteratively improve the alarm ranking. The system starts with an initial ranking of alarms reported by the static analysis. In each iteration, the system seeks user feedback for the top-ranked alarm. Next, it generalizes this feedback by recomputing the probabilities of all the alarms conditioned on this feedback to produce an improved ranking for the next iteration. After a few iterations, true alarms rise to the top of the ranking, thus alleviating the burden of inspecting false alarms. In the second system, we leverage the completeness of dynamic analysis, which observes concrete program executions, to rank the alarms reported by the static analysis. We hypothesize that a reported alarm is at most as complete as the analysis facts it is premised upon. For each analysis fact used by the static analysis in deducing an alarm, the system seeks from a dynamic analysis a probability estimate for the completeness of that fact. The dynamic analysis derives this estimate by counting the number of times it observes the analysis fact during concrete program executions. The system then uses the estimated probabilities associated with the analysis facts to infer probabilities for alarms, and ranks alarms by the inferred probabilities.
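The count-based estimation in the second system can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's actual inference procedure. It assumes that a fact's probability is the Laplace-smoothed fraction of runs in which the fact was observed, and that an alarm's probability is the product of the probabilities of its premises, treated as independent. The product never exceeds the smallest premise probability, which matches the hypothesis that an alarm is at most as complete as the facts it is premised upon. All function and variable names are hypothetical.

```python
from math import prod


def estimate_fact_probability(runs_observed, num_runs, smoothing=1.0):
    """Smoothed fraction of runs in which a fact was observed (assumption)."""
    return (runs_observed + smoothing) / (num_runs + 2 * smoothing)


def alarm_probability(premise_probs):
    """Product of premise probabilities, assuming independent premises."""
    return prod(premise_probs)


def rank_alarms(alarm_premises, observations, num_runs):
    """Return (alarm, probability) pairs, most probable first.

    alarm_premises: alarm name -> list of analysis facts it is derived from
    observations:   fact name  -> number of runs in which it was observed
    """
    fact_prob = {
        fact: estimate_fact_probability(count, num_runs)
        for fact, count in observations.items()
    }
    scores = {
        alarm: alarm_probability([fact_prob[f] for f in premises])
        for alarm, premises in alarm_premises.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: 8 test runs, three facts, three alarms.
    observations = {"f1": 8, "f2": 1, "f3": 4}
    alarm_premises = {"A": ["f1"], "B": ["f1", "f2"], "C": ["f1", "f3"]}
    for alarm, p in rank_alarms(alarm_premises, observations, num_runs=8):
        print(alarm, round(p, 2))
```

With this hypothetical data, alarm B ranks last because it depends on f2, a fact the dynamic analysis rarely observed.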