Search results

Now showing 1 - 2 of 2
  • Publication
    MISIM: A Novel Code Similarity System
    (2020-06-01) Ye, Fangke; Zhou, Shengtian; Venkat, Anand; Marcus, Ryan; Tatbul, Nesime; Tithi, Jesmin J; Hasabnis, Niranjan; Petersen, Paul; Mattson, Timothy; Kraska, Tim; Dubey, Pradeep; Gottschlich, Justin E
    Code similarity systems are integral to a range of applications, from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin to address this, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy than all three systems.
  • Publication
    Precision and Recall for Range-Based Anomaly Detection
    (2018-01-01) Gottschlich, Justin E; Tatbul, Nesime; Metcalf, Eric; Zdonik, Stan
    Classical anomaly detection is principally concerned with point-based anomalies, anomalies that occur at a single data point. In this paper, we present a new mathematical model to express range-based anomalies, anomalies that occur over a range (or period) of time.