Search results

  • Publication
    MISIM: A Novel Code Similarity System
    (2020-06-01) Ye, Fangke; Zhou, Shengtian; Venkat, Anand; Marcus, Ryan; Tatbul, Nesime; Tithi, Jesmin J; Hasabnis, Niranjan; Petersen, Paul; Mattson, Timothy; Kraska, Tim; Dubey, Pradeep; Gottschlich, Justin E
    Code similarity systems are integral to a range of applications from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin to address this, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy than all three systems.
  • Publication
    The Three Pillars of Machine Programming
    (2018-01-01) Gottschlich, Justin E; Solar-Lezama, Armando; Tatbul, Nesime; Carbin, Michael; Rinard, Martin; Barzilay, Regina; Amarasinghe, Saman; Tenenbaum, Joshua B; Mattson, Timothy
    In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.
  • Publication
    Precision and Recall for Time Series
    (2018-01-01) Tatbul, Nesime; Lee, Tae J; Zdonik, Stan; Gottschlich, Justin E
    Classical anomaly detection is principally concerned with point-based anomalies, those anomalies that occur at a single point in time. Yet, many real-world anomalies are range-based, meaning they occur over a period of time. Motivated by this observation, we present a new mathematical model to evaluate the accuracy of time series classification algorithms. Our model expands the well-known Precision and Recall metrics to measure ranges, while simultaneously enabling customization support for domain-specific preferences.
  • Publication
    Greenhouse: A Zero-Positive Machine Learning System for Time-Series Anomaly Detection
    (2018-01-01) Gottschlich, Justin E; Tatbul, Nesime; Metcalf, Eric; Zdonik, Stan
    This short paper describes our ongoing research on Greenhouse - a zero-positive machine learning system for time-series anomaly detection.
  • Publication
    Precision and Recall for Range-Based Anomaly Detection
    (2018-01-01) Gottschlich, Justin E; Tatbul, Nesime; Metcalf, Eric; Zdonik, Stan
    Classical anomaly detection is principally concerned with point-based anomalies, anomalies that occur at a single data point. In this paper, we present a new mathematical model to express range-based anomalies, anomalies that occur over a range (or period) of time.
  • Publication
    A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions
    (2019-01-01) Gottschlich, Justin E; Tatbul, Nesime; Turek, Javier S; Mattson, Timothy; Muzahid, Abdullah
    The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We present AutoPerf, a novel approach to automate regression testing that utilizes three core techniques: (i) zero-positive learning, (ii) autoencoders, and (iii) hardware telemetry. We demonstrate AutoPerf’s generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs. On average, AutoPerf exhibits 4% profiling overhead and accurately diagnoses more performance bugs than prior state-of-the-art approaches. Thus far, AutoPerf has produced no false negatives.
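
A rough sketch of the zero-positive idea named in the abstract above may help: the model is fit only on non-anomalous (negative) samples, and anything that deviates too far from that learned profile is flagged. This is not AutoPerf and does not reproduce its method. It swaps the autoencoder for a plain mean and standard-deviation profile, and the function names, the sample values, and the threshold `k` are all hypothetical choices made for illustration.

```python
import statistics


def fit_zero_positive(normal_samples):
    """Build a profile from non-anomalous samples only (no anomalies are seen).

    Assumption: a mean/standard-deviation profile stands in for the
    autoencoder mentioned in the abstract; this is a stand-in, not AutoPerf.
    """
    mean = statistics.fmean(normal_samples)
    stdev = statistics.pstdev(normal_samples)
    return mean, stdev


def is_anomalous(value, profile, k=3.0):
    """Flag a value lying more than k standard deviations from the mean.

    Assumption: k is a hypothetical tuning knob, not a value from the paper.
    """
    mean, stdev = profile
    return abs(value - mean) > k * stdev


# Hypothetical telemetry readings from runs known to be regression-free.
baseline = [100, 102, 98, 101, 99]
profile = fit_zero_positive(baseline)

flag_regressed = is_anomalous(130, profile)  # far outside the learned profile
flag_normal = is_anomalous(101, profile)     # well inside the learned profile
```

The point of the design is that no labeled anomalies are needed at training time; only the deviation threshold decides what counts as a regression.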