Search results
Publication
MISIM: A Novel Code Similarity System (2020-06-01)
Ye, Fangke; Zhou, Shengtian; Venkat, Anand; Marcus, Ryan; Tatbul, Nesime; Tithi, Jesmin J; Hasabnis, Niranjan; Petersen, Paul; Mattson, Timothy; Kraska, Tim; Dubey, Pradeep; Gottschlich, Justin E
Code similarity systems are integral to a range of applications from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin to address this, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy than all three systems.

Publication
The Three Pillars of Machine Programming (2018-01-01)
Gottschlich, Justin E; Solar-Lezama, Armando; Tatbul, Nesime; Carbin, Michael; Rinard, Martin; Barzilay, Regina; Amarasinghe, Saman; Tenenbaum, Joshua B; Mattson, Timothy
In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.

Publication
Precision and Recall for Time Series (2018-01-01)
Tatbul, Nesime; Lee, Tae J; Zdonik, Stan; Gottschlich, Justin E
Classical anomaly detection is principally concerned with point-based anomalies, those anomalies that occur at a single point in time. Yet, many real-world anomalies are range-based, meaning they occur over a period of time. Motivated by this observation, we present a new mathematical model to evaluate the accuracy of time series classification algorithms. Our model expands the well-known Precision and Recall metrics to measure ranges, while simultaneously enabling customization support for domain-specific preferences.

Publication
Greenhouse: A Zero-Positive Machine Learning System for Time-Series Anomaly Detection (2018-01-01)
Gottschlich, Justin E; Tatbul, Nesime; Metcalf, Eric; Zdonik, Stan
This short paper describes our ongoing research on Greenhouse, a zero-positive machine learning system for time-series anomaly detection.

Publication
Precision and Recall for Range-Based Anomaly Detection (2018-01-01)
Gottschlich, Justin E; Tatbul, Nesime; Metcalf, Eric; Zdonik, Stan
Classical anomaly detection is principally concerned with point-based anomalies, anomalies that occur at a single data point. In this paper, we present a new mathematical model to express range-based anomalies, anomalies that occur over a range (or period) of time.

Publication
A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions (2019-01-01)
Gottschlich, Justin E; Tatbul, Nesime; Turek, Javier S; Mattson, Timothy; Muzahid, Abdullah
The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We present AutoPerf, a novel approach to automate regression testing that utilizes three core techniques: (i) zero-positive learning, (ii) autoencoders, and (iii) hardware telemetry. We demonstrate AutoPerf’s generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs. On average, AutoPerf exhibits 4% profiling overhead and accurately diagnoses more performance bugs than prior state-of-the-art approaches. Thus far, AutoPerf has produced no false negatives.