Search results

  • Publication
    MISIM: A Novel Code Similarity System
    (2020-06-01) Ye, Fangke; Zhou, Shengtian; Venkat, Anand; Marcus, Ryan; Tatbul, Nesime; Tithi, Jesmin J; Hasabnis, Niranjan; Petersen, Paul; Mattson, Timothy; Kraska, Tim; Dubey, Pradeep; Sarkar, Vivek; Gottschlich, Justin E
    Code similarity systems are integral to a range of applications from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin to address this, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy than all three systems.
  • Publication
    The Three Pillars of Machine Programming
    (2018-01-01) Gottschlich, Justin E; Solar-Lezama, Armando; Tatbul, Nesime; Carbin, Michael; Rinard, Martin; Barzilay, Regina; Amarasinghe, Saman; Tenenbaum, Joshua B; Mattson, Timothy
    In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.
  • Publication
    Toward Scalable Verification for Safety-Critical Deep Networks
    (2018-01-01) Kuper, Lindsey; Katz, Guy; Gottschlich, Justin E; Julian, Kyle; Barrett, Clark; Kochenderfer, Mykel J
    The increasing use of deep neural networks for safety-critical applications, such as autonomous driving and flight control, raises concerns about their safety and reliability. Formal verification can address these concerns by guaranteeing that a deep learning system operates as intended, but the state of the art is limited to small systems. In this work-in-progress report we give an overview of our work on mitigating this difficulty, by pursuing two complementary directions: devising scalable verification techniques, and identifying design choices that result in deep learning systems that are more amenable to verification.
  • Publication
    ControlFlag: A Self-supervised Idiosyncratic Pattern Detection System for Software Control Structures
    (2020-01-01) Hasabnis, Niranjan; Gottschlich, Justin E
    Software debugging has been shown to utilize upwards of 50% of developers’ time. Machine programming, the field concerned with the automation of software (and hardware) development, has recently made progress in both research and production-quality automated debugging systems. In this paper, we present ControlFlag, a system that detects possible idiosyncratic violations in software control structures. ControlFlag also suggests possible corrections in the event a true error is detected. A novelty of ControlFlag is that it is entirely self-supervised; that is, it requires no labels to learn about the potential idiosyncratic programming pattern violations. In addition to presenting ControlFlag’s design, we also provide an abbreviated experimental evaluation.
  • Publication
    Software Language Comprehension using a Program-Derived Semantics Graph
    (2020-01-01) Iyer, Roshni G; Sun, Yizhou; Wang, Wei; Gottschlich, Justin E
    Traditional code transformation structures, such as abstract syntax trees (ASTs), conteXtual flow graphs (XFGs), and more generally, compiler intermediate representations (IRs), may have limitations in extracting higher-order semantics from code. While work has already begun on higher-order semantics lifting (e.g., Aroma’s simplified parse tree (SPT), verified lifting’s lambda calculi, and Halide’s intentional domain specific language (DSL)), research in this area is still immature. To continue to advance this research, we present the program-derived semantics graph (PSG), a new graphical structure to capture semantics of code. The PSG is designed to provide a single structure for capturing program semantics at multiple levels of abstraction. The PSG may be in a class of emerging structural representations that cannot be built from a traditional set of predefined rules and instead must be learned. In this paper, we describe the PSG and its fundamental structural differences compared to state-of-the-art structures. Although our exploration into the PSG is in its infancy, our early results and architectural analysis indicate it is a promising new research direction to automatically extract program semantics.
  • Publication
    Precision and Recall for Time Series
    (2018-01-01) Tatbul, Nesime; Lee, Tae J; Zdonik, Stan; Alam, Mejbah; Gottschlich, Justin E
    Classical anomaly detection is principally concerned with point-based anomalies, those anomalies that occur at a single point in time. Yet, many real-world anomalies are range-based, meaning they occur over a period of time. Motivated by this observation, we present a new mathematical model to evaluate the accuracy of time series classification algorithms. Our model expands the well-known Precision and Recall metrics to measure ranges, while simultaneously enabling customization support for domain-specific preferences.
  • Publication
    MLSys: The New Frontier of Machine Learning Systems
    (2019-01-01) Ratner, Alexander; Alistarh, Dan; Alonso, Gustavo; Andersen, David G; Bailis, Peter; Bird, Sarah; Carlini, Nicholas; Catanzaro, Bryan; Chayes, Jennifer; Chung, Eric; Dally, Bill; Dean, Jeff; Dhillon, Inderjit S; Dimakis, Alexandros; Dubey, Pradeep; Elkan, Charles; Fursin, Grigori; Ganger, Gregory R; Getoor, Lise; Gibbons, Phillip B; Gibson, Garth A; Gonzalez, Joseph E; Gottschlich, Justin E; Han, Song; Hazelwood, Kim; Huang, Furong; Jaggi, Martin; Jamieson, Kevin; Jordan, Michael I; Joshi, Gauri; Khalaf, Rania; Knight, Jason; Konecny, Jakub; Kraska, Tim; Kumar, Arun; Kyrillidis, Anastasios; Lakshmiratan, Aparna; Li, Jing; Madden, Samuel; McMahan, H B; Meijer, Erik; Mitliagkas, Ioannis; Monga, Rajat; Murray, Derek; Olukotun, Kunle; Papailiopoulos, Dimitris; Pekhimenko, Gennady; Re, Christopher; Rekatsinas, Theodoros; Rostamizadeh, Afshin; De Sa, Christopher; Sedghi, Hanie; Sen, Siddhartha; Smith, Virginia; Smola, Alex; Song, Dawn; Sparks, Evan; Stoica, Ion; Sze, Vivienne; Udell, Madeleine; Vanschoren, Joaquin; Venkataraman, Shivaram; Vinayak, Rashmi; Weimer, Markus; Wilson, Andrew G; Xing, Eric; Zaharia, Matei; Zhang, Ce; Talwalkar, Ameet
    Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.
  • Publication
    Greenhouse: A Zero-Positive Machine Learning System for Time-Series Anomaly Detection
    (2018-01-01) Lee, Tae J; Gottschlich, Justin E; Tatbul, Nesime; Metcalf, Eric; Zdonik, Stan
    This short paper describes our ongoing research on Greenhouse - a zero-positive machine learning system for time-series anomaly detection.
  • Publication
    An Abstraction-Based Framework for Neural Network Verification
    (2020-01-01) Elboher, Yizhak Y; Gottschlich, Justin E; Katz, Guy
    Deep neural networks are increasingly being used as controllers for safety-critical systems. Because neural networks are opaque, certifying their correctness is a significant challenge. To address this issue, several neural network verification approaches have recently been proposed. However, these approaches afford limited scalability, and applying them to large networks can be challenging. In this paper, we propose a framework that can enhance neural network verification techniques by using over-approximation to reduce the size of the network—thus making it more amenable to verification. We perform the approximation such that if the property holds for the smaller (abstract) network, it holds for the original as well. The over-approximation may be too coarse, in which case the underlying verification tool might return a spurious counterexample. Under such conditions, we perform counterexample-guided refinement to adjust the approximation, and then repeat the process. Our approach is orthogonal to, and can be integrated with, many existing verification techniques. For evaluation purposes, we integrate it with the recently proposed Marabou framework, and observe a significant improvement in Marabou’s performance. Our experiments demonstrate the great potential of our approach for verifying larger neural networks.
  • Publication
    Precision and Recall for Range-Based Anomaly Detection
    (2018-01-01) Lee, Tae J; Gottschlich, Justin E; Tatbul, Nesime; Metcalf, Eric; Zdonik, Stan
    Classical anomaly detection is principally concerned with point-based anomalies, anomalies that occur at a single data point. In this paper, we present a new mathematical model to express range-based anomalies, anomalies that occur over a range (or period) of time.