Machine programming (MP) is a new field of research that uses automation to improve software development productivity (e.g., the time it takes a developer to write code) and quality (e.g., performance, correctness, security, and maintainability). We generally consider MP a fusion of machine learning and formal methods, which rely heavily on programming languages and systems.

Publication
MISIM: A Novel Code Similarity System (2020-06-01)
Ye, Fangke; Zhou, Shengtian; Venkat, Anand; Marcus, Ryan; Tatbul, Nesime; Tithi, Jesmin J; Hasabnis, Niranjan; Petersen, Paul; Mattson, Timothy; Kraska, Tim; Dubey, Pradeep; Gottschlich, Justin E
Code similarity systems are integral to a range of applications from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin to address this, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy than all three systems.

Publication
MLSys: The New Frontier of Machine Learning Systems (2019-01-01)
Ratner, Alexander; Alistarh, Dan; Alons, Gustavo; Andersen, David G; Bailis, Peter; Bird, Sarah; Carlini, Nicholas; Catanzaro, Bryan; Chayes, Jennifer; Chung, Eric; Dally, Bill; Dean, Jeff; Dhillon, Inderjit S; Dimakis, Alexandros; Dubey, Pradeep; Elkan, Charles; Fursin, Grigori; Ganger, Gregory R; Getoor, Lise; Gibbons, Phillip B; Gibson, Garth A; Gottschlich, Justin E; Han, Song; Hazelwood, Kim; Huang, Furong; Jaggi, Martin; Jamieson, Kevin; Jordan, Michael I; Joshi, Gauri; Khalaf, Rania; Knight, Jason; Konecny, Jakub; Kraska, Tim; Kumar, Arun; Kyrillidis, Anastasios; Lakshmiratan, Aparna; Li, Jing; Madden, Samuel; McMahan, H B; Meijer, Erik; Mitliagkas, Ioannis; Monga, Rajat; Murray, Derek; Olukotun, Kunle; Papailiopoulos, Dimitris; Pekhimenko, Gennady; Re, Christopher; Rekatsinas, Theodoros; Rostamizadeh, Afshin; De Sa, Christopher; Sedghi, Hanie; Sen, Siddhartha; Smith, Virginia; Smola, Alex; Song, Dawn; Sparks, Evan; Stoica, Ion; Sze, Vivienne; Udell, Madeleine; Vanschoren, Joaquin; Venkataraman, Shivaram; Vinayak, Rashmi; Weimer, Markus; Wilson, Andrew G; Xing, Eric; Zaharia, Matei; Zhang, Ce; Talwalkar, Ameet
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.

Publication
Learned Garbage Collection (2020-01-01)
Cen, Lujing; Marcus, Ryan; Gottschlich, Justin E; Alizadeh, Mohammad; Kraska, Tim
Several programming languages use garbage collectors (GCs) to automatically manage memory for the programmer. Such collectors must decide when to look for unreachable objects to free, which can have a large performance impact on some applications. In this preliminary work, we propose a design for a learned garbage collector that autonomously learns over time when to perform collections. By using reinforcement learning, our design can incorporate user-defined reward functions, allowing an autonomous garbage collector to learn to optimize the exact metric the user desires (e.g., request latency or queries per second). We conduct an initial experimental study on a prototype, demonstrating that an approach based on tabular Q-learning may be promising.
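
As an illustration of the idea in the Learned Garbage Collection abstract, the sketch below applies tabular Q-learning to a toy "when to collect" decision. It is not the authors' prototype. The environment, the five heap-occupancy buckets, the reward values, and the names `step`, `q_update`, `train`, and `greedy_policy` are all hypothetical choices made for this example. The reward in `step` stands in for a user-defined metric.

```python
import random

ACTIONS = (0, 1)  # 0 = keep running, 1 = collect now (hypothetical encoding)
N_BUCKETS = 5     # heap occupancy discretized into 5 buckets (assumption)


def step(state, action):
    """Toy environment, not a real runtime: returns (next_state, reward)."""
    if action == 1:
        # Collecting pauses the program: fixed pause cost, heap is emptied.
        return 0, -2.0
    if state == N_BUCKETS - 1:
        # Heap is full and we did not collect: forced, expensive collection.
        return 0, -10.0
    # Normal progress: one unit of useful work, heap fills by one bucket.
    return state + 1, 1.0


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place to table q."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def train(episodes=2000, steps=50, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration on the toy environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_BUCKETS)]
    for _ in range(episodes):
        state = 0
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward = step(state, action)
            q_update(q, state, action, reward, next_state)
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for each heap-occupancy bucket."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_BUCKETS)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setup the learned policy should prefer collecting once the heap reaches the last bucket, because a voluntary pause costs less than a forced one. Changing the rewards in `step` changes what the agent optimizes, which is the role the abstract assigns to user-defined reward functions.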