MISIM: A Novel Code Similarity System

Penn collection
Machine Programming
Subject
Computer Science - Machine Learning; Computer Science - Software Engineering; Statistics - Machine Learning
Author
Ye, Fangke
Zhou, Shengtian
Venkat, Anand
Marcus, Ryan
Tatbul, Nesime
Tithi, Jesmin J
Hasabnis, Niranjan
Petersen, Paul
Mattson, Timothy
Kraska, Tim
Abstract

Code similarity systems are integral to a range of applications, from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin addressing it, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system with two core components. First, MISIM uses a novel context-aware semantic structure designed to help lift semantic meaning from code syntax. Second, MISIM provides a neural code similarity scoring algorithm that can be implemented with various neural network architectures whose parameters are learned. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM outperforms all three systems, with accuracy 1.5x to 43.4x higher.
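The abstract describes MISIM's scoring step only at a high level: a neural network with learned parameters maps programs into a form that can be compared and scored. The following is a minimal sketch of that general shape under explicit assumptions, not MISIM's actual architecture. A bag of whitespace-separated tokens stands in for the context-aware semantic structure, a single randomly initialized tanh layer stands in for a trained network, and cosine similarity is assumed as the comparison. The names `featurize`, `ToyEncoder`, `cosine`, and `similarity` are hypothetical.

```python
import math
import random
import zlib
from collections import Counter
from typing import List


def featurize(code: str) -> Counter:
    # Hypothetical stand-in for MISIM's context-aware semantic structure:
    # a plain bag of whitespace-separated tokens.
    return Counter(code.split())


class ToyEncoder:
    """Maps a program to a dense vector with one tanh layer (illustrative only)."""

    def __init__(self, buckets: int = 64, dim: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.buckets = buckets
        # Randomly initialized weights stand in for learned parameters;
        # a real system would train them on labeled similar/dissimilar pairs.
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(buckets)] for _ in range(dim)]

    def encode(self, code: str) -> List[float]:
        # Hash each token into a fixed number of buckets (crc32 is deterministic across runs).
        x = [0.0] * self.buckets
        for token, count in featurize(code).items():
            x[zlib.crc32(token.encode("utf-8")) % self.buckets] += count
        # One dense layer followed by a tanh nonlinearity.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # An all-zero embedding (e.g. empty input) carries no signal, so score it 0.0.
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity(encoder: ToyEncoder, code_a: str, code_b: str) -> float:
    """Score in [-1, 1]; higher means the encoder considers the programs more alike."""
    return cosine(encoder.encode(code_a), encoder.encode(code_b))


if __name__ == "__main__":
    enc = ToyEncoder()
    a = "for i in range ( n ) : total += i"
    b = "total = sum ( range ( n ) )"
    print(round(similarity(enc, a, a), 6))
    print(round(similarity(enc, a, b), 6))
```

Because the token-bag featurizer sees only surface syntax, this toy cannot recognize semantically equivalent programs written differently; closing that gap is exactly what a richer structure and trained parameters are meant to do.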

Publication date
2020-06-01