MISIM: A Novel Code Similarity System

dc.contributor.authorYe, Fangke
dc.contributor.authorZhou, Shengtian
dc.contributor.authorVenkat, Anand
dc.contributor.authorMarcus, Ryan
dc.contributor.authorTatbul, Nesime
dc.contributor.authorTithi, Jesmin J
dc.contributor.authorHasabnis, Niranjan
dc.contributor.authorPetersen, Paul
dc.contributor.authorMattson, Timothy
dc.contributor.authorKraska, Tim
dc.contributor.authorDubey, Pradeep
dc.contributor.authorGottschlich, Justin E
dc.date2023-05-18T00:13:48.000
dc.date.accessioned2023-05-22T13:06:43Z
dc.date.available2023-05-22T13:06:43Z
dc.date.issued2020-06-01
dc.date.submitted2020-12-18T10:51:54-08:00
dc.description.abstractCode similarity systems are integral to a range of applications from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin to address this, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy than all three systems.
dc.identifier.urihttps://repository.upenn.edu/handle/20.500.14332/8484
dc.legacy.articleid1000
dc.legacy.fulltexturlhttps://repository.upenn.edu/cgi/viewcontent.cgi?article=1000&context=cps_machine_programming&unstamped=1
dc.source.issue1
dc.source.journalMachine Programming
dc.source.statuspublished
dc.subject.otherComputer Science - Machine Learning; Computer Science - Software Engineering; Statistics - Machine Learning
dc.titleMISIM: A Novel Code Similarity System
dc.typeWorking Paper
digcom.contributor.authorYe, Fangke
digcom.contributor.authorZhou, Shengtian
digcom.contributor.authorVenkat, Anand
digcom.contributor.authorMarcus, Ryan
digcom.contributor.authorTatbul, Nesime
digcom.contributor.authorTithi, Jesmin J
digcom.contributor.authorHasabnis, Niranjan
digcom.contributor.authorPetersen, Paul
digcom.contributor.authorMattson, Timothy
digcom.contributor.authorKraska, Tim
digcom.contributor.authorDubey, Pradeep
digcom.contributor.authorSarkar, Vivek
digcom.contributor.authorisAuthorOfPublication|email:gojustin@cis.upenn.edu|institution:Intel|Gottschlich, Justin E
digcom.identifiercps_machine_programming/1
digcom.identifier.contextkey20687254
digcom.identifier.submissionpathcps_machine_programming/1
digcom.typeworkingpaper
dspace.entity.typePublication
relation.isAuthorOfPublication5cbcf403-a558-4c1c-aa8a-d700e3d50679
relation.isAuthorOfPublication.latestForDiscovery5cbcf403-a558-4c1c-aa8a-d700e3d50679
upenn.schoolDepartmentCenterMachine Programming
Files
Original bundle
Name:
2006.05265.pdf
Size:
1.57 MB
Format:
Adobe Portable Document Format