MISIM: A Novel Code Similarity System
dc.contributor.author | Ye, Fangke | |
dc.contributor.author | Zhou, Shengtian | |
dc.contributor.author | Venkat, Anand | |
dc.contributor.author | Marcus, Ryan | |
dc.contributor.author | Tatbul, Nesime | |
dc.contributor.author | Tithi, Jesmin J | |
dc.contributor.author | Hasabnis, Niranjan | |
dc.contributor.author | Petersen, Paul | |
dc.contributor.author | Mattson, Timothy | |
dc.contributor.author | Kraska, Tim | |
dc.contributor.author | Dubey, Pradeep | |
dc.contributor.author | Gottschlich, Justin E | |
dc.date | 2023-05-18T00:13:48.000 | |
dc.date.accessioned | 2023-05-22T13:06:43Z | |
dc.date.available | 2023-05-22T13:06:43Z | |
dc.date.issued | 2020-06-01 | |
dc.date.submitted | 2020-12-18T10:51:54-08:00 | |
dc.description.abstract | Code similarity systems are integral to a range of applications from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin to address this, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy than all three systems. | |
dc.identifier.uri | https://repository.upenn.edu/handle/20.500.14332/8484 | |
dc.legacy.articleid | 1000 | |
dc.legacy.fulltexturl | https://repository.upenn.edu/cgi/viewcontent.cgi?article=1000&context=cps_machine_programming&unstamped=1 | |
dc.source.issue | 1 | |
dc.source.journal | Machine Programming | |
dc.source.status | published | |
dc.subject.other | Computer Science - Machine Learning; Computer Science - Software Engineering; Statistics - Machine Learning | |
dc.title | MISIM: A Novel Code Similarity System | |
dc.type | Working Paper | |
digcom.contributor.author | Ye, Fangke | |
digcom.contributor.author | Zhou, Shengtian | |
digcom.contributor.author | Venkat, Anand | |
digcom.contributor.author | Marcus, Ryan | |
digcom.contributor.author | Tatbul, Nesime | |
digcom.contributor.author | Tithi, Jesmin J | |
digcom.contributor.author | Hasabnis, Niranjan | |
digcom.contributor.author | Petersen, Paul | |
digcom.contributor.author | Mattson, Timothy | |
digcom.contributor.author | Kraska, Tim | |
digcom.contributor.author | Dubey, Pradeep | |
digcom.contributor.author | Sarkar, Vivek | |
digcom.contributor.author | isAuthorOfPublication|email:gojustin@cis.upenn.edu|institution:Intel|Gottschlich, Justin E | |
digcom.identifier | cps_machine_programming/1 | |
digcom.identifier.contextkey | 20687254 | |
digcom.identifier.submissionpath | cps_machine_programming/1 | |
digcom.type | workingpaper | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | 5cbcf403-a558-4c1c-aa8a-d700e3d50679 | |
relation.isAuthorOfPublication.latestForDiscovery | 5cbcf403-a558-4c1c-aa8a-d700e3d50679 | |
upenn.schoolDepartmentCenter | Machine Programming |