Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata

dc.contributor.author: West, Andrew G.
dc.contributor.author: Lee, Insup
dc.contributor.author: Kannan, Sampath
dc.date: 2023-05-17T03:39:22.000
dc.date.accessioned: 2023-05-22T12:48:23Z
dc.date.available: 2023-05-22T12:48:23Z
dc.date.issued: 2010-04-13
dc.date.submitted: 2010-04-19T15:43:53-07:00
dc.description.abstract: Blatantly unproductive edits undermine the quality of the collaboratively-edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features effective in mitigating email spam, while being lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with non-offending edits in numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits a second) and has been used to locate over 5,000 manually-confirmed incidents of vandalism outside our labeled set.
dc.description.comments: EUROSEC '10: Proceedings of the Third European Workshop on System Security. Paris, France. April 13, 2010. (A preliminary version was also published as UPENN-MS-CIS-10-05).
dc.identifier.uri: https://repository.upenn.edu/handle/20.500.14332/6478
dc.legacy.articleid: 1461
dc.legacy.fields: 10.1145/1752046.1752050
dc.legacy.fulltexturl: https://repository.upenn.edu/cgi/viewcontent.cgi?article=1461&context=cis_papers&unstamped=1
dc.rights: © ACM 2010. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the Third European Workshop on System Security (EUROSEC '10), http://dx.doi.org/10.1145/1752046.1752050.
dc.source.beginpage: 22
dc.source.endpage: 28
dc.source.issue: 428
dc.source.journal: Departmental Papers (CIS)
dc.source.journaltitle: Proceedings of the Third European Workshop on System Security (EUROSEC '10)
dc.source.peerreviewed: true
dc.source.status: published
dc.subject.other: CPS Internet of Things
dc.subject.other: Design
dc.subject.other: Measurement
dc.subject.other: Performance
dc.subject.other: Security
dc.subject.other: Wikipedia
dc.subject.other: spatio-temporal reputation
dc.subject.other: vandalism
dc.subject.other: collaborative software
dc.subject.other: content-based access control
dc.title: Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata
dc.type: Presentation
digcom.contributor.author: isAuthorOfPublication|email:westand@cis.upenn.edu|institution:University of Pennsylvania|West, Andrew G.
digcom.contributor.author: isAuthorOfPublication|email:lee@cis.upenn.edu|institution:University of Pennsylvania|Lee, Insup
digcom.contributor.author: isAuthorOfPublication|email:kannan@cis.upenn.edu|institution:University of Pennsylvania|Kannan, Sampath
digcom.identifier: cis_papers/428
digcom.identifier.contextkey: 1280381
digcom.identifier.submissionpath: cis_papers/428
digcom.type: conference
dspace.entity.type: Publication
relation.isAuthorOfPublication: 5584daf2-ea60-4404-a98f-6077e4d91d24
relation.isAuthorOfPublication: 45a9eed5-3211-4c36-b40d-6394302dfdce
relation.isAuthorOfPublication: c3b357d5-3190-4fa6-86e9-6e6621137786
relation.isAuthorOfPublication.latestForDiscovery: 5584daf2-ea60-4404-a98f-6077e4d91d24
upenn.schoolDepartmentCenter: Departmental Papers (CIS)
Files
Original bundle
Name: eurosec_10_final.pdf
Size: 234.49 KB
Format: Adobe Portable Document Format
Collection