Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features

Penn collection
Departmental Papers (CIS)
Subject
CPS Internet of Things
Wikipedia
wiki
collaboration
vandalism
machine learning
metadata
natural-language processing
reputation
Other Computer Sciences
Abstract

Wikipedia is an online encyclopedia that anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, such as the introduction of spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural-language processing features. The resulting joint system outperforms all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of each of the three approaches, both for the task of discovering fresh vandalism and for the task of locating vandalism in the complete set of Wikipedia revisions.

Date of presentation
2011-02-01
Conference name
CICLing '11: 12th International Conference on Intelligent Text Processing and Computational Linguistics
Conference dates
February 20-26, 2011
Conference location
Tokyo, Japan
Comments
CICLing '11: Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics, Tokyo, Japan, February 20-26, 2011.