
Departmental Papers (CIS)
Date of this Version
2-2011
Document Type
Conference Paper
Recommended Citation
B. Thomas Adler, Luca de Alfaro, Santiago M. Mola-Velasco, Paolo Rosso, and Andrew G. West, "Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features", Lecture Notes in Computer Science: Computational Linguistics and Intelligent Text Processing 6609, 277-288. February 2011. http://dx.doi.org/10.1007/978-3-642-19437-5_23
Abstract
Wikipedia is an online encyclopedia that anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, such as introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The resulting joint system outperforms all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of each of the three approaches, both for the task of discovering fresh vandalism and for the task of locating vandalism in the complete set of Wikipedia revisions.
Subject Area
CPS Internet of Things
Publication Source
Lecture Notes in Computer Science: Computational Linguistics and Intelligent Text Processing
Volume
6609
Start Page
277
Last Page
288
DOI
10.1007/978-3-642-19437-5_23
Copyright/Permission Statement
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-19437-5_23
Keywords
Wikipedia, wiki, collaboration, vandalism, machine learning, metadata, natural-language processing, reputation
Date Posted: 24 February 2011
This document has been peer reviewed.
Comments
CICLing '11: Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics, Tokyo, Japan, February 20-26, 2011.