What Wikipedia Deletes: Characterizing Dangerous Collaborative Content
Discipline
Subject
Wikipedia
user generated content
collaboration
redaction
content removal
copyright
information security
Community-Based Research
Library and Information Science
Numerical Analysis and Scientific Computing
Other Computer Sciences
Other Legal Studies
Abstract
Collaborative environments, such as Wikipedia, often have low barriers to entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions that are not simply "undone" but deleted from revision histories and public view. Such treatment is generally reserved for edits that (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information). Herein, we analyze one year of Wikipedia's public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight into the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia's approach is generally quite reactive, we find that copyright issues prove the most problematic of the behaviors studied.