Departmental Papers (CIS)

Document Type

Conference Paper

Subject Area

CPS Internet of Things

Date of this Version



Andrew G. West and Insup Lee. What Wikipedia Deletes: Characterizing Dangerous Collaborative Content. In WikiSym '11: Proceedings of the Seventh International Symposium on Wikis and Open Collaboration, Mountain View, California, USA. October 2011.

© ACM, 2011. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in: WikiSym '11, October 3–5, 2011, Mountain View, CA, USA.


Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply "undone" but *deleted* from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information).

Herein, we analyze one year of Wikipedia's public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight about the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia's approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied.


Wikipedia, user generated content, collaboration, redaction, content removal, copyright, information security


Date Posted: 05 October 2011

This document has been peer reviewed.