Autonomous Link Spam Detection in Purely Collaborative Environments

Subject

CPS Internet of Things
Wikipedia
collaboration
collaborative security
information security
link spam
spam mitigation
reputation
spatio-temporal features
machine-learning
intelligent routing
Databases and Information Systems
Numerical Analysis and Scientific Computing
Other Computer Sciences
Statistical Models

Abstract

Collaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open access that defines such systems can also be exploited for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam). The collaborative encyclopedia Wikipedia is the basis for our analysis. Recent research has exposed vulnerabilities in Wikipedia's link spam mitigation, finding that human editors are slow to respond and dwindling in number. In response, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers to entry invite a diversity of spam types, not just those with economic motivations. Moreover, a link can be problematic because of how it is presented, regardless of its destination. In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this corpus, more than 40 features are codified and analyzed. These indicators are computed using wiki metadata, landing-site analysis, and external data sources. The resulting classifier attains 64% recall at a 0.5% false-positive rate (ROC-AUC = 0.97). Such performance could enable egregious link additions to be blocked automatically with few false positives, while the remainder are prioritized for human inspection. Finally, a live Wikipedia implementation of the technique has been developed.
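The abstract's headline figures (recall at a fixed false-positive rate, and ROC-AUC) and its proposed policy of blocking the most egregious additions while queueing the rest for human review can be made concrete with a short sketch. It assumes a trained classifier already assigns each link addition a spam score where higher means more suspicious. The function names (`threshold_at_fpr`, `recall_at`, `roc_auc`, `triage`), the toy scores, and the threshold-selection rule are illustrative assumptions, not the authors' implementation or their feature set.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch only: assumes a classifier has already produced a spam
# score per link addition (higher = more suspicious). Not the paper's code.


def threshold_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Lowest score threshold whose false-positive rate on labeled data stays
    at or below max_fpr. Label 1 = spam, label 0 = legitimate link addition;
    items scoring >= threshold are flagged as spam."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    allowed_fp = int(max_fpr * len(negatives))  # legitimate links we may flag
    best = float("inf")  # inf means "flag nothing"
    # Walk candidate thresholds from strictest to loosest; false positives can
    # only grow as the threshold drops, so stop at the first violation.
    for t in sorted(set(scores), reverse=True):
        false_pos = sum(1 for s in negatives if s >= t)
        if false_pos > allowed_fp:
            break
        best = t
    return best


def recall_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of spam items (label 1) scoring at or above threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(1 for s in positives if s >= threshold) / len(positives)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the probability that a random spam item outscores a random
    legitimate one (ties count half). Assumes both classes are present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def triage(edits: List[Tuple[str, float]], block_threshold: float) -> Tuple[List[str], List[str]]:
    """Block edits scoring at or above block_threshold; return the rest as a
    human-review queue ordered from most to least suspicious."""
    blocked = [edit for edit, s in edits if s >= block_threshold]
    rest = sorted((pair for pair in edits if pair[1] < block_threshold),
                  key=lambda pair: pair[1], reverse=True)
    return blocked, [edit for edit, _ in rest]


if __name__ == "__main__":
    # Toy data: three spam additions, four legitimate ones.
    scores = [0.95, 0.8, 0.5, 0.1, 0.2, 0.3, 0.9]
    labels = [1, 1, 1, 0, 0, 0, 0]
    t = threshold_at_fpr(scores, labels, max_fpr=0.25)
    print(t, recall_at(scores, labels, t))  # prints: 0.5 1.0
    print(round(roc_auc(scores, labels), 3))  # prints: 0.833
    print(triage([("a", 0.97), ("b", 0.4), ("c", 0.7)], block_threshold=0.9))
```

Choosing the blocking threshold from labeled data at a target false-positive rate (the abstract reports its operating point at 0.5%) keeps automatic blocking conservative; additions below that threshold are not discarded but still reach a human, only in priority order.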

Date of presentation

2011-10-05

Conference name

Departmental Papers (CIS)

Conference dates

2023-05-17T06:34:37.000

Comments

Seventh International Symposium on Wikis and Open Collaboration, Mountain View, California, USA, October 2011.