Autonomous Link Spam Detection in Purely Collaborative Environments

dc.contributor.authorWest, Andrew G.
dc.contributor.authorAgrawal, Avantika
dc.contributor.authorBaker, Phillip
dc.contributor.authorExline, Brittney
dc.contributor.authorLee, Insup
dc.date2023-05-17T06:34:37.000
dc.date.accessioned2023-05-22T12:48:47Z
dc.date.available2023-05-22T12:48:47Z
dc.date.issued2011-10-05
dc.date.submitted2011-10-03T19:58:57-07:00
dc.description.abstractCollaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open access that defines such systems can also be utilized for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam). The collaborative encyclopedia, Wikipedia, is the basis for our analysis. Recent research has exposed vulnerabilities in Wikipedia's link spam mitigation, finding that human editors are latent and dwindling in quantity. To this end, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers-to-entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented (regardless of the destination). In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this, 40+ features are codified and analyzed. These indicators are computed using "wiki" metadata, landing site analysis, and external data sources. The resulting classifier attains 64% recall at 0.5% false-positives (ROC-AUC=0.97). Such performance could enable egregious link additions to be blocked automatically with low false-positive rates, while prioritizing the remainder for human inspection. Finally, a live Wikipedia implementation of the technique has been developed.
dc.description.commentsSeventh International Symposium on Wikis and Open Collaboration, Mountain View, California, USA, October 2011.
dc.identifier.urihttps://repository.upenn.edu/handle/20.500.14332/6532
dc.legacy.articleid1517
dc.legacy.fields10.1145/2038558.2038574
dc.legacy.fulltexturlhttps://repository.upenn.edu/cgi/viewcontent.cgi?article=1517&context=cis_papers&unstamped=1
dc.rights© ACM 2011. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym '11), http://dx.doi.org/10.1145/2038558.2038574.
dc.source.beginpage91
dc.source.endpage100
dc.source.issue477
dc.source.journalDepartmental Papers (CIS)
dc.source.journaltitle7th International Symposium on Wikis and Open Collaboration (WikiSym '11)
dc.source.peerreviewedtrue
dc.source.statuspublished
dc.subject.otherCPS Internet of Things
dc.subject.otherWikipedia
dc.subject.othercollaboration
dc.subject.othercollaborative security
dc.subject.otherinformation security
dc.subject.otherlink spam
dc.subject.otherspam mitigation
dc.subject.otherreputation
dc.subject.otherspatio-temporal features
dc.subject.othermachine-learning
dc.subject.otherintelligent routing
dc.subject.otherDatabases and Information Systems
dc.subject.otherNumerical Analysis and Scientific Computing
dc.subject.otherOther Computer Sciences
dc.subject.otherStatistical Models
dc.titleAutonomous Link Spam Detection in Purely Collaborative Environments
dc.typePresentation
digcom.contributor.authorisAuthorOfPublication|email:westand@cis.upenn.edu|institution:University of Pennsylvania|West, Andrew G.
digcom.contributor.authorisAuthorOfPublication|email:aagrawal@seas.upenn.edu|institution:University of Pennsylvania|Agrawal, Avantika
digcom.contributor.authorisAuthorOfPublication|email:phills@seas.upenn.edu|institution:University of Pennsylvania|Baker, Phillip
digcom.contributor.authorisAuthorOfPublication|email:kexline@seas.upenn.edu|institution:University of Pennsylvania|Exline, Brittney
digcom.contributor.authorisAuthorOfPublication|email:lee@cis.upenn.edu|institution:University of Pennsylvania|Lee, Insup
digcom.identifiercis_papers/477
digcom.identifier.contextkey2272576
digcom.identifier.submissionpathcis_papers/477
digcom.typeconference
dspace.entity.typePublication
relation.isAuthorOfPublication5584daf2-ea60-4404-a98f-6077e4d91d24
relation.isAuthorOfPublication8bca22ec-3947-4f94-b5f6-f39cb320142e
relation.isAuthorOfPublication1b1100a1-5ea8-41f7-a756-551c4b3aa25c
relation.isAuthorOfPublication90230729-a282-4fe9-a86d-9968be2afd41
relation.isAuthorOfPublication45a9eed5-3211-4c36-b40d-6394302dfdce
relation.isAuthorOfPublication.latestForDiscovery5584daf2-ea60-4404-a98f-6077e4d91d24
upenn.schoolDepartmentCenterDepartmental Papers (CIS)
Files
Original bundle
Name: Autonomous_Link_Spam.pdf
Size: 337.95 KB
Format: Adobe Portable Document Format
Collection