
Departmental Papers (CIS)
Date of this Version
9-1-2011
Document Type
Conference Paper
Recommended Citation
Andrew G. West and Insup Lee, "Towards the Effective Temporal Association Mining of Spam Blacklists", 8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference, 73-82. September 2011. http://dx.doi.org/10.1145/2030376.2030385
Abstract
IP blacklists are a well-regarded anti-spam mechanism that captures global spamming patterns. These properties make such lists a practical ground truth by which to study email spam behaviors. Observing one blacklist for nearly a year and a half, we collected data on roughly *half a billion* listing events. In this paper, that data serves two purposes.
First, we conduct a measurement study on the dynamics of blacklists and email spam at large. The magnitude and duration of the data enable scrutiny of long-term trends at scale. Further, these statistics help parameterize our second task: the mining of blacklist history for temporal association rules. That is, we search for IP addresses with correlated histories. Strong correlations would suggest that group members are not independent entities and likely share botnet membership.
Unfortunately, we find that statistically significant groupings are rare. This result is reinforced when rules are evaluated in terms of their ability to: (1) identify shared botnet members, using ground truth from botnet infiltrations and sinkholes, and (2) predict future blacklisting events. In both cases, performance improvements over a control classifier are nominal. This outcome forces us to re-examine the appropriateness of blacklist data for this task and to suggest refinements to our mining model that may allow it to better capture the dynamics by which botnets operate.
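To make the idea of "IP addresses with correlated histories" concrete, the following is a minimal illustrative sketch, not the mining model used in the paper (which this record does not describe). It assumes listing events arrive as hypothetical `(ip, timestamp)` pairs and scores a rule "A is listed, then B is listed within `window` time units" by simple support and confidence. The function name, parameters, thresholds, and example addresses are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


def mine_temporal_rules(events, window, min_support=2, min_confidence=0.5):
    """Return {(a, b): (support, confidence)} for rules of the form
    "a is listed => b is listed within `window` time units afterwards".

    events: iterable of (ip, timestamp) pairs (hypothetical input format).
    """
    # Group listing timestamps by IP address.
    history = defaultdict(list)
    for ip, ts in events:
        history[ip].append(ts)

    rules = {}
    # Brute-force check of every ordered pair; fine for a toy example only.
    for a, b in permutations(history, 2):
        # Support: number of listings of `a` followed by a listing of `b`
        # no more than `window` time units later.
        support = sum(
            1
            for ta in history[a]
            if any(0 <= tb - ta <= window for tb in history[b])
        )
        # Confidence: fraction of `a` listings that were followed by `b`.
        confidence = support / len(history[a])
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


# Toy data using documentation-range addresses (RFC 5737).
events = [
    ("192.0.2.1", 0), ("192.0.2.2", 1),
    ("192.0.2.3", 50),
    ("192.0.2.1", 100), ("192.0.2.2", 102),
]
rules = mine_temporal_rules(events, window=5)
```

On this toy input, only the rule from `192.0.2.1` to `192.0.2.2` passes both thresholds. The pairwise scan is quadratic in the number of addresses, so a dataset on the scale the abstract mentions would need a very different candidate-generation strategy.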
Subject Area
CPS Internet of Things
Publication Source
8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference
Start Page
73
Last Page
82
DOI
10.1145/2030376.2030385
Copyright/Permission Statement
© ACM 2011. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference, http://dx.doi.org/10.1145/2030376.2030385.
Keywords
email spam, IP blacklists, measurement study, temporal data mining, association rule learning, negative result
Included in
Numerical Analysis and Scientific Computing Commons, Other Computer Sciences Commons, Theory and Algorithms Commons
Date Posted: 07 September 2011
This document has been peer reviewed.
Comments
8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference, Perth, Australia, September 2011.