Towards the Effective Temporal Association Mining of Spam Blacklists

Penn collection
Departmental Papers (CIS)
Degree type
Discipline
Subject
CPS Internet of Things
email spam
IP blacklists
measurement study
temporal data mining
association rule learning
negative result
Numerical Analysis and Scientific Computing
Other Computer Sciences
Theory and Algorithms
Abstract

IP blacklists are a well-regarded anti-spam mechanism that captures global spamming patterns. These properties make such lists a practical ground truth by which to study email spam behaviors. Observing one blacklist for nearly a year and a half, we collected data on roughly half a billion listing events. In this paper, that data serves two purposes. First, we conduct a measurement study on the dynamics of blacklists and email spam at large. The magnitude and duration of the data enable scrutiny of long-term trends at scale. Further, these statistics help parameterize our second task: the mining of blacklist history for temporal association rules. That is, we search for IP addresses with correlated histories. Strong correlations would suggest that group members are not independent entities and likely share botnet membership. Unfortunately, we find that statistically significant groupings are rare. This result is reinforced when rules are evaluated in terms of their ability to (1) identify shared botnet members, using ground truth from botnet infiltrations and sinkholes, and (2) predict future blacklisting events. In both cases, performance improvements over a control classifier are nominal. This outcome forces us to re-examine the appropriateness of blacklist data for this task and to suggest refinements to our mining model that may allow it to better capture the dynamics by which botnets operate.
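
To make the idea of "IP addresses with correlated histories" concrete, the following is a minimal sketch, not the mining model used in the paper. It assumes listing events arrive as (IP, Unix timestamp) pairs, buckets them into fixed time windows, and scores each pair of addresses by the Jaccard overlap of the windows in which they were listed. The function names, the one-hour window, the 0.5 threshold, and the sample addresses are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def listing_windows(events, window_seconds):
    """Map each IP to the set of time-window indices in which it was listed."""
    windows = defaultdict(set)
    for ip, timestamp in events:
        windows[ip].add(timestamp // window_seconds)
    return windows


def correlated_pairs(events, window_seconds=3600, min_jaccard=0.5):
    """Return (ip_a, ip_b, score) for pairs whose listing windows overlap strongly.

    The score is the Jaccard similarity of the two window sets. This is a
    stand-in for a temporal association measure, assumed for illustration.
    """
    windows = listing_windows(events, window_seconds)
    pairs = []
    for ip_a, ip_b in combinations(sorted(windows), 2):
        shared = windows[ip_a] & windows[ip_b]
        union = windows[ip_a] | windows[ip_b]
        score = len(shared) / len(union)  # union is never empty: each IP has an event
        if score >= min_jaccard:
            pairs.append((ip_a, ip_b, score))
    return pairs


# Hypothetical listing events: (IP address, seconds since start of observation).
events = [
    ("192.0.2.1", 100), ("192.0.2.2", 200),    # both listed in window 0
    ("192.0.2.1", 7300), ("192.0.2.2", 7400),  # both listed in window 2
    ("192.0.2.3", 50000),                      # listed alone in a later window
]

print(correlated_pairs(events))
```

The all-pairs comparison is quadratic in the number of addresses, so a sketch like this would not scale to the roughly half a billion listing events described above without first pruning candidates.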

Date of presentation
2011-09-01
Conference name
8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference
Conference dates
September 2011
Comments
8th Annual Collaboration, Electronic Messaging, Anti-Abuse, and Spam Conference, Perth, Australia, September 2011.
Collection