West, Andrew G.
Now showing 1 - 10 of 18
Towards the Effective Temporal Association Mining of Spam Blacklists (2011-09-01)
West, Andrew G.; Lee, Insup
IP blacklists are well-regarded anti-spam mechanisms that capture global spamming patterns. These properties make such lists a practical ground-truth by which to study email spam behaviors. Observing one blacklist for nearly a year-and-a-half, we collected data on roughly *half a billion* listing events. In this paper, that data serves two purposes. First, we conduct a measurement study on the dynamics of blacklists and email spam at-large. The magnitude/duration of the data enables scrutiny of long-term trends, at scale. Further, these statistics help parameterize our second task: the mining of blacklist history for temporal association rules. That is, we search for IP addresses with correlated histories. Strong correlations would suggest group members are not independent entities and likely share botnet membership. Unfortunately, we find that statistically significant groupings are rare. This result is reinforced when rules are evaluated in terms of their ability to: (1) identify shared botnet members, using ground-truth from botnet infiltrations and sinkholes, and (2) predict future blacklisting events. In both cases, performance improvements over a control classifier are nominal. This outcome forces us to re-examine the appropriateness of blacklist data for this task, and to suggest refinements to our mining model that may allow it to better capture the dynamics by which botnets operate.
QuanTM: A Quantitative Trust Management System (2009-03-01)
West, Andrew G.; Aviv, Adam J.; Chang, Jian; Prabhu, Vinayak S.; Blaze, Matthew A.; Kannan, Sampath; Lee, Insup; Smith, Jonathan M.; Sokolsky, Oleg
Quantitative Trust Management (QTM) provides a dynamic interpretation of authorization policies for access control decisions, based upon evolving reputations of the entities involved. QuanTM, a QTM system, selectively combines elements from trust management and reputation management to create a novel method for policy evaluation. Trust management, while effective in managing access with delegated credentials (as in PolicyMaker and KeyNote), needs greater flexibility in handling situations of partial trust. Reputation management provides a means to quantify trust, but lacks delegation and policy enforcement. This paper reports on QuanTM's design decisions and novel policy evaluation procedure. A representation of quantified trust relationships, the trust dependency graph, and a sample QuanTM application specific to the KeyNote trust management language are also proposed.

Towards Content-Driven Reputation for Collaborative Code Repositories (2012-08-28)
West, Andrew G.; Lee, Insup
As evidenced by SourceForge and GitHub, code repositories now integrate Web 2.0 functionality that enables global participation with minimal barriers-to-entry. To prevent detrimental contributions enabled by crowdsourcing, reputation is one proposed solution. Fortunately, this is an issue that has been addressed in analogous version control systems such as the *wiki* for natural language content.
The WikiTrust algorithm ("content-driven reputation"), while developed and evaluated in wiki environments, operates under a possibly shared collaborative assumption: actions that "survive" subsequent edits are reflective of good authorship. In this paper we examine WikiTrust's ability to measure author quality in collaborative code development. We first define a mapping from repositories to wiki environments and use it to evaluate a production SVN repository with 92,000 updates. Analysis is particularly attentive to reputation loss events and attempts to establish ground truth using commit comments and bug tracking. A proof-of-concept evaluation suggests the technique is promising (about two-thirds of reputation loss is justified), with false positives identifying areas for future refinement. Equally important, these false positives exemplify differences in content evolution and the cooperative process between wikis and code repositories.

Link Spamming Wikipedia for Profit (2011-09-01)
West, Andrew G.; Chang, Jian; Venkatasubramanian, Krishna; Sokolsky, Oleg; Lee, Insup
Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers-to-entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, a particular abuse is examined -- link spamming -- the addition of promotional or otherwise inappropriate hyperlinks. Our analysis focuses on the "wiki" model and the collaborative encyclopedia, Wikipedia, in particular. A principal goal of spammers is to maximize *exposure*, the quantity of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard.
The status quo spamming model relies on link persistence to accumulate exposures, a strategy that fails given the diligence of the Wikipedia community. Instead, we propose a model that exploits the latency inherent in human anti-spam enforcement. Statistical estimation suggests our novel model would produce significantly more link exposures than status quo techniques. More critically, the strategy could prove economically viable for perpetrators, incentivizing its exploitation. To this end, we address mitigation strategies.

CleanURL: A Privacy Aware Link Shortener (2012-01-01)
Kim, Daniel; Su, Kevin; West, Andrew G.; Aviv, Adam
When URLs containing application parameters are posted in public settings, privacy can be compromised if those arguments contain personal or tracking data. To this end, we describe a privacy-aware link shortening service that attempts to strip sensitive and non-essential parameters based on difference algorithms and human feedback. Our implementation, CleanURL, allows users to validate our automated logic and provides them with intuition about how these otherwise opaque arguments function. Finally, we apply CleanURL over a large Twitter URL corpus to measure the prevalence of such privacy leaks and further motivate our tool.

Autonomous Link Spam Detection in Purely Collaborative Environments (2011-10-05)
West, Andrew G.; Agrawal, Avantika; Baker, Phillip; Exline, Brittney; Lee, Insup
Collaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open access that defines such systems can also be utilized for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam).
The collaborative encyclopedia, Wikipedia, is the basis for our analysis. Recent research has exposed vulnerabilities in Wikipedia's link spam mitigation, finding that human editors are latent and dwindling in quantity. To this end, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers-to-entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented (regardless of the destination). In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this, 40+ features are codified and analyzed. These indicators are computed using "wiki" metadata, landing site analysis, and external data sources. The resulting classifier attains 64% recall at 0.5% false positives (ROC-AUC = 0.97). Such performance could enable egregious link additions to be blocked automatically with low false-positive rates, while prioritizing the remainder for human inspection. Finally, a live Wikipedia implementation of the technique has been developed.

Spamming for Science: Active Measurement in Web 2.0 Abuse Research (2012-03-02)
West, Andrew G.; Hayati, Pedram; Potdar, Vidyasagar; Lee, Insup
Spam and other electronic abuses have long been a focus of computer security research. However, recent work in the domain has emphasized an economic analysis of these operations in the hope of understanding and disrupting the profit model of attackers. Such studies do not lend themselves to passive measurement techniques. Instead, researchers have become middle-men or active participants in spam behaviors; methodologies that lie at an interesting juncture of legal, ethical, and human-subject (e.g., IRB) guidelines.
In this work two such experiments serve as case studies: one testing a novel link spam model on Wikipedia, and another using blackhat software to target blog comments and forums. Discussion concentrates on the experimental design process, especially as influenced by human-subject policy. The case studies are used to frame related work in the area, and scrutiny reveals that the computer science community requires greater consistency in evaluating research of this nature.

What Wikipedia Deletes: Characterizing Dangerous Collaborative Content (2011-10-04)
West, Andrew G.; Lee, Insup
Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply "undone" -- but *deleted* from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information). Herein, we analyze one year of Wikipedia's public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight into the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia's approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied.

Spatio-Temporal Analysis of Wikipedia Metadata and the STiki Anti-Vandalism Tool (2010-07-01)
West, Andrew G.; Kannan, Sampath; Lee, Insup
The bulk of Wikipedia anti-vandalism tools require natural language processing over the article or diff text.
However, our prior work demonstrated the feasibility of using spatio-temporal properties to locate malicious edits. STiki is a real-time, on-Wikipedia tool leveraging this technique. The associated poster reviews STiki's methodology and performance. We find that competing anti-vandalism tools inhibit maximal performance. However, the tool proves particularly adept at mitigating long-term embedded vandalism. Further, its robust and language-independent nature makes it well-suited for use in less-patrolled wiki installations.

AS-CRED: Reputation and Alert Service for Inter-Domain Routing (2013-09-01)
Venkatasubramanian, Krishna; West, Andrew G.; Kannan, Sampath; Lee, Insup; Loo, Boon Thau; Sokolsky, Oleg
As the backbone routing system of the Internet, inter-domain routing is highly complex in its operation. Building a trustworthy ecosystem for inter-domain routing requires the proper maintenance of trust relationships among tens of thousands of peer IP domains called Autonomous Systems (ASes). ASes today implicitly trust any routing information received from other ASes as part of Border Gateway Protocol (BGP) updates. Such blind trust is problematic given the dramatic rise in the number of anomalous updates being disseminated, which pose grave security consequences for inter-domain routing operation. In this paper, we present AS-CRED, an AS reputation and alert service that not only detects anomalous BGP updates, but also provides a quantitative view of ASes' tendencies to perpetrate anomalous behavior. AS-CRED focuses on detecting two types of anomalous updates: (1) hijacked: updates in which ASes announce a prefix that they do not own; and (2) vacillating: updates that are part of a quick succession of announcements and withdrawals involving a specific prefix, rendering the information practically ineffective for routing.
AS-CRED works by analyzing the past updates announced by ASes for the presence of these anomalies. Based on this analysis, it generates AS reputation values that provide an aggregate and quantitative view of an AS's history of anomalous behavior. The reputation values are then used in a tiered alert system for tracking any subsequent anomalous updates observed. Analyzing AS-CRED's operation with real-world BGP traffic over six months, we demonstrate the effectiveness of the proposed approach and its improvement over similar alert systems.