Loo, Boon Thau
Publication
Public Health for the Internet φ Towards A New Grand Challenge for Information Management (2007-01-07)
Hellerstein, Joseph M; Condie, Tyson; Garofalakis, Minos; Loo, Boon Thau; Maniatis, Petros; Roscoe, Timothy; Taft, Nina A
Business incentives have brought us within a small factor of achieving the database community's Grand Challenge set out in the Asilomar Report of 1998. This paper makes the case for a new, focused Grand Challenge: Public Health for the Internet. The goal of PHI (or φ) is to enable collectives of hosts on the Internet to jointly monitor and promote network health by sharing information on network conditions in a peer-to-peer fashion. We argue that this will be a positive effort for the research community for a variety of reasons, both in terms of its technical reach and its societal impact. This version of the φ vision is targeted at readers in the database research community, but the effort is clearly multidisciplinary. A more generalist version of this paper will be maintained at http://openphi.net.

Publication
Declarative Routing: Extensible Routing with Declarative Queries (2005-08-22)
Loo, Boon Thau; Hellerstein, Joseph M; Stoica, Ion; Ramakrishnan, Raghu
The Internet's core routing infrastructure, while arguably robust and efficient, has proven to be difficult to evolve to accommodate the needs of new applications. Prior research on this problem has included new hard-coded routing protocols on the one hand, and fully extensible Active Networks on the other. In this paper, we explore a new point in this design space that aims to strike a better balance between the extensibility and robustness of a routing infrastructure. The basic idea of our solution, which we call declarative routing, is to express routing protocols using a database query language. We show that our query language is a natural fit for routing, and can express a variety of well-known routing protocols in a compact and clean fashion. We discuss the security of our proposal in terms of its computational expressive power and language design. Via simulation, and deployment on PlanetLab, we demonstrate that our system imposes no fundamental limits relative to traditional protocols, is amenable to query optimizations, and can sustain long-lived routes under network churn and congestion.

Publication
Declarative Networking: Language, Execution and Optimization (2006-06-01)
Loo, Boon Thau; Condie, Tyson; Garofalakis, Minos; Gay, David E; Hellerstein, Joseph M; Maniatis, Petros; Ramakrishnan, Raghu; Roscoe, Timothy; Stoica, Ion
The networking and distributed systems communities have recently explored a variety of new network architectures, both for application-level overlay networks, and as prototypes for a next-generation Internet architecture. In this context, we have investigated declarative networking: the use of a distributed recursive query engine as a powerful vehicle for accelerating innovation in network architectures [23, 24, 33]. Declarative networking represents a significant new application area for database research on recursive query processing. In this paper, we address fundamental database issues in this domain. First, we motivate and formally define the Network Datalog (NDlog) language for declarative network specifications. Second, we introduce and prove correct relaxed versions of the traditional semi-naïve query evaluation technique, to overcome fundamental problems of the traditional technique in an asynchronous distributed setting. Third, we consider the dynamics of network state, and formalize the "eventual consistency" of our programs even when bursts of updates can arrive in the midst of query execution. Fourth, we present a number of query optimization opportunities that arise in the declarative networking context, including applications of traditional techniques as well as new optimizations. Last, we present evaluation results of the above ideas implemented in our P2 declarative networking system, running on 100 machines over the Emulab network testbed.

Publication
The Architecture of PIER: an Internet-Scale Query Processor (2005-01-05)
Huebsch, Ryan; Chun, Brent; Hellerstein, Joseph M; Loo, Boon Thau; Maniatis, Petros; Roscoe, Timothy; Shenker, Scott; Stoica, Ion; Yumerefendi, Aydan R

Publication
Complex Queries in DHT-based Peer-to-Peer Networks (2002-03-07)
Harren, Matthew; Hellerstein, Joseph M; Huebsch, Ryan; Loo, Boon Thau; Shenker, Scott; Stoica, Ion
Recently a new generation of P2P systems, offering distributed hash table (DHT) functionality, have been proposed. These systems greatly improve the scalability and exact-match accuracy of P2P systems, but offer only the exact-match query facility. This paper outlines a research agenda for building complex query facilities on top of these DHT-based P2P systems. We describe the issues involved and outline our research plan and current status.

Publication
Implementing Declarative Overlays (2005-10-01)
Loo, Boon Thau; Condie, Tyson; Hellerstein, Joseph M; Maniatis, Petros; Roscoe, Timothy; Stoica, Ion
Overlay networks are used today in a variety of distributed systems ranging from file-sharing and storage systems to communication infrastructures. However, designing, building and adapting these overlays to the intended application and the target environment is a difficult and time-consuming process. To ease the development and the deployment of such overlay networks we have implemented P2, a system that uses a declarative logic language to express overlay networks in a highly compact and reusable form. P2 can express a Narada-style mesh network in 16 rules, and the Chord structured overlay in only 47 rules. P2 directly parses and executes such specifications using a dataflow architecture to construct and maintain overlay networks. We describe the P2 approach, how our implementation works, and show by experiment its promising trade-off point between specification complexity and performance.

Publication
Querying the Internet with PIER (2003-09-09)
Huebsch, Ryan; Hellerstein, Joseph M; Lanham, Nick; Loo, Boon Thau; Shenker, Scott; Stoica, Ion
The database research community prides itself on scalable technologies. Yet database systems traditionally do not excel on one important scalability dimension: the degree of distribution. This limitation has hampered the impact of database technologies on massively distributed systems like the Internet. In this paper, we present the initial design of PIER, a massively distributed query engine based on overlay networks, which is intended to bring database query processing facilities to new, widely distributed environments. We motivate the need for massively distributed queries, and argue for a relaxation of certain traditional database research goals in the pursuit of scalability and widespread adoption. We present simulation results showing PIER gracefully running relational queries across thousands of machines, and show results from the same software base in actual deployment on a large experimental cluster.

Publication
Enhancing P2P File-Sharing with an Internet-Scale Query Processor (2004-09-01)
Loo, Boon Thau; Hellerstein, Joseph M; Huebsch, Ryan; Shenker, Scott; Stoica, Ion
In this paper, we address the problem of designing a scalable, accurate query processor for peer-to-peer filesharing and similar distributed keyword search systems. Using a globally-distributed monitoring infrastructure, we perform an extensive study of the Gnutella filesharing network, characterizing its topology, data and query workloads. We observe that Gnutella's query processing approach performs well for popular content, but quite poorly for rare items with few replicas. We then consider an alternate approach based on Distributed Hash Tables (DHTs). We describe our implementation of PIERSearch, a DHT-based system, and propose a hybrid system where Gnutella is used to locate popular items, and PIERSearch for handling rare items. We develop an analytical model of the two approaches, and use it in concert with our Gnutella traces to study the tradeoff between query recall and system overhead of the hybrid system. We evaluate a variety of localized schemes for identifying items that are rare and worth handling via the DHT. Lastly, we show in a live deployment on fifty nodes on two continents that it nicely complements Gnutella in its ability to handle rare items.

Publication
Querying at Internet Scale (2004-06-01)
Chun, Brent; Hellerstein, Joseph M; Huebsch, Ryan; Jeffery, Shawn R; Loo, Boon Thau; Mardanbeigi, Sam; Roscoe, Timothy; Rhea, Sean; Shenker, Scott; Stoica, Ion
We are developing a distributed query processor called PIER, which is designed to run on the scale of the entire Internet. PIER utilizes a Distributed Hash Table (DHT) as its communication substrate in order to achieve scalability, reliability, decentralized control, and load balancing. PIER enhances DHTs with declarative and algebraic query interfaces, and underneath those interfaces implements multihop, in-network versions of joins, aggregation, recursion, and query/result dissemination. PIER is currently being used for diverse applications, including network monitoring, keyword-based filesharing search, and network topology mapping. We will demonstrate PIER's functionality by showing system monitoring queries running on PlanetLab, a testbed of over 300 machines distributed across the globe.

Publication
The Case for a Hybrid P2P Search Infrastructure (2004-02-01)
Loo, Boon Thau; Huebsch, Ryan; Stoica, Ion; Hellerstein, Joseph M
Popular P2P file-sharing systems like Gnutella and Kazaa use unstructured network designs. These networks typically adopt flooding-based search techniques to locate files. While flooding-based techniques are effective for locating highly replicated items, they are poorly suited for locating rare items. As an alternative, a wide variety of structured P2P networks such as distributed hash tables (DHTs) have been recently proposed. Structured networks can efficiently locate rare items, but they incur significantly higher overheads than unstructured P2P networks for popular files. Through extensive measurements of the Gnutella network from multiple vantage points, we argue for a hybrid search solution, where structured search techniques are used to index and locate rare items, and flooding techniques are used for locating highly replicated content. To illustrate, we present experimental results of a prototype implementation that runs at multiple sites on PlanetLab and participates live on the Gnutella network.