Ives, Zachary

Disciplines
Computer Sciences
Position
Assistant Professor
Introduction
Zachary Ives is an Assistant Professor at the University of Pennsylvania and an Associated Faculty Member of the Penn Center for Bioinformatics. He received his B.S. from Sonoma State University and his PhD from the University of Washington. His research interests include data integration, peer-to-peer models of data sharing, processing and security of heterogeneous sensor streams, and data exchange between autonomous systems. He is a recipient of the NSF CAREER award and a member of the DARPA Computer Science Study Panel.
Research Interests
Databases, data integration, peer-to-peer computing, sensor networks

Publications

Now showing 1 - 10 of 43
  • Publication
    Update Exchange With Mappings and Provenance
    (2007-11-27) Green, Todd J; Karvounarakis, Grigoris; Ives, Zachary G; Tannen, Val
    We consider systems for data sharing among heterogeneous peers related by a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to ask queries over related data from other peers as well. To achieve this, every peer’s updates propagate along the mappings to the other peers. However, this update exchange is filtered by trust conditions — expressing what data and sources a peer judges to be authoritative — which may cause a peer to reject another’s updates. In order to support such filtering, updates carry provenance information. These systems target scientific data sharing applications, and their general principles and architecture have been described in [21]. In this paper we present methods for realizing such systems. Specifically, we extend techniques from data integration, data exchange, and incremental view maintenance to propagate updates along mappings; we integrate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance; we discuss strategies for implementing our techniques in conjunction with an RDBMS; and we experimentally demonstrate the viability of our techniques in the Orchestra prototype system. This technical report supersedes the version which appeared in VLDB 2007 [17] and corrects certain technical claims regarding the semantics of our system (see errata in Sections [3.1] and [4.1.1]).
  • Publication
    Sideways Information Passing for Push-Style Query Processing
    (2007-11-20) Ives, Zachary G; Taylor, Nicholas E
    In many modern data management settings, data is queried from a central node or nodes, but is stored at remote sources. In such a setting it is common to perform "push-style" query processing, using multithreaded pipelined hash joins and bushy query plans to compute parts of the query in parallel; to avoid idling, the CPU can switch between them as delays are encountered. This works well for simple select-project-join queries, but increasingly, Web and integration applications require more complex queries with multiple joins and even nested subqueries. As we demonstrate in this paper, push-style execution of complex queries can be improved substantially via sideways information passing; push-style queries provide many opportunities for information passing that have not been studied in the past literature. We present adaptive information passing, a general runtime decision-making technique for reusing intermediate state from one query subresult to prune and reduce computation of other subresults. We develop two alternative schemes for performing adaptive information passing, which we study in several settings under a variety of workloads.
  • Publication
    Recursive Computation of Regions and Connectivity in Networks
    (2008-10-31) Taylor, Nicholas E; Zhou, Wenchao; Ives, Zachary G; Liu, Mengmeng; Loo, Boon Thau
    In recent years, data management has begun to consider situations in which data access is closely tied to network routing and distributed acquisition: sensor networks, in which reachability and contiguous regions are of interest; declarative networking, in which shortest paths and reachability are key; distributed and peer-to-peer stream systems, in which we may monitor for associations among data at the distributed sources (e.g., transitive relationships). In each case, the fundamental operation is to maintain a view over dynamic network state; the view is frequently distributed, recursive and may contain aggregation, e.g., describing transitive connectivity, shortest paths, least costly paths, or region membership. Surprisingly, solutions to this problem are often domain-specific, expensive to compute, and incomplete. In this paper, we recast the problem as one of incremental recursive view maintenance in the presence of distributed streams of updates to tuples: new stream data becomes insert operations and tuple expirations become deletions. We develop a set of techniques that maintain information about tuple derivability—a compact form of data provenance. We complement this with techniques to reduce communication: aggregate selections to prune irrelevant aggregation tuples, provenance-aware operators that can determine when tuples are no longer derivable and remove them from their state, and shipping operators that greatly reduce the tuple and provenance information being propagated while still maintaining correct answers. We validate our work in a distributed setting with sensor and network router queries, showing significant gains in bandwidth consumption without sacrificing performance.
  • Publication
    Integrating Ontologies and Relational Data
    (2007-11-01) Auer, Sören; Ives, Zachary G
    In recent years, an increasing number of scientific and other domains have attempted to standardize their terminology and provide reasoning capabilities through ontologies, in order to facilitate data exchange. This has spurred research into Web-based languages, formalisms, and especially query systems based on ontologies. Yet we argue that DBMS techniques can be extended to provide many of the same capabilities, with benefits in scalability and performance. We present OWLDB, a lightweight and extensible approach for the integration of relational databases and description logic based ontologies. One of the key differences between relational databases and ontologies is the high degree of implicit information contained in ontologies. OWLDB integrates the two schemes by codifying ontologies' implicit information using a set of sound and complete inference rules for SHOIN (the description logic behind OWL ontologies). These inference rules can be translated into queries on a relational DBMS instance, and the query results (representing inferences) can be added back to this database. Subsequently, database applications can make direct use of this inferred, previously implicit knowledge, e.g., in the annotation of biomedical databases. As our experimental comparison to a native description logic reasoner and a triple store shows, OWLDB provides significantly greater scalability and query capabilities, without sacrificing performance with respect to inference.
  • Publication
    MOSAIC: Multiple Overlay Selection and Intelligent Composition
    (2007-10-24) Loo, Boon Thau; Ives, Zachary G; Mao, Yun; Smith, Jonathan M
    Today, the most effective mechanism for remedying shortcomings of the Internet, or augmenting it with new networking capabilities, is to develop and deploy a new overlay network. This leads to the problem of multiple networking infrastructures, each with independent advantages, and each developed in isolation. A greatly preferable solution is to have a single infrastructure under which new overlays can be developed, deployed, selected, and combined according to application and administrator needs. MOSAIC is an extensible infrastructure that enables not only the specification of new overlay networks, but also dynamic selection and composition of such overlays. MOSAIC provides declarative networking: it uses a unified declarative language (Mozlog) and runtime system to enable specification of new overlay networks, as well as their composition in both the control and data planes. Importantly, it permits dynamic compositions with both existing overlay networks and legacy applications. This paper demonstrates the dynamic selection and composition capabilities of MOSAIC with a variety of declarative overlays: an indirection overlay that supports mobility (i3), a resilient overlay (RON), and a transport-layer proxy. Using a remarkably concise specification, MOSAIC provides the benefits of runtime composition to simultaneously deliver application-aware mobility, NAT traversal and reliability with low performance overhead, demonstrated with deployment and measurement on both a local cluster and the PlanetLab testbed.
  • Publication
    A Substrate for In-Network Sensor Data Integration
    (2008-08-24) Mihaylov, Svilen; Jacob, Marie; Ives, Zachary G; Guha, Sudipto
    With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication between sets of nodes performing distributed joins in sensor networks. We develop a communication model that enables in-network join at a variety of locations, and which facilitates coordination among nodes in order to make optimization decisions. While we defer a discussion of the optimizer to future work, we experimentally compare a variety of strategies, including at-base and in-network joins. Results show significant performance gains versus prior work, as well as opportunities for optimization.
  • Publication
    MOSAIC: Declarative Platform for Dynamic Overlay Composition
    (2012-05-27) Loo, Boon Thau; Ives, Zachary G; Mao, Yun; Smith, Jonathan M
    Overlay networks create new networking services using nodes that communicate using pre-existing networks. They are often optimized for specific applications and targeted at niche vertical domains, but lack the interoperability needed to share their functionality. MOSAIC is a declarative platform for constructing new overlay networks from multiple existing overlays, each possessing a subset of the desired new network’s characteristics. This paper focuses on the design and implementation of MOSAIC: composition and deployment of control and/or data plane functions of different overlay networks, dynamic compositions of overlay networks to meet changing application needs and network conditions, and seamless support for legacy applications. MOSAIC overlays are specified using Mozlog, a new declarative language for expressing overlay properties independently from their particular implementation or underlying network. MOSAIC is validated experimentally using compositions specified in Mozlog in order to create new overlay networks with compositions of their functions: the i3 indirection overlay that supports mobility, the resilient overlay network (RON) overlay for robust routing, and the Chord distributed hash table for scalable lookups. MOSAIC uses runtime composition to simultaneously deliver application-aware mobility, NAT traversal and reliability. We further demonstrate MOSAIC’s dynamic composition capabilities by having Chord switch its underlay from IP to RON at runtime. MOSAIC’s benefits are obtained at a low performance cost, as demonstrated by measurements on both a local cluster environment and the PlanetLab global testbed.
  • Publication
    NetTrails: A Declarative Platform for Maintaining and Querying Provenance in Distributed Systems
    (2011-06-01) Zhou, Wenchao; Fei, Qiong; Haeberlen, Andreas; Sun, Shengzhi; Ives, Zachary G; Tao, Tao; Loo, Boon Thau; Sherr, Micah
    We demonstrate NetTrails, a declarative platform for maintaining and interactively querying network provenance in a distributed system. Network provenance describes the history and derivations of network state that result from the execution of a distributed protocol. It has broad applicability in the management, diagnosis, and security analysis of networks. Our demonstration shows the use of NetTrails for maintaining and querying network provenance in a variety of distributed settings, ranging from declarative networks to unmodified legacy distributed systems. We conclude our demonstration with a discussion of our ongoing research on enhancing the query language and security guarantees.
  • Publication
    Piazza: Data Management Infrastructure for Semantic Web Applications
    (2003-05-20) Halevy, Alon Y; Ives, Zachary G; Mork, Peter; Tatarinov, Igor
    The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, which maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.
  • Publication
    Sharing Work in Keyword Search Over Databases
    (2011-01-01) Jacob, Marie; Ives, Zachary G
    An important means of allowing non-expert end-users to pose ad hoc queries — whether over single databases or data integration systems — is through keyword search. Given a set of keywords, the query processor finds matches across different tuples and tables. It computes and executes a set of relational sub-queries whose results are combined to produce the k highest ranking answers. Work on keyword search primarily focuses on single-database, single-query settings: each query is answered in isolation, despite possible overlap between queries posed by different users or at different times; and the number of relevant tables is assumed to be small, meaning that sub-queries can be processed without using cost-based methods to combine work. As we apply keyword search to support ad hoc data integration queries over scientific or other databases on the Web, we must reuse and combine computation. In this paper, we propose an architecture that continuously receives sets of ranked keyword queries, and seeks to reuse work across these queries. We extend multiple query optimization and continuous query techniques, and develop a new query plan scheduling module we call the ATC (based on its analogy to an air traffic controller). The ATC manages the flow of tuples among a multitude of pipelined operators, minimizing the work needed to return the top-k answers for all queries. We also develop techniques to manage the sharing and reuse of state as queries complete and input data streams are exhausted. We show the effectiveness of our techniques in handling queries over real and synthetic data sets.
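A recurring idea across several of the publications above (the update-exchange and recursive view-maintenance work in particular) is incremental maintenance of derived data with derivability tracking: a derived tuple is retracted only when its last remaining derivation disappears. As a loose, self-contained illustration of that idea — not the papers' actual algorithms, which track far richer provenance than counts — a counting-based join view over two hypothetical relations R(a, b) and S(b, c) might look like:

```python
from collections import defaultdict

class CountingJoinView:
    """Toy counting-based incremental maintenance of the view R(a,b) JOIN S(b,c).

    Each view tuple carries a derivation count; a deletion from a base
    relation decrements the counts of the tuples it helped derive, and a
    view tuple is removed only when its count reaches zero.
    """

    def __init__(self):
        self.left = defaultdict(int)   # multiplicities of R(a, b) tuples
        self.right = defaultdict(int)  # multiplicities of S(b, c) tuples
        self.view = defaultdict(int)   # derivation counts of (a, c) tuples

    def insert_left(self, a, b):
        self.left[(a, b)] += 1
        for (b2, c), n in self.right.items():
            if b2 == b:
                self.view[(a, c)] += n

    def insert_right(self, b, c):
        self.right[(b, c)] += 1
        for (a, b2), n in self.left.items():
            if b2 == b:
                self.view[(a, c)] += n

    def delete_left(self, a, b):
        self.left[(a, b)] -= 1
        for (b2, c), n in self.right.items():
            if b2 == b:
                self.view[(a, c)] -= n
                if self.view[(a, c)] == 0:
                    del self.view[(a, c)]  # last derivation gone: retract

    def tuples(self):
        return set(self.view)
```

For example, inserting R(1, 'x') twice and S('x', 9) once yields the view tuple (1, 9) with count 2; deleting one copy of R(1, 'x') leaves the tuple derivable, and only deleting the second copy retracts it.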