Ives, Zachary

Disciplines
Computer Sciences
Position
Assistant Professor
Introduction
Zachary Ives is an Assistant Professor at the University of Pennsylvania and an Associated Faculty Member of the Penn Center for Bioinformatics. He received his B.S. from Sonoma State University and his Ph.D. from the University of Washington. His research interests include data integration, peer-to-peer models of data sharing, processing and security of heterogeneous sensor streams, and data exchange between autonomous systems. He is a recipient of the NSF CAREER award and a member of the DARPA Computer Science Study Panel.
Research Interests
Databases, data integration, peer-to-peer computing, sensor networks

Search Results

Now showing 1 - 10 of 43
  • Publication
    SmartCIS: Integrating Digital and Physical Environments
    (2010-01-01) Liu, Mengmeng; Mihaylov, Svilen; Ives, Zachary G; Bao, Zhuowei; Loo, Boon Thau; Jacob, Marie; Guha, Sudipto
  • Publication
    Dynamic Join Optimization in Multi-Hop Wireless Sensor Networks
    (2010-01-01) Mihaylov, Svilen; Ives, Zachary G; Jacob, Marie; Guha, Sudipto
To enable smart environments and self-tuning data centers, we are developing the Aspen system for integrating physical sensor data, as well as stream data coming from machine logical state, and database or Web data from the Internet. A key component of this system is a query processor optimized for limited-bandwidth, possibly battery-powered devices with multi-hop wireless radio communications. This query processor is given a portion of a data integration query, possibly including joins among sensors, to execute. Several recent papers have developed techniques for computing joins in sensor networks, but these techniques are static and are only appropriate for specific join selectivity ratios. We consider the problem of dynamic join optimization for sensor networks, developing solutions that employ cost modeling, as well as adaptive learning and self-tuning heuristics to choose the best algorithm under real and variable selectivity values. We focus on in-network join computation, but our architecture extends to other approaches (and we compare against these). We develop basic techniques assuming selectivities are uniform and known in advance, and optimization can be done on a pairwise basis; we then extend the work to handle joins between multiple pairs, when selectivities are not fully known. We experimentally validate our work at scale using standard datasets.
  • Publication
    Automatically Incorporating New Sources in Keyword Search-Based Data Integration
    (2010-06-01) Talukdar, Partha; Ives, Zachary G; Pereira, Fernando
    Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experimental data that needs to be annotated, revised, interlinked, and made available to other scientists. From the perspective of the user, this can be a major headache as the data they seek may initially be spread across many databases in need of integration. Worse, even if users are given a solution that integrates the current state of the source databases, new data sources appear with new data items of interest to the user. Here we build upon recent ideas for creating integrated views over data sources using keyword search techniques, ranked answers, and user feedback [32] to investigate how to automatically discover when a new data source has content relevant to a user’s view — in essence, performing automatic data integration for incoming data sets. The new architecture accommodates a variety of methods to discover related attributes, including label propagation algorithms from the machine learning community [2] and existing schema matchers [11]. The user may provide feedback on the suggested new results, helping the system repair any bad alignments or increase the cost of including a new source that is not useful. We evaluate our approach on actual bioinformatics schemas and data, using state-of-the-art schema matchers as components. We also discuss how our architecture can be adapted to more traditional settings with a mediated schema.
  • Publication
    Provenance in ORCHESTRA
    (2010-01-01) Green, Todd J; Ives, Zachary G; Karvounarakis, Grigoris; Tannen, Val
Sharing structured data today requires agreeing on a standard schema, then mapping and cleaning all of the data to achieve a single queriable mediated instance. However, for settings in which structured data is collaboratively authored by a large community, such as in the sciences, there is seldom consensus about how the data should be represented, what is correct, and which sources are authoritative. Moreover, such data is dynamic: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new architecture and consistency model for such settings, based on the needs of data sharing in the life sciences. A key aspect of ORCHESTRA’s design is that the provenance of data is recorded at every step. In this paper we describe ORCHESTRA’s provenance model and architecture, emphasizing its integral use of provenance in enforcing trust policies and translating updates efficiently.
  • Publication
    Update Exchange With Mappings and Provenance
    (2007-11-27) Green, Todd J; Karvounarakis, Grigoris; Ives, Zachary G; Tannen, Val
    We consider systems for data sharing among heterogeneous peers related by a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to ask queries over related data from other peers as well. To achieve this, every peer’s updates propagate along the mappings to the other peers. However, this update exchange is filtered by trust conditions — expressing what data and sources a peer judges to be authoritative — which may cause a peer to reject another’s updates. In order to support such filtering, updates carry provenance information. These systems target scientific data sharing applications, and their general principles and architecture have been described in [21]. In this paper we present methods for realizing such systems. Specifically, we extend techniques from data integration, data exchange, and incremental view maintenance to propagate updates along mappings; we integrate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance; we discuss strategies for implementing our techniques in conjunction with an RDBMS; and we experimentally demonstrate the viability of our techniques in the Orchestra prototype system. This technical report supersedes the version which appeared in VLDB 2007 [17] and corrects certain technical claims regarding the semantics of our system (see errata in Sections [3.1] and [4.1.1]).
  • Publication
    Reliable Storage and Querying for Collaborative Data Sharing Systems
    (2010-03-01) Taylor, Nicolas; Ives, Zachary G
    The sciences, business confederations, and medicine urgently need infrastructure for sharing data and updates among collaborators’ constantly changing, heterogeneous databases. The ORCHESTRA system addresses these needs by providing data transformation and exchange capabilities across DBMSs, combined with archived storage of all database versions. ORCHESTRA adopts a peer-to-peer architecture in which individual collaborators contribute data and compute resources, but where there may be no dedicated server or compute cluster. We study how to take the combined resources of ORCHESTRA’s autonomous nodes, as well as PCs from “cloud” services such as Amazon EC2, and provide reliable, cooperative storage and query processing capabilities. We guarantee reliability and correctness as in distributed or cloud DBMSs, while also supporting cross-domain deployments, replication, and transparent failover, as provided by peer-to-peer systems. Our storage and query subsystem supports dozens to hundreds of nodes across different domains, possibly including nodes on cloud services. Our contributions include (1) a modified data partitioning substrate that combines cluster and peer-to-peer techniques, (2) an efficient implementation of replicated, reliable, versioned storage of relational data, (3) new query processing and indexing techniques over this storage layer, and (4) a mechanism for incrementally recomputing query results that ensures correct, complete, and duplicate-free results in the event of node failure during query execution. We experimentally validate query processing performance, failure detection methods, and the performance benefits of incremental recovery in a prototype implementation.
  • Publication
    Sensor Network Security: More Interesting Than You Think
    (2006-07-31) Anand, Madhukar; Cronin, Eric; Sherr, Micah; Blaze, Matthew A; Ives, Zachary G; Lee, Insup
    With the advent of low-power wireless sensor networks, a wealth of new applications at the interface of the real and digital worlds is emerging. A distributed computing platform that can measure properties of the real world, formulate intelligent inferences, and instrument responses, requires strong foundations in distributed computing, artificial intelligence, databases, control theory, and security. Before these intelligent systems can be deployed in critical infrastructures such as emergency rooms and powerplants, the security properties of sensors must be fully understood. Existing wisdom has been to apply the traditional security models and techniques to sensor networks. However, sensor networks are not traditional computing devices, and as a result, existing security models and methods are ill suited. In this position paper, we take the first steps towards producing a comprehensive security model that is tailored for sensor networks. Incorporating work from Internet security, ubiquitous computing, and distributed systems, we outline security properties that must be considered when designing a secure sensor network. We propose challenges for sensor networks – security obstacles that, when overcome, will move us closer to decreasing the divide between computers and the physical world.
  • Publication
    Interviewing During a Tight Job Market
    (2002-09-01) Ives, Zachary G; Luo, Qiong
Various tips are discussed for PhD graduates seeking an academic position at a research university in Asia or North America. It is suggested that having the dissertation done before interviews provides considerable peace of mind. It is found that being practical about the job search package and keeping a close eye on applications increases confidence. It is also observed that questions during the job talk provide an opportunity to clarify and strengthen the talk and to demonstrate this ability during the interview.
  • Publication
    MOSAIC: Multiple Overlay Selection and Intelligent Composition
    (2007-10-24) Loo, Boon Thau; Ives, Zachary G; Mao, Yun; Smith, Jonathan M
    Today, the most effective mechanism for remedying shortcomings of the Internet, or augmenting it with new networking capabilities, is to develop and deploy a new overlay network. This leads to the problem of multiple networking infrastructures, each with independent advantages, and each developed in isolation. A greatly preferable solution is to have a single infrastructure under which new overlays can be developed, deployed, selected, and combined according to application and administrator needs. MOSAIC is an extensible infrastructure that enables not only the specification of new overlay networks, but also dynamic selection and composition of such overlays. MOSAIC provides declarative networking: it uses a unified declarative language (Mozlog) and runtime system to enable specification of new overlay networks, as well as their composition in both the control and data planes. Importantly, it permits dynamic compositions with both existing overlay networks and legacy applications. This paper demonstrates the dynamic selection and composition capabilities of MOSAIC with a variety of declarative overlays: an indirection overlay that supports mobility (i3), a resilient overlay (RON), and a transport-layer proxy. Using a remarkably concise specification, MOSAIC provides the benefits of runtime composition to simultaneously deliver application-aware mobility, NAT traversal and reliability with low performance overhead, demonstrated with deployment and measurement on both a local cluster and the PlanetLab testbed.
  • Publication
Reconciling Differences
    (2011-01-01) Ives, Zachary G; Green, Todd J.; Tannen, Val
In this paper we study a problem motivated by the management of changes in databases. It turns out that several such change scenarios, e.g., the separately studied problems of view maintenance (propagation of data changes) and view adaptation (propagation of view definition changes), can be unified as instances of query reformulation using views, provided that support for the relational difference operator exists in the context of query reformulation. Exact query reformulation using views in positive relational languages is well understood, and has a variety of applications in query optimization and data sharing. Unfortunately, most questions about queries become undecidable in the presence of difference (or negation), whether we use the foundational set semantics or the more practical bag semantics. We present a new way of managing this difficulty by defining a novel semantics, Z-relations, where tuples are annotated with positive or negative integers. Z-relations conveniently represent data, insertions, and deletions in a uniform way, and can apply deletions with the union operator (deletions are tuples with negative counts). We show that under Z-semantics relational algebra (RA) queries have a normal form consisting of a single difference of positive queries, and this leads to the decidability of their equivalence. We provide a sound and complete algorithm for reformulating RA queries, including queries with difference, over Z-relations. Additionally, we show how to support standard view maintenance scenarios over Z-relations.
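To make the Z-relations idea concrete, here is a minimal sketch (in Python, as a hypothetical illustration rather than code from the paper) that represents a Z-relation as a mapping from tuples to integer annotations. Under this semantics, union simply adds annotations, a deletion is a tuple with a negative count, and a delta propagated through a join can be applied to a view with the same union operator.

```python
from collections import Counter

# A Z-relation maps each tuple to an integer annotation:
# positive counts are (multiset) insertions, negative counts are deletions.

def z_union(r, s):
    """Union under Z-semantics: annotations add tuple-wise."""
    out = Counter(r)
    for t, n in s.items():
        out[t] += n
    return {t: n for t, n in out.items() if n != 0}  # drop zero-annotated tuples

def z_difference(r, s):
    """Difference under Z-semantics: subtract annotations (counts may go negative)."""
    return z_union(r, {t: -n for t, n in s.items()})

def z_join(r, s):
    """Natural join on the first attribute; annotations multiply."""
    out = Counter()
    for (a, b), n in r.items():
        for (a2, c), m in s.items():
            if a == a2:
                out[(a, b, c)] += n * m
    return dict(out)

# Example: incrementally maintain the view V = R join S when only R changes.
R = {("x", 1): 1, ("y", 2): 1}
S = {("x", "p"): 1, ("y", "q"): 1}
V = z_join(R, S)                        # {('x', 1, 'p'): 1, ('y', 2, 'q'): 1}

delta_R = {("x", 1): -1, ("z", 3): 1}   # delete ('x', 1), insert ('z', 3)
delta_V = z_join(delta_R, S)            # the delta propagates through the join
print(z_union(V, delta_V))              # {('y', 2, 'q'): 1}
```

Because deletions are just tuples with negative annotations, insertions and deletions propagate through the same operators, which is the property the paper exploits for view maintenance and for reformulating queries that involve difference.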