Ives, Zachary
Loading...
Email Address
ORCID
Disciplines
Computer Sciences
relationships.isProjectOf
relationships.isOrgUnitOf
Position
Assistant Professor
Introduction
Zachary Ives is an Assistant Professor at the University of Pennsylvania and an Associated Faculty Member of the Penn Center for Bioinformatics. He received his B.S. from Sonoma State University and his PhD from the University of Washington. His research interests include data integration, peer-to-peer models of data sharing, processing and security of heterogeneous sensor streams, and data exchange between autonomous systems. He is a recipient of the NSF CAREER award and a member of the DARPA Computer Science Study Panel.
Research Interests
Databases, data integration, peer-to-peer computing, sensor networks
Collection
43 results
Search Results
Now showing 1 - 10 of 43
Publication Sideways Information Passing for Push-Style Query Processing(2007-11-20) Ives, Zachary G; Taylor, Nicholas EIn many modern data management settings, data is queried from a central node or nodes, but is stored at remote sources. In such a setting it is common to perform "push-style" query processing, using multithreaded pipelined hash joins and bushy query plans to compute parts of the query in parallel; to avoid idling, the CPU can switch between them as delays are encountered. This works well for simple select-project-join queries, but increasingly, Web and integration applications require more complex queries with multiple joins and even nested subqueries. As we demonstrate in this paper, push-style execution of complex queries can be improved substantially via sideways information passing; push-style queries provide many opportunities for information passing that have not been studied in the past literature. We present adaptive information passing, a general runtime decisionmaking technique for reusing intermediate state from one query subresult to prune and reduce computation of other subresults. We develop two alternative schemes for performing adaptive information passing, which we study in several settings under a variety of workloads.Publication Update Exchange With Mappings and Provenance(2007-11-27) Green, Todd J; Karvounarakis, Grigoris; Ives, Zachary G; Tannen, ValWe consider systems for data sharing among heterogeneous peers related by a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to ask queries over related data from other peers as well. To achieve this, every peer’s updates propagate along the mappings to the other peers. However, this update exchange is filtered by trust conditions — expressing what data and sources a peer judges to be authoritative — which may cause a peer to reject another’s updates. In order to support such filtering, updates carry provenance information. These systems target scientific data sharing applications, and their general principles and architecture have been described in [21]. In this paper we present methods for realizing such systems. Specifically, we extend techniques from data integration, data exchange, and incremental view maintenance to propagate updates along mappings; we integrate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance; we discuss strategies for implementing our techniques in conjunction with an RDBMS; and we experimentally demonstrate the viability of our techniques in the Orchestra prototype system. This technical report supersedes the version which appeared in VLDB 2007 [17] and corrects certain technical claims regarding the semantics of our system (see errata in Sections [3.1] and [4.1.1]).Publication Interviewing During a Tight Job Market(2002-09-01) Ives, Zachary G; Luo, QiongVarious tips for interviewing for PhD graduates, seeking an academic position in a research university in Asia or North America are discussed. It is suggested that having the dissertation done before interviews gives a large degree of relief on one's mind. It is found that to be practical about job research package and keep a close eye on applications increases the confidence level. It is also observed that the questions during the talk provides opportunity to clarify and strengthen the talk and show this ability during the interview.Publication MOSAIC: Multiple Overlay Selection and Intelligent Composition(2007-10-24) Loo, Boon Thau; Ives, Zachary G; Mao, Yun; Smith, Jonathan MToday, the most effective mechanism for remedying shortcomings of the Internet, or augmenting it with new networking capabilities, is to develop and deploy a new overlay network. This leads to the problem of multiple networking infrastructures, each with independent advantages, and each developed in isolation. A greatly preferable solution is to have a single infrastructure under which new overlays can be developed, deployed, selected, and combined according to application and administrator needs. MOSAIC is an extensible infrastructure that enables not only the specification of new overlay networks, but also dynamic selection and composition of such overlays. MOSAIC provides declarative networking: it uses a unified declarative language (Mozlog) and runtime system to enable specification of new overlay networks, as well as their composition in both the control and data planes. Importantly, it permits dynamic compositions with both existing overlay networks and legacy applications. This paper demonstrates the dynamic selection and composition capabilities of MOSAIC with a variety of declarative overlays: an indirection overlay that supports mobility (i3), a resilient overlay (RON), and a transport-layer proxy. Using a remarkably concise specification, MOSAIC provides the benefits of runtime composition to simultaneously deliver application-aware mobility, NAT traversal and reliability with low performance overhead, demonstrated with deployment and measurement on both a local cluster and the PlanetLab testbed.Publication SmartCIS: Integrating Digital and Physical Environments(2010-01-01) Liu, Mengmeng; Mihaylov, Svilen; Ives, Zachary G; Bao, Zhuowei; Loo, Boon Thau; Jacob, Marie; Guha, SudiptoPublication Dynamic Join Optimization in Multi-Hop Wireless Sensor Networks(2010-01-01) Mihaylov, Svilen; Ives, Zachary G; Jacob, Marie; Guha, SudiptoTo enable smart environments and self-tuning data centers, we are developing the Aspen system for integrating physical sensor data, as well as stream data coming from machine logical state, and database or Web data from the Internet. A key component of this system is a query processor optimized for limited-bandwidth, possibly battery-powered devices with multiple hop wireless radio communications. This query processor is given a portion of a data integration query, possibly including joins among sensors, to execute. Several recent papers have developed techniques for computing joins in sensors, but these techniques are static and are only appropriate for specific join selectivity ratios. We consider the problem of dynamic join optimization for sensor networks, developing solutions that employ cost modeling, as well as adaptive learning and self-tuning heuristics to choose the best algorithm under real and variable selectivity values. We focus on in-network join computation, but our architecture extends to other approaches (and we compare against these). We develop basic techniques assuming selectivities are uniform and known in advance, and optimization can be done on a pairwise basis; we then extend the work to handle joins between multiple pairs, when selectivities are not fully known. We experimentally validate our work at scale using standard datasets.Publication A Substrate for In-Network Sensor Data Integration(2008-08-24) Mihaylov, Svilen; Jacob, Marie; Ives, Zachary G; Guha, SudiptoWith the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication between sets of nodes performing distributed joins in sensor networks. We develop a communication model that enables in-network join at a variety of locations, and which facilitates coordination among nodes in order to make optimization decisions. While we defer a discussion of the optimizer to future work, we experimentally compare a variety of strategies, including at-base and in-network joins. Results show significant performance gains versus prior work, as well as opportunities for optimization.Publication Reliable Storage and Querying for Collaborative Data Sharing Systems(2010-03-01) Taylor, Nicolas; Ives, Zachary GThe sciences, business confederations, and medicine urgently need infrastructure for sharing data and updates among collaborators’ constantly changing, heterogeneous databases. The ORCHESTRA system addresses these needs by providing data transformation and exchange capabilities across DBMSs, combined with archived storage of all database versions. ORCHESTRA adopts a peer-to-peer architecture in which individual collaborators contribute data and compute resources, but where there may be no dedicated server or compute cluster. We study how to take the combined resources of ORCHESTRA’s autonomous nodes, as well as PCs from “cloud” services such as Amazon EC2, and provide reliable, cooperative storage and query processing capabilities. We guarantee reliability and correctness as in distributed or cloud DBMSs, while also supporting cross-domain deployments, replication, and transparent failover, as provided by peer-to-peer systems. Our storage and query subsystem supports dozens to hundreds of nodes across different domains, possibly including nodes on cloud services. Our contributions include (1) a modified data partitioning substrate that combines cluster and peer-to-peer techniques, (2) an efficient implementation of replicated, reliable, versioned storage of relational data, (3) new query processing and indexing techniques over this storage layer, and (4) a mechanism for incrementally recomputing query results that ensures correct, complete, and duplicate-free results in the event of node failure during query execution. We experimentally validate query processing performance, failure detection methods, and the performance benefits of incremental recovery in a prototype implementation.Publication Sensor Network Security: More Interesting Than You Think(2006-07-31) Anand, Madhukar; Cronin, Eric; Sherr, Micah; Blaze, Matthew A; Ives, Zachary G; Lee, InsupWith the advent of low-power wireless sensor networks, a wealth of new applications at the interface of the real and digital worlds is emerging. A distributed computing platform that can measure properties of the real world, formulate intelligent inferences, and instrument responses, requires strong foundations in distributed computing, artificial intelligence, databases, control theory, and security. Before these intelligent systems can be deployed in critical infrastructures such as emergency rooms and powerplants, the security properties of sensors must be fully understood. Existing wisdom has been to apply the traditional security models and techniques to sensor networks. However, sensor networks are not traditional computing devices, and as a result, existing security models and methods are ill suited. In this position paper, we take the first steps towards producing a comprehensive security model that is tailored for sensor networks. Incorporating work from Internet security, ubiquitous computing, and distributed systems, we outline security properties that must be considered when designing a secure sensor network. We propose challenges for sensor networks – security obstacles that, when overcome, will move us closer to decreasing the divide between computers and the physical world.Publication Orchestra: Facilitating Collaborative Data Sharing(2007-06-11) Green, Todd J; Karvounarakis, Grigoris; Taylor, Nicholas E; Biton, Olivier; Ives, Zachary G; Tannen, ValOne of the most elusive goals of structured data management has been sharing among large, heterogeneous populations: while data integration [4, 10] and exchange [3] are gradually being adopted by corporations or small confederations, little progress has been made in integrating broader communities. Yet the need for large-scale sharing of heterogeneous data is increasing: most of the sciences, particularly biology and astronomy, have become data-driven as they have attempted to tackle larger questions. The field of bioinformatics, in particular, has seen a plethora of different databases emerge: each is focused on a related but subtly different collection of organisms (e.g., CryptoDB, TIGR, FlyNome), genes (GenBank, GeneDB), proteins (UniProt, RCSB Protein Databank), diseases (OMIM, GeneDis), and so on. Such communities have a pressing need to interlink their heterogeneous databases in order to facilitate scientific discovery.

