Departmental Papers (CIS)

Date of this Version

June 2001

Document Type

Journal Article


Copyright 2001 IEEE. Reprinted from Bulletin of the Technical Committee on Data Engineering, IEEE Computer Society, Volume 24, Issue 2, June 2001, pages 20-26.

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

NOTE: At the time of publication, author Zachary Ives was affiliated with the University of Washington. Currently (April 2005), he is a faculty member in the Department of Computer and Information Science at the University of Pennsylvania.


Although XML was originally envisioned as a replacement for HTML on the web, to this point it has instead been used primarily as a format for on-demand interchange of data between applications and enterprises. The web is rather sparsely populated with static XML documents, but nearly every data management application today can export XML data. There is great interest in integrating such exported data across applications and administrative boundaries, and as a result, efficient techniques for integrating XML data across local- and wide-area networks are an important research focus.

In this paper, we provide an overview of the Tukwila data integration system, which is based on the first XML query engine designed specifically for processing network-bound XML data sources. In contrast to previous approaches, which must read, parse, and often store XML data before querying it, the Tukwila XML engine can return query results even as the data is streaming into the system. Tukwila features a new system architecture that extends relational query processing techniques, such as pipelining and adaptive query processing, into the XML realm. We compare the focus of the Tukwila project to that of other XML research systems, and then we present our system architecture and novel query operators, such as the x-scan operator. We conclude with a description of our current research directions in extending XML-based adaptive query processing.
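
The difference between reading and parsing a whole document before querying it and returning results while the data is still arriving can be illustrated with a small sketch. This is not Tukwila's x-scan operator or any part of the Tukwila code; it only uses Python's standard incremental XML parser to show the general idea. The function name `stream_titles`, the `title` element, and the sample chunks are all hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator

# Hypothetical sample: one XML document split at arbitrary points,
# as it might arrive over a network connection.
SAMPLE_CHUNKS = [
    "<books><book><title>Tuk",
    "wila</title></book><book><ti",
    "tle>X-Scan</title></book>",
    "</books>",
]


def stream_titles(chunks: Iterable[str], tag: str = "title") -> Iterator[str]:
    """Yield the text of each <tag> element as soon as its end tag has arrived.

    Illustrative only: a real streaming query operator would evaluate
    path expressions and produce variable bindings, not match one tag name.
    """
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        # Hand the parser whatever data has arrived so far.
        parser.feed(chunk)
        # Drain the elements that were completed by this chunk.
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                yield elem.text or ""
            # Release the finished element's contents to keep memory bounded.
            elem.clear()
    # Raises ParseError if the stream ended before the document was complete.
    parser.close()


if __name__ == "__main__":
    for title in stream_titles(SAMPLE_CHUNKS):
        print(title)
```

Because `stream_titles` is a generator, a consumer receives the first match after only the first two chunks have been read, before the rest of the document exists locally. That is the property the abstract attributes to pipelined processing of network-bound sources.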



Date Posted: 09 May 2005

This document has been peer reviewed.