Database Research Group (CIS)

Document Type

Working Paper

Date of this Version

October 2002


Postprint version. Proceedings of PLANX 2002, October 2002, 10 pages. URL:

At the time of publication, author Todd Green was affiliated with Xyleme SA. Currently (February 2007), he is a student in the Department of Computer and Information Sciences at the University of Pennsylvania.


We describe a toolkit for highly scalable XML data processing, consisting of two components. The first is a collection of stand-alone XML tools, such as sorting, aggregation, nesting, and unnesting, that can be chained to express more complex restructurings. The second is a highly scalable XPath processor for XML streams that can be used to develop scalable solutions for XML stream applications. In this paper we discuss the tools and some of the techniques we used to achieve high scalability. The toolkit is freely available as an open-source project.



Date Posted: 23 February 2007

This document has been peer reviewed.