Technical Reports (CIS)

Document Type

Technical Report

Date of this Version

January 2003


University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-03-17.


The usual method for storing tables in a relational database is to store each tuple contiguously in secondary storage. A simple alternative is to store the columns contiguously, so that a table is represented as a set of vectors all of the same length. It has been shown that such a representation performs well on queries requiring few columns. This paper reviews the shredding scheme used in XMill, an XML compressor, which represents the document structure by using a set of files, consisting of a file describing the structure, and files describing the character data to be found on designated paths (corresponding to the column data). We consider such a shredding as a storage model –- XML vectorization –- by presenting an indexing scheme and a physical algebra associated with a detailed cost model. We study query processing on the XML vectorization, in particular the XML join queries. XML join queries are often translated into a few relational join operations in the relational-based XML storage systems. The use of columns enables us to develop a fast join algorithm for vectorized XML based on two hashbased join algorithms. The important feature of the join algorithm is that the disk access of the algorithm is mostly sequential and the data not needed are not read from disk. Experimental results demonstrate the effectiveness of the join algorithm for vectorized XML.



Date Posted: 04 August 2005