XML Vectorization: A Column-Based XML Storage Model

Choi, Byron; Buneman, Peter

Files

t13.pdf (441.32 KB)

Penn collection

Technical Reports (CIS)

Permalink

https://repository.upenn.edu/handle/20.500.14332/7380

Author

Choi, Byron

Buneman, Peter

Abstract

The usual method for storing tables in a relational database is to store each tuple contiguously in secondary storage. A simple alternative is to store the columns contiguously, so that a table is represented as a set of vectors all of the same length. It has been shown that such a representation performs well on queries requiring few columns. This paper reviews the shredding scheme used in XMill, an XML compressor, which represents a document as a set of files: one file describing the structure, and further files holding the character data found on designated paths (corresponding to the column data). We consider such a shredding as a storage model, XML vectorization, by presenting an indexing scheme and a physical algebra associated with a detailed cost model. We study query processing on vectorized XML, in particular XML join queries. In relational XML storage systems, XML join queries are often translated into a few relational join operations. The use of columns enables us to develop a fast join algorithm for vectorized XML based on two hash-based join algorithms. The important feature of the join algorithm is that its disk access is mostly sequential and data that are not needed are not read from disk. Experimental results demonstrate the effectiveness of the join algorithm for vectorized XML.
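
As a rough illustration of the idea described in the abstract, the following Python sketch splits a small document into a structure list and one text vector per root-to-node path, then joins two of those vectors with a simple in-memory hash join. It is a minimal sketch under stated assumptions, not the report's storage format or join algorithm: the function names `vectorize` and `hash_join`, the sample document, and the in-memory representation are all hypothetical, and the on-disk layout, indexing scheme, and cost model are not modelled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def vectorize(xml_text):
    """Split a document into a structure list and per-path text vectors.

    Hypothetical helper for illustration only; tail text and attributes
    are ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    structure = []               # tag open/close events, no character data
    vectors = defaultdict(list)  # path -> text values in document order

    def walk(node, path):
        path = f"{path}/{node.tag}"
        structure.append(("open", node.tag))
        text = (node.text or "").strip()
        if text:
            vectors[path].append(text)
        for child in node:
            walk(child, path)
        structure.append(("close", node.tag))

    walk(root, "")
    return structure, dict(vectors)


def hash_join(left, right):
    """Return (i, j) position pairs where left[i] == right[j]."""
    table = defaultdict(list)
    for i, value in enumerate(left):   # build phase on the left vector
        table[value].append(i)
    # Probe phase: scan the right vector sequentially.
    return [(i, j) for j, value in enumerate(right) for i in table.get(value, [])]


# Made-up sample document (an assumption, not taken from the report).
DOC = """<db>
  <book><title>A</title><author>Smith</author></book>
  <book><title>B</title><author>Jones</author></book>
  <person><name>Jones</name></person>
</db>"""

structure, vectors = vectorize(DOC)
# Only the two vectors named in the join are touched; the title vector is not read.
pairs = hash_join(vectors["/db/book/author"], vectors["/db/person/name"])
```

In this sketch the join reads only the two path vectors it names, which mirrors the abstract's point that data not needed by a query are not read.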

Publication date

2003-01-01

Comments

University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-03-17.

Collection

Reports