Efficient XPath query processing on stored and streaming XML data

Yi Chen, University of Pennsylvania


With XML emerging as the de facto standard format for data representation, efficient query evaluation on XML data is an important research challenge. Different applications require queries to be evaluated using different processing models. In some applications, such as those managing scientific data, XML data must be stored locally for query processing. In others, such as stock and traffic information feeds, XML data arrives continuously as a stream and requires on-line processing. In this thesis we discuss how to efficiently evaluate XPath queries on locally stored XML data and on XML streams. We also exploit an encoding format to speed up query processing. For locally stored XML data, a Bi-LAbeling based System (BLAS) is proposed to process XPath queries efficiently by leveraging well-developed relational technology. Compared to existing work on XPath processing using relational databases, BLAS translates an XPath query into a more efficient SQL query that contains fewer selections and joins and requires fewer disk accesses to execute. For streaming XML data, an XPath query engine (TwigM) is presented. We observe that XPath evaluation algorithms on XML streams must record a potentially exponential number of pattern matches. We propose a compact data structure to encode these pattern matches rather than storing them explicitly. We then design a polynomial-time algorithm that evaluates queries by probing the data structure lazily. An Encoded XML Processing system (EXPedite) is then proposed to further improve performance. It consists of a general and effective encoding scheme together with efficient parsing and query processing algorithms for encoded XML streams. Because of this encoding, processing queries over the encoded data is much more efficient than processing them over the original XML stream. We believe that the “one size fits all” concept of the traditional database model no longer suits the varied needs of today's applications. This thesis provides a comprehensive solution for XPath query processing and gives guidelines on how to choose the right techniques according to application requirements.

Subject Area

Computer science

Recommended Citation

Chen, Yi, "Efficient XPath query processing on stored and streaming XML data" (2005). Dissertations available from ProQuest. AAI3179714.