XML query reformulation over mixed and redundant storage

Alin Bernard Deutsch, University of Pennsylvania


XML is widely accepted as the standard for data exchange between businesses on the Internet. However, most corporations publish only selected portions of their proprietary business data as XML documents, and even then only virtually, that is, by exposing a schema against which queries can be formulated. In order to be answered, such XML queries must be reformulated as queries on the actual proprietary data. Existing XML publishing systems conform to the Global-As-View data integration scenario, in which the correspondence between published (global) and proprietary (local) data is given by expressing the former as a view of the latter. However, an ideal publishing system should enhance query execution by allowing for redundancy in storage, which enables multiple reformulations, some potentially cheaper to execute than others. Redundancy requires the complementary, Local-As-View approach to data integration, in which the proprietary data is expressed as a view of the published data. We are led to consider XML publishing systems according to a combined Global-and-Local-As-View approach. Building such a system means facing the following challenges. Existing reformulation algorithms developed for the Global-As-View scenario are said to perform composition-with-views, and they are seemingly unrelated to reformulation algorithms for the Local-As-View scenario, which do rewriting-with-views. Moreover, it turns out that picking the optimal reformulation among the possible candidates requires query minimization. We present MARS, a system implementing a novel reformulation algorithm which achieves the combined effect of rewriting-with-views, composition-with-views and minimization. The algorithm works even when the proprietary storage is a mix of XML documents and relational databases. We prove a completeness theorem which guarantees that under certain conditions, our algorithm will find a minimal reformulation if one exists. Moreover, we study the complexity of the problem and identify conditions under which this algorithm achieves the best complexity bounds. We report on experiments that show the practicality of the approach.

Subject Area

Computer science

Recommended Citation

Deutsch, Alin Bernard, "XML query reformulation over mixed and redundant storage" (2002). Dissertations available from ProQuest. AAI3072989.