IRCS Technical Reports Series
Thesis or dissertation
Traditionally, query optimizers assume a direct mapping between the logical entities modeling the data (e.g., relations) and the physical entities storing the data (e.g., indexes), with each physical entity corresponding to exactly one logical entity. This assumption no longer holds in non-traditional applications (object-oriented and semi-structured databases, data integration), which often exhibit a mismatch between the logical view of the data and its actual storage. In addition, there is an increased amount of redundancy, even at the logical level, which can greatly enhance optimization opportunities if exploited. To address these issues, we propose a novel architecture for query optimization in which physical optimization is performed at the level of query rewriting. As a consequence, the other important aspect of query optimization, semantic optimization (which takes advantage of the redundancy at the logical level), can be naturally incorporated. The optimizer can then make global decisions based on both semantic and physical knowledge, leading to plans of higher quality than those obtainable by a traditional two-level approach.
The main idea is to describe the relationship between the physical and logical schemas by constraints having the same syntactic form as the semantic constraints that describe the logical schema. Many physical structures, such as indexes, materialized views, access support relations, and GMAPs, can be captured in this way. The search space of query plans is then defined and enumerated in a novel way. First, the input query is rewritten by chasing it with the constraints into a "universal" plan that integrates all the relevant physical and logical structures. In a second phase (the backchase), minimal plans are produced by exhaustively eliminating the various combinations of redundancies from the universal plan.
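The two phases can be illustrated on a deliberately tiny case. The Python sketch below handles only flat relational conjunctive queries (not the path-conjunctive class), encodes each constraint as a hypothetical `(body, head)` pair of atoms, and uses a brute-force backchase that tries subqueries of the universal plan in order of increasing size. The function names, the relations `R` and `S`, the view `V` and the index `I` are all illustrative assumptions, not the thesis's implementation.

```python
from itertools import combinations, count

# An atom is a tuple (relation, arg1, arg2, ...); every argument is a variable name.
# A constraint is a pair (body, head) read as: body implies (for some fresh vars) head.
_fresh = count()  # global counter so invented variables never collide across chases


def homomorphisms(atoms, target, mapping=None):
    """Yield every extension of `mapping` sending each atom in `atoms` onto an atom of `target`."""
    mapping = dict(mapping or {})
    if not atoms:
        yield mapping
        return
    first, rest = atoms[0], atoms[1:]
    for cand in target:
        if cand[0] != first[0] or len(cand) != len(first):
            continue
        m = dict(mapping)
        if all(m.setdefault(var, val) == val for var, val in zip(first[1:], cand[1:])):
            yield from homomorphisms(rest, target, m)


def chase(atoms, constraints, max_rounds=100):
    """Add constraint heads until every constraint holds; applied to a query, this yields the universal plan."""
    atoms = set(atoms)
    for _ in range(max_rounds):
        fired = False
        for body, head in constraints:
            for h in list(homomorphisms(list(body), atoms)):
                if next(homomorphisms(list(head), atoms, h), None) is not None:
                    continue  # the head already holds for this match
                ext = dict(h)
                for atom in head:
                    for var in atom[1:]:
                        if var not in ext:  # existential variable: invent a fresh one
                            ext[var] = f"_n{next(_fresh)}"
                atoms |= {(a[0], *(ext[v] for v in a[1:])) for a in head}
                fired = True
        if not fired:
            return atoms
    raise RuntimeError("chase did not terminate within max_rounds")


def backchase(query, head_vars, constraints):
    """Enumerate the minimal subqueries of the universal plan that are equivalent to `query`."""
    universal = sorted(chase(query, constraints))
    identity = {v: v for v in head_vars}
    minimal = []
    for size in range(1, len(universal) + 1):  # smallest candidates first
        for sub in combinations(universal, size):
            if any(set(m) <= set(sub) for m in minimal):
                continue  # contains an equivalent plan already found, so not minimal
            if not set(head_vars) <= {v for a in sub for v in a[1:]}:
                continue  # every output variable must still be bound
            # sub uses a subset of the universal plan's atoms, so the query is contained in sub;
            # sub is contained in the query iff the query maps into chase(sub) fixing the outputs.
            if next(homomorphisms(list(query), chase(sub, constraints), identity), None) is not None:
                minimal.append(sub)
    return minimal


# Hypothetical schema: view V(x,z) materializes R(x,y) joined with S(y,z); I is an index on R.
Q = [("R", "x", "y"), ("S", "y", "z")]
CONSTRAINTS = [
    ((("R", "x", "y"), ("S", "y", "z")), (("V", "x", "z"),)),  # view inclusion
    ((("V", "x", "z"),), (("R", "x", "y"), ("S", "y", "z"))),  # view definition (y existential)
    ((("R", "x", "y"),), (("I", "x", "y"),)),  # the index covers R
    ((("I", "x", "y"),), (("R", "x", "y"),)),  # the index holds only R tuples
]
for plan in backchase(Q, ["x", "z"], CONSTRAINTS):
    print(plan)
```

Here the chase adds `V(x,z)` and `I(x,y)` to the query, and the backchase returns three minimal plans: a scan of the view alone, the index joined with `S`, and the original join of `R` and `S`. The index and view are described by ordinary constraints, so the same chase handles both.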
We proved the completeness of the method for "path-conjunctive" queries, views, and constraints. This class is expressive enough to handle complex objects and dictionaries (which model OO classes and index-like structures). It enjoys the same properties regarding containment, chase, constraint implication, and rewriting with views that hold in the conjunctive relational case. It is therefore a natural candidate for further theoretical and practical development of query optimization in complex environments.
We have implemented our method and examined how far it can be pushed in terms of the complexity of schemas and queries. We used our optimization framework in two main sets of experiments. In the first, we measured the performance of the chase/backchase as a procedure for enumerating minimal plans; no cost information is required in this case. Since the universal plan can often become large, we developed "stratification" techniques that reduce the enumeration problem to several subproblems, each with a smaller universal plan. This resembles the dynamic programming approach of traditional optimizers. The experimental results demonstrate that the method is practical, i.e., feasible and worthwhile. In the second set, we combined the chase/backchase optimization with a cost-based pruning strategy in order to avoid enumerating all minimal plans. The experimental results show a considerable performance improvement over the first setting. The cost-based version of the chase/backchase optimizer is shown to be practical even when no stratification is possible.
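To see why cost-based pruning avoids enumerating every minimal plan, consider the following Python sketch. It assumes a monotone cost function (adding atoms to a plan never makes it cheaper) and uses plain branch-and-bound over subqueries of the universal plan; this pruning rule is an illustrative assumption and not necessarily the strategy used in the thesis. `is_equivalent` is a hypothetical stand-in for the chase-based equivalence test, and the relations and per-relation costs are made up.

```python
def cheapest_plan(universal, is_equivalent, cost):
    """Branch-and-bound search for the cheapest subquery of `universal` equivalent to the query.

    Assumes `cost` is monotone: a superset of atoms never costs less.
    Returns (best_cost, best_plan, number_of_equivalence_checks).
    """
    atoms = sorted(universal)
    best_cost, best_plan, checks = None, None, 0

    def extend(plan, start):
        nonlocal best_cost, best_plan, checks
        c = cost(plan)
        if best_cost is not None and c >= best_cost:
            return  # prune: by monotonicity, no extension of `plan` can be cheaper
        if plan:
            checks += 1  # each check would normally require a chase
            if is_equivalent(plan):
                best_cost, best_plan = c, plan
                return  # extensions cost at least as much, so stop this branch
        for i in range(start, len(atoms)):
            extend(plan + (atoms[i],), i + 1)

    extend((), 0)
    return best_cost, best_plan, checks


# Hypothetical universal plan and its known minimal plans (view V, index I, relations R and S).
UNIVERSAL = [("I", "x", "y"), ("R", "x", "y"), ("S", "y", "z"), ("V", "x", "z")]
MINIMAL = [{("V", "x", "z")}, {("I", "x", "y"), ("S", "y", "z")}, {("R", "x", "y"), ("S", "y", "z")}]
COST = {"I": 2, "R": 10, "S": 4, "V": 5}  # hypothetical per-relation scan costs


def is_equivalent(plan):
    # Stand-in for the chase-based test: a plan is equivalent iff it contains a minimal plan.
    return any(m <= set(plan) for m in MINIMAL)


def cost(plan):
    return sum(COST[atom[0]] for atom in plan)


print(cheapest_plan(UNIVERSAL, is_equivalent, cost))
# → (5, (('V', 'x', 'z'),), 6)
```

Only 6 of the 15 nonempty subqueries reach the (expensive) equivalence test; the remaining candidates are discarded because their cost already matches or exceeds the best plan found so far.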
Date Posted: 07 August 2006
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-01-02.