Adapting to Source Properties in Processing Data Integration Queries

Penn collection
Departmental Papers (CIS)
Author
Halevy, Alon Y
Weld, Daniel S
Abstract

An effective query optimizer finds a query plan that exploits the characteristics of the source data. In data integration, little is known in advance about sources’ properties, which necessitates the use of adaptive query processing techniques to adjust query processing on the fly. Prior work in adaptive query processing has focused on compensating for delays and adjusting for mis-estimated cardinality or selectivity values. In this paper, we present a generalized architecture for adaptive query processing and introduce a new technique, called adaptive data partitioning (ADP), which is based on the idea of dividing the source data into regions, each processed by a different, complementary plan. We show how this model can be applied in novel ways not only to correct for underestimated selectivity and cardinality values, but also to discover and exploit order in the source data, and to detect and exploit source data that can be effectively pre-aggregated. We experimentally compare a number of alternative strategies and show that our approach is effective.
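
The idea of dividing source data into regions handled by complementary plans can be illustrated with a small sketch. This is not the authors' implementation, and the record above does not describe their mechanism in detail; it only assumes one plausible reading of "exploiting order in the source data". Each input is split into an in-order region and an out-of-order region, the two in-order regions are joined with a merge join, and every other combination of regions is joined with a hash join. All function names (`split_by_order`, `merge_join`, `hash_join`, `partitioned_join`) are hypothetical.

```python
from collections import defaultdict


def split_by_order(rows, key):
    """Split rows into a non-decreasing region and a region of exceptions."""
    ordered, rest = [], []
    for row in rows:
        if not ordered or key(row) >= key(ordered[-1]):
            ordered.append(row)
        else:
            rest.append(row)
    return ordered, rest


def merge_join(left, right, key):
    """Equi-join two inputs that are each already sorted on key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Pair every left row in the run with every right row in the run.
            while i < len(left) and key(left[i]) == kl:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def hash_join(left, right, key):
    """Equi-join with no assumption about input order."""
    table = defaultdict(list)
    for l in left:
        table[key(l)].append(l)
    return [(l, r) for r in right for l in table.get(key(r), ())]


def partitioned_join(r, s, key):
    """Join r and s region by region, using a different plan per region."""
    r_ord, r_rest = split_by_order(r, key)
    s_ord, s_rest = split_by_order(s, key)
    out = merge_join(r_ord, s_ord, key)    # ordered x ordered: merge join
    out += hash_join(r_ord, s_rest, key)   # cross-region combinations
    out += hash_join(r_rest, s_ord, key)
    out += hash_join(r_rest, s_rest, key)  # exceptions x exceptions
    return out


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (5, "c"), (3, "d"), (6, "e")]
    s = [(2, "x"), (3, "y"), (1, "z"), (6, "w"), (6, "v")]
    for pair in sorted(partitioned_join(r, s, key=lambda t: t[0])):
        print(pair)
```

Because a join distributes over union, the four region-level joins together produce the same result as a single join over the complete inputs. The cross-region combinations cannot be skipped, or matches between an in-order tuple and an out-of-order tuple would be lost.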

Date of presentation
2004-06-01
Conference name
Departmental Papers (CIS)
Conference dates
2023-05-16T22:25:09.000
Comments
Copyright ACM, 2004. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pages 395-406. Publisher URL: http://doi.acm.org/10.1145/1007568.1007613