Adapting to Source Properties in Processing Data Integration Queries

Ives, Zachary G; Halevy, Alon Y; Weld, Daniel S

Penn collection

Departmental Papers (CIS)

Permalink

https://repository.upenn.edu/handle/20.500.14332/6135

Abstract

An effective query optimizer finds a query plan that exploits the characteristics of the source data. In data integration, little is known in advance about sources’ properties, which necessitates the use of adaptive query processing techniques to adjust query processing on-the-fly. Prior work in adaptive query processing has focused on compensating for delays and adjusting for mis-estimated cardinality or selectivity values. In this paper, we present a generalized architecture for adaptive query processing and introduce a new technique, called adaptive data partitioning (ADP), which is based on the idea of dividing the source data into regions, each executed by different, complementary plans. We show how this model can be applied in novel ways to not only correct for underestimated selectivity and cardinality values, but also to discover and exploit order in the source data, and to detect and exploit source data that can be effectively pre-aggregated. We experimentally compare a number of alternative strategies and show that our approach is effective.
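The partitioning idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`hash_join`, `partitioned_join`), the fixed split points, and the choice of two hash-join variants as the "complementary plans" are all assumptions made for illustration. The final cross-region step is also added here as an assumption, so that the combined output equals the join over the full inputs.

```python
from collections import defaultdict


def hash_join(left, right):
    """Hash join on the first field: build a table on `left`, probe with `right`."""
    table = defaultdict(list)
    for l in left:
        table[l[0]].append(l)
    # table.get avoids inserting empty entries for keys that do not match
    return [(l, r) for r in right for l in table.get(r[0], [])]


def partitioned_join(r, s, split_r, split_s):
    """Join r and s by splitting each input into two regions (hypothetical splits).

    Each region is handled by a different plan; a final step combines
    tuples across regions so that no result is lost.
    """
    r0, r1 = r[:split_r], r[split_r:]
    s0, s1 = s[:split_s], s[split_s:]

    # Region 0, plan A: build on R, probe with S.
    out = hash_join(r0, s0)

    # Region 1, plan B: build on S, probe with R, then restore (r, s) order.
    out += [(r_t, s_t) for (s_t, r_t) in hash_join(s1, r1)]

    # Cross-region step (an assumption of this sketch): pair region 0 of one
    # input with region 1 of the other.
    out += hash_join(r0, s1)
    out += hash_join(r1, s0)
    return out


r = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]

single_plan = hash_join(r, s)
two_regions = partitioned_join(r, s, split_r=2, split_s=1)
print(sorted(single_plan) == sorted(two_regions))
```

In an adaptive setting the split points would be chosen at run time from observed source behaviour rather than fixed in advance; the sketch only shows that region-wise plans plus a cross-region step reproduce the single-plan result.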

Date of presentation

2004-06-01

Conference name

2004 ACM SIGMOD International Conference on Management of Data

Conference dates

2023-05-16T22:25:09.000

Comments

Copyright ACM, 2004. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pages 395-406. Publisher URL: http://doi.acm.org/10.1145/1007568.1007613

Collection

Presentations