Data annotations, provenance, and archiving
Abstract
This dissertation examines the problem of data provenance and two main issues related to provenance: annotation and archiving. The provenance of a piece of data is a description of its origins. Our contribution is the distinction between two kinds of provenance: why-provenance and where-provenance. The why-provenance of a piece of output data is the set of all witnesses to why that piece of data exists in the output. Where-provenance describes which pieces of source data contribute to a piece of output data. We show that why-provenance and where-provenance can be computed by generating a new query from the original query and applying the new query to the same database. Provenance is related to view updates. In particular, where-provenance is related to the annotation placement problem, and why-provenance is related to the view deletion problem. When an annotation is placed on a piece of data in the output, we wish to attach the annotation back to the source. The right source to which to attach the annotation is one that will not unnecessarily spread that annotation to other output data. The annotation placement problem is to find the source at which to attach the annotation so that it spreads to the smallest number of other pieces of view data. Our results show that there is a dichotomy in the complexity of the annotation placement problem, depending on the type of query used to generate the view. The view deletion problem is concerned with finding the right sources to delete in order to delete a piece of view data. Our results also show that there is a dichotomy in the complexity of the view deletion problem, depending on the type of query used to generate the view. Moreover, computing why-provenance and where-provenance is intractable in general. We have developed a technique for specifying key constraints for hierarchical data that generalizes the way keys are specified in relational databases. (Abstract shortened by UMI.)
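As an illustration only, the following minimal Python sketch shows the two kinds of provenance and the annotation placement problem on a toy join-and-project view. It is not the dissertation's method: the abstract describes computing provenance by generating a new query from the original one, whereas this sketch simply records provenance while evaluating the view. The relations EMP and DEPT, their contents, the tuple identifiers, and the function names are all hypothetical. A witness is simplified here to a combination of source tuples that jointly produce an output tuple, and the placement search is brute force.

```python
from collections import defaultdict

# Hypothetical source relations, keyed by an assumed tuple id.
EMP = {"e1": {"name": "Ann", "dept": "CS"},
       "e2": {"name": "Bob", "dept": "CS"},
       "e3": {"name": "Ann", "dept": "Math"}}
DEPT = {"d1": {"dept": "CS", "city": "Philadelphia"},
        "d2": {"dept": "Math", "city": "Philadelphia"}}


def view_with_provenance(emp, dept):
    """Evaluate V(name, city) = project(emp JOIN dept on dept), recording provenance."""
    why = defaultdict(set)    # output tuple -> set of witnesses (sets of tuple ids)
    where = defaultdict(set)  # (output tuple, attribute) -> source locations copied from
    for eid, e in emp.items():
        for did, d in dept.items():
            if e["dept"] != d["dept"]:
                continue
            out = (e["name"], d["city"])
            # Each joining pair of source tuples is one witness for `out`.
            why[out].add(frozenset({eid, did}))
            # Each output field is copied from exactly one source field per witness.
            where[(out, "name")].add(("EMP", eid, "name"))
            where[(out, "city")].add(("DEPT", did, "city"))
    return dict(why), dict(where)


def best_annotation_source(where, target):
    """Brute-force placement: choose the source location of `target`
    whose annotation would reach the fewest other view locations."""
    def spread(src):
        return sum(1 for loc, srcs in where.items()
                   if loc != target and src in srcs)
    # Sorting first makes tie-breaking deterministic.
    return min(sorted(where[target]), key=spread)


WHY, WHERE = view_with_provenance(EMP, DEPT)
```

In this toy instance the view tuple ("Ann", "Philadelphia") has two witnesses, one through each department. An annotation on its city field could be attached to either DEPT tuple, but attaching it to d1 would also spread to Bob's row, so the brute-force search prefers d2.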
Subject Area
Computer science
Recommended Citation
Tan, Wang-Chiew, "Data annotations, provenance, and archiving" (2002). Dissertations available from ProQuest. AAI3073058.
https://repository.upenn.edu/dissertations/AAI3073058