Path Constraints for Databases With or Without Schemas
This dissertation introduces a path constraint language and investigates its associated implication and finite implication problems. This path constraint language has proven useful in a variety of database contexts, ranging from semistructured data, as found for instance on the Web, to structured data, such as data in object-oriented databases. It is capable of expressing natural integrity constraints that are not only a fundamental part of the semantics of the data, but are also important in query optimization. Path constraint implication is investigated for two models for semistructured data: the semistructured data model and the deterministic data model. Databases in these models are unconstrained by any type system or schema. For the semistructured data model, it is shown that, despite the simple syntax of the constraint language, its associated implication problem is r.e. complete and its finite implication problem is co-r.e. complete. In light of these undecidability results, several decidable fragments of the constraint language are identified. These fragments suffice to express many important integrity constraints such as referential integrity, inverse relationships and local database constraints. For the deterministic data model, it is shown that the implication and finite implication problems for the path constraint language are finitely axiomatizable and decidable in cubic time. Path constraint implication is also studied for structured data, i.e., data constrained by a schema. In the context of three practical object-oriented data models, a number of complexity results on the implication and finite implication problems for the path constraint language are established. In addition, the interaction between path constraints and type systems is investigated. It is demonstrated that adding a type system to the data may in some cases simplify the analysis of path constraint implication, and in other cases make it harder.
More specifically, it is shown that there is a path constraint implication problem that is decidable in PTIME in the untyped context, but that becomes undecidable when a type system is added. On the other hand, there is an implication problem that is undecidable in the untyped context, but becomes not only decidable in cubic time but also finitely axiomatizable when a type system is imposed.