Date of this Version
We investigate languages for querying and transforming unstructured data, by which we mean languages that can be used without knowledge of the structure (schema) of the database. Such data can be represented using labeled trees, as suggested by ACeDB (A C. elegans Database), a database system popular with biologists, and more recently by Tsimmis, a system developed at Stanford for heterogeneous data integration. The approach we take is to extend structural recursion to labeled trees. This poses some interesting problems. First, it is no longer "flat" structural recursion, so the usual syntactic forms and optimizations for collection types such as lists, bags, and sets may not be relevant. Second, we shall want to examine the possibility that the values we are manipulating may be cyclic. It is common in ACeDB, and generally in object-oriented databases, for objects to refer to each other, allowing the possibility of arbitrarily "deep" queries. Of course, such cyclic structures are usually constructed through the use of a reference/pointer type; however, query languages are insensitive to these object identities and perform automatic dereferencing. We therefore want to understand which programs are well defined when we are allowed to make unbounded searches in the database.
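As an informal illustration of the setting described above (not the formalism developed in the paper), the following Python sketch encodes edge-labeled data as a mapping from node identifiers to outgoing labeled edges, so that cyclic values can be written down. The encoding, the names `Graph`, `relabel`, `reachable_labels`, and `EXAMPLE_DB`, and the sample labels are all assumptions made for this example. Both functions traverse the data without consulting any schema, and both remember visited nodes so that the traversal terminates on cyclic input.

```python
from typing import Callable, Dict, List, Set, Tuple

# Assumed encoding: each node id maps to a list of (edge label, child node id).
# Node ids stand in for object identities, which lets the data be cyclic.
Graph = Dict[str, List[Tuple[str, str]]]

# Hypothetical sample data: "g1" and "l1" refer to each other, forming a cycle.
EXAMPLE_DB: Graph = {
    "root": [("gene", "g1")],
    "g1": [("name", "n1"), ("locus", "l1")],
    "l1": [("gene", "g1")],
    "n1": [],
}


def relabel(graph: Graph, root: str, f: Callable[[str], str]) -> Graph:
    """Apply f to every edge label reachable from root, keeping the shape."""
    out: Graph = {}

    def visit(node: str) -> None:
        if node in out:  # already transformed; stop here so cycles terminate
            return
        out[node] = []
        for label, child in graph.get(node, []):
            out[node].append((f(label), child))
            visit(child)

    visit(root)
    return out


def reachable_labels(graph: Graph, root: str) -> Set[str]:
    """Collect every edge label reachable from root (an unbounded search)."""
    seen: Set[str] = set()
    labels: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:  # skip nodes already explored
            continue
        seen.add(node)
        for label, child in graph.get(node, []):
            labels.add(label)
            stack.append(child)
    return labels
```

For instance, `reachable_labels(EXAMPLE_DB, "root")` gathers labels at every depth, and `relabel(EXAMPLE_DB, "root", str.upper)` rewrites every label while preserving the cycle between `g1` and `l1`. Without the visited checks, neither function would terminate on this input, which is the kind of well-definedness question the abstract raises.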
Date Posted: 14 September 2006