Programming Constructs for Unstructured Data

Penn collection: IRCS Technical Reports Series

We investigate languages for querying and transforming unstructured data, by which we mean languages that can be used without knowledge of the structure (schema) of the database. Such data can be represented using labeled trees, as suggested by ACeDB (A C. elegans Database), a database system popular with biologists, and more recently by Tsimmis, a system developed at Stanford for heterogeneous data integration. The approach we take is to extend structural recursion to labeled trees. This poses some interesting problems. First, it is no longer "flat" structural recursion, so the usual syntactic forms and optimizations for collection types such as lists, bags, and sets may not be relevant. Second, we shall want to examine the possibility that the values we are manipulating may be cyclic. It is common in ACeDB, and generally in object-oriented databases, for objects to refer to each other, allowing the possibility of arbitrarily "deep" queries. Of course, such cyclic structures are usually constructed through the use of a reference/pointer type; however, query languages are insensitive to these object identities and perform automatic dereferencing. We therefore want to understand which programs are well defined when we are allowed to make unbounded searches in the database.
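The two problems named in the abstract can be illustrated with a minimal Python sketch. It is not the report's own syntax or calculus: the names `srec` and `reachable_labels`, the edge-list encoding of trees, and the sample data are all assumptions made for illustration. The first function folds over a finite labeled tree without consulting any schema; the second shows why cyclic data needs extra care, using a visited set so that an unbounded search still terminates.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

# Assumed encoding: a labeled tree is a list of (label, subtree) edges,
# and a leaf is the empty list. No schema is consulted anywhere.
Tree = List[Tuple[str, Any]]


def srec(tree: Tree,
         f: Callable[[str, Any], Any],
         combine: Callable[[Any, Any], Any],
         empty: Any) -> Any:
    """Structural recursion over a finite labeled tree.

    For every edge, f receives the label and the result already computed
    for the subtree; the per-edge results are merged with combine.
    This only terminates on acyclic (finite) trees.
    """
    result = empty
    for label, subtree in tree:
        result = combine(result, f(label, srec(subtree, f, combine, empty)))
    return result


# Assumed encoding for possibly cyclic data: node id -> outgoing
# (label, target node id) edges, so references are followed by id.
Graph = Dict[str, List[Tuple[str, str]]]


def reachable_labels(graph: Graph, root: str) -> Set[str]:
    """Collect every label reachable from root, even through cycles.

    The visited set plays the role of cycle detection: each node is
    expanded at most once, so the search is well defined on cyclic data.
    """
    seen: Set[str] = set()
    labels: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for label, target in graph.get(node, []):
            labels.add(label)
            stack.append(target)
    return labels


# Hypothetical sample data.
people_tree: Tree = [("person", [("name", []), ("friend", [("name", [])])])]
people_graph: Graph = {
    "a": [("friend", "b")],
    "b": [("friend", "a"), ("name", "n")],  # "b" refers back to "a"
}

all_labels = srec(people_tree, lambda lab, rec: {lab} | rec,
                  lambda x, y: x | y, set())
depth = srec(people_tree, lambda _lab, rec: 1 + rec, max, 0)
cyclic_labels = reachable_labels(people_graph, "a")
```

Calling `srec` directly on an unfolding of `people_graph` would recurse forever, which is the kind of ill-defined program the abstract is concerned with; the visited set is only one possible way to restore termination, chosen here for brevity.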

University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-95-06.