Buneman, Peter

  • Publication
    Programming Constructs for Unstructured Data
    (1995-03-01) Buneman, Peter; Davidson, Susan B; Suciu, Dan
    We investigate languages for querying and transforming unstructured data, by which we mean languages that can be used without knowledge of the structure (schema) of the database. Such data can be represented using labeled trees, as suggested by ACeDB (A C. elegans Database), a database system popular with biologists, and more recently in Tsimmis, a system developed at Stanford for heterogeneous data integration. The approach we take is to extend structural recursion to labeled trees. This poses some interesting problems. First, it is no longer "flat" structural recursion, so the usual syntactic forms and optimizations for collection types such as lists, bags, and sets may not be relevant. Second, we shall want to examine the possibility that the values we are manipulating may be cyclic. It is common in ACeDB, and generally in object-oriented databases, for objects to refer to each other, allowing the possibility of arbitrarily "deep" queries. Of course, such cyclic structures are usually constructed through the use of a reference/pointer type; however, query languages are insensitive to these object identities and perform automatic dereferencing. We therefore want to understand which programs are well defined when we are allowed to make unbounded searches in the database.
  • Publication
    Adding Structure to Unstructured Data
    (1996-12-19) Buneman, Peter; Davidson, Susan B; Fernandez, Mary; Suciu, Dan
    We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization.
  • Publication
    Adding Structure to Unstructured Data
    (1997-01-08) Buneman, Peter; Davidson, Susan B; Fernandez, Mary; Suciu, Dan
    We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization.
  • Publication
    A Data Transformation System for Biological Data Sources
    (1995-09-11) Buneman, Peter; Davidson, Susan B; Hart, Kyle; Overton, Chris; Wong, L.
    Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.
  • Publication
    Beyond XML Query Languages
    (1998-11-18) Buneman, Peter; Deutsch, Alin; Fan, Wenfei; Liefke, Hartmut; Sahuguet, Arnaud; Tan, Wang-Chiew
    A query language is essential if XML is to serve effectively as an exchange medium for large data sets. The design of query languages for XML is in its infancy, and the choice of a standard may be governed more by user acceptance than by any understanding of underlying principles. One would hope that expressive power, performance, and compatibility with other languages will be considered in choosing among alternatives, but it is likely that several contenders will coexist for some time. It is worth observing that, during the 20-year development of relational query languages, several competing languages were developed, and even today there are several relational query language standards. In spite of this, a great deal of technology was developed that was independent of the surface syntax of a query language. This included technology "below" the language, such as efficient execution models, and work "above" the level of the language, such as techniques for view definition and maintenance, triggers, etc. At Penn we are working on some of these language-independent issues, and we summarize them here. They include execution and data models to support XML and semistructured query languages; the use of schemas and constraints in optimizing XML query languages; and tools for extracting data from existing sources and presenting it as XML.
  • Publication
    Towards A Query Language for Annotation Graphs
    (2000-07-01) Bird, Steven; Buneman, Peter; Tan, Wang-Chiew
    The multidimensional, heterogeneous, and temporal nature of speech databases raises interesting challenges for representation and query. Recently, annotation graphs have been proposed as a general-purpose representational framework for speech databases. Typical queries on annotation graphs require path expressions similar to those used in semistructured query languages. However, the underlying model is rather different from the customary graph models for semistructured data: the graph is acyclic and unrooted, and both temporal and inclusion relationships are important. We develop a query language and describe optimization techniques for an underlying relational representation.
  • Publication
    Semantics of Database Transformations
    (1998) Davidson, Susan; Buneman, Peter; Kosky, Anthony S
    Database transformations arise in many different settings, including database integration, evolution of database systems, and implementing user views and data entry tools. This paper surveys approaches that have been taken to problems in these settings, assesses their strengths and weaknesses, and develops requirements on a formal model for specifying and implementing database transformations. We also consider the problem of ensuring the correctness of database transformations. In particular, we demonstrate that the usefulness of correctness conditions such as information preservation is hindered by the interactions of transformations and database constraints, and by the limited expressive power of established database constraint languages. We conclude that more general notions of correctness are required, and that there is a need for a uniform formalism for expressing both database transformations and constraints, and for reasoning about their interactions. Finally, we introduce WOL, a declarative language for specifying and implementing database transformations and constraints. We briefly describe the WOL language and its semantics, and argue that it addresses many of the requirements on a formalism for dealing with general database transformations.
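The abstract of "Programming Constructs for Unstructured Data" asks when structural recursion stays well defined on cyclic, edge-labeled data. The following is a minimal Python sketch, not the paper's language or semantics. It assumes a hypothetical representation (a dict from node id to a list of `(label, target)` pairs) and a hypothetical edge-local function `relabel`. Because the function is applied once per reachable node rather than by unfolding the graph into a tree, the computation terminates even when objects refer to each other.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical representation: node id -> list of (edge label, target node id).
Graph = dict[str, list[tuple[str, str]]]


def map_edges(graph: Graph, root: str,
              relabel: Callable[[str], Optional[str]]) -> Graph:
    """Apply an edge-local function to every edge reachable from root.

    relabel returns a new label, or None to drop the edge (and stop
    recursing through it). Each node is processed at most once, so the
    result is well defined even when the graph contains cycles.
    """
    out: Graph = {}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        out[node] = []
        for label, target in graph.get(node, []):
            new_label = relabel(label)
            if new_label is None:
                continue  # edge dropped: do not traverse it
            out[node].append((new_label, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return out


# A cyclic database: g1 and g2 refer to each other via "related".
db: Graph = {
    "r": [("gene", "g1")],
    "g1": [("name", "n1"), ("related", "g2")],
    "g2": [("name", "n2"), ("related", "g1")],
}

renamed = map_edges(db, "r", lambda l: "see_also" if l == "related" else l)
pruned = map_edges(db, "r", lambda l: None if l == "related" else l)
```

A naive recursive definition that rebuilds the subtree under each edge would loop forever on the `g1`/`g2` cycle. Here node identities are kept, so the cycle in the input simply reappears as a cycle in `renamed`.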
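"Adding Structure to Unstructured Data" represents both data and schema as edge-labeled graphs and defines conformance between them. The following is a minimal Python sketch under a stated assumption: conformance is modeled as a simulation from the data graph into a schema graph whose edges carry predicates on labels. The names `greatest_simulation`, `conforms`, and the dict-based graph encodings are hypothetical. The greatest simulation is computed by starting from all node pairs and discarding pairs that violate the simulation condition until nothing changes.

```python
from typing import Callable

Data = dict[str, list[tuple[str, str]]]
# Hypothetical schema graph: edges carry predicates on labels instead of labels.
Schema = dict[str, list[tuple[Callable[[str], bool], str]]]


def _nodes(graph: dict) -> set[str]:
    # All nodes that appear as a source or as a target.
    return set(graph) | {t for edges in graph.values() for _, t in edges}


def greatest_simulation(db: Data, schema: Schema) -> set[tuple[str, str]]:
    """Largest relation R such that for every (u, s) in R and every data edge
    u -l-> v there is a schema edge s -p-> t with p(l) true and (v, t) in R."""
    rel = {(u, s) for u in _nodes(db) for s in _nodes(schema)}
    changed = True
    while changed:  # refine until a fixpoint is reached
        changed = False
        for u, s in list(rel):
            ok = all(
                any(pred(label) and (v, t) in rel for pred, t in schema.get(s, []))
                for label, v in db.get(u, [])
            )
            if not ok:
                rel.discard((u, s))
                changed = True
    return rel


def conforms(db: Data, db_root: str, schema: Schema, schema_root: str) -> bool:
    return (db_root, schema_root) in greatest_simulation(db, schema)


schema: Schema = {
    "S": [(lambda l: l == "gene", "G")],
    "G": [(lambda l: l == "name", "N"), (lambda l: l == "related", "G")],
}
db: Data = {
    "r": [("gene", "g1")],
    "g1": [("name", "n1"), ("related", "g2")],
    "g2": [("name", "n2"), ("related", "g1")],
}
# Same data plus an edge label the schema does not allow.
bad: Data = {**db, "g1": db["g1"] + [("color", "c1")]}
```

Here `conforms(db, "r", schema, "S")` holds, because the cyclic `related` edges are absorbed by the schema's self-loop on `G`. `conforms(bad, "r", schema, "S")` fails, because no schema edge accepts the `color` label. This naive refinement loop favors clarity over the efficient algorithms the abstract alludes to.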
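"Towards A Query Language for Annotation Graphs" stresses that temporal inclusion matters when querying speech annotations. The following is a minimal Python sketch of one such inclusion query, for example "which words contain the phone k". The arcs are kept as a flat list of tuples, loosely in the spirit of the relational representation the abstract mentions. It makes a simplifying assumption the paper does not: every node carries a time offset, so inclusion reduces to comparing offsets. The `Arc` type, the `containing` function, and the sample data are hypothetical.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    start: str   # start node id
    end: str     # end node id
    kind: str    # annotation type, e.g. "word" or "phone"
    label: str


def containing(arcs: list[Arc], times: dict[str, float],
               outer_kind: str, inner_kind: str, inner_label: str) -> list[str]:
    """Labels of outer_kind arcs whose time span includes some inner_kind arc
    labelled inner_label. Assumes every node has a time offset."""
    inner = [b for b in arcs if b.kind == inner_kind and b.label == inner_label]
    return [
        a.label
        for a in arcs
        if a.kind == outer_kind
        and any(times[a.start] <= times[b.start] and times[b.end] <= times[a.end]
                for b in inner)
    ]


# Hypothetical annotation: two words over two phones (offsets in seconds).
times = {"n0": 0.0, "n1": 0.30, "n2": 0.55, "n3": 0.90}
arcs = [
    Arc("n0", "n1", "word", "the"),
    Arc("n1", "n3", "word", "cat"),
    Arc("n1", "n2", "phone", "k"),
    Arc("n2", "n3", "phone", "ae"),
]
```

With this data, `containing(arcs, times, "word", "phone", "k")` selects only `"cat"`. Handling nodes without offsets would require reasoning over the graph's structural ordering instead, which this sketch does not attempt.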