Technical Reports (CIS)

Document Type

Technical Report

Date of this Version

December 1971


University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-72-20.


Currently, the structure of stored data is determined implicitly by the software which accesses and processes it. This data structuring technology has given rise to two outstanding problems in data processing: first, communicating the exact structure of data to users and machines, and second, interchanging the data itself.

This work contributed to overcoming these problems by developing a technique for describing the structure of data explicitly and independently of machines and software. This aim is reflected in the following objectives:

1) To understand data structures by developing a model which not only characterizes current data organizational techniques, but also provides a framework within which new data structures can be defined.

2) To use this model to develop a language which can explicitly describe the organization of data.

3) To use this model to study how data can be converted from one structure to another, with a view towards developing a method for describing data conversions.

This model unifies the diverse area of data structures by including the record, file and storage organizations of data. Furthermore, the model clearly separates at each level the conceptual part, which is the logical structure imposed by a user, from the implementation part, which is the method by which the logical structure is encoded as a binary representation. This separation leads to a straightforward mapping of a file onto storage. From an analysis of the state-of-the-art in data organization, it is shown that the model can express not only the data structures of current systems, but also certain useful generalizations which might well be produced by future systems.
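The separation of conceptual and implementation parts can be illustrated with a minimal sketch. It is not taken from the report: the field names `ID` and `NAME`, the two layouts, and the helpers `encode` and `decode` are assumptions made for illustration, and Python's `struct` module stands in for whatever encoding rules the model actually defines. The same logical record is given two different binary representations, and both decode back to the same logical structure.

```python
import struct

# Conceptual part: the logical record, independent of any encoding (hypothetical fields).
logical = {"ID": 7, "NAME": "Ada"}

# Implementation part: two hypothetical layouts, each a struct format plus a field order.
big_endian = (">H4s", ("ID", "NAME"))      # 16-bit big-endian integer, then 4 bytes of text
little_endian = ("<4sI", ("NAME", "ID"))   # 4 bytes of text, then 32-bit little-endian integer


def encode(record, layout):
    """Encode a logical record as bytes according to a layout."""
    fmt, fields = layout
    values = [
        record[name].encode("ascii") if isinstance(record[name], str) else record[name]
        for name in fields
    ]
    return struct.pack(fmt, *values)


def decode(data, layout):
    """Recover the logical record from bytes written with the same layout."""
    fmt, fields = layout
    record = {}
    for name, raw in zip(fields, struct.unpack(fmt, data)):
        # Text fields come back as null-padded bytes; integers come back unchanged.
        record[name] = raw.rstrip(b"\x00").decode("ascii") if isinstance(raw, bytes) else raw
    return record


encoded_big = encode(logical, big_endian)
encoded_little = encode(logical, little_endian)
```

The two byte strings differ in length and content, yet `decode` returns the same logical record from each, which is the sense in which the logical structure is independent of its encoding.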

The model treats records as hierarchies of data items. These hierarchies are expressed by production systems based on a generalized notion of attribute-value pairs. Files are treated as graphs whose nodes are records. The connections between the nodes are expressed using a powerful production system which generates criteria for determining when any two records are to be linked. The structure of storage is generalized as a hierarchy since this structure is common to all storage media. The mapping of files onto storage is expressed in terms of rules for distributing the records of the file within the slots provided by the storage structure.
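The record and file levels described above can be sketched as follows. This is an illustrative approximation, not the report's production systems: the `Item` class, the `build_file` function, the `EMPLOYEE` record shape, and the `reports_to` criterion are all hypothetical. A record is a hierarchy of attribute-value items, and a file is a graph whose edges are generated by testing a linking criterion on every ordered pair of records.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Item:
    """A data item: an attribute with either a value (leaf) or sub-items (group)."""
    attribute: str
    value: object = None
    children: list = field(default_factory=list)

    def find(self, attribute):
        """Return the value of the first leaf named `attribute`, searching depth-first."""
        if self.attribute == attribute and not self.children:
            return self.value
        for child in self.children:
            found = child.find(attribute)
            if found is not None:
                return found
        return None


def build_file(records, criterion):
    """Treat the file as a graph: link record i to record j when the criterion holds."""
    return {
        (i, j)
        for i, j in permutations(range(len(records)), 2)
        if criterion(records[i], records[j])
    }


def employee(name, manager):
    """Build a hypothetical two-field employee record."""
    return Item("EMPLOYEE", children=[Item("NAME", name), Item("MANAGER", manager)])


def reports_to(a, b):
    """Hypothetical linking criterion: a is linked to b when b is a's manager."""
    return a.find("MANAGER") == b.find("NAME")


records = [employee("Ada", None), employee("Bob", "Ada"), employee("Cy", "Ada")]
edges = build_file(records, reports_to)
```

Here `edges` holds the pairs `(1, 0)` and `(2, 0)`: the second and third records are each linked to the first. Swapping in a different criterion yields a different file structure over the same records.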

The language, called Generalized Data Description Language (GDDL), is a realization of the model, and thus possesses all its capabilities. In particular, the language can describe the implementation of any aspect of a file as being dependent on any other aspect. The language is presented in an appendix in the form of a user's manual.

Data conversion is studied in terms of transforming data in one structure to another, where both structures are expressed in the model. This study shows that to fully specify a conversion, the relationship between the components of the two structures must be specified. In certain cases, such as the reorganization of a file, this relationship can be very elaborate. A method is developed for specifying such relationships, and a corresponding capability is built into GDDL. Thus, GDDL has the ability not only to fully describe data structures, but also to specify data conversion.
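The idea that a conversion is specified by relating the components of two structures can be sketched in a few lines. This is a simplified illustration under stated assumptions, not GDDL's conversion facility: records are modelled as nested dictionaries, and the `convert` function, the field names, and the `relationship` table are hypothetical. Each entry in the relationship names a component of the target structure and the source component that supplies its value.

```python
def convert(record, relationship):
    """Build a target record by filling each target component from its source component."""
    target = {}
    for target_path, source_path in relationship.items():
        # Walk down the source hierarchy to fetch the value.
        value = record
        for key in source_path:
            value = value[key]
        # Walk down the target hierarchy, creating groups as needed, then store the value.
        node = target
        for key in target_path[:-1]:
            node = node.setdefault(key, {})
        node[target_path[-1]] = value
    return target


# Hypothetical source record and component-to-component relationship.
source = {"NAME": {"FIRST": "Ada", "LAST": "Lovelace"}, "DEPT": "CIS"}
relationship = {
    ("EMPLOYEE", "SURNAME"): ("NAME", "LAST"),
    ("EMPLOYEE", "GIVEN"): ("NAME", "FIRST"),
    ("UNIT",): ("DEPT",),
}

converted = convert(source, relationship)
```

The result regroups the same values under the target hierarchy. A file reorganization would need a richer relationship than this one-to-one field mapping, which is the elaborate case the study refers to.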



Date Posted: 23 January 2008