Archiving Scientific Data

Penn collection
Departmental Papers (CIS)
Subject
algorithms
documentation
theory
Author
Buneman, Peter
Tajima, Keishi
Tan, Wang-Chiew
Abstract

We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps: an element that appears in multiple versions of the database is stored only once, along with a compact description of the versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures, where one wishes to track the sequence of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that provides meaningful change descriptions and efficiently supports a variety of basic operations concerning the evolution of data, such as retrieving any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches, where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead compared with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, the resulting archive is itself in XML and can therefore directly leverage existing XML tools.
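
The Python sketch below illustrates the timestamping idea summarized in the abstract. It is a toy model under stated assumptions, not the authors' tool: each database version is assumed to be a nested dictionary whose keys identify elements, leaf values are assumed to be strings, and the names ArchiveNode, merge, retrieve, history and compact are hypothetical. Each keyed element is stored once together with the set of versions in which it appears, and a leaf whose text changes keeps each distinct text with its own version set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union

# Hypothetical input format (an assumption of this sketch): one database
# version as a keyed tree mapping each key to a nested subtree or a text value.
Tree = Dict[str, Union["Tree", str]]


@dataclass
class ArchiveNode:
    """One keyed element, stored once for all versions."""
    versions: Set[int] = field(default_factory=set)                  # its timestamp
    children: Dict[str, "ArchiveNode"] = field(default_factory=dict)
    texts: Dict[str, Set[int]] = field(default_factory=dict)         # text -> versions


def merge(node: ArchiveNode, data: Tree, v: int) -> None:
    """Add version v by matching elements on their keys (no diff is computed)."""
    node.versions.add(v)
    for key, sub in data.items():
        child = node.children.setdefault(key, ArchiveNode())
        if isinstance(sub, dict):
            merge(child, sub, v)
        else:
            child.versions.add(v)
            child.texts.setdefault(sub, set()).add(v)


def retrieve(node: ArchiveNode, v: int) -> Tree:
    """Rebuild version v by keeping only elements whose timestamp contains v."""
    out: Tree = {}
    for key, child in node.children.items():
        if v not in child.versions:
            continue
        text = next((t for t, ts in child.texts.items() if v in ts), None)
        out[key] = text if text is not None else retrieve(child, v)
    return out


def compact(versions: Set[int]) -> str:
    """Render a version set as ranges, e.g. {1, 2, 3, 5} -> '1-3,5'."""
    runs: List[List[int]] = []
    for v in sorted(versions):
        if runs and runs[-1][1] == v - 1:
            runs[-1][1] = v          # extend the current run
        else:
            runs.append([v, v])      # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def history(root: ArchiveNode, path: List[str]) -> str:
    """Temporal history of the element reached by following the keys in path."""
    node = root
    for key in path:
        node = node.children[key]
    return compact(node.versions)


if __name__ == "__main__":
    # Toy records, invented purely for illustration.
    archive = ArchiveNode()
    v1 = {"gene": {"BRCA1": {"chrom": "17"}}}
    v2 = {"gene": {"BRCA1": {"chrom": "17"}, "TP53": {"chrom": "17"}}}
    v3 = {"gene": {"TP53": {"chrom": "17p"}}}
    for i, version in enumerate([v1, v2, v3], start=1):
        merge(archive, version, i)

    assert retrieve(archive, 2) == v2
    print(history(archive, ["gene", "BRCA1"]))          # 1-2
    print(history(archive, ["gene", "TP53", "chrom"]))  # 2-3
```

In this sketch, retrieving a version is a single pass that filters on timestamps rather than replaying a chain of deltas. For simplicity it stores a full version set at every node and keeps the archive in memory; it does not show how such an archive would be serialized as XML.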

Date of presentation
2002-06-04
Conference name
Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data
Conference dates
2023-05-16T22:24:00.000
Comments
Copyright ACM, 2002. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 1-12. Publisher URL: http://doi.acm.org/10.1145/564691.564693 NOTE: At the time of publication, author Peter Buneman was affiliated with the University of Edinburgh. As of June 2007, he is a faculty member in the Department of Computer and Information Science at the University of Pennsylvania.