Database Research Group (CIS)

Document Type

Conference Paper

Date of this Version

June 2007

Comments

Postprint version. Copyright ACM, 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in Proceedings of the 2007 ACM SIGMOD International Conference on the Management of Data (SIGMOD/PODS 2007), June 2007, 4 pages.
Publisher URL: http://sigmod07.riit.tsinghua.edu.cn/acceptedPaperForSIGMOD.shtml

Abstract

One of the most elusive goals of structured data management has been sharing among large, heterogeneous populations: while data integration [4, 10] and exchange [3] are gradually being adopted by corporations or small confederations, little progress has been made in integrating broader communities. Yet the need for large-scale sharing of heterogeneous data is increasing: most of the sciences, particularly biology and astronomy, have become data-driven as they have attempted to tackle larger questions. The field of bioinformatics, in particular, has seen a plethora of different databases emerge: each is focused on a related but subtly different collection of organisms (e.g., CryptoDB, TIGR, FlyNome), genes (GenBank, GeneDB), proteins (UniProt, RCSB Protein Databank), diseases (OMIM, GeneDis), and so on. Such communities have a pressing need to interlink their heterogeneous databases in order to facilitate scientific discovery.

Keywords

Data exchange, data integration, data sharing, reconciliation, schema mappings

Date Posted: 04 June 2007