Orchestra: Facilitating Collaborative Data Sharing

One of the most elusive goals of structured data management has been data sharing among large, heterogeneous populations. Data integration [4, 10] and data exchange [3] are gradually being adopted by corporations and small confederations, but little progress has been made in integrating broader communities. Yet the need for large-scale sharing of heterogeneous data is increasing: most of the sciences, particularly biology and astronomy, have become data-driven as they have attempted to tackle larger questions. The field of bioinformatics, in particular, has seen a plethora of different databases emerge, each focused on a related but subtly different collection of organisms (e.g., CryptoDB, TIGR, FlyNome), genes (GenBank, GeneDB), proteins (UniProt, RCSB Protein Databank), diseases (OMIM, GeneDis), and so on. Such communities have a pressing need to interlink their heterogeneous databases in order to facilitate scientific discovery.

Date of presentation

2007-06-11

Conference name

Database Research Group (CIS)

Conference dates

2023-05-17T00:43:04.000

Comments

Postprint version. Copyright ACM, 2007. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in Proceedings of the 2007 ACM SIGMOD International Conference on the Management of Data (SIGMOD/PODS 2007), June 2007, 4 pages. Publisher URL: http://sigmod07.riit.tsinghua.edu.cn/acceptedPaperForSIGMOD.shtml

Collection

Presentations