
Database Research Group (CIS)
Document Type
Conference Paper
Date of this Version
June 2006
Abstract
In many data sharing settings, such as within the biological and biomedical communities, global data consistency is not always attainable: different sites' data may be dirty, uncertain, or even controversial. Collaborators are willing to share their data, and in many cases they also want to selectively import data from others - but must occasionally diverge when they disagree about uncertain or controversial facts or values. For this reason, traditional data sharing and data integration approaches are not applicable, since they require a globally consistent data instance. Additionally, many of these approaches do not allow participants to make updates; if they do, concurrency control algorithms or inconsistency repair techniques must be used to ensure a consistent view of the data for all users.
In this paper, we develop and present a fully decentralized model of collaborative data sharing, in which participants publish their data on an ad hoc basis and simultaneously reconcile updates with those published by others. Individual updates are associated with provenance information, and each participant accepts only updates with a sufficient authority ranking, meaning that each participant may have a different (though conceptually overlapping) data instance. We define a consistency semantics for database instances under this model of disagreement, present algorithms that perform reconciliation for distributed clusters of participants, and demonstrate their ability to handle typical update and conflict loads in settings involving the sharing of curated data.
Keywords
databases, data integration, data sharing, peer-to-peer systems, collaborative data sharing, orchestra, reconciliation, transactions, updates
Date Posted: 23 February 2007
This document has been peer reviewed.
Comments
Postprint version. Copyright ACM 2006. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 13-24.
Publisher URL: http://doi.acm.org/10.1145/1142473.1142476