
Departmental Papers (CIS)
Document Type
Conference Paper
Date of this Version
July 2006
Abstract
Scientific experiments are becoming increasingly large and complex, with a commensurate increase in the amount and complexity of data generated. Data, both intermediate and final results, are derived by chaining and nesting together multiple database searches and analytical tools. In many cases, the means by which the data are produced is not known, making the data difficult to interpret and the experiment impossible to reproduce. Provenance in scientific workflows is thus of paramount importance.
In this paper, we provide a formal model of provenance for scientific workflows that is general (i.e., it can be used with existing workflow systems, such as Kepler, myGrid and Chimera) and sufficiently expressive to answer the provenance queries we encountered in a number of case studies. Interestingly, our model not only takes into account the chained and nested structure of scientific workflows, but also allows users to ask for provenance at different levels of abstraction (user views).
Keywords
scientific workflows, provenance, database
Date Posted: 09 February 2007
This document has been peer reviewed.

Comments
Postprint version. Published in Lecture Notes in Computer Science, Volume 4076, July 2006, pages 264-279.
Publisher URL: http://dx.doi.org/10.1007/11799511