Departmental Papers (CIS)

Date of this Version

July 2006

Document Type

Conference Paper


Postprint version. Published in Lecture Notes in Computer Science, Volume 4076, July 2006, pages 264-279.


Scientific experiments are becoming increasingly large and complex, with a commensurate increase in the amount and complexity of data generated. Data, both intermediate and final results, are derived by chaining and nesting together multiple database searches and analytical tools. In many cases, the means by which the data are produced is not known, making the data difficult to interpret and the experiment impossible to reproduce. Provenance in scientific workflows is thus of paramount importance.

In this paper, we provide a formal model of provenance for scientific workflows that is general (i.e., it can be used with existing workflow systems, such as Kepler, myGrid and Chimera) and sufficiently expressive to answer the provenance queries we encountered in a number of case studies. Interestingly, our model not only takes into account the chained and nested structure of scientific workflows, but also allows users to ask for provenance at different levels of abstraction (user views).


Keywords

scientific workflows, provenance, database



Date Posted: 09 February 2007

This document has been peer reviewed.