Stoyanovich, Julia
Email Address
ORCID
Disciplines
Search Results
Now showing 1 - 4 of 4
Publication Enabling Privacy in Provenance-Aware Workflow Systems(2011-01-01) Davidson, Susan B; Khanna, Sanjeev; Stoyanovich, Julia; Tannen, Val; Roy, Sudeepa; Chen, Yi; Milo, TovaPublication Keyword Search in Workflow Repositories with Access Control(2011-05-01) Davidson, Susan B; Stoyanovich, Julia; Lee, SoohyunA number of on-line repositories of scientific workflows are emerging. These repositories enable sharing and reuse of workflows, and aid in the design of new workflows. The growing size of these repositories, the complex hierarchical structure of the workflows, and the need to incorporate access control mechanisms make information discovery in these repositories an interesting challenge. This paper formalizes keyword search in repositories of hierarchical workflows. We start by defining what it means for a single hierarchical workflow to match a keyword query while accounting for access control, and discuss options for displaying the resulting matches within the workflow. We extend this to search over workflow repositories, by proposing various ranking semantics that build on techniques from XML search and from information retrieval, and adapting them to our setting.Publication Putting Lipstick on Pig: Enabling Database-Style Workflow Provenance(2011-12-01) Davidson, Susan B; Amsterdamer, Yael; Stoyanovich, Julia; Tannen, Val; Deutch, DanielWorkflow provenance typically assumes that each module is a “black-box”, so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (finegrained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting “what-if” workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.Publication On Provenance and Privacy(2011-03-01) Davidson, Susan B; Khanna, Sanjeev; Stoyanovich, Julia; Tannen, Val; Roy, Sudeepa; Chen, YiProvenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss privacy concerns in scientific workflows – data, module, and structural privacy - and frame several natural questions: (i) Can we formally analyze data, module, and structural privacy, giving provable privacy guarantees for an unlimited/bounded number of provenance queries? (ii) How can we answer search and structural queries over repositories of workflow specifications and their executions, providing as much information as possible to the user while still guaranteeing privacy? We then highlight some recent work in this area and point to several directions for future work.