Reuse through workflow provenance sharing
Provenance records provide detailed accounts of workflow execution episodes that facilitate sharing and reuse of workflows as well as their data products. By analyzing provenance records, a scientist can understand the assumptions others made in their reported results and can attempt to reproduce those results with reasonable fidelity. Standard representations of workflow provenance would therefore do much to promote workflow sharing.
We have collaborated with a group of researchers on developing a provenance model that can be shared across workflow systems. The Open Provenance Model (OPM) is a model of provenance that is designed to meet the following requirements: (1) to allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model; (2) to allow developers to build and share tools that operate on such a provenance model; (3) to define provenance in a precise, technology-agnostic manner; (4) to support a digital representation of provenance for any “thing”, whether produced by computer systems or not; (5) to allow multiple levels of description to coexist; and (6) to define a core set of rules that identify the valid inferences that can be made on provenance representations. OPM is the result of a series of Provenance Challenges held as part of the International Provenance and Annotation Workshops, and represents the effort of a broad community of workflow researchers. The core concepts of OPM are Process (an action that is executed), Artifact (any object used or produced by a process), and Agent (an entity that controls a process). OPM represents the provenance of objects (whether digital or not) as an annotated causality graph: a directed acyclic graph enriched with annotations that capture further information pertaining to execution.
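The core concepts described above can be illustrated with a small sketch. This is not an implementation of the OPM specification, only a minimal model written under stated assumptions: the class `ProvenanceGraph`, its method names, the node identifiers, and the `tool_version` annotation are all hypothetical, and the edge labels (`used`, `wasGeneratedBy`, `wasControlledBy`) are borrowed from OPM's vocabulary purely as plain strings. Edges point from an effect to its cause, and the acyclicity check reflects the requirement that a causality graph be a directed acyclic graph.

```python
from dataclasses import dataclass, field

# The three node kinds named in the text.
KINDS = {"process", "artifact", "agent"}


@dataclass
class ProvenanceGraph:
    """Hypothetical, simplified stand-in for an annotated causality graph."""
    nodes: dict = field(default_factory=dict)        # node id -> kind
    edges: list = field(default_factory=list)        # (effect, label, cause)
    annotations: dict = field(default_factory=dict)  # node id -> {key: value}

    def add_node(self, node_id, kind, **annotations):
        if kind not in KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind
        self.annotations[node_id] = dict(annotations)

    def add_edge(self, effect, label, cause):
        # Edges point from an effect to its cause.
        if effect not in self.nodes or cause not in self.nodes:
            raise KeyError("both endpoints must be added first")
        self.edges.append((effect, label, cause))

    def is_acyclic(self):
        # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
        indegree = {n: 0 for n in self.nodes}
        for _, _, cause in self.edges:
            indegree[cause] += 1
        ready = [n for n, d in indegree.items() if d == 0]
        visited = 0
        while ready:
            node = ready.pop()
            visited += 1
            for effect, _, cause in self.edges:
                if effect == node:
                    indegree[cause] -= 1
                    if indegree[cause] == 0:
                        ready.append(cause)
        return visited == len(self.nodes)

    def causes_of(self, node_id):
        # Everything reachable by following edges from effect to cause.
        # Plain reachability is a simplification, not OPM's inference rules.
        seen = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for effect, _, cause in self.edges:
                if effect == current and cause not in seen:
                    seen.add(cause)
                    stack.append(cause)
        return seen


def build_example():
    """One workflow step: an agent controls a process that turns one artifact into another."""
    g = ProvenanceGraph()
    g.add_node("raw_data", "artifact")
    g.add_node("align", "process", tool_version="hypothetical-1.0")
    g.add_node("aligned_data", "artifact")
    g.add_node("alice", "agent")
    g.add_edge("align", "used", "raw_data")
    g.add_edge("aligned_data", "wasGeneratedBy", "align")
    g.add_edge("align", "wasControlledBy", "alice")
    return g
```

Calling `build_example().causes_of("aligned_data")` walks back from a data product to the process, input artifact, and agent behind it, which is the kind of question a scientist asks when trying to understand or reproduce someone else's result.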
We have also collaborated with the broader provenance community to develop general representations of provenance records. We participated in the World Wide Web Consortium (W3C) Provenance Incubator Group. The W3C is an international standards body for Web architecture, and it promotes community-driven activities that may lead to standardization efforts. The Incubator Group on Provenance charted a path toward possible standardization proposals in this area. The group developed more than 30 use cases and derived more than 200 requirements from them. A joint report about requirements for provenance on the Web was made widely available. The group also designed mappings across provenance models and vocabularies, using OPM as the reference model. The Final Report of the W3C Provenance Incubator Group includes details on the use cases, requirements, and provenance vocabulary mappings. It also proposed the creation of a Working Group to develop a provenance standard based on 17 core terms that were found to be common in existing vocabularies and necessary to support a broad range of the use cases collected.
Based on that recommendation, the W3C Provenance Working Group was established in April 2011 to develop a provenance standard for the Web. The group has released several documents, including a Primer document and a Provenance Model document in December 2011. The W3C standardization work is ongoing and could change how trust, licensing, and information integration are handled on the Web.
This work is reported in the following publications:
* “The Open Provenance Model Core Specification (v1.1).” Luc Moreau, Ben Clifford, Juliana Freire, Joe Futrelle, Yolanda Gil, Paul Groth, Natalia Kwasnikowska, Simon Miles, Paolo Missier, Jim Myers, Beth Plale, Yogesh Simmhan, Eric Stephan, and Jan Van den Bussche. Future Generation Computer Systems, 27(6). Available as a preprint.
* “Provenance Requirements for the Next Version of RDF.” Jun Zhao, Christian Bizer, Yolanda Gil, Paolo Missier, and Satya Sahoo. W3C Workshop on RDF Next Steps, Stanford, CA, June 2010. Available as a preprint.
* “Final Report of the W3C Provenance Incubator Group.” Yolanda Gil, James Cheney, Paul Groth, Olaf Hartig, Simon Miles, Luc Moreau, and Paulo Pinheiro da Silva. Available as a W3C Technical Report, 2010.
* “The PROV Data Model and Abstract Syntax Notation.” Luc Moreau, Paolo Missier, Khalid Belhajjame, Stephen Cresswell, Yolanda Gil, Ryan Golden, Paul Groth, Graham Klyne, Jim McCusker, Simon Miles, James Myers, and Satya Sahoo. Published as a W3C Working Draft, December 15, 2011.
* “A Primer for the PROV Provenance Model.” Yolanda Gil, Simon Miles, Khalid Belhajjame, Helena Deus, Daniel Garijo, Graham Klyne, Paolo Missier, Stian Soiland-Reyes, and Stephan Zednik. Published as a W3C Working Draft, December 15, 2011.
* “Requirements for Provenance on the Web.” Paul Groth, Yolanda Gil, James Cheney, and Simon Miles. International Journal of Digital Curation, Vol 7, No 1, 2012. Available as a preprint.