Towards Shared Repositories of Computational Workflows

Scientific computing has entered a new era of scale and sharing with the arrival of cyberinfrastructure for computational experimentation. A key emerging concept is the scientific workflow, which provides a declarative representation of a scientific application as a complex composition of software components and the dataflow among them. Workflow systems manage the execution of workflows on distributed resources, track the provenance of analysis products, and enable rapid reproducibility of results. Current cyberinfrastructure offers well-understood mechanisms for sharing data, instruments, and computing resources. No comparable mechanisms exist for sharing workflows, though there is an emerging movement in the scientific community toward sharing analysis processes.

This project investigated computational mechanisms for sharing workflows as a key missing element of cyberinfrastructure for scientific research. We explored several research directions. First, we contributed to a community effort to develop standards for publishing provenance, including workflow provenance and its connections to data and publications. Second, we investigated the design of shared workflow catalogs and developed new workflow retrieval algorithms that return ranked partial matches. Third, we explored different workflow sharing paradigms that might be appropriate for scientific communities. Fourth, we investigated workflow reuse and workflow usability using a library of workflows for text analytics. Fifth, we investigated reproducibility through workflow sharing, with special attention to measuring the effort saved when workflows are reused.


We investigated several major research topics:

  1. Reuse through workflow provenance sharing
  2. Design of shared workflow catalogs
  3. Paradigms for workflow sharing
  4. Making expertise accessible through workflows
  5. Reproducibility through workflow sharing
  6. Quantifying reproducibility and effort saved by reusing workflows

Publications


  • "Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome." Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, and Yolanda Gil. PLOS ONE, November 2013. Available from the PLOS ONE site.
  • "Similarity Assessment and Efficient Retrieval of Semantic Workflows." Ralph Bergmann and Yolanda Gil. Information Systems Journal, Vol. 40, March 2014. Available as a preprint.
  • "Requirements for Provenance on the Web." Paul Groth, Yolanda Gil, James Cheney, and Simon Miles. International Journal of Digital Curation, Vol. 7, No. 1, 2012. Available as a preprint.
  • "Making Data Analysis Expertise Broadly Accessible through Workflows." Matheus Hauder, Yolanda Gil, Ricky Sethi, Yan Liu, and Hyunjoon Jo. Proceedings of the Seventh IEEE International Conference on e-Science, Stockholm, Sweden, December 5-8, 2011. Available as a preprint.
  • "A New Approach for Publishing Workflows: Abstractions, Standards, and Linked Data." Daniel Garijo and Yolanda Gil. Proceedings of the Sixth Workshop on Workflows in Support of Large-Scale Science (WORKS'11), held in conjunction with Supercomputing, Seattle, Washington, November 2011. Available as a preprint.
  • "Linked Data for Network Science." Paul Groth and Yolanda Gil. Proceedings of the Workshop on Linked Science Data (LISD) of the International Semantic Web Conference, Bonn, Germany, 2011. Available as a preprint.
  • "Retrieval of Semantic Workflows with Knowledge Intensive Similarity Metrics." Ralph Bergmann and Yolanda Gil. Proceedings of the Nineteenth International Conference on Case Based Reasoning (ICCBR), Greenwich, London, September 2011. Available as a preprint.
  • "The Open Provenance Model Core Specification (v1.1)." Luc Moreau, Ben Clifford, Juliana Freire, Joe Futrelle, Yolanda Gil, Paul Groth, Natalia Kwasnikowska, Simon Miles, Paolo Missier, Jim Myers, Beth Plale, Yogesh Simmhan, Eric Stephan, and Jan Van den Bussche. Future Generation Computer Systems, 27(6), 2011. Available as a preprint.
  • "A Social Collaboration Argumentation System for Generating Multi-Faceted Answers in Question and Answer Communities." Ricky Sethi and Yolanda Gil. 2011. To appear in Proceedings of the AAAI Workshop on Computational Models of Natural Argument, San Francisco, CA. Available as a preprint.
  • "LinkedDataLens: Linked Data as a Network of Networks." Paul Groth and Yolanda Gil. Proceedings of the ACM International Conference on Knowledge Capture (K-CAP), Banff, Alberta, Canada, 2011. Available as a preprint.
  • "Provenance Requirements for the Next Version of RDF." Jun Zhao, Christian Bizer, Yolanda Gil, Paolo Missier, and Satya Sahoo. W3C Workshop on RDF Next Steps, Stanford, CA, June 2010. Available as a preprint.
  • "Social Task Networks: Personal and Collaborative Task Formulation and Management in Social Networking Sites." Yolanda Gil, Paul Groth, and Varun Ratnakar. AAAI Fall Symposium Series on Proactive Assistant Agents, Arlington, VA, November 2010. Available as a preprint.

Points of Contact

Yolanda Gil (PI)


  • Daniel Garijo (PhD student), Polytechnic University of Madrid
  • Christian Fritz (Post-doctoral student), University of Southern California
  • Denny Vrandecic (Post-doctoral student), University of Southern California


  • Ralph Bergmann, University of Trier (Germany)
  • Phil Bourne, University of California San Diego
  • Pedro Gonzalez, Universidad Complutense de Madrid (Spain)
  • Christopher Mason, Cornell University
  • Joel Saltz, Emory University
  • Paul Groth, Free University of Amsterdam (Netherlands)
  • Luc Moreau, University of Southampton (UK)
  • Simon Miles, King's College London (UK)


This work was done under the grant Towards Shared Repositories of Computational Workflows, funded by the National Science Foundation with grant number IIS-0948429 from September 2009 to August 2011.

