Quantifying reproducibility and effort saved by reusing workflows

From WorkflowSharing
Revision as of 19:25, 9 December 2013 by Gil (Talk | contribs)

Our goal was to quantify the effort required to reproduce a scientific article of interest.

We selected an article that had been proposed at the [https://sites.google.com/site/beyondthepdf/ "Beyond the PDF" workshop], which was attended by many researchers interested in scholarly communication and in changing how scientific articles are published.

The article is openly available from PLOS, and a preprint is also available, as are the original datasets and software on the project web site.

The article describes a computational pipeline that accesses data from the Protein Data Bank (PDB) and carries out a systematic analysis of the proteome of Mycobacterium tuberculosis (TB) against all approved drugs. The process uncovers protein receptors in the organism that could be targeted by drugs currently in use for other purposes. The result is a drug-target network (a “drugome”) that includes all known approved drugs. Although the article focuses on a particular organism (TB), the method itself can be used for other pathogens or pathways and has the potential to be a key resource for developing new, more comprehensive treatments for other diseases of interest.
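To make the idea of a drug-target network concrete, the following is a minimal sketch in Python. It is not the pipeline from the article: the function name `build_drugome` and the drug and protein identifiers are hypothetical placeholders, and the sketch assumes that an earlier step has already produced a list of predicted (drug, protein) pairs.

```python
from collections import defaultdict


def build_drugome(predicted_pairs):
    """Group predicted (drug, protein) pairs into a two-way lookup.

    Hypothetical helper for illustration only; the article's pipeline
    is considerably more involved.
    """
    drug_to_proteins = defaultdict(set)
    protein_to_drugs = defaultdict(set)
    for drug, protein in predicted_pairs:
        drug_to_proteins[drug].add(protein)
        protein_to_drugs[protein].add(drug)
    return dict(drug_to_proteins), dict(protein_to_drugs)


# Placeholder identifiers, not real drugs or proteins from the article.
pairs = [
    ("drug_A", "protein_1"),
    ("drug_A", "protein_2"),
    ("drug_B", "protein_2"),
]
drug_to_proteins, protein_to_drugs = build_drugome(pairs)

# Proteins predicted to interact with more than one drug.
shared = sorted(p for p, drugs in protein_to_drugs.items() if len(drugs) > 1)
print(shared)  # ['protein_2']
```

Keeping both directions of the mapping makes it straightforward to ask either which proteins a given drug might target or which existing drugs might act on a given protein.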

With a workflow, the method could be reproduced as new drugs become available. It could also be reused to create many drugomes for other organisms. In essence, the paper presents a novel method that takes a comprehensive and systematic approach to drug discovery, moving away from current practice, which is neither.

With the help of the authors of the article, we created a workflow that reflects the steps described in the original article and ran it with the data used in the original experiments.

We started from the “methods” section, which describes conceptually what computations were carried out, as is usual in computational biology. However, we needed clarifications from the authors in order to reproduce the computations. Moreover, some of the software originally used in the experiments was no longer available in the lab, so some of the steps had to be done differently. Overall, reproducing the method described in the article required approximately 280 hours.

Details about this work, including the workflows, the datasets, and reproduced results are available from this site.

This work is reported in the following publications:

"Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome." Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, and Yolanda Gil. PLOS ONE, November 2013. Available from the PLOS ONE site.