2. Working reproducibly

2.1. What is the problem?

Under the current publishing norm, you are, for the most part, not required to submit a detailed analysis protocol to the journal along with your research manuscript. This is true for experimental as well as computational work. As a consequence, nobody (including the original authors) can reproduce the exact analysis steps that were performed using only the information contained in the journal article. Arriving at the same results as the original study is therefore a matter of luck, and in most cases currently impossible.

While experimentalists generally record their analyses in lab notebooks, these are by no means publicly available; they are often buried somewhere in the lab facility and frequently lost. The situation is worse in computational research, where lab notebooks are not the norm and analyses are often done “as you go”. This is strange, since on the computational side one would expect that keeping track of the steps of an analysis should be easier.

2.2. How to tackle this problem?

In this tutorial we deal with the computational side of things. We will focus on a set of tools that helps us develop analysis workflows that are reproducible, or at least as close to reproducible as possible. The tools presented here are by no means the only ones; many others are available to help you keep track of what you have done.

Our approach to putting a reproducible analysis in place will require (a minimal sketch of how the pieces fit together follows the list):

  1. Keeping track of the tools used and their versions. (addressed in Tool and package management)
  2. Keeping track of the commands used to analyse the data, including tool parameters. (addressed in Creating analysis workflows)
  3. Publishing and versioning the workflow information, so that we can keep track of when workflows change and what the changes were. (addressed in Creating analysis workflows)
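
To make these three points concrete before diving into the details, here is a minimal sketch of what such a setup can look like. It assumes Conda for pinning tool versions and Snakemake for writing down the analysis commands; these are illustrative choices rather than the only option, and the file names as well as the bwa/samtools mapping step are hypothetical examples.

    # environment.yml -- point 1: record the tools and their exact versions
    # (a Conda environment file; pinned versions let the same software
    # stack be recreated later)
    name: mapping-analysis
    channels:
      - bioconda
      - conda-forge
    dependencies:
      - bwa=0.7.17
      - samtools=1.9

    # Snakefile -- point 2: record the commands, including all parameters
    # (a Snakemake rule naming inputs, outputs and the full command line,
    # instead of commands typed ad hoc into a shell)
    rule map_reads:
        input:
            ref="genome.fa",
            reads="sample.fastq"
        output:
            "sample.bam"
        conda:
            "environment.yml"  # ties this step to the pinned tools above
        shell:
            "bwa mem {input.ref} {input.reads} | samtools sort -o {output} -"

    # point 3: publish & version -- keep both files under version control,
    # for example with git, so every change to the workflow is recorded:
    #
    #   git add environment.yml Snakefile
    #   git commit -m "pin bwa/samtools versions, add mapping rule"

With a setup like this, running Snakemake with its --use-conda option recreates both the software environment and the analysis step from these two text files alone; nothing about the analysis lives only in someone's shell history.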

2.3. Background reading on reproducibility

  • NIH plans to enhance reproducibility. [COLLINS2014]
  • A Framework for Improving the Quality of Research in the Biological Sciences. [CASADEVALL2016]
  • All hail reproducibility in microbiome research. [RAVEL2014]
  • Quantifying reproducibility in computational biology: The case of the tuberculosis drugome. [GARIJO2013]
  • A quick guide to organizing computational biology projects. [NOBLE2009]
  • Investigating reproducibility and tracking provenance. [KANWAL2017]

References

[COLLINS2014] Collins FS, Tabak LA. NIH plans to enhance reproducibility. Nature. 2014 Jan;505:612-613. doi: 10.1038/505612a.
[CASADEVALL2016] Casadevall A, Ellis LM, Davies EW, McFall-Ngai M, Fang FC. A Framework for Improving the Quality of Research in the Biological Sciences. mBio. 2016 Aug 30;7(4):e01256-16. doi: 10.1128/mBio.01256-16.
[RAVEL2014] Ravel J, Wommack KE. All hail reproducibility in microbiome research. Microbiome. 2014 Mar 7;2(1):8. doi: 10.1186/2049-2618-2-8.
[GARIJO2013] Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, Gil Y. Quantifying reproducibility in computational biology: The case of the tuberculosis drugome. PLoS ONE. 2013;8(11):e80278. doi: 10.1371/journal.pone.0080278.
[NOBLE2009] Noble WS. A quick guide to organizing computational biology projects. PLoS Comput Biol. 2009 Jul;5(7):e1000424. doi: 10.1371/journal.pcbi.1000424.
[KANWAL2017] Kanwal S, Zaib F, Lonie A, Sinnott RO. Investigating reproducibility and tracking provenance – A genomic workflow case study. BMC Bioinformatics. 2017;18:337. doi: 10.1186/s12859-017-1747-0.