2.1. What is the problem?
The current publishing norm, for the most part, does not require you to submit a detailed analysis protocol to the journal along with your research manuscript.
This is true for experimental as well as computational work.
This prevents anyone (including the original authors) from reproducing the exact analysis steps that were performed, using only the information contained in the journal article.
Hence, arriving at the same results as the original study is largely a matter of luck, and in most cases currently impossible.
While experimentalists generally record their analyses in lab notebooks, these notebooks are by no means publicly available; they are often buried somewhere in the lab facility, and sometimes lost altogether.
The situation is worse in computational research, where lab notebooks are not the norm, and analyses are often done “as you go”.
This situation seems strange, since on the computational side one would expect that keeping track of the steps involved in an analysis should be easier.
2.2. How to tackle this problem?
In this tutorial we are dealing with the computational side of things.
We will focus on a set of tools that helps us develop analysis workflows that are reproducible, or at least as close to reproducible as possible.
The tools presented here are by no means the only options; many others are available to help you keep track of what you have done.
Our approach for getting a reproducible analysis in place will require the following (a small illustrative sketch follows the list):
- Keeping track of the used tools and their versions. (addressed in Tool and package management)
- Keeping track of the commands used to analyse the data, including tool parameters. (addressed in Creating analysis workflows)
- Publishing & versioning the workflow information, so as to keep track of when workflows change and what the changes were. (addressed in Creating analysis workflows)
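
To make the first two requirements concrete, below is a minimal, purely illustrative Python sketch of what "keeping track" can mean in practice: recording the tool, its version, and the exact command with all parameters in a log that can itself be committed to version control. The helper name `run_and_log`, the `analysis_log.json` file, and the FastQC example command are assumptions chosen for illustration only; the tutorial's actual tooling is covered in the sections named above.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_and_log(command: list[str], log_path: Path = Path("analysis_log.json")) -> None:
    """Run one analysis step and append its provenance to a JSON log."""
    # Ask the tool for its version; most command-line tools support --version.
    # Some print the version to stderr instead of stdout, so check both.
    result = subprocess.run([command[0], "--version"], capture_output=True, text=True)
    version = (result.stdout or result.stderr).strip()

    # Run the actual analysis step, with every parameter spelled out explicitly.
    subprocess.run(command, check=True)

    # Record what was run, with which tool version, and when.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": command[0],
        "version": version,
        "command": " ".join(command),
    }
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    entries.append(record)
    log_path.write_text(json.dumps(entries, indent=2))


if __name__ == "__main__":
    # Hypothetical example: a quality-control step with explicit parameters.
    run_and_log(["fastqc", "--outdir", "qc_results", "reads.fastq.gz"])
```

In the rest of the tutorial we will not hand-roll logging like this; dedicated package managers and workflow engines handle versions, commands, and parameters for us. The sketch only shows the kind of information that needs to be captured.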