Bioinformatics pipeline frameworks
A bioinformatics pipeline framework (also known as a workflow engine, workflow management system, or pipeline management system) is a system for building pipelines. Here is a list of such frameworks that may be useful for building bioinformatics pipelines.
The Big 4
In my experience, the frameworks I see used and cited most often are these 4:
My philosophy
My group uses a more modular approach that we’ve developed. It differs from the more widespread approach in that we divide a workflow into separate components: sample handling is the responsibility of one tool; the workflow itself (the sequence of commands) is handled by another; and the computing environment and dependencies are handled by a third. I think this modular approach adds a lot of power and flexibility, improving reuse and interoperability across workflows. If you’re interested in exploring this approach, we use:
- Looper for sample handling and job submission
- Pypiper for actual workflow development
- Bulker for computational environment management
- And a few other related helper tools to glue things together along the way
One nice thing about this is that you can mix and match these tools with other systems. For example, you can use looper to run pipelines written using anything, even a shell script; you aren’t required to use pypiper. So, I feel like the system is a bit less opinionated than a monolithic framework. I’m happy to talk further if you’re interested.
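To make the mix-and-match idea concrete, here is a minimal sketch of a sample-handling layer that knows nothing about the pipeline it launches. It is not looper's actual implementation or configuration format. The sample table columns, the `build_commands` helper, and the script names `count_reads.sh` and `pipeline.py` are all hypothetical, chosen only for illustration.

```python
import csv
import io
import shlex

# Hypothetical sample table; the column names are assumptions for this sketch.
SAMPLE_TABLE = """sample_name,read1
frog_1,data/frog_1.fastq.gz
frog_2,data/frog_2.fastq.gz
"""


def build_commands(sample_table_text, command_template):
    """Render one command string per sample row from a command template.

    The template is the only link between sample handling and the pipeline,
    so the pipeline can be written in any language.
    """
    reader = csv.DictReader(io.StringIO(sample_table_text))
    commands = []
    for row in reader:
        # Quote every value so that unusual file names stay shell-safe.
        quoted = {key: shlex.quote(value) for key, value in row.items()}
        commands.append(command_template.format(**quoted))
    return commands


# The same sample-handling code drives a plain shell script...
shell_commands = build_commands(
    SAMPLE_TABLE, "bash count_reads.sh {read1} results/{sample_name}"
)
# ...or a Python pipeline, without changing anything but the template.
python_commands = build_commands(
    SAMPLE_TABLE, "python pipeline.py --input {read1} --sample-name {sample_name}"
)

for command in shell_commands + python_commands:
    print(command)
```

The sketch only prints the commands. A real sample-handling tool would also submit them, locally or to a cluster, but the separation is the same: swapping the pipeline means swapping the template, not the sample-handling code.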
Some others
Here are a few other related systems I’ve come across over the years.