Jordan got data from the sequencer today.
He sits down at the terminal to process it.
Nice!
Jordan got data from the sequencer today.
He sits down at the terminal to process it.
Hmm... what did I do last time?
Jordan has an idea.
He sits down at the terminal to script it.
Nice!
Jordan got data from the sequencer today.
He gets the script running...
The server crashes. It would be nice if the script could pick up where it left off...
Now Jordan has 500 samples for a time series experiment.
He starts writing some looping functions to handle cluster submission.
This is going to take a while...
In the meantime, Jordan generates other samples requiring slightly different parameters.
No problem, I'll just duplicate this script...
Stop! There is a better way...
No record of the output of the tools
Failed steps do not halt the entire pipeline
Difficult to scale to 500 samples
Two pipelines running simultaneously may interfere
No tracking of which version was used with which samples
Memory use is left unmonitored and unchecked
Requires custom parsers to extract results
shuf -i 1-500000000 -n 10000000 > outfile.txt
pm.run("shuf -i 1-500000000 -n 10000000 > outfile.txt")
target = os.path.join(outfolder, "outfile.txt") # output file
command = "shuf -i 1-500000000 -n 10000000 > " + target
pm.run(command, target)
If the target already exists, the command is skipped, so a restarted pipeline picks up where it left off.
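The idea behind passing a target can be sketched in a few lines of plain Python. This is only an illustration of the skip-if-target-exists pattern, not pypiper's actual implementation; `run_if_missing` is a hypothetical helper, and `echo` stands in for the `shuf` command.

```python
import os
import shlex
import subprocess
import tempfile


def run_if_missing(command, target):
    """Run a shell command only if its target file does not exist yet.

    Hypothetical helper for illustration. Returns True if the command
    ran and False if it was skipped.
    """
    if os.path.exists(target):
        print("Target exists, skipping: " + target)
        return False
    subprocess.run(command, shell=True, check=True)
    return True


outfolder = tempfile.mkdtemp()  # throwaway folder for the demo
target = os.path.join(outfolder, "outfile.txt")
command = "echo done > " + shlex.quote(target)  # stand-in for the shuf command

first = run_if_missing(command, target)   # target missing: the command runs
second = run_if_missing(command, target)  # target present: the command is skipped
```

Running the same script a second time after a crash would skip every step whose target is already on disk.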
reads = count_reads(unaligned_file)
aligned = count_reads(aligned_file)
pm.report_result("aligned_reads", aligned)
pm.report_result("alignment_rate", aligned/reads)
aligned_reads 2526232
alignment_rate 0.64234
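Because results are reported as simple key-value lines, reading them back needs very little code. A minimal sketch, assuming each line holds a key and a value separated by whitespace as in the two lines above; the real stats file may use a different delimiter or carry extra columns, and `parse_stats` is a hypothetical helper.

```python
def parse_stats(lines):
    """Turn 'key value' lines into a dict, converting numbers where possible."""
    stats = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore blank or malformed lines
        key, value = parts[0], parts[1]
        try:
            stats[key] = int(value)
        except ValueError:
            try:
                stats[key] = float(value)
            except ValueError:
                stats[key] = value  # keep non-numeric values as text
    return stats


example = ["aligned_reads 2526232", "alignment_rate 0.64234"]
stats = parse_stats(example)
```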
import pypiper, os
outfolder = "pipeline_output/" # folder for results
pm = pypiper.PipelineManager(name="shuf", outfolder=outfolder)
target = os.path.join(outfolder, "outfile.txt") # output file
command = "shuf -i 1-500000000 -n 10000000 > " + target
pm.run(command, target)
pm.stop_pipeline()
protocol_mappings:
  RNA-seq: rna-seq
pipelines:
  rna-seq:
    name: RNA-seq_pipeline
    path: path/to/rna-seq.py
    arguments:
      "--option1": sample_attribute
      "--option2": sample_attribute2
looper run project_config.yaml
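One way to read the configuration above: a sample's protocol selects a pipeline, and each entry under `arguments` pairs a command-line option with the sample attribute that supplies its value. The sketch below illustrates that reading in plain Python. It is not looper's code; `build_command`, the `protocol` field, and the sample values are hypothetical.

```python
# The YAML configuration above, written as a Python dict for the demo.
config = {
    "protocol_mappings": {"RNA-seq": "rna-seq"},
    "pipelines": {
        "rna-seq": {
            "name": "RNA-seq_pipeline",
            "path": "path/to/rna-seq.py",
            "arguments": {
                "--option1": "sample_attribute",
                "--option2": "sample_attribute2",
            },
        }
    },
}


def build_command(config, sample):
    """Build one pipeline command line for one sample (hypothetical helper)."""
    key = config["protocol_mappings"][sample["protocol"]]
    pipeline = config["pipelines"][key]
    parts = [pipeline["path"]]
    # Each option takes its value from the named attribute of the sample.
    for option, attribute in pipeline["arguments"].items():
        parts += [option, str(sample[attribute])]
    return " ".join(parts)


# Hypothetical sample record; in practice this would come from a sample table.
sample = {
    "protocol": "RNA-seq",
    "sample_attribute": "liver_1",
    "sample_attribute2": "hg38",
}
command = build_command(config, sample)
```

Keeping the mapping in configuration means samples with different parameters need a new row of attributes, not a duplicated script.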
protocol_mappings:
  RRBS: rrbs
  WGBS: wgbs
  EG: wgbs.py
  SMART-seq: rnaBitSeq -f; rnaTopHat -f
  ATAC-SEQ: atacseq
  DNase-seq: atacseq
  CHIP-SEQ: chipseq
pipeline_key:
  name: pipeline_name
  arguments:
    "--option": value
  resources:
    default:
      file_size: "0"
      cores: "2"
      mem: "6000"
      time: "01:00:00"
    large_input:
      file_size: "2000"
      cores: "4"
      mem: "12000"
      time: "08:00:00"
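A plausible reading of the `resources` section is that `file_size` acts as a threshold: a sample gets the package with the largest threshold that its input size still meets. The sketch below works under that assumption. The selection rule and the `choose_resources` helper are illustrative, not looper's code, and the unit of `file_size` is not stated in the source.

```python
resources = {
    "default": {"file_size": "0", "cores": "2", "mem": "6000", "time": "01:00:00"},
    "large_input": {"file_size": "2000", "cores": "4", "mem": "12000", "time": "08:00:00"},
}


def choose_resources(resources, input_size):
    """Pick the package with the largest file_size threshold <= input_size.

    Assumed selection rule, for illustration only.
    """
    eligible = [
        (float(package["file_size"]), name)
        for name, package in resources.items()
        if float(package["file_size"]) <= input_size
    ]
    if not eligible:
        raise ValueError("no resource package fits input size %s" % input_size)
    _, name = max(eligible)
    return name, resources[name]


small_name, small_package = choose_resources(resources, 500)
large_name, large_package = choose_resources(resources, 3000)
```

With these settings a small input would request 2 cores and a large one 4 cores, without any change to the pipeline itself.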
compute:
  slurm:
    submission_template: templates/slurm_template.sub
    submission_command: sbatch
  localhost:
    submission_template: templates/localhost_template.sub
    submission_command: sh
looper run project_config.yaml --compute localhost
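The `--compute` choice can be thought of as swapping the program that receives each generated job script. A minimal sketch, assuming the submission command is simply invoked with the script path; how the template itself is filled in is not shown in the source and is left out here. `submission_call` and `job.sub` are hypothetical.

```python
# The compute section above, written as a Python dict for the demo.
compute = {
    "slurm": {
        "submission_template": "templates/slurm_template.sub",
        "submission_command": "sbatch",
    },
    "localhost": {
        "submission_template": "templates/localhost_template.sub",
        "submission_command": "sh",
    },
}


def submission_call(compute, package, script_path):
    """Return the argument list that would submit one job script (hypothetical)."""
    settings = compute[package]
    return [settings["submission_command"], script_path]


local_call = submission_call(compute, "localhost", "job.sub")
cluster_call = submission_call(compute, "slurm", "job.sub")
```

The same project can then move from a laptop to a cluster by changing one command-line flag.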
looper check project_config.yaml
looper summarize project_config.yaml