| Jordan got data from the sequencer today. |
| He sits down at the terminal to process it. |
| Nice! |
| Jordan got data from the sequencer today. |
| He sits down at the terminal to process it. |
| Hmm... what did I do last time? |
| Jordan has an idea. |
| He sits down at the terminal to script it. |
| Nice! |
| Jordan got data from the sequencer today. |
| He gets the script running... |
| The server crashes. It would be nice if the script could pick up where it left off... |
| Now Jordan has 500 samples for a time series experiment. |
| He starts writing some looping functions to handle cluster submission. |
| This is going to take a while... |
| In the meantime, Jordan generates other samples requiring slightly different parameters. |
| No problem, I'll just duplicate this script... |
| Stop! There is a better way... |
| No record of the output of the tools |
| Failed steps do not halt the entire pipeline |
| Difficult to scale to 500 samples |
| Two pipelines running simultaneously may interfere |
| Tracking which version was used with which samples |
| Memory use is left unmonitored and unchecked |
| Requires custom parsers to extract results |

shuf -i 1-500000000 -n 10000000 > outfile.txt

pm.run("shuf -i 1-500000000 -n 10000000 > outfile.txt")

target = os.path.join(outfolder, "outfile.txt")  # output file
command = "shuf -i 1-500000000 -n 10000000 > " + target
pm.run(command, target)

Because pm.run is given a target, the step is skipped when that file already exists, so a restarted pipeline picks up where it left off.

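The skip-if-target-exists idea can be sketched in plain Python, without pypiper. The helper `run_if_missing` below is hypothetical and is not part of pypiper's API; it only illustrates the check, and pypiper's own `run` does considerably more than this.

```python
import os
import subprocess


def run_if_missing(command, target):
    """Run a shell command only if its target file does not exist yet.

    Hypothetical helper for illustration; not part of pypiper.
    Returns True if the command ran, False if it was skipped.
    """
    if os.path.exists(target):
        # The target was already produced by an earlier run: skip this step.
        return False
    # shell=True so that output redirection (">") in the command string works.
    subprocess.run(command, shell=True, check=True)
    return True
```

Calling it twice with the same command and target runs the command the first time and skips it the second, which is what lets a crashed pipeline resume at the first step whose output is missing.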
reads = count_reads(unaligned_file)
aligned = count_reads(aligned_file)
pm.report_result("aligned_reads", aligned)
pm.report_result("alignment_rate", aligned/reads)

aligned_reads 2526232
alignment_rate 0.64234

import pypiper, os
outfolder = "pipeline_output/" # folder for results
pm = pypiper.PipelineManager(name="shuf", outfolder=outfolder)
target = os.path.join(outfolder, "outfile.txt") # output file
command = "shuf -i 1-500000000 -n 10000000 > " + target
pm.run(command, target)
pm.stop_pipeline()
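Once a pipeline like this has run, results reported with `pm.report_result` can be read back without a custom parser for each tool. A minimal sketch, assuming one result per line with key and value separated by whitespace, as in the `aligned_reads 2526232` output shown earlier; `parse_stats` is a hypothetical helper, not a pypiper function.

```python
def parse_stats(lines):
    """Parse 'key value' lines into a dict (hypothetical helper).

    Assumes one result per line, key and value separated by whitespace.
    Numeric values are converted to int or float where possible.
    """
    stats = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key, raw = fields[0], fields[1]
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw  # keep non-numeric values as strings
        stats[key] = value
    return stats


# The two result lines from the slide, used as example input.
example = ["aligned_reads 2526232", "alignment_rate 0.64234"]
stats = parse_stats(example)
```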
protocol_mappings:
  RNA-seq: rna-seq
pipelines:
  rna-seq:
    name: RNA-seq_pipeline
    path: path/to/rna-seq.py
    arguments:
      "--option1": sample_attribute
      "--option2": sample_attribute2

looper run project_config.yaml

protocol_mappings:
  RRBS: rrbs
  WGBS: wgbs
  EG: wgbs.py
  SMART-seq: rnaBitSeq -f; rnaTopHat -f
  ATAC-SEQ: atacseq
  DNase-seq: atacseq
  CHIP-SEQ: chipseq

pipeline_key:
  name: pipeline_name
  arguments:
    "--option": value
  resources:
    default:
      file_size: "0"
      cores: "2"
      mem: "6000"
      time: "01:00:00"
    large_input:
      file_size: "2000"
      cores: "4"
      mem: "12000"
      time: "08:00:00"

compute:
  slurm:
    submission_template: templates/slurm_template.sub
    submission_command: sbatch
  localhost:
    submission_template: templates/localhost_template.sub
    submission_command: sh

looper run project_config.yaml --compute localhost

looper check project_config.yaml

looper summarize project_config.yaml
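
The `resources` section shown earlier defines packages keyed by a `file_size` threshold. A minimal sketch of how such a package might be chosen, assuming the rule is "the largest `file_size` that does not exceed the input size"; `select_resources` is a hypothetical helper, not looper's implementation, and the unit of `file_size` is left as whatever the configuration uses.

```python
def select_resources(packages, input_size):
    """Pick the package with the largest file_size not exceeding input_size.

    Hypothetical helper; `packages` mirrors the `resources` mapping,
    with values stored as strings as they are in the YAML.
    """
    best_name = None
    best_threshold = -1.0
    for name, package in packages.items():
        threshold = float(package["file_size"])
        # Keep the highest threshold that the input still reaches.
        if best_threshold < threshold <= input_size:
            best_name, best_threshold = name, threshold
    if best_name is None:
        raise ValueError("no resource package fits input size %s" % input_size)
    return best_name, packages[best_name]


# Same values as the `resources` section of the pipeline interface.
resources = {
    "default": {"file_size": "0", "cores": "2", "mem": "6000", "time": "01:00:00"},
    "large_input": {"file_size": "2000", "cores": "4", "mem": "12000", "time": "08:00:00"},
}
```

Under this assumption, an input of size 500 would get the `default` package and an input of size 3000 would get `large_input`.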