Project config file:
Implied columns divide project and sample metadata
All samples with `human` under the `organism` attribute gain a `genome` attribute with value `hg38`, etc.
Lets you define multiple projects in a single file
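The implied-column rule above (samples with `human` under `organism` gain `genome: hg38`) can be sketched in a few lines of Python. This is a minimal illustration, not the PEP implementation: the `IMPLIED_RULES` table layout and the `apply_implied` function are hypothetical names, and the choice to leave an explicitly set attribute untouched is an assumption of the sketch.

```python
from typing import Any, Dict, List

# Hypothetical rule table: source attribute -> value -> attributes to add.
# Only the human -> hg38 rule comes from the text; the layout is assumed.
IMPLIED_RULES: Dict[str, Dict[str, Dict[str, Any]]] = {
    "organism": {
        "human": {"genome": "hg38"},
    }
}


def apply_implied(
    samples: List[Dict[str, Any]],
    rules: Dict[str, Dict[str, Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Return copies of the samples with implied attributes added."""
    result = []
    for sample in samples:
        updated = dict(sample)  # copy so the input is not modified
        for source_attr, by_value in rules.items():
            implied = by_value.get(updated.get(source_attr), {})
            for key, value in implied.items():
                # Assumption: an attribute already set on the sample wins.
                updated.setdefault(key, value)
        result.append(updated)
    return result


samples = [
    {"sample_name": "a", "organism": "human"},
    {"sample_name": "b", "organism": "yeast"},
]
annotated = apply_implied(samples, IMPLIED_RULES)
```

Here sample `a` gains a `genome` attribute, while sample `b` is returned unchanged because no rule matches its organism.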
How is this portable and encapsulated?
Encapsulated: a project is an extensible object, with samples, configurations, etc. as members of the Project object.
Portable in two senses:
A project should be easily moved from one analysis tool to another
A project can be moved from one computing environment to another
Connects the Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) with PEP format
Run your entire project with one line:
Resources can vary by input file size
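One way to picture size-dependent resources is a table of packages, each with a minimum input size, from which the largest matching package is picked. The sketch below assumes this selection rule; the package names, the `min_file_size_gb` key, and all values are hypothetical and do not reproduce looper's configuration schema.

```python
from typing import Any, Dict, List

# Hypothetical resource packages, keyed by minimum input size in GB.
RESOURCE_PACKAGES: List[Dict[str, Any]] = [
    {"name": "default", "min_file_size_gb": 0, "cores": 1, "mem_mb": 4000},
    {"name": "medium", "min_file_size_gb": 2, "cores": 4, "mem_mb": 16000},
    {"name": "large", "min_file_size_gb": 10, "cores": 8, "mem_mb": 32000},
]


def select_package(
    file_size_gb: float, packages: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Pick the package with the highest threshold not exceeding the input size."""
    eligible = [p for p in packages if p["min_file_size_gb"] <= file_size_gb]
    if not eligible:
        raise ValueError(f"no resource package covers {file_size_gb} GB")
    return max(eligible, key=lambda p: p["min_file_size_gb"])


small_job = select_package(0.5, RESOURCE_PACKAGES)
big_job = select_package(12.0, RESOURCE_PACKAGES)
```

With these assumed thresholds, a 0.5 GB input falls into the `default` package and a 12 GB input into `large`.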
Adjust compute package on-the-fly:
Looper only submits jobs for samples not already flagged as running, completed, or failed.
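The skip-if-flagged behavior can be illustrated as a filter over sample output folders. This is a sketch under stated assumptions, not looper's code: it assumes each sample has its own folder under a results directory and that status is recorded as files ending in `running.flag`, `completed.flag`, or `failed.flag`; the function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Union

# Statuses that block resubmission, taken from the text above.
BLOCKING_FLAGS = ("running", "completed", "failed")


def has_blocking_flag(sample_dir: Path) -> bool:
    """True if the folder holds a flag file for any blocking status."""
    if not sample_dir.is_dir():
        return False  # no output yet, so nothing blocks submission
    return any(
        any(sample_dir.glob(f"*{flag}.flag")) for flag in BLOCKING_FLAGS
    )


def samples_to_submit(
    results_dir: Union[str, Path], sample_names: Iterable[str]
) -> List[str]:
    """Keep only samples whose output folder has no blocking flag."""
    root = Path(results_dir)
    return [name for name in sample_names if not has_blocking_flag(root / name)]
```

Calling `samples_to_submit("results", ["s1", "s2"])` returns the names that would still be submitted, so rerunning the same command does not duplicate work for samples that already carry a flag.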
The PEP format is a novel approach to standardizing projects.
Initial tools like geofetch and looper build PEP projects and connect them to pipelines
Python and R packages provide a universal interface to PEP metadata for tools and analysis