Software & Data
You can find most of our software spread across several GitHub organizations: databio, refgenie, and pepkit. Here is a list of our polished software, aggregated in one place and sorted by purpose:
Project management and pipeline development
Pypiper is a development-oriented pipeline framework. It is a Python package that helps you write robust pipelines directly in Python, handling mundane tasks like restartability, monitoring for time and memory use, monitoring job status, copious log output, robust error handling, easy debugging tools, and guaranteed file output integrity. Language: Python
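The restartability and output-integrity ideas above can be sketched with the standard library alone. This is a hypothetical helper, not pypiper's API: `run_step` skips a command when its target file already exists and deletes any partial target if the command fails, so a rerun never trusts incomplete output. A real framework also has to cope with the pipeline itself being killed mid-step, which this sketch does not handle.

```python
from __future__ import annotations

import subprocess
import sys
import time
from pathlib import Path


def run_step(cmd: list[str], target: Path, log: list[str]) -> bool:
    """Run `cmd` unless `target` already exists; return True if it ran.

    Hypothetical helper for illustration only; not part of pypiper.
    """
    if target.exists():
        # Restartability: an existing target means this step already finished.
        log.append(f"skip {target.name}: target exists")
        return False
    start = time.perf_counter()
    result = subprocess.run(cmd)
    elapsed = time.perf_counter() - start
    if result.returncode != 0:
        # Integrity: never leave a partial target that a rerun would trust.
        target.unlink(missing_ok=True)
        raise RuntimeError(f"step failed with exit code {result.returncode}: {cmd}")
    # Monitoring: record how long the step took.
    log.append(f"ran {target.name} in {elapsed:.2f}s")
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as workdir:
        out = Path(workdir) / "result.txt"
        # Portable stand-in for a real tool: a Python one-liner that writes the target.
        cmd = [sys.executable, "-c", f"open({str(out)!r}, 'w').write('done')"]
        log: list[str] = []
        print(run_step(cmd, out, log))  # True: the step runs
        print(run_step(cmd, out, log))  # False: skipped on the second call
        print(log)
```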
looper
A pipeline submission engine. Looper deploys any command-line pipeline for each sample in a project organized in the standard PEP sample metadata format. You can think of looper as a single user interface for running, summarizing, monitoring, and otherwise managing all of your sample-intensive research projects the same way, regardless of data type or pipeline. Language: Python
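The core idea of running one pipeline per sample can be sketched as filling a command template from each row of a sample table. This is a minimal illustration, not looper's implementation: `pipeline.py` and its flags are hypothetical placeholders, and only the `sample_name` column reflects the PEP sample table convention.

```python
import csv
import io
import shlex

# A PEP-style sample table: one row per sample, keyed by `sample_name`.
SAMPLE_TABLE = """sample_name,protocol,read1
frog_1,ATAC,data/frog_1.fastq.gz
frog_2,ATAC,data/frog_2.fastq.gz
"""

# Hypothetical command template; `pipeline.py` and its flags are placeholders.
COMMAND_TEMPLATE = "pipeline.py --sample-name {sample_name} --input {read1}"


def build_commands(table_text: str, template: str) -> list:
    """Render one shell command per sample row."""
    commands = []
    for row in csv.DictReader(io.StringIO(table_text)):
        # Quote every value so paths containing spaces survive the shell.
        quoted = {key: shlex.quote(value) for key, value in row.items()}
        commands.append(template.format(**quoted))
    return commands


for command in build_commands(SAMPLE_TABLE, COMMAND_TEMPLATE):
    print(command)
```

Because the template only references column names, the same loop works for any pipeline whose arguments can be expressed in terms of sample attributes.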
bulker
A multi-container computing environment manager. A bulker environment consists of an individual container image for each command. Bulker environments are portable, interactive, and independent of any specific workflow. Bulker simplifies both interactive analysis and workflow development by building drop-in replacements for command-line tools that act like native tools but run in containers. Think of bulker as a lightweight wrapper for Docker or Singularity that simplifies sharing complete, containerized environments. Language: Python
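A "drop-in replacement" can be pictured as a small executable script, named after the tool and placed on the `PATH`, that forwards its arguments into a container. The sketch below generates such a script for Docker; it is a hypothetical illustration of the idea, not the script bulker actually generates, and the image name `example/samtools:latest` is made up.

```python
import shlex


def wrapper_script(command: str, image: str) -> str:
    """Return a bash script that runs `command` inside `image` like a native tool.

    Hypothetical sketch of the drop-in idea; not bulker's generated code.
    """
    return "\n".join([
        "#!/usr/bin/env bash",
        # Run as the calling user, mount the working directory, pass all args through.
        'docker run --rm --user "$(id -u):$(id -g)" '
        '--volume "$PWD":"$PWD" --workdir "$PWD" '
        f'{shlex.quote(image)} {shlex.quote(command)} "$@"',
        "",
    ])


# Saved as an executable file named `samtools` on the PATH, this script would let
# `samtools view in.bam` run inside the (hypothetical) container image.
print(wrapper_script("samtools", "example/samtools:latest"))
```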
peppy
A Python package that provides an API for handling standardized project and sample metadata. If you define your project in Portable Encapsulated Project (PEP) format, you can use peppy to instantiate an in-memory representation of your project and sample metadata. You can then use peppy for interactive analysis, or to develop Python tools that don't need their own code for reading and handling sample metadata. Peppy is useful to tool developers and data analysts who want a standard way of representing sample-intensive research project metadata. Language: Python
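To show what an in-memory representation of sample metadata buys you, here is a stripped-down, standard-library sketch that turns a PEP-style sample table into objects with attribute access and checks that `sample_name` values are unique. It is not peppy's API and leaves out the project configuration file and other PEP features; the function name `load_samples` is hypothetical.

```python
import csv
import io
from types import SimpleNamespace


def load_samples(table_text: str) -> list:
    """Parse a PEP-style sample table into objects with attribute access.

    Hypothetical helper for illustration; peppy itself does much more.
    """
    reader = csv.DictReader(io.StringIO(table_text))
    if "sample_name" not in (reader.fieldnames or []):
        raise ValueError("sample table needs a 'sample_name' column")
    samples = [SimpleNamespace(**row) for row in reader]
    # Sample names identify samples, so duplicates are an error.
    names = [sample.sample_name for sample in samples]
    if len(names) != len(set(names)):
        raise ValueError("sample_name values must be unique")
    return samples


table = """sample_name,organism,read1
frog_1,frog,data/frog_1.fastq.gz
frog_2,frog,data/frog_2.fastq.gz
"""
for sample in load_samples(table):
    print(sample.sample_name, sample.read1)
```

A downstream tool can then iterate over `samples` without knowing anything about how the table was stored.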
Divvy is a computing resource configuration manager. It organizes your computing resources and populates job submission templates, making it easy to switch among computing resources (laptop, cluster, cloud). Divvy provides both an interactive Python API and a command-line interface. Language: Python
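Populating job submission templates can be sketched as placeholder substitution over one template per compute resource. This is a minimal illustration assuming `{UPPERCASE}` placeholders; the template text, variable names, and the `populate` helper are hypothetical and are not divvy's API. The regular expression skips shell expansions such as `${HOME}` so that ordinary bash in a template is left intact.

```python
import re

# Hypothetical templates, one per compute resource.
TEMPLATES = {
    "local": "#!/bin/bash\n{CODE}\n",
    "slurm": (
        "#!/bin/bash\n"
        "#SBATCH --job-name={JOBNAME}\n"
        "#SBATCH --mem={MEM}\n"
        "#SBATCH --cpus-per-task={CORES}\n"
        "#SBATCH --time={TIME}\n"
        "{CODE}\n"
    ),
}
SUBMIT_COMMANDS = {"local": "sh", "slurm": "sbatch"}

# Match {NAME} but not shell parameter expansions such as ${HOME}.
PLACEHOLDER = re.compile(r"(?<!\$)\{([A-Z_]+)\}")


def populate(template: str, values: dict) -> str:
    """Fill every {NAME} placeholder, failing loudly on a missing value."""
    def fill(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {key}")
        return str(values[key])
    return PLACEHOLDER.sub(fill, template)


job = {"JOBNAME": "frog_1", "MEM": "8G", "CORES": 4, "TIME": "02:00:00", "CODE": "pipeline.py frog_1"}
# Switching resources only changes which template and submit command are used.
for resource in ("local", "slurm"):
    print(f"# submit with: {SUBMIT_COMMANDS[resource]}")
    print(populate(TEMPLATES[resource], job))
```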
Data sharing and API software
Refgenie is a full-service reference genome manager that organizes storage, access, and transfer of reference genomes. It provides command-line and Python interfaces to download pre-built reference genome "assets", such as the indexes used by bioinformatics tools. It can also build assets for custom genome assemblies. Language: Python
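One way to picture how a reference genome manager organizes access to assets is an addressing scheme that maps a short string to a file. The sketch below assumes a `genome/asset.seek_key:tag` path convention, with missing parts falling back to defaults; the defaults, the toy `STORE`, and its file paths are hypothetical and are not refgenie's API or on-disk layout.

```python
import re
from dataclasses import dataclass

# Assumed convention: genome/asset[.seek_key][:tag], e.g. "hg38/fasta.fai:default".
REGISTRY_PATH = re.compile(
    r"^(?P<genome>[^/:]+)/(?P<asset>[^/:.]+)"
    r"(?:\.(?P<seek_key>[^/:]+))?(?::(?P<tag>[^/:]+))?$"
)


@dataclass(frozen=True)
class AssetRef:
    genome: str
    asset: str
    seek_key: str
    tag: str


def parse_registry_path(path: str) -> AssetRef:
    match = REGISTRY_PATH.match(path)
    if match is None:
        raise ValueError(f"not a registry path: {path!r}")
    parts = match.groupdict()
    return AssetRef(
        genome=parts["genome"],
        asset=parts["asset"],
        # Sketch-only defaults: seek key falls back to the asset name, tag to "default".
        seek_key=parts["seek_key"] or parts["asset"],
        tag=parts["tag"] or "default",
    )


# Toy local store: (genome, asset, tag) -> {seek_key: path}. Paths are made up.
STORE = {
    ("hg38", "fasta", "default"): {
        "fasta": "hg38/fasta/default/hg38.fa",
        "fai": "hg38/fasta/default/hg38.fa.fai",
    },
}


def seek(path: str) -> str:
    ref = parse_registry_path(path)
    return STORE[(ref.genome, ref.asset, ref.tag)][ref.seek_key]


print(seek("hg38/fasta"))
print(seek("hg38/fasta.fai:default"))
```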
Data analysis software
pararead
A Python package that simplifies parallel processing of DNA sequencing reads (BAM or SAM files) by parallelizing across chromosomes. Pararead is built for developers of Python scripts that process data read by read, and it lets you parallelize such a script quickly and easily. Language: Python
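Parallelizing across chromosomes is a split-apply-combine pattern: group reads by chromosome, process each group independently, then merge the per-chromosome results. The sketch below shows that pattern on toy `(chromosome, position, mapping_quality)` tuples; it is not pararead's API, and real input would be parsed from a BAM or SAM file. It uses a thread pool for portability; for CPU-bound per-read work you would swap in `concurrent.futures.ProcessPoolExecutor`, which requires the worker function to be importable by child processes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy aligned reads as (chromosome, position, mapping_quality) tuples.
READS = [
    ("chr1", 100, 60),
    ("chr1", 200, 10),
    ("chr2", 50, 40),
    ("chr2", 80, 35),
    ("chrX", 5, 30),
]
MIN_MAPQ = 30


def split_by_chromosome(reads):
    """Split step: group reads so each chromosome is an independent work unit."""
    groups = defaultdict(list)
    for read in reads:
        groups[read[0]].append(read)
    return groups


def count_high_quality(item):
    """Apply step: process one chromosome's reads in isolation."""
    chrom, reads = item
    return chrom, sum(1 for _, _, mapq in reads if mapq >= MIN_MAPQ)


def parallel_count(reads, workers=4):
    groups = split_by_chromosome(reads)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Combine step: merge the per-chromosome results into one dict.
        return dict(pool.map(count_high_quality, groups.items()))


print(parallel_count(READS))  # {'chr1': 1, 'chr2': 2, 'chrX': 1}
```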
Pipelines
PEPPRO is a pipeline for processing PRO-seq data. It is optimized for the unique features of PRO-seq to be fast and accurate. It performs adapter removal (including handling of variable-length UMIs), read deduplication, trimming, and mapping, and it produces plus- and minus-strand signal tracks (bigWig) from either scaled (mappability-based) or unscaled read counts. Language: Python
PEPATAC is an ATAC-seq pipeline. It trims adapters, maps reads, calls peaks, and creates bigWig tracks, TSS enrichment files, and other outputs. It is optimized for the unique features of ATAC-seq data to be fast and accurate, and it provides several unique analytical approaches. Language: Python
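TSS enrichment is a common ATAC-seq quality metric: coverage is aggregated over windows centered on transcription start sites (TSSs), and the signal at the center is compared with background signal in the outer flanks. The sketch below implements one simple variant of that idea and is not necessarily PEPATAC's exact computation; published definitions differ in window size, flank size, smoothing, and whether they use the center value or the profile maximum.

```python
def aggregate_profile(windows):
    """Sum per-base coverage across equal-length windows centered on TSSs."""
    if not windows:
        raise ValueError("need at least one TSS window")
    length = len(windows[0])
    if any(len(window) != length for window in windows):
        raise ValueError("all TSS windows must have the same length")
    return [sum(column) for column in zip(*windows)]


def tss_enrichment(profile, flank=100):
    """Center coverage divided by mean coverage in the two outer flanks."""
    if len(profile) < 2 * flank + 1:
        raise ValueError("profile too short for the requested flank size")
    background = (sum(profile[:flank]) + sum(profile[-flank:])) / (2 * flank)
    if background == 0:
        raise ValueError("no background coverage in the flanks")
    return profile[len(profile) // 2] / background


# Toy 5-bp windows around two TSSs, using a 1-bp flank on each side.
windows = [[1, 2, 10, 2, 1], [1, 4, 6, 0, 3]]
profile = aggregate_profile(windows)  # [2, 6, 16, 2, 4]
print(tss_enrichment(profile, flank=1))  # 16 / 3, about 5.33
```

Real data would use much wider windows (thousands of bases) and correspondingly larger flanks.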
dnameth
Pipelines for whole-genome and reduced-representation bisulfite sequencing (WGBS and RRBS). Language: Python
Web resources and services
- LOLAweb - A server with public hosting of our Shiny interface to the LOLA R package.
- Refgenie reference genome asset server - An instance of refgenieserver hosting various genome-related resources.
- Regulatory Elements Database - A database of DNase hypersensitivity data.
- Ewing Sarcoma Epigenome Resources - Comprehensive epigenome mapping of Ewing sarcoma.
- Region Databases - Curated databases of region sets for use with LOLA and other tools.
Papers that published raw or processed data
Year | Journal | Title | Data |
---|---|---|---|
2017 | Nature Medicine | DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma | Data site; GEO:GSE88826 |
2015 | Cell Reports | Epigenome mapping reveals distinct modes of gene regulation and widespread enhancer reprogramming by the oncogenic fusion protein EWS-FLI1 | Data site |
2015 | Cell Reports | Single-cell DNA methylome sequencing and bioinformatic inference of epigenomic cell-state dynamics | GEO:GSE65196 |
2013 | PLoS Genetics | Extensive evolutionary changes in regulatory element activity during human origins are associated with altered gene expression and positive selection | GEO:GSE54908 |