GenomicDistributions: Calculate and plot distributions of genomic ranges

Nathan Sheffield, PhD
www.databio.org/slides


## Vision: Modular plotting functions

### Integrated:

```
myData = GRanges(...)
package::distPlot(myData)
```

### Modular:

```
myData = GRanges(...)
myDistCalc = GenomicDistributions::calcDist(myData)
GenomicDistributions::plotDist(myDistCalc)
```

Advantage: You can easily manipulate the data and plot separately.

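To make the modular pattern concrete, here is a minimal base R sketch. The functions `calcWidthDist()` and `plotWidthDist()` and the example widths are hypothetical and are not part of `GenomicDistributions`; they only illustrate how a calculation result can be modified before it is plotted.

```R
# Hypothetical calc function: bin region widths and count regions per bin.
calcWidthDist = function(widths, breaks) {
    bins = cut(widths, breaks = breaks, include.lowest = TRUE)
    as.data.frame(table(bin = bins), responseName = "N")
}

# Hypothetical plot function: draw a bar plot from the calc output.
plotWidthDist = function(widthDist) {
    barplot(widthDist$N, names.arg = as.character(widthDist$bin),
            xlab = "Region width", ylab = "Regions")
}

# Made-up region widths, for illustration only.
widths = c(120, 450, 800, 1500, 2300, 90)
widthDist = calcWidthDist(widths, breaks = c(0, 500, 1000, 5000))

# The intermediate result is a plain data.frame, so it can be changed
# before plotting; here the counts are converted to proportions.
widthDist$N = widthDist$N / sum(widthDist$N)
plotWidthDist(widthDist)
```

With an integrated function, the conversion to proportions would have to be an option of the plotting function itself.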
## Implementation

- *calculate* functions, named `calc____()`
- *plot* functions, named `plot____()`

## Get some data

`GenomicDistributions` functions use a `GRanges` or `GRangesList` object.

```R
library("GenomicDistributions")
queryFile = system.file("extdata", "vistaEnhancers.bed.gz", package="GenomicDistributions")
query = rtracklayer::import(queryFile)
```

## Chromosome distribution plots

```
x = calcChromBinsRef(query, "hg19")  # calculate
plotChromBins(x)                     # plot
```

![](/images/presentations/genomic_distributions/chromBins1.png)
## Plot multiple with a GRangesList

```
query2 = GenomicRanges::shift(query, 1e6)
queryList = GRangesList(vistaEnhancers=query, shifted=query2)
x2 = calcChromBinsRef(queryList, "hg19")
plotChromBins(x2)
```

![](/images/presentations/genomic_distributions/chromBins2.png)
## Feature distance distribution plots

```
TSSDist = calcFeatureDistRefTSS(query, "hg19")  # calculate
plotFeatureDist(TSSDist, featureName="TSS")     # plot
```

![](/images/presentations/genomic_distributions/featureDist1.png)
## Feature distance distribution plots

```
TSSdist2 = calcFeatureDistRefTSS(queryList, "hg19")
plotFeatureDist(TSSdist2)
```

![](/images/presentations/genomic_distributions/featureDist2.png)
## Partition distribution plots

```
gp = calcPartitionsRef(query, "hg19")
plotPartitions(gp)
```

![](/images/presentations/genomic_distributions/partitions.png)
## Using custom features

The `calc___` functions used so far take a string naming a reference genome assembly:

```
gpart = calcPartitionsRef(query, "hg19")
cbins = calcChromBinsRef(query, "hg19")
fdist = calcFeatureDistRefTSS(query, "hg19")
```

What if we want to plot relative to custom features?

```
# partitions, bins, and features are placeholders for your own objects
gpart = calcPartitions(query, partitions)
cbins = calcChromBins(query, bins)
fdist = calcFeatureDist(query, features)
```

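As a rough sketch of what a custom feature set can look like, the example below builds a small `GRanges` object by hand and passes it to `calcFeatureDist()`. The coordinates are made up for illustration, and the names `toyQuery` and `toyFeatures` are not from the original slides.

```R
library("GenomicRanges")
library("GenomicDistributions")

# Toy query regions (made-up coordinates).
toyQuery = GRanges("chr1", IRanges(start = c(100, 5000, 20000), width = 50))

# Toy custom features (made-up coordinates); any GRanges can play this role.
toyFeatures = GRanges("chr1", IRanges(start = c(1000, 21000), width = 1))

# Distance from each query region to its nearest custom feature.
toyDist = calcFeatureDist(toyQuery, toyFeatures)
toyDist
```

The result has one entry per query region, so it can be inspected directly before being passed to `plotFeatureDist()`.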
## Feature distance distribution plots

```
featureExample = GenomicRanges::shift(query, round(rnorm(length(query), 0, 1000)))
fdd = calcFeatureDist(query, featureExample)
plotFeatureDist(fdd)
```

![](/images/presentations/genomic_distributions/featureDist3.png)
## GenomicDistributions conclusion

- One package with lots of genomic distribution plots
- Built-in features or custom features for every function
- Separates calculation from plotting for modular control
- Clean and consistent R API
- Returns customizable ggplot objects
- Faster calculations using `data.table`
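Because the plot functions return ggplot objects, standard `ggplot2` layers can be added to the result. The sketch below assumes `ggplot2` is installed; the title text and theme tweak are arbitrary choices for illustration.

```R
library("GenomicDistributions")
library("ggplot2")

# Load the example query regions shipped with the package.
queryFile = system.file("extdata", "vistaEnhancers.bed.gz", package="GenomicDistributions")
query = rtracklayer::import(queryFile)

x = calcChromBinsRef(query, "hg19")  # calculate
p = plotChromBins(x)                 # plot (returns a ggplot object)

# Add a title and adjust the theme with ordinary ggplot2 syntax.
p2 = p +
    ggtitle("VISTA enhancers across hg19") +
    theme(plot.title = element_text(face = "bold"))
print(p2)
```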