Coordinate covariation analysis

John Lawson
www.databio.org/slides

We want to understand variation between individuals.


Differential analysis.

Continuous variation.

Unsupervised analysis.

COCOA focuses on continuous variation and doesn't require known groups.
#### DNA methylation data are high-dimensional and hard to interpret
![](/images/presentations/cocoa/pca-example.svg)

Can we use region sets to biologically annotate the source of that variation?

Big picture steps:

1. Quantify variation with PCA.
2. Annotate PCs with region sets.
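Step 1 can be illustrated with a minimal PCA sketch in Python. COCOA itself is an R/Bioconductor package, and none of its functions appear here. The sketch assumes a samples × CpGs matrix of methylation values. The data are simulated, and the helper `pca_scores_loadings` is hypothetical.

```python
import numpy as np

def pca_scores_loadings(meth, n_pcs=2):
    """PCA of a samples x CpGs methylation matrix via SVD (hypothetical helper).

    Returns PC scores (samples x n_pcs), loadings (CpGs x n_pcs) and the
    fraction of total variance explained by each returned PC.
    """
    centered = meth - meth.mean(axis=0)   # center each CpG across samples
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]     # coordinates of each sample on each PC
    loadings = vt[:n_pcs].T               # weight of each CpG in each PC
    explained = (s**2 / np.sum(s**2))[:n_pcs]
    return scores, loadings, explained

# Simulated data (not TCGA): 20 samples x 100 CpGs, where the second group
# of 10 samples is more methylated at the first 10 CpGs.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 10)
meth = rng.normal(0.5, 0.05, size=(20, 100))
meth[group == 1, :10] += 0.3

scores, loadings, explained = pca_scores_loadings(meth)
# CpGs with the largest absolute PC1 loadings drive the main axis of variation.
top_cpgs = np.argsort(-np.abs(loadings[:, 0]))[:10]
print(sorted(top_cpgs.tolist()))
```

The loadings tell us which CpGs drive each PC. Step 2 asks which region sets those CpGs belong to.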
#### COCOA workflow
![](/images/presentations/cocoa/cocoa-workflow.svg)
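The annotation step can be shown with a simplified, hypothetical sketch. Each region set is scored by the mean absolute PC loading of the CpGs inside its regions, and region sets are then ranked by that score. This is one plausible scoring choice and is not necessarily COCOA's default. It also omits significance testing. The helper `score_region_set`, the coordinates and the loadings are all made up for illustration.

```python
import numpy as np

def score_region_set(cpg_chrom, cpg_pos, loadings, regions):
    """Mean |loading| of CpGs falling inside any region (hypothetical helper).

    cpg_chrom, cpg_pos: arrays with each CpG's chromosome and position.
    loadings: one PC loading per CpG.
    regions: list of (chrom, start, end) half-open intervals.
    Returns (score, number of CpGs covered); score is NaN if none are covered.
    """
    covered = np.zeros(len(cpg_pos), dtype=bool)
    for chrom, start, end in regions:
        covered |= (cpg_chrom == chrom) & (cpg_pos >= start) & (cpg_pos < end)
    if not covered.any():
        return float("nan"), 0
    return float(np.mean(np.abs(loadings[covered]))), int(covered.sum())

# Toy inputs: six CpGs on chr1 with made-up PC1 loadings.
cpg_chrom = np.array(["chr1"] * 6)
cpg_pos = np.array([100, 150, 200, 1000, 1050, 5000])
pc1_loadings = np.array([0.5, -0.4, 0.45, 0.05, -0.02, 0.01])
region_sets = {
    "set_A": [("chr1", 90, 210)],
    "set_B": [("chr1", 950, 1100), ("chr1", 4900, 5100)],
}

scores = {name: score_region_set(cpg_chrom, cpg_pos, pc1_loadings, regs)
          for name, regs in region_sets.items()}
# Rank region sets from most to least associated with PC1.
ranked = sorted(scores, key=lambda name: scores[name][0], reverse=True)
print(ranked)
```

Taking the absolute value treats strongly positive and strongly negative loadings alike. The question is whether a region set covers CpGs that matter for the PC, not which direction they move.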
#### PCA for breast cancer data from TCGA
![](/images/presentations/cocoa/cocoa-brca-pca.svg)
#### Top hits
1. Gata3
2. H3R17me2
3. ER
4. Foxa1
5. AR

- FOXA1 is a key determinant of estrogen receptor function and endocrine response. [@Hurtado2011]
- GATA-3 expression in breast cancer has a strong association with estrogen receptor [@Voduc2008]
- Upon estrogen stimulation, the E2F1 promoter is subject to H3R17me2... [@Frietze2008]
#### PC1 does indeed split samples by ER status
![](/images/presentations/cocoa/cocoa-brca-pca-color.svg)
#### Raw DNA methylation in ER binding regions
![](/images/presentations/cocoa/ESR1_heatmap.svg)
#### Rank distribution for ER-related regions
![](/images/presentations/cocoa/er-hist.svg)

ER-related regions have higher loadings on PC1.
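The slide does not say exactly what is ranked. Assuming ranks are taken over all CpGs by absolute PC1 loading, a rank distribution like this could be computed as below. If ER-related regions were unrelated to PC1, their percentile ranks would be roughly uniform between 0 and 1. A pile-up near 1 means their loadings are unusually high. The helpers and toy values are hypothetical.

```python
import numpy as np

def percentile_ranks(values):
    """Percentile rank of each value: 0 for the smallest, 1 for the largest.

    Ties are broken by position; a fuller version might average tied ranks.
    """
    values = np.asarray(values)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(len(values))
    return ranks / (len(values) - 1)

def split_ranks(abs_loadings, in_region):
    """Percentile ranks of CpGs inside vs. outside a region set (hypothetical helper)."""
    pr = percentile_ranks(abs_loadings)
    return pr[in_region], pr[~in_region]

# Toy example: |PC1 loadings| for six CpGs, three of them in ER-related regions.
abs_pc1 = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3])
in_er = np.array([True, False, True, False, True, False])
inside, outside = split_ranks(abs_pc1, in_er)

# Bin counts that could be plotted as the rank histogram.
counts, edges = np.histogram(inside, bins=10, range=(0.0, 1.0))
print(inside.mean(), outside.mean())
```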
#### But is the variation specific to the binding sites?
![](/images/presentations/cocoa/metaregion_ER_raw_657.svg)

A peak at the center of the binding regions suggests that the variation is specific to those loci.
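A meta-region profile can be sketched in three steps. Align all regions on their centers, bin CpGs by their offset from the center, and average absolute loadings per bin. A peak in the central bins means the high loadings sit on the binding sites rather than being spread across the surrounding sequence. The helper `meta_region_profile`, its `window` and `n_bins` parameters, and the simulated data are assumptions for illustration, not COCOA's implementation.

```python
import numpy as np

def meta_region_profile(cpg_chrom, cpg_pos, abs_loadings, regions,
                        window=1000, n_bins=10):
    """Average |loading| by offset from region centers (hypothetical helper).

    Only CpGs strictly within `window` bp of a region center are counted.
    Returns the bin edges and the mean |loading| per bin (NaN for empty bins).
    """
    edges = np.linspace(-window, window, n_bins + 1)
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for chrom, start, end in regions:
        center = (start + end) // 2
        offset = cpg_pos - center
        near = (cpg_chrom == chrom) & (np.abs(offset) < window)
        bins = np.digitize(offset[near], edges) - 1   # 0-based bin index
        np.add.at(sums, bins, abs_loadings[near])
        np.add.at(counts, bins, 1)
    with np.errstate(invalid="ignore"):
        return edges, sums / counts

# Simulated data: CpGs every 50 bp; CpGs within 100 bp of three region
# centers get high |loadings|, all others a low background value.
cpg_pos = np.arange(9000, 31001, 50)
cpg_chrom = np.full(cpg_pos.shape, "chr1")
centers = np.array([10000, 20000, 30000])
near_center = np.min(np.abs(cpg_pos[:, None] - centers[None, :]), axis=1) <= 100
abs_loadings = np.where(near_center, 0.55, 0.05)
regions = [("chr1", c - 250, c + 250) for c in centers]

edges, profile = meta_region_profile(cpg_chrom, cpg_pos, abs_loadings, regions)
print(np.round(profile, 3))
```

A flat profile would instead suggest that the variation spans a broad domain and is not specific to the binding sites themselves.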
#### Conclusions
- COCOA provides a method to understand continuous regulatory variation
- Availability: Bioconductor, http://code.databio.org/COCOA/