We want to understand variation between individuals.
Differential analysis.
Continuous variation.
Unsupervised analysis.
COCOA focuses on continuous variation and doesn't require known groups.
#### DNA methylation data is high-dimensional and hard to interpret
![](/images/presentations/cocoa/pca-example.svg)
Can we use region sets to biologically annotate the source of that variation?
Big picture steps:
1. Quantify variation with PCA.
2. Annotate PCs with region sets.
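The two steps above can be sketched in a few lines of Python. This is not COCOA's R API; it is a minimal illustration on synthetic data, assuming a samples x CpGs methylation matrix and scoring a region set by the mean absolute PC loading of the CpGs it covers. The function names `pca_loadings` and `region_set_score`, the coordinates, and the scoring rule are all assumptions made for this example.

```python
import numpy as np

def pca_loadings(meth, n_components=2):
    """PCA via SVD on a samples x CpGs matrix; returns (scores, loadings)."""
    centered = meth - meth.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # samples x components
    loadings = vt[:n_components].T                    # CpGs x components
    return scores, loadings

def region_set_score(cpg_pos, loadings_pc, regions):
    """Mean absolute loading of CpGs inside any (start, end) region (hypothetical score)."""
    in_set = np.zeros(cpg_pos.shape, dtype=bool)
    for start, end in regions:
        in_set |= (cpg_pos >= start) & (cpg_pos < end)
    if not in_set.any():
        return float("nan")
    return float(np.abs(loadings_pc[in_set]).mean())

# Synthetic data: 40 samples, 500 CpGs spaced 100 bp apart (assumed coordinates).
rng = np.random.default_rng(0)
n_samples, n_cpgs = 40, 500
cpg_pos = np.arange(n_cpgs) * 100
meth = rng.normal(0.5, 0.05, size=(n_samples, n_cpgs))

# A continuous latent factor shifts methylation at CpGs 100..149 only.
latent = rng.normal(size=n_samples)
meth[:, 100:150] += 0.2 * latent[:, None]

# Step 1: quantify variation with PCA.
scores, loadings = pca_loadings(meth)

# Step 2: annotate PC1 with region sets.
related_regions = [(10000, 15000)]     # overlaps the CpGs driven by the latent factor
unrelated_regions = [(30000, 35000)]   # overlaps only noise CpGs
related_score = region_set_score(cpg_pos, loadings[:, 0], related_regions)
unrelated_score = region_set_score(cpg_pos, loadings[:, 0], unrelated_regions)
print(related_score, unrelated_score)
```

In this toy setup the region set that overlaps the varying CpGs gets the larger score, which is the behavior the annotation step relies on.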
#### COCOA workflow
![](/images/presentations/cocoa/cocoa-workflow.svg)
#### PCA for breast cancer data from TCGA
![](/images/presentations/cocoa/cocoa-brca-pca.svg)
#### Top hits
1. GATA3
2. H3R17me2
3. ER
4. FOXA1
5. AR
- FOXA1 is a key determinant of estrogen receptor function and endocrine response. [@Hurtado2011]
- GATA-3 expression in breast cancer has a strong association with estrogen receptor [@Voduc2008]
- Upon estrogen stimulation, the E2F1 promoter is subject to H3R17me2... [@Frietze2008]
#### PC1 does indeed split samples by ER status
![](/images/presentations/cocoa/cocoa-brca-pca-color.svg)
#### Raw DNA Methylation in ER binding regions
![](/images/presentations/cocoa/ESR1_heatmap.svg)
#### Rank distribution for ER-related regions
![](/images/presentations/cocoa/er-hist.svg)
ER-related regions have higher loadings on PC1
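A rank comparison of this kind can be sketched as follows. The loadings here are synthetic and the membership mask is hypothetical; the example only shows how features (for example CpGs or regions) could be ranked by absolute PC1 loading and how the ranks of a flagged subset could be summarized, not how COCOA computes its ranks.

```python
import numpy as np

def loading_ranks(loadings_pc):
    """Rank features by absolute loading; rank 1 is the largest |loading|."""
    order = np.argsort(-np.abs(loadings_pc))
    ranks = np.empty(len(loadings_pc), dtype=int)
    ranks[order] = np.arange(1, len(loadings_pc) + 1)
    return ranks

# Synthetic PC1 loadings for 1000 features (assumption, not real data).
rng = np.random.default_rng(1)
loadings_pc1 = rng.normal(0, 0.01, size=1000)

# Hypothetical membership mask: the first 50 features belong to the region set.
in_region_set = np.zeros(1000, dtype=bool)
in_region_set[:50] = True
loadings_pc1[in_region_set] += 0.1   # make the flagged features load heavily on PC1

ranks = loading_ranks(loadings_pc1)
median_in = np.median(ranks[in_region_set])
median_out = np.median(ranks[~in_region_set])

# Histogram of ranks for the flagged features, as in a rank-distribution plot.
hist_in, bin_edges = np.histogram(ranks[in_region_set], bins=10, range=(1, 1000))
print(median_in, median_out, hist_in)
```

A histogram concentrated in the lowest-rank bins indicates that the flagged features sit near the top of the loading ranking.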
#### But is the variation specific to the binding sites?
![](/images/presentations/cocoa/metaregion_ER_raw_657.svg)
A peak suggests specificity to that genomic locus
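One way to build such a profile, sketched on synthetic data: bin each CpG by its position relative to the nearest region centers and average a per-CpG value (here an absolute loading) within each bin across all regions. The function `meta_region_profile`, the flank width, and the bin count are assumptions for illustration, not COCOA's implementation.

```python
import numpy as np

def meta_region_profile(cpg_pos, values, centers, flank=1000, n_bins=10):
    """Average `values` in bins of position relative to each region center."""
    edges = np.linspace(-flank, flank, n_bins + 1)
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for center in centers:
        rel = cpg_pos - center
        keep = (rel >= -flank) & (rel < flank)
        idx = np.digitize(rel[keep], edges) - 1   # bin index 0 .. n_bins-1
        np.add.at(sums, idx, values[keep])
        np.add.at(counts, idx, 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        profile = sums / counts                   # NaN where a bin is empty
    return edges, profile

# Synthetic CpGs every 50 bp and three hypothetical binding-site centers.
rng = np.random.default_rng(2)
cpg_pos = np.arange(0, 100000, 50)
centers = np.array([20000, 50000, 80000])

# Background absolute loadings plus a bump within 200 bp of each center.
abs_loading = np.abs(rng.normal(0, 0.01, size=cpg_pos.size))
for center in centers:
    abs_loading[np.abs(cpg_pos - center) < 200] += 0.1

edges, profile = meta_region_profile(cpg_pos, abs_loading, centers)
peak_bin = int(np.argmax(profile))
print(np.round(profile, 3), peak_bin)
```

With the bump placed at the centers, the highest bins are the two that flank relative position 0, and the outer bins stay near the background level.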
#### Conclusions
- COCOA provides a method to understand continuous regulatory variation
- Availability: Bioconductor, http://code.databio.org/COCOA/