We want to understand variation between individuals. Two approaches:
- Differential analysis: compares samples across predefined groups.
- Continuous variation: unsupervised analysis.

COCOA focuses on continuous variation and doesn't require known groups.
#### DNA methylation is high-dimensional and hard to interpret

Can we use region sets to biologically annotate the source of that variation?
Big picture steps:
1. Quantify variation with principal component analysis (PCA).
2. Annotate the principal components (PCs) with region sets.
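Step 1 can be illustrated on a toy matrix. The sketch below computes only the first PC of a samples-by-CpGs matrix of methylation beta values by power iteration, assuming complete data with no missing values. The function names `center_columns` and `pc1` and the toy values are hypothetical; a real analysis would use a full PCA routine rather than this loop.

```python
import math
import random


def center_columns(X):
    """Subtract each CpG's (column's) mean so PCA captures between-sample variation."""
    n_samples = len(X)
    means = [sum(row[j] for row in X) / n_samples for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def pc1(X, iters=200, seed=0):
    """First principal component of a samples-by-CpGs matrix via power iteration.

    Returns (loadings, scores): one unit-length weight per CpG and one
    projection per sample. The overall sign is arbitrary, as in any PCA.
    """
    Xc = center_columns(X)
    n, p = len(Xc), len(Xc[0])
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(iters):
        # Multiply by the scatter matrix Xc^T Xc without forming it: first Xc v ...
        w = [sum(row[j] * v[j] for j in range(p)) for row in Xc]
        # ... then Xc^T w, and renormalise to unit length.
        v = [sum(Xc[i][j] * w[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            raise ValueError("no variation left to decompose")
        v = [x / norm for x in v]
    scores = [sum(row[j] * v[j] for j in range(p)) for row in Xc]
    return v, scores


# Toy beta values (hypothetical): 4 samples x 3 CpGs. CpGs 0 and 1 co-vary,
# CpG 2 is constant, so PC1 should load on CpGs 0 and 1 only.
X = [
    [0.9, 0.8, 0.5],
    [0.8, 0.9, 0.5],
    [0.2, 0.1, 0.5],
    [0.1, 0.2, 0.5],
]
loadings, scores = pc1(X)
print([round(x, 3) for x in loadings])  # prints [0.707, 0.707, 0.0]
```

The loadings (one per CpG) are what step 2 annotates; the scores (one per sample) place each individual along the PC.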
#### COCOA workflow
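The annotation step scores each region set by how strongly the CpGs it covers contribute to a PC. A simplified sketch, assuming the score is the mean absolute loading of CpGs that fall inside any region of the set; COCOA's own scoring metrics may differ in detail. Absolute values are used because the sign of a PC is arbitrary and CpGs in one set can load in either direction. The function name, coordinates, and set names below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

CpG = Tuple[str, int]           # (chromosome, position)
Region = Tuple[str, int, int]   # (chromosome, start, end); end treated as inclusive here


def region_set_score(
    cpgs: List[CpG], loadings: List[float], regions: List[Region]
) -> Optional[float]:
    """Mean absolute PC loading of the CpGs inside any region of the set.

    Returns None when no CpG overlaps the set, so such sets can be skipped.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = [
        abs(w)
        for (chrom, pos), w in zip(cpgs, loadings)
        if any(start <= pos <= end for start, end in by_chrom.get(chrom, []))
    ]
    return sum(hits) / len(hits) if hits else None


# Hypothetical CpG coordinates, PC1 loadings and two made-up region sets.
cpgs = [("chr1", 100), ("chr1", 250), ("chr1", 900), ("chr2", 50)]
loadings = [0.6, -0.5, 0.05, 0.1]
region_sets = {
    "set_A": [("chr1", 90, 300)],
    "set_B": [("chr1", 800, 1000), ("chr2", 0, 100)],
}
scores = {name: region_set_score(cpgs, loadings, regs) for name, regs in region_sets.items()}
print(scores)
```

The linear scan over regions keeps the sketch short; for genome-scale data an interval index (or a dedicated overlap library) would be the usual choice.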

#### PCA for breast cancer data from TCGA

#### Top hits
1. GATA3
2. H3R17me2
3. ER (estrogen receptor)
4. FOXA1
5. AR (androgen receptor)
- FOXA1 is a key determinant of estrogen receptor function and endocrine response [@Hurtado2011].
- GATA-3 expression in breast cancer has a strong association with estrogen receptor [@Voduc2008].
- Upon estrogen stimulation, the E2F1 promoter is subject to H3R17me2... [@Frietze2008]
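A list of top hits comes from ranking every region set by its score for the PC. A minimal sketch of that ranking step, assuming the scores are already computed; the function name `top_hits`, the set names, and the values are made up for illustration and are not the results on this slide.

```python
from typing import Dict, List, Optional


def top_hits(scores: Dict[str, Optional[float]], k: int = 5) -> List[str]:
    """Names of the k highest-scoring region sets; sets with no score are dropped."""
    scored = [(name, s) for name, s in scores.items() if s is not None]
    # Highest score first; Python's sort stays stable with reverse=True,
    # so tied sets keep their input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Made-up scores for illustration only.
example = {"set_A": 0.21, "set_B": 0.55, "set_C": None, "set_D": 0.40, "set_E": 0.08}
print(top_hits(example, k=3))  # prints ['set_B', 'set_D', 'set_A']
```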
#### PC1 does indeed split samples by ER status
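One way to quantify how well a PC splits samples by a known label is the area under the ROC curve (AUC): the probability that a randomly chosen ER-positive sample has a higher PC1 score than a randomly chosen ER-negative one. This is a sketch with hypothetical scores and labels, not necessarily how the split on this slide was assessed. Because the sign of a PC is arbitrary, an AUC near 0 indicates an equally clean split in the opposite direction, hence the `max(auc, 1 - auc)`.

```python
from typing import Sequence


def separation_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC: chance a True-labelled sample outscores a False-labelled one (ties = 0.5).

    1.0 or 0.0 means the score perfectly separates the groups; 0.5 means no separation.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one sample in each group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical PC1 scores and ER status for six samples.
pc1_scores = [1.2, 0.8, -0.5, 0.3, -0.9, -1.1]
er_positive = [True, True, True, False, False, False]
auc = separation_auc(pc1_scores, er_positive)
print(round(max(auc, 1 - auc), 3))  # prints 0.889
```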

#### Raw DNA methylation in ER binding regions
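A view like this typically orders samples by their PC1 score and shows methylation at CpGs inside the ER binding regions. A simplified sketch, assuming each sample is collapsed to its mean beta value across those CpGs (the slide may instead show every CpG individually); the function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_beta_in_regions(
    beta: Sequence[Sequence[float]],
    cpgs: Sequence[Tuple[str, int]],
    regions: Sequence[Tuple[str, int, int]],
) -> List[float]:
    """Per-sample mean beta value over the CpGs that fall inside any region."""
    inside = [
        j for j, (chrom, pos) in enumerate(cpgs)
        if any(chrom == r_chrom and start <= pos <= end for r_chrom, start, end in regions)
    ]
    if not inside:
        raise ValueError("no CpGs overlap the regions")
    return [sum(sample[j] for j in inside) / len(inside) for sample in beta]


def order_by_score(values: Sequence[float], scores: Sequence[float]) -> List[float]:
    """Reorder per-sample values from lowest to highest PC score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return [values[i] for i in order]


# Hypothetical data: 3 samples x 3 CpGs; only the two chr1 CpGs lie in the region.
beta = [[0.2, 0.3, 0.9], [0.8, 0.7, 0.9], [0.5, 0.5, 0.1]]
cpgs = [("chr1", 100), ("chr1", 150), ("chr2", 10)]
er_regions = [("chr1", 50, 200)]
pc1_scores = [-1.0, 1.0, 0.0]
means = mean_beta_in_regions(beta, cpgs, er_regions)
print(order_by_score(means, pc1_scores))
```

A monotone trend in the ordered values is what "PC1 tracks methylation in these regions" looks like in raw data.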

#### Rank distribution for ER-related regions

ER-related regions have higher loadings on PC1 than other regions.
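A rank-based comparison asks whether CpGs in ER-related regions sit higher than typical when all CpGs are ranked by absolute PC1 loading. A minimal sketch, assuming the comparison is summarised as the mean percentile of in-region CpGs (the exact statistic behind the slide is not stated here); the function name and data are hypothetical. A value near 0.5 means no enrichment; values near 1 mean the regions carry the largest loadings.

```python
from typing import Sequence, Tuple


def mean_loading_percentile(
    cpgs: Sequence[Tuple[str, int]],
    loadings: Sequence[float],
    regions: Sequence[Tuple[str, int, int]],
) -> float:
    """Mean percentile of |loading| for CpGs inside the regions.

    Percentiles run from 0 (smallest |loading| of all CpGs) to 1 (largest);
    ties are broken by input order.
    """
    n = len(loadings)
    if n < 2:
        raise ValueError("need at least two CpGs to rank")
    order = sorted(range(n), key=lambda i: abs(loadings[i]))
    percentile = [0.0] * n
    for rank, i in enumerate(order):
        percentile[i] = rank / (n - 1)
    inside = [
        i for i, (chrom, pos) in enumerate(cpgs)
        if any(chrom == rc and start <= pos <= end for rc, start, end in regions)
    ]
    if not inside:
        raise ValueError("no CpGs overlap the regions")
    return sum(percentile[i] for i in inside) / len(inside)


# Hypothetical example: the two CpGs in the region carry the largest loadings.
cpgs = [("chr1", 100), ("chr1", 200), ("chr1", 5000), ("chr2", 300), ("chr2", 800)]
loadings = [0.9, -0.7, 0.1, 0.05, -0.2]
print(mean_loading_percentile(cpgs, loadings, [("chr1", 50, 250)]))  # prints 0.875
```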
#### But is the variation specific to the binding sites?

A peak in the loadings at the center of the regions suggests the variation is specific to those binding sites.
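To check specificity, loadings can be summarised as a function of distance from the region centers, pooled over all regions in the set. A peak in the central bins above the flanking background points to variation concentrated at the binding sites. A minimal sketch; the window half-width, bin count, function name, and data are illustrative assumptions, not the settings used for this slide.

```python
from typing import List, Optional, Sequence, Tuple


def loading_profile(
    cpgs: Sequence[Tuple[str, int]],
    loadings: Sequence[float],
    regions: Sequence[Tuple[str, int, int]],
    half_width: int = 1000,
    n_bins: int = 5,
) -> List[Optional[float]]:
    """Mean |loading| in equal-width bins across a window centred on each region.

    All regions are aligned on their centres and pooled, so bin k covers the same
    offset range for every region. Bins with no CpGs are returned as None.
    """
    bin_width = 2 * half_width / n_bins
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for r_chrom, start, end in regions:
        window_start = (start + end) // 2 - half_width
        for (chrom, pos), w in zip(cpgs, loadings):
            offset = pos - window_start
            if chrom == r_chrom and 0 <= offset < 2 * half_width:
                # Guard against float rounding pushing the last offset past the final bin.
                b = min(int(offset // bin_width), n_bins - 1)
                totals[b] += abs(w)
                counts[b] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


# Hypothetical single region centred at chr1:5000 with loadings peaking at the centre.
cpgs = [("chr1", 4100), ("chr1", 4500), ("chr1", 5000), ("chr1", 5050), ("chr1", 5300), ("chr1", 5900)]
loadings = [0.1, 0.15, 0.8, -0.6, 0.2, 0.05]
profile = loading_profile(cpgs, loadings, [("chr1", 4900, 5100)])
print([round(x, 3) for x in profile])  # prints [0.1, 0.15, 0.7, 0.2, 0.05]
```

With many regions the pooled profile averages out noise at individual loci; a flat profile would instead suggest the variation reflects broader domains rather than the binding sites themselves.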
#### Conclusions
- COCOA provides a method to understand continuous regulatory variation
- Availability: Bioconductor, http://code.databio.org/COCOA/