Epigenomic data: high-dimensional and hard to interpret
Dimensionality reduction
Even with known groups
How can we annotate the source of variation?
COCOA Overview
John Lawson
Coordinate Covariation Analysis
Quantify variation into a 'target variable'
Supervised (e.g. clinical variable).
Unsupervised (e.g. PCA)
Annotate target variable with region sets.
What is epigenetic signal covariation?
Covariation informs source of observed variation
1. Choose target variable
What is the variation we'd like to explain?
Supervised target
Unsupervised target
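As a rough illustration of the two kinds of target variable, the sketch below builds one of each from a samples-by-features signal matrix. It is not COCOA's own code: the function name `pc_scores`, the random data, and the matrix dimensions are assumptions made for this example only.

```python
import numpy as np

def pc_scores(signal, n_components=2):
    """Per-sample principal component scores for a samples x features matrix."""
    centered = signal - signal.mean(axis=0)
    # SVD of the centered matrix; u * s gives the sample scores on each PC.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
signal = rng.random((20, 100))  # hypothetical data: 20 samples x 100 CpGs

# Unsupervised target: each sample's score on PC1.
unsupervised_target = pc_scores(signal, n_components=1)[:, 0]

# Supervised target: a per-sample clinical variable (hypothetical stage, 1 to 4).
supervised_target = rng.integers(1, 5, size=20).astype(float)
```

Either vector has one value per sample, so the later steps can treat supervised and unsupervised targets the same way.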
2. Quantify correlation with target variable
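One way to picture this step: correlate every epigenetic feature with the target variable across samples, then summarize the features that fall inside a region set. The sketch below does that under stated assumptions. The function names, the single-chromosome coordinates, and the choice of mean absolute Pearson correlation as the summary are illustrative and are not claimed to match COCOA's exact scoring.

```python
import numpy as np

def feature_correlations(signal, target):
    """Pearson correlation of each feature (column) with the target.

    Assumes no feature is constant across samples.
    """
    x = signal - signal.mean(axis=0)
    y = target - target.mean()
    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return (x * y[:, None]).sum(axis=0) / denom

def region_set_score(corrs, positions, regions):
    """Mean absolute correlation of features inside any (start, end) region.

    Simplification: all features and regions lie on one chromosome.
    """
    in_set = np.zeros(positions.shape, dtype=bool)
    for start, end in regions:
        in_set |= (positions >= start) & (positions < end)
    if not in_set.any():
        return float("nan")
    return float(np.abs(corrs[in_set]).mean())

# Toy data: 4 samples, 3 features at hypothetical coordinates.
target = np.array([1.0, 2.0, 3.0, 4.0])
signal = np.column_stack([target, -target, np.array([1.0, -1.0, -1.0, 1.0])])
positions = np.array([100, 200, 900])

corrs = feature_correlations(signal, target)
score = region_set_score(corrs, positions, [(50, 250)])
```

Taking the absolute value lets a region set score highly whether its features rise or fall with the target. Ranking many region sets by such a score is what gives the annotation.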
Permutation tests establish significance
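A minimal sketch of a permutation test for one region set: shuffle the target variable across samples, recompute the region set score each time, and ask how often a shuffled score reaches the observed one. The function name, the boolean mask `in_set`, the scoring function, and the simulated data are assumptions for illustration. This is not presented as COCOA's implementation.

```python
import numpy as np

def permutation_p_value(signal, target, in_set, n_perm=200, seed=0):
    """Empirical p-value for a region set score under shuffled sample labels."""
    rng = np.random.default_rng(seed)
    x = signal[:, in_set] - signal[:, in_set].mean(axis=0)
    x_norm = np.sqrt((x ** 2).sum(axis=0))

    def score(t):
        # Mean absolute Pearson correlation between region features and t.
        y = t - t.mean()
        r = (x * y[:, None]).sum(axis=0) / (x_norm * np.sqrt((y ** 2).sum()))
        return np.abs(r).mean()

    observed = score(target)
    null = np.array([score(rng.permutation(target)) for _ in range(n_perm)])
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Simulated data: 30 samples, 50 features; the first 10 track the target.
rng = np.random.default_rng(1)
target = np.linspace(0.0, 1.0, 30)
signal = rng.normal(size=(30, 50))
signal[:, :10] = target[:, None] + 0.05 * rng.normal(size=(30, 10))
in_set = np.zeros(50, dtype=bool)
in_set[:10] = True

p_value = permutation_p_value(signal, target, in_set)
```

Shuffling the target, rather than the features, keeps the correlation structure among features intact, so the null distribution reflects only the loss of the link between signal and target.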
Case studies
- Breast cancer DNA methylation (Unsupervised)
- Breast cancer ATAC-seq (Unsupervised)
- Kidney cancer DNA methylation (Supervised)
- Pan-cancer EZH2 analysis
Breast cancer DNA methylation PCA
COCOA results for PC1
ER-related regions have higher loadings on PC1
Raw DNA Methylation in ER binding regions
COCOA results for PC1-4
COCOA meta-region plots for PC1-4
Breast cancer ATAC-seq PCA
COCOA results for ATAC-seq
ER-related regions have higher loadings on PC1
Kidney cancer DNA methylation (Supervised)
Rank region sets for methylation that correlates with cancer stage
COCOA results for cancer stage
DNA methylation in EZH2 regions and survival
DNA methylation in EZH2-binding regions is most often positively correlated with risk of death.
#### Conclusions: COCOA...
- can interpret continuous regulatory variation
- can use any signal that annotates genomic coordinates
- can work on supervised and unsupervised data
- is available from Bioconductor
- depends critically on the database of region sets...
Thank You
Collaborators
Vince Reuter
Andre Rendeiro
Levi Waldron
Sheffield lab
Erfaneh Gharavi
Michal Stolarczyk
John Lawson
Jason Smith
Kristyna Kupkova
John Stubbs
Bingjie Xue
Jose Verdezoto
Nathan LeRoy
Oleksandr Khoroshevskyi