Annotating inter-sample epigenetic heterogeneity in cancer using genome coordinate covariation analysis

Nathan Sheffield
www.databio.org/slides

Goal: understand variation among individuals


Supervised differential analysis

Supervised continuous analysis

Unsupervised analysis

Epigenomic data: high-dimensional and hard to interpret


Dimensionality reduction

Even with known groups
How can we annotate the source of variation?

COCOA Overview

John Lawson

Coordinate Covariation Analysis

  1. Quantify variation into a 'target variable'
    1. Supervised (e.g. clinical variable)
    2. Unsupervised (e.g. PCA)
  2. Annotate target variable with region sets

What is epigenetic signal covariation?


 


Covariation informs source of observed variation

1. Choose target variable

What is the variation we'd like to explain?
Supervised target
Unsupervised target
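
The two kinds of target variable can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the COCOA package itself (which is distributed as an R/Bioconductor package); the function names `unsupervised_target` and `supervised_target`, the toy data, and the numeric stage values are all hypothetical. The unsupervised target here is the sample scores on one principal component; the supervised target is simply a per-sample phenotype vector.

```python
import numpy as np

def unsupervised_target(signal, component=0):
    """Return sample scores on one principal component.

    signal: (n_samples, n_features) matrix, e.g. methylation per CpG.
    """
    centered = signal - signal.mean(axis=0)
    # SVD of the centered matrix; u[:, k] * s[k] gives the scores on PC k+1.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, component] * s[component]

def supervised_target(values):
    """Return a per-sample phenotype as a float vector (assumed numeric)."""
    return np.asarray(values, dtype=float)

# Toy data (made up for illustration): 6 samples x 20 features.
rng = np.random.default_rng(0)
signal = rng.random((6, 20))

pc1 = unsupervised_target(signal)                # one value per sample
stage = supervised_target([1, 1, 2, 3, 3, 4])    # hypothetical stage labels
```

Either vector has one entry per sample, so the downstream steps can treat the two cases identically.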

 

2. Quantify correlation with target variable
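
One way to make this step concrete: correlate every epigenetic feature with the target variable across samples, then summarize the features that fall inside a region set. The sketch below assumes Pearson correlation and a mean-of-absolute-values summary; both are illustrative choices, and the function names, the single-chromosome coordinates, and the toy data are hypothetical rather than the COCOA API.

```python
import numpy as np

def feature_correlations(signal, target):
    """Pearson correlation of each feature (column) with the target."""
    target = np.asarray(target, dtype=float)
    x = signal - signal.mean(axis=0)
    t = target - target.mean()
    denom = np.sqrt((x ** 2).sum(axis=0) * (t ** 2).sum())
    return (x * t[:, None]).sum(axis=0) / denom

def region_set_score(correlations, positions, regions):
    """Mean absolute correlation of features inside any region.

    positions: 1-D array of feature coordinates on a single chromosome.
    regions: list of (start, end) half-open intervals.
    """
    inside = np.zeros(len(positions), dtype=bool)
    for start, end in regions:
        inside |= (positions >= start) & (positions < end)
    if not inside.any():
        return float("nan")  # the region set covers no features
    return float(np.abs(correlations[inside]).mean())

# Toy data: 4 samples, 3 features at coordinates 100, 200 and 900.
target = np.array([1.0, 2.0, 3.0, 4.0])
signal = np.column_stack([2 * target, -target, [1.0, -1.0, -1.0, 1.0]])
positions = np.array([100, 200, 900])

cors = feature_correlations(signal, target)
score_covarying = region_set_score(cors, positions, [(50, 250)])
score_unrelated = region_set_score(cors, positions, [(800, 1000)])
```

Taking absolute values means a region set scores highly whether its features rise or fall with the target; a signed summary would be the alternative if direction matters.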


 


Permutation tests establish significance
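
A permutation test of this kind can be sketched as follows, assuming the null distribution is built by shuffling the target variable across samples and rescoring the region set each time. The scoring function, the parameter names (`n_perm`, `seed`), and the simulated data are hypothetical, and the empirical p-value with an add-one correction is one common choice rather than a description of what the COCOA package computes.

```python
import numpy as np

def score(signal, target, in_set):
    """Mean absolute Pearson correlation over features flagged by in_set."""
    x = signal[:, in_set] - signal[:, in_set].mean(axis=0)
    t = target - target.mean()
    r = (x * t[:, None]).sum(axis=0) / np.sqrt(
        (x ** 2).sum(axis=0) * (t ** 2).sum()
    )
    return np.abs(r).mean()

def permutation_p_value(signal, target, in_set, n_perm=999, seed=0):
    """Empirical p-value from shuffling the target across samples."""
    rng = np.random.default_rng(seed)
    observed = score(signal, target, in_set)
    null = np.array(
        [score(signal, rng.permutation(target), in_set) for _ in range(n_perm)]
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Simulated data: 20 samples, 30 features; the first 5 track the target.
rng = np.random.default_rng(1)
target = rng.normal(size=20)
signal = rng.normal(size=(20, 30))
signal[:, :5] += 3 * target[:, None]

in_set = np.zeros(30, dtype=bool)
in_set[:5] = True
p_value = permutation_p_value(signal, target, in_set)
```

Shuffling the target rather than the features preserves the correlation structure among features within the region set, which is why the null is built on the sample axis.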


 


Case studies

Breast cancer DNA methylation (Unsupervised)
Breast cancer ATAC-seq (Unsupervised)
Kidney cancer DNA methylation (Supervised)
Pan-cancer EZH2 analysis

Breast cancer DNA methylation PCA


COCOA results for PC1


ER-related regions have higher loadings on PC1

Raw DNA Methylation in ER binding regions


COCOA results for PC1-4


COCOA meta-region plots for PC1-4


Breast cancer ATAC-seq PCA


COCOA results for ATAC-seq


ER-related regions have higher loadings on PC1

COCOA results for ATAC-seq


Kidney cancer DNA methylation (Supervised)


Rank region sets for methylation that correlates with cancer stage

COCOA results for cancer stage


DNA methylation in EZH2 regions and survival


DNA methylation in EZH2-binding regions is most often positively correlated with risk of death.

#### Conclusions: COCOA...

- can interpret continuous regulatory variation
- can use any signal that annotates genomic coordinates
- can work on supervised and unsupervised data
- is available from Bioconductor
- depends critically on the database of region sets...

Thank You


Collaborators
Fran Garrett-Bakelman
Stefan Bekiranov
Sheffield lab
John Lawson
Jason Smith
Jianglin Feng
Michal Stolarczyk
Kristyna Kupkova
Aaron Gu
Jose Verdezoto
Tessa Danehy
Funding:


UVA Cancer Center

nsheff · databio.org · nsheffield@virginia.edu