Methods in computational epigenetics

Nathan Sheffield, PhD
www.databio.org/slides

Mission statement

We develop and apply computational methods
to organize, analyze, and understand large epigenomic data.


Biological motivation




Cells alter phenotype by using DNA differently.

Breakdowns lead to disease

Full-stack bioinformatics




Augmented Interval List


Analysis of DNA methylation in Ewing sarcoma

Locus Overlap Analysis

Sheffield and Bock (2016). Bioinformatics.
Nagraj, Magee, and Sheffield (2018). Nucleic Acids Research.

Methylation-based Inference of Regulatory Activity (MIRA)

Lawson et al. (2018). Bioinformatics.

Sheffield et al. (2017). Nature Medicine.

Coordinate Covariation Analysis (COCOA)


Lawson et al. (2020). Genome Biology.

Goal: understand variation among individuals


Supervised differential analysis

Supervised continuous analysis

Unsupervised analysis

Epigenomic data: high-dimensional
and hard to interpret


Dimensionality reduction

Even with known groups
How can we annotate the source of variation?

COCOA Overview

John Lawson

Coordinate Covariation Analysis

  1. Quantify variation into a 'target variable'
    1. Supervised (e.g. clinical variable).
    2. Unsupervised (e.g. PCA).
  2. Annotate target variable with region sets.

What is epigenetic signal covariation?


 


Covariation informs source of observed variation

1. Choose target variable

What is the variation we'd like to explain?
Supervised target
Unsupervised target
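
The two kinds of target variable can be illustrated with a small sketch. It assumes the data is a samples-by-features matrix (for example, DNA methylation per CpG); the function names `unsupervised_target` and `supervised_target` are hypothetical and are not part of the COCOA package.

```python
import numpy as np

def unsupervised_target(signal, component=0):
    """Per-sample scores on one principal component (hypothetical helper).

    signal: (n_samples, n_features) array of epigenetic signal.
    """
    centered = signal - signal.mean(axis=0)
    # SVD of the centered matrix; u * s gives the sample scores on each PC
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, component] * s[component]

def supervised_target(phenotype):
    """A known per-sample variable (e.g. cancer stage) used directly."""
    return np.asarray(phenotype, dtype=float)

rng = np.random.default_rng(0)
signal = rng.random((6, 20))            # toy data: 6 samples, 20 features
pc1 = unsupervised_target(signal)       # unsupervised: one PC1 score per sample
stage = supervised_target([1, 1, 2, 3, 3, 4])  # supervised: made-up stages
```

Either way, the result is a single number per sample, which is what the later steps correlate against.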

 

2. Quantify correlation with target variable
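
One plausible way to sketch this step: correlate every feature with the target variable, then summarize the features that fall inside a region set. The summary used here (mean absolute Pearson correlation) and the names `feature_correlations` and `region_set_score` are assumptions for illustration, not the package's API. The toy example uses a single chromosome and no constant features.

```python
import numpy as np

def feature_correlations(signal, target):
    """Pearson correlation of each feature (column) with the target."""
    target = np.asarray(target, dtype=float)
    x = signal - signal.mean(axis=0)
    y = target - target.mean()
    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return (x * y[:, None]).sum(axis=0) / denom

def region_set_score(corr, positions, regions):
    """Mean absolute correlation of the features inside any region.

    positions: coordinate of each feature (single chromosome assumed).
    regions: list of half-open (start, end) intervals.
    """
    positions = np.asarray(positions)
    inside = np.zeros(len(positions), dtype=bool)
    for start, end in regions:
        inside |= (positions >= start) & (positions < end)
    return float(np.abs(corr[inside]).mean()) if inside.any() else float("nan")

rng = np.random.default_rng(1)
target = np.arange(8, dtype=float)
signal = rng.normal(size=(8, 10))
signal[:, :5] += 3 * target[:, None]    # first 5 features covary with target
positions = np.arange(10) * 100         # feature coordinates 0, 100, ..., 900

corr = feature_correlations(signal, target)
score_hit = region_set_score(corr, positions, [(0, 500)])      # covarying features
score_miss = region_set_score(corr, positions, [(500, 1000)])  # noise features
```

Ranking many region sets by such a score is what lets the top-scoring sets annotate the source of variation.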


 

Permutation tests establish significance
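
A minimal sketch of a permutation test for a region set score. It assumes the null distribution comes from shuffling the target variable across samples, and that the score is the mean absolute correlation of the features in the region set. Both choices, and the names `score` and `permutation_p_value`, are illustrative assumptions.

```python
import numpy as np

def score(signal, target, inside):
    """Mean absolute Pearson correlation over features in the region set."""
    x = signal[:, inside] - signal[:, inside].mean(axis=0)
    y = target - target.mean()
    r = (x * y[:, None]).sum(axis=0) / np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return np.abs(r).mean()

def permutation_p_value(signal, target, inside, n_perm=200, seed=0):
    """Empirical p-value from shuffling the target across samples."""
    rng = np.random.default_rng(seed)
    observed = score(signal, target, inside)
    null = np.array([
        score(signal, rng.permutation(target), inside) for _ in range(n_perm)
    ])
    # Add-one correction so the p-value is never exactly zero
    return (1 + np.sum(null >= observed)) / (n_perm + 1)

rng = np.random.default_rng(2)
target = np.arange(12, dtype=float)
signal = rng.normal(size=(12, 10))
signal[:, :5] += 3 * target[:, None]    # region-set features track the target
inside = np.arange(10) < 5              # boolean mask: features in the region set

p = permutation_p_value(signal, target, inside)
```

With 200 permutations the smallest attainable p-value is 1/201, so the number of permutations bounds the resolution of the test.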


 


Case studies

Breast cancer DNA methylation (Unsupervised)
Breast cancer ATAC-seq (Unsupervised)
Kidney cancer DNA methylation (Supervised)
Pan-cancer EZH2 analysis

Breast cancer DNA methylation PCA


COCOA results for PC1


ER-related regions have higher loadings on PC1

Raw DNA Methylation in ER binding regions


COCOA results for PC1-4



COCOA meta-region plots for PC1-4


Case studies

Breast cancer DNA methylation (Unsupervised)
Breast cancer ATAC-seq (Unsupervised)
Kidney cancer DNA methylation (Supervised)
Pan-cancer EZH2 analysis

Breast cancer ATAC-seq PCA


COCOA results for ATAC-seq


ER-related regions have higher loadings on PC1

COCOA results for ATAC-seq


Case studies

Breast cancer DNA methylation (Unsupervised)
Breast cancer ATAC-seq (Unsupervised)
Kidney cancer DNA methylation (Supervised)
Pan-cancer EZH2 analysis

Kidney cancer DNA methylation (Supervised)


Rank region sets for methylation
that correlates with cancer stage

COCOA results for cancer stage

Case studies

Breast cancer DNA methylation (Unsupervised)
Breast cancer ATAC-seq (Unsupervised)
Kidney cancer DNA methylation (Supervised)
Pan-cancer EZH2 analysis

DNA methylation in EZH2 regions and survival


DNA methylation in EZH2-binding regions is most often positively correlated with risk of death.

#### Conclusions: COCOA...

- can interpret continuous regulatory variation
- can use any signal that annotates genomic coordinates
- can work on supervised and unsupervised data
- is available from Bioconductor
- depends critically on the database of region sets...

Thank You

Collaborators
Vince Reuter
Andre Rendeiro
Levi Waldron

Alumni
Aaron Gu
Jianglin Feng
Ognen Duzlevski
Tessa Danehy
Sheffield lab
Erfaneh Gharavi
Michal Stolarczyk
John Lawson
Jason Smith
Kristyna Kupkova
John Stubbs
Bingjie Xue
Jose Verdezoto
Nathan LeRoy
Oleksandr Khoroshevskyi
Funding:



NIGMS R35-GM128636

nsheff · databio.org · nsheffield@virginia.edu