Research
The Sheffield Lab uses computation to ask and answer biological questions. Our biological interest is to understand gene regulation: How does DNA encode regulatory networks that enable cellular differentiation? Gene regulatory systems are finely tuned, and when they break down, it can lead to diseases like cancer. To better understand normal and diseased gene regulation, we collect high-throughput genome-scale data in single cells and cell populations, and then harness the power of supercomputing, machine learning, and software engineering to answer questions about biological systems. This research is inherently interdisciplinary, approaching questions in biology and medicine with tools from computer science and statistics.
Biology
- Gene regulation, chromatin, and epigenetics
- Cancer epigenomics
- Cell state and fate in development
- Single-cell heterogeneity
Tools
- R/Bioconductor tools for large-scale bioinformatic analysis
- Integrating large genome-scale datasets using high performance computing
- Applied machine learning
- Scientific computing, reproducibility, open data, and data sharing in genomics
Publications
Interest areas
Gene regulation and chromatin structure
The group studies how DNA encodes regulatory networks that enable cellular differentiation, and how these systems break down in disease. We ask fundamental questions about gene regulation, such as how regulatory DNA interacts to drive cellular programs, or how cells develop and respond to stimuli through chromatin remodeling at the single-cell level.
Selected relevant publications:
- Smith et al. (2021). PEPPRO: quality control and processing of nascent RNA profiling data. Genome Biology. DOI: 10.1186/s13059-021-02349-4
- Smith et al. (2020). PEPATAC: An optimized ATAC-seq pipeline with serial alignments. bioRxiv. DOI: 10.1101/2020.10.21.347054
- Lawson et al. (2020). COCOA: coordinate covariation analysis of epigenetic heterogeneity. Genome Biology. DOI: 10.1186/s13059-020-02139-4
- Smith and Sheffield (2020). Analytical Approaches for ATAC-seq Data Analysis. Current Protocols in Human Genetics. DOI: 10.1002/cphg.101
- Lawson et al. (2018). MIRA: An R package for DNA methylation-based inference of regulatory activity. Bioinformatics. DOI: 10.1093/bioinformatics/bty083
- Wang et al. (2018). BART: a transcription factor prediction tool with query gene sets or epigenomic profiles. Bioinformatics. DOI: 10.1093/bioinformatics/bty194
- Nagraj et al. (2018). LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis. Nucleic Acids Research. DOI: 10.1093/nar/gky464
- Sheffield et al. (2017). DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma. Nature Medicine. DOI: 10.1038/nm.4273
- Sheffield and Bock (2016). LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics. DOI: 10.1093/bioinformatics/btv612
- Klughammer et al. (2015). Differential DNA Methylation Analysis without a Reference Genome. Cell Reports. DOI: 10.1016/j.celrep.2015.11.024
- Schmidl et al. (2015). ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors. Nat. Methods. DOI: 10.1038/nmeth.3542
- Tomazou et al. (2015). Epigenome mapping reveals distinct modes of gene regulation and widespread enhancer reprogramming by the oncogenic fusion protein EWS-FLI1. Cell Reports. DOI: 10.1016/j.celrep.2015.01.042
- Sheffield et al. (2013). Patterns of regulatory activity across diverse human cell-types predict tissue identity, transcription factor binding, and long-range interactions. Genome Res. DOI: 10.1101/gr.152140.112
- Tewari et al. (2012). Chromatin accessibility reveals insights into androgen receptor activation and transcriptional specificity. Genome Biol. DOI: 10.1186/gb-2012-13-10-r88
- ENCODE Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature. DOI: 10.1038/nature11247
- Natarajan et al. (2012). Predicting cell-type-specific gene expression from regions of open chromatin. Genome Res. DOI: 10.1101/gr.135129.111
- Thurman et al. (2012). The accessible chromatin landscape of the human genome. Nature. DOI: 10.1038/nature11232
- Shibata et al. (2012). Extensive Evolutionary Changes in Regulatory Element Activity during Human Origins Are Associated with Altered Gene Expression and Positive Selection. PLoS Genet. DOI: 10.1371/journal.pgen.1002789
- Sheffield and Furey (2012). Identifying and characterizing regulatory sequences in the human genome with chromatin accessibility assays. Genes. DOI: 10.3390/genes3040651
- Song et al. (2011). Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Res. DOI: 10.1101/gr.121541.111
Computational cancer epigenomics
Driven by biological interests in epigenomics and gene regulation, we analyze DNA methylation and chromatin accessibility and how these signals characterize cancers. Cancer is caused by a regulatory process run amok, and we study these regulatory programs in their normal and diseased states.
Selected relevant publications:
- Lawson et al. (2020). COCOA: coordinate covariation analysis of epigenetic heterogeneity. Genome Biology. DOI: 10.1186/s13059-020-02139-4
- Corces et al. (2018). The chromatin accessibility landscape of primary human cancers. Science. DOI: 10.1126/science.aav1898
- Klughammer et al. (2018). The DNA methylation landscape of glioblastoma disease progression shows extensive heterogeneity in time and space. Nature Medicine. DOI: 10.1038/s41591-018-0156-x
- Sheffield et al. (2017). DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma. Nature Medicine. DOI: 10.1038/nm.4273
- Kovar et al. (2016). The second European interdisciplinary Ewing sarcoma research summit – A joint effort to deconstructing the multiple layers of a complex disease. Oncotarget. DOI: 10.18632/oncotarget.6937
- Tomazou et al. (2015). Epigenome mapping reveals distinct modes of gene regulation and widespread enhancer reprogramming by the oncogenic fusion protein EWS-FLI1. Cell Reports. DOI: 10.1016/j.celrep.2015.01.042
Single-cell sequencing analysis
Using new microfluidics and sequencing technology, we ask fundamental questions about how cells differentiate and respond to their environments at the single-cell level.
Selected relevant publications:
- Litzenburger et al. (2017). Single-cell epigenomic variability reveals functional cancer heterogeneity. Genome Biol. DOI: 10.1186/s13059-016-1133-7
- Bock et al. (2016). Multi-Omics of Single Cells: Strategies and Applications. Trends Biotechnol. DOI: 10.1016/j.tibtech.2016.04.004
- Schmidl et al. (2015). ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors. Nat. Methods. DOI: 10.1038/nmeth.3542
- Farlik et al. (2015). Single-cell DNA methylome sequencing and bioinformatic inference of epigenomic cell-state dynamics. Cell Reports. DOI: 10.1016/j.celrep.2015.02.001
Scientific computing and large-scale biomedical data management
We are interested in research infrastructure that enables broader scientific computing in genomics and beyond, particularly the interoperability of data and analysis. The group is developing core infrastructure to solve general problems in scientific computing. As genomic and multi-omic data have increased in size, we develop novel models of genomic data and build state-of-the-art APIs and systems that help biologists get the most out of their data.
Selected relevant publications:
- Sheffield et al. (2021). Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects. bioRxiv. DOI: 10.1101/2020.10.08.331322
- Feng and Sheffield (2020). IGD: high-performance search for large-scale genomic interval datasets. Bioinformatics. DOI: 10.1093/bioinformatics/btaa1062
- Stolarczyk et al. (2020). Refgenie: a reference genome resource manager. GigaScience. DOI: 10.1093/gigascience/giz149
- Sheffield (2019). Bulker: a multi-container environment manager. OSF Preprints. DOI: 10.31219/osf.io/natsj
- Feng et al. (2019). Augmented Interval List: a novel data structure for efficient genomic interval search. Bioinformatics. DOI: 10.1093/bioinformatics/btz407
- Lawson et al. (2018). MIRA: An R package for DNA methylation-based inference of regulatory activity. Bioinformatics. DOI: 10.1093/bioinformatics/bty083
- Sheffield et al. (2018). simpleCache: R caching for reproducible, distributed, large-scale projects. The Journal of Open Source Software. DOI: 10.21105/joss.00463
- Sheffield and Bock (2016). LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics. DOI: 10.1093/bioinformatics/btv612
Literature threads
Here are lists of papers on relevant topics that I am interested in: