Fast, memory-efficient genomic interval tokenizers for modern machine learning
arXiv (2025) -
Taming the reference genome jungle: the refget sequence collection standard
bioRxiv (2025) DOI: 10.1101/2025.10.06.680641 -
Atacformer: A transformer-based foundation model for analysis and interpretation of ATAC-seq data
(2025) -
Decoding cell identity with region accessibility embedding distances
(2025) -
Differential chromatin accessibility analysis elucidates mechanisms of coronary artery disease-associated genetic variation
(2025) -
Model Context Protocol: the unexpected catalyst of a bioinformatics interoperability revolution
(2025) -
Methods for evaluating unsupervised vector representations of genomic regions
NAR Genom Bioinform (2024) DOI: 10.1093/nargab/lqae086 -
Joint Representation Learning for Retrieval and Annotation of Genomic Interval Sets
Bioengineering (2024) DOI: 10.3390/bioengineering11030263 -
Fast clustering and cell-type annotation of scATAC data using pre-trained embeddings
NAR Genomics and Bioinformatics (2024) DOI: 10.1093/nargab/lqae073 -
Methods for constructing and evaluating consensus genomic interval sets
Nucleic Acids Research (2024) DOI: 10.1093/nar/gkae685 -
PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata
GigaScience (2024) DOI: 10.1093/gigascience/giae033 -
Opportunities and challenges in sharing and reusing genomic interval data
Frontiers in Genetics (2023) DOI: 10.3389/fgene.2023.1155809 -
Challenges to sharing sample metadata in computational genomics
Front Genet (2023) DOI: 10.3389/fgene.2023.1154198 -
GEOfetch: A command-line tool for downloading data and standardized metadata from GEO and SRA
Bioinformatics (2023) DOI: 10.1093/bioinformatics/btad069 -
Refget: standardized access to reference sequences
Bioinformatics (2022) DOI: 10.1093/bioinformatics/btab524 -
GA4GH: International policies and standards for data sharing across genomic research and healthcare
Cell Genomics (2022) DOI: 10.1016/j.xgen.2021.100029 -
Detecting molecular subtypes from multi-omics datasets using SUMO
Cell Reports Methods 2: 100152 (2022) DOI: 10.1016/j.crmeth.2021.100152 -
GenomicDistributions: fast analysis of genomic intervals with Bioconductor
BMC Genomics 23 (2022) DOI: 10.1186/s12864-022-08467-y -
Embeddings of genomic region sets capture rich biological associations in lower dimensions
Bioinformatics (2021) DOI: 10.1093/bioinformatics/btab439 -
Identity and compatibility of reference genome resources
NAR Genom Bioinform (2021) DOI: 10.1093/nargab/lqab036 -
Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects
GigaScience (2021) DOI: 10.1093/gigascience/giab077 -
Chromatin conformation capture (Hi-C) sequencing of patient-derived xenografts: analysis guidelines
GigaScience (2021) DOI: 10.1093/gigascience/giab022 -
IGD: high-performance search for large-scale genomic interval datasets
Bioinformatics (2021) DOI: 10.1093/bioinformatics/btaa1062 -
Supporting data for "Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects"
GigaScience Database (2021) DOI: 10.5524/100936 -
DNA Methyltransferase 1 and 3a Expression in the Frontal Cortex Regulates Palatable Food Consumption
bioRxiv (2021) DOI: 10.1101/2021.05.23.445176 -
PEPPRO: quality control and processing of nascent RNA profiling data
Genome Biology 22 (2021) DOI: 10.1186/s13059-021-02349-4 -
Bedshift: perturbation of genomic interval sets
Genome Biology 22 (2021) DOI: 10.1186/s13059-021-02440-w -
PEPATAC: an optimized pipeline for ATAC-seq data analysis with serial alignments
NAR Genomics and Bioinformatics 3 (2021) DOI: 10.1093/nargab/lqab101 -
Multimodal analysis of cell-free DNA whole-genome sequencing for pediatric cancers with low mutational burden
Nature Communications 12 (2021) DOI: 10.1038/s41467-021-23445-w -
Refgenie: a reference genome resource manager
GigaScience (2020) DOI: 10.1093/gigascience/giz149 -
Seqpare: a novel metric of similarity between genomic interval sets
F1000Research 9: 581 (2020) DOI: 10.1101/2020.04.05.026732 -
Analytical Approaches for ATAC-seq Data Analysis
Current Protocols in Human Genetics 106 (2020) DOI: 10.1002/cphg.101 -
COCOA: coordinate covariation analysis of epigenetic heterogeneity
Genome Biology 21 (2020) DOI: 10.1186/s13059-020-02139-4 -
Augmented Interval List: a novel data structure for efficient genomic interval search
Bioinformatics (2019) DOI: 10.1093/bioinformatics/btz407 -
Bulker: a multi-container environment manager
OSF Preprints (2019) DOI: 10.31219/osf.io/natsj -
The chromatin accessibility landscape of primary human cancers
Science (2018) DOI: 10.1126/science.aav1898 -
The DNA methylation landscape of glioblastoma disease progression shows extensive heterogeneity in time and space
Nature Medicine (2018) DOI: 10.1038/s41591-018-0156-x -
simpleCache: R caching for reproducible, distributed, large-scale projects
The Journal of Open Source Software 3: 463 (2018) DOI: 10.21105/joss.00463 (corresponding author) -
Coloc-stats: a unified web interface to perform colocalization analysis of genomic features
Nucleic Acids Research 46: W186–W193 (2018) DOI: 10.1093/nar/gky474 -
MIRA: An R package for DNA methylation-based inference of regulatory activity
Bioinformatics (2018) DOI: 10.1093/bioinformatics/bty083 -
LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis
Nucleic Acids Research (2018) DOI: 10.1093/nar/gky464 -
BART: a transcription factor prediction tool with query gene sets or epigenomic profiles
Bioinformatics (2018) DOI: 10.1093/bioinformatics/bty194 -
Single-cell epigenomic variability reveals functional cancer heterogeneity
Genome Biol. 18: 15 (2017) DOI: 10.1186/s13059-016-1133-7 -
DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma
Nature Medicine 23: 386–395 (2017) DOI: 10.1038/nm.4273 -
LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor
Bioinformatics (2016) DOI: 10.1093/bioinformatics/btv612 -
The second European interdisciplinary Ewing sarcoma research summit – A joint effort to deconstructing the multiple layers of a complex disease
Oncotarget (2016) DOI: 10.18632/oncotarget.6937 -
Multi-Omics of Single Cells: Strategies and Applications
Trends Biotechnol. 34: 605–608 (2016) DOI: 10.1016/j.tibtech.2016.04.004 -
ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors
Nat. Methods 12: 963–965 (2015) DOI: 10.1038/nmeth.3542 -
Single-cell DNA methylome sequencing and bioinformatic inference of epigenomic cell-state dynamics
Cell Reports 10: 1386–1397 (2015) DOI: 10.1016/j.celrep.2015.02.001 (co-first author) -
Differential DNA Methylation Analysis without a Reference Genome
Cell Reports 13: 2621–2633 (2015) DOI: 10.1016/j.celrep.2015.11.024 -
Epigenome mapping reveals distinct modes of gene regulation and widespread enhancer reprogramming by the oncogenic fusion protein EWS-FLI1
Cell Reports 10: 1082–1095 (2015) DOI: 10.1016/j.celrep.2015.01.042 (co-first author) -
The Interaction between Base Compositional Heterogeneity and Among-Site Rate Variation in Models of Molecular Evolution
ISRN Evolutionary Biology 2013: 1–8 (2013) DOI: 10.5402/2013/391561 -
Patterns of regulatory activity across diverse human cell-types predict tissue identity, transcription factor binding, and long-range interactions
Genome Res. 23: 777–88 (2013) DOI: 10.1101/gr.152140.112 -
The accessible chromatin landscape of the human genome
Nature (2012) DOI: 10.1038/nature11232 -
An integrated encyclopedia of DNA elements in the human genome
Nature (2012) DOI: 10.1038/nature11247 -
Predicting cell-type-specific gene expression from regions of open chromatin
Genome Res. 22: 1711–22 (2012) DOI: 10.1101/gr.135129.111 -
Identifying and characterizing regulatory sequences in the human genome with chromatin accessibility assays
Genes 3: 651–70 (2012) DOI: 10.3390/genes3040651 -
Extensive Evolutionary Changes in Regulatory Element Activity during Human Origins Are Associated with Altered Gene Expression and Positive Selection
PLoS Genet. 8: e1002789 (2012) DOI: 10.1371/journal.pgen.1002789 (co-first author) -
Chromatin accessibility reveals insights into androgen receptor activation and transcriptional specificity
Genome Biol. 13: R88 (2012) DOI: 10.1186/gb-2012-13-10-r88 -
A user's guide to the encyclopedia of DNA elements (ENCODE)
PLoS Biol. 9: e1001046 (2011) DOI: 10.1371/journal.pbio.1001046 -
Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity
Genome Res. 21: 1757–67 (2011) DOI: 10.1101/gr.121541.111 (co-first author) -
Mitochondrial genomics in Orthoptera using MOSAS
Mitochondrial DNA 21: 87–104 (2010) DOI: 10.3109/19401736.2010.500812 -
When phylogenetic assumptions are violated: base compositional heterogeneity and among-site rate variation in beetle mitochondrial phylogenomics
Syst. Entomol. 35: 429–448 (2010) DOI: 10.1111/j.1365-3113.2009.00517.x -
Nonstationary evolution and compositional heterogeneity in beetle mitochondrial phylogenomics
Syst. Biol. 58: 381–94 (2009) DOI: 10.1093/sysbio/syp037 -
A comparative analysis of mitochondrial genomes in Coleoptera (Arthropoda: Insecta) and genome descriptions of six new beetles
Mol. Biol. Evol. 25: 2499–509 (2008) DOI: 10.1093/molbev/msn198 -
Calculating expected DNA remnants from ancient founding events in human population genetics
BMC Genet. 9: 66 (2008) DOI: 10.1186/1471-2156-9-66 (co-first author) -
Single-cell transcriptome and accessible chromatin dynamics during endocrine pancreas development
Proceedings of the National Academy of Sciences 119 (2022) DOI: 10.1073/pnas.2201267119 -
Expanding the Galaxy's reference data
Bioinformatics Advances 2 (2022) DOI: 10.1093/bioadv/vbac030 -
AI-readiness for Biomedical Data: Bridge2AI Recommendations
bioRxiv (2024) DOI: 10.1101/2024.10.23.619844 -
excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies
Bioinformatics 39 (2023) DOI: 10.1093/bioinformatics/btad198 -
Renin Cell Development: Insights From Chromatin Accessibility and Single-Cell Transcriptomics
Circulation Research 133: 369–371 (2023) DOI: 10.1161/circresaha.123.322827 -
Integrative single-cell meta-analysis reveals disease-relevant vascular cell states and markers in human atherosclerosis
Cell Reports 42: 113380 (2023) DOI: 10.1016/j.celrep.2023.113380 -
Interoperability starts with identifiers
() -
Inhibition of Renin Expression Is Regulated by an Epigenetic Switch From an Active to a Poised State
Hypertension 81: 1869–1882 (2024) DOI: 10.1161/hypertensionaha.124.22886 -
BEDMS: A metadata standardizer for genomic region attributes
bioRxiv (2024) DOI: 10.1101/2024.09.18.613791 -
DeepGSEA: explainable deep gene set enrichment analysis for single-cell transcriptomic data
Bioinformatics 40 (2024) DOI: 10.1093/bioinformatics/btae434 -
From biomedical cloud platforms to microservices: next steps in FAIR data and analysis
Scientific Data 9 (2022) DOI: 10.1038/s41597-022-01619-5
Publications
Find our publications on PubMed, Google Scholar, or ORCID.