Assessment of computational methods for the analysis of single-cell ATAC-seq data

Description

Abstract

Background: Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, due to low copy numbers (diploid in humans), leads to inherent data sparsity (1–10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (10–45% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level.

Results: We present a benchmarking framework that is applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were compared by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed.

Conclusions: This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC is the only method able to analyze a large dataset (> 80,000 cells).

Journal

  • Genome Biology

    Genome Biology 20 (1), 241-, 2019-11-18

    Springer Science and Business Media LLC
