<i>SAMBLASTER</i>: fast duplicate marking and structural variant read extraction

  • Gregory G. Faust
    Affiliation 1 of: ¹Department of Biochemistry and Molecular Genetics and ²Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
  • Ira M. Hall
    Affiliation 1 of: ¹Department of Biochemistry and Molecular Genetics and ²Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA

Description

Motivation: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times.

Results: We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while dramatically reducing overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results.

Availability and implementation: SAMBLASTER is open-source C++ code and freely available for download from https://github.com/GregoryFaust/samblaster.

Contact: imh4y@virginia.edu
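The abstract does not spell out how duplicates are identified, so the following is only an illustrative Python sketch of the general idea of marking duplicates in a single streaming pass over read-name-grouped SAM text, not SAMBLASTER's actual C++ implementation. It assumes that both mates of a pair sit on adjacent lines, that a pair counts as a duplicate when both mates share chromosome, strand and clip-adjusted 5' position with an earlier pair, and that secondary and supplementary alignments are absent. The names `five_prime` and `mark_duplicates` are hypothetical.

```python
import re

DUP_FLAG = 0x400      # SAM flag bit: PCR or optical duplicate
REVERSE_FLAG = 0x10   # SAM flag bit: read aligned to the reverse strand
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime(pos, cigar, reverse):
    """Return the clip-adjusted 5' reference coordinate of an alignment."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not reverse:
        # Forward strand: step back over leading soft/hard clips.
        lead = 0
        for n, op in ops:
            if op not in "SH":
                break
            lead += n
        return pos - lead
    # Reverse strand: the 5' end is the rightmost aligned base,
    # extended by any trailing soft/hard clips.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n
    return pos + ref_len - 1 + trail


def mark_duplicates(sam_lines):
    """Yield SAM lines, setting the duplicate flag on repeated pairs.

    Assumption: input is grouped by read name, two lines per pair.
    """
    seen = set()      # signatures of pairs already emitted
    pending = None    # first mate, waiting for its partner
    for line in sam_lines:
        if line.startswith("@"):
            yield line            # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if pending is None:
            pending = fields
            continue
        pair, pending = [pending, fields], None
        ends = []
        for f in pair:
            reverse = bool(int(f[1]) & REVERSE_FLAG)
            ends.append((f[2], reverse, five_prime(int(f[3]), f[5], reverse)))
        signature = tuple(sorted(ends))
        if signature in seen:
            for f in pair:
                f[1] = str(int(f[1]) | DUP_FLAG)
        seen.add(signature)
        for f in pair:
            yield "\t".join(f) + "\n"
    if pending is not None:
        # Unpaired trailing record: emit it unchanged.
        yield "\t".join(pending) + "\n"
```

Because the function is a generator over lines, it could in principle sit between an aligner's standard output and a compression step, which is the piped post-pass arrangement the abstract describes. Memory grows with the number of distinct pair signatures rather than with the number of reads held at once.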

Published in

  • Bioinformatics

    Bioinformatics 30 (17), 2503-2505, 2014-05-07

    Oxford University Press (OUP)

Cited by (8)
