SHI7 Is a Self-Learning Pipeline for Multipurpose Short-Read DNA Quality Control

Gabriel A. Al-Ghalith, Benjamin Hillmann, Kaiwei Ang, Robin Shields-Cutler, Dan Knights

doi:10.1128/msystems.00202-17

Gabriel A. Al-Ghalith
Bioinformatics and Computational Biology, University of Minnesota—Twin Cities, Minneapolis, Minnesota, USA

Benjamin Hillmann
Computer Science, University of Minnesota—Twin Cities, Minneapolis, Minnesota, USA

Kaiwei Ang
Computer Science, University of Minnesota—Twin Cities, Minneapolis, Minnesota, USA

Robin Shields-Cutler
Biotechnology Institute, University of Minnesota—Twin Cities, Minneapolis, Minnesota, USA

Dan Knights
Bioinformatics and Computational Biology, University of Minnesota—Twin Cities, Minneapolis, Minnesota, USA

Marcus J. Claesson, editor

Description

Quality control of high-throughput DNA sequencing data is an important but sometimes laborious task requiring background knowledge of the sequencing protocol used (such as adaptor type, sequencing technology, insert size/stitchability, and paired-endedness). Quality control protocols typically require applying this background knowledge to select and execute numerous quality control steps with the appropriate parameters, which is especially difficult when working with public data or data from collaborators who use different protocols. We have created a streamlined quality control pipeline intended to substantially simplify the process of DNA quality control from raw machine output files to actionable sequence data. In contrast to other methods, our proposed pipeline is easy to install and use, and it attempts to learn the necessary parameters from the data automatically with a single command.