
RESEARCH PRODUCT

RabbitQC: high-speed scalable quality control for sequencing data

Bertil Schmidt, Yanjie Wei, Haidong Lan, Weiguo Liu, Zekun Yin, Song Honglei, Liu Meiyang, Beifang Niu, Zhang Hao, Zhang Wen

subject

Quality Control; Statistics and Probability; FASTQ format; Downstream (software development); Exploit; Biochemistry; Nanopores; 03 medical and health sciences; Software; Quality (business); Molecular Biology; 030304 developmental biology; 0303 health sciences; 030302 biochemistry & molecular biology; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Computer Science Applications; Computational Mathematics; Task (computing); Computational Theory and Mathematics; Computer architecture; Scalability; Nanopore sequencing

description

Abstract

Motivation: Modern sequencing technologies continue to revolutionize many areas of biology and medicine. Since the generated datasets are error-prone, downstream applications usually require quality control methods to pre-process FASTQ files. However, existing tools for this task cannot fully exploit the capabilities of computing platforms, which leads to slow runtimes.

Results: We present RabbitQC, an extremely fast integrated quality control tool for FASTQ files, which can take full advantage of modern hardware. It includes a variety of operations and supports different sequencing technologies (Illumina, Oxford Nanopore and PacBio). RabbitQC achieves speedups of between one and two orders of magnitude compared to other state-of-the-art tools.

Availability and implementation: C++ sources and binaries are available at https://github.com/ZekunYin/RabbitQC.

Supplementary information: Supplementary data are available at Bioinformatics online.
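To make the task concrete, the following is a minimal sketch of the kind of per-read quality filtering that a FASTQ quality control step performs. It is not RabbitQC's implementation and says nothing about how RabbitQC achieves its speed. The function names (`parse_fastq`, `mean_phred`, `filter_reads`), the Phred+33 quality encoding, the threshold of 20 and the use of a plain arithmetic mean of quality scores are all assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, quality string); a hypothetical record layout for this sketch
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        # A truncated record yields empty strings here and fails validation below.
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record],
                 min_mean_quality: float = 20.0) -> List[Record]:
    """Keep reads whose mean quality reaches the (assumed) threshold."""
    return [r for r in records if mean_phred(r[2]) >= min_mean_quality]


if __name__ == "__main__":
    sample = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
    for header, seq, qual in filter_reads(parse_fastq(sample.splitlines())):
        print(header, seq, mean_phred(qual))
```

A single-threaded loop like this is the baseline that fast quality control tools aim to improve on; real tools also compute per-position statistics, trim adapters and handle compressed input, none of which is shown here.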

https://doi.org/10.1093/bioinformatics/btaa719