6533b7d7fe1ef96bd1267a3f

RESEARCH PRODUCT

Accelerating metagenomic read classification on CUDA-enabled GPUs.

Robin KobusBertil SchmidtChristian HundtAndré Müller

subject

0301 basic medicineTheoretical computer scienceWorkstationGPUsComputer scienceContext (language use)CUDAParallel computingBiochemistryGenomelaw.invention03 medical and health sciencesCUDAUser-Computer Interface0302 clinical medicineStructural BiologylawTaxonomic assignmentHumansMicrobiomeMolecular BiologyInternetXeonApplied MathematicsHigh-Throughput Nucleotide SequencingSequence Analysis DNAExact k-mer matchingComputer Science Applications030104 developmental biologyTitan (supercomputer)Metagenomics030220 oncology & carcinogenesisMetagenomicsDNA microarraySoftware

description

Metagenomic sequencing studies are becoming increasingly popular with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification; i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes software tools for fast and accurate metagenomic read classification are urgently needed. We present cuCLARK, a read-level classifier for CUDA-enabled GPUs, based on the fast and accurate classification of metagenomic sequences using reduced k-mers (CLARK) method. Using the processing power of a single Titan X GPU, cuCLARK can reach classification speeds of up to 50 million reads per minute. Corresponding speedups for species- (genus-)level classification range between 3.2 and 6.6 (3.7 and 6.4) compared to multi-threaded CLARK executed on a 16-core Xeon CPU workstation. cuCLARK can perform metagenomic read classification at superior speeds on CUDA-enabled GPUs. It is free software licensed under GPL and can be downloaded at free of charge.

10.1186/s12859-016-1434-6https://pubmed.ncbi.nlm.nih.gov/28049411