6533b836fe1ef96bd12a12a7

RESEARCH PRODUCT

CRiSPy-CUDA: Computing Species Richness in 16S rRNA Pyrosequencing Datasets with CUDA

Zejun ZhengBertil SchmidtThuy-diem Nguyen

subject

CUDADistance matrixComputer scienceMetagenomicsPipeline (computing)Pairwise comparisonParallel computingCluster analysisQuantitative Biology::GenomicsMassively parallelSparse matrix

description

Pyrosequencing technologies are frequently used for sequencing the 16S rRNA marker gene for metagenomic studies of microbial communities. Computing a pairwise genetic distance matrix from the produced reads is an important but highly time consuming task. In this paper, we present a parallelized tool (called CRiSPy) for scalable pairwise genetic distance matrix computation and clustering that is based on the processing pipeline of the popular ESPRIT software package. To achieve high computational efficiency, we have designed massively parallel CUDA algorithms for pairwise k-mer distance and pairwise genetic distance computation. We have also implemented a memory-efficient sparse matrix clustering program to process the distance matrix. On a single-GPU, CRiSPy achieves speedups of around two orders of magnitude compared to the sequential ESPRIT program for both the time-consuming pairwise genetic distance module and the whole processing pipeline, thus making CRiSPy particularly suitable for high-throughput microbial studies.

https://doi.org/10.1007/978-3-642-24855-9_4