SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors

Yongchao Liu, Bertil Schmidt

subject

Smith–Waterman algorithm; FOS: Computer and information sciences; Multi-core processor; Coprocessor; Speedup; Sequence database; Computer science; Parallel computing; Intrinsics; Computer Science - Distributed Parallel and Cluster Computing; Scalability; SIMD; Distributed Parallel and Cluster Computing (cs.DC); Xeon Phi

description

The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, this high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for large databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI follows the scale-and-vectorize approach, i.e., it boosts alignment speed by effectively utilizing both the coarse-grained parallelism from the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, multiple data (SIMD) vectors within each core (vectorize). When searching against the large UniProtKB/TrEMBL protein database, SWAPHI achieves a performance of up to 58.8 billion cell updates per second (GCUPS) on one coprocessor and up to 228.4 GCUPS on four coprocessors. Furthermore, it demonstrates good parallel scalability across varying numbers of coprocessors and, when using four coprocessors, is superior to both SWIPE on 16 high-end CPU cores and BLAST+ on 8 cores, with maximum speedups of 1.52 and 1.86, respectively. SWAPHI is written in C++ (with a set of SIMD intrinsics) and is freely available at http://swaphi.sourceforge.net.
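To make the quadratic cost and the GCUPS metric concrete, the following is a minimal scalar sketch of the SW recurrence together with a GCUPS calculation. It is not SWAPHI's implementation: the function names `sw_score` and `gcups` are hypothetical, and the match/mismatch scores and linear gap penalty are simplifying assumptions chosen for illustration rather than the scoring scheme used in the paper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Scalar Smith-Waterman local alignment score (hypothetical helper).
// Assumptions for illustration only: fixed match/mismatch scores and a
// linear gap penalty, instead of a substitution matrix.
int sw_score(const std::string& query, const std::string& subject,
             int match = 2, int mismatch = -1, int gap = 2) {
    // Only two rows of the dynamic-programming matrix are kept in memory.
    std::vector<int> prev(subject.size() + 1, 0);
    std::vector<int> curr(subject.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= query.size(); ++i) {
        curr[0] = 0;
        for (std::size_t j = 1; j <= subject.size(); ++j) {
            // One "cell update": the unit counted by the GCUPS metric.
            int diag = prev[j - 1] +
                       (query[i - 1] == subject[j - 1] ? match : mismatch);
            int up = prev[j] - gap;        // gap in the subject
            int left = curr[j - 1] - gap;  // gap in the query
            curr[j] = std::max({0, diag, up, left});
            best = std::max(best, curr[j]);
        }
        std::swap(prev, curr);
    }
    return best;
}

// Billion cell updates per second for one query searched against a
// database containing db_residues residues in total (hypothetical helper).
double gcups(std::uint64_t query_len, std::uint64_t db_residues,
             double seconds) {
    return static_cast<double>(query_len) *
           static_cast<double>(db_residues) / seconds / 1e9;
}
```

For example, `sw_score("ACGT", "ACT")` evaluates 4 x 3 = 12 cells. The nested loops show why the work grows with the product of query length and database size, which is the quantity that a scale-and-vectorize design spreads across cores and SIMD lanes.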

https://dx.doi.org/10.48550/arxiv.1404.4152