RESEARCH PRODUCT

SWhybrid: A Hybrid-Parallel Framework for Large-Scale Protein Sequence Database Search

Bertil Schmidt, Yongchao Liu, Haidong Lan, Weiguo Liu

subject

0301 basic medicine; Xeon; Sequence database; business.industry; Computer science; Interface (computing); Symmetric multiprocessor system; Parallel computing; Set (abstract data type); 03 medical and health sciences; 030104 developmental biology; Software; Computer architecture; SIMD; business; Massively parallel

description

Computer architectures continue to develop rapidly towards massively parallel and heterogeneous systems. Thus, easily extensible yet highly efficient parallelization approaches for a variety of platforms are urgently needed. In this paper, we present SWhybrid, a hybrid computing framework for large-scale biological sequence database search on heterogeneous computing environments with multi-core or many-core processing units (PUs), based on the Smith-Waterman (SW) algorithm. To incorporate a diverse set of PUs such as combinations of CPUs, GPUs and Xeon Phis, we abstract them as SIMD vector execution units with different numbers of lanes. We propose a machine model, associated with a unified programming interface implemented in C++, to abstract underlying architectural differences. Performance evaluation reveals that SWhybrid (i) outperforms all other tested state-of-the-art tools on both homogeneous and heterogeneous computing platforms, (ii) achieves an efficiency of over 80% on all tested CPUs and GPUs and over 70% on Xeon Phis, and (iii) achieves utilization rates of over 80% on all tested heterogeneous platforms. Our results demonstrate that there is enough commonality between vector-like instructions across CPUs and GPUs that one can develop higher-level abstractions and still specialize with close-to-peak performance. SWhybrid is open-source software and freely available at https://github.com/turbo0628/swhybrid.
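The idea of treating every processing unit as a vector unit with some number of lanes can be illustrated with a minimal C++ sketch. This is not SWhybrid's actual interface: the names `Scoring`, `sw_scalar` and `sw_lanes` are hypothetical, the scoring values are arbitrary, a linear gap penalty is assumed for brevity, and `std::array<int, Lanes>` merely stands in for a hardware SIMD register. The sketch also assumes an inter-sequence scheme in which each lane scores one database sequence against the same query.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scoring parameters (illustrative, not taken from the paper).
struct Scoring {
    int match = 2;
    int mismatch = -1;
    int gap = 1;  // linear gap penalty, subtracted per gap character
};

// Scalar reference: best Smith-Waterman local alignment score of q against s.
int sw_scalar(const std::string& q, const std::string& s,
              const Scoring& sc = Scoring()) {
    std::vector<int> prev(s.size() + 1, 0), cur(s.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= q.size(); ++i) {
        cur[0] = 0;
        for (std::size_t j = 1; j <= s.size(); ++j) {
            const int sub = (q[i - 1] == s[j - 1]) ? sc.match : sc.mismatch;
            const int h = std::max({0, prev[j - 1] + sub,
                                    prev[j] - sc.gap, cur[j - 1] - sc.gap});
            cur[j] = h;
            best = std::max(best, h);
        }
        std::swap(prev, cur);
    }
    return best;
}

// Lane-parameterized version: scores `Lanes` subjects in lockstep.
// The innermost loop over l is the part a real SIMD unit would execute
// as one vector instruction; only the lane count differs per device.
template <std::size_t Lanes>
std::array<int, Lanes> sw_lanes(const std::string& q,
                                const std::array<std::string, Lanes>& subjects,
                                const Scoring& sc = Scoring()) {
    static_assert(Lanes > 0, "at least one lane is required");
    assert(sc.gap > 0);
    using Vec = std::array<int, Lanes>;  // stand-in for a vector register

    std::size_t max_len = 0;
    for (const auto& s : subjects) max_len = std::max(max_len, s.size());

    std::vector<Vec> prev(max_len + 1, Vec{}), cur(max_len + 1, Vec{});
    Vec best{};
    for (std::size_t i = 1; i <= q.size(); ++i) {
        cur[0] = Vec{};
        for (std::size_t j = 1; j <= max_len; ++j) {
            for (std::size_t l = 0; l < Lanes; ++l) {
                // Shorter subjects are treated as padded; padded cells
                // never match and never contribute to the lane's best score.
                const bool in_range = j <= subjects[l].size();
                const int sub = (in_range && q[i - 1] == subjects[l][j - 1])
                                    ? sc.match : sc.mismatch;
                const int h = std::max({0, prev[j - 1][l] + sub,
                                        prev[j][l] - sc.gap,
                                        cur[j - 1][l] - sc.gap});
                cur[j][l] = h;
                if (in_range) best[l] = std::max(best[l], h);
            }
        }
        std::swap(prev, cur);
    }
    return best;
}
```

Under these assumptions, a call such as `sw_lanes<4>(query, batch)` and a call such as `sw_lanes<16>(query, batch)` share one code path and differ only in the lane count, which is the kind of commonality the abstract attributes to vector-like execution across CPUs, GPUs and Xeon Phis.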

https://doi.org/10.1109/ipdps.2017.42