6533b829fe1ef96bd12896a0

RESEARCH PRODUCT

XLCS: A New Bit-Parallel Longest Common Subsequence Algorithm on Xeon Phi Clusters

Zekun YinHao ZhangKai XuYuandong ChanShaoliang PengXiaoning WangBertil SchmidtWeiguo Liu

subject

Longest common subsequence problemDynamic programmingSpeedupComputer scienceComputer clusterAsynchronous I/OCacheSupercomputerAlgorithmXeon Phi

description

Finding the longest common subsequence (LCS) of two strings is a classical problem in bioinformatics. A basic approach to solve this problem is based on dynamic programming. As the biological sequence databases are growing continuously, bit-parallel sequence comparison algorithms are becoming increasingly important. In this paper, we present XLCS, a new parallel implementation to accelerate the LCS algorithm on Xeon Phi clusters by performing bit-wise operations. We have designed an asynchronous IO framework to improve the data transfer efficiency. To make full use of the computing resources of Xeon Phi clusters, we use three levels of parallelism: node-level, thread-level and vector-level. We also propose a segmentation-method to decrease cache misses. Our performance evaluation shows that XLCS achieves a peak performance of 3.61 TCUPS on a single 31S1P card and 8.20 TCUPS on a single 7210 card. Compared to the well-parallelized LCS algorithm OCS implemented on three M2090 cards, XLCS achieves a speedup of 3.6 on a KNC and 8.1 on a KNL, respectively. XLCS further achieves 93.84 TCUPS running 16 compute nodes of the Tianhe-2 supercomputer and 312.30 TCUPS with 32 compute nodes of a KNL cluster. To our knowledge, this is the first reported implementation of the bit-parallel LCS algorithm on Xeon Phi clusters. XLCS is available at https://github.com/wxiaoning/Longest-Common-Subsequence.

https://doi.org/10.1109/hpcc/smartcity/dss.2019.00204