Search results for "Xeon"
showing 10 items of 20 documents
Efficient Parallel Sort on AVX-512-Based Multi-Core and Many-Core Architectures
2019
Sorting kernels are a fundamental part of numerous applications. The performance of sorting implementations is usually limited by a variety of factors such as computing power, memory bandwidth, and branch mispredictions. In this paper we propose an efficient hybrid sorting method which takes advantage of wide vector registers and the high bandwidth memory of modern AVX-512-based multi-core and many-core processors. Our approach employs a combination of vectorized bitonic sorting and load-balanced multi-threaded merging. Thread-level and data-level parallelism are used to exploit both compute power and memory bandwidth. Our single-threaded implementation is ~30x faster than qsort in the C st…
Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi
2016
Advanced SIMD features on GPUs and Xeon Phis promote efficient long pattern search.A tiled approach to accelerating the Wu-Manber algorithm on GPUs has been proposed.Both the GPU and Xeon Phi yield two orders-of-magnitude speedup over one CPU core.The GPU-based version with tiling runs up to 2.9 × faster than the Xeon Phi version. Approximate pattern matching (APM) targets to find the occurrences of a pattern inside a subject text allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and information retrieval. Bit-parallel APM takes advantage of the intrinsic parallelism of bitwise operations inside a machine word. This approach typica…
SWMapper: Scalable Read Mapper on SunWay TaihuLight
2020
With the rapid development of next-generation sequencing (NGS) technologies, high throughput sequencing platforms continuously produce large amounts of short read DNA data at low cost. Read mapping is a performance-critical task, being one of the first stages required for many different types of NGS analysis pipelines. We present SWMapper — a scalable and efficient read mapper for the Sunway TaihuLight supercomputer. A number of optimization techniques are proposed to achieve high performance on its heterogeneous architecture which are centered around a memory-efficient succinct hash index data structure including seed filtration, duplicate removal, dynamic scheduling, asynchronous data tra…
SPECTR
2018
Modern high throughput sequencing platforms can produce large amounts of short read DNA data at low cost. Error correction is an important but time-consuming initial step when processing this data in order to improve the quality of downstream analyses. In this paper, we present a Scalable Parallel Error CorrecToR designed to improve the throughput of DNA error correction for Illumina reads on various parallel platforms. Our design is based on a k-spectrum approach where a Bloom filter is frequently probed as a key operation and is optimized towards AVX-512-based multi-core CPUs, Xeon Phi many-cores (both KNC and KNL), and heterogeneous compute clusters. A number of architecture-specific opt…
Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures
2016
This is a post-peer-review, pre-copyedit version of an article published in IEEE Transactions on Parallel and Distributed Systems. The final authenticated version is available online at: http://dx.doi.org/10.1109/TPDS.2015.2460247. [Abstract] Development of new methods to detect pairwise epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important task in bioinformatics as they can help to explain genetic influences on diseases. As these studies are time consuming operations, some tools exploit the characteristics of different hardware accelerators (such as GPUs and Xeon Phi coprocessors) to reduce the runtime. Nevertheless, all these approaches are not able t…
S-Aligner: Ultrascalable Read Mapping on Sunway Taihu Light
2017
The availability and amount of sequenced genomes have been rapidly growing in recent years because of the adoption of next-generation sequencing (NGS) technologies that enable high-throughput short-read generation at highly competitive cost. Since this trend is expected to continue in the foreseeable future, the design and implementation of efficient and scalable NGS bioinformatics algorithms are important to research and industrial applications. In this paper, we introduce S-Aligner–a highly scalable read mapper designed for the Sunway Taihu Light supercomputer and its fourth-generationShenWei many-core architecture (SW26010). S-Aligner employs a combination of optimization techniques to o…
Accelerating metagenomic read classification on CUDA-enabled GPUs.
2016
Metagenomic sequencing studies are becoming increasingly popular with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification; i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes software tools for fast and accurate metagenomic read classification are urgently needed. We present cuCLARK, a read-level classifier for CUDA-enabled GPUs, based on the fast and accurate classification of metagenomic sequences using reduced k-mers (…
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters
2016
Computing alignments between two or more sequences are common operations frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data par…
SWhybrid: A Hybrid-Parallel Framework for Large-Scale Protein Sequence Database Search
2017
Computer architectures continue to develop rapidly towards massively parallel and heterogeneous systems. Thus, easily extensible yet highly efficient parallelization approaches for a variety of platforms are urgently needed. In this paper, we present SWhybrid, a hybrid computing framework for large-scale biological sequence database search on heterogeneous computing environments with multi-core or many-core processing units (PUs) based on the Smith- Waterman (SW) algorithm. To incorporate a diverse set of PUs such as combinations of CPUs, GPUs and Xeon Phis, we abstract them as SIMD vector execution units with different number of lanes. We propose a machine model, associated with a unified …
FMapper: Scalable read mapper based on succinct hash index on SunWay TaihuLight
2022
Abstract One of the most important application in bioinformatics is read mapping. With the rapidly increasing number of reads produced by next-generation sequencing (NGS) technology, there is a need for fast and efficient high-throughput read mappers. In this paper, we present FMapper – a highly scalable read mapper on the TaihuLight supercomputer optimized for its fourth-generation ShenWei many-core architecture (SW26010). In order to fully exploit the computational power of the SW26010, we employ dynamic scheduling of tasks, asynchronous I/O and data transfers and implement a vectorized version of the banded Myers algorithm tailored to the 256 bit vector registers of the SW26010. Our perf…