Search results for "Speedup"
showing 10 items of 97 documents
Geomeasure: GIS and Scripting for Measuring Morphometric Variability
2019
This paper presents Geomeasure, a methodological tool developed to recover typometric information with a twofold objective. First, to speed up the process of gathering data by automatizing the way in which it is recovered. Second, it adds higher accuracy and the possibility of re-measuring archeological items without further directly interacting with the piece. Based on a combination of R scripting with GIS features, Geomeasure is at the time able to automatically gather 125–130 typometric variables per archaeological item, with the only input of vectorized photographs. It can be used as a reliable methodological aid to extract detailed information on patterns and trends of shape variabilit…
Efficient Parallel Sort on AVX-512-Based Multi-Core and Many-Core Architectures
2019
Sorting kernels are a fundamental part of numerous applications. The performance of sorting implementations is usually limited by a variety of factors such as computing power, memory bandwidth, and branch mispredictions. In this paper we propose an efficient hybrid sorting method which takes advantage of wide vector registers and the high bandwidth memory of modern AVX-512-based multi-core and many-core processors. Our approach employs a combination of vectorized bitonic sorting and load-balanced multi-threaded merging. Thread-level and data-level parallelism are used to exploit both compute power and memory bandwidth. Our single-threaded implementation is ~30x faster than qsort in the C st…
Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi
2016
Advanced SIMD features on GPUs and Xeon Phis promote efficient long pattern search.A tiled approach to accelerating the Wu-Manber algorithm on GPUs has been proposed.Both the GPU and Xeon Phi yield two orders-of-magnitude speedup over one CPU core.The GPU-based version with tiling runs up to 2.9 × faster than the Xeon Phi version. Approximate pattern matching (APM) targets to find the occurrences of a pattern inside a subject text allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and information retrieval. Bit-parallel APM takes advantage of the intrinsic parallelism of bitwise operations inside a machine word. This approach typica…
SWMapper: Scalable Read Mapper on SunWay TaihuLight
2020
With the rapid development of next-generation sequencing (NGS) technologies, high throughput sequencing platforms continuously produce large amounts of short read DNA data at low cost. Read mapping is a performance-critical task, being one of the first stages required for many different types of NGS analysis pipelines. We present SWMapper — a scalable and efficient read mapper for the Sunway TaihuLight supercomputer. A number of optimization techniques are proposed to achieve high performance on its heterogeneous architecture which are centered around a memory-efficient succinct hash index data structure including seed filtration, duplicate removal, dynamic scheduling, asynchronous data tra…
Collision detection for 3D rigid body motion planning with narrow passages
2017
In sampling-based 3D rigid body motion planning one of the major subroutines is collision detection. Especially for problems with narrow passages many samples have to be checked by a collision detection algorithm. In this application, the runtime of the motion planning algorithm is dominated by collision detection and the samples have the very specific characteristic that many of them are in collision and have small penetration volumes. In our work, we introduce a data structure and an algorithm that makes use of this characteristic by combining well-known data structures like a distance field and an octree with the swap algorithm by Llanas et al. For 3D rigid body motion planning with narr…
SPECTR
2018
Modern high throughput sequencing platforms can produce large amounts of short read DNA data at low cost. Error correction is an important but time-consuming initial step when processing this data in order to improve the quality of downstream analyses. In this paper, we present a Scalable Parallel Error CorrecToR designed to improve the throughput of DNA error correction for Illumina reads on various parallel platforms. Our design is based on a k-spectrum approach where a Bloom filter is frequently probed as a key operation and is optimized towards AVX-512-based multi-core CPUs, Xeon Phi many-cores (both KNC and KNL), and heterogeneous compute clusters. A number of architecture-specific opt…
parSRA: A framework for the parallel execution of short read aligners on compute clusters
2018
The growth of next generation sequencing datasets poses as a challenge to the alignment of reads to reference genomes in terms of both accuracy and speed. In this work we present parSRA, a parallel framework to accelerate the execution of existing short read aligners on distributed-memory systems. parSRA can be used to parallelize a variety of short read alignment tools installed in the system without any modification to their source code. We show that our framework provides good scalability on a compute cluster for accelerating the popular BWA-MEM and Bowtie2 aligners. On average, it is able to accelerate sequence alignments on 16 64-core nodes (in total, 1024 cores) with speedup of 10.48 …
CUDA-enabled hierarchical ward clustering of protein structures based on the nearest neighbour chain algorithm
2015
Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelize the nearest neighbour chain algorithm to perform hierarc…
Computing the Original eBWT Faster, Simpler, and with Less Memory
2021
Mantaci et al. [TCS 2007] defined the \(\mathrm {eBWT}\) to extend the definition of the \(\mathrm {BWT}\) to a collection of strings. However, since this introduction, it has been used more generally to describe any \(\mathrm {BWT}\) of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original \(\mathrm {eBWT}\), which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the \(\mathrm {BWT}\) of a single string that uses …
FMapper: Scalable read mapper based on succinct hash index on SunWay TaihuLight
2022
Abstract One of the most important application in bioinformatics is read mapping. With the rapidly increasing number of reads produced by next-generation sequencing (NGS) technology, there is a need for fast and efficient high-throughput read mappers. In this paper, we present FMapper – a highly scalable read mapper on the TaihuLight supercomputer optimized for its fourth-generation ShenWei many-core architecture (SW26010). In order to fully exploit the computational power of the SW26010, we employ dynamic scheduling of tasks, asynchronous I/O and data transfers and implement a vectorized version of the banded Myers algorithm tailored to the 256 bit vector registers of the SW26010. Our perf…