Search results for "parallel computing"
showing 10 items of 189 documents
A recognize-and-accuse policy to speed up distributed processes
1994
Accelerating short read mapping on an FPGA (abstract only)
2012
The explosive growth of short read datasets produced by high throughput DNA sequencing technologies poses a challenge to the mapping of short reads to a reference genome in terms of sensitivity and execution speed. Existing methods often use a restrictive error model for computing the alignments to improve speed, whereas more flexible error models are generally too slow for large-scale applications. Although a number of short read mapping software tools have been proposed, designs based on hardware are relatively rare. In this paper, we present a hybrid system for short read mapping utilizing both software and field programmable gate array (FPGA)-based hardware. The compute intensive semi-g…
A distributed dynamic load balancer and its implementation on multi-transputer systems for molecular dynamics simulation
1990
A new and efficient approach to the dynamic load-balancing problem, which is central to concurrent computing, is described. A transputer-based implementation is tested on a molecular dynamics simulation of spinodal phase separation.
CUDA-Accelerated Alignment of Subsequences in Streamed Time Series Data
2014
Euclidean Distance (ED) and Dynamic Time Warping (DTW) are cornerstones in the field of time series data mining. Many high-level algorithms like kNN-classification, clustering or anomaly detection make extensive use of these distance measures as subroutines. Furthermore, the vast growth of recorded data produced by automated monitoring systems or integrated sensors establishes the need for efficient implementations. In this paper, we introduce linear memory parallelization schemes for the alignment of a given query Q in a stream of time series data S for both ED and DTW using CUDA-enabled accelerators. The ED parallelization features a log-linear calculation scheme in contrast to the naive …
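For reference, the two distance measures named above can be sketched sequentially in Python. This is a minimal CPU sketch, not the paper's CUDA parallelization scheme: the function names `euclidean_distances`, `dtw_distance` and `sliding_dtw` are hypothetical, the DTW uses a squared-difference cost, and only two rows of the DTW matrix are kept, so memory grows linearly with the window length.

```python
import math


def euclidean_distances(query, stream):
    """ED between the query and every length-len(query) window of the stream."""
    m = len(query)
    out = []
    for start in range(len(stream) - m + 1):
        acc = 0.0
        for i in range(m):
            d = stream[start + i] - query[i]
            acc += d * d
        out.append(math.sqrt(acc))
    return out


def dtw_distance(a, b):
    """Classic DTW with squared-difference cost, keeping only two DP rows."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0: only D[0][0] is reachable
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # match, insertion or deletion step
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[len(b)])


def sliding_dtw(query, stream):
    """DTW between the query and every length-len(query) window of the stream."""
    m = len(query)
    return [dtw_distance(query, stream[s:s + m]) for s in range(len(stream) - m + 1)]
```

Each window's distance is independent of the others, which is the property a GPU implementation can exploit by assigning windows to threads.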
AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation
2020
Sequence alignments are fundamental to bioinformatics, which has resulted in a variety of optimized implementations. Unfortunately, the vast majority of them are hand-tuned and specific to certain architectures and execution models. This not only makes them challenging to understand and extend, but also difficult to port to other platforms. We present AnySeq, a novel library for computing different types of pairwise alignments of DNA sequences. Our approach combines high performance with an intuitively understandable implementation, which is achieved through the concept of partial evaluation. Using the AnyDSL compiler framework, AnySeq enables the compilation of algorithmic variants that ar…
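The idea of fixing the alignment variant and scoring scheme before the inner loop runs can be loosely illustrated with a Python closure. This is only an analogy under stated assumptions: `make_aligner` and its default scores are hypothetical, Python closures do not perform true compile-time partial evaluation as AnyDSL does, and the sketch computes only an alignment score (global Needleman-Wunsch or local Smith-Waterman with linear gap penalties), not the traceback.

```python
def make_aligner(match=2, mismatch=-1, gap=-2, local=False):
    """Return a scoring function with the scheme and alignment type fixed up front."""

    def score(a, b):
        # First DP row: zeros for local alignment, accumulated gaps for global.
        prev = [0 if local else j * gap for j in range(len(b) + 1)]
        best = 0
        for i in range(1, len(a) + 1):
            curr = [0 if local else i * gap] + [0] * len(b)
            for j in range(1, len(b) + 1):
                sub = match if a[i - 1] == b[j - 1] else mismatch
                h = max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
                if local:
                    h = max(h, 0)  # Smith-Waterman: never drop below zero
                    best = max(best, h)
                curr[j] = h
            prev = curr
        return best if local else prev[-1]

    return score
```

For example, `make_aligner(local=True)` yields a local aligner whose parameters are bound once and reused for every pair of sequences.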
Parallel In-Memory Evaluation of Spatial Joins
2019
The spatial join is a popular operation in spatial database systems and its evaluation is a well-studied problem. As main memories become bigger and faster and commodity hardware supports parallel processing, there is a need to revamp classic join algorithms which have been designed for I/O-bound processing. In view of this, we study the in-memory and parallel evaluation of spatial joins, by re-designing a classic partitioning-based algorithm to consider alternative approaches for space partitioning. Our study shows that, compared to a straightforward implementation of the algorithm, our tuning can improve performance significantly. We also show how to select appropriate partitioning parame…
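A partitioning-based spatial join of the kind described above can be sketched with a uniform grid. This is a minimal sketch, assuming axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)` tuples and a hypothetical fixed cell size `cell`; it is not the paper's tuned algorithm. Rectangles are replicated into every cell they overlap, and duplicate results are avoided with the reference-point technique.

```python
from collections import defaultdict


def grid_spatial_join(rs, ss, cell):
    """Return index pairs (i, j) such that rs[i] and ss[j] intersect."""

    def cells_of(box):
        x0, y0, x1, y1 = box
        for cx in range(int(x0 // cell), int(x1 // cell) + 1):
            for cy in range(int(y0 // cell), int(y1 // cell) + 1):
                yield cx, cy

    # Partition phase: replicate each rectangle into all cells it overlaps.
    grid_r, grid_s = defaultdict(list), defaultdict(list)
    for i, box in enumerate(rs):
        for c in cells_of(box):
            grid_r[c].append(i)
    for j, box in enumerate(ss):
        for c in cells_of(box):
            grid_s[c].append(j)

    # Join phase: each cell is an independent task.
    result = []
    for c, r_ids in grid_r.items():
        for i in r_ids:
            for j in grid_s.get(c, ()):
                a, b = rs[i], ss[j]
                if a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]:
                    # Reference point: lower-left corner of the intersection.
                    # Report the pair only in the cell containing it.
                    ref = (max(a[0], b[0]), max(a[1], b[1]))
                    if (int(ref[0] // cell), int(ref[1] // cell)) == c:
                        result.append((i, j))
    return result
```

Because cells are joined independently, they can be distributed across threads; the cell size is the partitioning parameter that trades replication against per-cell work.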
Lightweight LCP construction for very large collections of strings
2016
The longest common prefix array is a very advantageous data structure that, combined with the suffix array and the Burrows-Wheeler transform, allows one to efficiently compute some combinatorial properties of a string useful in several applications, especially in biological contexts. Nowadays, the input data for many problems are big collections of strings, for instance the data coming from "next-generation" DNA sequencing (NGS) technologies. In this paper we present the first lightweight algorithm (called extLCP) for the simultaneous computation of the longest common prefix array and the Burrows-Wheeler transform of a very large collection of strings having any length. The computation is reali…
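To make the three structures concrete, the sketch below builds the suffix array, the BWT and the LCP array of a single short string in memory. It assumes a unique end marker `$` that sorts before all other characters, uses naive suffix sorting and Kasai's algorithm for the LCP array, and is not the extLCP algorithm, which targets very large collections of strings in external memory.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    """BWT[k] is the character preceding the k-th smallest suffix (cyclically)."""
    return "".join(text[i - 1] for i in sa)  # text[-1] is the end marker


def lcp_kasai(text, sa):
    """lcp[k] = length of the longest common prefix of suffixes sa[k-1] and sa[k]."""
    n = len(text)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp
```

For `"banana$"` this gives the suffix array `[6, 5, 3, 1, 0, 4, 2]`, the BWT `"annb$aa"` and the LCP array `[0, 0, 1, 3, 0, 0, 2]`.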
Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model
2010
A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Computational Physics 228 (2009) 4468–4477] in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors of up to 35 compared to an optimized single central processing unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message P…
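The checkerboard decomposition mentioned above can be sketched as a plain Metropolis sweep. This is a sequential illustration under stated assumptions: the function name is hypothetical, the lattice is L x L with periodic boundaries and even L, J = 1 and no external field, and neither multi-spin coding nor the multi-GPU/MPI decomposition is shown. Splitting sites into two colours works because, under these assumptions, every neighbour of a black site is white, so all sites of one colour can be updated in parallel.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep over an L x L periodic lattice: black sites, then white."""
    L = len(spins)  # assumed even, so the two colours form a valid checkerboard
    for parity in (0, 1):
        # Sites with (x + y) % 2 == parity have no neighbours of the same colour,
        # so on a GPU each of them could be handled by its own thread.
        for x in range(L):
            for y in range(L):
                if (x + y) % 2 != parity:
                    continue
                s = spins[x][y]
                nb = (spins[(x + 1) % L][y] + spins[(x - 1) % L][y]
                      + spins[x][(y + 1) % L] + spins[x][(y - 1) % L])
                delta_e = 2.0 * s * nb  # energy change if this spin is flipped
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[x][y] = -s
    return spins
```

A run would call `checkerboard_sweep(spins, beta, random.Random(seed))` repeatedly and measure observables between sweeps.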
WarpCore: A Library for fast Hash Tables on GPUs
2020
Hash tables are ubiquitous. Properties such as an amortized constant-time complexity for insertion and querying as well as a compact memory layout make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields has motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns, to design WarpCore, a versatile library of hash table data structures. Unique device-sided operations allow for building high-performance data processing pipelines ent…
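A sketch of open addressing with probing in fixed-size windows gives a rough idea of how a hash table can be laid out for contiguous memory access. The class name `WindowedHashTable`, the `group_size` parameter and the probing scheme are hypothetical illustrations, not WarpCore's API; the table is sequential Python, supports no deletions, and does not accept `None` as a key.

```python
class WindowedHashTable:
    """Open addressing that probes windows of `group_size` consecutive slots."""

    EMPTY = None

    def __init__(self, capacity, group_size=4):
        # Round capacity up to a multiple of group_size so windows align.
        self.group_size = group_size
        self.capacity = ((capacity + group_size - 1) // group_size) * group_size
        self.keys = [self.EMPTY] * self.capacity
        self.values = [None] * self.capacity

    def _windows(self, key):
        # Linear probing at window granularity, starting at the key's home window.
        num_windows = self.capacity // self.group_size
        start = hash(key) % num_windows
        for step in range(num_windows):
            base = ((start + step) % num_windows) * self.group_size
            yield range(base, base + self.group_size)

    def insert(self, key, value):
        for window in self._windows(key):
            # On a GPU, a thread group could inspect this window with one coalesced read.
            for slot in window:
                if self.keys[slot] == key or self.keys[slot] is self.EMPTY:
                    self.keys[slot] = key
                    self.values[slot] = value
                    return True
        return False  # table is full

    def retrieve(self, key):
        for window in self._windows(key):
            for slot in window:
                if self.keys[slot] == key:
                    return self.values[slot]
                if self.keys[slot] is self.EMPTY:
                    return None  # an empty slot ends the probe sequence
        return None
```

Probing whole windows rather than single slots means each probe step touches one contiguous block of memory, which is the access pattern GPUs serve most efficiently.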
Sorted deduplication: How to process thousands of backup streams
2016
The requirements of deduplication systems have changed in recent years. Early deduplication systems had to process dozens to hundreds of backup streams at the same time, while today they are able to process hundreds to thousands of them. Traditional approaches rely on stream-locality, which supports parallelism, but which easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed for processing thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and cre…
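The general idea of replacing per-stream random index lookups with one sorted pass can be sketched as follows. This is an assumption-laden illustration, not the paper's algorithm: `fingerprint`, `sorted_dedup` and their data layout are hypothetical, chunk fingerprints are SHA-1 digests computed with `hashlib`, and the fingerprint index is an in-memory sorted list rather than an on-disk structure.

```python
import hashlib


def fingerprint(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()


def sorted_dedup(streams, index):
    """streams: dict stream_id -> list of chunks; index: sorted list of known fingerprints.
    Returns (merged sorted index, list of (stream_id, position) for new chunks)."""
    # Gather fingerprints from all streams into one batch and sort it.
    batch = sorted(
        (fingerprint(chunk), sid, pos)
        for sid, chunks in streams.items()
        for pos, chunk in enumerate(chunks)
    )
    merged, new_chunks = [], []
    k = 0
    last_new = None
    for fp, sid, pos in batch:
        # Advance through the sorted index in a single sequential pass.
        while k < len(index) and index[k] < fp:
            merged.append(index[k])
            k += 1
        if (k < len(index) and index[k] == fp) or fp == last_new:
            continue  # duplicate of a stored chunk or of an earlier chunk in this batch
        new_chunks.append((sid, pos))
        merged.append(fp)
        last_new = fp
    merged.extend(index[k:])
    return merged, new_chunks
```

Because both the batch and the index are sorted, the index is traversed once, front to back, regardless of how many streams contributed to the batch.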