Search results for "Parallel computing"
showing 10 items of 189 documents
BGSA: a bit-parallel global sequence alignment toolkit for multi-core and many-core architectures
2018
Motivation: Modern bioinformatics tools for analyzing large-scale NGS datasets often need to include fast implementations of core sequence alignment algorithms in order to achieve reasonable execution times. We address this need by presenting the BGSA toolkit for optimized implementations of popular bit-parallel global pairwise alignment algorithms on modern microprocessors. Results: BGSA outperforms Edlib, SeqAn and BitPAl for pairwise edit distance computations and Parasail, SeqAn and BitPAl when using more general scoring schemes for pairwise alignments of a batch of sequence reads on both standard multi-core CPUs and Xeon Phi many-core CPUs. Furthermore, banded edit distance perf…
A parallel and sensitive software tool for methylation analysis on multicore platforms
2015
Motivation: DNA methylation analysis suffers from very long processing times, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently with either the size of the dataset or the length of the reads to be analyzed. As sequencers are expected to provide longer and longer reads in the near future, efficient and scalable methylation software should be developed. Results: We present a new software tool, called HPG-Methyl, which efficiently maps bis…
RabbitMash: accelerating hash-based genome analysis on modern multi-core architectures
2020
Motivation: Mash is a popular hash-based genome analysis toolkit with applications to important downstream analysis tasks such as clustering and assembly. However, Mash is currently not able to fully exploit the capabilities of modern multi-core architectures, which in turn leads to high runtimes for large-scale genomic datasets. Results: We present RabbitMash, an efficient, highly optimized implementation of Mash which can take full advantage of modern hardware including multi-threading, vectorization and fast I/O. We show that our approach achieves speedups of at least 1.3, 9.8, 8.5 and 4.4 compared to Mash for the operations sketch, dist, triangle and screen, respectively. Furtherm…
Analysis of the influence of processor hidden registers on the accuracy of fault injection techniques
2004
Modern processors tend to increase the number of registers, some of which are not accessible through the instruction set. Traditionally, the effect of faults in these hidden registers has not been considered during system validation using fault injection. In this paper, a study of the importance of faults in hidden registers is performed. First, we have analysed the sensitivity of hidden registers to faults in combinational logic. In a second phase, we have analysed the impact of faults occurring in hidden registers on system behaviour. A broad set of permanent and transient faults has been injected into the models of two typical commercial microcontrollers, using a VHDL-based fault injec…
Parallel Computing for the study of the focusing Davey-Stewartson II equation in semiclassical limit
2012
The asymptotic description of the semiclassical limit of nonlinear Schrödinger equations is a major challenge with so far only scattered results in 1 + 1 dimensions. In this limit, solutions to the NLS equations can have zones of rapid modulated oscillations or blow up. We numerically study in this work the Davey-Stewartson system, a 2 + 1 dimensional nonlinear Schrödinger equation with a nonlocal term, by using parallel computing. This leads to the first results on the semiclassical limit for the Davey-Stewartson equations.
Optimal Configuration for N-Dimensional Twin Torus Networks
2014
Torus topology is one of the most common topologies used in the largest current supercomputers. Although the 3D torus is widely used, some supercomputers in the Top500 list have recently been built using networks with topologies of five or six dimensions. To obtain an nD torus, 2n ports per node are needed. These ports can be provided by a single card or by several cards per node. In the second case, there are multiple ways of assigning the dimension and direction of the card ports. In a previous work we proposed the 3D Twin (3DT) torus, which uses two 4-port cards per node, and obtained the optimal port configuration. This paper extends and generalizes that work in order to obtain the optimal port confi…
Empirical Autotuning of Two-level Parallel Linear Algebra Routines on Large cc-NUMA Systems
2012
In large cc-NUMA systems the efficient use of the different levels of the memory hierarchy is not an easy task, and the performance of multithreaded implementations of the libraries decreases as the number of cores used increases, producing a significant loss of efficiency. To alleviate this problem, routines with multilevel parallelism can be developed by combining OpenMP and BLAS parallelism. In that way, higher performance can be achieved, but it is necessary to develop some autotuning technique for the appropriate selection of the number of threads to use at each level. The selection can be made through theoretical models of the execution time or some installation methodology. This…
Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform
2012
The unprecedented production of short reads from the new high-throughput sequencers has made it challenging to align short reads to reference genomes with high sensitivity and high speed. Many CPU-based short read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between the input reads and the reference genome, where hash tables are the most frequently used data structure. However, hash tables are memory-consuming, making them poorly suited to memory-constrained many-core architectures, like GPUs, even though they usually have a nearly constant query time com…
Work Partitioning on Parallel and Distributed Agent-Based Simulation
2017
Work partitioning is a key challenge with applications in many scientific and technological fields. The problem is very well studied, with a rich literature on both distributed and parallel computing architectures. In this paper we deal with the work partitioning problem for parallel and distributed agent-based simulations, which aims at (i) balancing the overall load distribution and (ii) minimizing, at the same time, the communication overhead due to agents' inter-dependencies. We introduce a classification taxonomy of work partitioning strategies and present a space-based work partitioning approach, based on a Quad-tree data structure, which makes it possible to: identify a good space partitioning …
Parallel Collision Queries on the GPU
2013
We present parallel algorithms to accelerate collision tests of rigid body objects for a high number of independent transformations as they occur in sampling-based motion planning and path validation problems. We compare various GPU approaches with different levels of parallelism against each other and against a parallel CPU implementation. Our algorithms require no sophisticated load balancing schemes. They make no assumptions about the distribution of the input transformations and require no pre-processing. Yet, we can perform up to 1 million collision tests per second with our best GPU implementation in our benchmarks. This is about 2.5X faster than our reference multi-core CPU implementati…