Search results for "Parallel computing"
showing 10 items of 189 documents
SAUCE: A web application for interactive teaching and learning of parallel programming
2017
Abstract Prevalent hardware trends towards parallel architectures and algorithms create a growing demand for graduate students familiar with the programming of concurrent software. However, learning parallel programming is challenging due to complex communication and memory access patterns as well as the avoidance of common pitfalls such as dead-locks and race conditions. Hence, the learning process has to be supported by adequate software solutions in order to enable future computer scientists and engineers to write robust and efficient code. This paper discusses a selection of well-known parallel algorithms based on C++11 threads, OpenMP, MPI, and CUDA that can be interactively embedded i…
Domain-Knowledge Optimized Simulated Annealing for Network-on-Chip Application Mapping
2013
Network-on-Chip architectures are scalable on-chip interconnection networks. They replace the inefficient shared buses and are suitable for multicore and manycore systems. This paper presents an Optimized Simulated Annealing (OSA) algorithm for the Network-on-Chip application mapping problem. With OSA, the cores are implicitly and dynamically clustered using knowledge about communication demands. We show that OSA is a more feasible Simulated Annealing approach to NoC application mapping by comparing it with a general Simulated Annealing algorithm and a Branch and Bound algorithm, too. Using real applications we show that OSA is significantly faster than a general Simulated Annealing, withou…
Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems
2015
This is a post-peer-review, pre-copyedit version of an article published in IEEE - ACM Transactions on Computational Biology and Bioinformatics. The final authenticated version is available online at: http://dx.doi.org/10.1109/TCBB.2015.2389958 [Abstract] High-throughput genotyping technologies (such as SNP-arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies is an important but time consuming operation since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively lon…
High Precision Conservative Surface Mesh Generation for Swept Volumes
2015
We present a novel, efficient, and flexible scheme to generate a high-quality mesh that approximates the outer boundary of a swept volume. Our approach comes with two guarantees. First, the approximation is conservative, i.e., the swept volume is enclosed by the generated mesh. Second, the one-sided Hausdorff distance of the generated mesh to the swept volume is upper bounded by a user defined tolerance. Exploiting this tolerance the algorithm generates a mesh that is adapted to the local complexity of the swept volume boundary, keeping the overall output complexity remarkably low. The algorithm is two-phased: the actual sweep and the mesh generation. In the sweeping phase, we introduce a g…
SoC-Based Implementation of the Backpropagation Algorithm for MLP
2008
The backpropagation algorithm used for the training of multilayer perceptrons (MLPs) has a high degree of parallelism and is therefore well-suited for hardware implementation on an ASIC or FPGA. However, most implementations are lacking in generality of application, either by limiting the range of trainable network topologies or by resorting to fixed-point arithmetic to increase processing speed. We propose a parallel backpropagation implementation on a multiprocessor system-on-chip (SoC) with a large number of independent floating-point processing units, controlled by software running on embedded processors in order to allow flexibility in the selection of the network topology to be traine…
Concurrent Molecular Dynamics Simulation of ST2 Water on a Transputer Array
1988
Abstract A concurrent implementation of a Molecular Dynamics program for ST2 water molecules is presented, which exploits the great potentialities of the Transputer arrays for statistical mechanical calculations. High load-balance efficiency is obtained using a new task decomposition algorithm which evenly distributes particles and interaction calculations among the processors. This approach can also help to solve efficiently the more general problem of task distribution in parallel computing of symmetric pairwise system properties.
Optimizing PolyACO Training with GPU-Based Parallelization
2016
A central part of Ant Colony Optimisation (ACO) is the function calculating the quality and cost of solutions, such as the distance of a potential ant route. This cost function is used to deposit an opportune amount of pheromones to achieve an apt convergence, and in an active ACO implementation a significant part of the runtime is spent in this part of the code. In some cases, the cost function accumulates up towards 94 % in its run time making it a performance bottle neck.
Constant Time Garbage Collection in SSDs
2021
The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs
2012
Abstract Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an effcient CUDA implementation to perform SpMV on the GPU. While previous work shows experiments on small to medium-sized sparse matrices, we perform evaluations on large sparse matrices. We compared SCOO performance to existing formats of the NVIDIA Cusp library. Our resutls on a Fermi GPU show that SCOO outperforms the COO and CSR format for all tested matrices and the HYB format for all tested unstructured matrices. Furthermore, comparison to a Sandy-Bridge CPU sho…