Search results for "GPU"
showing 10 items of 43 documents
On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method
2018
Partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite-elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. Attained floating point perfor…
Fast Poisson solvers for graphics processing units
2013
Two block cyclic reduction linear system solvers are considered and implemented using the OpenCL framework. The topics of interest include a simplified scalar cyclic reduction tridiagonal system solver and the impact of increasing the radix-number of the algorithm. Both implementations are tested for the Poisson problem in two and three dimensions, using a Nvidia GTX 580 series GPU and double precision floating-point arithmetic. The numerical results indicate up to 6-fold speed increase in the case of the two-dimensional problems and up to 3- fold speed increase in the case of the three-dimensional problems when compared to equivalent CPU implementations run on a Intel Core i7 quad-core CPU…
Generic heuristics on GPU to superpixel segmentation and application to optical flow estimation
2020
Finding clusters in point clouds and matching graphs to graphs are recurrent tasks in computer science domain, data analysis, image processing, that are most often modeled as NP-hard optimization problems. With the development and accessibility of cheap multiprocessors, acceleration of the heuristic procedures for these tasks becomes possible and necessary. We propose parallel implantation on GPU (graphics processing unit) system for some generic algorithms applied here to image superpixel segmentation and image optical flow problem. The aim is to provide generic algorithms based on standard decentralized data structures to be easy to improve and customized on many optimization problems and…
On GPU-accelerated fast direct solvers and their applications in image denoising
2015
Real-time data processing in the ALICE High Level Trigger at the LHC
2019
At the Large Hadron Collider at CERN in Geneva, Switzerland, atomic nuclei are collided at ultra-relativistic energies. Many final-state particles are produced in each collision and their properties are measured by the ALICE detector. The detector signals induced by the produced particles are digitized leading to data rates that are in excess of 48 GB/$s$. The ALICE High Level Trigger (HLT) system pioneered the use of FPGA- and GPU-based algorithms to reconstruct charged-particle trajectories and reduce the data size in real time. The results of the reconstruction of the collision events, available online, are used for high level data quality and detector-performance monitoring and real-tim…
GPU accelerated Monte Carlo simulations of lattice spin models
2011
We consider Monte Carlo simulations of classical spin models of statistical mechanics using the massively parallel architecture provided by graphics processing units (GPUs). We discuss simulations of models with discrete and continuous variables, and using an array of algorithms ranging from single-spin flip Metropolis updates over cluster algorithms to multicanonical and Wang-Landau techniques to judge the scope and limitations of GPU accelerated computation in this field. For most simulations discussed, we find significant speed-ups by two to three orders of magnitude as compared to single-threaded CPU implementations.
CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled graphics hardware
2011
Scanning protein sequence database is an often repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. In order to exploit the GPU's capabilities for accelerating BLASTP, we have used a compressed deterministic finite state automaton for hit detection as wel…
An Efficient Implementation of Parallel Parametric HRTF Models for Binaural Sound Synthesis in Mobile Multimedia
2020
The extended use of mobile multimedia devices in applications like gaming, 3D video and audio reproduction, immersive teleconferencing, or virtual and augmented reality, is demanding efficient algorithms and methodologies. All these applications require real-time spatial audio engines with the capability of dealing with intensive signal processing operations while facing a number of constraints related to computational cost, latency and energy consumption. Most mobile multimedia devices include a Graphics Processing Unit (GPU) that is primarily used to accelerate video processing tasks, providing high computational capabilities due to its inherent parallel architecture. This paper describes…
Collision Avoidance with Potential Fields Based on Parallel Processing of 3D-Point Cloud Data on the GPU
2014
In this paper we present an experimental study on real-time collision avoidance with potential fields that are based on 3D point cloud data and processed on the Graphics Processing Unit (GPU). The virtual forces from the potential fields serve two purposes. First, they are used for changing the reference trajectory. Second they are projected to and applied on torque control level for generating according nullspace behavior together with a Cartesian impedance main control loop. The GPU algorithm creates a map representation that is quickly accessible. In addition, outliers and the robot structure are efficiently removed from the data, and the resolution of the representation can be easily ad…
Perfect Hashing Structures for Parallel Similarity Searches
2015
International audience; Seed-based heuristics have proved to be efficient for studying similarity between genetic databases with billions of base pairs. This paper focuses on algorithms and data structures for the filtering phase in seed-based heuristics, with an emphasis on efficient parallel GPU/manycores implementa- tion. We propose a 2-stage index structure which is based on neighborhood indexing and perfect hashing techniques. This structure performs a filtering phase over the neighborhood regions around the seeds in constant time and avoid as much as possible random memory accesses and branch divergences. Moreover, it fits particularly well on parallel SIMD processors, because it requ…