Search results for "Graphics processing unit"
showing 10 items of 42 documents
Accelerating Clifford Algebra Operations using GPUs and an OpenCL Code Generator
2015
Clifford Algebra (CA) is a powerful mathematical language that allows for a simple and intuitive representation of geometric objects and their transformations. It has important applications in many research fields, such as computer graphics, robotics, and machine vision. Direct hardware support of Clifford data types and operators is needed to accelerate applications based on Clifford Algebra. This paper proposes a mixed software-hardware system that exploits the computational power of Graphics Processing Units (GPUs) to accelerate Clifford operations. A code generator, namely OpenCLifford, is presented that automatically generates Java and C libraries for the direct support of Clifford ele…
On the performance of multi-GPU-based expert systems for acoustic localization involving massive microphone arrays
2015
Sound source localization is an important topic in expert systems involving microphone arrays, such as automatic camera steering systems, human-machine interaction, video gaming or audio surveillance. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known approach for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm analyzes the sound power captured by an acoustic beamformer on a defined spatial grid, estimating the source location as the point that maximizes the output power. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate …
CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations
2013
We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs. An associated CUDA implementation which takes advantage of atomic operations is presented. We propose partitioning methods to transform a given sparse matrix into SCOO format. An efficient Dual-GPU implementation which overlaps computation and communication is described. Extensive performance comparisons of SCOO compared to other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…
Multi-Kernel Implicit Curve Evolution for Selected Texture Regions Segmentation in VHR Satellite Images
2014
Very high resolution (VHR) satellite images provide a mass of detailed information which can be used for urban planning, mapping, security issues, or environmental monitoring. Nevertheless, the processing of this kind of image is time-consuming, and extracting the needed information from among the huge quantity of data is a real challenge. For some applications such as natural disaster prevention and monitoring (typhoon, flood, bushfire, etc.), the use of fast and effective processing methods is demanded. Furthermore, such methods should be selective in order to extract only the information required to allow an efficient interpretation. For this purpose, we propose a texture region segmentat…
GPU accelerated Monte Carlo simulations of lattice spin models
2011
We consider Monte Carlo simulations of classical spin models of statistical mechanics using the massively parallel architecture provided by graphics processing units (GPUs). We discuss simulations of models with discrete and continuous variables, and using an array of algorithms ranging from single-spin flip Metropolis updates over cluster algorithms to multicanonical and Wang-Landau techniques to judge the scope and limitations of GPU accelerated computation in this field. For most simulations discussed, we find significant speed-ups by two to three orders of magnitude as compared to single-threaded CPU implementations.
A CUDA-based implementation of an improved SPH method on GPU
2021
We present a CUDA-based parallel implementation on GPU architecture of a modified version of the Smoothed Particle Hydrodynamics (SPH) method. This modified formulation exploits a strategy based on the Taylor series expansion, which simultaneously improves the approximation of a function and its derivatives with respect to the standard formulation. The improvement in accuracy comes at the cost of an additional computational effort. The computational demand becomes increasingly crucial as problem size increases but can be addressed by employing fast summations in a parallel computational scheme. The experimental analysis showed that our parallel implementation significantly reduces the runti…
CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled graphics hardware
2011
Scanning protein sequence databases is an often-repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. In order to exploit the GPU's capabilities for accelerating BLASTP, we have used a compressed deterministic finite state automaton for hit detection as wel…
An Efficient Implementation of Parallel Parametric HRTF Models for Binaural Sound Synthesis in Mobile Multimedia
2020
The extended use of mobile multimedia devices in applications like gaming, 3D video and audio reproduction, immersive teleconferencing, or virtual and augmented reality, is demanding efficient algorithms and methodologies. All these applications require real-time spatial audio engines with the capability of dealing with intensive signal processing operations while facing a number of constraints related to computational cost, latency and energy consumption. Most mobile multimedia devices include a Graphics Processing Unit (GPU) that is primarily used to accelerate video processing tasks, providing high computational capabilities due to its inherent parallel architecture. This paper describes…
Designing a graphics processing unit accelerated petaflop capable lattice Boltzmann solver: Read aligned data layouts and asynchronous communication
2017
The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently, general-purpose graphics processing units (GPUs) have become available as high-performance computing resources at large scale. We report on designing and implementing a lattice Boltzmann solver for multi-GPU systems that achieves 1.79 PFLOPS performance on 16,384 GPUs. To achieve this performance, we introduce a GPU compatible version of the so-called bundle data layout and eliminate the halo sites in order to improve data access alignment. Furthermore, we make use of the possibility to overlap data transfer between the host central processing unit and the device GPU with comp…
Collision Avoidance with Potential Fields Based on Parallel Processing of 3D-Point Cloud Data on the GPU
2014
In this paper we present an experimental study on real-time collision avoidance with potential fields that are based on 3D point cloud data and processed on the Graphics Processing Unit (GPU). The virtual forces from the potential fields serve two purposes. First, they are used for changing the reference trajectory. Second, they are projected to and applied at the torque control level to generate the corresponding nullspace behavior together with a Cartesian impedance main control loop. The GPU algorithm creates a map representation that is quickly accessible. In addition, outliers and the robot structure are efficiently removed from the data, and the resolution of the representation can be easily ad…