Search results for "Graphics processing unit"

showing 10 items of 42 documents

Accelerating Clifford Algebra Operations using GPUs and an OpenCL Code Generator

2015

Clifford Algebra (CA) is a powerful mathematical language that allows for a simple and intuitive representation of geometric objects and their transformations. It has important applications in many research fields, such as computer graphics, robotics, and machine vision. Direct hardware support of Clifford data types and operators is needed to accelerate applications based on Clifford Algebra. This paper proposes a mixed software-hardware system that exploits the computational power of Graphics Processing Units (GPUs) to accelerate Clifford operations. A code generator, namely OpenCLifford, is presented that automatically generates Java and C libraries for the direct support of Clifford ele…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Speedup; Hardware-software co-design; OpenCL; Computer science; Clifford algebra; Geometric Algebra; Parallel computing; Data type; Metaprogramming; Computer graphics; Clifford Algebra; Geometric algebra; ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION; Code generation; Central processing unit; Graphics; Graphics Processing Unit

researchProduct

On the performance of multi-GPU-based expert systems for acoustic localization involving massive microphone arrays

2015

Sound source localization is an important topic in expert systems involving microphone arrays, such as automatic camera steering systems, human-machine interaction, video gaming or audio surveillance. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known approach for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm analyzes the sound power captured by an acoustic beamformer on a defined spatial grid, estimating the source location as the point that maximizes the output power. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate …

Signal processing; Reverberation; Computer science; Microphone; Real-time computing; General Engineering; Acoustic source localization; Sound power; computer.software_genre; Grid; Expert system; Microphone arrays; Computer Science Applications; Sound source localization; Noise; Artificial Intelligence; TEORIA DE LA SEÑAL Y COMUNICACIONES; CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL; Graphics Processing Units; computer; Steered Response Power

researchProduct

CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations

2013

We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs. An associated CUDA implementation which takes advantage of atomic operations is presented. We propose partitioning methods to transform a given sparse matrix into SCOO format. An efficient Dual-GPU implementation which overlaps computation and communication is described. Extensive performance comparisons of SCOO with other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…
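As a rough illustration of why atomic operations matter for coordinate-style formats, the sketch below multiplies a matrix stored in plain COO form by a vector. It is not the SCOO format or the CUDA implementation from the paper; the function name `spmv_coo` and the sample data are hypothetical, and NumPy's `np.add.at` stands in for the atomic add that concurrent GPU threads would need when several nonzeros share a row.

```python
import numpy as np

def spmv_coo(rows, cols, vals, x, n_rows):
    """Compute y = A @ x for a matrix A given as COO triplets (rows, cols, vals)."""
    y = np.zeros(n_rows, dtype=np.result_type(vals, x))
    # Each nonzero k contributes vals[k] * x[cols[k]] to y[rows[k]].
    # np.add.at accumulates correctly when row indices repeat; on a GPU,
    # an atomic add plays this role when many threads update the same row.
    np.add.at(y, rows, vals * x[cols])
    return y

# Hypothetical 3x3 example with five nonzeros.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([1.0, 2.0, 3.0])
y = spmv_coo(rows, cols, vals, x, 3)
```

A plain `y[rows] += ...` would silently drop contributions for repeated row indices, which is the serial analogue of the race condition that atomics prevent.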

Speedup; Computer Networks and Communications; Computer science; Sparse matrix-vector multiplication; Parallel computing; Computer Graphics and Computer-Aided Design; Theoretical Computer Science; Matrix (mathematics); CUDA; Artificial Intelligence; Hardware and Architecture; Benchmark (computing); Multiplication; General-purpose computing on graphics processing units; Software; Sparse matrix; Parallel Computing

researchProduct

Multi-Kernel Implicit Curve Evolution for Selected Texture Regions Segmentation in VHR Satellite Images

2014

Very high resolution (VHR) satellite images provide a mass of detailed information which can be used for urban planning, mapping, security issues, or environmental monitoring. Nevertheless, the processing of this kind of image is time-consuming, and extracting the needed information from among the huge quantity of data is a real challenge. For some applications such as natural disaster prevention and monitoring (typhoon, flood, bushfire, etc.), the use of fast and effective processing methods is demanded. Furthermore, such methods should be selective in order to extract only the information required to allow an efficient interpretation. For this purpose, we propose a texture region segmentat…

[INFO.INFO-AR] Computer Science [cs]/Hardware Architecture [cs.AR]; Pixel; business.industry; Computer science; 0211 other engineering and technologies; Graphics processing unit; Boundary (topology); Scale-space segmentation; 02 engineering and technology; Image segmentation; Fuzzy logic; Image texture; 11. Sustainability; 0202 electrical engineering electronic engineering information engineering; General Earth and Planetary Sciences; 020201 artificial intelligence & image processing; Computer vision; Segmentation; Artificial intelligence; Electrical and Electronic Engineering; business; ComputingMilieux_MISCELLANEOUS; 021101 geological & geomatics engineering

researchProduct

GPU accelerated Monte Carlo simulations of lattice spin models

2011

We consider Monte Carlo simulations of classical spin models of statistical mechanics using the massively parallel architecture provided by graphics processing units (GPUs). We discuss simulations of models with discrete and continuous variables, using an array of algorithms ranging from single-spin flip Metropolis updates through cluster algorithms to multicanonical and Wang-Landau techniques, to judge the scope and limitations of GPU accelerated computation in this field. For most simulations discussed, we find significant speed-ups of two to three orders of magnitude as compared to single-threaded CPU implementations.
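To make the single-spin flip Metropolis update concrete, here is a minimal NumPy sketch for the 2D Ising model using a checkerboard decomposition, a common way to expose parallelism because sites of one colour share no nearest neighbours. This is an assumption-laden illustration, not the authors' code: the function name `checkerboard_sweep`, the coupling J = 1, periodic boundaries, and an even lattice size are all choices made here.

```python
import numpy as np

def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of a 2D Ising lattice (J = 1, periodic boundaries).

    Assumes an even lattice size so the two checkerboard colours stay
    decoupled across the periodic boundary.
    """
    ii, jj = np.indices(spins.shape)
    for parity in (0, 1):
        mask = (ii + jj) % 2 == parity
        # Sum of the four nearest neighbours with periodic wrap-around.
        nbrs = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
                + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        d_e = 2.0 * spins * nbrs  # energy change if a given spin were flipped
        accept = rng.random(spins.shape) < np.exp(-beta * d_e)
        # Sites of one colour are updated together; on a GPU each site of
        # the active colour would map to its own thread.
        spins[mask & accept] *= -1
    return spins

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(16, 16))
for _ in range(100):
    checkerboard_sweep(spins, beta=0.6, rng=rng)
```

The array operations here update every site of one colour at once, which mirrors the data-parallel structure of a GPU kernel without reproducing its memory layout or random-number handling.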

cluster algorithms; Statistical Mechanics (cond-mat.stat-mech); Computer science; Computation; Numerical analysis; spin models; Monte Carlo method; High Energy Physics - Lattice (hep-lat); FOS: Physical sciences; Statistical mechanics; GPU computing; Physics and Astronomy(all); Computational Physics (physics.comp-ph); generalized-ensemble simulations; Monte Carlo simulations; Computational science; CUDA; High Energy Physics - Lattice; Spin model; General-purpose computing on graphics processing units; Graphics; Physics - Computational Physics; Condensed Matter - Statistical Mechanics

researchProduct

A CUDA-based implementation of an improved SPH method on GPU

2021

We present a CUDA-based parallel implementation on GPU architecture of a modified version of the Smoothed Particle Hydrodynamics (SPH) method. This modified formulation exploits a strategy based on the Taylor series expansion, which simultaneously improves the approximation of a function and its derivatives with respect to the standard formulation. The improvement in accuracy comes at the cost of an additional computational effort. The computational demand becomes increasingly crucial as problem size increases but can be addressed by employing fast summations in a parallel computational scheme. The experimental analysis showed that our parallel implementation significantly reduces the runti…

fast gauss transform; Scheme (programming language); 0209 industrial biotechnology; Computer science; Applied Mathematics; 020206 networking & telecommunications; 02 engineering and technology; Function (mathematics); Computational science; Smoothed-particle hydrodynamics; Computational Mathematics; CUDA; symbols.namesake; Settore MAT/08 - Analisi Numerica; 020901 industrial engineering & automation; graphic processing unit; 0202 electrical engineering electronic engineering information engineering; Taylor series; symbols; Smoothed Particle Hydrodynamics; Fast Gauss Transform; Graphics Processing Unit; Central processing unit; smoothed particle hydrodynamics; computer; computer.programming_language

researchProduct

CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled graphics hardware

2011

Scanning protein sequence databases is an often-repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. In order to exploit the GPU's capabilities for accelerating BLASTP, we have used a compressed deterministic finite state automaton for hit detection as wel…

graphics hardware; Source code; Computer science; media_common.quotation_subject; Graphics hardware; Graphics processing unit; Parallel computing; General Purpose Computation on Graphics Processing Unit (GPGPU); Computational science; Instruction set; CUDA; Genetics; Computer Graphics; Databases Protein; media_common; dynamic programming; Finite-state machine; Sequence database; Applied Mathematics; Proteins; Compute Unified Device Architecture (CUDA); sequence alignment; General-purpose computing on graphics processing units; Algorithms; Software; Biotechnology

researchProduct

An Efficient Implementation of Parallel Parametric HRTF Models for Binaural Sound Synthesis in Mobile Multimedia

2020

The extended use of mobile multimedia devices in applications like gaming, 3D video and audio reproduction, immersive teleconferencing, or virtual and augmented reality demands efficient algorithms and methodologies. All these applications require real-time spatial audio engines with the capability of dealing with intensive signal processing operations while facing a number of constraints related to computational cost, latency and energy consumption. Most mobile multimedia devices include a Graphics Processing Unit (GPU) that is primarily used to accelerate video processing tasks, providing high computational capabilities due to its inherent parallel architecture. This paper describes…

interpolation.; General Computer Science; parallel filters; Computer science; GPU; Gpu; Graphics processing unit; Latency (audio); Parametric model; 02 engineering and technology; computer.software_genre; 030507 speech-language pathology & audiology; 03 medical and health sciences; Software portability; HRTF modeling; 0202 electrical engineering electronic engineering information engineering; General Materials Science; Multimedia; parametric model; General Engineering; Teleconference; Binaural synthesis; 020206 networking & telecommunications; Video processing; Energy consumption; interpolation; Interpolation; Hrtf modeling; Scalability; Parallel filters; Electrónica; Augmented reality; lcsh:Electrical engineering. Electronics. Nuclear engineering; 0305 other medical science; lcsh:TK1-9971; Mobile device; computer; IEEE Access

researchProduct

Designing a graphics processing unit accelerated petaflop capable lattice Boltzmann solver: Read aligned data layouts and asynchronous communication

2017

The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently, general-purpose graphics processing units (GPUs) have become available as high-performance computing resources at large scale. We report on designing and implementing a lattice Boltzmann solver for multi-GPU systems that achieves 1.79 PFLOPS performance on 16,384 GPUs. To achieve this performance, we introduce a GPU compatible version of the so-called bundle data layout and eliminate the halo sites in order to improve data access alignment. Furthermore, we make use of the possibility to overlap data transfer between the host central processing unit and the device GPU with comp…

load balance; data layout; large-scale I/O; virtauslaskenta; prosessorit; asynchronous communication; graphics processing unit; Titan; Lattice Boltzmann; memory alignment; ComputingMethodologies_COMPUTERGRAPHICS

researchProduct

Collision Avoidance with Potential Fields Based on Parallel Processing of 3D-Point Cloud Data on the GPU

2014

In this paper we present an experimental study on real-time collision avoidance with potential fields that are based on 3D point cloud data and processed on the Graphics Processing Unit (GPU). The virtual forces from the potential fields serve two purposes. First, they are used for changing the reference trajectory. Second, they are projected to and applied at the torque control level to generate corresponding nullspace behavior together with a Cartesian impedance main control loop. The GPU algorithm creates a map representation that is quickly accessible. In addition, outliers and the robot structure are efficiently removed from the data, and the resolution of the representation can be easily ad…
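The idea of deriving a virtual force from point cloud data can be sketched with a classical repulsive potential of the form U = 0.5 k (1/d - 1/d0)^2 inside an influence radius d0. The abstract does not state which potential the authors use, so the formula, the function name `repulsive_force`, and the parameters `d0` and `k` are all assumptions made for illustration. Each point's contribution is independent of the others, which is what makes the summation a natural fit for parallel processing.

```python
import numpy as np

def repulsive_force(p, cloud, d0=0.5, k=1.0):
    """Sum of repulsive forces acting on point p from an (N, 3) point cloud.

    Uses the potential U = 0.5 * k * (1/d - 1/d0)**2 for d < d0 (assumed
    here, not taken from the paper); points farther than d0 contribute nothing.
    """
    diff = p - cloud                      # vectors from each obstacle point to p
    d = np.linalg.norm(diff, axis=1)
    near = (d > 1e-9) & (d < d0)          # skip coincident and distant points
    if not near.any():
        return np.zeros(3)
    d_n = d[near][:, None]
    # Magnitude of -dU/dd, directed along the unit vector away from the obstacle.
    mag = k * (1.0 / d_n - 1.0 / d0) / d_n**2
    return (mag * diff[near] / d_n).sum(axis=0)

# Hypothetical data: one obstacle point inside the influence radius, one outside.
cloud = np.array([[0.25, 0.0, 0.0], [2.0, 0.0, 0.0]])
force = repulsive_force(np.zeros(3), cloud)
```

In this example only the first obstacle point lies within `d0`, so the resulting force pushes the query point along the negative x axis, away from it.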

parallel processing; Computer science; Graphics processing unit; Point cloud; potential fields; law.invention; reactive motion generation; Institut für Robotik und Mechatronik (ab 2013); Computer Science::Robotics; Parallel processing (DSP implementation); law; Control system; Trajectory; Robot; Cartesian coordinate system; GPU 3D-Point Cloud Computation; collision avoidance; Collision avoidance; Simulation

researchProduct