Search results for "GPU"

showing 10 items of 43 documents

On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method

2018

The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. Attained floating point perfor…

Tridiagonal linear systems; Programvaruteknik; Computer Networks and Communications; Computer science; Partial solution technique; reduction; 010103 numerical & computational mathematics; Parallel computing; tietotekniikka; 01 natural sciences; lineaariset mallit; Theoretical Computer Science; Separable space; information technology; Artificial Intelligence; Separable block tridiagonal linear system; Block (telecommunications); Fast direct solver; Radix; 0101 mathematics; ta113; Computer Sciences; ta111; Linear system; Software Engineering; GPU computing; Solver; Computer Science::Numerical Analysis; 010101 applied mathematics; PSCR method; Datavetenskap (datalogi); partial solution technique; Hardware and Architecture; Computer Science::Mathematical Software; pienennys; linear models; Software; Roofline model; Cyclic reduction; Journal of Parallel and Distributed Computing

researchProduct

Fast Poisson solvers for graphics processing units

2013

Two block cyclic reduction linear system solvers are considered and implemented using the OpenCL framework. The topics of interest include a simplified scalar cyclic reduction tridiagonal system solver and the impact of increasing the radix-number of the algorithm. Both implementations are tested for the Poisson problem in two and three dimensions, using an Nvidia GTX 580 series GPU and double precision floating-point arithmetic. The numerical results indicate up to 6-fold speed increase in the case of the two-dimensional problems and up to 3-fold speed increase in the case of the three-dimensional problems when compared to equivalent CPU implementations run on an Intel Core i7 quad-core CPU…
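As a rough illustration of the scalar cyclic reduction idea named in this abstract, the Python sketch below solves a tridiagonal system by recursively eliminating the even-indexed unknowns and then back-substituting. It is not the authors' OpenCL code: the function name `cyclic_reduction`, the coefficient layout (`a`, `b`, `c`, `d`) and the absence of pivoting are assumptions made for this example only.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive scalar cyclic reduction.

    Hypothetical layout: row i reads
        a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    and a[0] and c[-1] are ignored. No pivoting is performed, so the
    sketch is only meant for well-conditioned (e.g. diagonally dominant)
    systems such as the discrete 1D Poisson problem.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction step: fold the two even-indexed neighbour rows into each
    # odd-indexed row. The odd rows do not depend on each other here,
    # which is the parallelism a GPU implementation would exploit.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_right = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_right else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_right else 0.0))
        rc.append(-gamma * c[i + 1] if has_right else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_right else 0.0))

    # Solve the half-size system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

The sketch runs each reduction pass sequentially; on a GPU the loop over odd rows and the back-substitution loop would each map to one parallel kernel launch per level.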

Tridiagonal matrix; OpenCL; Computer science; parallel computing; Scalar (mathematics); Linear system; Syklinen reduction; GPGPU; GPU; Double-precision floating-point format; Parallel computing; Solver; Poisson distribution; PSCR; Computational science; fast Poisson solver; symbols.namesake; nopea Poisson-ratkaisija; näytönohjain; symbols; Computer Science::Mathematical Software; Cyclic reduction; Graphics; rinnakkaislaskenta; Cyclic reduction

researchProduct

Generic heuristics on GPU to superpixel segmentation and application to optical flow estimation

2020

Finding clusters in point clouds and matching graphs to graphs are recurrent tasks in computer science, data analysis, and image processing that are most often modeled as NP-hard optimization problems. With the development and accessibility of cheap multiprocessors, acceleration of the heuristic procedures for these tasks becomes possible and necessary. We propose a parallel implementation on a GPU (graphics processing unit) system of some generic algorithms, applied here to image superpixel segmentation and the image optical flow problem. The aim is to provide generic algorithms based on standard decentralized data structures that are easy to improve and customize for many optimization problems and…

[INFO.INFO-OH] Computer Science [cs]/Other [cs.OH]; Mst; Image segmentation; Algorithme mémétique; Optical flow; Segmentation d’image; [INFO.INFO-OH]Computer Science [cs]/Other [cs.OH]; Gpu; K-Means; Memetic algorithm; Flot optique

researchProduct

On GPU-accelerated fast direct solvers and their applications in image denoising

2015

block cyclic reduction; näytönohjaimet; OpenCL; numeeriset menetelmät; prosessorit; image denoising; parallel computing; mean curvature; GPU computing; kuvankäsittely; image processing; fast Poisson solver; separable block tridiagonal linear system; PSCR method; optimointi; algoritmit; ohjelmointi; augmented Lagrangian method; kohina; fast direct solver; rinnakkaislaskenta; alternating direction methods of multipliers

researchProduct

Real-time data processing in the ALICE High Level Trigger at the LHC

2019

At the Large Hadron Collider at CERN in Geneva, Switzerland, atomic nuclei are collided at ultra-relativistic energies. Many final-state particles are produced in each collision and their properties are measured by the ALICE detector. The detector signals induced by the produced particles are digitized, leading to data rates that are in excess of 48 GB/s. The ALICE High Level Trigger (HLT) system pioneered the use of FPGA- and GPU-based algorithms to reconstruct charged-particle trajectories and reduce the data size in real time. The results of the reconstruction of the collision events, available online, are used for high level data quality and detector-performance monitoring and real-tim…

calibration; ALICE; trigger; monitoring; quality; data management; programming; FPGA; multiprocessor: graphics; performance; Physics - Instrumentation and Detectors; High level trigger; Physics::Instrumentation and Detectors; Level data; tutkimuslaitteet; FPGA; GPU; Detector calibration; GPU; FOS: Physical sciences; General Physics and Astronomy; hiukkasfysiikka; Physics and Astronomy(all); 01 natural sciences; programming; 010305 fluids & plasmas; Combinatorics; ALICE; 0103 physical sciences; multiprocessor: graphics; [INFO]Computer Science [cs]; [PHYS.PHYS.PHYS-INS-DET]Physics [physics]/Physics [physics]/Instrumentation and Detectors [physics.ins-det]; Detectors and Experimental Techniques; 010306 general physics; Nuclear Experiment; physics.ins-det; FPGA; computer.programming_language; Physics; Large Hadron Collider; FPGA; GPU; TRACK; signaalinkäsittely; Instrumentation and Detectors (physics.ins-det); trigger; calibration; monitoring; data; ilmaisimet; quality; Hardware and Architecture; TRACK; High Energy Physics::Experiment; data management; Alice (programming language); computer; performance

researchProduct

GPU accelerated Monte Carlo simulations of lattice spin models

2011

We consider Monte Carlo simulations of classical spin models of statistical mechanics using the massively parallel architecture provided by graphics processing units (GPUs). We discuss simulations of models with discrete and continuous variables, using an array of algorithms ranging from single-spin-flip Metropolis updates through cluster algorithms to multicanonical and Wang-Landau techniques, to judge the scope and limitations of GPU accelerated computation in this field. For most simulations discussed, we find significant speed-ups by two to three orders of magnitude as compared to single-threaded CPU implementations.
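To make the single-spin-flip Metropolis update mentioned in this abstract concrete, here is a minimal Python sketch of one checkerboard sweep for a 2D Ising model. The checkerboard ordering is a common way to expose parallelism and is an assumption here, not necessarily the scheme used in the paper; the function name `checkerboard_sweep`, the coupling J = 1, and the periodic boundaries are likewise illustrative choices.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng=random):
    """One Metropolis sweep of a 2D Ising model (J = 1, periodic boundaries).

    spins is an L x L list of lists holding +1 or -1, with L even.
    Sites of one checkerboard colour have no nearest neighbours of the
    same colour, so all sites of a colour could be updated concurrently,
    for example with one GPU thread per site. This sketch visits them
    sequentially.
    """
    L = len(spins)
    for colour in (0, 1):
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != colour:
                    continue
                # Sum of the four nearest neighbours.
                nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
                # Energy change if this spin were flipped.
                dE = 2.0 * spins[i][j] * nn
                # Metropolis rule: always accept downhill moves, accept
                # uphill moves with probability exp(-beta * dE).
                if dE <= 0 or rng.random() < math.exp(-beta * dE):
                    spins[i][j] = -spins[i][j]
    return spins
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible; a GPU version would instead need an independent random number stream per thread.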

cluster algorithms; Statistical Mechanics (cond-mat.stat-mech); Computer science; Computation; Numerical analysis; spin models; Monte Carlo method; High Energy Physics - Lattice (hep-lat); FOS: Physical sciences; Statistical mechanics; GPU computing; Physics and Astronomy(all); Computational Physics (physics.comp-ph); generalized-ensemble simulations; Monte Carlo simulations; Computational science; CUDA; High Energy Physics - Lattice; Spin model; General-purpose computing on graphics processing units; Graphics; Physics - Computational Physics; Condensed Matter - Statistical Mechanics

researchProduct

CUDA-BLASTP: Accelerating BLASTP on CUDA-enabled graphics hardware

2011

Scanning protein sequence databases is an often repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. In order to exploit the GPU's capabilities for accelerating BLASTP, we have used a compressed deterministic finite state automaton for hit detection as wel…

graphics hardware; Source code; Computer science; media_common.quotation_subject; Graphics hardware; Graphics processing unit; Parallel computing; General Purpose Computation on Graphics Processing Unit (GPGPU); Computational science; Instruction set; CUDA; Genetics; Computer Graphics; Databases Protein; media_common; dynamic programming; Finite-state machine; Sequence database; Applied Mathematics; Proteins; Compute Unified Device Architecture (CUDA); sequence alignment; General-purpose computing on graphics processing units; Algorithms; Software; Biotechnology

researchProduct

An Efficient Implementation of Parallel Parametric HRTF Models for Binaural Sound Synthesis in Mobile Multimedia

2020

The extended use of mobile multimedia devices in applications like gaming, 3D video and audio reproduction, immersive teleconferencing, or virtual and augmented reality, is demanding efficient algorithms and methodologies. All these applications require real-time spatial audio engines with the capability of dealing with intensive signal processing operations while facing a number of constraints related to computational cost, latency and energy consumption. Most mobile multimedia devices include a Graphics Processing Unit (GPU) that is primarily used to accelerate video processing tasks, providing high computational capabilities due to its inherent parallel architecture. This paper describes…

interpolation.; General Computer Science; parallel filters; Computer science; GPU; Gpu; Graphics processing unit; Latency (audio); Parametric model; 02 engineering and technology; computer.software_genre; 030507 speech-language pathology & audiology; 03 medical and health sciences; Software portability; HRTF modeling; 0202 electrical engineering electronic engineering information engineering; General Materials Science; Multimedia; parametric model; General Engineering; Teleconference; Binaural synthesis; 020206 networking & telecommunications; Video processing; Energy consumption; interpolation; Interpolation; Hrtf modeling; Scalability; Parallel filters; Electrónica; Augmented reality; lcsh:Electrical engineering. Electronics. Nuclear engineering; 0305 other medical science; lcsh:TK1-9971; Mobile device; computer; IEEE Access

researchProduct

Collision Avoidance with Potential Fields Based on Parallel Processing of 3D-Point Cloud Data on the GPU

2014

In this paper we present an experimental study on real-time collision avoidance with potential fields that are based on 3D point cloud data and processed on the Graphics Processing Unit (GPU). The virtual forces from the potential fields serve two purposes. First, they are used for changing the reference trajectory. Second, they are projected to and applied at the torque control level to generate corresponding nullspace behavior together with a Cartesian impedance main control loop. The GPU algorithm creates a map representation that is quickly accessible. In addition, outliers and the robot structure are efficiently removed from the data, and the resolution of the representation can be easily ad…

parallel processing; Computer science; Graphics processing unit; Point cloud; potential fields; law.invention; reactive motion generation; Institut für Robotik und Mechatronik (ab 2013); Computer Science::Robotics; Parallel processing (DSP implementation); law; Control system; Trajectory; Robot; Cartesian coordinate system; GPU 3D-Point Cloud Computation; collision avoidance; Collision avoidance; Simulation

researchProduct

Perfect Hashing Structures for Parallel Similarity Searches

2015

Seed-based heuristics have proved to be efficient for studying similarity between genetic databases with billions of base pairs. This paper focuses on algorithms and data structures for the filtering phase in seed-based heuristics, with an emphasis on efficient parallel GPU/manycore implementation. We propose a 2-stage index structure which is based on neighborhood indexing and perfect hashing techniques. This structure performs a filtering phase over the neighborhood regions around the seeds in constant time and avoids as much as possible random memory accesses and branch divergences. Moreover, it fits particularly well on parallel SIMD processors, because it requ…

researchProduct