Search results for "CUDA"

showing 10 items of 56 documents

The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs

2012

Abstract Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an effcient CUDA implementation to perform SpMV on the GPU. While previous work shows experiments on small to medium-sized sparse matrices, we perform evaluations on large sparse matrices. We compared SCOO performance to existing formats of the NVIDIA Cusp library. Our resutls on a Fermi GPU show that SCOO outperforms the COO and CSR format for all tested matrices and the HYB format for all tested unstructured matrices. Furthermore, comparison to a Sandy-Bridge CPU sho…

Computer scienceSparse matrix-vector multiplicationCUDAParallel computingMatrix (mathematics)CUDAFactor (programming language)SpMVGeneral Earth and Planetary SciencesMultiplicationcomputerFermiGeneral Environmental Sciencecomputer.programming_languageSparse matrixProcedia Computer Science

researchProduct

GPU-Based Occlusion Minimisation for Optimal Placement of Multiple 3D Cameras

2020

This paper presents a fast GPU-based solution to the 3D occlusion detection problem and the 3D camera placement optimisation problem. Occlusion detection is incorporated into the optimisation problem to return near-optimal positions for 3D cameras in environments containing occluding objects, which maximises the volume that is visible to the cameras. In addition, the authors’ previous work on 3D sensor placement optimisation is extended to include a model for a pyramid-shaped viewing frustum and to take the camera’s pose into account when computing the optimal position.

Computer sciencebusiness.industry010401 analytical chemistryComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION020207 software engineering02 engineering and technology01 natural sciencesMinimisation (clinical trials)0104 chemical sciencesCUDAViewing frustumOcclusion0202 electrical engineering electronic engineering information engineeringComputer visionArtificial intelligencebusinessComputingMethodologies_COMPUTERGRAPHICS2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA)

researchProduct

Three-dimensional Fuzzy Kernel Regression framework for registration of medical volume data

2013

Abstract In this work a general framework for non-rigid 3D medical image registration is presented. It relies on two pattern recognition techniques: kernel regression and fuzzy c-means clustering. The paper provides theoretic explanation, details the framework, and illustrates its application to implement three registration algorithms for CT/MR volumes as well as single 2D slices. The first two algorithms are landmark-based approaches, while the third one is an area-based technique. The last approach is based on iterative hierarchical volume subdivision, and maximization of mutual information. Moreover, a high performance Nvidia CUDA based implementation of the algorithm is presented. The f…

Computer sciencebusiness.industryImage registrationMutual informationMachine learningcomputer.software_genreFuzzy logicCUDANon-rigid registration Fuzzy regression Mutual information Interpolation GPU computingArtificial IntelligenceSignal ProcessingPattern recognition (psychology)Kernel regressionComputer Vision and Pattern RecognitionArtificial intelligenceData miningGeneral-purpose computing on graphics processing unitsCluster analysisbusinesscomputerSoftwareInterpolationPattern Recognition

researchProduct

Connected-component identification and cluster update on graphics processing units.

2011

Cluster identification tasks occur in a multitude of contexts in physics and engineering such as, for instance, cluster algorithms for simulating spin models, percolation simulations, segmentation problems in image processing, or network analysis. While it has been shown that graphics processing units (GPUs) can result in speedups of two to three orders of magnitude as compared to serial codes on CPUs for the case of local and thus naturally parallelized problems such as single-spin flip update simulations of spin models, the situation is considerably more complicated for the nonlocal problem of cluster or connected component identification. I discuss the suitability of different approaches…

Connected componentCUDAIdentification (information)Cluster labelingCluster (physics)Image processingGraphicsComputational scienceNetwork analysisPhysical review. E, Statistical, nonlinear, and soft matter physics

researchProduct

CUDA-Accelerated Alignment of Subsequences in Streamed Time Series Data

2014

Euclidean Distance (ED) and Dynamic Time Warping (DTW) are cornerstones in the field of time series data mining. Many high-level algorithms like kNN-classification, clustering or anomaly detection make excessive use of these distance measures as subroutines. Furthermore, the vast growth of recorded data produced by automated monitoring systems or integrated sensors establishes the need for efficient implementations. In this paper, we introduce linear memory parallelization schemes for the alignment of a given query Q in a stream of time series data S for both ED and DTW using CUDA-enabled accelerators. The ED parallelization features a log-linear calculation scheme in contrast to the naive …

Euclidean distanceCUDADynamic time warpingData stream miningComputer scienceAnomaly detectionParallel computingCluster analysisTime complexityDistance measures2014 43rd International Conference on Parallel Processing

researchProduct

Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model

2010

A Modern Graphics Processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Chemical Physics 228 (2009) 4468–4477] in order to overcome the memory limitations of a single GPU which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors up to 35 compared to an optimized single Central Processor Unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message P…

FOS: Computer and information sciencesComputer scienceMonte Carlo methodGraphics processing unitFOS: Physical sciencesGeneral Physics and AstronomyMathematical Physics (math-ph)Parallel computingGPU clusterComputational Physics (physics.comp-ph)Graphics (cs.GR)Computational scienceCUDAComputer Science - GraphicsHardware and ArchitectureIsing modelCentral processing unitGeneral-purpose computing on graphics processing unitsMassively parallelPhysics - Computational PhysicsMathematical Physics

researchProduct

Real-time computation of parameter fitting and image reconstruction using graphical processing units

2016

Abstract In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μ SR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission T…

FOS: Computer and information sciencesMulti-core processorSpeedup010308 nuclear & particles physicsComputer scienceComputationFOS: Physical sciencesGeneral Physics and AstronomyIterative reconstructionComputational Physics (physics.comp-ph)Supercomputer01 natural sciences030218 nuclear medicine & medical imagingComputational science03 medical and health sciencesRange (mathematics)CUDA0302 clinical medicineComputer Science - Distributed Parallel and Cluster ComputingHardware and Architecture0103 physical sciencesSingle-coreDistributed Parallel and Cluster Computing (cs.DC)Physics - Computational PhysicsComputer Physics Communications

researchProduct

WarpCore: A Library for fast Hash Tables on GPUs

2020

Hash tables are ubiquitous. Properties such as an amortized constant time complexity for insertion and querying as well as a compact memory layout make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns, to design WarpCore -- a versatile library of hash table data structures. Unique device-sided operations allow for building high performance data processing pipelines ent…

FOS: Computer and information sciencesScheme (programming language)Amortized analysisComputer scienceHash functionParallel computingData structureHash tableCUDAComputer Science - Distributed Parallel and Cluster ComputingServerDistributed Parallel and Cluster Computing (cs.DC)Throughput (business)computercomputer.programming_language2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)

researchProduct

GPU-laskennan optimointi

2013

Näytönohjaimet, grafiikkasuorittimet, tarjoavat rinnakkaisen laskennan alustan, jossa voidaan suorittaa ohjelmakoodia satojen ydinten toimesta. Tämä alusta mahdollistaa matemaattisesti työläiden ongelmien ratkaisemisen tehokkaasti. Grafiikkasuorittimen rinnakkainen suoritusympäristö kuitenkin eroaa suuresti tietokoneen suorittimen peräkkäisestä suoritusympäristöstä. Ongelmien ratkaisemiseksi tehokkaasti rinnakkaisympäristössä on noudettava ohjelmointimenetelmiä, jotka soveltuvat erityisesti rinnakkaisympäristöön. Tässä työssä tarkastellaan rinnakkaisen laskennan perusteita, miten erilaiset ohjelmointimenetelmät vaikuttavat ohjelman suoriutumiseen grafiikkasuorittimella sekä miten voidaan sa…

Graphics processing unitnäytönohjaimetoptimointinäytönohjainparallel computingGPUrinnakkainen laskentaGrafiikkasuoritinCUDAohjelmointioptimization

researchProduct

LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs

2015

Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribut…

Instruction setCUDASpeedupComputer scienceSparse matrix-vector multiplicationDouble-precision floating-point formatParallel computingGeneral-purpose computing on graphics processing unitsRowSparse matrix2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

researchProduct