Search results for "cud"

Showing 10 of 74 documents

Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model

2010

A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Computational Physics 228 (2009) 4468–4477] in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors of up to 35 compared to an optimized single Central Processing Unit (CPU) core implementation that also employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message P…
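Multi-spin coding, as used above, packs one spin per bit so that a single word-wide operation updates many lattice sites at once. A minimal Python sketch of the packing idea (the function names and the 64-bit word size are illustrative, not the paper's code):

```python
# Multi-spin coding sketch: 1 bit per Ising spin, bit 0 <-> spin -1, bit 1 <-> spin +1.
# 64 sites share one word, so a Metropolis flip becomes a single bitwise XOR.

WORD = 64
MASK = (1 << WORD) - 1

def pack(spins):
    """Pack a list of 64 spins (+1/-1) into one integer, bit i = site i."""
    w = 0
    for i, s in enumerate(spins):
        if s == 1:
            w |= 1 << i
    return w

def unpack(w):
    """Inverse of pack: recover the +1/-1 spin list from the bit word."""
    return [1 if (w >> i) & 1 else -1 for i in range(WORD)]

def flip(w, flip_mask):
    """Flip every spin whose bit is set in flip_mask -- one XOR for 64 sites."""
    return (w ^ flip_mask) & MASK

spins = [1, -1] * 32
w = pack(spins)
w = flip(w, 0b101)        # flip sites 0 and 2 in a single word operation
print(unpack(w)[:4])      # -> [-1, -1, -1, -1]
```

The acceptance test of the Metropolis step still has to be evaluated per site; the payoff is that the spin storage shrinks 64-fold and the flip itself is word-parallel.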

FOS: Computer and information sciences · Computer science · Monte Carlo method · Graphics processing unit · FOS: Physical sciences · General Physics and Astronomy · Mathematical Physics (math-ph) · Parallel computing · GPU cluster · Computational Physics (physics.comp-ph) · Graphics (cs.GR) · Computational science · CUDA · Computer Science - Graphics · Hardware and Architecture · Ising model · Central processing unit · General-purpose computing on graphics processing units · Massively parallel · Physics - Computational Physics · Mathematical Physics
researchProduct

Real-time computation of parameter fitting and image reconstruction using graphical processing units

2016

Abstract In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μ SR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission T…
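The parameter-fitting workload described here maps well to GPUs because every candidate parameter set is scored independently. A hedged Python sketch of that structure, using a toy exponential-decay model and a grid search (the model and all names are assumptions for illustration, not the PSI application's actual code):

```python
# Each chi-square evaluation below is independent of the others -- on a GPU,
# one thread would score one (A, lam) candidate. Toy model: y = A * exp(-lam * t).
import math

def chi2(params, ts, ys):
    A, lam = params
    return sum((y - A * math.exp(-lam * t)) ** 2 for t, y in zip(ts, ys))

def grid_fit(ts, ys, As, lams):
    # "Embarrassingly parallel" double loop: no candidate depends on another.
    candidates = [(A, lam) for A in As for lam in lams]
    return min(candidates, key=lambda p: chi2(p, ts, ys))

ts = [0.1 * i for i in range(50)]
ys = [2.0 * math.exp(-0.5 * t) for t in ts]          # noiseless synthetic data
best = grid_fit(ts, ys, [1.0, 1.5, 2.0], [0.25, 0.5, 1.0])
print(best)                                          # -> (2.0, 0.5)
```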

FOS: Computer and information sciences · Multi-core processor · Speedup · 010308 nuclear & particles physics · Computer science · Computation · FOS: Physical sciences · General Physics and Astronomy · Iterative reconstruction · Computational Physics (physics.comp-ph) · Supercomputer · 01 natural sciences · 030218 nuclear medicine & medical imaging · Computational science · 03 medical and health sciences · Range (mathematics) · CUDA · 0302 clinical medicine · Computer Science - Distributed Parallel and Cluster Computing · Hardware and Architecture · 0103 physical sciences · Single-core · Distributed Parallel and Cluster Computing (cs.DC) · Physics - Computational Physics · Computer Physics Communications

WarpCore: A Library for fast Hash Tables on GPUs

2020

Hash tables are ubiquitous. Properties such as an amortized constant time complexity for insertion and querying as well as a compact memory layout make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns, to design WarpCore -- a versatile library of hash table data structures. Unique device-sided operations allow for building high performance data processing pipelines ent…
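A flat open-addressing layout is what makes hash tables GPU-friendly: probes walk a contiguous array, so memory accesses stay regular. A minimal CPU sketch of linear probing (sizes and names are illustrative, not WarpCore's API):

```python
# Open-addressing hash table with linear probing over flat key/value arrays --
# the storage layout GPU hash tables favor for coalesced memory access.
EMPTY = None

class FlatMap:
    def __init__(self, capacity=16):
        self.keys = [EMPTY] * capacity
        self.vals = [EMPTY] * capacity
        self.capacity = capacity

    def _slot(self, key):
        return hash(key) % self.capacity

    def insert(self, key, val):
        i = self._slot(key)
        for _ in range(self.capacity):            # bounded probe sequence
            if self.keys[i] is EMPTY or self.keys[i] == key:
                self.keys[i], self.vals[i] = key, val
                return True
            i = (i + 1) % self.capacity
        return False                               # table full

    def get(self, key):
        i = self._slot(key)
        for _ in range(self.capacity):
            if self.keys[i] is EMPTY:              # hit an empty slot: absent
                return None
            if self.keys[i] == key:
                return self.vals[i]
            i = (i + 1) % self.capacity
        return None

m = FlatMap()
m.insert("gpu", 1)
m.insert("hash", 2)
print(m.get("hash"))    # -> 2
```

On a GPU, a whole warp typically cooperates on one probe sequence (hence "WarpCore"); the sequential loop above is the single-threaded analogue.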

FOS: Computer and information sciences · Scheme (programming language) · Amortized analysis · Computer science · Hash function · Parallel computing · Data structure · Hash table · CUDA · Computer Science - Distributed Parallel and Cluster Computing · Server · Distributed Parallel and Cluster Computing (cs.DC) · Throughput (business) · 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)

GPU-laskennan optimointi [Optimization of GPU computation]

2013

Graphics cards, i.e. graphics processors, provide a parallel computing platform on which program code can be executed by hundreds of cores. This platform makes it possible to solve mathematically demanding problems efficiently. The parallel execution environment of a graphics processor, however, differs greatly from the sequential execution environment of a computer's CPU. To solve problems efficiently in a parallel environment, one must follow programming practices that are specifically suited to it. This thesis examines the fundamentals of parallel computing, how different programming techniques affect a program's performance on a graphics processor, and how one can …

Graphics processing unit · graphics cards · optimization · parallel computing · GPU · graphics processor · CUDA · programming

Pilgrimages of the Polish Gentry to Holy Places in the 17th and the 18th Centuries

2015

Pielgrzymki szlachty polskiej do miejsc świętych w XVII i XVIII wieku (summary): In the 17th and 18th centuries the Polish and Lithuanian nobility readily set out on journeys. Among religious journeys abroad, the most popular were expeditions to the Holy Land, to the tomb of St. James the Greater in Santiago de Compostela, and to Rome. Along the way, travellers visited sanctuaries holding relics of the saints and famous for miracles attributed to the Mother of God or the saints, in what is today the Czech lands, Austria, Bavaria, and northern Italy. Pilgrimage centres within the Polish-Lithuanian Commonwealth were also popular destinations; their number grew in the 18th century to around 150. Of significance …

History · pilgrimages (pielgrzymki) · Polish and Lithuanian nobility in the 17th and 18th centuries (Polska i litewska szlachta w XVII i XVIII w.) · lcsh:DJK1-77 · Art history · lcsh:DAW1001-1051 · Art · lcsh:History of Eastern Europe · miraculous images of Mary and the saints (cudowne wizerunki maryjne i świętych) · Gentry · Humanities · Catholic sanctuaries (sanktuaria katolickie) · Marian devotion (kult maryjny) · lcsh:History of Central Europe · Biuletyn Polskiej Misji Historycznej

LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs

2015

Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribut…
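For reference, CSR SpMV itself is a short row-wise loop; LightSpMV's contribution lies in how the rows are distributed dynamically over CUDA threads and warps. A plain Python sketch of the CSR contract (a CPU reference, not the paper's kernel):

```python
# CSR sparse matrix-vector multiply, y = A @ x.
# row_ptr[r]..row_ptr[r+1] delimits row r's nonzeros in vals/col_idx.
def spmv_csr(row_ptr, col_idx, vals, x):
    n = len(row_ptr) - 1
    y = [0.0] * n
    for row in range(n):            # rows are independent -> one thread/warp per row on a GPU
        s = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            s += vals[k] * x[col_idx[k]]
        y[row] = s
    return y

# 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
vals    = [1.0, 2.0, 3.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))   # -> [3.0, 3.0]
```

Because row lengths vary, a static one-row-per-thread mapping load-imbalances badly; dynamic distribution of row ranges is what the abstract's "fine-grained dynamic distribution" refers to.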

Instruction set · CUDA · Speedup · Computer science · Sparse matrix-vector multiplication · Double-precision floating-point format · Parallel computing · General-purpose computing on graphics processing units · Row · Sparse matrix · 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Real Time Stereo Matching Using Two Step Zero-Mean SAD and Dynamic Programming

2018

Dense depth map extraction is a dynamic research field in computer vision that tries to recover three-dimensional information from a stereo image pair. A large variety of algorithms has been developed. Local methods based on block matching are prevalent due to their linear computational complexity and easy implementation. The local cost is also used in global methods, such as graph cuts and dynamic programming, in order to reduce sensitivity to occlusion and uniform texture. This paper proposes a new method for matching images based on two-stage block matching as the local cost function and dynamic programming as the energy optimization approach. In our work we introduce the two stage of th…
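The zero-mean SAD cost named in the title subtracts each block's mean intensity before summing absolute differences, which makes the match robust to brightness offsets between the two cameras. A small sketch, using 1-D blocks for brevity (names are illustrative):

```python
# Zero-mean sum of absolute differences between two intensity blocks.
def zsad(block_a, block_b):
    ma = sum(block_a) / len(block_a)     # mean of each block ...
    mb = sum(block_b) / len(block_b)
    # ... is removed before comparing, cancelling additive brightness changes
    return sum(abs((a - ma) - (b - mb)) for a, b in zip(block_a, block_b))

a = [10, 20, 30, 40]
b = [60, 70, 80, 90]          # same pattern, +50 brightness offset
print(zsad(a, b))             # -> 0.0  (plain SAD would report 200)
```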

Matching (statistics) · Computational complexity theory · 010308 nuclear & particles physics · Computer science · Graphics hardware · 02 engineering and technology · 01 natural sciences · Dynamic programming · CUDA · Sum of absolute differences · Depth map · Computer Science::Computer Vision and Pattern Recognition · Cut · 0103 physical sciences · 0202 electrical engineering electronic engineering information engineering · 020201 artificial intelligence & image processing · Algorithm · 2018 15th International Multi-Conference on Systems, Signals & Devices (SSD)

CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions

2013

Background The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. Results We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU …
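The Smith-Waterman recurrence that CUDASW++ accelerates can be stated compactly. Below is a scalar Python reference with a linear gap penalty; the scoring values are illustrative placeholders, whereas the tool itself uses protein substitution matrices such as BLOSUM62:

```python
# Smith-Waterman local alignment score (linear gap penalty).
# H[i][j] = best score of any local alignment ending at s[i-1], t[j-1].
def smith_waterman(s, t, match=2, mismatch=-1, gap=-1):
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if s[i-1] == t[j-1] else mismatch)
            # max with 0 is what makes the alignment *local*
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGT", "TACGTA"))   # -> 8 (exact 4-character match, 4 * 2)
```

The quadratic cost the abstract mentions is visible as the double loop; GPU versions parallelize along anti-diagonals of H, where all cells are independent.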

Methodology Article · GPU · CUDA · Software: Programming Techniques · Biochemistry · Computer Science Applications · Smith-Waterman · Concurrent execution · Sequence Analysis, Protein · PTX SIMD instructions · Databases, Protein · Molecular Biology · Sequence Alignment · Algorithms · Software · BMC Bioinformatics

Suffix Array Construction on Multi-GPU Systems

2019

Suffix arrays are prevalent data structures being fundamental to a wide range of applications including bioinformatics, data compression, and information retrieval. Therefore, various algorithms for (parallel) suffix array construction both on CPUs and GPUs have been proposed over the years. Although providing significant speedup over their CPU-based counterparts, existing GPU implementations share a common disadvantage: input text sizes are limited by the scarce memory of a single GPU. In this paper, we overcome aforementioned memory limitations by exploiting multi-GPU nodes featuring fast NVLink interconnects. In order to achieve high performance for this communication-intensive task, we …
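A suffix array is simply the lexicographically sorted list of suffix start positions. GPU constructions such as the one above use prefix doubling rather than explicit sorting of strings, but the output contract is the same and can be shown with a naive sort:

```python
# Naive O(n^2 log n) suffix-array construction: sort start positions by suffix.
# Production (and GPU) algorithms achieve this in near-linear work via
# prefix doubling, but produce the identical array.
def suffix_array(text):
    return sorted(range(len(text)), key=lambda i: text[i:])

print(suffix_array("banana"))   # -> [5, 3, 1, 0, 4, 2]
#   suffixes in order: "a", "ana", "anana", "banana", "na", "nana"
```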

Multi-core processor · Speedup · Computer science · Suffix array · 0102 computer and information sciences · 02 engineering and technology · Parallel computing · Data structure · 01 natural sciences · CUDA · Shared memory · 010201 computation theory & mathematics · 0202 electrical engineering electronic engineering information engineering · 020201 artificial intelligence & image processing · Suffix · Data compression · Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing

GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model

2009

The Compute Unified Device Architecture (CUDA) is a programming approach for performing scientific calculations on a graphics processing unit (GPU) as a data-parallel computing device. The programming interface allows algorithms to be implemented using extensions to the standard C language. With a continuously increasing number of cores in combination with a high memory bandwidth, a recent GPU offers incredible resources for general purpose computing. First, we apply this new technology to Monte Carlo simulations of the two-dimensional ferromagnetic square-lattice Ising model. By implementing a variant of the checkerboard algorithm, results are obtained up to 60 times faster on the GPU than on a curren…
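The checkerboard decomposition is what exposes the parallelism: on a bipartite square lattice, every neighbour of an "even" site is "odd", so all sites of one colour can be updated simultaneously, then the other colour. A CPU Python sketch of one checkerboard Metropolis sweep (lattice size, seed, and temperature are illustrative):

```python
# One checkerboard Metropolis sweep of the 2D Ising model (J = 1, periodic
# boundaries, even L so the wrap-around preserves the two-colour split).
import math
import random

def sweep(spins, L, beta, rng):
    for parity in (0, 1):                 # even sublattice first, then odd --
        for y in range(L):                # within a parity, all updates are
            for x in range(L):            # independent (GPU: one thread each)
                if (x + y) % 2 != parity:
                    continue
                s = spins[y][x]
                nb = (spins[y][(x + 1) % L] + spins[y][(x - 1) % L] +
                      spins[(y + 1) % L][x] + spins[(y - 1) % L][x])
                dE = 2.0 * s * nb         # energy cost of flipping spin s
                if dE <= 0 or rng.random() < math.exp(-beta * dE):
                    spins[y][x] = -s

L = 8
rng = random.Random(42)
spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
for _ in range(100):
    sweep(spins, L, beta=1.0, rng=rng)    # beta well above critical ~0.44
m = abs(sum(sum(row) for row in spins)) / (L * L)
# magnetization per site; typically near 1 at this low temperature, unless
# the quench happened to freeze into opposite-stripe domains
print(round(m, 2))
```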

Numerical Analysis · Physics and Astronomy (miscellaneous) · Computer science · Applied Mathematics · Monte Carlo method · Graphics processing unit · Square-lattice Ising model · Computer Science Applications · Computational science · Computational Mathematics · CUDA · Modeling and Simulation · Ising model · Statistical physics · General-purpose computing on graphics processing units · Lattice model (physics) · Journal of Computational Physics