Search results for "cud"

SAUCE: A web application for interactive teaching and learning of parallel programming

2017

Prevalent hardware trends towards parallel architectures and algorithms create a growing demand for graduate students familiar with programming concurrent software. However, learning parallel programming is challenging because of complex communication and memory access patterns, and because common pitfalls such as deadlocks and race conditions must be avoided. Hence, the learning process has to be supported by adequate software solutions in order to enable future computer scientists and engineers to write robust and efficient code. This paper discusses a selection of well-known parallel algorithms based on C++11 threads, OpenMP, MPI, and CUDA that can be interactively embedded i…

Keywords: Computer Networks and Communications; Computer science; Programming language; White-box testing; Parallel algorithm; Process (computing); 020206 networking & telecommunications; 02 engineering and technology; Parallel computing; Thread (computing); Theoretical Computer Science; CUDA; Software; Artificial Intelligence; Hardware and Architecture; 0202 electrical engineering electronic engineering information engineering; Code (cryptography); Web application; 020201 artificial intelligence & image processing
Published in: Journal of Parallel and Distributed Computing
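
The SAUCE abstract names race conditions as a typical pitfall. A minimal sketch of the standard remedy (each worker writes only a private partial result, one thread combines them) is shown below in Python with the standard-library `concurrent.futures`, not in the C++11 threads, OpenMP, MPI, or CUDA the paper uses; `parallel_sum` is a hypothetical helper and not part of SAUCE.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4):
    """Sum `values` with several threads without a data race.

    Each worker sums its own contiguous chunk into a private result;
    only the main thread combines the partial sums, so no shared
    accumulator is ever written concurrently.
    """
    if not values:
        return 0
    chunk = (len(values) + workers - 1) // workers  # ceiling division
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


print(parallel_sum(list(range(1, 101))))  # 5050
```

In CPython the global interpreter lock means this illustrates the correctness pattern rather than a speedup; the same decomposition carries over to OpenMP reductions or per-block partial sums in CUDA.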

High Precision Conservative Surface Mesh Generation for Swept Volumes

2015

We present a novel, efficient, and flexible scheme to generate a high-quality mesh that approximates the outer boundary of a swept volume. Our approach comes with two guarantees. First, the approximation is conservative, i.e., the swept volume is enclosed by the generated mesh. Second, the one-sided Hausdorff distance of the generated mesh to the swept volume is upper bounded by a user-defined tolerance. Exploiting this tolerance, the algorithm generates a mesh that is adapted to the local complexity of the swept volume boundary, keeping the overall output complexity remarkably low. The algorithm has two phases: the actual sweep and the mesh generation. In the sweeping phase, we introduce a g…

Keywords: Computer science; Boundary (topology); Parallel computing; Upper and lower bounds; Computational science; CUDA; Hausdorff distance; Engine displacement; Control and Systems Engineering; Mesh generation; Bounded function; Electrical and Electronic Engineering; Ruppert's algorithm; ComputingMethodologies_COMPUTERGRAPHICS
Published in: IEEE Transactions on Automation Science and Engineering
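
The second guarantee bounds a one-sided (directed) Hausdorff distance, which is asymmetric. A brute-force sketch on sampled point sets makes the definition concrete; it is an assumption-laden simplification (point samples instead of the exact mesh and swept-volume surfaces the paper bounds), and the function names are hypothetical.

```python
import math


def one_sided_hausdorff(a, b):
    """Return the max over p in `a` of the distance from p to its nearest point in `b`.

    Brute force, O(|a| * |b|); `a` and `b` are lists of 3D points (x, y, z).
    """
    return max(min(math.dist(p, q) for q in b) for p in a)


def within_tolerance(mesh_samples, volume_samples, tol):
    """Check the one-sided bound d(mesh -> volume) <= tol on sampled points."""
    return one_sided_hausdorff(mesh_samples, volume_samples) <= tol


mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.1, 0.0)]
volume = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 0.0, 0.0)]
print(one_sided_hausdorff(mesh, volume))  # about 0.1
print(one_sided_hausdorff(volume, mesh))  # 4.0: the reverse direction differs
```

Only the mesh-to-volume direction is bounded by the tolerance, which is why the far-away sample in `volume` does not violate it.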

Massively parallel computation of atmospheric neutrino oscillations on CUDA-enabled accelerators

2019

The computation of neutrino flavor transition amplitudes through inhomogeneous matter is a time-consuming step and thus could benefit from optimization and parallelization. Besides reliable parameter estimation of intrinsic physical quantities such as neutrino masses and mixing angles, these transition amplitudes are important in hypothesis testing of potential extensions of the Standard Model of elementary particle physics, such as additional neutrino flavors. Hence, fast yet precise implementations are of high importance to research. In the recent past, massively parallel accelerators such as CUDA-enabled GPUs featuring thousands of compute units have been widely adopted due to t…

Keywords: Computer science; Computation; General Physics and Astronomy; Memory bandwidth; 01 natural sciences; 010305 fluids & plasmas; Standard Model; Computational science; CUDA; Hardware and Architecture; 0103 physical sciences; Neutrino; 010306 general physics; Neutrino oscillation; Massively parallel; Physical quantity
Published in: Computer Physics Communications
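
The paper's setting (realistic flavor counts, Earth density profiles, GPU batching) is beyond a short example. As a hedged illustration of why propagation through inhomogeneous matter reduces to a chain of per-layer matrix products, here is a two-flavor toy model assuming piecewise-constant matter density per layer; the function names and the abstract "1/length" units are hypothetical, and this is not the paper's algorithm.

```python
import math


def layer_matrix(delta, theta, v, length):
    """Evolution operator exp(-i H L) for one constant-density layer.

    Two-flavor toy model in the (nu_e, nu_mu) basis with the trace of H
    removed (a global phase that does not affect probabilities):
        H' = [[a, b], [b, -a]],  a = -delta*cos(2θ) + v/2,  b = delta*sin(2θ)
    where delta = Δm²/(4E) and v is the matter potential, both in 1/length units.
    """
    a = -delta * math.cos(2 * theta) + v / 2
    b = delta * math.sin(2 * theta)
    omega = math.hypot(a, b)
    c = math.cos(omega * length)
    # sin(ωL)/ω, with the ω -> 0 limit handled explicitly
    s = math.sin(omega * length) / omega if omega > 0 else length
    # exp(-i L H') = cos(ωL) I - i (sin(ωL)/ω) H'
    return [[complex(c, -s * a), complex(0, -s * b)],
            [complex(0, -s * b), complex(c, s * a)]]


def matmul(x, y):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transition_probability(delta, theta, layers):
    """P(nu_e -> nu_mu) after traversing `layers`, a list of (v, length) pairs."""
    u = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for v, length in layers:
        # later layers act after earlier ones, so multiply from the left
        u = matmul(layer_matrix(delta, theta, v, length), u)
    return abs(u[1][0]) ** 2
```

With `v = 0` a single layer reduces to the familiar vacuum result sin²(2θ)·sin²(delta·L). Because each layer is independent until the final product, many energies or trajectories can be evaluated in parallel, which is the kind of workload that maps well onto GPUs.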

GROMEX: A Scalable and Versatile Fast Multipole Method for Biomolecular Simulation

2020

Atomistic simulations of large biomolecular systems with chemical variability, such as constant-pH dynamic protonation, pose multiple challenges in high-performance computing. One of them is the correct treatment of the involved electrostatics in an efficient and highly scalable way. Here we review and assess two of the main building blocks that will permit such simulations: (1) an electrostatics library based on the Fast Multipole Method (FMM) that treats local alternative charge distributions with minimal overhead, and (2) a $\lambda$-dynamics module working in tandem with the FMM that enables various types of chemical transitions during the simulation. Our $\lambda$-dynamics and FMM implementations d…

Keywords: Computer science; Fast multipole method; 05 social sciences; Fast Fourier transform; 050301 education; Supercomputer; Electrostatics; biomolecules; Computational science; Molecular dynamics; CUDA; Particle Mesh; Scalability; Overhead (computing); simulation; 0501 psychology and cognitive sciences; SIMD; 0503 education; 050104 developmental & child psychology
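
The FMM exists to avoid the O(N²) pairwise Coulomb sum. A direct-summation reference of the kind an FMM result can be checked against on small systems is sketched below; it assumes a Coulomb constant of 1, open (non-periodic) boundaries, and no λ-dependence, and `direct_coulomb_energy` is a hypothetical helper rather than part of GROMEX.

```python
import math


def direct_coulomb_energy(positions, charges):
    """Electrostatic energy sum_{i<j} q_i q_j / r_ij by direct O(N^2) summation.

    Units are left abstract (Coulomb constant taken as 1) and no periodic
    images are included.
    """
    n = len(positions)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            energy += charges[i] * charges[j] / r
    return energy


print(direct_coulomb_energy([(0, 0, 0), (2, 0, 0)], [1.0, -1.0]))  # -0.5
```

An FMM replaces the far-field part of this double loop with multipole expansions, so its error relative to the direct sum is the natural accuracy check.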

The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs

2012

Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation to perform SpMV on the GPU. While previous work shows experiments on small to medium-sized sparse matrices, we perform evaluations on large sparse matrices. We compared SCOO performance to existing formats of the NVIDIA Cusp library. Our results on a Fermi GPU show that SCOO outperforms the COO and CSR formats for all tested matrices and the HYB format for all tested unstructured matrices. Furthermore, comparison to a Sandy Bridge CPU sho…

Keywords: Computer science; Sparse matrix-vector multiplication; CUDA; Parallel computing; Matrix (mathematics); Factor (programming language); SpMV; General Earth and Planetary Sciences; Multiplication; Fermi; General Environmental Science; Sparse matrix
Published in: Procedia Computer Science
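
For readers unfamiliar with the baseline format, a sequential sketch of SpMV in plain COO (one row index, column index, and value per nonzero) is given below, together with a hypothetical grouping of entries into horizontal row slices to convey the general idea of slicing. The actual SCOO layout is defined in the paper and is not reproduced here; both function names are assumptions.

```python
def coo_spmv(rows, cols, vals, x, n_rows):
    """y = A @ x for A stored in coordinate (COO) format.

    On a GPU each nonzero is naturally handled by one thread, and the
    accumulation into y[r] then needs atomics or a segmented reduction;
    here the loop is sequential.
    """
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def slice_coo(rows, cols, vals, slice_height):
    """Group COO entries into horizontal slices of `slice_height` rows.

    Hypothetical illustration only; not the SCOO storage scheme itself.
    """
    slices = {}
    for r, c, v in zip(rows, cols, vals):
        slices.setdefault(r // slice_height, []).append((r, c, v))
    return [slices[k] for k in sorted(slices)]


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
print(coo_spmv(rows, cols, vals, [1.0, 1.0, 1.0], 3))  # [3.0, 3.0, 9.0]
```

Restricting each slice to a few rows bounds the range of `y` that a group of threads writes to, which is the kind of locality a sliced format can exploit.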

GPU-Based Occlusion Minimisation for Optimal Placement of Multiple 3D Cameras

2020

This paper presents a fast GPU-based solution to the 3D occlusion detection problem and the 3D camera placement optimisation problem. Occlusion detection is incorporated into the optimisation problem to return near-optimal positions for 3D cameras in environments containing occluding objects, maximising the volume that is visible to the cameras. In addition, the authors’ previous work on 3D sensor placement optimisation is extended to include a model for a pyramid-shaped viewing frustum and to take the camera’s pose into account when computing the optimal position.

Keywords: Computer science; 010401 analytical chemistry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 020207 software engineering; 02 engineering and technology; 01 natural sciences; Minimisation (clinical trials); 0104 chemical sciences; CUDA; Viewing frustum; Occlusion; 0202 electrical engineering electronic engineering information engineering; Computer vision; Artificial intelligence; ComputingMethodologies_COMPUTERGRAPHICS
Published in: 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA)
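
A basic building block of occlusion detection is deciding whether the line of sight from a camera to a target point is blocked. The sketch below assumes occluders are approximated by axis-aligned boxes and ignores the viewing frustum; it uses the standard slab method and is not the paper's GPU method. `segment_hits_box` and `visible` are hypothetical names.

```python
def segment_hits_box(p0, p1, box_min, box_max, eps=1e-12):
    """Return True if the segment p0 -> p1 intersects the axis-aligned box.

    Slab method: clip the parameter interval [0, 1] of p(t) = p0 + t (p1 - p0)
    against the pair of planes bounding the box on each axis.
    """
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        if abs(d) < eps:
            # Segment parallel to this slab: it must already lie inside it.
            if p0[axis] < box_min[axis] or p0[axis] > box_max[axis]:
                return False
            continue
        t0 = (box_min[axis] - p0[axis]) / d
        t1 = (box_max[axis] - p0[axis]) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True


def visible(camera, point, occluders):
    """A point counts as visible if no occluder box blocks the line of sight."""
    return not any(segment_hits_box(camera, point, lo, hi) for lo, hi in occluders)


wall = ((4.0, -1.0, -1.0), (6.0, 1.0, 1.0))
print(visible((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), [wall]))  # False
```

Each (camera, sample point) pair is independent, so a GPU version could assign one thread per pair and sum the visible samples per candidate camera pose.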

Three-dimensional Fuzzy Kernel Regression framework for registration of medical volume data

2013

In this work, a general framework for non-rigid 3D medical image registration is presented. It relies on two pattern recognition techniques: kernel regression and fuzzy c-means clustering. The paper provides a theoretical explanation, details the framework, and illustrates its application by implementing three registration algorithms for CT/MR volumes as well as single 2D slices. The first two algorithms are landmark-based approaches, while the third is an area-based technique based on iterative hierarchical volume subdivision and maximization of mutual information. Moreover, a high-performance NVIDIA CUDA-based implementation of the algorithm is presented. The f…

Keywords: Non-rigid registration; Fuzzy regression; Mutual information; Interpolation; GPU computing; Computer science; Image registration; Machine learning; Fuzzy logic; CUDA; Artificial Intelligence; Signal Processing; Pattern recognition (psychology); Kernel regression; Computer Vision and Pattern Recognition; Data mining; General-purpose computing on graphics processing units; Cluster analysis; Software
Published in: Pattern Recognition
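
The area-based algorithm maximizes mutual information (MI). A histogram-based estimate of MI between two already-quantized intensity sequences (for example, corresponding voxels of two volumes) could be sketched as follows; binning, interpolation, and the fuzzy kernel regression framework itself are not shown, and `mutual_information` is a hypothetical helper.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equally long sequences of
    discrete intensity labels, estimated from their joint histogram."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))
    pa = Counter(a)
    pb = Counter(b)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi
```

MI is unchanged when one image's intensity labels are permuted (e.g. `[0, 0, 1, 1]` against `[1, 1, 0, 0]` still gives ln 2), which is why it suits CT/MR pairs whose intensities are related but not equal.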

Connected-component identification and cluster update on graphics processing units.

2011

Cluster identification tasks occur in a multitude of contexts in physics and engineering, for instance cluster algorithms for simulating spin models, percolation simulations, segmentation problems in image processing, or network analysis. While it has been shown that graphics processing units (GPUs) can deliver speedups of two to three orders of magnitude over serial CPU codes for local and thus naturally parallelizable problems, such as single-spin-flip update simulations of spin models, the situation is considerably more complicated for the nonlocal problem of cluster or connected-component identification. I discuss the suitability of different approaches…

Keywords: Connected component; CUDA; Identification (information); Cluster labeling; Cluster (physics); Image processing; Graphics; Computational science; Network analysis
Published in: Physical Review E: Statistical, Nonlinear, and Soft Matter Physics
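
To make the nonlocality concrete, here is one of the simplest data-parallel labeling strategies, iterative minimum-label propagation on a 2D lattice. It is a hedged sketch, not necessarily one of the approaches the paper recommends; `label_clusters` is a hypothetical name, and the number of sweeps grows with the cluster diameter, which illustrates why cluster identification is harder to parallelize than local updates.

```python
def label_clusters(grid):
    """Label 4-connected clusters of occupied sites (truthy cells) in a 2D grid.

    Every occupied site starts with its own index as label, then repeatedly
    takes the minimum label among itself and its occupied neighbours until
    nothing changes. Each sweep reads only the previous labels, mimicking a
    data-parallel update with one thread per site. Empty sites get label -1.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[r * cols + c if grid[r][c] else -1 for c in range(cols)]
              for r in range(rows)]
    changed = True
    while changed:
        changed = False
        new = [row[:] for row in labels]
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] < 0:
                    continue
                best = labels[r][c]
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] >= 0:
                        best = min(best, labels[rr][cc])
                if best < labels[r][c]:
                    new[r][c] = best
                    changed = True
        labels = new
    return labels


print(label_clusters([[1, 1, 0], [0, 0, 0], [0, 1, 1]]))
# [[0, 0, -1], [-1, -1, -1], [-1, 7, 7]]
```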

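Children of Gypsies: Childcare Practices between Poverty and Discrimination (original Italian title: Figli di zingari. Pratiche di accudimento tra povertà e discriminazione)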

2014

Keywords: Roma women; care and childcare practices; antigypsyism; social marginality
Academic sector: M-DEA/01 (Demo-ethno-anthropological disciplines)

CUDA-Accelerated Alignment of Subsequences in Streamed Time Series Data

2014

Euclidean Distance (ED) and Dynamic Time Warping (DTW) are cornerstones in the field of time series data mining. Many high-level algorithms like kNN classification, clustering, or anomaly detection make extensive use of these distance measures as subroutines. Furthermore, the vast growth of recorded data produced by automated monitoring systems or integrated sensors creates the need for efficient implementations. In this paper, we introduce linear-memory parallelization schemes for the alignment of a given query Q in a stream of time series data S for both ED and DTW using CUDA-enabled accelerators. The ED parallelization features a log-linear calculation scheme in contrast to the naive …

Keywords: Euclidean distance; CUDA; Dynamic time warping; Data stream mining; Computer science; Anomaly detection; Parallel computing; Cluster analysis; Time complexity; Distance measures
Published in: 2014 43rd International Conference on Parallel Processing
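
A sequential reference for both measures helps clarify what is being parallelized. The sketch below shows the naive sliding-window ED that the abstract contrasts with its log-linear scheme, and a subsequence DTW that aligns query Q anywhere in stream S while keeping only two rows of the cost matrix (each row depends only on the previous one, which is what makes linear memory possible). It uses squared-difference costs without a warping window; these are assumptions, not the paper's exact formulation, and the function names are hypothetical.

```python
import math


def sliding_euclidean(query, stream):
    """Euclidean distance of `query` to every length-|query| window of `stream`.

    Naive O(|query| * |stream|) version.
    """
    m = len(query)
    return [math.sqrt(sum((q - s) ** 2 for q, s in zip(query, stream[i:i + m])))
            for i in range(len(stream) - m + 1)]


def subsequence_dtw(query, stream):
    """Best DTW cost (sum of squared differences) of `query` aligned to any
    contiguous part of `stream`, using two rows, i.e. O(len(stream)) memory."""
    inf = float("inf")
    prev = [0.0] * (len(stream) + 1)  # the alignment may start at any stream position
    for x in query:
        curr = [inf] * (len(stream) + 1)
        for j, y in enumerate(stream, start=1):
            curr[j] = (x - y) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return min(prev[1:])  # ... and may end at any stream position


print(sliding_euclidean([1, 2], [0, 1, 2, 5]))  # window 1 is an exact match
print(subsequence_dtw([1, 2], [5, 1, 2, 2, 9]))  # 0.0
```

In a GPU setting, the independent windows of the ED variant map directly to threads, whereas the DTW recurrence has a row-to-row dependency that must be handled by the parallelization scheme.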
