Search results for "graphics processing units"

showing 10 items of 21 documents

Real-Time Monocular Segmentation and Pose Tracking of Multiple Objects

2016

We present a real-time system capable of segmenting multiple 3D objects and tracking their pose using a single RGB camera, based on prior shape knowledge. The proposed method uses twist-coordinates for pose parametrization and a pixel-wise second-order optimization approach which lead to major improvements in terms of tracking robustness, especially in cases of fast motion and scale changes, compared to previous region-based approaches. Our implementation runs at about 50–100 Hz on a commodity laptop when tracking a single object without relying on GPGPU computations. We compare our method to the current state of the art in various experiments involving challenging motion sequences and diff…

Monocular; business.industry; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 020207 software engineering; 02 engineering and technology; Robustness (computer science); 0202 electrical engineering electronic engineering information engineering; RGB color model; 020201 artificial intelligence & image processing; Computer vision; Segmentation; Artificial intelligence; General-purpose computing on graphics processing units; business; Pose
researchProduct

Accelerating collision detection for large-scale crowd simulation on multi-core and many-core architectures

2013

The computing capabilities of current multi-core and many-core architectures have been used in crowd simulations for both enhancing crowd rendering and simulating continuum crowds. However, improving the scalability of crowd simulation systems by exploiting the inherent parallelism of these architectures is still an open issue. In this paper, we propose different parallelization strategies for the collision check procedure that takes place in agent-based simulations. These strategies are designed for exploiting the parallelism in both multi-core and many-core architectures like graphics processing units (GPUs). As for the many-core implementations, we analyse the bottlenecks of a previous G…

Multi-core processor; Speedup; Computer science; Parallel computing; Collision; Theoretical Computer Science; Rendering (computer graphics); Crowds; Hardware and Architecture; Scalability; Collision detection; Crowd simulation; General-purpose computing on graphics processing units; Software; The International Journal of High Performance Computing Applications
researchProduct

AnyDSL: a partial evaluation framework for programming high-performance libraries

2023

This paper advocates programming high-performance code using partial evaluation. We present a clean-slate programming system with a simple, annotation-based, online partial evaluator that operates on a CPS-style intermediate representation. Our system exposes code generation for accelerators (vectorization/parallelization for CPUs and GPUs) via compiler-known higher-order functions that can be subjected to partial evaluation. This way, generic implementations can be instantiated with target-specific code at compile time. In our experimental evaluation we present three extensive case studies from image processing, ray tracing, and genome sequence alignment. We demonstrate that using partial …

Intermediate language; Computer science; 020207 software engineering; Image processing; 02 engineering and technology; Parallel computing; Partial evaluation; 004; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Code generation; Ray tracing (graphics); General-purpose computing on graphics processing units; Safety Risk Reliability and Quality; Implementation; Software; Compile time
researchProduct

Parallelizing Epistasis Detection in GWAS on FPGA and GPU-Accelerated Computing Systems

2015

This is a post-peer-review, pre-copyedit version of an article published in IEEE - ACM Transactions on Computational Biology and Bioinformatics. The final authenticated version is available online at: http://dx.doi.org/10.1109/TCBB.2015.2389958 [Abstract] High-throughput genotyping technologies (such as SNP-arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies is an important but time consuming operation since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively lon…

Computer science; Bioinformatics; DNA Mutational Analysis; Genome-wide association study; Parallel computing; Polymorphism Single Nucleotide; Sensitivity and Specificity; Computational biology; Computer Graphics; Genetics; Computer architecture; Field-programmable gate array; Random access memory; Applied Mathematics; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Reproducibility of Results; Field programmable gate arrays; Epistasis Genetic; Signal Processing Computer-Assisted; Equipment Design; Random access memory; Computing systems; Reconfigurable computing; Equipment Failure Analysis; Task (computing); Epistasis; Host (network); Graphics processing units; Genome-Wide Association Study; Biotechnology
researchProduct

Accelerating H.264 inter prediction in a GPU by using CUDA

2010

H.264/AVC defines a very efficient inter prediction algorithm, but it is time-consuming. The emergence of General Purpose Graphics Processing Units (GPGPU) has opened the door to running this video algorithm on these small processing units. This paper takes a step towards implementing the H.264/AVC inter prediction algorithm on a GPU using Compute Unified Device Architecture (CUDA). The results show a negligible rate-distortion drop with a time reduction of up to 93.6% on average.

Reduction (complexity); CUDA; Coprocessor; Computer science; Image processing; Parallel computing; General-purpose computing on graphics processing units; Graphics; Data compression; 2010 Digest of Technical Papers International Conference on Consumer Electronics (ICCE)
researchProduct

Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units

2021

We contribute to the optimization of the sparse matrix-vector product by introducing a variant of the coordinate sparse matrix format that balances the workload distribution and compresses both the indexing arrays and the numerical information. Our approach is multi-platform, in the sense that the realizations for (general-purpose) multicore processors as well as graphics accelerators (GPUs) are built upon common principles, but differ in the implementation details, which are adapted to avoid thread divergence in the GPU case or maximize compression element-wise (i.e., for each matrix entry) for multicore architectures. Our evaluation on the last two generations of NVIDIA GPUs as well as In…

workload balancing; Multi-core processor; Computer Networks and Communications; Computer science; sparse matrix-vector product; Parallel computing; Load balancing (computing); coordinate sparse matrix format; Sparse matrix vector; compression; Exascale computing; Computer Science Applications; Theoretical Computer Science; Computational Theory and Mathematics; Compression (functional analysis); Product (mathematics); Graphics; graphics processing units (GPUs); multicore processors (CPUs); Software
researchProduct

Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures

2016

This is a post-peer-review, pre-copyedit version of an article published in IEEE Transactions on Parallel and Distributed Systems. The final authenticated version is available online at: http://dx.doi.org/10.1109/TPDS.2015.2460247. [Abstract] Development of new methods to detect pairwise epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important task in bioinformatics as they can help to explain genetic influences on diseases. As these studies are time consuming operations, some tools exploit the characteristics of different hardware accelerators (such as GPUs and Xeon Phi coprocessors) to reduce the runtime. Nevertheless, all these approaches are not able t…

0301 basic medicine; Coprocessor; Computer science; 0206 medical engineering; Acceleration; Data models; Symmetric multiprocessor system; Computational modeling; 02 engineering and technology; Parallel computing; Supercomputer; 03 medical and health sciences; Task (computing); 030104 developmental biology; Coprocessors; Computational Theory and Mathematics; Hardware and Architecture; Signal Processing; Genetics; Pairwise comparison; Computer architecture; Graphics processing units; 020602 bioinformatics; Xeon Phi
researchProduct

SIMULATING SPIN MODELS ON GPU: A TOUR

2012

The use of graphics processing units (GPUs) in scientific computing has gathered considerable momentum in the past five years. While GPUs in general promise high performance and excellent performance per Watt ratios, not every class of problems is equally well suited for exploiting the massively parallel architecture they provide. Lattice spin models appear to be prototypic examples of problems suitable for this architecture, at least as long as local update algorithms are employed. In this review, I summarize our recent experience with the simulation of a wide range of spin models on GPU employing an equally wide range of update algorithms, ranging from Metropolis and heat bath updates,…

Heat bath; Computer science; Monte Carlo method; General Physics and Astronomy; Statistical and Nonlinear Physics; Massively parallel architecture; Ranging; Parallel computing; Computer Science Applications; Computational Theory and Mathematics; General-purpose computing on graphics processing units; Graphics; Architecture; Mathematical Physics; Performance per watt; International Journal of Modern Physics C
researchProduct

GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model

2009

The compute unified device architecture (CUDA) is a programming approach for performing scientific calculations on a graphics processing unit (GPU) as a data-parallel computing device. The programming interface allows algorithms to be implemented using extensions to the standard C language. With a continuously increasing number of cores in combination with high memory bandwidth, a recent GPU offers incredible resources for general-purpose computing. First, we apply this new technology to Monte Carlo simulations of the two-dimensional ferromagnetic square lattice Ising model. By implementing a variant of the checkerboard algorithm, results are obtained up to 60 times faster on the GPU than on a curren…

Numerical Analysis; Multi-core processor; Physics and Astronomy (miscellaneous); Computer science; Applied Mathematics; Monte Carlo method; Graphics processing unit; Square-lattice Ising model; Computer Science Applications; Computational science; Computational Mathematics; CUDA; Modeling and Simulation; Ising model; Statistical physics; General-purpose computing on graphics processing units; Lattice model (physics); Journal of Computational Physics
researchProduct

Architecture-Driven Level Set Optimization: From Clustering to Sub-pixel Image Segmentation

2016

Thanks to their effectiveness, active contour models (ACMs) are of great interest to computer vision scientists. The level set methods (LSMs) refer to the class of geometric active contours. Compared with other ACMs, they offer, in addition to subpixel accuracy, the intrinsic ability to handle topological changes automatically. Nevertheless, LSMs are computationally expensive. One solution to their high computation time is hardware acceleration using massively parallel devices such as graphics processing units (GPUs). But the question is: what accuracy can we reach while keeping the algorithm suited to a massively parallel architecture? In this paper, we attempt to…

Level set method; Computer science; 0211 other engineering and technologies; Initialization; 02 engineering and technology; [ SPI.SIGNAL ] Engineering Sciences [physics]/Signal and Image processing; Level set; graphics processing units; 0202 electrical engineering electronic engineering information engineering; Level set method; Computer vision; Electrical and Electronic Engineering; Cluster analysis; Massively parallel; image segmentation; 021101 geological & geomatics engineering; Active contour model; hybrid CPU-GPU architecture; business.industry; Image segmentation; Subpixel rendering; Computer Science Applications; Human-Computer Interaction; Control and Systems Engineering; Hardware acceleration; 020201 artificial intelligence & image processing; Artificial intelligence; business; [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing; Software; Information Systems
researchProduct