Search results for "CUDA"

showing 10 items of 56 documents

Real Time Stereo Matching Using Two Step Zero-Mean SAD and Dynamic Programing

2018

Dense depth map extraction is an active research field in computer vision that aims to recover three-dimensional information from a stereo image pair. A large variety of algorithms has been developed. Local methods based on block matching are prevalent due to their linear computational complexity and easy implementation. This local cost is used in global methods such as graph cut and dynamic programming in order to reduce sensitivity to occlusion and uniform texture. This paper proposes a new method for matching images based on two-stage block matching as the local cost function and dynamic programming as the energy optimization approach. In our work we introduce the two stage of th…

Matching (statistics); Computational complexity theory; 010308 nuclear & particles physics; Computer science; Graphics hardware; 02 engineering and technology; 01 natural sciences; Dynamic programming; CUDA; Sum of absolute differences; Depth map; Computer Science::Computer Vision and Pattern Recognition; Cut; 0103 physical sciences; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Algorithm

2018 15th International Multi-Conference on Systems, Signals & Devices (SSD)

researchProduct

CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions

2013

Background: The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. Results: We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU …

Methodology Article; GPU; CUDA; Software_PROGRAMMINGTECHNIQUES; Biochemistry; Computer Science Applications; Smith-Waterman; Concurrent execution; Sequence Analysis Protein; PTX SIMD instructions; Databases Protein; Molecular Biology; Sequence Alignment; Algorithms; Software

BMC Bioinformatics

researchProduct

Suffix Array Construction on Multi-GPU Systems

2019

Suffix arrays are prevalent data structures that are fundamental to a wide range of applications including bioinformatics, data compression, and information retrieval. Therefore, various algorithms for (parallel) suffix array construction both on CPUs and GPUs have been proposed over the years. Although they provide significant speedup over their CPU-based counterparts, existing GPU implementations share a common disadvantage: input text sizes are limited by the scarce memory of a single GPU. In this paper, we overcome the aforementioned memory limitations by exploiting multi-GPU nodes featuring fast NVLink interconnects. In order to achieve high performance for this communication-intensive task, we …

Multi-core processor; Speedup; Computer science; Suffix array; 0102 computer and information sciences; 02 engineering and technology; Parallel computing; Data structure; 01 natural sciences; law.invention; CUDA; Shared memory; 010201 computation theory & mathematics; law; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Suffix; Data compression

Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing

researchProduct

GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model

2009

The compute unified device architecture (CUDA) is a programming approach for performing scientific calculations on a graphics processing unit (GPU) as a data-parallel computing device. The programming interface allows algorithms to be implemented using extensions to the standard C language. With a continuously increasing number of cores in combination with a high memory bandwidth, a recent GPU offers incredible resources for general-purpose computing. First, we apply this new technology to Monte Carlo simulations of the two-dimensional ferromagnetic square lattice Ising model. By implementing a variant of the checkerboard algorithm, results are obtained up to 60 times faster on the GPU than on a curren…

Numerical Analysis; Multi-core processor; Physics and Astronomy (miscellaneous); Computer science; Applied Mathematics; Monte Carlo method; Graphics processing unit; Square-lattice Ising model; Computer Science Applications; Computational science; Computational Mathematics; CUDA; Modeling and Simulation; Ising model; Statistical physics; General-purpose computing on graphics processing units; Lattice model (physics)

Journal of Computational Physics

researchProduct

Fourth Workshop on using Emerging Parallel Architectures

2012

[Abstract] The Fourth Workshop on Using Emerging Parallel Architectures (WEPA), held in conjunction with ICCS 2012, provides a forum for exploring the capabilities of emerging parallel architectures such as GPUs, FPGAs, Cell B.E., Intel M.I.C. and multicores to accelerate computational science applications.

OpenCL; GPGPU; Heterogeneous Multi-cores; Reconfigurable Computing; High Performance Computing; General Earth and Planetary Sciences; CUDA; Computational Science; Parallel Computer Architectures; General Environmental Science

Procedia Computer Science

researchProduct

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

2014

This is a post-peer-review, pre-copyedit version of an article published in Lecture Notes in Computer Science. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-09873-9_57 [Abstract] High-throughput genotyping technologies allow the collection of up to a few million genetic markers (such as SNPs) of an individual within a few minutes. Detecting epistasis, such as 2-SNP interactions, in Genome-Wide Association Studies is an important but time-consuming operation since statistical computations have to be performed for each pair of measured markers. In this work we present EpistSearch, a parallelized tool that, following the log-linear model appr…

POSIX Threads; Multi-core processor; Bioinformatics; Computer science; Computation; CUDA; Parallel computing; Pthreads; Acceleration; ComputingMethodologies_PATTERNRECOGNITION; Titan (supercomputer); Filter (video); Epistasis; GWAS

researchProduct

Optimized Parallel Implementation of Face Detection based on GPU component

2015

An algorithm for face detection has been implemented on the CPU. The algorithm is accelerated by migrating it to the GPU. The performance of the GPU implementation shows its effectiveness. Further optimization methods are applied on the GPU. Face detection is an important aspect of various domains such as biometrics, video surveillance and human-computer interaction. Generally, a generic face processing system includes a face detection or recognition step, as well as tracking and rendering phases. In this paper, we develop a real-time and robust face detection implementation based on a GPU component. Face detection is performed by adapting the Viola and Jones algorithm. We hav…

Parallel computing; Biometrics; Computer Networks and Communications; Computer science; 02 engineering and technology; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; Face detection; Rendering (computer graphics); CUDA; CUDA optimization; Artificial Intelligence; 0202 electrical engineering electronic engineering information engineering; Graphics processors; AdaBoost; [INFO.INFO-ES] Computer Science [cs]/Embedded Systems; Graphics; WaldBoost; ComputingMilieux_MISCELLANEOUS; Viola and Jones algorithm; Grid; 020202 computer hardware & architecture; Shared memory; Hardware and Architecture; 020201 artificial intelligence & image processing; Software

researchProduct

Accelerating H.264 inter prediction in a GPU by using CUDA

2010

H.264/AVC defines a very efficient algorithm for inter prediction, but it is very time-consuming. With the emergence of General Purpose Graphics Processing Units (GPGPU), a new door has been opened to support this video algorithm on these small processing units. In this paper, a step forward is taken towards an implementation of the H.264/AVC inter prediction algorithm on a GPU using Compute Unified Device Architecture (CUDA). The results show a negligible rate-distortion drop with a time reduction of up to 93.6% on average.

Reduction (complexity); CUDA; Coprocessor; Computer science; Image processing; Parallel computing; General-purpose computing on graphics processing units; Graphics; Data compression

2010 Digest of Technical Papers International Conference on Consumer Electronics (ICCE)

researchProduct

Large-scale genome-wide association studies on a GPU cluster using a CUDA-accelerated PGAS programming model

2015

[Abstract] Detecting epistasis, such as 2-SNP interactions, in genome-wide association studies (GWAS) is an important but time-consuming operation. Consequently, GPUs have already been used to accelerate these studies, reducing the runtime for moderately sized datasets to less than 1 hour. However, single-GPU approaches cannot perform large-scale GWAS in reasonable time. In this work we present multiEpistSearch, a tool to detect epistasis that works on GPU clusters. While CUDA is used for parallelization within each GPU, the workload distribution among GPUs is performed with Unified Parallel C++ (UPC++), a novel extension of C++ that follows the Partitioned Global Address Space (PGAS) model…

Scale (ratio); Bioinformatics; Computer science; PGAS; GPU; CUDA; Genome-wide association study; Parallel computing; GPU cluster; Software_PROGRAMMINGTECHNIQUES; Theoretical Computer Science; Computational science; Hardware and Architecture; Unified Parallel C; Programming paradigm; Partitioned global address space; computer; UPC++; Software; computer.programming_language

The International Journal of High Performance Computing Applications

researchProduct

Faster GPU-Accelerated Smith-Waterman Algorithm with Alignment Backtracking for Short DNA Sequences

2014

In this paper, we present a GPU-accelerated Smith-Waterman (SW) algorithm with Alignment Backtracking, called GSWAB, for short DNA sequences. This algorithm performs all-to-all pairwise alignments and retrieves optimal local alignments on CUDA-enabled GPUs. To facilitate fast alignment backtracking, we have investigated a tile-based SW implementation using the CUDA programming model. This tiled computing pattern enables us to more deeply explore the powerful compute capability of GPUs. We have evaluated the performance of GSWAB on a Kepler-based GeForce GTX Titan graphics card. The results show that GSWAB can achieve a performance of up to 56.8 GCUPS on large-scale datasets. Furthermore, ou…

Smith–Waterman algorithm; CUDA; Titan (supercomputer); Speedup; Computer science; Backtracking; Parallel computing; Software_PROGRAMMINGTECHNIQUES; Graphics; DNA sequencing; ComputingMethodologies_COMPUTERGRAPHICS

researchProduct