Search results for "CUDA"

Showing 10 of 56 documents

GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences

2014

In this paper, we present GSWABE, a graphics processing unit (GPU)-accelerated pairwise sequence alignment algorithm for a collection of short DNA sequences. This algorithm supports all-to-all pairwise global, semi-global and local alignment, and retrieves optimal alignments on Compute Unified Device Architecture (CUDA)-enabled GPUs. All three alignment types are based on dynamic programming and share almost the same computational pattern. Thus, we have investigated a general tile-based approach to facilitating fast alignment by deeply exploring the powerful compute capability of CUDA-enabled GPUs. The performance of GSWABE has been evaluated on a Kepler-based Tesla K40 GPU using a varie…

Smith–Waterman algorithm; Speedup; Computer Networks and Communications; Computer science; Sequence alignment; Needleman–Wunsch algorithm; Parallel computing; DNA sequencing; Computer Science Applications; Theoretical Computer Science; Dynamic programming; CUDA; Computational Theory and Mathematics; Software; Concurrency and Computation: Practice and Experience

researchProduct

Mapping of BLASTP Algorithm onto GPU Clusters

2011

Searching protein sequence databases is a fundamental and often repeated task in computational biology and bioinformatics. However, the high computational cost and long runtime of many database scanning algorithms on sequential architectures heavily restrict their applications for large-scale protein databases, such as GenBank. The continuing exponential growth of sequence databases and the high rate of newly generated queries further worsen the situation and establish a strong requirement for time-efficient, scalable database searching algorithms. In this paper, we demonstrate how GPU clusters, powered by the Compute Unified Device Architecture (CUDA), OpenMP, and MPI parallel programmi…

Source code; Sequence database; Computer science; Message passing; Parallel computing; GPU cluster; Computational science; CUDA; Task (computing); Search algorithm; GenBank; Scalability; Algorithm; 2011 IEEE 17th International Conference on Parallel and Distributed Systems

researchProduct

The Dynamic Kernel Scheduler - Part 1

2015

Emerging processor architectures such as GPUs and Intel MICs offer huge performance potential for high-performance computing. However, developing software using these hardware accelerators introduces additional challenges for the developer, such as exposing additional parallelism, dealing with different hardware designs and using multiple development frameworks in order to use devices from different vendors. The Dynamic Kernel Scheduler (DKS) is being developed in order to provide a software layer between the host application and different hardware accelerators. DKS handles the communication between the host and device, schedules task execution, and provides a library of built-in algorithms. …

Speedup; 010308 nuclear & particles physics; Computer science; Fast Fourier transform; General Physics and Astronomy; FOS: Physical sciences; Parallel computing; Computational Physics (physics.comp-ph); Supercomputer; 01 natural sciences; CUDA; Software; Kernel (image processing); Hardware and Architecture; 0103 physical sciences; Hardware acceleration; 010306 general physics; Physics - Computational Physics; Xeon Phi

researchProduct

Reducing complexity in H.264/AVC motion estimation by using a GPU

2011

H.264/AVC applies a complex mode decision technique that has high computational complexity in order to reduce the temporal redundancies of video sequences. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors or accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the inter prediction complexity adopted in H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (100x of spe…

Speedup; Computational complexity theory; Computer science; 020206 networking & telecommunications; Data_CODINGANDINFORMATIONTHEORY; 02 engineering and technology; Parallel computing; CUDA; Algorithmic efficiency; 0202 electrical engineering electronic engineering information engineering; Worst-case complexity; 020201 artificial intelligence & image processing; Context-adaptive binary arithmetic coding; Data compression; Context-adaptive variable-length coding

researchProduct

CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations

2013

We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs. An associated CUDA implementation which takes advantage of atomic operations is presented. We propose partitioning methods to transform a given sparse matrix into SCOO format. An efficient Dual-GPU implementation which overlaps computation and communication is described. Extensive performance comparisons of SCOO compared to other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…

Speedup; Computer Networks and Communications; Computer science; Sparse matrix-vector multiplication; Parallel computing; Computer Graphics and Computer-Aided Design; Theoretical Computer Science; Matrix (mathematics); CUDA; Artificial Intelligence; Hardware and Architecture; Benchmark (computing); Multiplication; General-purpose computing on graphics processing units; Software; Sparse matrix; Parallel Computing

researchProduct

cuBool: Bit-Parallel Boolean Matrix Factorization on CUDA-Enabled Accelerators

2018

Boolean Matrix Factorization (BMF) is a commonly used technique in the field of unsupervised data analytics. The goal is to decompose a ground truth matrix C into a product of two matrices A and B being either an exact or approximate rank k factorization of C. Both exact and approximate factorization are time-consuming tasks due to their combinatorial complexity. In this paper, we introduce a massively parallel implementation of BMF - namely cuBool - in order to significantly speed up factorization of huge Boolean matrices. Our approach is based on alternately adjusting rows and columns of A and B using thousands of lightweight CUDA threads. The massively parallel manipulation of entries …

Speedup; Rank (linear algebra); Computer science; 02 engineering and technology; Parallel computing; Matrix decomposition; CUDA; Matrix (mathematics); Factorization; 020204 information systems; Singular value decomposition; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Massively parallel; Integer (computer science); 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS)

researchProduct

Performance potential for simulating spin models on GPU

2012

Graphics processing units (GPUs) have recently been used to an increasing degree for general computational purposes. This development is motivated by their theoretical peak performance, which significantly exceeds that of broadly available CPUs. For practical purposes, however, it is far from clear how much of this theoretical performance can be realized in actual scientific applications. As is discussed here for the case of studying classical spin models of statistical mechanics by Monte Carlo simulations, only an explicit tailoring of the involved algorithms to the specific architecture under consideration allows one to harvest the computational power of GPU systems. A number of examples, ran…

Spin glass; Physics and Astronomy (miscellaneous); Computer science; Monte Carlo method; FOS: Physical sciences; Computational science; CUDA; High Energy Physics - Lattice; Statistical physics; Graphics; Condensed Matter - Statistical Mechanics; Numerical Analysis; Statistical Mechanics (cond-mat.stat-mech); Applied Mathematics; High Energy Physics - Lattice (hep-lat); Ranging; Statistical mechanics; Disordered Systems and Neural Networks (cond-mat.dis-nn); Condensed Matter - Disordered Systems and Neural Networks; Computational Physics (physics.comp-ph); Computer Science Applications; Computational Mathematics; Modeling and Simulation; Ising model; Parallel tempering; Physics - Computational Physics

researchProduct

Simulating spin models on GPU

2010

Over the last couple of years it has been realized that the vast computational power of graphics processing units (GPUs) could be harvested for purposes other than the video game industry. This power, which at least nominally exceeds that of current CPUs by large factors, results from the relative simplicity of the GPU architectures as compared to CPUs, combined with a large number of parallel processing units on a single chip. To benefit from this setup for general computing purposes, the problems at hand need to be prepared in a way that profits from the inherent parallelism and hierarchical structure of memory accesses. In this contribution I discuss the performance potential for simulating…

Statistical Mechanics (cond-mat.stat-mech); Computer science; High Energy Physics - Lattice (hep-lat); Monte Carlo method; FOS: Physical sciences; General Physics and Astronomy; Parallel computing; Computational Physics (physics.comp-ph); Power (physics); CUDA; High Energy Physics - Lattice; Parallel processing (DSP implementation); Hardware and Architecture; Parallelism (grammar); Ising model; Graphics; Physics - Computational Physics; Video game; Condensed Matter - Statistical Mechanics; Computer Physics Communications

researchProduct

Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data

2012

Motivation: The imperfect sequence data produced by next-generation sequencing technologies have motivated the development of a number of short-read error correctors in recent years. The majority of methods focus on the correction of substitution errors, which are the dominant error source in data produced by Illumina sequencing technology. Existing tools score high in terms of either recall or precision, but not consistently high in terms of both measures. Results: In this article, we present Musket, an efficient multistage k-mer-based corrector for Illumina short-read data. We use the k-mer spectrum approach and introduce three correction techniques in a multistage workflow: two-s…

Statistics and Probability; Computer science; Sequence assembly; Sequence Analysis DNA; Musket; Biochemistry; Computer Science Applications; Computational Mathematics; CUDA; Software; Computational Theory and Mathematics; k-mer; Escherichia coli; Chromosomes Human; Humans; Focus (optics); Molecular Biology; Algorithm; Algorithms; Genome Bacterial; Software; Illumina dye sequencing; Bioinformatics

researchProduct

GPU-accelerated exhaustive search for third-order epistatic interactions in case–control studies

2015

This is a post-peer-review, pre-copyedit version of an article published in Journal of Computational Science. The final authenticated version is available online at: https://doi.org/10.1016/j.jocs.2015.04.001 [Abstract] Interest in discovering combinations of genetic markers from case–control studies, such as Genome Wide Association Studies (GWAS), that are strongly associated with diseases has increased in recent years. Detecting epistasis, i.e. interactions among k markers (k ≥ 2), is an important but time-consuming operation since statistical computations have to be performed for each k-tuple of measured markers. Efficient exhaustive methods have been proposed for k = 2, but exhaustive thi…

Theoretical computer science; Source code; General Computer Science; Computer science; Computation; GPU; Brute-force search; CUDA; Mutual information; Theoretical Computer Science; Mutual information; CUDA; Modeling and Simulation; Epistasis; GWAS; Node (circuits); Data mining; Tuple; Heuristics; Journal of Computational Science

researchProduct