Search results for "Speedup"

showing 10 items of 97 documents

Design Space Exploration of Parallel Embedded Architectures for Native Clifford Algebra Operations

2012

In the past few decades, Geometric or Clifford algebra (CA) has received growing attention in many research fields, such as robotics, machine vision and computer graphics, as a natural and intuitive way to model geometric objects and their transformations. At the same time, the high dimensionality of Clifford algebra and its computational complexity demand specialized hardware architectures for the direct support of Clifford data types and operators. This paper presents the design space exploration of parallel embedded architectures for native execution of four-dimensional (4D) and five-dimensional (5D) Clifford algebra operations. The design space exploration has been described along wit…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Speedup; Theoretical computer science; Computer science; business.industry; Design space exploration; Machine vision; Clifford algebra; Clifford algebra Computational geometry Embedded coprocessors Application-specific processors Design space exploration FPGA-based prototyping; Robotics; Computer graphics; Software; Hardware and Architecture; ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION; Artificial intelligence; Electrical and Electronic Engineering; Variety (universal algebra); business; Software; IEEE Design & Test of Computers

researchProduct

Optimisation des requêtes de similarité dans les espaces métriques répondant aux besoins des usagers (Optimization of similarity queries in metric spaces to meet user needs)

2012

The complexity of data stored in large databases has increased at a very fast pace. Hence, operations more elaborate than traditional queries are essential in order to extract all required information from the database. Therefore, the interest of the database community in similarity search has increased significantly. Two of the well-known types of similarity search are the Range (Rq) and the k-Nearest Neighbor (kNNq) queries, which, like any traditional query, can be sped up by indexing structures of the Database Management System (DBMS). Another way of speeding up queries is to perform query optimization. In this process, metrics about data are collected and employed to adjust the par…

researchProduct

Faster GPU-Accelerated Smith-Waterman Algorithm with Alignment Backtracking for Short DNA Sequences

2014

In this paper, we present a GPU-accelerated Smith-Waterman (SW) algorithm with Alignment Backtracking, called GSWAB, for short DNA sequences. This algorithm performs all-to-all pairwise alignments and retrieves optimal local alignments on CUDA-enabled GPUs. To facilitate fast alignment backtracking, we have investigated a tile-based SW implementation using the CUDA programming model. This tiled computing pattern enables us to more deeply explore the powerful compute capability of GPUs. We have evaluated the performance of GSWAB on a Kepler-based GeForce GTX Titan graphics card. The results show that GSWAB can achieve a performance of up to 56.8 GCUPS on large-scale datasets. Furthermore, ou…

Smith–Waterman algorithm; CUDA; Titan (supercomputer); Speedup; Computer science; Backtracking; Parallel computing; Software_PROGRAMMINGTECHNIQUES; Graphics; DNA sequencing; ComputingMethodologies_COMPUTERGRAPHICS

researchProduct
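
For orientation on the GSWAB entry above, a minimal CPU sketch of Smith-Waterman local alignment with backtracking might look as follows. It is not the authors' tiled CUDA implementation. The function name `smith_waterman`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Scoring values and the linear gap model are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best local alignment score ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix and remember the highest-scoring cell.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Backtrack from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Both nested loops visit every cell once, which is the quadratic cost that the GPU and Xeon Phi entries in this listing set out to parallelize.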

SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors

2014

The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, the high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for big databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI is designed based on the scale-and-vectorize approach, i.e. it boosts alignment speed by effectively utilizing both the coarse-grained parallelism from the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, mul…

Smith–Waterman algorithm; FOS: Computer and information sciences; Multi-core processor; Coprocessor; Speedup; Sequence database; Computer science; Parallel computing; Intrinsics; Computer Science - Distributed Parallel and Cluster Computing; Scalability; SIMD; Distributed Parallel and Cluster Computing (cs.DC); Xeon Phi

researchProduct

GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences

2014

In this paper, we present GSWABE, a graphics processing unit (GPU)-accelerated pairwise sequence alignment algorithm for a collection of short DNA sequences. This algorithm supports all-to-all pairwise global, semi-global and local alignment, and retrieves optimal alignments on Compute Unified Device Architecture (CUDA)-enabled GPUs. All three alignment types are based on dynamic programming and share almost the same computational pattern. Thus, we have investigated a general tile-based approach to facilitating fast alignment by deeply exploring the powerful compute capability of CUDA-enabled GPUs. The performance of GSWABE has been evaluated on a Kepler-based Tesla K40 GPU using a varie…

Smith–Waterman algorithm; Speedup; Computer Networks and Communications; Computer science; Sequence alignment; Needleman–Wunsch algorithm; Parallel computing; DNA sequencing; Computer Science Applications; Theoretical Computer Science; Dynamic programming; CUDA; Computational Theory and Mathematics; Software; Concurrency and Computation: Practice and Experience

researchProduct

Reconstruction of Low Energy Neutrino Events with GPUs at IceCube

2020

IceCube is a cubic-kilometer neutrino observatory located at the South Pole that produces massive amounts of data by measuring individual Cherenkov photons from neutrino interaction events in the energy range from a few GeV to several PeV. The actual reconstruction of neutrino events in the GeV range is computationally challenging due to the scarcity of data produced by single events. This can lead to run times of several weeks for the state-of-the-art reconstruction method, Pegleg, on CPUs for typical workloads of tens of thousands of events. We propose a GPU version of Pegleg that probes the likelihood space with several hypotheses in parallel while adapting the amount of parallel sampled hy…

Speedup; 010308 nuclear & particles physics; Computer science; Astrophysics::High Energy Astrophysical Phenomena; Computation; 01 natural sciences; Computational science; Titan (supercomputer); Observatory; 0103 physical sciences; Range (statistics); Neutrino; 010306 general physics; Neutrino oscillation; Cherenkov radiation

researchProduct

The Dynamical Kernel Scheduler - Part 1

2015

Emerging processor architectures such as GPUs and Intel MICs provide a huge performance potential for high performance computing. However, developing software using these hardware accelerators introduces additional challenges for the developer, such as exposing additional parallelism, dealing with different hardware designs and using multiple development frameworks in order to use devices from different vendors. The Dynamic Kernel Scheduler (DKS) is being developed in order to provide a software layer between host application and different hardware accelerators. DKS handles the communication between the host and device, schedules task execution, and provides a library of built-in algorithms. …

Speedup; 010308 nuclear & particles physics; Computer science; business.industry; Fast Fourier transform; General Physics and Astronomy; FOS: Physical sciences; Parallel computing; Computational Physics (physics.comp-ph); Supercomputer; 01 natural sciences; CUDA; Software; Kernel (image processing); Hardware and Architecture; 0103 physical sciences; Hardware acceleration; 010306 general physics; business; Physics - Computational Physics; Xeon Phi

researchProduct

Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions

2022

Molecular dynamics (MD) simulations are playing an increasingly important role in many areas ranging from chemical materials to biological molecules. With the continuing development of MD models, the potentials are getting larger and more complex. In this article, we focus on the reactive force field (ReaxFF) potential from LAMMPS to optimize the computation of interactions. We present our efforts on refactoring for neighbor list building, bond order computation, as well as valence angles and torsion angles computation. After redesigning these kernels, we develop a vectorized implementation for non-bonded interactions, which is nearly $100\times$ faster than the management processing…

Speedup; Computational Theory and Mathematics; Xeon; Hardware and Architecture; Computer science; Computation; Signal Processing; Vectorization (mathematics); Node (circuits); Parallel computing; Supercomputer; Force field (chemistry); Sunway TaihuLight; IEEE Transactions on Parallel and Distributed Systems

researchProduct

Reducing complexity in H.264/AVC motion estimation by using a GPU

2011

H.264/AVC applies a complex mode decision technique that has high computational complexity in order to reduce the temporal redundancies of video sequences. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors or accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the inter prediction complexity adopted in H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (100x of spe…

Speedup; Computational complexity theory; Computer science; 020206 networking & telecommunications; Data_CODINGANDINFORMATIONTHEORY; 02 engineering and technology; Parallel computing; CUDA; Algorithmic efficiency; 0202 electrical engineering electronic engineering information engineering; Worst-case complexity; 020201 artificial intelligence & image processing; Context-adaptive binary arithmetic coding; Data compression; Context-adaptive variable-length coding

researchProduct
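
The H.264/AVC abstract above pairs a 99% reduction with a factor of 100x. Assuming "complexity" there stands for execution time (the abstract is truncated, so this reading is an assumption), the two figures are related by speedup = 1 / (1 - reduction). The helper names below are hypothetical and serve only to show the arithmetic.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional time reduction (0.99 for 99%) into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    # If the new run time is (1 - reduction) of the old one, the ratio old/new is:
    return 1.0 / (1.0 - reduction)


def reduction_from_speedup(speedup):
    """Convert a speedup factor (100 for 100x) into a fractional time reduction."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    return 1.0 - 1.0 / speedup
```

Under this reading a 99% reduction corresponds to roughly 100x, while a 90% reduction corresponds to only about 10x, so percentages close to 100% hide large differences in speedup.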

CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations

2013

We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs. An associated CUDA implementation which takes advantage of atomic operations is presented. We propose partitioning methods to transform a given sparse matrix into SCOO format. An efficient Dual-GPU implementation which overlaps computation and communication is described. Extensive performance comparisons of SCOO compared to other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…

Speedup; Computer Networks and Communications; Computer science; Sparse matrix-vector multiplication; Parallel computing; Computer Graphics and Computer-Aided Design; Theoretical Computer Science; Matrix (mathematics); CUDA; Artificial Intelligence; Hardware and Architecture; Benchmark (computing); Multiplication; General-purpose computing on graphics processing units; Software; Sparse matrix; Parallel Computing

researchProduct
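
As background for the SpMV entry above, the sketch below shows sequential sparse matrix-vector multiplication over the plain coordinate (COO) format. It is not the Sliced COO (SCOO) format or the CUDA implementation from the paper, whose details are cut off in this listing. The function name `coo_spmv` and its argument layout are assumptions made for illustration.

```python
def coo_spmv(n_rows, row_idx, col_idx, values, x):
    """Compute y = A @ x where A is given as COO triplets (row, col, value).

    Plain COO only; the slicing used by SCOO is not modeled here.
    """
    if not (len(row_idx) == len(col_idx) == len(values)):
        raise ValueError("COO arrays must have equal length")
    y = [0.0] * n_rows
    for r, c, v in zip(row_idx, col_idx, values):
        # Several nonzeros may share a row, so this accumulation is the update
        # that a parallel version would have to protect, e.g. with an atomic add.
        y[r] += v * x[c]
    return y


# Usage: A = [[1, 0, 2], [0, 3, 0]], x = [1, 2, 3]
y = coo_spmv(2, [0, 0, 1], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Because each nonzero is processed independently, COO parallelizes naturally over nonzeros, at the cost of concurrent writes to the same output entry.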