Search results for "Parallel"

showing 10 items of 667 documents

GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences

2014

In this paper, we present GSWABE, a graphics processing unit (GPU)-accelerated pairwise sequence alignment algorithm for a collection of short DNA sequences. This algorithm supports all-to-all pairwise global, semi-global and local alignment, and retrieves optimal alignments on Compute Unified Device Architecture (CUDA)-enabled GPUs. All three alignment types are based on dynamic programming and share almost the same computational pattern. Thus, we have investigated a general tile-based approach to facilitating fast alignment by deeply exploring the powerful compute capability of CUDA-enabled GPUs. The performance of GSWABE has been evaluated on a Kepler-based Tesla K40 GPU using a varie…

Smith–Waterman algorithm; Speedup; Computer Networks and Communications; Computer science; Sequence alignment; Needleman–Wunsch algorithm; Parallel computing; DNA sequencing; Computer Science Applications; Theoretical Computer Science; Dynamic programming; CUDA; Computational Theory and Mathematics; Software; Concurrency and Computation: Practice and Experience

Accelerating large-scale biological database search on Xeon Phi-based neo-heterogeneous architectures

2015

In this paper, we present new parallelization techniques for searching large-scale biological sequence databases with the Smith-Waterman algorithm on Xeon Phi-based neo-heterogeneous architectures. In order to make full use of the compute power of both the multi-core CPU and the many-core Xeon Phi hardware, we use a collaborative computing scheme as well as hybrid parallelism. On the CPU side, we employ SSE intrinsics and multi-threading to implement SIMD parallelism. On the Xeon Phi side, we use Knights Corner vector instructions to gain more data parallelism. We have presented two dynamic task distribution schemes (thread level and device level) in order to achieve better load balancing. Fur…

Smith–Waterman algorithm; Xeon; Computer science; Data parallelism; Hyper-threading; SIMD; Parallel computing; Central processing unit; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Intrinsics; Xeon Phi; 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

A distributed-memory MPI parallelization scheme for multi-domain incompressible SPH

2022

A parallel scheme for a multi-domain truly incompressible smoothed particle hydrodynamics (SPH) approach is presented. The proposed method is developed for distributed-memory architectures, using the Message Passing Interface (MPI) paradigm for communication between partitions. The proposal aims to overcome one of the main drawbacks of the SPH method, namely its high computational cost relative to mesh-based methods, by coupling a multi-resolution approach with parallel computing techniques. The multi-domain approach aims to employ different resolutions by subdividing the computational domain into non-overlapping blocks separated by block interfaces. The particles belonging to differe…

Smoothed particle hydrodynamics (SPH); Artificial Intelligence; Computer Networks and Communications; Hardware and Architecture; Multi-domain approach; MPI; Parallel distributed-memory computation; Load balancing; Software; Settore ICAR/01 - Idraulica; Theoretical Computer Science; Journal of Parallel and Distributed Computing

Splitting the data cache: a survey

2000

Recent cache-memory research has focused on approaches that split the first-level data cache into two independent subcaches. The authors introduce a methodology for helping cache designers devise splitting schemes and survey a representative set of the published cache schemes.

Snoopy cache; Hardware_MEMORYSTRUCTURES; Database; Cache coloring; Computer science; General Engineering; Parallel computing; Cache pollution; computer.software_genre; Smart Cache; Cache invalidation; Page cache; Cache; computer; Cache algorithms; IEEE Concurrency

The roots of 'Brexit': the institutionalization of Euroscepticism

2021

The United Kingdom voted to leave the European Union on 23 June 2016, with 51.9% of the vote. The decision surprised everyone, even though an analysis of the relationship between the two actors suggests a certain detachment in the British state's behaviour toward the supranational body. For this reason, the thesis of this study is that the Leave campaign based its arguments on identity-related and emotional issues, for which it reproduced the historical traditions of the relationship between the United Kingdom and the European Union. To defend this thesis, the reasons put forward in the 2016 referendum campaign will be analysed, in order to seek an explanation for the…

Sociology and Political Science; 05 social sciences; Geography Planning and Development; Victory; Alienation; Identity (social science); 050601 international relations; 0506 political science; Kingdom; Brexit; Political science; Political economy; Political Science and International Relations; Referendum; 050602 political science & public administration; media_common.cataloged_instance; European union; Parallels; media_common; Geopolítica(s). Revista de estudios sobre espacio y poder

Multi-Skill Call Center as a Grading from “Old” Telephony

2009

We explore parallels between the older telephony switches and the multi-skill call centers. The numerical results show that a call center with equally distributed skills is preferable to the traditional grading-type design. The annex contains a short version of a mathematical proof on the design of limited-availability schemes for small call flow intensity *** and for large ***. The proof draws on an excellent paper by V. Benes (from Bell Labs). On its own merit, the annex could initiate new mathematical research in the call center area, all the more so now that powerful software for numerical analysis is available. The main conclusion is the following: numerical analysis of simple multi-skill call center…

Software; business.industry; Computer science; Telephony; business; Grading (education); Mathematical proof; Telecommunications; Call control; Parallels; Mathematical research; Call setup success rate

Mapping of BLASTP Algorithm onto GPU Clusters

2011

Searching protein sequence databases is a fundamental and often repeated task in computational biology and bioinformatics. However, the high computational cost and long runtime of many database scanning algorithms on sequential architectures heavily restrict their application to large-scale protein databases, such as GenBank. The continuing exponential growth of sequence databases and the high rate of newly generated queries further worsen the situation and establish a strong requirement for time-efficient, scalable database searching algorithms. In this paper, we demonstrate how GPU clusters, powered by the Compute Unified Device Architecture (CUDA), OpenMP, and MPI parallel programmi…

Source code; Sequence database; Computer science; media_common.quotation_subject; Message passing; Parallel computing; GPU cluster; Computational science; CUDA; Task (computing); Search algorithm; GenBank; Scalability; Algorithm; media_common; 2011 IEEE 17th International Conference on Parallel and Distributed Systems

The Dynamical Kernel Scheduler - Part 1

2015

Emerging processor architectures such as GPUs and Intel MICs offer huge performance potential for high-performance computing. However, developing software for these hardware accelerators introduces additional challenges for the developer, such as exposing additional parallelism, dealing with different hardware designs, and using multiple development frameworks in order to use devices from different vendors. The Dynamic Kernel Scheduler (DKS) is being developed to provide a software layer between the host application and different hardware accelerators. DKS handles the communication between the host and device, schedules task execution, and provides a library of built-in algorithms. …

Speedup; 010308 nuclear & particles physics; Computer science; business.industry; Fast Fourier transform; General Physics and Astronomy; FOS: Physical sciences; Parallel computing; Computational Physics (physics.comp-ph); Supercomputer; 01 natural sciences; CUDA; Software; Kernel (image processing); Hardware and Architecture; 0103 physical sciences; Hardware acceleration; 010306 general physics; business; Physics - Computational Physics; Xeon Phi

Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions

2022

Molecular dynamics (MD) simulations are playing an increasingly important role in many areas ranging from chemical materials to biological molecules. With the continuing development of MD models, the potentials are getting larger and more complex. In this article, we focus on the reactive force field (ReaxFF) potential from LAMMPS to optimize the computation of interactions. We present our efforts on refactoring for neighbor list building, bond order computation, as well as valence angles and torsion angles computation. After redesigning these kernels, we develop a vectorized implementation for non-bonded interactions, which is nearly $100\times$ faster than the management processing…

Speedup; Computational Theory and Mathematics; Xeon; Hardware and Architecture; Computer science; Computation; Signal Processing; Vectorization (mathematics); Node (circuits); Parallel computing; Supercomputer; Force field (chemistry); Sunway TaihuLight; IEEE Transactions on Parallel and Distributed Systems

Reducing complexity in H.264/AVC motion estimation by using a GPU

2011

H.264/AVC applies a complex mode decision technique that has high computational complexity in order to reduce the temporal redundancies of video sequences. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors or accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the inter prediction complexity adopted in H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (100x of spe…

Speedup; Computational complexity theory; Computer science; 020206 networking & telecommunications; Data_CODINGANDINFORMATIONTHEORY; 02 engineering and technology; Parallel computing; CUDA; Algorithmic efficiency; 0202 electrical engineering electronic engineering information engineering; Worst-case complexity; 020201 artificial intelligence & image processing; Context-adaptive binary arithmetic coding; Data compression; Context-adaptive variable-length coding
