Search results for "Parallel"
showing 10 items of 667 documents
GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences
2014
In this paper, we present GSWABE, a graphics processing unit (GPU)-accelerated pairwise sequence alignment algorithm for a collection of short DNA sequences. This algorithm supports all-to-all pairwise global, semi-global and local alignment, and retrieves optimal alignments on Compute Unified Device Architecture (CUDA)-enabled GPUs. All of the three alignment types are based on dynamic programming and share almost the same computational pattern. Thus, we have investigated a general tile-based approach to facilitating fast alignment by deeply exploring the powerful compute capability of CUDA-enabled GPUs. The performance of GSWABE has been evaluated on a Kepler-based Tesla K40 GPU using a varie…
Accelerating large-scale biological database search on Xeon Phi-based neo-heterogeneous architectures
2015
In this paper we present new parallelization techniques for searching large-scale biological sequence databases with the Smith-Waterman algorithm on Xeon Phi-based neo-heterogeneous architectures. In order to make full use of the compute power of both the multi-core CPU and the many-core Xeon Phi hardware, we use a collaborative computing scheme as well as hybrid parallelism. On the CPU side, we employ SSE intrinsics and multi-threading to implement SIMD parallelism. On the Xeon Phi side, we use Knights Corner vector instructions to gain more data parallelism. We have presented two dynamic task distribution schemes (thread level and device level) in order to achieve better load balancing. Fur…
A distributed-memory MPI parallelization scheme for multi-domain incompressible SPH
2022
A parallel scheme for a multi-domain truly incompressible smoothed particle hydrodynamics (SPH) approach is presented. The proposed method is developed for distributed-memory architectures through the Message Passing Interface (MPI) paradigm as communication between partitions. The proposal aims to overcome one of the main drawbacks of the SPH method, which is the high computational cost with respect to mesh-based methods, by coupling a multi-resolution approach with parallel computing techniques. The multi-domain approach aims to employ different resolutions by subdividing the computational domain into non-overlapping blocks separated by block interfaces. The particles belonging to differe…
Splitting the data cache: a survey
2000
Recent cache-memory research has focused on approaches that split the first-level data cache into two independent subcaches. The authors introduce a methodology for helping cache designers devise splitting schemes and survey a representative set of the published cache schemes.
The roots of 'Brexit': the institutionalization of Euroscepticism
2021
The United Kingdom voted to leave the European Union on 23 June 2016, with 51.9% of the vote. The decision surprised everyone, even though an analysis of the relationship between the two actors may reveal a certain detachment in the British state's behaviour towards the supranational body. For this reason, the thesis of the present study is that the Leave campaign based its arguments on questions of identity and emotion, for which it reproduced the historical traditions of the relationship between the United Kingdom and the European Union. To defend this thesis, the reasons put forward in the 2016 referendum campaign will be analysed, in order to seek an explanation of the fra…
Multi-Skill Call Center as a Grading from “Old” Telephony
2009
We explore parallels between older telephony switches and multi-skill call centers. The numerical results show that a call center with equally distributed skills is preferable to a traditional grading-type design. The annex contains a short version of a mathematical proof on the design of limited-availability schemes for small call flow intensity *** and for large ***. The proof draws on an excellent paper by V. Benes (from Bell Labs). On its own merit, the annex could initiate new mathematical research in the call center area, all the more so now that powerful software for numerical analysis is available. The main conclusion is the following: numerical analysis of simple multi-skill call center…
Mapping of BLASTP Algorithm onto GPU Clusters
2011
Searching protein sequence databases is a fundamental and often repeated task in computational biology and bioinformatics. However, the high computational cost and long runtime of many database scanning algorithms on sequential architectures heavily restrict their applications for large-scale protein databases, such as GenBank. The continuing exponential growth of sequence databases and the high rate of newly generated queries further aggravate the situation and establish a strong requirement for time-efficient scalable database searching algorithms. In this paper, we demonstrate how GPU clusters, powered by the Compute Unified Device Architecture (CUDA), OpenMP, and MPI parallel programmi…
The Dynamical Kernel Scheduler - Part 1
2015
Emerging processor architectures such as GPUs and Intel MICs provide a huge performance potential for high-performance computing. However, developing software using these hardware accelerators introduces additional challenges for the developer, such as exposing additional parallelism, dealing with different hardware designs and using multiple development frameworks in order to use devices from different vendors. The Dynamic Kernel Scheduler (DKS) is being developed in order to provide a software layer between the host application and different hardware accelerators. DKS handles the communication between the host and device, schedules task execution, and provides a library of built-in algorithms. …
Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions
2022
Molecular dynamics (MD) simulations are playing an increasingly important role in many areas ranging from chemical materials to biological molecules. With the continuing development of MD models, the potentials are getting larger and more complex. In this article, we focus on the reactive force field (ReaxFF) potential from LAMMPS to optimize the computation of interactions. We present our efforts on refactoring for neighbor list building, bond order computation, as well as valence angles and torsion angles computation. After redesigning these kernels, we develop a vectorized implementation for non-bonded interactions, which is nearly $100\times$ faster than the management processing…
Reducing complexity in H.264/AVC motion estimation by using a GPU
2011
H.264/AVC applies a complex mode decision technique that has high computational complexity in order to reduce the temporal redundancies of video sequences. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors or accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the inter prediction complexity adopted in H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (100x of spe…