Search results for "Parallel computing"
showing 10 items of 189 documents
A Methodology for the Analysis of Memory Response to Radiation through Bitmap Superposition and Slicing
2015
A methodology is proposed for the statistical analysis of memory radiation test data, with the aim of identifying trends in the single-event upset (SEU) distribution. The treated case study is a 65 nm SRAM irradiated with neutrons, protons and heavy ions.
Numerical experiments with a parallel fast direct elliptic solver on Cray T3E
1997
A parallel fast direct O(N log N) solver is briefly described for linear systems with separable block tridiagonal matrices. Good parallel scalability of the proposed method is demonstrated on a Cray T3E parallel computer using MPI for communication. The sequential performance is also compared with the well-known BLKTRI implementation of the generalized cyclic reduction method on a single processor of the Cray T3E.
Parallelization strategies for density matrix renormalization group algorithms on shared-memory systems
2003
Shared-memory parallelization (SMP) strategies for density matrix renormalization group (DMRG) algorithms enable the treatment of complex systems in solid state physics. We present two different approaches by which parallelization of the standard DMRG algorithm can be accomplished in an efficient way. The methods are illustrated with DMRG calculations of the two-dimensional Hubbard model and the one-dimensional Holstein-Hubbard model on contemporary SMP architectures. The parallelized code shows good scalability up to at least eight processors and allows us to solve problems which exceed the capability of sequential DMRG calculations.
Parallelization of a Lattice Boltzmann Suspension Flow Solver
2002
We have applied a parallel lattice Boltzmann method to simulate suspension flow. The complex behaviour of suspension flow cannot be solved analytically, so simulation is the only way to study it. Problems of interest are usually so large that the calculation time on one processor is too long, which a parallel program can remedy. We have written a parallel suspension flow solver and tested it on massively parallel computers. The measured performance of our program shows that the parallelization of suspension particles was successful. We also show that over one million particles can be simulated.
Cell-List based Molecular Dynamics on Many-Core Processors: A Case Study on Sunway TaihuLight Supercomputer
2020
Molecular dynamics (MD) simulations are playing an increasingly important role in several research areas. The most frequently used potentials in MD simulations are pair-wise potentials. Due to the memory wall, computing pair-wise potentials on many-core processors is usually memory bound. In this paper, we take the SW26010 processor as an exemplary platform to explore the possibility of breaking the memory bottleneck by improving data reuse via cell-list-based methods. We use cell-lists instead of neighbor-lists in the potential computation, and apply a number of novel optimization methods. These methods include: an adaptive replica arrangement strategy, a parameter profile data structur…
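To illustrate the cell-list idea mentioned in this abstract, here is a minimal sketch in Python. It is not the paper's SW26010 implementation: the function names `build_cell_list` and `pairs_within_cutoff` are hypothetical, the box is assumed non-periodic, and the cell edge is assumed equal to the cutoff so that every interacting pair lies in the same or an adjacent cell.

```python
from collections import defaultdict
from itertools import product
import math


def build_cell_list(positions, cutoff):
    """Bin each particle index into a cubic cell whose edge equals the cutoff."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)
    return cells


def pairs_within_cutoff(positions, cutoff):
    """Return sorted (i, j) pairs with i < j whose distance is at most cutoff.

    Only the 27 cells around each particle are scanned, instead of all
    particles (assumption: open boundaries, no periodic images).
    """
    cells = build_cell_list(positions, cutoff)
    cutoff_sq = cutoff * cutoff
    found = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get avoids creating empty cells while iterating the dict
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbours:
                    if i < j:
                        dist_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                        if dist_sq <= cutoff_sq:
                            found.add((i, j))
    return sorted(found)
```

Because particles in one cell share the same set of neighbouring cells, the data for those cells can be loaded once and reused, which is the kind of data reuse the abstract refers to.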
Pairwise DNA Sequence Alignment Optimization
2015
This chapter presents a parallel implementation of the Smith-Waterman algorithm to accelerate the pairwise alignment of DNA sequences. This algorithm is especially computationally demanding for long DNA sequences. Parallelization approaches are examined in order to deeply explore the inherent parallelism within Intel Xeon Phi coprocessors. This chapter looks at exploiting instruction-level parallelism within 512-bit single instruction multiple data instructions (vectorization) as well as thread-level parallelism over the many cores (multithreading using OpenMP). Between coprocessors, device-level parallelism through the compute power of clusters including Intel Xeon Phi coprocessors using M…
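For readers unfamiliar with the Smith-Waterman recurrence named in this abstract, the following is a minimal sequential sketch in Python. It is not the chapter's Xeon Phi implementation: the function name `smith_waterman_score`, the scoring values, and the linear gap penalty are all assumptions chosen for illustration.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of two sequences.

    Assumptions (not taken from the chapter): linear gap penalty and the
    default scores above. Only two rows of the score matrix are kept.
    """
    prev = [0] * (len(seq_b) + 1)
    best = 0
    for a in seq_a:
        curr = [0]  # first column is always 0 for local alignment
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            up = prev[j] + gap        # gap in seq_b
            left = curr[j - 1] + gap  # gap in seq_a
            cell = max(0, diag, up, left)  # clamp at 0: alignment may restart
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That independence is what vectorized and multithreaded implementations typically exploit.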
Versatile Direct and Transpose Matrix Multiplication with Chained Operations: An Optimized Architecture Using Circulant Matrices
2016
With growing demands in real-time control, classification or prediction, algorithms become more complex while low power and small size devices are required. Matrix multiplication (direct or transpose) is common for such computation algorithms. In numerous algorithms, it is also required to perform matrix multiplication repeatedly, where the result of a multiplication is further multiplied again. This work describes a versatile computation procedure and architecture: one of the matrices is stored in internal memory in its circulant form, then, a sequence of direct or transpose multiplications can be performed without timing penalty. The architecture proposes a RAM-ALU block for each matrix c…
Numerical study of nonlinear dispersive partial differential equations (original title: Etude numérique d'équations aux dérivées partielles non linéaires et dispersives)
2011
Numerical analysis has become a powerful resource in the study of partial differential equations (PDEs), making it possible to illustrate existing theorems and formulate conjectures. With sophisticated methods, questions that previously seemed inaccessible, such as rapid oscillations or blow-up of solutions, can be addressed approximately. Rapid oscillations in solutions are observed in dispersive PDEs without dissipation where solutions of the corresponding PDEs without dispersion present shocks. To resolve these oscillations numerically, efficient methods that do not rely on artificial numerical dissipation are necessary, in particular for the study of PDEs in several dimensions carried out in this work. As stud…
Random Slicing: Efficient and Scalable Data Placement for Large-Scale Storage Systems
2014
The ever-growing amount of data requires highly scalable storage solutions. The most flexible approach is to use storage pools that can be expanded and scaled down by adding or removing storage devices. To make this approach usable, it is necessary to provide a solution to locate data items in such a dynamic environment. This article presents and evaluates the Random Slicing strategy, which incorporates lessons learned from table-based, rule-based, and pseudo-randomized hashing strategies and is able to provide a simple and efficient strategy that scales up to handle exascale data. Random Slicing keeps a small table with information about previous storage system insert and remove operations…
Randomized renaming in shared memory systems.
2021
Renaming is a task in distributed computing where n processes are assigned new names from a name space of size m. The problem is called tight if m = n, and loose if m > n. In recent years renaming came to the fore again and new algorithms were developed. For tight renaming in asynchronous shared memory systems, Alistarh et al. describe a construction based on the AKS network that assigns all names within O(log n) steps per process. They also show that, depending on the size of the name space, loose renaming can be done considerably faster. For m = (1 + ϵ) ⋅ n and constant ϵ, they achieve a step complexity of O(log log n). In this paper we consider tight as well as loos…
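The renaming task itself can be made concrete with a small sequential simulation in Python. This is only a naive random-probing baseline, not the AKS-based construction of Alistarh et al. nor the algorithms of this paper. The function name `random_probe_renaming` is hypothetical, and the shared test-and-set registers are modelled as a plain list of booleans.

```python
import random


def random_probe_renaming(n, m, rng):
    """Assign each of n processes a distinct name from range(m), with m >= n.

    Each simulated process repeatedly picks a uniformly random name and keeps
    the first one that is still free (a stand-in for winning a test-and-set).
    Returns the chosen names and the number of probes each process needed.
    """
    if m < n:
        raise ValueError("name space must hold at least n names")
    taken = [False] * m  # one simulated test-and-set bit per name
    names, probes = [], []
    for _ in range(n):
        count = 0
        while True:
            count += 1
            candidate = rng.randrange(m)
            if not taken[candidate]:
                taken[candidate] = True
                names.append(candidate)
                probes.append(count)
                break
    return names, probes


# Usage: tight renaming uses m == n, loose renaming uses m > n.
names, probes = random_probe_renaming(8, 16, random.Random(1))
```

In this baseline, the last processes to pick a name in the tight case (m = n) need many probes because few names remain free, while a larger name space keeps the probe count low. This gives some intuition for why loose renaming can be faster.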