Search results for "Shared memory"
showing 10 items of 26 documents
A Low Cost Solution for 2D Memory Access
2006
Many of the new coding tools in the H.264/AVC video coding standard are based on 2D processing resulting in row-wise and column-wise memory accesses starting from arbitrary memory locations. This paper proposes a low cost solution for efficient realization of these 2D block memory accesses on sub-word parallel processors. It is based on the use of simple register-based data permutation networks placed between the processor and memory. The data rearrangement capabilities of the networks can further be extended with more complex control schemes. With the proposed control schemes, the networks enable row and column accesses from arbitrary memory locations for blocks of data while maintaining f…
Use of parallel computing to improve the accuracy of calculated molecular properties
1998
Calculation of electron correlation energy in molecules is unavoidable in accurate studies of chemical reactivity. However, these calculations involve, a computational effort several, even in the simplest cases, orders of magnitude larger than the computer power nowadays available. In this work the possibility of parallelize the calculations of the electron correlation energy is studied. The formalism chosen is the dressing of matrices in both distributed and shared memory parallel systems MIMD. Algorithms developed on PVM are presented, and the results are evaluated on several platforms. These results show that the parallel techniques are useful in order to decrease very appreciably the ti…
Parallel Algorithms for Listing Well-Formed Parentheses Strings
1998
We present two cost-optimal parallel algorithms generating the set of all well-formed parentheses strings of length 2n with constant delay for each generated string. In our first algorithm we generate in lexicographic order well-formed parentheses strings represented by bitstrings, and in the second one we use the representation by weight sequences. In both cases the computational model is based on an architecture CREW PRAM, where each processor performs the same algorithm simultaneously on a different set of data. Different processors can access the shared memory at the same time to read different data in the same or different memory locations, but no two processors are allowed to write i…
SYSTOLIC GENERATION OF k-ARY TREES
1999
The only parallel generating algorithms for k-ary trees are those of Akl and Stojmenović in 1996 and of Vajnovszki and Phillips in 1997. In the first of them, trees are represented by an inversion table and the processor model is a linear aray multicomputer. In the second, trees are represented by bitstrings and the algorithm executes on a shared memory multiprocessor. In this paper we give a parallel generating algorithm for k-ary trees represented by generalized P–sequences for execution on a linear array multicomputer.
Analyzing the Energy Efficiency of the Memory Subsystem in Multicore Processors
2014
In this paper we analyze the energy overhead incurred when operating with data stored in different levels of the memory subsystem (cache levels and DDR chips) of current multicore architectures. Our approach builds upon servet, a portable framework for the memory characterization of multicore processors, extending this suite with a power-related test that, when applied to a platform equipped with a power measurement mechanism, provides information on the efficiency of memory energy usage. As additional contributions, i) we provide a complete experimental study of the impact that the CPU performance states (also known as P-states) exert on the memory energy efficiency of a collection of rece…
Simurgh
2021
The availability of non-volatile main memory (NVMM) has started a new era for storage systems and NVMM specific file systems can support extremely high data and metadata rates, which are required by many HPC and data-intensive applications. Scaling metadata performance within NVMM file systems is nevertheless often restricted by the Linux kernel storage stack, while simply moving metadata management to the user space can compromise security or flexibility. This paper introduces Simurgh, a hardware-assisted user space file system with decentralized metadata management that allows secure metadata updates from within user space. Simurgh guarantees consistency, durability, and ordering of updat…
Suffix Array Construction on Multi-GPU Systems
2019
Suffix arrays are prevalent data structures being fundamental to a wide range of applications including bioinformatics, data compression, and information retrieval. Therefore, various algorithms for (parallel) suffix array construction both on CPUs and GPUs have been proposed over the years. Although providing significant speedup over their CPU-based counterparts, existing GPU implementations share a common disadvantage: input text sizes are limited by the scarce memory of a single GPU. In this paper, we overcome aforementioned memory limitations by exploiting multi-GPU nodes featuring fast NVLink interconnects. In order to achieve high performance for this communication-intensive task, we …
Solution of time-independent Schrödinger equation by the imaginary time propagation method
2007
Numerical solution of eigenvalues and eigenvectors of large matrices originating from discretization of linear and non-linear Schrodinger equations using the imaginary time propagation (ITP) method is described. Convergence properties and accuracy of 2nd and 4th order operator-splitting methods for the ITP method are studied using numerical examples. The natural convergence of the method is further accelerated with a new dynamic time step adjustment method. The results show that the ITP method has better scaling with respect to matrix size as compared to the implicitly restarted Lanczos method. An efficient parallel implementation of the ITP method for shared memory computers is also demons…
Unified Parallel C++
2018
Abstract Although MPI is commonly used for parallel programming on distributed-memory systems, Partitioned Global Address Space (PGAS) approaches are gaining attention for programming modern multi-core CPU clusters. They feature a hybrid memory abstraction: distributed memory is viewed as a shared memory that is partitioned among nodes in order to simplify programming. In this chapter you will learn about Unified Parallel C++ (UPC++), a library-based extension of C++ that gathers the advantages of both PGAS and Object Oriented paradigms. The examples included in this chapter will help you to understand the main features of PGAS languages and how they can simplify the task of programming par…
Optimized Parallel Implementation of Face Detection based on GPU component
2015
Display Omitted An algorithm for face detection has been implemented on CPU.An acceleration of this algorithm on GPU migration.Performance of GPU implementation shows the effectiveness of this implementation.Another optimization method on GPU are operated. Face detection is an important aspect for various domains such as: biometrics, video surveillance and human computer interaction. Generally a generic face processing system includes a face detection, or recognition step, as well as tracking and rendering phase. In this paper, we develop a real-time and robust face detection implementation based on GPU component. Face detection is performed by adapting the Viola and Jones algorithm. We hav…