Search results for "parallel computing"
showing 10 items of 189 documents
A Low Cost Solution for 2D Memory Access
2006
Many of the new coding tools in the H.264/AVC video coding standard are based on 2D processing resulting in row-wise and column-wise memory accesses starting from arbitrary memory locations. This paper proposes a low cost solution for efficient realization of these 2D block memory accesses on sub-word parallel processors. It is based on the use of simple register-based data permutation networks placed between the processor and memory. The data rearrangement capabilities of the networks can further be extended with more complex control schemes. With the proposed control schemes, the networks enable row and column accesses from arbitrary memory locations for blocks of data while maintaining f…
A mixed geometric-systolic approach to parallel molecular dynamics simulations
1995
We have developed a flexible and efficient method of performing molecular dynamics simulations on distributed memory parallel computers. The novel feature is to use simultaneously spatial partitioning and systolic loop approaches according to a strategy which, for a given simulation, adapts itself to the multiprocessor system, allowing to approach optimal performance. The method assures high efficiencies even in situations in which, due to the exceeding large number of processors, the usage of a pure spatial decomposition would be impossible. The algorithm provides as particular cases both the pure spatial partitioning and the pure systolic parallelization schemes, so that its adoption assu…
LARGE-SCALE SIMULATIONS IN CONDENSED MATTER PHYSICS —THE NEED FOR A TERAFLOP COMPUTER
1992
The introduction of vector processors {“supercomputers” with a performance in the range of 109 floating point operations (1 GFLOP) per second} has had an enormous impact on computational condensed matter physics. The possibility of a substantially enhanced performance by massively parallel processors (“teraflop” machines with 1012 floating point operations per second) will allow satisfactory treatment of a large range of important scientific problems which have to a great extent thus far escaped numerical resolution. The present paper describes only a few examples (out of a long list of interesting research problems!) for which the availability of “teraflops” will allow spectacular progres…
Accelerating bioinformatics applications via emerging parallel computing systems [Guest editorial]
2015
The papers in this issue focus on advanced parallel computing systems for bioinformatics applications. This papers provide a forum to publish recent advances in the improvement of handling bioinformatics problems on emerging parallel computing systems. These systems can be characterized by exploiting different types of parallelism, including fine-grained versus coarse-grained and thread-level parallelism versus datalevel parallelism versus request-level parallelism. Hence, parallel computing systems based on multi- and many-core CPUs, many-core GPUs, vector processors, or FPGAs offer the promise to massively accelerate many bioinformatics algorithms and applications, ranging from computeint…
Use of parallel computing to improve the accuracy of calculated molecular properties
1998
Calculation of electron correlation energy in molecules is unavoidable in accurate studies of chemical reactivity. However, these calculations involve, a computational effort several, even in the simplest cases, orders of magnitude larger than the computer power nowadays available. In this work the possibility of parallelize the calculations of the electron correlation energy is studied. The formalism chosen is the dressing of matrices in both distributed and shared memory parallel systems MIMD. Algorithms developed on PVM are presented, and the results are evaluated on several platforms. These results show that the parallel techniques are useful in order to decrease very appreciably the ti…
A prospect for computing in porous materials research: Very large fluid flow simulations
2016
Abstract Properties of porous materials, abundant both in nature and industry, have broad influences on societies via, e.g. oil recovery, erosion, and propagation of pollutants. The internal structure of many porous materials involves multiple scales which hinders research on the relation between structure and transport properties: typically laboratory experiments cannot distinguish contributions from individual scales while computer simulations cannot capture multiple scales due to limited capabilities. Thus the question arises how large domain sizes can in fact be simulated with modern computers. This question is here addressed using a realistic test case; it is demonstrated that current …
On the Use of GPU for Accelerating Communication-Aware Mapping Techniques
2015
Different communication-aware mapping techniques were proposed in recent years for improving the performance of distributed systems based on both, off-chip and on-chip networks. Some of these proposals were based on heuristic search for finding pseudo-optimal assignments of tasks and processing elements. However, the technology integration improvements have allowed a significant increase in the number of network nodes, requiring the acceleration of the heuristic search. In this paper, we propose a comparative study of the local search method used in a communication-aware mapping technique, when implemented on different parallel architectures. We compare the performance provided by a version…
GPU-laskennan optimointi
2013
Näytönohjaimet, grafiikkasuorittimet, tarjoavat rinnakkaisen laskennan alustan, jossa voidaan suorittaa ohjelmakoodia satojen ydinten toimesta. Tämä alusta mahdollistaa matemaattisesti työläiden ongelmien ratkaisemisen tehokkaasti. Grafiikkasuorittimen rinnakkainen suoritusympäristö kuitenkin eroaa suuresti tietokoneen suorittimen peräkkäisestä suoritusympäristöstä. Ongelmien ratkaisemiseksi tehokkaasti rinnakkaisympäristössä on noudettava ohjelmointimenetelmiä, jotka soveltuvat erityisesti rinnakkaisympäristöön. Tässä työssä tarkastellaan rinnakkaisen laskennan perusteita, miten erilaiset ohjelmointimenetelmät vaikuttavat ohjelman suoriutumiseen grafiikkasuorittimella sekä miten voidaan sa…
A novel hardware accelerator for the HEVC intra prediction
2015
International audience; A novel hardware accelerator for the High Efficiency Video Coding (HEVC) intra prediction is presented in this paper in order to reduce the computation complexity within this standard and to accelerate the concerned calculations. We propose a new pipelined structure that we called Processing Element (PE) to execute all angular modes, and we repeat it in five paths that our architecture composed of. We present also another structure to carry out the Planar mode. This architecture supports all intra prediction modes for all prediction unit sizes. The synthesis results show that our design can run at 213 MHz for Xilinx Virtex 6 and is capable to process real time 120 10…
A HARDWARE SOLUTION FOR HEVC INTRA PREDICTION LOSSLESS CODING
2015
International audience; The lossless coding mode of the High Efficiency Video Coding (HEVC) main profile that bypasses transform, quantization, and in-loop filters is described. Compared to the HEVC non-lossless coding mode, the HEVC lossless coding mode provides perfect fidelity and an average bit-rate reduction of 3.2%–13.2%. It also significantly outperforms the existing lossless compression solutions, such as JPEG2000 and JPEG-LS for images as well as WinRAR for data archiving. A fully parallel-based solution is presented in this paper in order to reduce processing time and computation complexity resulting from intra prediction. Two higher performance structures are designed to perform …