Search results for "Parallel computing"
showing 10 items of 189 documents
Multi-objective DSE algorithms' evaluations on processor optimization
2013
Very complex micro-architectures, like complex superscalar/SMT or multicore systems, have a large number of configurations. Exploring this huge design space while trying to optimize multiple objectives, like performance, power consumption and hardware complexity, is a real challenge. In this paper, using the multi-objective design space exploration tool FADSE, we tried to optimize the hardware parameters of the complex superscalar Grid ALU Processor. We compared how different heuristic algorithms handle the DSE optimization. Three of these algorithms are taken from the jMetal library (NSGAII, SPEA2 and SMPSO), while the other two, CNSGAII and MOHC, were implemented by us. We show that in this huge design …
A predictive function optimization algorithm for multi-spectral skin lesion assessment
2015
The newly introduced Kubelka-Munk Genetic Algorithm (KMGA) is a promising technique used in the assessment of skin lesions. Unfortunately, this method is computationally expensive due to its function inversion process. In this paper, we design a Predictive Function Optimization Algorithm to improve the efficiency of KMGA by speeding up its convergence rate. Using this approach, a High-Convergence-Rate KMGA (HCR-KMGA) is implemented on multi-core processors and on FPGA devices. Furthermore, the implementations are optimized using parallel computing techniques. Intensive experiments demonstrate that HCR-KMGA can effectively accelerate the KMGA method, while improv…
Comparison of parallel implementation of some multi-level Schwarz methods for singularly perturbed parabolic problems
1999
Parallel multi-level algorithms combining a time discretization and an overlapping domain decomposition technique are applied to the numerical solution of singularly perturbed parabolic problems. Two methods based on the Schwarz alternating procedure are considered: a two-level method with auxiliary “correcting” subproblems as well as a three-level method with auxiliary “predicting” and “correcting” subproblems. Moreover, modifications of the methods using time extrapolation on subdomain interfaces are investigated. Emphasis is given to the description of the algorithms as well as their computer realization on a distributed memory multiprocessor computer. Numerical experiments …
Parallelization of the Wolff single-cluster algorithm.
2010
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of the applications, where a sophisticated shuffling scheme is used to generate pseudorandom numbers of high quality, and an iterative method is applied to find the critical temperature of the 3D Ising model with great accuracy. For the lattice with linear size L=1024, we have reached a speedup of about 1.79 times …
Lattice quantum hadrodynamics on a CRAY Y-MP
1992
Quantum corrections to the mean-field equation of state for nuclear matter are estimated in a lattice simulation of quantum hadrodynamics on a CRAY Y-MP. In contrast with lattice quantum chromodynamics, where coordinate space methods are the standard, the calculations are carried out in momentum space and on nonhypercubic (irregular) lattices. The quantum corrections to the known, mean-field equation of state were found to be considerable. The time frame of the project and the large computational needs of the program required the use of powerful supercomputers, like the CRAY Y-MP, which are capable of performing at a very high computing speed by using both vector and parallel hardware, the …
Exact Response Time Analysis of Hierarchical Fixed-Priority Scheduling
2009
Hierarchical scheduling has recently been used to provide temporal isolation to embedded virtualised systems. Response time analysis is a common way to derive a schedulability test for these systems. This paper points out that the response time analysis for hierarchical fixed-priority scheduling found in the literature is exact only for tasks of the highest priority domain. For the remaining tasks, it is only an upper bound. In our work, we provide the exact analysis and compare it with previously published work.
Accelerating H.264 inter prediction in a GPU by using CUDA
2010
H.264/AVC defines a very efficient algorithm for inter prediction, but it takes too much time. With the emergence of General Purpose Graphics Processing Units (GPGPU), a new door has been opened to support this video algorithm on these small processing units. In this paper, a step forward is taken towards an implementation of the H.264/AVC inter prediction algorithm on a GPU using Compute Unified Device Architecture (CUDA). The results show a negligible rate-distortion drop with an average time reduction of up to 93.6%.
Optimizing H.264/AVC interprediction on a GPU-based framework
2011
H.264/MPEG-4 part 10 is the latest standard for video compression and promises a significant advance in terms of quality and distortion compared with the commercial standards currently most in use such as MPEG-2 or MPEG-4. To achieve this better performance, H.264 adopts a large number of new/improved compression techniques compared with previous standards, albeit at the expense of higher computational complexity. In addition, in recent years new hardware accelerators have emerged, such as graphics processing units (GPUs), which provide a new opportunity to reduce complexity for a large variety of algorithms. However, current GPUs suffer from higher power consumption requirements because of…
An implicitly parallel EDA based on restricted Boltzmann machines
2014
We present a parallel version of RBM-EDA. RBM-EDA is an Estimation of Distribution Algorithm (EDA) that models dependencies between decision variables using a Restricted Boltzmann Machine (RBM). In contrast to other EDAs, RBM-EDA mainly uses matrix-matrix multiplications for model estimation and sampling. Hence, for implementation, standard libraries for linear algebra can be used. This allows an easy parallelization and leads to a high utilization of parallel architectures. The probabilistic model of the parallel version and the version on a single core are identical. We explore the speedups gained from running RBM-EDA on a Graphics Processing Unit. For problems of bounded difficulty like …
Scalable Dense Factorizations for Heterogeneous Computational Clusters
2008
This paper discusses the design and the implementation of the LU factorization routines included in the Heterogeneous ScaLAPACK library, which is built on top of ScaLAPACK. These routines are used in the factorization and solution of a dense system of linear equations. They are implemented using optimized PBLAS, BLACS and BLAS libraries for heterogeneous computational clusters. We present the details of the implementation as well as performance results on a heterogeneous computing cluster.