Search results for "Parallel computing"

showing 10 items of 189 documents

Multi-objective DSE algorithms' evaluations on processor optimization

2013

Complex micro-architectures, such as superscalar/SMT or multicore systems, have a very large number of configurations. Exploring this huge design space while optimizing multiple objectives, such as performance, power consumption and hardware complexity, is a real challenge. In this paper, using the multi-objective design space exploration tool FADSE, we tried to optimize the hardware parameters of the complex superscalar Grid ALU Processor. We compared how different heuristic algorithms handle the DSE optimization. Three of these algorithms are taken from the jMetal library (NSGAII, SPEA2 and SMPSO), while the other two, CNSGAII and MOHC, were implemented by us. We show that in this huge design …

Power consumption; Computer science; Heuristic (computer science); Design space exploration; Feature extraction; Process (computing); Feature selection; Parallel computing; Grid; Design space; Algorithm

2013 IEEE 9th International Conference on Intelligent Computer Communication and Processing (ICCP)

researchProduct

A predictive function optimization algorithm for multi-spectral skin lesion assessment

2015

The newly introduced Kubelka-Munk Genetic Algorithm (KMGA) is a promising technique for the assessment of skin lesions. Unfortunately, this method is computationally expensive due to its function inversion process. In this paper, we design a Predictive Function Optimization Algorithm to improve the efficiency of KMGA by speeding up its convergence rate. Using this approach, a High-Convergence-Rate KMGA (HCR-KMGA) is implemented on multi-core processors and on FPGA devices. Furthermore, the implementations are optimized using parallel computing techniques. Intensive experiments demonstrate that HCR-KMGA can effectively accelerate the KMGA method, while improv…

Predictive function; Rate of convergence; Optimization algorithm; Computer science; Genetic algorithm; Process (computing); Function (mathematics); Parallel computing; Field-programmable gate array; Skin lesion; Algorithm

2015 23rd European Signal Processing Conference (EUSIPCO)

researchProduct

Comparison of parallel implementation of some multi-level Schwarz methods for singularly perturbed parabolic problems

1999

Parallel multi-level algorithms combining a time discretization and an overlapping domain decomposition technique are applied to the numerical solution of singularly perturbed parabolic problems. Two methods based on the Schwarz alternating procedure are considered: a two-level method with auxiliary “correcting” subproblems and a three-level method with auxiliary “predicting” and “correcting” subproblems. Moreover, modifications of the methods using time extrapolation on subdomain interfaces are investigated. Emphasis is placed on the description of the algorithms as well as their computer realization on a distributed memory multiprocessor computer. Numerical experiments …

Predictor–corrector method; Parallel computing; Singular perturbation; Partial differential equation; Discretization; Applied Mathematics; Mathematical analysis; Extrapolation; MathematicsofComputing_NUMERICALANALYSIS; Domain decomposition methods; Computational Mathematics; Multi-level Schwarz method; Applied mathematics; Singularly perturbed parabolic problem; Distributed memory; Schwarz alternating method; Mathematics

Journal of Computational and Applied Mathematics

researchProduct

Parallelization of the Wolff single-cluster algorithm.

2010

A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models, and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of applications in which a sophisticated shuffling scheme is used to generate high-quality pseudorandom numbers and an iterative method is applied to find the critical temperature of the 3D Ising model with great accuracy. For the lattice with linear size L=1024, we have reached a speedup of about 1.79 times …

Pseudorandom number generator; Speedup; Shuffling; Iterative method; Spin model; Ising model; Multiprocessing; Parallel computing; Serial code; Algorithm; Mathematics

Physical review. E, Statistical, nonlinear, and soft matter physics

researchProduct
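For orientation, the serial Wolff single-cluster update that the record above parallelizes can be sketched as follows. This is a minimal illustration only, not the paper's OpenMP implementation: the function name `wolff_step`, the flat-list spin storage, the coupling J = 1, and the use of Python's built-in generator (rather than the shuffling scheme the abstract mentions) are all assumptions made here.

```python
import math
import random

def wolff_step(spins, L, beta, rng):
    """One serial Wolff cluster flip on an L x L x L periodic Ising lattice.

    spins: flat list of +1/-1 values, index = x + y*L + z*L*L (assumed layout).
    beta:  inverse temperature, with coupling J = 1 (assumption).
    Returns the size of the flipped cluster.
    """
    p_add = 1.0 - math.exp(-2.0 * beta)   # bond-activation probability
    seed = rng.randrange(L ** 3)          # random seed site
    s0 = spins[seed]                      # orientation of the cluster
    spins[seed] = -s0                     # flip sites as they join the cluster
    stack = [seed]
    size = 1
    while stack:
        site = stack.pop()
        x, y, z = site % L, (site // L) % L, site // (L * L)
        for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            n = ((x + dx) % L) + ((y + dy) % L) * L + ((z + dz) % L) * L * L
            # Only aligned, not-yet-flipped neighbours can join the cluster.
            if spins[n] == s0 and rng.random() < p_add:
                spins[n] = -s0
                stack.append(n)
                size += 1
    return size
```

Calling `wolff_step([1] * 64, 4, 0.22, random.Random(1))` performs one update on a 4×4×4 lattice. The cluster growth loop is inherently sequential, which is consistent with the abstract's remark that the benefit of parallelization depends on the application.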

Lattice quantum hadrodynamics on a CRAY Y-MP

1992

Quantum corrections to the mean-field equation of state for nuclear matter are estimated in a lattice simulation of quantum hadrodynamics on a CRAY Y-MP. In contrast with lattice quantum chromodynamics, where coordinate space methods are the standard, the calculations are carried out in momentum space and on nonhypercubic (irregular) lattices. The quantum corrections to the known, mean-field equation of state were found to be considerable. The time frame of the project and the large computational needs of the program required the use of powerful supercomputers, like the CRAY Y-MP, which are capable of performing at a very high computing speed by using both vector and parallel hardware, the …

Quantum chromodynamics; Equation of state; Computer science; Numerical analysis; Monte Carlo method; Position and momentum space; Parallel computing; Nuclear matter; Supercomputer; Theoretical Computer Science; Computational science; Hardware and Architecture; Quantum hadrodynamics; Linear algebra; Coordinate space; Quantum; Software; Information Systems

The Journal of Supercomputing

researchProduct

Exact Response Time Analysis of Hierarchical Fixed-Priority Scheduling

2009

Hierarchical scheduling has recently been used to provide temporal isolation to embedded virtualised systems. Response time analysis is a common way to derive a schedulability test for these systems. This paper points out that the response time analysis for hierarchical fixed-priority scheduling found in the literature is exact only for tasks of the highest priority domain. For the rest of the tasks, it is only an upper bound. In our work, we provide the exact analysis and compare it with previously published works.

Rate-monotonic scheduling; Theoretical computer science; Computer science; Server; Response time; Dynamic priority scheduling; Parallel computing; Temporal isolation; Upper and lower bounds; Fair-share scheduling; Scheduling (computing)

2009 15th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications

researchProduct
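As background for the record above, the classic response time recurrence for flat (non-hierarchical) fixed-priority scheduling can be sketched as below. This is not the hierarchical analysis the paper proposes. The function name `response_time`, the `(C, T)` task tuples, integer time units, and the deadline-equals-period assumption are all illustrative choices made here.

```python
def response_time(tasks, i):
    """Worst-case response time of task i under flat fixed-priority scheduling.

    tasks: list of (C, T) pairs, highest priority first, where C is the
           worst-case execution time and T the period (integers, assumed).
    Iterates R = C_i + sum_j ceil(R / T_j) * C_j over higher-priority tasks j.
    Returns the fixed point, or None if it exceeds the period (taken here
    as the deadline, which is an assumption).
    """
    c_i, t_i = tasks[i]
    r = c_i
    while r <= t_i:
        # -(-r // t_j) is an exact integer ceiling of r / t_j.
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        nxt = c_i + interference
        if nxt == r:
            return r          # fixed point reached
        r = nxt
    return None               # deadline missed


tasks = [(1, 4), (2, 6), (3, 12)]
print([response_time(tasks, i) for i in range(len(tasks))])  # [1, 3, 10]
```

Each task in the example meets its deadline because its response time does not exceed its period.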

Accelerating H.264 inter prediction in a GPU by using CUDA

2010

H.264/AVC defines a very efficient inter prediction algorithm, but it is computationally time-consuming. The emergence of General Purpose Graphics Processing Units (GPGPU) has opened a new door to running this video algorithm on these small processing units. This paper takes a step towards an implementation of the H.264/AVC inter prediction algorithm on a GPU using Compute Unified Device Architecture (CUDA). The results show a negligible rate-distortion drop with a time reduction of up to 93.6% on average.

Reduction (complexity); CUDA; Coprocessor; Computer science; Image processing; Parallel computing; General-purpose computing on graphics processing units; Graphics; Data compression

2010 Digest of Technical Papers International Conference on Consumer Electronics (ICCE)

researchProduct

Optimizing H.264/AVC interprediction on a GPU-based framework

2011

H.264/MPEG-4 part 10 is the latest standard for video compression and promises a significant advance in terms of quality and distortion compared with the commercial standards currently most in use such as MPEG-2 or MPEG-4. To achieve this better performance, H.264 adopts a large number of new/improved compression techniques compared with previous standards, albeit at the expense of higher computational complexity. In addition, in recent years new hardware accelerators have emerged, such as graphics processing units (GPUs), which provide a new opportunity to reduce complexity for a large variety of algorithms. However, current GPUs suffer from higher power consumption requirements because of…

Reduction (complexity); Computational Theory and Mathematics; Computer Networks and Communications; Computer science; Distortion; Motion estimation; Symmetric multiprocessor system; Energy consumption; Parallel computing; Software; Computer Science Applications; Theoretical Computer Science; Data compression

Concurrency and Computation: Practice and Experience

researchProduct

An implicitly parallel EDA based on restricted Boltzmann machines

2014

We present a parallel version of RBM-EDA. RBM-EDA is an Estimation of Distribution Algorithm (EDA) that models dependencies between decision variables using a Restricted Boltzmann Machine (RBM). In contrast to other EDAs, RBM-EDA mainly uses matrix-matrix multiplications for model estimation and sampling. Hence, for implementation, standard libraries for linear algebra can be used. This allows an easy parallelization and leads to a high utilization of parallel architectures. The probabilistic model of the parallel version and the version on a single core are identical. We explore the speedups gained from running RBM-EDA on a Graphics Processing Unit. For problems of bounded difficulty like …

Restricted Boltzmann machine; Speedup; Estimation of distribution algorithm; Artificial neural network; Computer science; Linear algebra; Graphics processing unit; Boltzmann machine; Parallel computing

Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation

researchProduct

Scalable Dense Factorizations for Heterogeneous Computational Clusters

2008

This paper discusses the design and the implementation of the LU factorization routines included in the Heterogeneous ScaLAPACK library, which is built on top of ScaLAPACK. These routines are used in the factorization and solution of a dense system of linear equations. They are implemented using optimized PBLAS, BLACS and BLAS libraries for heterogeneous computational clusters. We present the details of the implementation as well as performance results on a heterogeneous computing cluster.

ScaLAPACK; Computer science; MathematicsofComputing_NUMERICALANALYSIS; Symmetric multiprocessor system; Parallel computing; LU decomposition; Computational science; law.invention; Matrix decomposition; Factorization; law; Scalability; Linear algebra; Concurrent computing

2008 International Symposium on Parallel and Distributed Computing

researchProduct
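To make the factor-then-solve pattern mentioned in the record above concrete, here is a minimal serial sketch of LU factorization with partial pivoting followed by forward and back substitution. It is a plain-Python illustration, not the Heterogeneous ScaLAPACK routines or their interface. The names `lu_factor` and `lu_solve` are chosen here for illustration and are not taken from any library.

```python
def lu_factor(A):
    """Factor PA = LU with partial pivoting.

    Returns (LU, perm): LU stores U on and above the diagonal and the
    unit-lower-triangular L below it; perm[k] is the original row index
    that ended up in row k.
    """
    n = len(A)
    LU = [[float(v) for v in row] for row in A]   # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Pick the largest pivot in column k for numerical stability.
        p = max(range(k, n), key=lambda r: abs(LU[r][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                  # multiplier, stored in L
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]   # update trailing block
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve Ax = b given the factorization returned by lu_factor."""
    n = len(LU)
    x = [float(b[p]) for p in perm]               # apply the row permutation
    for i in range(n):                            # forward substitution (L)
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    for i in range(n - 1, -1, -1):                # back substitution (U)
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x
```

For example, `lu_solve(*lu_factor([[0, 2], [1, 1]]), [4, 3])` solves the system 2y = 4, x + y = 3. Distributed libraries apply the same steps to blocks of the matrix spread across processes.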