Search results for "Parallel computing"
showing 10 items of 189 documents
Parallel Schwarz methods for convection-dominated semilinear diffusion problems
2002
Parallel two-level Schwarz methods are proposed for the numerical solution of convection-diffusion problems, with the emphasis on convection-dominated problems. Two variants of the methodology are investigated. They differ from each other by the type of boundary conditions (Dirichlet- or Neumann-type) posed on a part of the second-level subdomain interfaces. Convergence properties of the two-level Schwarz methods are experimentally compared with those of a variant of the standard multi-domain Schwarz alternating method. Numerical experiments performed on a distributed memory multiprocessor computer illustrate the parallel efficiency of the methods.
Optimized Parallel Implementation of Face Detection based on GPU component
2015
An algorithm for face detection has been implemented on CPU. The algorithm is accelerated by migrating it to GPU. The performance of the GPU implementation shows the effectiveness of this implementation. Further optimization methods are applied on GPU. Face detection is an important aspect for various domains such as biometrics, video surveillance and human-computer interaction. Generally, a generic face processing system includes a face detection or recognition step, as well as tracking and rendering phases. In this paper, we develop a real-time and robust face detection implementation based on GPU component. Face detection is performed by adapting the Viola and Jones algorithm. We hav…
Parallel and scalable short-read alignment on multi-core clusters using UPC++
2016
The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the …
The combined distribution/assignment problem in transportation network planning: a parallel approach on hypercube architecture
1995
The joint distribution/assignment problem plays a central role in urban transport network planning. In this problem, according to the mathematical model proposed by S. P. Evans, the trips are iteratively calculated and assigned to the network in such a way that the resulting traffic flow pattern satisfies the selfish equilibrium condition. Unfortunately, the number of variables and constraints increases sharply with the size of the network, causing long computation times for the equilibrium solution. In this paper an nCUBE 2 parallel computing architecture is employed to solve the combined problem and to assess the potential of MIMD machines to handle large-scale transportation network p…
Classifier Optimized for Resource-constrained Pervasive Systems and Energy-efficiency
2017
Computational intelligence is often used in smart environment applications in order to determine a user’s context. Many computational intelligence algorithms are complex and resource-consuming, which can be problematic for implementation devices such as FPGAs, ASICs and low-level microcontrollers. These types of devices are, however, highly useful in pervasive and mobile computing due to their small size, energy-efficiency and ability to provide fast real-time responses. In this paper, we propose a classifier, CORPSE, specifically targeted for implementation in FPGAs, ASICs or low-level microcontrollers. CORPSE has a small memory footprint, is computationally inexpensive, and is suitable for…
A distributed genetic algorithm for restoration of vertical line scratches
2008
This paper reports a distributed algorithm for the restoration of still frames corrupted by vertical line scratches. The restoration is here approached as an optimisation problem, and is solved using an ad-hoc Genetic Algorithm. The distributed algorithm is designed following a pipeline logical structure. The front end is a network of standard workstations with heterogeneous operating systems. The image quality is appreciable and the computational time is quite low with respect to the sequential version.
PenRed: An extensible and parallel Monte-Carlo framework for radiation transport based on PENELOPE
2021
Monte Carlo methods provide detailed and accurate results for radiation transport simulations. Unfortunately, the high computational cost of these methods limits their usage in real-time applications. Moreover, existing computer codes do not provide a methodology for adapting this kind of simulation to specific problems without advanced knowledge of the corresponding code system, and this restricts their applicability. To help solve these current limitations, we present PenRed, a general-purpose, stand-alone, extensible and modular framework code based on PENELOPE for parallel Monte Carlo simulations of electron-photon transport through matter. It has been implemented in C++ programming lan…
Parallel Calculation of CCSD and CCSD(T) Analytic First and Second Derivatives.
2007
In this paper we present a parallel adaptation of a highly efficient coupled-cluster algorithm for calculating coupled-cluster singles and doubles (CCSD) and coupled-cluster singles and doubles augmented by a perturbative treatment of triple excitations (CCSD(T)) energies, gradients, and, for the first time, analytic second derivatives. A minimal-effort strategy is outlined that leads to an amplitude-replicated, communication-minimized implementation by parallelizing the time-determining steps for CCSD and CCSD(T). The resulting algorithm is aimed at affordable cluster architectures consisting of compute nodes with sufficient memory and local disk space and that are connected by standard co…
Masclet: a new multidimensional AMR cosmological code
2004
A new cosmological multidimensional hydrodynamic and N-body code based on an Adaptive Mesh Refinement scheme is described and tested. The hydro part is based on modern high-resolution shock-capturing techniques, whereas the N-body approach is based on a Particle Mesh method. The code has been specifically designed for cosmological applications.
The impact of grain size on the efficiency of embedded SIMD image processing architectures
2004
The pixel-per-processing-element (PPE) ratio, the amount of image data directly mapped to each processing element, has a significant impact on the area and energy efficiency of embedded SIMD architectures for image processing applications. This paper quantitatively evaluates the impact of PPE ratio on system performance and efficiency for focal-plane SIMD image processing architectures by comparing throughput, area efficiency, and energy efficiency for a range of common application kernels using architectural and workload simulation. While the impact of grain size is affected by the mix of executed instructions within an application program, the most efficient PPE ratio often does not occur at PE…