Search results for "Parallel"
showing 10 items of 667 documents
An implicitly parallel EDA based on restricted Boltzmann machines
2014
We present a parallel version of RBM-EDA. RBM-EDA is an Estimation of Distribution Algorithm (EDA) that models dependencies between decision variables using a Restricted Boltzmann Machine (RBM). In contrast to other EDAs, RBM-EDA mainly uses matrix-matrix multiplications for model estimation and sampling. Hence, for implementation, standard libraries for linear algebra can be used. This allows an easy parallelization and leads to a high utilization of parallel architectures. The probabilistic model of the parallel version and the version on a single core are identical. We explore the speedups gained from running RBM-EDA on a Graphics Processing Unit. For problems of bounded difficulty like …
Excursus XI. The Profaned Temple (Ezek 8:1-18) and the Valley of Dry Bones (Ezek 37:1-14) in the Light of Hebrew Rhetoric
2021
The context of this study is that commentators on the Book of Ezekiel disagree about the structure of the texts examined and propose differing outlines. The aim of the study was to uncover the structure that the inspired ancient author built into the text. To achieve this aim, the method of Hebrew rhetoric developed by Roland Meynet was applied. The study found that the profaned temple has a parallel-concentric structure consisting of 9 elements (A, B, C, D, E, D’, C’, B’, A’), while the valley of dry bones also has a parallel-concentric structure, made up of 5 elements (A, B, C, B’, A’). The results made it possible to draw a common conclusion for the two …
Scalable Dense Factorizations for Heterogeneous Computational Clusters
2008
This paper discusses the design and the implementation of the LU factorization routines included in the Heterogeneous ScaLAPACK library, which is built on top of ScaLAPACK. These routines are used in the factorization and solution of a dense system of linear equations. They are implemented using optimized PBLAS, BLACS and BLAS libraries for heterogeneous computational clusters. We present the details of the implementation as well as performance results on a heterogeneous computing cluster.
Large-scale genome-wide association studies on a GPU cluster using a CUDA-accelerated PGAS programming model
2015
Detecting epistasis, such as 2-SNP interactions, in genome-wide association studies (GWAS) is an important but time-consuming operation. Consequently, GPUs have already been used to accelerate these studies, reducing the runtime for moderately sized datasets to less than 1 hour. However, single-GPU approaches cannot perform large-scale GWAS in reasonable time. In this work we present multiEpistSearch, a tool to detect epistasis that works on GPU clusters. While CUDA is used for parallelization within each GPU, the workload distribution among GPUs is performed with Unified Parallel C++ (UPC++), a novel extension of C++ that follows the Partitioned Global Address Space (PGAS) model…
Large-Scale Clustering of Short Reads for Metagenomics On GPUs
2013
Checkpointing Workflows for Fail-Stop Errors
2017
We consider the problem of orchestrating the execution of workflow applications structured as Directed Acyclic Graphs (DAGs) on parallel computing platforms that are subject to fail-stop failures. The objective is to minimize expected overall execution time, or makespan. A solution to this problem consists of a schedule of the workflow tasks on the available processors and of a decision of which application data to checkpoint to stable storage, so as to mitigate the impact of processor failures. For general DAGs this problem is hopelessly intractable. In fact, given a solution, computing its expected makespan is still a difficult problem. To address this challenge,…
Serial In-network Processing for Large Stationary Wireless Sensor Networks
2017
In wireless sensor networks, a serial processing algorithm browses nodes one by one and can perform different tasks such as creating a schedule among nodes, querying or gathering data from nodes, supplying nodes with data, etc. Apart from the fact that serial algorithms totally avoid collisions, numerous recent works have confirmed that these algorithms reduce communications and considerably save energy and time in large, dense networks. Yet, due to the path construction complexity, the proposed algorithms are not optimal and their performance can be further enhanced. To do so, in the present paper, we propose a new serial processing algorithm that, in most of the case…
Simulation of parallel mechanisms for motion cueing generation in vehicle simulators using AM-FM bi-modulated signals
2018
The use of robotic motion platforms in vehicle simulators is relatively common. However, the process of testing and tuning the so-called washout algorithms, used for motion cueing generation in motion-based vehicle simulators, is complex. This process can be made cheaper, simpler, better, shorter and safer if virtual motion platforms are used instead of real devices. This paper deals with identifying a method to perform a fast but reliable simulation of parallel mechanisms to be used for motion cueing generation. The method relies on the use of Laplacian polynomial transfer function models by means of using AM-FM bi-modulated signals as reference inputs to achie…
Iterative moment method for electromagnetic transients in grounding systems on CRAY T3D
1996
In this paper the parallel aspects of an electromagnetic model for transients in grounding systems, based on an iterative scheme, are investigated in a multiprocessor environment. Coarse-grain and fine-grain parallel solutions have been developed on the CRAY T3D, housed at CINECA, equipped with 64 processors working in space-sharing mode. The performance of the two parallel approaches, implemented according to the work-sharing parallel paradigm, has been evaluated for different problem sizes using a variable number of processors.
A Parallel Implementation of the Tree-Structured Self-Organizing Map
2002
This paper presents how Self-Organizing Maps (SOMs) can be trained efficiently using several simultaneously executing threads on a shared memory Symmetric MultiProcessing (SMP) computer. The training method is a batch version of the Tree-Structured Self-Organizing Map. We note that SMP-type parallel training is very useful for large data sets obtained from nature, the process industry or large document collections, since we do not encounter similar model size limitations as with hardware SOM implementations.