Search results for "Parallel"
showing 7 items of 667 documents
Can back-projection fully resolve polarity indeterminacy of independent component analysis in study of event-related potential?
2011
In the study of event-related potentials (ERPs) using independent component analysis (ICA), it is a traditional way to project the extracted ERP component back to electrodes for correcting its scaling (magnitude and polarity) indeterminacy. However, ICA tends to be locally optimized in practice, and then, the back-projection of a component estimated by the ICA can possibly not fully correct its polarity at every electrode. We demonstrate this phenomenon from the view of the theoretical analysis and numerical simulations and suggest checking and modifying the abnormal polarity of the projected component in the electrode field before further analysis. Moreover, when several co…
Lattice Boltzmann Simulations at Petascale on Multi-GPU Systems with Asynchronous Data Transfer and Strictly Enforced Memory Read Alignment
2015
The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently, general-purpose graphics processing units have become accessible as high-performance computing resources at large scale. We report on implementing a lattice Boltzmann solver for multi-GPU systems that achieves 0.69 PFLOPS performance on 16384 GPUs. In addition to optimizing the data layout on the GPUs and eliminating the halo sites, we make use of the possibility to overlap data transfer between the host CPU and the device GPU with computing on the GPU. We simulate flow in porous media and measure both strong and weak scaling performance with the emphasis being on a large scale…
Functionally-defined recurrent multi-word units in English-to-Polish translation
2022
This study uses both parallel and comparable reference corpora in the English-Polish language pair to explore how translators deal with recurrent multi-word items performing specific discoursal functions. We also consider whether the observed tendencies overlap with those found in native texts, and the extent to which the discoursal functions realised by the multi-word items under scrutiny are “preserved” in translation. Capitalizing on findings from earlier research (Granger, 2014; Grabar & Lefer, 2015), we analyzed a pre-selected set of phrases signaling stance-taking and those functioning as textual, discourse-structuring devices originally found in the European Parliament proceedings corpus…
Reliable Outer Bounds for the Dual Simplex Algorithm with Interval Right-hand Side
2013
In this article, we describe the reliable computation of outer bounds for linear programming problems occurring in linear relaxations derived from the Bernstein polynomials. The computation uses interval arithmetic for the Gauss-Jordan pivot steps on a simplex tableau. The resulting errors are stored as interval right-hand sides. Additionally, we show how to generate a start basis for the linear programs of this type. We give details of the implementation using OpenMP and comment on numerical experiments.
Designing a graphics processing unit accelerated petaflop capable lattice Boltzmann solver: Read aligned data layouts and asynchronous communication
2016
The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently, general-purpose graphics processing units (GPUs) have become available as high-performance computing resources at large scale. We report on designing and implementing a lattice Boltzmann solver for multi-GPU systems that achieves 1.79 PFLOPS performance on 16,384 GPUs. To achieve this performance, we introduce a GPU compatible version of the so-called bundle data layout and eliminate the halo sites in order to improve data access alignment. Furthermore, we make use of the possibility to overlap data transfer between the host central processing unit and the device GPU with com…
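Conception en technologie CMOS d'un Système de Vision dédié à l'Imagerie Rapide et aux Traitements d'Images [Design in CMOS technology of a vision system dedicated to high-speed imaging and image processing]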
2008
The work presented in this thesis focuses on the design, testing and implementation of monolithic CMOS smart image sensors: their principle, performance and limitations. The hardware implementation of a smart vision system is the central link. HISIC is High Speed Image Capture with processing at pixel level. An experimental platform for instrumentation and evaluation of retina operators was built during this thesis. After a state of the art of smart sensors and CMOS retinas, the second part is dedicated to the study and design of the pixel image sensor HISIC. Two circuits were realized in CMOS technology. The first identified a new type of photo-detector, and the second, to create a prototype em…
Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units
2021
We contribute to the optimization of the sparse matrix-vector product by introducing a variant of the coordinate sparse matrix format that balances the workload distribution and compresses both the indexing arrays and the numerical information. Our approach is multi-platform, in the sense that the realizations for (general-purpose) multicore processors as well as graphics accelerators (GPUs) are built upon common principles, but differ in the implementation details, which are adapted to avoid thread divergence in the GPU case or maximize compression element-wise (i.e., for each matrix entry) for multicore architectures. Our evaluation on the last two generations of NVIDIA GPUs as well as In…