Search results for "Asynchronous communication"
Showing 5 of 65 documents
Designing a graphics processing unit accelerated petaflop capable lattice Boltzmann solver: Read aligned data layouts and asynchronous communication
2017
The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently, general-purpose graphics processing units (GPUs) have become available as high-performance computing resources at large scale. We report on designing and implementing a lattice Boltzmann solver for multi-GPU systems that achieves 1.79 PFLOPS performance on 16,384 GPUs. To achieve this performance, we introduce a GPU-compatible version of the so-called bundle data layout and eliminate the halo sites in order to improve data access alignment. Furthermore, we make use of the possibility to overlap data transfer between the host central processing unit and the device GPU with comp…
What do we do when we analyse the temporal aspects of computer-supported collaborative learning? A systematic literature review
2021
To better understand the premises for successful computer-supported collaborative learning (CSCL), several studies over the last 10 years have analysed the temporal aspects of CSCL. We broadly define the temporal aspects of CSCL as focusing on the characteristics of or interrelations between events over time. The analysis of these aspects, however, has been loosely defined, creating challenges regarding the comparability and commensurability of studies. To address these challenges, we conducted a systematic literature review to define the temporal analysis procedure for CSCL using 78 journal papers published from 2003 to 2019. After identifying the key operations to be included in the proce…
Lattice Boltzmann Simulations at Petascale on Multi-GPU Systems with Asynchronous Data Transfer and Strictly Enforced Memory Read Alignment
2015
The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently, general-purpose graphics processing units have become accessible as high-performance computing resources at large scale. We report on implementing a lattice Boltzmann solver for multi-GPU systems that achieves 0.69 PFLOPS performance on 16,384 GPUs. In addition to optimizing the data layout on the GPUs and eliminating the halo sites, we make use of the possibility to overlap data transfer between the host CPU and the device GPU with computing on the GPU. We simulate flow in porous media and measure both strong and weak scaling performance with the emphasis being on a large scale…
Performance of an asymmetric and asynchronous decode-and-forward FBMC relay system
2014
End-to-end link performance of an asymmetric and asynchronous dual-hop decode-and-forward (DF) relay system, built using a causal multirate filter bank multicarrier (FBMC) technique and operated under Rayleigh fading, is presented. Three main performance measures, namely bit error rate (BER), outage probability, and channel capacity, are used for this evaluation, and approximate closed-form expressions for them are also made available. The FBMC setup is modeled in exact form without any approximations while customizing to one of the most efficient subcarrier filters. Simulations are carried out in quasi-static multipath fading channels under symmetric, asymmetric, synchronous and asynchronous condition…
Designing a graphics processing unit accelerated petaflop capable lattice Boltzmann solver: Read aligned data layouts and asynchronous communication
2016
The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently, general-purpose graphics processing units (GPUs) have become available as high-performance computing resources at large scale. We report on designing and implementing a lattice Boltzmann solver for multi-GPU systems that achieves 1.79 PFLOPS performance on 16,384 GPUs. To achieve this performance, we introduce a GPU-compatible version of the so-called bundle data layout and eliminate the halo sites in order to improve data access alignment. Furthermore, we make use of the possibility to overlap data transfer between the host central processing unit and the device GPU with com…
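Several of the results above hinge on the same pattern: overlapping data transfer (halo exchange between host CPU and device GPU) with the bulk of the computation, so communication latency is hidden behind useful work. A minimal, language-agnostic sketch of that overlap idea is below, using a Python worker thread in place of CUDA streams; the function names and the trivial "+1 update" are purely illustrative, not taken from any of the listed papers.

```python
from concurrent.futures import ThreadPoolExecutor


def timestep(interior, boundary):
    """One simulated step that overlaps boundary exchange with interior work.

    `interior` and `boundary` are plain lists standing in for lattice sites;
    the actual solvers overlap asynchronous host-device copies with GPU
    kernels, but the scheduling idea is the same.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the "communication" (here: a trivial copy) asynchronously...
        exchange = pool.submit(lambda: list(boundary))
        # ...while the large interior update proceeds concurrently.
        new_interior = [x + 1 for x in interior]
        # Only block on the exchange once the boundary sites are needed.
        new_boundary = [x + 1 for x in exchange.result()]
    return new_interior, new_boundary
```

The payoff in the real solvers is that the transfer time disappears into the interior-update time whenever the interior is large enough, which is exactly the regime the petascale runs on 16,384 GPUs target.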