Author: Ramy Gad

0000000000393874

AUTHOR

Ramy Gad

Deduplication Potential of HPC Applications’ Checkpoints

HPC systems contain an increasing number of components, decreasing the mean time between failures. Checkpoint mechanisms help to overcome such failures for long-running applications. A viable solution to remove the resulting pressure from the I/O backends is to deduplicate the checkpoints. However, there is little knowledge about the potential to save I/Os for HPC applications by using deduplication within the checkpointing process. In this paper, we perform a broad study about the deduplication behavior of HPC application checkpointing and its impact on system design.

research product

Accelerating Application Migration in HPC

It is predicted that the number of cores per node will rapidly increase with the upcoming era of exascale supercomputers. As a result, multiple applications will have to share one node and compete for the (often scarce) resources available on this node. Furthermore, the growing number of hardware components causes a decrease in the mean time between failures. Application migration between nodes has been proposed as a tool to mitigate these two problems: Bottlenecks due to resource sharing can be addressed by load balancing schemes which migrate applications; and hardware errors can often be tolerated by the system if faulty nodes are detected and processes are migrated ahead of time.

research product

VarySched: A Framework for Variable Scheduling in Heterogeneous Environments

Despite many efforts to better utilize the potential of GPUs and CPUs, it is far from being fully exploited. Although many tasks can be easily sped up by using accelerators, most of the existing schedulers are not flexible enough to really optimize the resource usage of the complete system. The main reasons are (i) that each processing unit requires a specific program code and that this code is often not provided for every task, and (ii) that schedulers may follow the run-until-completion model and, hence, disallow resource changes during runtime. In this paper, we present VarySched, a configurable task scheduler framework tailored to efficiently utilize all available computing resources in…

research product

Migration Techniques in HPC Environments

Process migration is an important feature in modern computing centers as it allows for a more efficient use and maintenance of hardware. Especially in virtualized infrastructures it is successfully exploited by schemes for load balancing and energy efficiency. One can divide the tools and techniques into three groups: Process-level migration, virtual machine migration, and container-based migration.

research product

Compiler Driven Automatic Kernel Context Migration for Heterogeneous Computing

Computer systems provide different heterogeneous resources (e.g., GPUs, DSPs and FPGAs) that accelerate applications and that can reduce the energy consumption by using them. Usually, these resources have an isolated memory and a require target specific code to be written. There exist tools that can automatically generate target specific codes for program parts, so-called kernels. The data objects required for a target kernel execution need to be moved to the target resource memory. It is the programmers' responsibility to serialize these data objects used in the kernel and to copy them to or from the resource's memory. Typically, the programmer writes his own serializing function or uses e…

research product