RESEARCH PRODUCT

Deduplication Potential of HPC Applications’ Checkpoints

Tim Süß, Federico Padua, Jürgen Kaiser, André Brinkmann, Ramy Gad, Lars Nagel

subject

0301 basic medicine; 03 medical and health sciences; 030104 developmental biology; Computer science; Distributed computing; Scalability; Data_FILES; Redundancy (engineering); Data deduplication; Application checkpointing

description

HPC systems contain an increasing number of components, which decreases the mean time between failures. Checkpointing mechanisms help long-running applications recover from such failures, but writing the checkpoints puts pressure on the I/O backends. A viable way to relieve this pressure is to deduplicate the checkpoints. However, little is known about how much I/O deduplication can save within the checkpointing process of HPC applications. In this paper, we perform a broad study of the deduplication behavior of HPC application checkpointing and its impact on system design.

https://doi.org/10.1109/cluster.2016.32