0000000000073985
AUTHOR
Cheng Wen
AIOC2: A deep Q-learning approach to autonomic I/O congestion control in Lustre
Abstract In high performance computing systems, I/O congestion is a common problem in large-scale distributed file systems. However, the current implementation mainly requires administrator to manually design low-level implementation and optimization, we proposes an adaptive I/O congestion control framework, named AIOC 2 , which can not only adaptively tune the I/O congestion control parameters, but also exploit the deep Q-learning method to start the training parameters and optimize the tuning for different types of workloads from the server and the client at the same time. AIOC 2 combines the feedback-based dynamic I/O congestion control and deep Q-learning parameter tuning technology to …
Improving LSM‐trie performance by parallel search
NVMM-Oriented Hierarchical Persistent Client Caching for Lustre
In high-performance computing (HPC), data and metadata are stored on special server nodes and client applications access the servers’ data and metadata through a network, which induces network latencies and resource contention. These server nodes are typically equipped with (slow) magnetic disks, while the client nodes store temporary data on fast SSDs or even on non-volatile main memory (NVMM). Therefore, the full potential of parallel file systems can only be reached if fast client side storage devices are included into the overall storage architecture. In this article, we propose an NVMM-based hierarchical persistent client cache for the Lustre file system (NVMM-LPCC for short). NVMM-LPC…
LPCC
Most high-performance computing (HPC) clusters use a global parallel file system to enable high data throughput. The parallel file system is typically centralized and its storage media are physically separated from the compute cluster. Compute nodes as clients of the parallel file system are often additionally equipped with SSDs. The node internal storage media are rarely well-integrated into the I/O and compute workflows. How to make full and flexible use of these storage media is therefore a valuable research question. In this paper, we propose a hierarchical Persistent Client Caching (LPCC) mechanism for the Lustre file system. LPCC provides two modes: RW-PCC builds a read-write cache on…