RESEARCH PRODUCT

Challenges and Solutions for Tracing Storage Systems

André Brinkmann, Vasily Tarasov, Dean Hildebrand, Marc-André Vef

subject

File system; Computer science; business.industry; Interface (computing); Distributed computing; 020206 networking & telecommunications; 020207 software engineering; 02 engineering and technology; Tracing; computer.software_genre; Software; Hardware and Architecture; POSIX; Scalability; 0202 electrical engineering electronic engineering information engineering; Overhead (computing); business; computer; TRACE (psycholinguistics)

description

IBM Spectrum Scale's parallel file system, the General Parallel File System (GPFS), has a 20-year development history with over 100 contributing developers. Its ability to support strict POSIX semantics across more than 10K clients leads to a complex design with intricate interactions between the cluster nodes. Tracing has proven to be a vital tool for understanding the behavior and anomalies of such a complex software product. However, the necessary trace information is often buried in hundreds of gigabytes of by-product trace records. Further, the overhead of tracing can significantly impact running applications and file system performance, limiting the use of tracing in a production system. In this research article, we discuss the evolution of the mature and highly scalable GPFS tracing tool and present an exploratory study of GPFS' new tracing interface, FlexTrace, which allows developers and users to accurately specify what to trace for the problem they are trying to solve. We evaluate our methodology and prototype, demonstrating that the proposed approach has negligible overhead, even under intensive I/O workloads and with low-latency storage devices.

https://doi.org/10.1145/3149376