RESEARCH PRODUCT
One Phase Commit: A Low Overhead Atomic Commitment Protocol for Scalable Metadata Services
Sai Narasimhamurthy, Giuseppe Congiu, André Brinkmann, Matthias Grawinkel
subject
Metadata; File system; Computer science; Storage Resource Broker; Distributed computing; Server; Scalability; Data_FILES; Meta Data Services; Namespace; computer.software_genre; computer; Metadata repository
description
As the number of client machines in high-end computing clusters increases, a file system that relies on a centralized metadata server cannot keep up with the resulting volume of requests. This problem will become even more prominent with the advent of exascale computing. In this context, the centralized metadata server is both a bottleneck for scaling file system performance and a single point of failure. To overcome this problem, file systems are evolving from centralized to distributed metadata services. Distributing metadata raises a number of additional problems that must be taken into account. In this paper we focus on the problem of managing distributed namespace operations such as CREATE, DELETE and RENAME. Distributed namespace operations are a side effect of distributing metadata across a cluster of metadata servers. Available protocols for handling distributed namespace operations, such as the two phase commitment protocol, are expensive because they require the exchange of a large number of messages between metadata servers as well as synchronous writes to stable storage to log vital information. Moreover, such protocols adopt locking schemes to protect the resource during the operation, which forces multiple operations on the same directory to be serialized. This severely impacts the performance of high performance computing applications in typical scenarios such as a high rate of file create operations. We propose a one phase commit protocol that is tailored for use with typical inter-metadata messages. We rely on fast, highly available shared storage for metadata in order to minimize writes, messages, coordination overhead and recovery time when metadata servers fail. We present a formal description of the new protocol, a theoretical analysis of its capabilities, a proof of correctness and an evaluation of the protocol in a simulated environment, which shows the protocol to be fast and reliable.
In simulations the protocol achieved more than 50% better performance compared with the two phase commitment protocol.
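To make the cost argument in the description concrete, the following minimal sketch tallies the messages and synchronous (forced) log writes of a textbook two phase commitment round. It is not the protocol proposed in the paper, and it is not taken from the paper's simulator. The names `Tally` and `two_phase_commit_cost` are hypothetical, and the accounting assumes a failure-free round in which every participant votes yes, the coordinator is not itself counted as a participant, and only the prepared and commit records are forced to stable storage.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Running cost of one commit round (hypothetical helper)."""
    messages: int = 0
    forced_writes: int = 0


def two_phase_commit_cost(n_participants: int) -> Tally:
    """Tally a failure-free, all-yes round of textbook two phase commit."""
    if n_participants < 1:
        raise ValueError("need at least one participant")
    tally = Tally()
    for _ in range(n_participants):
        # Phase 1: the coordinator sends PREPARE; the participant forces a
        # "prepared" log record and then replies with its YES vote.
        tally.messages += 1       # PREPARE
        tally.forced_writes += 1  # participant: prepared record
        tally.messages += 1       # VOTE
    # Decision point: the coordinator forces its commit record once.
    tally.forced_writes += 1
    for _ in range(n_participants):
        # Phase 2: the coordinator sends COMMIT; the participant forces a
        # commit record and acknowledges.
        tally.messages += 1       # COMMIT
        tally.forced_writes += 1  # participant: commit record
        tally.messages += 1       # ACK
    return tally


if __name__ == "__main__":
    for n in (1, 2, 4):
        cost = two_phase_commit_cost(n)
        print(n, cost.messages, cost.forced_writes)
```

Under these assumptions the cost grows as 4n messages and 2n + 1 forced writes for n participants. For example, a namespace operation that spans two metadata servers (one coordinator, one participant) would need four messages and three forced writes per operation.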
year | journal | country | edition | language |
---|---|---|---|---|
2012-09-01 | 2012 IEEE International Conference on Cluster Computing Workshops | | | |