A Study on Distributed Fault-Tolerant Service Architectures for Critical Software Systems
Reliable systems have been the subject of much research in recent years, as society's dependence on computer systems has become increasingly apparent. With more organizations embracing DevOps culture, there is a persistent need to understand how these systems are built and what their trade-offs are. This paper discusses and benchmarks the components of a modern fault-tolerant, easily scalable system designed to maximize uptime, and describes the techniques used to develop such a system. The architecture described is implemented through several services deployed for a new critical single sign-on system at CERN (The European Laboratory for…