I'm Artyom Karamyshev, a system administration team leader at Mail.Ru Cloud Solutions (MCS). We launched many products in 2019. We've aimed to make API services easily scalable, fault-tolerant, and ready to accommodate rapid growth. Our platform is running on OpenStack, and in this article, I describe all the component fault tolerance issues that we've resolved.
The overall fault tolerance of the platform is consists of its components fault tolerance. So, I'm going to show you step by step tutorial about all levels where we've found the risks.