I'm a DevOps Teamlead at Miro
. Our service is a high-load one: it is used by 4 million users worldwide, daily active users - 168,000. Our servers are hosted by Amazon and located in a single Ireland region. We have more than 200 active servers, with almost 70 of them being database servers.
The service's backend is a monolith stateful Java application that maintains a persistent WebSocket connection for each client. When several users collaborate using the same whiteboard, they see changes on the whiteboard in real-time. That's because we write every change to a database, resulting in ~20,000 requests per second to the databases. During peak hours, the data is written to Redis at ~80,000–100,000 RPS.
I am going to speak about why it is important to us to maintain PostgreSQL high availability, what methods we've applied to solve the problem, and what results we've achieved so far.