• ## QA process at Miro

We have been refining our current QA process for about two years, and we are still improving it. The process may seem obvious, but when we started to implement it in a new team made up entirely of new developers, we realized that it was difficult to adopt right away. Many people are used to working differently, and switching requires making many changes at once, which is hard. At the same time, it is ill-advised to adopt the process piecemeal, because that can hurt quality.

What do we do? Each block of the development process needs preliminary preparation: task decomposition, estimation and planning, the development itself, exploratory testing, and release. This preparation does not mean simply throwing old parts out of the process; it means replacing them adequately, in a way that improves quality.

In this article, I will describe our testing process at each stage of creating a new feature, and the changes we introduced that increased the quality and speed of development.

• ## Annotations for Concurrency in Java. Our approach to coloring threads

At Miro, we always try to improve the maintainability of our code by using common practices, including in matters of multithreading. This does not solve all the issues that arise because of the ever-increasing load, but it simplifies the support: it increases both the code readability and the speed of developing new features.
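The article itself is about Java annotations, but the underlying idea of "coloring" threads is language-agnostic: declare which kind of thread a piece of code belongs to, and check it at the call site. As a minimal illustration, here is a hedged Python sketch (the decorator name and thread-naming convention are assumptions, not Miro's actual API):

```python
import functools
import threading

def runs_on(thread_name: str):
    """Mark a function as belonging to a given thread 'color' and
    verify at call time that it runs on such a thread."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            current = threading.current_thread().name
            if not current.startswith(thread_name):
                raise RuntimeError(
                    f"{func.__name__} must run on a '{thread_name}' thread, "
                    f"but was called from '{current}'")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@runs_on("io-worker")
def read_board_state():
    # only safe to call from an I/O worker thread in this sketch
    return "state"

# Calling from a correctly named thread succeeds:
t = threading.Thread(target=read_board_state, name="io-worker-1")
t.start()
t.join()
```

Calling `read_board_state()` from any other thread (e.g. the main thread) raises immediately, turning an implicit threading convention into an explicit, checkable contract — which is the readability and maintainability gain the article describes.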

Today (May 2020) we have about 100 servers in the production environment, 6,000 HTTP API requests per second, more than 50,000 WebSocket API commands per second, and daily releases. Miro has been in development since 2011; in the current implementation, user requests are handled in parallel by a cluster of different servers.

• ## Managing hundreds of servers for load testing: autoscaling, custom monitoring, DevOps culture

In the previous article, I talked about our load testing infrastructure. On average, we use about 100 servers to generate the load and about 150 servers to run our service. All of these servers need to be created, configured, started, and deleted. To do this, we use the same tools as in the production environment, which reduces the amount of manual work:

• Terraform scripts for creating and deleting the test environment;
• Ansible scripts for configuring, updating, and starting the servers;
• In-house Python scripts for dynamic scaling, depending on the load.

Thanks to the Terraform and Ansible scripts, all operations ranging from creating instances to starting servers are performed with only six commands:

```shell
# launch the required instances in the AWS console
ansible-playbook deploy-config.yml           # update server versions
ansible-playbook start-application.yml       # start our app on these servers
ansible-playbook update-test-scenario.yml --ask-vault-pass  # update the JMeter test scenario if it changed
terraform apply                              # in infrastructure-aws-cluster/jmeter_clients: create JMeter servers for generating the load
ansible-playbook start-jmeter-server-cluster.yml  # start the JMeter cluster
ansible-playbook start-stress-test.yml       # start the test
```
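The in-house Python scaling scripts are not shown in the source, but their core decision — how many load-generator servers to keep running for the observed load — can be sketched as a pure function. Everything here (names, the per-server throughput, the headroom factor, the bounds) is a hypothetical illustration, not Miro's actual code:

```python
import math

def desired_server_count(current_rps: float, rps_per_server: float,
                         min_servers: int = 2, max_servers: int = 150,
                         headroom: float = 1.2) -> int:
    """Compute how many servers are wanted for the observed request rate,
    keeping 20% headroom and clamping the result to sane bounds."""
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    needed = math.ceil(current_rps * headroom / rps_per_server)
    return max(min_servers, min(max_servers, needed))

# e.g. 6,000 RPS at ~100 RPS per server, with 20% headroom:
print(desired_server_count(6000, 100))  # → 72
```

A real script would feed this from a metrics service and apply the result via Terraform or the cloud provider's API; keeping the decision logic pure like this makes it trivial to unit-test.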


• ## Reliable load testing with regards to unexpected nuances

We thought about building the infrastructure for large load tests a year ago when we reached the mark of 12,000 simultaneously active online users. In three months, we made the first version of the test, which showed us the limits of the service.

The irony is that simultaneously with the launch of the test, we reached the limits on the production server, resulting in a two-hour service outage. This further encouraged us to move from occasional tests to an effective load testing infrastructure. By infrastructure, I mean all the tools for working with load testing: tools for launching the test (manual and automatic), the cluster that creates the load, a production-like cluster, metrics and reporting services, scaling services, and the code to manage it all.

Simplified, this is what our structure looks like: a collection of different servers that interact with each other, each performing specific tasks. It seemed that to build the load testing infrastructure, it would be enough to draw this diagram, take all the interactions into account, and start creating test cases for each block, one by one.

This approach is right, but it would have taken many months, which was not suitable for us because of our rapid growth — over the past twelve months, we have grown from 12,000 to 100,000 simultaneously active online users. Also, we didn’t know how our service infrastructure would respond to the increased load: which blocks would become the bottleneck, and which would scale linearly?

• ## Implementing a Fault-Tolerant PostgreSQL Cluster with Patroni

I'm a DevOps team lead at Miro. Our service is a high-load one: it is used by 4 million users worldwide, with 168,000 daily active users. Our servers are hosted by Amazon in a single region, in Ireland. We have more than 200 active servers, almost 70 of which are database servers.

The service's backend is a stateful, monolithic Java application that maintains a persistent WebSocket connection for each client. When several users collaborate on the same whiteboard, they see changes in real time, because we write every change to a database; this results in ~20,000 requests per second to the databases. During peak hours, data is written to Redis at ~80,000–100,000 RPS.

I am going to speak about why it is important to us to maintain PostgreSQL high availability, what methods we've applied to solve the problem, and what results we've achieved so far.
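One concrete piece of how Patroni enables high availability: each node runs a REST API (port 8008 by default) whose `/leader` endpoint answers HTTP 200 only on the current primary, so clients and health checks can discover the leader by probing nodes. The sketch below is an illustration, not Miro's code; the HTTP call is injected as a function so the logic stays testable without a live cluster:

```python
from typing import Callable, Iterable, Optional

def find_leader(nodes: Iterable[str],
                fetch_status: Callable[[str], int]) -> Optional[str]:
    """Return the first node whose Patroni REST endpoint reports leadership.
    Patroni's /leader endpoint returns HTTP 200 only on the primary.
    `fetch_status` performs the HTTP GET and returns the status code."""
    for node in nodes:
        try:
            if fetch_status(f"http://{node}:8008/leader") == 200:
                return node
        except OSError:
            continue  # node unreachable; keep probing the others
    return None

# A stub standing in for a real HTTP client (hostnames are made up):
statuses = {"http://db1:8008/leader": 503, "http://db2:8008/leader": 200}
print(find_leader(["db1", "db2", "db3"], lambda url: statuses.get(url, 503)))
# → db2
```

In production this probing is typically done by a load balancer (e.g. HAProxy health checks against the same endpoint) rather than application code, which is one of the standard Patroni deployment patterns.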

• ## How we learned to draw text on HTML5 Canvas

We are developing an online collaborative whiteboard and use Canvas to display all graphics and text on boards. There is no out-of-the-box solution for displaying text on Canvas the way you would in regular HTML, so over the past several years we have been iterating on this, and we think we have finally reached the best solution.

In this article, we will walk you through this transition from Flash to Canvas and why we gave up on using SVG foreignObject in the process.