Pull to refresh
95.63

High performance *

Methods for obtaining systems high performance

Show first
Rating limit
Level of difficulty

Tarantool Kubernetes Operator

Reading time 10 min
Views 1.8K


Kubernetes has already become a de-facto standard for running stateless applications, mainly because it can reduce time-to-market for new features. Launching stateful applications, such as databases or stateful microservices, is still a complex task, but companies have to meet the competition and maintain a high delivery rate. So they create a demand for such solutions.

We want to introduce our solution for launching stateful Tarantool Cartridge clusters: Tarantool Kubernetes Operator, more under the cut.
Read more →
Total votes 34: ↑34 and ↓0 +34
Comments 0

Tarantool Data Grid: Architecture and Features

Reading time 6 min
Views 2.1K


In 2017, we won the competition for the development of the transaction core for Alfa-Bank's investment business and started working at once. (Vladimir Drynkin, Development Team Lead for Alfa-Bank's Investment Business Transaction Core, spoke about the investment business core at HighLoad++ 2018.) This system was supposed to aggregate transaction data in different formats from various sources, unify the data, save it, and provide access to it.

In the process of development, the system evolved and extended its functions. At some point, we realized that we created something much more than just application software designed for a well-defined scope of tasks: we created a system for building distributed applications with persistent storage. Our experience served as a basis for the new product, Tarantool Data Grid (TDG).

I want to talk about TDG architecture and the solutions that we worked out during the development. I will introduce the basic functions and show how our product could become the basis for building turnkey solutions.
Read more →
Total votes 39: ↑38 and ↓1 +37
Comments 0

Тarantool Cartridge: Sharding Lua Backend in Three Lines

Reading time 8 min
Views 2.4K

In Mail.ru Group, we have Tarantool, a Lua-based application server and a database united. It's fast and classy, but the resources of a single server are always limited. Vertical scaling is also not the panacea. That is why Tarantool has some tools for horizontal scaling, or the vshard module [1]. It allows you to spread data across multiple servers, but you'll have to tinker with it for a while to configure it and bolt on the business logic.

Good news: we got our share of bumps (for example, [2], [3]) and created another framework, which significantly simplifies the solution to this problem.

Тarantool Cartridge is the new framework for developing complex distributed systems. It allows you to concentrate on writing business logic instead of solving infrastructure problems. Under the cut, I will tell you how this framework works and how it could help in writing distributed services.
Read more →
Total votes 26: ↑25 and ↓1 +24
Comments 0

Enumerable: How to yield a business value

Reading time 6 min
Views 2.1K
This article is a brief explanation about how using a common language keywords might have an influence on the budget of IT-infrastructure of a project or help to achieve some limitations/restrictions of hosting infrastructure and, moreover, will be a good sing of the quality and maturity of the source code.
Read more →
Total votes 12: ↑11 and ↓1 +10
Comments 0

12.3 million of concurrent WebSockets

Reading time 19 min
Views 13K

One thing about WebSockets is that you need a lot of resources on the client's side to generate high enough load for the server to actually eat up all the CPU resources.


There are several challenges you have to overcome because the WebSockets protocol is more CPU demanding on the client's side than on the server's side. At the same time you need a lot of RAM to store information about open connections if you have millions of them.


I've been lucky enough to get a couple of new servers for a limited period of time at my disposal for the hardware "burnout" tests. So I decided to use my Lua Application Server — LAppS to do both jobs: test the hardware and perform the LAppS high load tests.


Read more →
Total votes 6: ↑3 and ↓3 0
Comments 0

How to speed up LZ4 decompression in ClickHouse?

Reading time 23 min
Views 14K
When you run queries in ClickHouse, you might notice that the profiler often shows the LZ_decompress_fast function near the top. What is going on? This question had us wondering how to choose the best compression algorithm.

ClickHouse stores data in compressed form. When running queries, ClickHouse tries to do as little as possible, in order to conserve CPU resources. In many cases, all the potentially time-consuming computations are already well optimized, plus the user wrote a well thought-out query. Then all that's left to do is to perform decompression.



So why does LZ4 decompression becomes a bottleneck? LZ4 seems like an extremely light algorithm: the data decompression rate is usually from 1 to 3 GB/s per processor core, depending on the data. This is much faster than the typical disk subsystem. Moreover, we use all available CPU cores, and decompression scales linearly across all physical cores.
Read more →
Total votes 23: ↑21 and ↓2 +19
Comments 0

From High Ceph Latency to Kernel Patch with eBPF/BCC

Reading time 9 min
Views 4.8K


There are a lot of tools for debugging kernel and userspace programs in Linux. Most of them have performance impact and cannot easily be run in production environments. A few years ago, eBPF was developed, which provides the ability to trace the kernel and userspace with low overhead, without needing to recompile programs or load kernel modules.

There are now plenty of tools that use eBPF and in this article, we’ll explain how to write your own profiling tool using the PythonBCC library. This article is based on a real issue from the production environment. We’ll walk you through solving the problem and show how existing bcc tools could be used in some cases.
Read more →
Total votes 13: ↑11 and ↓2 +9
Comments 0

Citymobil — a manual for improving availability amid business growth for startups. Part 5

Reading time 8 min
Views 1K


This is the final part of the series describing how we’re increasing our service availability in Citymobil (you can read the previous part here). Now I’m going to talk about one more type of outages and the conclusions we made about them, how we modified the development process, what automation we introduced.
Read more →
Total votes 24: ↑24 and ↓0 +24
Comments 0

Citymobil — a manual for improving availability amid business growth for startups. Part 4

Reading time 7 min
Views 1K


This is the next article of the series describing how we’re increasing our service availability in Citymobil (you can read the previous parts here: part 1, part 2, part 3). In further parts, I’ll talk about the accidents and outages in detail.

1. Bad release: database overload


Let me begin with a specific example of this type of outage. We deployed an optimization: added USE INDEX in an SQL query; during testing as well as in production, it sped up short queries, but the long ones — slowed down. The long queries slowdown was only noticed in production. As a result, a lot of long parallel queries caused the database to be down for an hour. We thoroughly studied the way USE INDEX worked; we described it in the Do’s and Dont’s file and warned the engineers against the incorrect usage. We also analyzed the query and realized that it retrieves mostly historical data and, therefore, can be run on a separate replica for historical requests. Even if this replica goes down due to an overload, the business will keep running.
Read more →
Total votes 17: ↑16 and ↓1 +15
Comments 0

Citymobil — a manual for improving availability amid business growth for startups. Part 3

Reading time 8 min
Views 1.1K


This is the next article of the series describing how we’re increasing our service availability in Citymobil (you can read the previous parts here and here). In further parts, I’ll talk about the accidents and outages in detail. But first let me highlight something I should’ve talked about in the first article but didn’t. I found out about it from my readers’ feedback. This article gives me a chance to fix this annoying shortcoming.
Read more →
Total votes 23: ↑23 and ↓0 +23
Comments 0

Statistics and monitoring of PHP scripts in real time. ClickHouse and Grafana go to Pinba for help

Reading time 6 min
Views 5.4K
In this article I will explain how to use pinba with clickhouse and grafana instead of pinba_engine and pinboard.

On the php project pinba is probably the only reliable way to understand what is happening with performance. But usually people start to use pinba only when problems are already observed and it isn't clear where to look in.

Often developers have no idea how many RPS each script has. So they begin to optimize starting from places that seem to have problem.

Someone is analyzing the nginx logs, and someone is slow queries in the database.

Of course pinba would not be superfluous, but there are several reasons why it is not on every project.


Read more →
Total votes 16: ↑15 and ↓1 +14
Comments 0

Citymobil — a manual for improving availability amid business growth for startups. Part 2

Reading time 8 min
Views 958


This is a second article out of a series «Citymobil — a manual for improving availability amid business growth for startups». You can read the first part here. Let’s continue to talk about the way we managed to improve the availability of Citymobil services. In the first article, we learned how to count the lost trips. Ok, we are counting them. What now? Now that we are equipped with an understandable tool to measure the lost trips, we can move to the most interesting part — how do we decrease losses? Without slowing down our current growth! Since it seemed to us that the lion’s share of technical problems causing the trips loss had something to do with the backend, we decided to turn our attention to the backend development process first. Jumping ahead of myself, I’m going to say that we were right — the backend became the main site of the battle for the lost trips.
Read more →
Total votes 23: ↑22 and ↓1 +21
Comments 0

Citymobil — a manual for improving availability amid business growth for startups. Part 1

Reading time 4 min
Views 1.3K


In this first part of an article series «Citymobil — a manual for improving availability amid business growth for startups» I’m going to break down the way we managed to dramatically scale up the availability of Citymobil services. The article opens with the story about our business, our task, the reason for this task to increase the availability emerged and limitations. Citymobil is a rapid-growing taxi aggregator. In 2018, it increased by more than 15 times in terms of number of successfully completed trips. Some months showed 50% increase compared with the previous month.

The business grew like a weed in every direction (it still does): there was an increase in server load, team size and number of deployments. At the same time the new threats to service availability emerged. The company faced a task of the most importance — how to increase availability without compromising company growth. In this article, I’ll talk about the way we managed to solve this task in a relatively short time.
Read more →
Total votes 24: ↑24 and ↓0 +24
Comments 0

The authoritative guide to Blockchain Sharding

Reading time 12 min
Views 1.3K

Hi, I'm one of the developers of the sharded blockchain Near Protocol, and in this article want to talk about what blockchain sharding is, how it is implemented, and what problems exist in blockchain sharding designs.


It is well-known that Ethereum, the most used general purpose blockchain at the time of this writing, can only process less than 20 transactions per second on the main chain. This limitation, coupled with the popularity of the network, leads to high gas prices (the cost of executing a transaction on the network) and long confirmation times; despite the fact that at the time of this writing a new block is produced approximately every 10–20 seconds the average time it actually takes for a transaction to be added to the blockchain is 1.2 minutes, according to ETH Gas Station. Low throughput, high prices, and high latency all make Ethereum not suitable to run services that need to scale with adoption.

Read more →
Total votes 15: ↑14 and ↓1 +13
Comments 0

Gotta Go Fast: Building for Speed in iOS. Part 1

Reading time 7 min
Views 2.6K


There are a lot of tips and tricks that allow iOS developers to know how to make performance optimizations to get animations in applications run smoothly. After reading the article you will realize what 16.67 milliseconds for iOS developer means, and which tools are better to use to track down the code.

The article is based on the keynote talk delivered by Luke Parham, currently an iOS engineer at Apple and an author of tutorials for iOS development on RayWenderlich.com, at the International Mobile Developers Conference MBLT DEV 2017.
Total votes 26: ↑26 and ↓0 +26
Comments 1

What to think during NALSD interview

Reading time 7 min
Views 9.1K
There are a lot of posts about what a typical coding interview at Google looks like. But, while not as widely described and discussed, there is also quite often a system design interview. For an SRE position it’s NALSD: non-abstract large system design. The key difference between SWE and SRE interviews consists in these two letters: NA.

So, what is the difference? How to be prepared for this interview? Let’s be non-abstract, and use an example. To be more non-abstract, let’s take something from the material world, such that you won’t be asked the exact same thing at the real interview (at least, not at the Google interview) :)

So, let’s design a public library system. For the paper books, like you have seen everywhere around. The whole text below was written all at once within around one hour, to roughly show you the areas that you should be able to cover / touch during the interview. Please excuse some disorder, that’s how I think (therefore I am).
Read more →
Total votes 26: ↑24 and ↓2 +22
Comments 0
2

Authors' contribution