• How to speed up LZ4 decompression in ClickHouse?

      When you run queries in ClickHouse, you might notice that the profiler often shows the LZ4_decompress_fast function near the top. What is going on? This question had us wondering how to choose the best compression algorithm.

      ClickHouse stores data in compressed form. When running queries, ClickHouse tries to do as little work as possible in order to conserve CPU resources. In many cases, all the potentially time-consuming computations are already well optimized, and the user has written a well-thought-out query. Then all that's left to do is decompress the data.

      So why does LZ4 decompression become a bottleneck? LZ4 seems like an extremely light algorithm: data usually decompresses at 1 to 3 GB/s per processor core, depending on the data. This is much faster than a typical disk subsystem. Moreover, we use all available CPU cores, and decompression scales linearly across all physical cores.
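      To make the "extremely light" claim concrete, here is a minimal Python sketch of an LZ4 block decoder. It is written from the public LZ4 block format description, not taken from ClickHouse or the LZ4 library, and the names lz4_block_decompress and _read_extra_length are hypothetical. It assumes a raw block with no frame header and no checksums. Each sequence is just a token byte, some literal bytes to copy, and a two-byte offset telling the decoder where in its own output to copy a match from. There is no entropy coding step, so the inner loop is little more than memory copying.

```python
def _read_extra_length(src, i, length):
    # A nibble value of 15 means: keep adding the following bytes
    # until one of them is not 255.
    while True:
        b = src[i]
        i += 1
        length += b
        if b != 255:
            return length, i


def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (sketch: no frame header, no checksums)."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]
        i += 1

        # High nibble of the token: number of literal bytes.
        lit_len = token >> 4
        if lit_len == 15:
            lit_len, i = _read_extra_length(src, i, lit_len)
        out += src[i:i + lit_len]  # literals are copied verbatim
        i += lit_len

        # The last sequence of a block carries literals only, no match.
        if i >= len(src):
            break

        # Two-byte little-endian offset back into the output decoded so far.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("corrupt block: bad match offset")

        # Low nibble: match length minus the minimum match length of 4.
        match_len = token & 0x0F
        if match_len == 15:
            match_len, i = _read_extra_length(src, i, match_len)
        match_len += 4

        # Copy byte by byte, because a match may overlap the bytes it produces
        # (offset 1, for example, repeats the previous byte like run-length encoding).
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])
    return bytes(out)


# Literal "abc", then a 9-byte match at offset 3, then a final literal "!".
block = bytes([0x35]) + b"abc" + bytes([3, 0, 0x10]) + b"!"
print(lz4_block_decompress(block))  # b'abcabcabcabc!'
```

      Production decoders are optimized C and typically copy matches in wide chunks rather than one byte at a time; the byte-by-byte loop here simply keeps the overlapping-match case obviously correct.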
    • How to create a dark theme without breaking things: learning with the Yandex Mail team

        My name is Vladimir, and I develop the mobile front end for Yandex Mail. Our apps have had a dark theme for a while, but it was incomplete: only the interface and plain emails were dark. Messages with custom formatting remained light and stood out against the dark interface, hurting our users’ eyes at night.

        Today I'll tell you how we fixed this problem. You will learn about two simple techniques that didn't work for us and the method that finally did the trick — adaptive page recoloring. I'll also share some ideas about adapting images to a dark theme. To be fair, darkening pages with custom CSS is a rather peculiar task, but I believe some of you may find our experience helpful.

      • Hypercube. How we gave developers test devices without losing any

          You can’t properly test and debug mobile apps without test devices, and you need plenty of them, since the same code may behave differently on different models. So how do we keep track of these devices? How do we quickly provide developers and testers with the smartphones they need, configured the way they need, and without much red tape?

          I’m Alexey Lavrenuke. Over the years, I’ve worn many hats: one of the authors of Yandex.Tank, a speaker on load testing, and the guy who calculated the energy consumption of mobile phones. Now I’m a Yandex.Rover developer on the self-driving car team.

          After the phones and before Yandex.Rover, there was Hypercube.

          A few years ago, the head of mobile development popped into the load testing department and mentioned a problem they were having with test devices: phones had a tendency to inexplicably migrate from one desk to another. Picking the right device and then finding it had become a challenge. We already had experience working with mobile devices from building a digital ammeter to measure their energy consumption, so we decided to help our coworkers out and quickly rig up a handy contraption. We figured the whole thing wouldn’t take more than three months. Oh, how wrong we were. Let me tell you what we were really in for.

          “Dallas cube”
        • Five Methods For Database Obfuscation

            ClickHouse users already know that its biggest advantage is its high-speed processing of analytical queries. But claims like this need to be confirmed with reliable performance testing. That's what we want to talk about today.

            We started running tests in 2013, long before the product was available as open source. Back then, just like now, our main concern was data processing speed in Yandex.Metrica. The Yandex.Metrica data we store in ClickHouse goes back to January 2009. Part of it had been written to a database starting in 2012, and part was converted from OLAPServer and Metrage (data structures previously used by Yandex.Metrica). For testing, we took an arbitrary subset of the data covering 1 billion pageviews. Yandex.Metrica didn't have any queries yet, so we came up with queries that interested us, using all the possible ways to filter, aggregate, and sort the data.

            ClickHouse performance was compared with similar systems like Vertica and MonetDB. To avoid bias, testing was performed by an employee who hadn't participated in ClickHouse development, and special cases in the code were not optimized until all the results were obtained. We used the same approach to get a data set for functional testing.

            After ClickHouse was released as open source in 2016, people began questioning these tests.
