Pull to refresh
239.79

Algorithms *

Everything about algorithms

Show first
Period
Level of difficulty

How we made landmark recognition in Cloud Mail.ru, and why

Reading time11 min
Views2.4K


With the advent of mobile phones with high-quality cameras, we started making more and more pictures and videos of bright and memorable moments in our lives. Many of us have photo archives that extend back over decades and comprise thousands of pictures which makes them increasingly difficult to navigate through. Just remember how long it took to find a picture of interest just a few years ago.

One of Mail.ru Cloud’s objectives is to provide the handiest means for accessing and searching your own photo and video archives. For this purpose, we at Mail.ru Computer Vision Team have created and implemented systems for smart image processing: search by object, by scene, by face, etc. Another spectacular technology is landmark recognition. Today, I am going to tell you how we made this a reality using Deep Learning.
Read more →
Total votes 45: ↑44 and ↓1+43
Comments0

AI-Based Photo Restoration

Reading time7 min
Views18K


Hi everybody! I’m a research engineer at the Mail.ru Group computer vision team. In this article, I’m going to tell a story of how we’ve created AI-based photo restoration project for old military photos. What is «photo restoration»? It consists of three steps:

  • we find all the image defects: fractures, scuffs, holes;
  • we inpaint the discovered defects, based on the pixel values around them;
  • we colorize the image.

Further, I’ll describe every step of photo restoration and tell you how we got our data, what nets we trained, what we accomplished, and what mistakes we made.
Read more →
Total votes 34: ↑33 and ↓1+32
Comments4

Real-time edge detection using FPGA

Reading time8 min
Views15K

Introduction


Our project implements a real-time edge detection system based on capturing image frames from an OV7670 camera and streaming them to a VGA monitor after applying a grayscale filter and Sobel operator. Our design is built on a Cyclone IV FPGA board which enables us to optimize the performance using the powerful features of the low-level hardware and parallel computations which is important to meet the requirements of the real-time system.


We used ZEOWAA FPGA development board which is based on Cyclone IV (EP4CE6E22C8N). Also, we used Quartus Prime Lite Edition as a development environment and Verilog HDL as a programming language. In addition, we used the built-in VGA interface to drive the VGA monitor, and GPIO (General Pins for Input and Output) to connect the external hardware with our board.


ZEOWAA FPGA development board

Read more →
Total votes 55: ↑41 and ↓14+27
Comments46

How does a barcode work?

Reading time6 min
Views13K
Hi all!

Every person is using barcodes nowadays, mostly without noticing this. When we are buying the groceries in the store, their identifiers are getting from barcodes. Its also the same with goods in the warehouses, postal parcels and so on. But not so many people actually know, how it works.

What is 'inside' the barcode, and what is encoded on this image?



Lets figure it out, and also lets write our own bar decoder.
Read more →
Total votes 27: ↑25 and ↓2+23
Comments0

How elliptic curve cryptography works in TLS 1.3

Reading time20 min
Views20K
image

A couple of reader alerts:

In order to (somewhat) simplify the description process and tighten the volume of the article we are going to write, it is essential to make a significant remark and state the primary constraint right away — everything we are going to tell you today on the practical side of the problematics is viable only in terms of TLS 1.3. Meaning that while your ECDSA certificate would still work in TLS 1.2 if you wish it worked, providing backwards compatibility, the description of the actual handshake process, cipher suits and client-server benchmarks covers TLS 1.3 only. Of course, this does not relate to the mathematical description of algorithms behind modern encryption systems.

This article was written by neither a mathematician nor an engineer — although those helped to find a way around scary math and reviewed this article. Many thanks to Qrator Labs employees.

(Elliptic Curve) Diffie-Hellman (Ephemeral)

The Diffie–Hellman legacy in the 21 century

Of course, this has started with neither Diffie nor Hellman. But to provide a correct timeline, we need to point out main dates and events.

There were several major personas in the development of modern cryptography. Most notably, Alan Turing and Claud Shannon both laid an incredible amount of work over the field of theory of computation and information theory as well as general cryptanalysis, and both Diffie and Hellman, are officially credited for coming up with the idea of public-key (or so-called asymmetric) cryptography (although it is known that in the UK there were made serious advances in cryptography that stayed under secrecy for a very long time), making those two gentlemen pioneers.

In what exactly?
Read more →
Total votes 21: ↑21 and ↓0+21
Comments0

Kalman Filter

Reading time9 min
Views6.1K


There are a lot of different articles on Kalman filter, but it is difficult to find the one which contains an explanation, where all filtering formulas come from. I think that without understanding of that this science becomes completely non understandable. In this article I will try to explain everything in a simple way.

Kalman filter is very powerful tool for filtering of different kinds of data. The main idea behind this that one should use an information about the physical process. For example, if you are filtering data from a car’s speedometer then its inertia give you a right to treat a big speed deviation as a measuring error. Kalman filter is also interesting by the fact that in some way it is the best filter. We will discuss precisely what does it mean. In the end of the article I will show how it is possible to simplify the formulas.
Read more →
Total votes 21: ↑21 and ↓0+21
Comments1

How to build a high-performance application on Tarantool from scratch

Reading time33 min
Views2.8K
image

I came to Mail.ru Group in 2013, and I required a queue for one task. First of all, I decided to check what the company had already got. They told me they had this Tarantool product, and I checked how it worked and decided that adding a queue broker to it could work perfectly well.

I contacted Kostja Osipov, the senior expert in Tarantool, and the next day he gave me a 250-string script that was capable of managing almost everything I needed. Since that moment, I have been in love with Tarantool. It turned out that a small amount of code written with a quite simple script language was capable of ensuring some totally new performance for this DBMS.

Today, I’m going to tell you how to instantiate your own queue in Tarantool 2.2.
Read more →
Total votes 18: ↑18 and ↓0+18
Comments0

A City Without Traffic Jams

Reading time55 min
Views4K

Chapter 2.
(the link to Chapter 1)

The Art of Designing Road Networks


Transport problems of a city through the eyes of a Computer Scientist


If I were recommended an article with the title “The Art of Designing Road Networks,” I would immediately ask how many road networks were built with the participation of its author. I must admit, my professional activity was far from road construction and was recently associated with the design of microprocessors where I, among other responsibilities, was engaged in the resource consumption of data switching. At that time my table stood just opposite the panoramic window which opened up a beautiful view of the long section of the Volgograd Highway and part of the Third Transport Ring with their endless traffic jams from morning to evening, from horizon to horizon. One day, I had a sudden shock of recognition: “The complexities of the data switching process that I struggle with on a chip may be similar to the difficulties the cars face as they flow through the labyrinth of road network”.
Probably, this view from the outside and the application of methods that were not traditional for the area in question gave me a chance to understand the cause of traffic jams and make recommendations on how to overcome the problem in practice.
Read more →
Total votes 13: ↑13 and ↓0+13
Comments2

Overview of Morris's counters

Reading time7 min
Views1.3K

On implementing streaming algorithms, counting of events often occurs, where an event means something like a packet arrival or a connection establishment. Since the number of events is large, the available memory can become a bottleneck: an ordinary n-bit counter allows to take into account no more than 2^n - 1events.
One way to handle a larger range of values using the same amount of memory would be approximate counting. This article provides an overview of the well-known Morris algorithm and some generalizations of it.

Another way to reduce the number of bits required for counting mass events is to use decay. We discuss such an approach here [3], and we are going to publish another blog post on this particular topic shortly.

In the beginning of this article, we analyse one straightforward probabilistic calculation algorithm and highlight its shortcomings (Section 2). Then (Section 3), we describe the algorithm proposed by Robert Morris in 1978 and indicate its most essential properties and advantages. For most non-trivial formulas and statements, the text contains our proofs, the demanding reader can find them in the inserts. In the following three sections, we outline valuable extensions of the classic algorithm: you can learn what Morris's counters and exponential decay have in common, how to improve the accuracy by sacrificing the maximum value, and how to handle weighted events efficiently.

Read more
Total votes 12: ↑12 and ↓0+12
Comments0

Automatic respiratory organ segmentation

Reading time8 min
Views2K

Manual lung segmentation takes about 10 minutes and it requires a certain skill to get the same high-quality result as with automatic segmentation. Automatic segmentation takes about 15 seconds.


I assumed that without a neural network it would be possible to get an accuracy of no more than 70%. I also assumed, that morphological operations are only the preparation of an image for more complex algorithms. But as a result of processing of those, although few, 40 samples of tomographic data on hand, the algorithm segmented the lungs without errors. Moreover, after testing in the first five cases, the algorithm didn’t change significantly and correctly worked on the other 35 studies without changing the settings.


Also, neural networks have a disadvantage — for their training we need hundreds of training samples of lungs, which need to be marked up manually.


Read more →
Total votes 11: ↑10 and ↓1+9
Comments1

Langton's ant: a mystery cellular automaton

Reading time4 min
Views2.5K

The life of Langton's Ant seems sad and lonely, but, as we'll soon discover, he is not ready to put up with such an outrageous situation and is trying his best to escape. American scientist Christopher Langton invented his ant back in 1986. Since then, no one has been able to explain the strange behavior of this mysterious model...

Read more
Total votes 8: ↑8 and ↓0+8
Comments3

Algorithms in Go: Dutch National Flag

Reading time3 min
Views2.7K

The flag of the Netherlands consists of three colors: red, white and blue. Given balls of these three colors arranged randomly in a line (it does not matter how many balls there are), the task is to arrange them such that all balls of the same color are together and their collective color groups are in the correct order.

For simplicity instead of colors red, white, and blue we will be dealing with ones, twos and zeroes.

Let's start with our intuition. We have an array of zeroth, ones, and twos. How would we sort it? Well, we could put aside all zeroes into some bucket, all ones into another bucket, and all twos into the third. Then we can fetch all items from the first bucket, then from the second, and from the last bucket, and restore all the items. This approach is perfectly fine and has a great performance. We touch all the elements when we iterate through the array, and then we iterate through all the elements once more when we "reassamble" the array. So, the overall time complexity is O(n) + O(n) ~= O(n). The space complexity is also O(n) as we need to store all items in the buckets.

Can we do better than that? There is no way to improve our time complexity. However, we can think of a more efficient algorithm in regard to space complexity. How would we solve the problem without the additional buckets?

Let's make a leap of faith and pretend that somehow we were able to process a part of the array. We iterate through part of the array and put encountered zeroes and ones at the beginning of the array, and twos at the end of the array. Now, we switched to the next index i with some unprocessed value x. What should we do there?

Read more
Total votes 8: ↑8 and ↓0+8
Comments4

Five Methods For Database Obfuscation

Reading time20 min
Views7.2K
ClickHouse users already know that its biggest advantage is its high-speed processing of analytical queries. But claims like this need to be confirmed with reliable performance testing. That's what we want to talk about today.



We started running tests in 2013, long before the product was available as open source. Back then, just like now, our main concern was data processing speed in Yandex.Metrica. We had been storing that data in ClickHouse since January of 2009. Part of the data had been written to a database starting in 2012, and part was converted from OLAPServer and Metrage (data structures previously used by Yandex.Metrica). For testing, we took the first subset at random from data for 1 billion pageviews. Yandex.Metrica didn't have any queries at that point, so we came up with queries that interested us, using all the possible ways to filter, aggregate, and sort the data.

ClickHouse performance was compared with similar systems like Vertica and MonetDB. To avoid bias, testing was performed by an employee who hadn't participated in ClickHouse development, and special cases in the code were not optimized until all the results were obtained. We used the same approach to get a data set for functional testing.

After ClickHouse was released as open source in 2016, people began questioning these tests.

Read more →
Total votes 11: ↑9 and ↓2+7
Comments4

«Promising Public Transportation for Large and Medium-Sized Cities» — the main idea in a brief summary

Level of difficultyEasy
Reading time9 min
Views1.4K

(source)

Translation provided by ChatGPT, link to the original article.

I recently published a series of articles titled 'As Cheap as a Bus, as Convenient as a Taxi...':

Link to Part 1: «Preliminary Analysis» (ру / eng )
Link to Part 2: «Experiments on a Torus» (ру / eng )
Link to Part 3: «Practically Significant Solutions» (ру / eng )

dedicated to making public transportation in large cities completely seamless, without the need for transfers. In the last article of the series, I extensively described a microbus movement scheme that allows them to operate almost like taxis while accommodating 5-10 passengers at once. Such a transportation system would enable city residents to travel from any intersection to another without any transfers, comparable in time to a personal car journey, and at a cost similar to a regular city bus ticket. However, the feedback from readers indicated that I chose an extremely ineffective way to convey the information, resulting in a failure to effectively communicate the essence of the matter.

I must admit that the previous three articles were written in a way that allowed readers to apply the acquired knowledge in practice or continue the research I started on their own. Unfortunately, my desire to 'teach' resulted in nearly 100 pages of complex mathematical text, which is clearly excessive for readers who simply wanted to familiarize themselves with the idea. Here, I will attempt to rectify this mistake and briefly, yet simply, explain the bus taxi technology.
Read more →
Total votes 6: ↑6 and ↓0+6
Comments0

Why x^0 = 1 visually

Reading time3 min
Views1.1K

The traditional definition for the operation of exponentiation to a natural power (or a positive integer) had introduced approximately as follows:

Exponentiation is an arithmetic operation originally defined as the result of multiple multiplications a number by itself.

But the more precise formulation is still different:

Raising a number X to an integer power N is an arithmetic operation defined as the result of multiple [N by mod times] multiplications or divisions one by number X.

Let's figure it out under the cut! >>
Total votes 5: ↑5 and ↓0+5
Comments0

Affordable as a Bus, Comfortable as a Taxi: A Promising Type of Public Transport for Large and Medium-Sized Cities.Part1

Level of difficultyMedium
Reading time40 min
Views1.8K

(Jean-Claude Mézières)

Translation provided by ChatGPT, link to the original article in Russian

Link to Part 1: «Preliminary Analysis» (ру / eng )
Link to Part 2: «Experiments on a Torus» (ру / eng )
Link to Part 3: «Practically Significant Solutions» (ру / eng )
Link to «Summary» (ру / eng )

1. About this series of articles


1.1 Central result


If I haven't made a critical mistake, I have discovered an astonishing passenger transportation scheme with unique characteristics. Imagine this scenario: you are in a big city and need to get from point A to point B. All you need to do is walk to the nearest intersection and indicate the destination on your smartphone or a special terminal installed there. In a few minutes, a small but spacious bus will arrive for you. The bus is designed for easy entry without bending, and you can bring a stroller, bicycle, or even a cello inside. It provides comfortable seating where you can stretch your legs. This bus will take you to the nearest intersection to point B, and you will reach your destination without any transfers. The entire journey, including waiting at the stop, will take only 25-50% more time than if you were traveling by private car. Based on my estimation, in modern metropolises, this type of transportation will be widely adopted, and the cost of a trip on such buses will be similar to the fare of a regular city bus.

Surprisingly, the reasoning behind these findings is based on relatively simple mathematics, and perhaps even a talented high school student, under fortunate circumstances, could have guessed them on their own. The practical significance of the topic and the modest level of mathematical requirements prompted me to make an effort to write the article in such a way that the reader could follow the path of discoveries, learn some research techniques, and gain a successful example to explain to their children the purpose of mathematics and how it can be applied in everyday life.
Read more →
Total votes 9: ↑7 and ↓2+5
Comments6

Algorithms in Go

Reading time2 min
Views5.2K

Most solutions to algorithmic problems can be grouped into a rather small number of patterns. When we start to solve some problem, we need to think about how we would classify them. For example, can we apply fast and slow аlgorithmic pattern or do we need to use cyclic sortpattern? Some of the problems have several solutions based on different patterns. In this series, we discuss the most popular algorithmic patterns that cover more than 90% of the usual problems.

It is different from High-School Algorithms 101 Course, as it is not intended to cover things like Karatsuba algorithm (fast multiplication algorithm) or prove different methods of sorting. Instead, Algorithmic Patterns focused on practical skills needed for the solution of common problems. For example, when we set up a Prometheus alert for high request latency we are dealing with Sliding Window Pattern. Or let say, we organize a team event and need to find an available time slot for every participant. At the first glance, it is not obvious that in this case, we are actually solving an algorithmic problem. Actually, during our day we usually solve a bunch of algorithmic problems without realizing that we dealing with algorithms.

The knowledge about Algorithmic Patterns helps one to classify a problem and then apply the appropriate method.

But probably most importantly learning algorithmic patterns boost general programming skills. It is especially helpful when you are debugging some production code, as it trains you to understand the execution flow.

Patterns covered so far:

Sliding Window I

Sliding Window II

Merge Intervals

Dutch National Flag

Matrix Spiral

Iterative Postorder Traversal

Bit Manipulation

Stay tuned :)

<Promo> If you interested to work as a backend engineer, there is an open position in my squad. Prior knowledge of Golang is not required. I am NOT an HR and DO NOT represent the company in any capacity. However, I can share my personal experience as a backend engineer working in the company. </Promo>

Read more
Total votes 7: ↑6 and ↓1+5
Comments0

Authors' contribution