Administrating

The Rules for Data Processing Pipeline Builders

Reading time: 5 min
Views: 3.8K


"Come, let us make bricks, and burn them thoroughly."
– legendary builders

You may have noticed by 2020 that data is eating the world. And whenever any reasonable amount of data needs processing, a complicated multi-stage data processing pipeline will be involved.


At Bumble — the parent company operating Badoo and Bumble apps — we apply hundreds of data transforming steps while processing our data sources: a high volume of user-generated events, production databases and external systems. This all adds up to quite a complex system! And just as with any other engineering system, unless carefully maintained, pipelines tend to turn into a house of cards — failing daily, requiring manual data fixes and constant monitoring.


For this reason, I want to share certain good engineering practices with you, ones that make it possible to build scalable data processing pipelines from composable steps. While some engineers understand such rules intuitively, I had to learn them by doing, making mistakes, fixing, sweating and fixing things again…


So behold! I bring you my favourite Rules for Data Processing Pipeline Builders.

Read more →

The magic of Virtualization: Proxmox VE introductory course

Reading time: 8 min
Views: 3.1K

Today, I am going to explain how to quickly deploy several virtual servers with different operating systems on a single physical server without much effort. This will enable any system administrator to manage the whole corporate IT infrastructure in a centralized manner and save a huge amount of resources.
Read more →

Optimising server distribution across the racks

Reading time: 5 min
Views: 1.9K
Recently, a colleague asked me in a chat:

"Is there an article on how to pack servers into racks properly?"

I realised that I didn't know of one, so I decided to write my own.

Firstly, this is an article about bare-metal servers in data centre (DC) facilities. Secondly, we assume that there are a lot of servers (hundreds or thousands); the article doesn't make sense for smaller quantities. Thirdly, we consider three constraints on the racks: physical space, electric power per rack, and the requirement that cabinets stand in rows adjacent to each other, so that a single ToR switch can connect the servers in them.
The answer to the original question depends significantly...

From High Ceph Latency to Kernel Patch with eBPF/BCC

Reading time: 9 min
Views: 5.3K


There are a lot of tools for debugging kernel and userspace programs in Linux. Most of them have a performance impact and cannot easily be run in production environments. A few years ago, eBPF was developed, which provides the ability to trace the kernel and userspace with low overhead, without needing to recompile programs or load kernel modules.

There are now plenty of tools that use eBPF, and in this article we'll explain how to write your own profiling tool using the Python BCC library. This article is based on a real issue from a production environment. We'll walk you through solving the problem and show how existing bcc tools could be used in some cases.
Read more →

Automatically obtaining SSL certificates by Let's Encrypt using DNS-01 challenge and AWS

Reading time: 5 min
Views: 5.8K

This post describes the steps needed to set up automatic creation and renewal of SSL certificates, using Let's Encrypt as the automated Certificate Authority, which provides a well-maintained API.
acme-dns-route53 is a tool for obtaining SSL certificates from Let's Encrypt using the DNS-01 challenge with Route53 and AWS Certificate Manager. acme-dns-route53 also has built-in support for running inside AWS Lambda, and this is what we are going to do.

Read more →

Windows Virtual Desktop now in public preview on Azure

Reading time: 1 min
Views: 1.4K

We recently shared the public preview of the Windows Virtual Desktop service on Azure. Now customers can access the only service that delivers simplified management, multi-session Windows 10, optimizations for Office 365 ProPlus, and support for Windows Server Remote Desktop Services (RDS) desktops and apps. With Windows Virtual Desktop, you can deploy and scale your Windows desktops and apps on Azure in minutes, while enjoying built-in security and compliance.


Image of a woman at her desktop in the workplace

Read more →

Quintet instead of Byte — data storage and retrieval approach

Reading time: 13 min
Views: 1.8K
A quintet is a way to represent atomic pieces of data together with their role in the business domain. Quintets can describe any item, and each one contains complete information about itself and its relations to other quintets. Such a description does not depend on the platform used. Its objective is to simplify data storage and to make the data easier to see and present.



We will discuss an approach to storing and processing information and share some thoughts on creating a development platform in this new paradigm. What for? To develop faster and in shorter iterations: sketch your project, make sure it is what you thought of, refine it, and then keep refining the result.

A quintet has four properties: type, value, parent, and order among its peers. Together with the identifier, that makes five components. This is the simplest universal form to record information, a new standard that could potentially fit any programming demands. Quintets are stored in a file system with a unified structure, as one continuous, homogeneous, indexed mass of data. The quintet data model describes any data structure as a single interconnected list of basic types and terms based on them (metadata), as well as instances of objects stored according to this metadata (data).
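To make the five components concrete, here is a minimal sketch of a quintet as a Python record. Only the five field names come from the description above. The string type labels, the in-memory list used as storage, and the `children` helper are assumptions made for illustration and do not reflect how the platform actually stores or indexes quintets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Quintet:
    id: int                # identifier of this quintet
    type: str              # type label (a plain string here; an assumption)
    value: str             # the stored value
    parent: Optional[int]  # id of the parent quintet, None for a root
    order: int             # position among peers under the same parent


def children(quintets: List[Quintet], parent_id: int) -> List[Quintet]:
    """Hypothetical helper: peers under one parent, sorted by their order."""
    peers = [q for q in quintets if q.parent == parent_id]
    return sorted(peers, key=lambda q: q.order)


# A tiny in-memory "store": one root term and two items that belong to it.
store = [
    Quintet(id=1, type="string", value="Customer", parent=None, order=0),
    Quintet(id=2, type="string", value="Acme Ltd", parent=1, order=1),
    Quintet(id=3, type="string", value="Beta LLC", parent=1, order=0),
]

for q in children(store, 1):
    print(q.order, q.value)
```

Because every record carries its own parent and order, the relations between items can be rebuilt from the flat list alone, which is the property the description above relies on.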
Read more →

Bypassing LinkedIn Search Limit by Playing With API

Reading time: 7 min
Views: 17K
[Because my extension got a lot of attention from the foreign audience, I translated my original article into English].

Limit


LinkedIn is a top-rated professional network, but unfortunately it imposes a restriction on free accounts called the Commercial Use Limit (CUL). Most likely you, like me until recently, have never encountered or even heard of it.

The point of the CUL is that when you search for people outside your connections/network too often, your search results are limited to only 3 profiles instead of 1000 (100 pages with 10 profiles per page by default). Nobody knows how "often" is measured; there are no precise metrics, and the algorithm decides based on your actions: how frequently you've been searching and how many connections you've been adding. The free CUL resets at midnight PST on the 1st of each calendar month, and you get your 1000 search results again, for who knows how long. Of course, Premium accounts have no such limit in place.

However, not so long ago, I started messing around with LinkedIn search for a pet project and suddenly ran into this CUL. Obviously, I didn't like it much; after all, I hadn't been using the search for any commercial purposes. So, my first thought was to explore this limit and try to bypass it.

[Important clarification — all source materials in this article are presented solely for informational and educational purposes. The author doesn't encourage their use for commercial purposes.]
Read more →

10 critical skills every DevOps engineer

Reading time: 6 min
Views: 17K

What is DevOps and Why is it important?


DevOps combines development teams and operations teams to run a business on top of traditional software development practices. DevOps is gaining popularity at a rapid pace. Let's see how DevOps helps the delivery of software products.

When the development and operations teams work in separate silos, development life cycles get longer due to a lack of communication and cooperation between the two teams. By merging the two, we can shorten software development cycles.

DevOps is not a profession. It's a culture. It builds teams and makes engineers work towards a common goal rather than individual performance. This leads to better collaboration and increased efficiency.

More importantly, DevOps reduces failures, rollbacks and the time needed to recover, which is its main characteristic. This helps to find bugs and failures quickly, making it possible to fix bugs or recover from failures.
Read more →

Survey service

Reading time: 12 min
Views: 981

Surveys are more than a tool for collecting opinions; they are a bridge between companies and their audiences, making it possible not only to learn what customers think but also to involve them in creating and improving products and services.

The Testograf (Тестограф) survey service offers extensive capabilities for creating surveys, tests and polls, with functionality that lets you not only collect data but also analyse it, draw conclusions and make informed decisions based on it.

This article will interest a wide range of readers: from HR managers looking to optimise recruitment and staff assessment to marketers seeking ways to make their campaigns more effective through a deep understanding of their target audience. It will also be useful for UX/UI specialists who want to improve the user experience based on real feedback, and for organisers of votes who want to ensure transparent and reliable results.

Read more

Q4 2022 DDoS Attacks and BGP Incidents

Reading time: 7 min
Views: 1.8K

Now that 2022 has come to an end, we would like to share the DDoS attack mitigation and BGP incident statistics for the fourth quarter of the year, which overall saw unprecedented levels of DDoS attack activity across all business sectors.

In 2022, DDoS attacks increased by 73.09% compared to 2021. 

Let's take a closer look at the Q4 2022 data.

Read more

Introducing One Ring — an open-source pipeline for all your Spark applications

Reading time: 23 min
Views: 1.6K

If you use Apache Spark, you probably have a few applications that consume data from external sources and produce an intermediate result, which is then consumed by other applications further down the processing chain, and so on until you get a final result.


We suspect that because we have a similar pipeline with lots of processes like this one:


A process flowchart with more than 50 applications and about 70 datasets


Each rectangle is a Spark application with its own set of execution parameters, and each arrow is an equally parametrized dataset (externally stored ones are highlighted with a color; note the number of intermediate ones). This example is not the most complex of our processes; it's a fairly simple one. And we don't assemble such workflows manually; we generate them from Process Templates (outlined as groups on this flowchart).


So here comes the One Ring, a Spark pipelining framework with very robust configuration abilities, which makes it easier to compose and execute a most complex Process as a single large Spark job.


And we just made it open source. Perhaps, you’re interested in the details.

We got you covered!

Analytics For Azure DevOps Services is Now Generally Available

Reading time: 2 min
Views: 2.1K

Reporting has been an important capability for Azure DevOps customers who rely on Analytics to make data-driven decisions.


Today, we're excited to announce that the Analytics features listed below will be included in our Azure DevOps Services offering at no additional cost. Customers will start to see these changes rolled out to their accounts soon.


Read more →

How to increase capacity of drive from 32 to 256 GB for Asus E200H laptop

Level of difficulty: Easy
Reading time: 5 min
Views: 1.7K

Hi Everyone.

I recently got an Asus E200H laptop, which I would like to use as a portable computer for working with high-precision equipment.

In this article, we will try upgrading the default 32 GB eMMC storage to 256 GB and test the result.

Enjoy reading!

Read more

New plasma rocket engine: A promising experiment harnessing a hybrid of chemical and ion technology

Level of difficulty: Hard
Reading time: 37 min
Views: 2.2K

 Gleb Kulev, Candidate of Technical Sciences

The rapid exploration of outer space has run into the insufficient efficiency of modern rocket engines. As a solution, this article proposes the concept of a jet engine based on new operating principles, which combines known physical laws and has advantages over known types of jet engines. The article presents and analyses the results of testing three modifications of such engines. It also discusses the physical principles behind the engine, its advantages and the problems that arise in building it.

Read more