Building a GPT-like Model from Scratch with Detailed Theory and Code Implementation

Reading time 14 min
Views 33K

Unlock the power of Transformer neural networks and learn how to build your own GPT-like model from scratch. In this in-depth guide, we will delve into the theory and provide a step-by-step code implementation to help you create your own miniGPT model. The final code is only 400 lines long and runs on both CPUs and GPUs. If you want to jump straight to the implementation, here is the GitHub repo.

Transformers are revolutionizing the world of artificial intelligence. This simple but very powerful neural network architecture, introduced in 2017, has quickly become the go-to choice for natural language processing, generative AI, and more. With the help of transformers, we've seen the creation of cutting-edge AI products like BERT, GPT-x, DALL-E, and AlphaFold, which are changing the way we interact with language and solve complex problems like protein folding. And the exciting possibilities don't stop there: transformers are also making waves in the field of computer vision with the advent of Vision Transformers.

Read more
Total votes 25: ↑25 and ↓0 +25
Comments 1

Mēris botnet, climbing to the record

Reading time 7 min
Views 16K

Introduction

For the last five years, there have been virtually no global-scale application-layer attacks.

During this period, the industry has learned how to cope with high-bandwidth network-layer attacks, including amplification-based ones. That does not mean that botnets are now harmless.

At the end of June 2021, Qrator Labs started to see signs of a new assaulting force on the Internet: a botnet of a new kind. This is joint research we conducted together with Yandex to elaborate on the specifics of this DDoS attack enabler, which is emerging in almost real time.

Read more
Total votes 28: ↑28 and ↓0 +28
Comments 0

10(+) years in the Labs

Reading time 4 min
Views 1.2K

At the beginning of 2021, Qrator Labs is celebrating its 10-year anniversary. On January 19, our company officially passes the 10-year mark, entering its second decade of existence.

Everything started a little earlier: when, at the age of 10, Alex saw the Robotron K 1820, and then in 2008, when Alexander Lyamin, the founder and CEO of Qrator Labs, approached his superiors at Moscow State University, where he worked as a NOC engineer at the time, with an idea for a DDoS-attack mitigation research project. The MSU network was one of the largest in the country and, as we know now, it was the best place to hatch a future technology.

At that time, the MSU administration agreed, and Mr Lyamin took his own hardware to the university while gathering a team. Within two years, by summer 2010, the project had become so successful that it attracted a DDoS attack whose bandwidth exceeded the MSU's upstream capacity. On June 22, the MSU superiors gave Mr Lyamin a choice: shut down or find the money to incorporate.

Alexander Lyamin chose to incorporate using his own means, which effectively meant that the needed infrastructure had to be built from scratch. The initial design had to be distributed rather than concentrated within one network, whose resources would not be enough for this specific task. By September 1, 2010, the first server sites were up and running.

Flashback with us
Total votes 28: ↑28 and ↓0 +28
Comments 1

Tarantool: an analyst's view

Reading time 8 min
Views 1.9K
Hi all! I'm Andrey Kapustin. I work as a system analyst at Mail.ru Group. Our products form a unified ecosystem. Many independent infrastructures generate data in it: taxi and food delivery services, email services, social networks, etc. The faster and more precisely we can predict a client's needs, the sooner and more correctly we can offer our products.

Many system analysts and engineers are keen to know: 

  1. How to design the architecture of a trigger platform for real-time marketing?
  2. How to arrange a data structure that would be in line with the requirements of a marketing strategy for interacting with clients?
  3. How to ensure the stable operation of the system under very heavy workloads?

Such systems are based on high-load processing and Big Data analysis technologies. We have accumulated considerable experience in these areas, and our expertise is in high demand on the market. I'm going to show how we help our customers switch from offline to online interactions with clients using Real-Time Marketing solutions based on Tarantool.
Read more →
Total votes 26: ↑26 and ↓0 +26
Comments 0

The 2020 National Internet Segment Reliability Research

Reading time 9 min
Views 9.4K

The National Internet Segment Reliability Research explains how the outage of a single Autonomous System might affect the connectivity of the impacted region with the rest of the world. Most of the time, the most critical AS in the region is the dominant ISP on the market, but not always.

As the number of alternate routes between ASes increases (and do not forget that the Internet stands for “interconnected network”, and each network is an AS), so do the fault tolerance and stability of the Internet across the globe. Although some paths are more important than others from the beginning, establishing as many alternate routes as possible is the only viable way to ensure an adequately robust network.

The global connectivity of any given AS, regardless of whether it is an international giant or a regional player, depends on the quantity and quality of its paths to Tier-1 ISPs.

Usually, Tier-1 implies an international company offering global IP transit service over connections with other Tier-1 providers. Nevertheless, there is no guarantee that such connectivity will be maintained all the time. For many ISPs at all “tiers”, losing connection to just one Tier-1 peer would likely render them unreachable from some parts of the world.
Read more →
Total votes 26: ↑26 and ↓0 +26
Comments 0

Looking back at 3 months of the global traffic shapeshifting

Reading time 9 min
Views 3.2K
There will be no TL;DR in this article, sorry.

Those have been three months that genuinely changed the world. An entire lifetime passed between February 1, when the coronavirus pandemic had just started to spread outside of China and European countries were about to react, and April 30, when nations almost all over the world were locked down under quarantine measures. We want to take a look at the repercussions and the cyclic nature of the reaction and, of course, provide an overview of DDoS attacks and BGP incidents over those three months.

In general, there seems to be an objective pattern in almost every country’s shift into the quarantine lockdown.
Read more →
Total votes 27: ↑27 and ↓0 +27
Comments 0

The Anatomy of LuaJIT Tables and What’s Special About Them

Reading time 10 min
Views 3K
I don't know about you, but I really like to get inside all sorts of systems. In this article, I’m going to tell you about the internals of Lua tables and special considerations for their use. Lua is my primary professional programming language, and if one wants to write good code, one needs at least to peek behind the curtain. If you are curious, follow me.


Read more →
Total votes 28: ↑28 and ↓0 +28
Comments 0

Self-Development: How I Couldn't Wear Two Hats and Found Third One

Reading time 18 min
Views 2.4K


Hi all! I lead the antispam team and several machine learning teams at Mail.ru Group. The subject of this article is self-development for team leads and managers, although in reality many of the techniques and recipes do not depend on the role at all. This really concerns me because machine learning is developing extremely fast, and it takes a lot of time to stay up to date. So the question of what should be done for self-development, and how, is quite topical.

Of course, the content of this article is not the ultimate truth but just a description of the results of my continuing quest. It describes approaches based on books and workshops, trial and error, that have worked for me. It would be good to have a discussion with you in the comments.
Read more →
Total votes 26: ↑26 and ↓0 +26
Comments 0

Google Interviewing Process for Software Developer Role in 2020

Reading time 6 min
Views 83K

Hello! I just finished interviewing with Google and wanted to quickly catch you up on some interesting and frustrating steps of the process so that you can understand what to expect from Google interviews and the steps involved. I will also share some tips on how to prepare for the interview and mistakes to avoid.

If you’re looking for a success story, this is the wrong post for you. I actually failed the interviewing process, but the whole experience was pretty interesting for me and led me to another stage of my career. I will share more details on this at the end of the post. All names and identifying details have been changed to protect the privacy of Google employees.
Read more →
Total votes 71: ↑70 and ↓1 +69
Comments 9

Tips and tricks from my Telegram-channel @pythonetc, December 2019

Reading time 2 min
Views 1.5K


This is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

Previous publications.


Different asyncio tasks obviously have different stacks. You can look at all of them at any moment by using asyncio.all_tasks() to get all currently running tasks and task.get_stack() to get the stack for each task.
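
As a rough illustration of the two calls mentioned above, here is a minimal sketch. The `worker` coroutine, the task names, and the `collect_stacks` helper are hypothetical and exist only for this example; only `asyncio.all_tasks()` and `Task.get_stack()` come from the tip itself.

```python
import asyncio


async def worker(delay: float) -> None:
    # Suspend here so the task has a live frame to inspect.
    await asyncio.sleep(delay)


async def collect_stacks() -> dict:
    # Start two hypothetical background tasks with explicit names.
    tasks = [
        asyncio.create_task(worker(0.2), name=f"worker-{i}") for i in range(2)
    ]
    await asyncio.sleep(0)  # yield once so the workers start running

    depths = {}
    for task in asyncio.all_tasks():  # includes the current task as well
        frames = task.get_stack()  # list of frame objects for this task
        depths[task.get_name()] = len(frames)

    for task in tasks:
        await task
    return depths


depths = asyncio.run(collect_stacks())
for name, depth in sorted(depths.items()):
    print(f"{name}: {depth} frame(s)")
```

For a human-readable dump instead of raw frames, `task.print_stack()` prints the same information in traceback form.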
Read more →
Total votes 27: ↑26 and ↓1 +25
Comments 0

Accelerating PHP connectors for Tarantool using Async, Swoole, and Parallel

Reading time 6 min
Views 2.2K


In the PHP ecosystem, there are currently two connectors for the Tarantool server: the official PECL extension tarantool/tarantool-php written in C, and tarantool-php/client written in PHP. I am the author of the latter one.

In this article I would like to share the results of performance testing of both these libraries and show how you can achieve 3x-5x performance improvement (on synthetic tests!) with minimal changes in code.
Read more →
Total votes 39: ↑39 and ↓0 +39
Comments 0

Tips and tricks from my Telegram-channel @pythonetc, November 2019

Reading time 3 min
Views 2.6K

This is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

Previous publications.



PATH is an environment variable that stores the paths where executables are looked for. When you ask your shell to run ls, the shell looks for the ls executable file in all the paths listed in PATH.
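
The lookup described above can be approximated in a few lines of Python. This is a simplified sketch, not what any particular shell actually does: the `find_executable` helper is a hypothetical name, and details such as shell builtins, aliases, and hashing of previously found commands are ignored.

```python
import os


def find_executable(name, path=None):
    """Return the first match for `name` in the PATH directories, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # PATH entries are separated by os.pathsep (":" on POSIX, ";" on Windows).
    for directory in path.split(os.pathsep):
        # An empty entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_executable("ls"))
```

In real code, the standard library's `shutil.which("ls")` performs this kind of search and also handles platform-specific details.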
Read more →
Total votes 31: ↑30 and ↓1 +29
Comments 1

Faster ENUM

Reading time 9 min
Views 2.2K

tl;dr


github.com/QratorLabs/fastenum
pip install fast-enum

What are enums


(If you think you know that — scroll down to the “Enums in Standard Library” section).

Imagine that you need to describe a set of all possible states for the entities in your database model. You'll probably use a bunch of constants defined as module-level attributes:
# /path/to/package/static.py:
INITIAL = 0
PROCESSING = 1
PROCESSED = 2
DECLINED = 3
RETURNED = 4
...

...or as class-level attributes defined in their own class:
class MyModelStates:
  INITIAL = 0
  PROCESSING = 1
  PROCESSED = 2
  DECLINED = 3
  RETURNED = 4

That helps you refer to those states by their mnemonic names, while they persist in your storage as simple integers. This way, you get rid of magic numbers scattered through your code and make it more readable and self-descriptive.

But both the module-level constants and the class with static attributes suffer from the inherent nature of Python objects: they are all mutable. You may accidentally assign a value to your constant at runtime, and the result is a mess to debug, followed by rolling back your broken entities. So you might want to make your set of constants immutable, which means that both the number of constants declared and the values they are mapped to must not be modified at runtime.
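
To make the immutability point concrete, here is a small sketch using the standard library's `enum.IntEnum`. It is only an illustration of protected members in general, not a description of how fast-enum works; the `try_rebind` helper is hypothetical.

```python
from enum import IntEnum


class MyModelStates(IntEnum):
    INITIAL = 0
    PROCESSING = 1
    PROCESSED = 2
    DECLINED = 3
    RETURNED = 4


def try_rebind():
    """Attempt to overwrite a member and return the resulting error, if any."""
    try:
        MyModelStates.INITIAL = 42  # accidental reassignment at runtime
    except AttributeError as exc:
        return exc
    return None


error = try_rebind()
print(type(error).__name__)
print(MyModelStates.PROCESSED == 2)
```

Unlike the plain class shown earlier, reassigning a declared member raises an error instead of silently changing the value, while the members still compare equal to the integers stored in the database.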
Read more →
Total votes 28: ↑28 and ↓0 +28
Comments 0

Deploying Tarantool Cartridge applications with zero effort (Part 1)

Reading time 8 min
Views 1.9K


We have already presented Tarantool Cartridge, which allows you to develop and pack distributed applications. Now let's learn how to deploy and control these applications. Don't panic, it's all under control! We have brought together all the best practices of working with Tarantool Cartridge and written an Ansible role that deploys the package to servers, starts instances and joins them into replica sets, configures authorization, bootstraps vshard, enables automatic failover, and patches the cluster configuration.

Interesting, huh? Dive in, check details under the cut.
Read more →
Total votes 29: ↑29 and ↓0 +29
Comments 0

Nginx's office is being searched due to Rambler Group's lawsuit. The complainant's press service confirmed the suit

Reading time 5 min
Views 17K
According to one of the employees, Nginx's Moscow office is being searched due to the criminal case brought by Rambler Group (the official response of the company's press office on this issue, confirming the claims against Nginx, is below). A photo of the search warrant is provided as evidence of the criminal case, initiated on December 4, 2019 under Article 146 of the Criminal Code of the Russian Federation, 'Violation of Author's and Neighboring Rights'.

Nginx search warrant


It is assumed that the complainant is Rambler and that the defendant, for now an 'unidentified group of persons', will in the long run be the founder of Nginx, Igor Sysoyev.

The point of the claim: Igor started working on Nginx as an employee of Rambler and only after the tool became popular he founded a separate company and attracted investments.

It is not clear why Rambler revised its 'property' only 15 years later.
Total votes 78: ↑78 and ↓0 +78
Comments 4

CodeSide. The new game for Russian AI Cup

Reading time 3 min
Views 2.4K


The AI Cup community and Mail.ru Group in collaboration with Codeforces.com invite you to the real battle! Get ready for the sleepless nights and calloused hands — take part in Russian AI Cup, which is one of the most challenging and vivid artificial intelligence programming competitions in the world. Believe us, managers of this madness did their best to create the game you'd want to play.


To become part of the competition, you need Internet access, a computer, creativity, and enthusiasm for being a part of this extraordinary Cup. By the way, you might need some coffee. Welcome!

Read more →
Total votes 33: ↑33 and ↓0 +33
Comments 0

PVS-Studio in the Clouds: GitLab CI/CD

Reading time 10 min
Views 1K


This article continues the series of publications on usage of PVS-Studio in cloud systems. This time we'll look at the way the analyzer works along with GitLab CI, which is a product made by GitLab Inc. Static analyzer integration in a CI system allows detecting bugs right after the project build and is a highly effective way to reduce the cost of finding bugs.
Read more →
Total votes 30: ↑29 and ↓1 +28
Comments 0

Tips and tricks from my Telegram-channel @pythonetc, October 2019

Reading time 3 min
Views 1K

This is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

Previous publications


If you want to iterate over several iterables at once, you can use the zip function (it has nothing to do with ZIP file format):
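
A short sketch of what that looks like in practice; the `names` and `scores` lists are made-up sample data for this illustration.

```python
names = ["alpha", "beta", "gamma"]
scores = [3, 1, 2]

# zip yields one tuple per position, pairing items from each iterable.
pairs = list(zip(names, scores))
for name, score in pairs:
    print(f"{name}: {score}")

# With iterables of different lengths, zip stops at the shortest one.
truncated = list(zip(names, [10, 20]))
```

If silently dropping the extra items is not what you want, `itertools.zip_longest` pads the shorter iterables instead.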
Read more →
Total votes 29: ↑28 and ↓1 +27
Comments 0

On request of Embedded Developers: Detecting Errors in Amazon FreeRTOS

Reading time 15 min
Views 1.3K
Anyone who programs microcontrollers probably knows about FreeRTOS, or has at least heard of this operating system. Amazon developers decided to enhance the abilities of this operating system to work with AWS Internet of Things services. This is how Amazon FreeRTOS appeared. We, the developers of the PVS-Studio static code analyzer, were asked by mail and in comments to check these projects. Well, now you get what you asked for. Keep reading to find out what came of it.


Read more →
Total votes 27: ↑27 and ↓0 +27
Comments 0

Tarantool Kubernetes Operator

Reading time 10 min
Views 1.8K


Kubernetes has already become a de facto standard for running stateless applications, mainly because it can reduce time-to-market for new features. Launching stateful applications, such as databases or stateful microservices, is still a complex task, but companies have to keep up with the competition and maintain a high delivery rate. So they create a demand for such solutions.

We want to introduce our solution for launching stateful Tarantool Cartridge clusters: Tarantool Kubernetes Operator, more under the cut.
Read more →
Total votes 34: ↑34 and ↓0 +34
Comments 0