Pull to refresh

Development

Show first
Rating limit
Level of difficulty

Building a GPT-like Model from Scratch with Detailed Theory and Code Implementation

Reading time14 min
Views34K

Unlock the power of Transformer Neural Networks and learn how to build your own GPT-like model from scratch. In this in-depth guide, we will delve into the theory and provide a step-by-step code implementation to help you create your own miniGPT model. The final code is only 400 lines and works on both CPUs as well as on the GPUs. If you want to jump straight to the implementation here is the GitHub repo.

Transformers are revolutionizing the world of artificial intelligence. This simple, but very powerful neural network architecture, introduced in 2017, has quickly become the go-to choice for natural language processing, generative AI, and more. With the help of transformers, we've seen the creation of cutting-edge AI products like BERT, GPT-x, DALL-E, and AlphaFold, which are changing the way we interact with language and solve complex problems like protein folding. And the exciting possibilities don't stop there - transformers are also making waves in the field of computer vision with the advent of Vision Transformers.

Read more
Total votes 25: ↑25 and ↓0+25
Comments1

Mēris botnet, climbing to the record

Reading time7 min
Views16K

Introduction

For the last five years, there have virtually been almost no global-scale application-layer attacks.

During this period, the industry has learned how to cope with the high bandwidth network layer attacks, including amplification-based ones. It does not mean that botnets are now harmless.

End of June 2021, Qrator Labs started to see signs of a new assaulting force on the Internet – a botnet of a new kind. That is a joint research we conducted together with Yandex to elaborate on the specifics of the DDoS attacks enabler emerging in almost real-time.

Read more
Total votes 28: ↑28 and ↓0+28
Comments0

Tarantool: an analyst's view

Reading time8 min
Views1.9K
Hi all! I'm Andrey Kapustin. I work as a system analyst at Mail.ru Group. Our products form a unified ecosystem. Many independent infrastructures generate data in it: taxi and food delivery services, email services, social networks, etc. The faster and more precise we can predict a client's needs, the sooner and more correctly we can offer our products. 

Many system analysts and engineers are keen to know: 

  1. How to design the architecture of a trigger platform for real-time marketing?
  2. How to arrange a data structure that would be in line with the requirements of a marketing strategy for interacting with clients?
  3. How to ensure the stable operations of the  system under very heavy workloads? 

Such systems are based on technologies of high-load processing and Big Data analysis. We have accumulated considerable experience in these areas. Our expertise is in high demand on the market.  I'm going to show how we help our customers to switch from off-line to on-line in their interactions with clients using Real-Time Marketing solutions based on Tarantool.
Read more →
Total votes 26: ↑26 and ↓0+26
Comments0

The 2020 National Internet Segment Reliability Research

Reading time9 min
Views9.4K

The National Internet Segment Reliability Research explains how the outage of a single Autonomous System might affect the connectivity of the impacted region with the rest of the world. Most of the time, the most critical AS in the region is the dominant ISP on the market, but not always.

As the number of alternate routes between AS’s increases (and do not forget that the Internet stands for “interconnected network” — and each network is an AS), so does the fault-tolerance and stability of the Internet across the globe. Although some paths are from the beginning more important than others, establishing as many alternate routes as possible is the only viable way to ensure an adequately robust network.

The global connectivity of any given AS, regardless of whether it is an international giant or regional player, depends on the quantity and quality of its path to Tier-1 ISPs.

Usually, Tier-1 implies an international company offering global IP transit service over connections with other Tier-1 providers. Nevertheless, there is no guarantee that such connectivity will be maintained all the time. For many ISPs at all “tiers”, losing connection to just one Tier-1 peer would likely render them unreachable from some parts of the world.
Read more →
Total votes 26: ↑26 and ↓0+26
Comments0

Looking back at 3 months of the global traffic shapeshifting

Reading time9 min
Views3.2K
image
There would be no TL;DR in this article, sorry.

Those have been three months that genuinely changed the world. An entire lifeline passed from February, 1, when the coronavirus pandemics just started to spread outside of China and European countries were about to react, to April, 30, when nations were locked down in quarantine measures almost all over the entire world. We want to take a look at the repercussions, cyclic nature of the reaction and, of course, provide DDoS attacks and BGP incidents overview on a timeframe of three months.

In general, there seems to be an objective pattern in almost every country’s shift into the quarantine lockdown.
Read more →
Total votes 27: ↑27 and ↓0+27
Comments0

The Anatomy of LuaJIT Tables and What’s Special About Them

Reading time10 min
Views3.1K
I don't know about you, but I really like to get inside all sorts of systems. In this article, I’m going to tell you about the internals of Lua tables and special considerations for their use. Lua is my primary professional programming language, and if one wants to write good code, one needs at least to peek behind the curtain. If you are curious, follow me.


Read more →
Total votes 28: ↑28 and ↓0+28
Comments0

Tips and tricks from my Telegram-channel @pythonetc, December 2019

Reading time2 min
Views1.6K


It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

Previous publications.


Different asyncio tasks obviously have different stacks. You can view at all of them at any moment using asyncio.all_tasks() to get all currently running tasks and task.get_stack() to get a stack for each task.
Read more →
Total votes 27: ↑26 and ↓1+25
Comments0

Accelerating PHP connectors for Tarantool using Async, Swoole, and Parallel

Reading time6 min
Views2.2K


In the PHP ecosystem, there are currently two connectors for the Tarantool server: the official PECL extension tarantool/tarantool-php written in C, and tarantool-php/client written in PHP. I am the author of the latter one.

In this article I would like to share the results of performance testing of both these libraries and show how you can achieve 3x-5x performance improvement (on synthetic tests!) with minimal changes in code.
Read more →
Total votes 39: ↑39 and ↓0+39
Comments0

Tips and tricks from my Telegram-channel @pythonetc, November 2019

Reading time3 min
Views2.7K

Tips and tricks from my Telegram-channel @pythonetc, November 2019

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

Previous publications.



PATH is an environment variable that stores paths where executables are looked for. When you ask your shell to run ls, the shell looks for the ls executable file across all paths that are presented in PATH.
Read more →
Total votes 31: ↑30 and ↓1+29
Comments1

Faster ENUM

Reading time9 min
Views2.3K

tl;dr


github.com/QratorLabs/fastenum
pip install fast-enum

What are enums


(If you think you know that — scroll down to the “Enums in Standard Library” section).

Imagine that you need to describe a set of all possible states for the entities in your database model. You'll probably use a bunch of constants defined as module-level attributes:
# /path/to/package/static.py:
INITIAL = 0
PROCESSING = 1
PROCESSED = 2
DECLINED = 3
RETURNED = 4
...

...or as class-level attributes defined in their own class:
class MyModelStates:
  INITIAL = 0
  PROCESSING = 1
  PROCESSED = 2
  DECLINED = 3
  RETURNED = 4

That helps you refer to those states by their mnemonic names, while they persist in your storage as simple integers. By this, you get rid of magic numbers scattered through your code and make it more readable and self-descriptive.

But, both the module-level constant and the class with the static attributes suffer from the inherent nature of python objects: they are all mutable. You may accidentally assign a value to your constant at runtime, and that is a mess to debug and rollback your broken entities. So, you might want to make your set of constants immutable, which means both the number of constants declared and the values they are mapped to must not be modified at runtime.
Read more →
Total votes 28: ↑28 and ↓0+28
Comments0

Deploying Tarantool Cartridge applications with zero effort (Part 1)

Reading time8 min
Views1.9K


We have already presented Tarantool Cartridge that allows you to develop and pack distributed applications. Now let's learn how to deploy and control these applications. No panic, it's all under control! We have brought together all the best practices of working with Tarantool Cartridge and wrote an Ansible role, which will deploy the package to servers, start and join instances into replica sets, configure authorization, bootstrap vshard, enable automatic failover and patch cluster configuration.

Interesting, huh? Dive in, check details under the cut.
Read more →
Total votes 29: ↑29 and ↓0+29
Comments0

СodeSide. The new game for Russian AI Cup

Reading time3 min
Views2.4K


The AI Cup community and Mail.ru Group in collaboration with Codeforces.com invite you to the real battle! Get ready for the sleepless nights and calloused hands — take part in Russian AI Cup, which is one of the most challenging and vivid artificial intelligence programming competitions in the world. Believe us, managers of this madness did their best to create the game you'd want to play.


To become part of the competition, you need Internet access, computer, creativity, and enthusiasm for being a part of this extraordinary Cup. By the way, you might need some coffee. Welcome!

Read more →
Total votes 33: ↑33 and ↓0+33
Comments0

PVS-Studio in the Clouds: GitLab CI/CD

Reading time10 min
Views1K

Рисунок 2

This article continues the series of publications on usage of PVS-Studio in cloud systems. This time we'll look at the way the analyzer works along with GitLab CI, which is a product made by GitLab Inc. Static analyzer integration in a CI system allows detecting bugs right after the project build and is a highly effective way to reduce the cost of finding bugs.
Read more →
Total votes 30: ↑29 and ↓1+28
Comments0

On request of Embedded Developers: Detecting Errors in Amazon FreeRTOS

Reading time15 min
Views1.3K
Anyone who programs microcontrollers probably knows about FreeRTOS, or at least heard of this operating system. Amazon developers decided to enhance the abilities of this operating system to work with AWS Internet of Things services. This is how Amazon FreeRTOS appeared. We, developers of the PVS-Studio static code analyzer, were asked by mail and in comments to check these projects. Well, now get what you asked for. Keep reading to find out what came out of it.


Read more →
Total votes 27: ↑27 and ↓0+27
Comments0

Tarantool Kubernetes Operator

Reading time10 min
Views1.8K


Kubernetes has already become a de-facto standard for running stateless applications, mainly because it can reduce time-to-market for new features. Launching stateful applications, such as databases or stateful microservices, is still a complex task, but companies have to meet the competition and maintain a high delivery rate. So they create a demand for such solutions.

We want to introduce our solution for launching stateful Tarantool Cartridge clusters: Tarantool Kubernetes Operator, more under the cut.
Read more →
Total votes 34: ↑34 and ↓0+34
Comments0

Scanning the code of Orchard CMS for Bugs

Reading time12 min
Views989

Picture 6

This article reviews the results of a second check of the Orchard project with the PVS-Studio static analyzer. Orchard is an open-source content manager system delivered as part of the ASP.NET Open Source Gallery under the non-profit Outercurve Foundation. Today's check is especially interesting because both the project and the analyzer have come a long way since the first check, and this time we'll be looking at new diagnostic messages and some nice bugs.
Read more →
Total votes 34: ↑33 and ↓1+32
Comments0

Tarantool Data Grid: Architecture and Features

Reading time6 min
Views2.1K


In 2017, we won the competition for the development of the transaction core for Alfa-Bank's investment business and started working at once. (Vladimir Drynkin, Development Team Lead for Alfa-Bank's Investment Business Transaction Core, spoke about the investment business core at HighLoad++ 2018.) This system was supposed to aggregate transaction data in different formats from various sources, unify the data, save it, and provide access to it.

In the process of development, the system evolved and extended its functions. At some point, we realized that we created something much more than just application software designed for a well-defined scope of tasks: we created a system for building distributed applications with persistent storage. Our experience served as a basis for the new product, Tarantool Data Grid (TDG).

I want to talk about TDG architecture and the solutions that we worked out during the development. I will introduce the basic functions and show how our product could become the basis for building turnkey solutions.
Read more →
Total votes 39: ↑38 and ↓1+37
Comments0

MacOS 10.15 no longer supports 32-bit apps. What can you do?

Reading time2 min
Views1.8K

Picture 2

On October 7, 2019, Apple released a new version of its Mac operating system, macOS Catalina. Version 10.15 contains many changes and improvements. One of the significant is the complete phasing out of 32-bit applications. As a developer of such macOS apps, what can you do? That's right, port the app to the 64-bit platform. Will the application work properly from the first attempt? Perhaps, it's possible. Depends on the complexity and amount of the code. But most likely, developers will face a lot of non-obvious errors, which can previously detected using PVS-Studio.
Read more →
Total votes 33: ↑33 and ↓0+33
Comments0

Azure PowerShell: Mostly Harmless

Reading time17 min
Views803

Picture 6

Hello, everyone. Today we have another Microsoft project on the check. By the title of this article, you can guess that this time developers didn't «please» us with a large number of errors. We hope the project's authors won't be offended by the title. After all, a small number of errors is great, isn't it? However, we still managed to find something intriguing in the Azure PowerShell code. We suggest getting to know the features of this project and checking out errors, found using the PVS-Studio C# analyzer.
Read more →
Total votes 27: ↑27 and ↓0+27
Comments0