Python *

Interpreted high-level programming language for general-purpose programming

bashnick Jan 25 2023 at 02:03

Building a GPT-like Model from Scratch with Detailed Theory and Code Implementation

14 min

34K

Open Data Science corporate blogPython*Machine learning*Artificial IntelligenceNatural Language Processing*

Tutorial

Unlock the power of Transformer neural networks and learn how to build your own GPT-like model from scratch. In this in-depth guide, we will delve into the theory and provide a step-by-step code implementation to help you create your own miniGPT model. The final code is only 400 lines long and runs on both CPUs and GPUs. If you want to jump straight to the implementation, here is the GitHub repo.

Transformers are revolutionizing the world of artificial intelligence. This simple but very powerful neural network architecture, introduced in 2017, has quickly become the go-to choice for natural language processing, generative AI, and more. With the help of transformers, we've seen the creation of cutting-edge AI products like BERT, GPT-x, DALL-E, and AlphaFold, which are changing the way we interact with language and solve complex problems like protein folding. And the exciting possibilities don't stop there: transformers are also making waves in computer vision with the advent of Vision Transformers.

+25

man_of_letters Jul 22 2021 at 09:03

Mode on: Comparing the two best colorization AI's

11 min

3.3K

RUVDS.com corporate blogPython*Image processing*Machine learning*TensorFlow*

This article continues a series of notes about colorization. During today's experiment, we’ll be comparing a recent neural network with the good old Deoldify to gauge the rate at which the future is approaching.

This is a practical project, so we won’t pay extra attention to the underlying philosophy of the Transformer architecture. Besides, any attempt to explain its principles of operation to a wide audience in hand-waving terms would be misleading.

A lecturer: Mr. Petrov! How does a transformer work?
Petrov with a bass voice: Hum-m-m-m.

Google Colorizing Transformer vs Deoldify

+17

sismetanin Nov 5 2020 at 14:13

Toxic Comments Detection in Russian

17 min

7.5K

VK corporate blogPython*Machine learning*Social networks and communities

Currently, social network sites tend to be one of the major communication platforms in both the offline and online space. Freedom of expression of various points of view, including toxic, aggressive, and abusive comments, might have a long-term negative impact on people’s opinions and social cohesion. As a consequence, the ability to automatically identify and moderate toxic content on the Internet to eliminate the negative consequences is one of the necessary tasks for modern society. This paper aims at the automatic detection of toxic comments in the Russian language. As a source of data, we used an anonymously published Kaggle dataset and additionally validated its annotation quality. To build a classification model, we fine-tuned two versions of the Multilingual Universal Sentence Encoder, Bidirectional Encoder Representations from Transformers, and ruBERT. Fine-tuned ruBERT achieved F₁ = 92.20%, demonstrating the best classification score. We made the trained models and code samples publicly available to the research community.

+16

pushtaev Feb 10 2020 at 17:47

Tips and tricks from my Telegram-channel @pythonetc, January 2020

3 min

1.4K

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

← Previous publications.

The order of except blocks matters: if an exception can be caught by more than one block, the block that comes first applies. The following code doesn’t work as intended:
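A minimal sketch of this pitfall (an illustration, not the snippet from the original post), assuming two hypothetical helpers, `lookup` and `lookup_fixed`. Because `KeyError` is a subclass of `LookupError`, the first handler in `lookup` swallows every `KeyError` and the second one never runs:

```python
def lookup(container, key):
    """Look up a key with the handlers deliberately in the wrong order."""
    try:
        return container[key]
    except LookupError:
        # KeyError is a subclass of LookupError, so this block always wins.
        return "lookup error"
    except KeyError:
        # Unreachable: the block above has already caught the exception.
        return "key error"


def lookup_fixed(container, key):
    """Same lookup with the more specific handler listed first."""
    try:
        return container[key]
    except KeyError:
        return "key error"
    except LookupError:
        # Still catches the remaining LookupError subclasses, e.g. IndexError.
        return "lookup error"
```

Listing the most specific exception first is usually the safer ordering, since the broader handler then acts only as a fallback.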

+15

pushtaev Jan 15 2020 at 13:32

Tips and tricks from my Telegram-channel @pythonetc, December 2019

2 min

1.6K

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

← Previous publications.

Different asyncio tasks obviously have different stacks. You can inspect all of them at any moment: use asyncio.all_tasks() to get every task that has not finished yet and task.get_stack() to get the stack of each task.
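The two calls can be combined roughly as follows. This is a sketch under assumptions: the `worker` coroutine, the task names, and the `report` dictionary are made up for illustration, and task names require Python 3.8 or later.

```python
import asyncio


async def worker(delay):
    # Each task suspends here, so its stack has a frame inside worker().
    await asyncio.sleep(delay)


async def main():
    tasks = [asyncio.create_task(worker(1), name=f"worker-{i}") for i in range(2)]
    await asyncio.sleep(0)  # let the workers start and reach their await

    report = {}
    for task in asyncio.all_tasks():
        # get_stack() returns a list of frame objects for an unfinished task.
        frames = task.get_stack()
        report[task.get_name()] = [frame.f_code.co_name for frame in frames]

    # Clean up so the event loop can shut down without pending tasks.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return report


report = asyncio.run(main())
```

Note that `asyncio.all_tasks()` also includes the task that is running `main()` itself, so `report` has one more entry than the number of workers.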

+25

pushtaev Dec 17 2019 at 12:57

Tips and tricks from my Telegram-channel @pythonetc, November 2019

3 min

2.7K

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

← Previous publications.

PATH is an environment variable that stores the directories in which executables are looked up. When you ask your shell to run ls, the shell searches for an executable file named ls in every directory listed in PATH.
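The lookup can be approximated in a few lines of Python. This is a simplified sketch assuming a POSIX-like system; `find_executable` is a hypothetical helper, and real shells add details such as command hashing and built-ins.

```python
import os


def find_executable(name, path=None):
    """Return the first match for name in the PATH directories, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries in this simplified version
        candidate = os.path.join(directory, name)
        # A match must be a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

In real code, `shutil.which` from the standard library performs this search and also handles platform differences.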

+29

Shapelez Dec 16 2019 at 20:25

Faster ENUM

9 min

2.3K

Qrator Labs corporate blogOpen source*Python*

tl;dr

github.com/QratorLabs/fastenum

pip install fast-enum

What are enums

(If you think you know that — scroll down to the “Enums in Standard Library” section).

Imagine that you need to describe a set of all possible states for the entities in your database model. You'll probably use a bunch of constants defined as module-level attributes:

# /path/to/package/static.py:
INITIAL = 0
PROCESSING = 1
PROCESSED = 2
DECLINED = 3
RETURNED = 4
...

...or as class-level attributes defined in their own class:

class MyModelStates:
  INITIAL = 0
  PROCESSING = 1
  PROCESSED = 2
  DECLINED = 3
  RETURNED = 4

That helps you refer to those states by their mnemonic names, while they persist in your storage as simple integers. This way, you get rid of magic numbers scattered through your code and make it more readable and self-descriptive.

However, both the module-level constants and the class with static attributes suffer from the same weakness inherent to Python: module and class attributes can be reassigned at any time. You may accidentally assign a value to your constant at runtime, and then debugging the problem and rolling back your broken entities is a mess. So, you might want to make your set of constants immutable, which means that neither the number of constants declared nor the values they are mapped to can be modified at runtime.
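The difference can be sketched with the standard library's `enum.Enum` (not the fast-enum package this article is about). The `ModelState` class and the `try_reassign` helper are hypothetical names used only for this illustration.

```python
from enum import Enum


class MyModelStates:
    INITIAL = 0
    PROCESSING = 1


# Plain class attributes can be silently overwritten at runtime.
MyModelStates.INITIAL = 42


class ModelState(Enum):
    INITIAL = 0
    PROCESSING = 1


def try_reassign():
    """Attempt to overwrite an enum member and report what happened."""
    try:
        ModelState.INITIAL = 42
    except AttributeError as exc:
        return f"rejected: {exc}"
    return "accepted"
```

The plain class accepts the accidental assignment without complaint, while the enum refuses to rebind an existing member.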

+28

pushtaev Nov 13 2019 at 16:31

Tips and tricks from my Telegram-channel @pythonetc, October 2019

3 min

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

← Previous publications

If you want to iterate over several iterables at once, you can use the zip function (it has nothing to do with the ZIP file format):
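A small illustration with made-up data (not the snippet from the original post):

```python
names = ["alice", "bob", "carol"]
ages = [31, 25, 47]

# zip yields tuples of the i-th elements of each iterable.
pairs = list(zip(names, ages))

# With unequal lengths, zip stops at the shortest input,
# so the extra element of the longer iterable is dropped.
truncated = list(zip("abc", [1, 2]))
```

When the silent truncation is not wanted, `itertools.zip_longest` pads the shorter inputs instead.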

+27

pushtaev Oct 8 2019 at 11:10

Tips and tricks from my Telegram-channel @pythonetc, September 2019

3 min

1.1K

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

← Previous publications

+24

Ontologic Sep 25 2019 at 13:55

How to Write a Smart Contract with Python on Ontology? Part 1: the Blockchain & Block API

5 min

Python*Distributed systems*Cryptocurrencies

From sandbox

This is an official tutorial published earlier on the Ontology Medium blog.
We are excited to publish it for Habr readers. Feel free to ask any related questions and suggest a better format for tutorial materials.

Foreword

In this article, we begin introducing the smart contract API of Ontology. Ontology’s smart contract API is divided into 7 modules:

Part 1: Blockchain & Block API
Part 2: Storage API
Part 3: Runtime API
Part 4: Native API
Part 5: Upgrade API
Part 6: Execution Engine API
Part 7: Static & Dynamic Call API

In this article, we will introduce the Blockchain & Block API, which is the most basic part of the Ontology smart contract system. The Blockchain API supports basic blockchain query operations, such as obtaining the current block height, whereas the Block API supports basic block query operations, such as querying the number of transactions for a given block.

Let’s get started!

First, create a new contract in SmartX and then follow the instructions below.

1. How to Use Blockchain API

References to smart contract functions are identical to Python’s references. Developers can introduce the appropriate functions as needed. For example, the following statement introduces GetHeight, the function to get the current block height, and GetHeader, the function to get the block header.

+16

pushtaev Sep 5 2019 at 16:18

Tips and tricks from my Telegram-channel @pythonetc, August 2019

4 min

1.6K

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

← Previous publications

If an instance of a class doesn’t have an attribute with the given name, Python falls back to the class attribute with the same name.

>>> class A:
...     x = 2
...
>>> A.x
2
>>> A().x
2
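As a follow-up sketch of the same lookup rule (an illustration, not part of the original post): assigning through the instance creates an instance attribute that shadows the class attribute, and deleting it restores the fallback.

```python
class A:
    x = 2


a = A()
before = a.x    # no instance attribute yet, so the lookup falls back to A.x
a.x = 5         # creates an instance attribute that shadows A.x
shadowed = a.x  # now found on the instance itself
del a.x         # removes only the instance attribute
after = a.x     # the lookup falls back to the class attribute again
```

The class attribute `A.x` is never modified by any of these steps.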

+25

pushtaev Aug 7 2019 at 15:34

Tips and tricks from my Telegram-channel @pythonetc, July 2019

4 min

1.4K

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

← Previous publications

You can’t mutate closure variables by simply assigning to them. Python treats an assignment inside a function body as the definition of a local variable and doesn’t create a closure at all.
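A minimal sketch of the problem and the usual fix with `nonlocal`; the counter functions are hypothetical examples, not code from the original post.

```python
def make_counter_broken():
    count = 0

    def increment():
        # The assignment makes count local to increment(), so reading it
        # on the right-hand side raises UnboundLocalError.
        count = count + 1
        return count

    return increment


def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable of the enclosing function
        count += 1
        return count

    return increment
```

Calling the function returned by `make_counter_broken()` fails on its first call, while the one returned by `make_counter()` keeps its state between calls.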

+20

sismetanin Aug 1 2019 at 13:35

Contextual Emotion Detection in Textual Conversations Using Neural Networks

10 min

3.7K

VK corporate blogPython*Data Mining*Big Data*Machine learning*

Nowadays, talking to conversational agents is becoming a daily routine, and it is crucial for dialogue systems to generate responses that are as human-like as possible. One of the main aspects deserving attention is providing emotionally aware responses to users. In this article, we describe a recurrent neural network architecture for emotion detection in textual conversations that took part in SemEval-2019 Task 3 “EmoContext”; SemEval is an annual workshop on semantic evaluation. The task objective is to classify the emotion (happy, sad, angry, or others) in a 3-turn conversational data set.

+37

markbenson123 Jul 31 2019 at 15:23

Python Vs R — Data Science

3 min

3.7K

Python*Programming*R*Learning languages

Tutorial

When choosing the best programming language for data science, Python and R come to mind almost immediately. While there are plenty of other languages, such as C, C++, Java, Julia, Perl, and Scala, it's safe to say that Python and R are the front-runners in data science.

While many data scientists will point to the traditional weaknesses, such as data wrangling in R or data visualization in Python, recent developments like Altair for Python or R have adequately addressed them.

So which one should you choose for your next data analysis project?

R has dominated this space for a long time. This makes sense, as the language was designed specifically for analysts.

+11

pushtaev Jul 4 2019 at 13:35

Tips and tricks from my Telegram-channel @pythonetc, June 2019

3 min

2.6K

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

← Previous publications

The \ symbol in a regular string has a special meaning: \t is the tab character, \r is a carriage return, and so on.

You can use raw-strings to disable this behaviour. r'\t' is just backslash and t.

You obviously can’t use ' inside r'...'. However, it still can be escaped by \, but \ is preserved in the string:
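A short illustration of these rules (a sketch, not the snippet from the original post):

```python
regular = "\t"   # one character: a tab
raw = r"\t"      # two characters: a backslash and the letter t
quoted = r'\''   # the quote is escaped, yet the backslash stays in the string

lengths = (len(regular), len(raw), len(quoted))
```

In other words, inside a raw string the backslash still prevents the quote from ending the literal, but it is not removed from the resulting value.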

+16

intellimath Jul 2 2019 at 14:16

Python consumes a lot of memory or how to reduce the size of objects?

7 min

85K

Python*

A memory problem may arise when a large number of objects are active in RAM during the execution of a program, especially if there are restrictions on the total amount of available memory.

Below is an overview of some methods of reducing the size of objects, which can significantly reduce the amount of RAM needed for programs in pure Python.
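One widely used technique of this kind is `__slots__`, sketched below. The `PointDict` and `PointSlots` classes and the `shallow_size` helper are hypothetical names for this illustration, and the exact byte counts depend on the Python version and platform, so none are quoted here.

```python
import sys


class PointDict:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


class PointSlots:
    __slots__ = ("x", "y", "z")  # instances get no per-instance __dict__

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


def shallow_size(obj):
    """Size of the object plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


regular_size = shallow_size(PointDict(1, 2, 3))
slotted_size = shallow_size(PointSlots(1, 2, 3))
```

The helper measures only the object and its attribute dictionary, not the values they refer to, which is enough to compare the per-instance overhead of the two classes.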

Note: This is the English version of my original post (in Russian).

+11

pushtaev Jun 4 2019 at 16:21

Tips and tricks from my Telegram-channel @pythonetc, May 2019

5 min

1.7K

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

← Previous publications

+29

Syurmakov May 21 2019 at 13:53

A selection of Datasets for Machine learning

5 min

Python*Data Mining*Open data*Machine learning*Artificial Intelligence

Hi guys,

This article is a guide to open datasets for machine learning. To start with, I will collect a selection of interesting and relatively fresh datasets. As a bonus, at the end of the article I will add useful links for finding datasets on your own.

Fewer words, more data.

A selection of datasets for machine learning:

Data deaths and battles from the game of thrones — This data set combines three data sources, each based on information from a series of books.
Global Terrorism Database — Over 180,000 terrorist attacks worldwide, 1970-2017.
Bitcoin, historical data — Bitcoin data with an interval of 1 minute from selected exchanges, January 2012 — March 2019

+10

pushtaev May 8 2019 at 11:06

Tips and tricks from my Telegram-channel @pythonetc, April 2019

4 min

1.5K

VK corporate blogPython*Programming*

It is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

Previous publications.

+22

sismetanin Apr 30 2019 at 11:42

Google News and Leo Tolstoy: visualizing Word2Vec word embeddings using t-SNE

7 min

13K

VK corporate blogPython*Big Data*Data visualization*Machine learning*

Everyone perceives texts in their own way, whether they are reading news on the Internet or world-famous classic novels. This also applies to a variety of algorithms and machine learning techniques, which understand texts in a more mathematical way, namely, using a high-dimensional vector space.

This article is devoted to visualizing high-dimensional Word2Vec word embeddings using t-SNE. The visualization can be useful to understand how Word2Vec works and how to interpret relations between vectors captured from your texts before using them in neural networks or other machine learning algorithms. As training data, we will use articles from Google News and classical literary works by Leo Tolstoy, the Russian writer who is regarded as one of the greatest authors of all time.

We start with a brief overview of the t-SNE algorithm, then move on to computing word embeddings with Word2Vec, and finally visualize the word vectors with t-SNE in 2D and 3D space. We will write our scripts in Python using Jupyter Notebook.

+28