
Python *

Interpreted high-level programming language for general-purpose programming


Building a GPT-like Model from Scratch with Detailed Theory and Code Implementation

Reading time: 14 min
Views: 34K

Unlock the power of Transformer neural networks and learn how to build your own GPT-like model from scratch. In this in-depth guide, we delve into the theory and provide a step-by-step code implementation to help you create your own miniGPT model. The final code is only 400 lines long and runs on both CPUs and GPUs. If you want to jump straight to the implementation, here is the GitHub repo.

Transformers are revolutionizing the world of artificial intelligence. This simple but very powerful neural network architecture, introduced in 2017, has quickly become the go-to choice for natural language processing, generative AI, and more. Transformers underpin cutting-edge AI systems such as BERT, GPT-x, DALL-E, and AlphaFold, which are changing the way we interact with language and solve complex problems like protein folding. The possibilities don't stop there: transformers are also making waves in computer vision with the advent of Vision Transformers.

Read more
Total votes 25: ↑25 and ↓0 (+25)
Comments: 1

Mode on: Comparing the two best colorization AI's

Reading time: 11 min
Views: 3.3K

This article continues a series of notes about colorization. During today's experiment, we’ll be comparing a recent neural network with the good old Deoldify to gauge the rate at which the future is approaching.

This is a practical project, so we won’t pay extra attention to the underlying philosophy of the Transformer architecture. Besides, any attempt to explain how it works to a general audience in hand-waving terms would be misleading.

A lecturer: Mr. Petrov! How does a transformer work?
Petrov, in a bass voice: Hum-m-m-m.


Google Colorizing Transformer vs Deoldify

Read more →
Total votes 17: ↑17 and ↓0 (+17)
Comments: 0

Toxic Comments Detection in Russian

Reading time: 17 min
Views: 7.5K

Currently, social network sites tend to be one of the major communication platforms in both offline and online space. Freedom to express various points of view, including toxic, aggressive, and abusive comments, might have a long-term negative impact on people’s opinions and social cohesion. Consequently, the ability to automatically identify and moderate toxic content on the Internet is one of the necessary tasks for modern society. This paper aims at the automatic detection of toxic comments in the Russian language. As a source of data, we used an anonymously published Kaggle dataset and additionally validated its annotation quality. To build a classification model, we fine-tuned two versions of the Multilingual Universal Sentence Encoder, Bidirectional Encoder Representations from Transformers (BERT), and ruBERT. Fine-tuned ruBERT achieved F1 = 92.20%, the best classification score. We made the trained models and code samples publicly available to the research community.
Read more →
Total votes 18: ↑17 and ↓1 (+16)
Comments: 0

Tips and tricks from my Telegram-channel @pythonetc, January 2020

Reading time: 3 min
Views: 1.4K


This is a new selection of tips and tricks about Python and programming from my Telegram channel @pythonetc.

Previous publications.


The order of except blocks matters: if an exception can be caught by more than one block, the topmost matching block applies. The following code doesn’t work as intended:
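The original example is cut off in this excerpt. As a stand-in, here is a minimal sketch of the kind of mistake described; the function names `parse_broken` and `parse_fixed` and the returned strings are hypothetical and not taken from the original post.

```python
def parse_broken(text):
    """The broad handler comes first, so the specific one is unreachable."""
    try:
        return int(text)
    except Exception:
        return "generic error"
    except ValueError:  # never reached: ValueError is a subclass of Exception
        return "not a number"


def parse_fixed(text):
    """Specific handlers first, the general one last."""
    try:
        return int(text)
    except ValueError:
        return "not a number"
    except Exception:
        return "generic error"


if __name__ == "__main__":
    print(parse_broken("abc"))  # handled by the Exception block
    print(parse_fixed("abc"))   # handled by the ValueError block
```

Listing the most specific exception classes first is the usual way to make sure each handler can actually run.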
Read more →
Total votes 15: ↑15 and ↓0 (+15)
Comments: 0

Tips and tricks from my Telegram-channel @pythonetc, December 2019

Reading time: 2 min
Views: 1.6K


This is a new selection of tips and tricks about Python and programming from my Telegram channel @pythonetc.

Previous publications.


Different asyncio tasks obviously have different stacks. You can view all of them at any moment, using asyncio.all_tasks() to get all currently running tasks and task.get_stack() to get the stack of each task.
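A minimal sketch of how these two calls fit together. The `worker` coroutine, the task names, and the `collect_stacks` helper are illustrative assumptions, not code from the original post.

```python
import asyncio


async def worker(delay):
    # A hypothetical task that simply sleeps.
    await asyncio.sleep(delay)


async def collect_stacks():
    # Start two background tasks and let them run up to their first await.
    tasks = [asyncio.create_task(worker(1), name=f"worker-{i}") for i in range(2)]
    await asyncio.sleep(0)

    report = {}
    for task in asyncio.all_tasks():
        frames = task.get_stack()  # list of frame objects for this task
        report[task.get_name()] = [frame.f_code.co_name for frame in frames]

    # Clean up so the event loop can finish promptly.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return report


if __name__ == "__main__":
    for name, stack in asyncio.run(collect_stacks()).items():
        print(name, stack)
```

Note that asyncio.all_tasks() also includes the task that is doing the inspecting, so the report contains an entry for the main task as well as for the two workers.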
Read more →
Total votes 27: ↑26 and ↓1 (+25)
Comments: 0

Tips and tricks from my Telegram-channel @pythonetc, November 2019

Reading time: 3 min
Views: 2.7K

This is a new selection of tips and tricks about Python and programming from my Telegram channel @pythonetc.

Previous publications.



PATH is an environment variable that stores the paths where executables are looked up. When you ask your shell to run ls, the shell searches for the ls executable file in all the paths listed in PATH.
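The lookup can be sketched in a few lines of Python. The helper name `find_in_path` is hypothetical, and the sketch follows POSIX-style rules only (no Windows PATHEXT handling); in real code the standard-library function shutil.which does this job.

```python
import os
import shutil


def find_in_path(name, path=None):
    """Return the first executable called `name` found in the PATH directories."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # empty entries are ignored here for simplicity
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_in_path("ls"))  # manual lookup
    print(shutil.which("ls"))  # standard-library equivalent
```

Directories are checked in the order they appear in PATH, so an executable in an earlier directory hides one with the same name in a later directory.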
Read more →
Total votes 31: ↑30 and ↓1 (+29)
Comments: 1

Faster ENUM

Reading time: 9 min
Views: 2.3K

tl;dr


github.com/QratorLabs/fastenum
pip install fast-enum

What are enums


(If you already know what enums are, scroll down to the “Enums in Standard Library” section.)

Imagine that you need to describe a set of all possible states for the entities in your database model. You'll probably use a bunch of constants defined as module-level attributes:
# /path/to/package/static.py:
INITIAL = 0
PROCESSING = 1
PROCESSED = 2
DECLINED = 3
RETURNED = 4
...

...or as class-level attributes defined in their own class:
class MyModelStates:
    INITIAL = 0
    PROCESSING = 1
    PROCESSED = 2
    DECLINED = 3
    RETURNED = 4

That helps you refer to those states by their mnemonic names, while they persist in your storage as simple integers. This way, you get rid of magic numbers scattered through your code and make it more readable and self-descriptive.

However, both the module-level constants and the class with static attributes suffer from the inherent nature of Python objects: they are all mutable. You may accidentally assign a new value to a constant at runtime, and then debugging the problem and rolling back your broken entities is a mess. So you might want to make your set of constants immutable, which means that neither the number of constants declared nor the values they are mapped to can be modified at runtime.
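To make the problem concrete, here is a small sketch that contrasts a plain class of constants with the standard library's enum module. It uses only the standard library and says nothing about the fast-enum API; the `ModelState` class and the `try_reassign` helper are illustrative names.

```python
from enum import IntEnum


# Plain class with constants: nothing stops accidental rebinding.
class MyModelStates:
    INITIAL = 0
    PROCESSING = 1


MyModelStates.INITIAL = 42  # silently succeeds


# Standard-library enum: existing members cannot be reassigned.
class ModelState(IntEnum):
    INITIAL = 0
    PROCESSING = 1


def try_reassign():
    """Attempt to overwrite an enum member and report what happened."""
    try:
        ModelState.INITIAL = 42
    except AttributeError as exc:
        return f"rejected: {exc}"
    return "reassigned"


if __name__ == "__main__":
    print(MyModelStates.INITIAL)  # the "constant" has changed
    print(try_reassign())
    print(ModelState.PROCESSING == 1)  # IntEnum members still compare as integers
```

Because IntEnum members behave like integers, they can still be stored as plain numbers, which matches the storage scheme described above.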
Read more →
Total votes 28: ↑28 and ↓0 (+28)
Comments: 0

How to Write a Smart Contract with Python on Ontology? Part 1: the Blockchain & Block API

Reading time: 5 min
Views: 3K

This is an official tutorial published earlier on the Ontology Medium blog.
We are excited to publish it for Habr readers. Feel free to ask any related questions and to suggest a better format for tutorial materials.

Foreword


In this article, we begin to introduce the smart contract API of Ontology. Ontology’s smart contract API is divided into 7 modules:


In this article, we will introduce the Blockchain & Block API, which is the most basic part of the Ontology smart contract system. The Blockchain API supports basic blockchain query operations, such as obtaining the current block height, whereas the Block API supports basic block query operations, such as querying the number of transactions for a given block.

Let’s get started!

First, create a new contract in SmartX and then follow the instructions below.

1. How to Use Blockchain API


Smart contract functions are referenced the same way as in Python: developers import the functions they need. For example, the following statement imports GetHeight, the function that gets the current block height, and GetHeader, the function that gets the block header.
Read more →
Total votes 20: ↑18 and ↓2 (+16)
Comments: 0

Tips and tricks from my Telegram-channel @pythonetc, August 2019

Reading time: 4 min
Views: 1.6K


This is a new selection of tips and tricks about Python and programming from my Telegram channel @pythonetc.

Previous publications


If an instance of a class doesn’t have an attribute with the given name, the lookup falls back to the class attribute with the same name.

>>> class A:
...     x = 2
...
>>> A.x
2
>>> A().x
2
Read more →
Total votes 29: ↑27 and ↓2 (+25)
Comments: 1

Tips and tricks from my Telegram-channel @pythonetc, July 2019

Reading time: 4 min
Views: 1.4K

This is a new selection of tips and tricks about Python and programming from my Telegram channel @pythonetc.

Previous publications


You can’t rebind closure variables by simply assigning to them. Python treats an assignment inside a function body as the definition of a local variable and doesn’t create a closure at all.
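A short sketch of this behaviour and of the usual fix with the nonlocal statement. The counter functions are illustrative and not taken from the original post.

```python
def make_counter_broken():
    count = 0

    def increment():
        # The assignment makes `count` local to increment(), so reading it
        # on the right-hand side fails before any value is bound.
        count = count + 1
        return count

    return increment


def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable of the enclosing function
        count += 1
        return count

    return increment


if __name__ == "__main__":
    print(make_counter_broken().__closure__)  # None: no closure was created
    counter = make_counter()
    print(counter(), counter())
    try:
        make_counter_broken()()
    except UnboundLocalError as exc:
        print("error:", exc)
```

Mutating a closed-over object in place (for example, appending to a list) works without nonlocal, because it does not rebind the name.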
Read more →
Total votes 22: ↑21 and ↓1 (+20)
Comments: 0

Contextual Emotion Detection in Textual Conversations Using Neural Networks

Reading time: 10 min
Views: 3.7K

Nowadays, talking to conversational agents is becoming a daily routine, and it is crucial for dialogue systems to generate responses that are as human-like as possible. One of the main aspects is providing emotionally aware responses to users. In this article, we describe a recurrent neural network architecture for emotion detection in textual conversations that participated in SemEval-2019 Task 3 “EmoContext”. SemEval is an annual workshop on semantic evaluation. The task objective is to classify emotions (i.e. happy, sad, angry, and others) in a 3-turn conversational data set.
Read more →
Total votes 37: ↑37 and ↓0 (+37)
Comments: 0

Python Vs R — Data Science

Reading time: 3 min
Views: 3.7K
When you think about the best programming language for data science, Python and R come to mind immediately. While there are many other languages, such as C, C++, Java, Julia, Perl, and Scala, it is safe to say that Python and R are the front-runners in data science.

While many data scientists will point to the traditional weaknesses, such as data wrangling in R or data visualization in Python, recent developments like Altair for Python or R have effectively addressed these weaknesses.

So which one should you choose for your next data analysis project?

R has dominated this space for a long time. This makes sense, as the language was designed specifically for analysts.
Read more →
Total votes 11: ↑11 and ↓0 (+11)
Comments: 0

Tips and tricks from my Telegram-channel @pythonetc, June 2019

Reading time: 3 min
Views: 2.6K

This is a new selection of tips and tricks about Python and programming from my Telegram channel @pythonetc.

Previous publications


The \ symbol has a special meaning in regular strings: \t is a tab character, \r is a carriage return, and so on.

You can use raw strings to disable this behaviour: r'\t' is just a backslash followed by t.

You obviously can’t use a bare ' inside r'...'. It can still be escaped with \, but the \ is preserved in the string:
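The original example is cut off in this excerpt. A minimal sketch of the behaviour described, with illustrative variable names:

```python
regular = '\t'   # one character: a tab
raw = r'\t'      # two characters: a backslash and the letter t
quoted = r'\''   # the quote is escaped, but the backslash stays in the string

if __name__ == "__main__":
    print(len(regular), len(raw), len(quoted))
    print(quoted == "\\'")  # same two characters written as a regular string
```

One consequence is that a raw string cannot end with a single backslash, because the backslash would escape the closing quote.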
Read more →
Total votes 24: ↑20 and ↓4 (+16)
Comments: 0

Python consumes a lot of memory or how to reduce the size of objects?

Reading time: 7 min
Views: 85K

A memory problem may arise when a large number of objects are active in RAM during the execution of a program, especially if there are restrictions on the total amount of available memory.


Below is an overview of some methods of reducing the size of objects, which can significantly reduce the amount of RAM needed for programs in pure Python.
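The excerpt does not say which methods the article covers. As one commonly used illustration, and only an assumption about the kind of technique meant, the sketch below compares a regular class with one that declares `__slots__`. The class names and the `shallow_size` helper are hypothetical, and the exact byte counts depend on the Python version.

```python
import sys


class Point:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


class SlottedPoint:
    __slots__ = ("x", "y", "z")  # no per-instance __dict__

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


def shallow_size(obj):
    """Size of the object plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


if __name__ == "__main__":
    print(shallow_size(Point(1, 2, 3)), shallow_size(SlottedPoint(1, 2, 3)))
```

The measurement is shallow: it does not include the attribute values themselves, which are the same in both cases.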


Note: This is the English version of my original post (in Russian).

Read more →
Total votes 15: ↑13 and ↓2 (+11)
Comments: 3

A selection of Datasets for Machine learning

Reading time: 5 min
Views: 7K
Hi guys,

This article is a guide to open datasets for machine learning. To start, I will collect a selection of interesting and relatively fresh datasets. As a bonus, at the end of the article I will attach useful links for finding datasets on your own.

Fewer words, more data.


A selection of datasets for machine learning:


Read more →
Total votes 12: ↑11 and ↓1 (+10)
Comments: 0

Google News and Leo Tolstoy: visualizing Word2Vec word embeddings using t-SNE

Reading time: 7 min
Views: 13K

Everyone perceives texts in their own way, whether they read news on the Internet or world-famous classic novels. The same applies to a variety of algorithms and machine learning techniques, which understand texts in a more mathematical way, namely by using a high-dimensional vector space.

This article is devoted to visualizing high-dimensional Word2Vec word embeddings using t-SNE. The visualization can be useful to understand how Word2Vec works and how to interpret relations between vectors captured from your texts before using them in neural networks or other machine learning algorithms. As training data, we will use articles from Google News and classical literary works by Leo Tolstoy, the Russian writer who is regarded as one of the greatest authors of all time.

We start with a brief overview of the t-SNE algorithm, then move on to calculating word embeddings with Word2Vec, and finally proceed to visualizing the word vectors with t-SNE in 2D and 3D space. We will write our scripts in Python using Jupyter Notebook.

Read more →
Total votes 28: ↑28 and ↓0 (+28)
Comments: 0