Python

Interpreted high-level programming language for general-purpose programming


bashnick Jan 24 2023 at 23:03

Building a GPT-like Model from Scratch with Detailed Theory and Code Implementation

14 min

39K

Open Data Science corporate blog, Python, Machine learning, Artificial Intelligence, Natural Language Processing

Tutorial

Unlock the power of Transformer neural networks and learn how to build your own GPT-like model from scratch. In this in-depth guide, we will delve into the theory and provide a step-by-step code implementation to help you create your own miniGPT model. The final code is only 400 lines long and runs on both CPUs and GPUs. If you want to jump straight to the implementation, here is the GitHub repo.

Transformers are revolutionizing the world of artificial intelligence. This simple, but very powerful neural network architecture, introduced in 2017, has quickly become the go-to choice for natural language processing, generative AI, and more. With the help of transformers, we've seen the creation of cutting-edge AI products like BERT, GPT-x, DALL-E, and AlphaFold, which are changing the way we interact with language and solve complex problems like protein folding. And the exciting possibilities don't stop there - transformers are also making waves in the field of computer vision with the advent of Vision Transformers.


NickThe1 Dec 18 2022 at 09:30

Asymmetric horizontal distribution for time series

4 min

1.6K

Python

From sandbox

The goal of this article is to demonstrate non-trivial approaches to producing a statistical estimate for a forecast result. The idea comes from the probability cone concept. A probability cone is an indicator that forecasts a statistical distribution from a set point in time into the future. This article provides alternative approaches using machine learning and regression analysis.
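To make the probability cone idea concrete, here is a minimal sketch of the classic version of the indicator. It is not the article's method: it assumes the series follows a random walk with independent, normally distributed increments, so the spread grows with the square root of the horizon. The function name `probability_cone` and its parameters are hypothetical.

```python
import math
import statistics


def probability_cone(last_value, increments, horizon, z=1.96):
    """Return a list of (lower, upper) bounds, one per future step.

    Assumption (not from the article): increments are i.i.d. and roughly
    normal, so after h steps the mean drifts by mu * h and the standard
    deviation is sigma * sqrt(h).
    """
    mu = statistics.fmean(increments)
    sigma = statistics.stdev(increments)
    cone = []
    for h in range(1, horizon + 1):
        center = last_value + mu * h
        spread = z * sigma * math.sqrt(h)
        cone.append((center - spread, center + spread))
    return cone


if __name__ == "__main__":
    # Hypothetical history of step-to-step changes.
    history = [1.0, -1.0, 1.0, -1.0]
    for step, (low, high) in enumerate(probability_cone(100.0, history, 4), 1):
        print(f"t+{step}: [{low:.2f}, {high:.2f}]")
```

With `z=1.96` the band covers roughly 95% of outcomes under the normality assumption; the band widens as the horizon grows, which gives the indicator its cone shape.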

netsvetaev Dec 2 2022 at 15:02

InvokeAI 2.2: UI Outpainting, Embedding Management and more

2 min

6.3K

Python, Image processing, Machine learning, Graphic design, Artificial Intelligence

InvokeAI 2.2 is now available to everyone. This update brings in exciting features, like UI Outpainting, Embedding Management and more. See highlighted updates below, or the full release notes for everything included in the release.


sepulkary Nov 23 2022 at 10:48

Python Junior Plus, or the beginner's Roadmap to becoming a Python programmer

8 min

7.9K

Python, IT career, Programming

Hello! My name is Mikhail Emelyanov, I am an embedded software engineer, and I was inspired to write this little roadmap on the capabilities of the Python language by a certain commonality among the existing Python tutorials found on the web.

The usual suggestions to study, say, “Algorithms and Data Structures” or “Databases” are especially jarring. You can spend years studying these topics, and even after decades you'd still be able to find something you didn't know yet even without ever venturing outside the scope of Algorithms!

Using video game analogies, we can say that novice programmers often stand on the shore of the lake of boiling lava with an island with the ever-coveted jobs in the center, while the islands in between, which you have to jump on, gradually increasing your skills in successive mini-quests, are either missing, or arranged haphazardly, or their fairly smooth sequence breaks off, never having managed to get you any farther from the shore. Let's try to build a path of hint islands, a number of which, although not without effort, will finally allow us to reach our goal.

netsvetaev Nov 13 2022 at 00:16

I trained a neural network on my drawings and give the model for free (and teach you to create your own)

2 min

3.5K

Python, Image processing, Machine learning, Graphic design, Artificial Intelligence

Tutorial

Great for seamless patterns, abstract drawings, and watercolor-styled images. How to use it and train a neural network on your own pictures?

Download the model here: https://huggingface.co/netsvetaev/netsvetaev-free


netsvetaev Nov 4 2022 at 08:24

InvokeAI 2.1 Release

2 min

1.5K

Artificial Intelligence, Python, Image processing

The InvokeAI team is excited to share our latest feature release, with a set of new features, UI enhancements, and CLI capabilities.

DevPer3301 Oct 13 2022 at 21:51

What are OAuth 2 and OpenID Connect (OIDC)?

6 min

2.9K

Ruby, Python, JavaScript, C++, Node.JS

Author: Denis Zherdetskiy

vldmrvslv Jun 29 2022 at 14:24

Detecting attempts of mass influencing via social networks using NLP. Part 2

3 min

1.1K

Data Mining, Twitter API, Natural Language Processing, Python, Big Data

Tutorial

In Part 1 of this article, I built and compared two classifiers to detect trolls on Twitter. You can check it out here.

Now the time has come to look more deeply into the datasets and find patterns using exploratory data analysis and topic modelling.

EDA

To do just that, I first created a word cloud of the most common words, which you can see below.

vldmrvslv Jun 29 2022 at 14:20

Detecting attempts of mass influencing via social networks using NLP. Part 1

5 min

1.6K

Twitter API, Natural Language Processing, Data Mining, Python, Big Data

Tutorial

During the last decades, the world’s population has been developing as an information society, which means that information has come to play a substantial end-to-end role in all aspects and processes of life. In view of the growing demand for a free flow of information, social networks have become a force to be reckoned with. The ways of waging war have also changed: instead of conventional weapons, governments now use political warfare, including fake news, a type of propaganda aimed at deliberate disinformation or hoaxes. And the lack of content control mechanisms makes it easy to spread any information as long as people believe in it.

Based on this premise, I’ve decided to experiment with different NLP approaches and build a classifier that could be used to detect either bots or fake content generated by trolls on Twitter in order to influence people.

In this first part of the article, I will cover the data collection process, preprocessing, feature extraction, classification itself and the evaluation of the models’ performance. In Part 2, I will dive deeper into the troll problem, conduct exploratory analysis to find patterns in the trolls’ behaviour and define the topics that seemed of great interest to them back in 2016.

Features for analysis

From all the possible data to use (like hashtags, account language, tweet text, URLs, external links or references, tweet date and time), I settled on English tweet text, Russian tweet text and hashtags. Tweet text is the main feature for analysis because it contains almost all the essential characteristics typical of trolling activities in general, such as abuse, rudeness, references to external resources, provocations and bullying. Hashtags were chosen as another source of textual information, as they represent the central message of a tweet in one or two words.
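As an illustration of how tweet text and hashtags can be separated into two textual features, here is a minimal sketch using only the standard library. It is an assumption about one possible preprocessing step, not the author's pipeline; the name `split_tweet` and the regular expression are hypothetical.

```python
import re

# \w is Unicode-aware for Python 3 strings, so this matches both
# English and Russian hashtags.
HASHTAG_RE = re.compile(r"#(\w+)")


def split_tweet(text):
    """Split a raw tweet into (body_text, hashtags).

    Hashtags are lowercased and returned without the leading '#';
    the body is the tweet with hashtags removed and whitespace collapsed.
    """
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    body = HASHTAG_RE.sub("", text)
    body = " ".join(body.split())
    return body, hashtags


if __name__ == "__main__":
    # Hypothetical example tweet, not taken from the dataset.
    print(split_tweet("Breaking news #Election2016 is #rigged today"))
```

Keeping the two outputs apart lets each one be vectorised on its own, so a classifier can weight hashtag vocabulary differently from the vocabulary of the tweet body.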

vldmrvslv Jun 23 2022 at 15:04

How we tackled document recognition issues for autonomous and automatic payments using OCR and NER

5 min

1.3K

Python, Natural Language Processing

From sandbox

In this article, I would like to describe how we’ve tackled the named entity recognition (aka NER) issue at Sber with the help of advanced AI techniques. It is one of many natural language processing (NLP) tasks that allows you to automatically extract data from unstructured text. This includes monetary values, dates, or names, surnames and positions.

Just imagine the countless textual documents that even a medium-sized organisation deals with on a daily basis, let alone huge corporations. Take Sber, for example: it is the largest financial institution in Russia, Central and Eastern Europe, with about 16,500 offices, over 250,000 employees, and 137 million retail and 1.1 million corporate clients in 22 countries. As you can imagine, at such an enormous scale, the company collaborates with hundreds of suppliers, contractors and other counterparties, which implies thousands of contracts. For instance, the estimated number of legal documents to be processed in 2022 is over 65,000, each consisting of 30 pages on average. Over its lifecycle, a contract is usually updated with 3 to 5 additional agreements. On top of this, a contract is accompanied by various source documents describing transactions. And in the PDF format, too.

Previously, the processing duty fell to our service centre’s employees, who checked whether the payment details in a bill matched those in the contract and then sent the document to the Accounting Department, where an accountant double-checked everything. This is quite a long journey to a payment, right?

danilovmy Jun 7 2022 at 03:59

Django ModelAdmins autoregister

4 min

Python, Django

Some time ago I discovered that Django has the ability to auto-register ModelAdmins. Since this is not common knowledge and carries a number of benefits, I decided to write an article about it to bring it to the attention of the Django community.


danilovmy Apr 10 2022 at 18:21

The Python and Django conferences you don't want to miss in 2022

2 min

1.6K

Python, Django

Quick reminder for those who would like to broaden their horizons: “PyCon DE Berlin 2022” is coming up as soon as next week!


danilovmy Apr 5 2022 at 10:34

Django admin dynamic Inline positioning

5 min

12K

Python, Django

Tutorial

Translation

Recently I've received an interesting request from a client about one of our Django projects.
He asked if it would be possible to show an inline component above other fields in the Django admin panel.

At first I thought that there shouldn't be any issue with that.
However, there was no easy solution other than installing another battery into the project. My gut feeling told me there was another way around that problem.

Stop losing clients! Or how a developer can test a website, by the example of PVS-Studio. Part 1

15 min

PVS-Studio corporate blog, IT systems testing, Python, Django, Web services testing

Tutorial

A website with bugs can be a real pain in the neck for a business. Just one 404 or 500 error could end up costing the company an obscene amount of money and damaging its reputation. But there is a way to avoid this: website testing. That's what this article is about. After reading it, you will know how to test code in Django, create your "own website tester" and much more. Welcome to the article.

byko3y Oct 25 2021 at 09:51

Easy concurrency with Python Shared Object

23 min

8.9K

Concurrent computing, Open source, Python

Project repository.
A year-old article about the general concepts of the project.

So you want to build a multitasking system in Python, but you hesitate because you know you'll have to use either the multiprocessing module, which is slow and/or somewhat inconvenient, or a more powerful external tool like Redis or RabbitMQ, or even a large DBMS like MongoDB or PostgreSQL, all of which require some glue (i.e. code very far from native Python) and impose their own restrictions on what you can do with your data. If you think, "Why do I need so much hassle if I just want to run a few worker threads in Python using the data structures I already have in my program and the functions I've already written? I just want to run this code in threads! Oh, I wish there was no GIL in Python", then welcome to the club.

Of course many of us can build from scratch a decent tool that would make use of multiple cores. However, having already existing working software (Pandas, Tensorflow, SciPy, etc) is always cheaper than any development of new software. But the status quo in CPython tells us one thing: you cannot remove GIL because everything is based on GIL. Although making shit into gold could require much work, the ability to alleviate the transition from slow single-threaded shit to a slow not-so-single-threaded gold-looking shit might be worth it, so you won't have to rewrite your whole system from scratch.

snakers4 Oct 6 2021 at 14:20

We have published a model for text repunctuation and recapitalization for four languages

7 min

7.3K

Machine learning, Python, Natural Language Processing, Big Data

Working with speech recognition models, we often encounter misconceptions among potential customers and users (mostly related to the fact that people have a hard time distinguishing substance from form). People also tend to believe that punctuation marks and spaces are somehow obviously present in spoken speech, when in fact real spoken speech and written speech are entirely different beasts.

Of course you can just start each sentence with a capital letter and put a full stop at the end. But it is preferable to have some relatively simple and universal solution for "restoring" punctuation marks and capital letters in sentences that our speech recognition system generates. And it would be really nice if such a system worked with any texts in general.

For this reason, we would like to share a system that:

Inserts capital letters and basic punctuation marks (dot, comma, hyphen, question mark, exclamation mark, dash for Russian);
Works for 4 languages (Russian, English, German, Spanish) and can be extended;
By design is domain agnostic and is not based on any hard-coded rules;
Has non-trivial metrics and succeeds in the task of improving text readability;

To reiterate — the purpose of such a system is only to improve the readability of the text. It does not add information to the text that did not originally exist.
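For contrast, the naive baseline mentioned earlier (start the sentence with a capital letter and put a full stop at the end) can be sketched in a few lines. This is only an illustration of that rule-based fallback, not the published model; the function name `naive_restore` is hypothetical.

```python
def naive_restore(text):
    """Capitalize the first character and ensure terminal punctuation.

    This treats the whole input as a single sentence: it cannot place
    commas, question marks, or sentence boundaries inside the text.
    """
    text = " ".join(text.split())  # collapse stray whitespace
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


if __name__ == "__main__":
    # Typical lowercase, unpunctuated speech recognition output.
    print(naive_restore("hello how are you i am fine"))
```

The output still reads as one run-on sentence, which is the kind of gap a learned repunctuation model is meant to close.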

man_of_letters Jul 22 2021 at 06:03

Mode on: Comparing the two best colorization AI's

11 min

RUVDS.com corporate blog, Python, TensorFlow, Machine learning, Image processing

This article continues a series of notes about colorization. During today's experiment, we’ll be comparing a recent neural network with the good old Deoldify to gauge the rate at which the future is approaching.

This is a practical project, so we won’t pay extra attention to the underlying philosophy of the Transformer architecture. Besides, any attempt to explain the principles of its operation to a wide audience in hand-waving terms would be misleading.

A lecturer: Mr. Petrov! How does a transformer work?
Petrov with a bass voice: Hum-m-m-m.

Google Colorizing Transformer vs Deoldify


m31 Jul 1 2021 at 13:40

Data Phoenix Digest — 01.07.2021

5 min

1.9K

Python, Algorithms, Big Data, Machine learning, Artificial Intelligence

We at Data Science Digest have always strived to ignite the fire of knowledge in the AI community. We’re proud to have helped thousands of people to learn something new and give you the tools to push ahead. And we’ve not been standing still, either.

Please meet Data Phoenix, a Data Science Digest rebranded and risen anew from our own flame. Our mission is to help everyone interested in Data Science and AI/ML to expand the frontiers of knowledge. More news, more updates, and webinars(!) are coming. Stay tuned!

The new issue of the new Data Phoenix Digest is here! AI that helps write code, EU’s ban on biometric surveillance, genetic algorithms for NLP, multivariate probabilistic regression with NGBoosting, alias-free GAN, MLOps toys, and more…

If you’re more used to getting updates every day, subscribe to our Telegram channel or follow us on social media: Twitter, Facebook.


m31 Jun 24 2021 at 10:09

DataScience Digest — 24.06.21

5 min

1.9K

Artificial Intelligence, Machine learning, Big Data, Algorithms, Python

The new issue of DataScienceDigest is here!

The impact of NLP and the growing budgets to drive AI transformations. How Airbnb standardized metric computation at scale. Cross-Validation, MASA-SR, AgileGAN, EfficientNetV2, and more.

If you’re more used to getting updates every day, subscribe to our Telegram channel or follow us on social media: Twitter, LinkedIn, Facebook.

RUDIEMEANT Jun 14 2021 at 14:35

[Tutorial] How to set up Atom IDE for python development

3 min

20K

Development for Windows, Programming, Python, GitHub, *nix

From sandbox

Setting up Atom for working with Python is quite a tricky task. I've spent a lot of time making it work. Autocompletion, autoformatting, type hints, and much more will be available to you after reading this tutorial.

