This is a translated and adopted article I wrote for the Aha'22 (30 May 2022) conference. It describes an approach to a marketplace prices optimisation. Here I've outlined some important definitions and tried to define the scopes and roles of ML, algorithms, and humans in optimal pricing. Although the article covers rather basic things, still, you can find out some new formulas and ideas, because these basics are somewhat "well-known only in a very closed clubs", and besides, the real gem found here is the detailed recipe for ML engineers how to build optimal pricing systems.
Machine learning *
The basis of artificial intelligence
In this post I'll share my experience in adjustment of WiFi physical channel. The channel was implemented on a software defined radio (SDR) platform. WiFi looks like a very complicated thing standardized over hundreds of pages. Could a non-expert with a PC and a couple of 100$ devices (HackRFs) somehow improve it? Here I try to develop a WiFi optimization approach basically agnostic of protocol implementation details. There's some math and Python programming in it.
Video recording of our webinar about dstack and reproducible ML workflows, AVL binary tree operations, Ultralytics YOLOv8, training XGBoost, productionize ML models, introduction to forecasting ensembles, domain expansion of image generators, Muse, X-Decoder, Box2Mask, RoDynRF, AgileAvatar and more.
Unlock the power of Transformer Neural Networks and learn how to build your own GPT-like model from scratch. In this in-depth guide, we will delve into the theory and provide a step-by-step code implementation to help you create your own miniGPT model. The final code is only 400 lines and works on both CPUs as well as on the GPUs. If you want to jump straight to the implementation here is the GitHub repo.
Transformers are revolutionizing the world of artificial intelligence. This simple, but very powerful neural network architecture, introduced in 2017, has quickly become the go-to choice for natural language processing, generative AI, and more. With the help of transformers, we've seen the creation of cutting-edge AI products like BERT, GPT-x, DALL-E, and AlphaFold, which are changing the way we interact with language and solve complex problems like protein folding. And the exciting possibilities don't stop there - transformers are also making waves in the field of computer vision with the advent of Vision Transformers.
InvokeAI 2.2 is now available to everyone. This update brings in exciting features, like UI Outpainting, Embedding Management and more. See highlighted updates below, or the full release notes for everything included in the release.
I trained a neural network on my drawings and give the model for free (and teach you to create your own)
Toloka is a crowdsourcing platform and microtasking project launched by Yandex to quickly markup large amounts of data. But how can such a simple concept play a crucial role in improving the work of neural networks?
The present invention relates to an analog signal capturing devices generally and monochrome or color image capture sensors, such as a scanner or a Charge-Coupled-Device (“CCD”) for video and photo camera in particular, which are almost free from moiré and aliasing. The present invention relates to methods for enhancing the resolution of an image capture device and device for digital color/grey image displaying also.
FL_PyTorch: Optimization Research Simulator for Federated Learning is publicly available on GitHub.
FL_PyTorch is a suite of open-source software written in python that builds on top of one of the most popular research Deep Learning (DL) frameworks PyTorch. We built FL_PyTorch as a research simulator for FL to enable fast development, prototyping, and experimenting with new and existing FL optimization algorithms. Our system supports abstractions that provide researchers with sufficient flexibility to experiment with existing and novel approaches to advance the state-of-the-art. The work is in proceedings of the 2nd International Workshop on Distributed Machine Learning DistributedML 2021. The paper, presentation, and appendix are available in DistributedML’21 Proceedings (https://dl.acm.org/doi/abs/10.1145/3488659.3493775).
The project is distributed in open source form under Apache License Version 2.0. Code Repository: https://github.com/burlachenkok/flpytorch.
To become familiar with that tool, I recommend the following sequence of steps:
Alexander Volchek, IT entrepreneur, CEO educational platform GeekBrains
Pretty much everyone in the IT community is talking metaverses, NFTs, blockchain and cryptocurrency. This time we will discuss metaverses, and come back to everything else in the letters to follow. Entrepreneurs and founders of tech giants are passionate about this idea, and investors are allocating millions of dollars for projects dealing with metaverses. Let's start with the basics.
Explaining through simple examples
For a long time, people have been thinking on how to create a computer that could think like a person. The advent of artificial neural networks is a significant step in this direction. Our brain consists of neurons that receive information from sensory organs and process it: we recognize people we know by their faces, and we feel hungry when we see delicious food. All of this is the result of brain neurons working and interacting with each other. This is also the principle that artificial neural networks are based on, simulating the processes occurring in the human brain.
What are neural networks
Artificial neural networks are a software code that imitates the work of a brain and is capable of self-learning. Like a biological network, an artificial network also consists of neurons, but they have a simpler structure.
If you connect neurons into a sufficiently large network with controlled interaction, they will be able to perform quite complex tasks. For example, determining what is shown in a picture, or independently creating a photorealistic image based on a text description.
In this article, we shall provide some background on how multilingual multi-speaker models work and test an Indic TTS model that supports 9 languages and 17 speakers (Hindi, Malayalam, Manipuri, Bengali, Rajasthani, Tamil, Telugu, Gujarati, Kannada).
It seems a bit counter-intuitive at first that one model can support so many languages and speakers provided that each Indic language has its own alphabet, but we shall see how it was implemented.
Also, we shall list the specs of these models like supported sampling rates and try something cool – making speakers of different Indic languages speak Hindi. Please, if you are a native speaker of any of these languages, share your opinion on how these voices sound, both in their respective language and in Hindi.
The first text-based CAPTCHA ( we’ll call it just CAPTCHA for the sake of brevity ) was used in 1997 by AltaVista search engine. It prevented bots from adding Uniform Resource Locator (URLs) to their web search engine.
Back then it was a decent defense measure. However the progress can't be stopped, and this defense was bypassed using OCR available at those times (for example FineReader).
CAPTCHA became more complex, noise was added to it, along with distortions, so the popular OCRs couldn’t recognize this text. And then OCRs custom made for this task appeared. It costed extra money and knowledge for the attacking side. The CAPTCHA developers were required to understand the challenges the attackers met, what distortions to add, in order to make the automation of the CAPTCHA recognition more complex.
The misunderstanding of the principles the OCRs were based on, some CAPTCHAs were given such distortions, that they were more of a hassle for regular users than for a machine.
OCRs for different types of CAPTCHAs were made using heuristics, and the most complicated part of it was the CAPTCHA segmentation for the stand along symbols, that subsequently could be easily recognized by the CNN (for example LeNet-5), also SVM showed a good result even on the raw pixels.
In this article I’ll try to grasp the whole history of CAPTCHA recognition, from heuristics to the contemporary automated recognition systems. We’ll figure out, if a CAPTCHA is still alive.
I’ll review the yandex.com CAPTCHA. The Russian version of the same CAPTCHA is more complex.
In our last article we made a bunch of promises about our speech synthesis.
After a lot of hard work we finally have delivered upon these promises:
- Model size reduced 2x;
- New models are 10x faster;
- We added flags to control stress;
- Now the models can make proper pauses;
- High quality voice added (and unlimited "random" voices);
- All speakers squeezed into the same model;
- Input length limitations lifted, now models can work with paragraphs of text;
- Pauses, speed and pitch can be controlled via SSML;
- Sampling rates of 8, 24 or 48 kHz are supported;
- Models are much more stable — they do not omit words anymore;
This is a truly break-through achievement for us and we are not planning to stop anytime soon. We will be adding as many languages as possible shortly (the CIS languages, English, European languages, Hindic languages). Also we are still planning to make our models additional 2-5x faster.
We are also planning to add phonemes and a new model for stress, as well as to reduce the minimum amount of audio required to train a high-quality voice to 5 — 15 minutes.
Multimodality has led the pack in machine learning in 2021. Neural networks are wolfing down images, text, speech and music all at the same time. OpenAI is, as usual, top dog, but as if in defiance of their name, they are in no hurry to share their models openly. At the beginning of the year, the company presented the DALL-E neural network, which generates 256x256 pixel images in answer to a written request. Descriptions of it can be found as articles on arXiv and examples on their blog.
As soon as DALL-E flushed out of the bushes, Chinese researchers got on its tail. Their open-source CogView neural network does the same trick of generating images from text. But what about here in Russia? One might say that “investigate, master, and train” is our engineering motto. Well, we caught the scent, and today we can say that we created from scratch a complete pipeline for generating images from descriptive textual input written in Russian.
In this article we present the ruDALL-E XL model, an open-source text-to-image transformer with 1.3 billion parameters as well as ruDALL-E XXL model, an text-to-image transformer with 12.0 billion parameters which is available in DataHub SberCloud, and several other satellite models.
Lingtrain Aligner. How to make parallel books for language learning. Part 1. Python and Colab version
If you're interested in learning new languages or teaching them, then you probably know such a way as parallel reading. It helps to immerse yourself in the context, increases the vocabulary, and allows you to enjoy the learning process. When it comes to reading, you most likely want to choose your favorite author, theme, or something familiar and this is often impossible if no one has published such a variant of a parallel book. It's becoming even worse when you're learning some cool language like Hungarian or Japanese.
Today we are taking a big step forward toward breaking this situation.
We will use the lingtrain_aligner tool. It's an open-source project on Python which aims to help all the people eager to learn foreign languages. It's a part of the Lingtrain project, you can follow us on Telegram, Facebook and Instagram. Let's start!
Find the texts
At first, we should find two texts we want to align. Let's take two editions of "To Kill a Mockingbird" by Harper Lee, in Russian and the original one.
When we move towards the digital world, we shouldn’t forget that cybersecurity has been playing a major role in our life. Talks about digital security have been stiff. The main challenge we would face is abnormality.
During an online transaction, most of the product-lovers prefer credit cards. The credit limit available in credit cards would allow us to purchase even when our bank balance is insufficient. But this is great news for cyber attackers eyeing your money.
For tackling this problem, we should depend upon a system to make hardpressed transactions effortless.
This is where we need a system to track the transaction patterns. With AI, we can abort any abnormal transaction, precisely for credit card fraud detection AI.
As of now, we will come across a number of machine learning algorithms to classify unusual transactions where Artificial Intelligence detect fraud. We only need past data and the right algorithm to fit the data in the right form in case of credit card fraud detection ai.
How do we make this happen? Let’s look into the process of credit card fraud detection AI:
Import the needed libraries
The best step to detect credit card fraud detection with AI is to import the libraries. The best practice would be to import the necessary libraries in a single section for the purpose of quick modification. To use the credit card data, we can use the PCA’s transformed version or RFECV, RFE, VIF and SelectKBest to get the best model features.
Machine learning helps with fraud detection. It’s quite simple to import the dataset when you use the pandas module in python. You can run the run command for importing your data.
Working with speech recognition models we often encounter misconceptions among potential customers and users (mostly related to the fact that people have a hard time distinguishing substance over form). People also tend to believe that punctuation marks and spaces are somehow obviously present in spoken speech, when in fact real spoken speech and written speech are entirely different beasts.
Of course you can just start each sentence with a capital letter and put a full stop at the end. But it is preferable to have some relatively simple and universal solution for "restoring" punctuation marks and capital letters in sentences that our speech recognition system generates. And it would be really nice if such a system worked with any texts in general.
For this reason, we would like to share a system that:
- Inserts capital letters and basic punctuation marks (dot, comma, hyphen, question mark, exclamation mark, dash for Russian);
- Works for 4 languages (Russian, English, German, Spanish) and can be extended;
- By design is domain agnostic and is not based on any hard-coded rules;
- Has non-trivial metrics and succeeds in the task of improving text readability;
To reiterate — the purpose of such a system is only to improve the readability of the text. It does not add information to the text that did not originally exist.
This article continues a series of notes about colorization. During today's experiment, we’ll be comparing a recent neural network with the good old Deoldify to gauge the rate at which the future is approaching.
This is a practical project, so we won’t pay extra attention to the underlying philosophy of the Transformer architecture. Besides, any attempt to explain the principles of its operation to a wide public in hand waving terms would become misguiding.
A lecturer: Mr. Petrov! How does a transformer work?
Petrov with a bass voice: Hum-m-m-m.
Google Colorizing Transformer vs Deoldify
We at Data Science Digest have always strived to ignite the fire of knowledge in the AI community. We’re proud to have helped thousands of people to learn something new and give you the tools to push ahead. And we’ve not been standing still, either.
Please meet Data Phoenix, a Data Science Digest rebranded and risen anew from our own flame. Our mission is to help everyone interested in Data Science and AI/ML to expand the frontiers of knowledge. More news, more updates, and webinars(!) are coming. Stay tuned!
The new issue of the new Data Phoenix Digest is here! AI that helps write code, EU’s ban on biometric surveillance, genetic algorithms for NLP, multivariate probabilistic regression with NGBoosting, alias-free GAN, MLOps toys, and more…