Pull to refresh
659.27

Machine learning *

The basis of artificial intelligence

Show first
Rating limit
Level of difficulty

FL_PyTorch is publicly available on GitHub

Reading time2 min
Views1.3K

FL_PyTorch: Optimization Research Simulator for Federated Learning is publicly available on GitHub.

FL_PyTorch is a suite of open-source software written in python that builds on top of one of the most popular research Deep Learning (DL) frameworks PyTorch. We built FL_PyTorch as a research simulator for FL to enable fast development, prototyping, and experimenting with new and existing FL optimization algorithms. Our system supports abstractions that provide researchers with sufficient flexibility to experiment with existing and novel approaches to advance the state-of-the-art. The work is in proceedings of the 2nd International Workshop on Distributed Machine Learning DistributedML 2021. The paper, presentation, and appendix are available in DistributedML’21 Proceedings (https://dl.acm.org/doi/abs/10.1145/3488659.3493775).

The project is distributed in open source form under Apache License Version 2.0. Code Repository: https://github.com/burlachenkok/flpytorch.

To become familiar with that tool, I recommend the following sequence of steps:

Read more
Total votes 1: ↑0 and ↓1-1
Comments0

Metaverses: hype or the future to come?

Reading time5 min
Views1.5K

Alexander Volchek, IT entrepreneur, CEO educational platform GeekBrains

Pretty much everyone in the IT community is talking metaverses, NFTs, blockchain and cryptocurrency. This time we will discuss metaverses, and come back to everything else in the letters to follow. Entrepreneurs and founders of tech giants are passionate about this idea, and investors are allocating millions of dollars for projects dealing with metaverses. Let's start with the basics.

Read more
Total votes 2: ↑1 and ↓10
Comments0

What are neural networks and what do we need them for?

Reading time4 min
Views4.4K

Explaining through simple examples

For a long time, people have been thinking on how to create a computer that could think like a person. The advent of artificial neural networks is a significant step in this direction. Our brain consists of neurons that receive information from sensory organs and process it: we recognize people we know by their faces, and we feel hungry when we see delicious food. All of this is the result of brain neurons working and interacting with each other. This is also the principle that artificial neural networks are based on, simulating the processes occurring in the human brain.

What are neural networks

Artificial neural networks are a software code that imitates the work of a brain and is capable of self-learning. Like a biological network, an artificial network also consists of neurons, but they have a simpler structure.

If you connect neurons into a sufficiently large network with controlled interaction, they will be able to perform quite complex tasks. For example, determining what is shown in a picture, or independently creating a photorealistic image based on a text description.

Read more
Total votes 1: ↑1 and ↓0+1
Comments1

Multilingual Text-to-Speech Models for Indic Languages

Reading time5 min
Views2.6K

In this article, we shall provide some background on how multilingual multi-speaker models work and test an Indic TTS model that supports 9 languages and 17 speakers (Hindi, Malayalam, Manipuri, Bengali, Rajasthani, Tamil, Telugu, Gujarati, Kannada).

It seems a bit counter-intuitive at first that one model can support so many languages and speakers provided that each Indic language has its own alphabet, but we shall see how it was implemented.

Also, we shall list the specs of these models like supported sampling rates and try something cool – making speakers of different Indic languages speak Hindi. Please, if you are a native speaker of any of these languages, share your opinion on how these voices sound, both in their respective language and in Hindi.

Read more
Total votes 2: ↑2 and ↓0+2
Comments0

Text-based CAPTCHA in 2022

Reading time7 min
Views5.5K

The first text-based CAPTCHA ( we’ll call it just CAPTCHA for the sake of brevity ) was used in 1997 by AltaVista search engine. It prevented bots from adding Uniform Resource Locator (URLs) to their web search engine.

Back then it was a decent defense measure. However the progress can't be stopped, and this defense was bypassed using OCR available at those times (for example FineReader).

CAPTCHA became more complex, noise was added to it, along with distortions, so the popular OCRs couldn’t recognize this text. And then OCRs custom made for this task appeared. It costed extra money and knowledge for the attacking side. The CAPTCHA developers were required to understand the challenges the attackers met, what distortions to add, in order to make the automation of the CAPTCHA recognition more complex.

The misunderstanding of the principles the OCRs were based on, some CAPTCHAs were given such distortions, that they were more of a hassle for regular users than for a machine.

OCRs for different types of CAPTCHAs were made using heuristics, and the most complicated part of it was the CAPTCHA segmentation for the stand along symbols, that subsequently could be easily recognized by the CNN (for example LeNet-5), also SVM showed a good result even on the raw pixels.

In this article I’ll try to grasp the whole history of CAPTCHA recognition, from heuristics to the contemporary automated recognition systems. We’ll figure out, if a CAPTCHA is still alive.

I’ll review the yandex.com CAPTCHA. The Russian version of the same CAPTCHA is more complex.

Read more
Total votes 2: ↑1 and ↓1+2
Comments0

Our new public speech synthesis in super-high quality, 10x faster and more stable

Reading time3 min
Views4.6K

hero_image


In our last article we made a bunch of promises about our speech synthesis.


After a lot of hard work we finally have delivered upon these promises:


  • Model size reduced 2x;
  • New models are 10x faster;
  • We added flags to control stress;
  • Now the models can make proper pauses;
  • High quality voice added (and unlimited "random" voices);
  • All speakers squeezed into the same model;
  • Input length limitations lifted, now models can work with paragraphs of text;
  • Pauses, speed and pitch can be controlled via SSML;
  • Sampling rates of 8, 24 or 48 kHz are supported;
  • Models are much more stable — they do not omit words anymore;

This is a truly break-through achievement for us and we are not planning to stop anytime soon. We will be adding as many languages as possible shortly (the CIS languages, English, European languages, Hindic languages). Also we are still planning to make our models additional 2-5x faster.


We are also planning to add phonemes and a new model for stress, as well as to reduce the minimum amount of audio required to train a high-quality voice to 5 — 15 minutes.


As usual you can try our model in our repo or in colab.

Read more →
Total votes 13: ↑13 and ↓0+13
Comments0

ruDALL-E: Generating Images from Text. Facing down the biggest computational challenge in Russia

Reading time11 min
Views10K

Multimodality has led the pack in machine learning in 2021. Neural networks are wolfing down images, text, speech and music all at the same time.  OpenAI is, as usual, top dog, but as if in defiance of their name, they are in no hurry to share their models openly.  At the beginning of the year, the company presented the DALL-E neural network, which generates 256x256 pixel images in answer to a written request.  Descriptions of it can be found as articles on arXiv and examples on their blog.  

As soon as DALL-E flushed out of the bushes, Chinese researchers got on its tail.  Their open-source CogView neural network does the same trick of generating images from text.  But what about here in Russia? One might say that “investigate, master, and train” is our engineering motto.  Well, we caught the scent, and today we can say that we created from scratch a complete pipeline for generating images from descriptive textual input written in Russian.

In this article we present the ruDALL-E XL model, an open-source text-to-image transformer with 1.3 billion parameters as well as ruDALL-E XXL model, an text-to-image transformer with 12.0 billion parameters which is available in DataHub SberCloud, and several other satellite models.

Read more
Total votes 3: ↑3 and ↓0+3
Comments4

Lingtrain Aligner. How to make parallel books for language learning. Part 1. Python and Colab version

Reading time8 min
Views3.5K

title


If you're interested in learning new languages or teaching them, then you probably know such a way as parallel reading. It helps to immerse yourself in the context, increases the vocabulary, and allows you to enjoy the learning process. When it comes to reading, you most likely want to choose your favorite author, theme, or something familiar and this is often impossible if no one has published such a variant of a parallel book. It's becoming even worse when you're learning some cool language like Hungarian or Japanese.


Today we are taking a big step forward toward breaking this situation.


We will use the lingtrain_aligner tool. It's an open-source project on Python which aims to help all the people eager to learn foreign languages. It's a part of the Lingtrain project, you can follow us on Telegram, Facebook and Instagram. Let's start!


Find the texts


At first, we should find two texts we want to align. Let's take two editions of "To Kill a Mockingbird" by Harper Lee, in Russian and the original one.

Read more →
Total votes 5: ↑5 and ↓0+5
Comments0

Using the Machine Learning model to detect credit card fraud

Reading time3 min
Views1.7K

When we move towards the digital world, we shouldn’t forget that cybersecurity has been playing a major role in our life. Talks about digital security have been stiff. The main challenge we would face is abnormality.

During an online transaction, most of the product-lovers prefer credit cards. The credit limit available in credit cards would allow us to purchase even when our bank balance is insufficient. But this is great news for cyber attackers eyeing your money.

For tackling this problem, we should depend upon a system to make hardpressed transactions effortless.

This is where we need a system to track the transaction patterns. With AI, we can abort any abnormal transaction, precisely for credit card fraud detection AI.

As of now, we will come across a number of machine learning algorithms to classify unusual transactions where Artificial Intelligence detect fraud. We only need past data and the right algorithm to fit the data in the right form in case of credit card fraud detection ai.

How do we make this happen? Let’s look into the process of credit card fraud detection AI:

Import the needed libraries

The best step to detect credit card fraud detection with AI is to import the libraries. The best practice would be to import the necessary libraries in a single section for the purpose of quick modification. To use the credit card data, we can use the PCA’s transformed version or RFECV, RFE, VIF and SelectKBest to get the best model features.

Import Dataset

Machine learning helps with fraud detection. It’s quite simple to import the dataset when you use the pandas module in python. You can run the run command for importing your data. 

Read more
Rating0
Comments2

We have published a model for text repunctuation and recapitalization for four languages

Reading time7 min
Views6.7K


Open In Colab


Working with speech recognition models we often encounter misconceptions among potential customers and users (mostly related to the fact that people have a hard time distinguishing substance over form). People also tend to believe that punctuation marks and spaces are somehow obviously present in spoken speech, when in fact real spoken speech and written speech are entirely different beasts.


Of course you can just start each sentence with a capital letter and put a full stop at the end. But it is preferable to have some relatively simple and universal solution for "restoring" punctuation marks and capital letters in sentences that our speech recognition system generates. And it would be really nice if such a system worked with any texts in general.


For this reason, we would like to share a system that:


  • Inserts capital letters and basic punctuation marks (dot, comma, hyphen, question mark, exclamation mark, dash for Russian);
  • Works for 4 languages (Russian, English, German, Spanish) and can be extended;
  • By design is domain agnostic and is not based on any hard-coded rules;
  • Has non-trivial metrics and succeeds in the task of improving text readability;

To reiterate — the purpose of such a system is only to improve the readability of the text. It does not add information to the text that did not originally exist.

Read more →
Total votes 4: ↑3 and ↓1+2
Comments0

Mode on: Comparing the two best colorization AI's

Reading time11 min
Views3.6K

This article continues a series of notes about colorization. During today's experiment, we’ll be comparing a recent neural network with the good old Deoldify to gauge the rate at which the future is approaching.

This is a practical project, so we won’t pay extra attention to the underlying philosophy of the Transformer architecture. Besides, any attempt to explain the principles of its operation to a wide public in hand waving terms would become misguiding.

A lecturer: Mr. Petrov! How does a transformer work?
Petrov with a bass voice: Hum-m-m-m.


Google Colorizing Transformer vs Deoldify

Read more →
Total votes 17: ↑17 and ↓0+17
Comments0

Data Phoenix Digest — 01.07.2021

Reading time5 min
Views1.9K

We at Data Science Digest have always strived to ignite the fire of knowledge in the AI community. We’re proud to have helped thousands of people to learn something new and give you the tools to push ahead. And we’ve not been standing still, either.

Please meet Data Phoenix, a Data Science Digest rebranded and risen anew from our own flame. Our mission is to help everyone interested in Data Science and AI/ML to expand the frontiers of knowledge. More news, more updates, and webinars(!) are coming. Stay tuned!

The new issue of the new Data Phoenix Digest is here! AI that helps write code, EU’s ban on biometric surveillance, genetic algorithms for NLP, multivariate probabilistic regression with NGBoosting, alias-free GAN, MLOps toys, and more…

If you’re more used to getting updates every day, subscribe to our Telegram channel or follow us on social media: TwitterFacebook.

Read more
Total votes 1: ↑0 and ↓1-1
Comments0

DataScience Digest — 24.06.21

Reading time5 min
Views1.9K

The new issue of DataScienceDigest is here!

The impact of NLP and the growing budgets to drive AI transformations. How Airbnb standardized metric computation at scale. Cross-Validation, MASA-SR, AgileGAN, EfficientNetV2, and more.

If you’re more used to getting updates every day, subscribe to our Telegram channel or follow us on social media: Twitter, LinkedIn, Facebook.

Read more
Total votes 2: ↑1 and ↓10
Comments0

Flitter Your Business With AI Integrated Flutter App Development

Reading time5 min
Views3K

As we all are aware of the fact that the digital market is heavily leaning towards a reliable UX-driven process, app development has become quite complex, especially for targeting the industry for mobile platforms.

For every organization, creating a product that is beneficial for their customer needs always comes up with a plethora of challenges. 

From the technical point of time, there are various challenges that every business faces, including selecting the right platform for the app, the right technology stack or framework, and creating an app that fulfills the needs and expectations of customers.  

Similarly, there are more challenges that every business faces and needs to cope with while creating its dream product. 

So, what to do??

Well, what if I say that the answer to all your queries and questions is Flutter app development with Artificial Intelligence (AI)  integration……

Surprised? Wondering how? 

Well, AI in Flutter app development is one of the best advancements in the software market. The concept of AI was first introduced during the 20th century with loads of innovations and advancements that we are still integrating into our mobile app development. 

But, what are Artificial Intelligence and Flutter app development? 

Read more
Rating0
Comments1

Data Science Digest — 21.04.21

Reading time3 min
Views1K

Hi All,

I’m pleased to invite you all to enroll in the Lviv Data Science Summer School, to delve into advanced methods and tools of Data Science and Machine Learning, including such domains as CV, NLP, Healthcare, Social Network Analysis, and Urban Data Science. The courses are practice-oriented and are geared towards undergraduates, Ph.D. students, and young professionals (intermediate level). The studies begin July 19–30 and will be hosted online. Make sure to apply — Spots are running fast!

If you’re more used to getting updates every day, follow us on social media:

Telegram
Twitter
LinkedIn
Facebook

Regards,
Dmitry Spodarets.

Read more
Total votes 2: ↑1 and ↓1+1
Comments0

Neural network Telegram bot with StyleGAN and GPT-2

Reading time3 min
Views5.3K

The Beginning


So we have already played with different neural networks. Cursed image generation using GANs, deep texts from GPT-2 — we have seen it all.


This time I wanted to create a neural entity that would act like a beauty blogger. This meant it would have to post pictures like Instagram influencers do and generate the same kind of narcissistic texts. \


Initially I planned to post the neural content on Instagram but using the Facebook Graph API which is needed to go beyond read-only was too painful for me. So I reverted to Telegram which is one of my favorite social products overall.


The name of the entity/channel (Aida Enelpi) is a bad neural-oriented pun mostly generated by the bot itself.


One of the first posts generated by Aida

Read more →
Rating0
Comments1

Data Science Digest — We Are Back

Reading time5 min
Views1.1K

Hi All,

I have some good news for you…

Data Science Digest is back! We’ve been “offline” for a while, but no worries — You’ll receive regular digest updates with top news and resources on AI/ML/DS every Wednesday, starting today.

If you’re more used to getting updates every day, follow us on social media:

Telegram - https://t.me/DataScienceDigest
Twitter - https://twitter.com/Data_Digest
LinkedIn - https://www.linkedin.com/company/data-science-digest/
Facebook - https://www.facebook.com/DataScienceDigest/

And finally, your feedback is very much appreciated. Feel free to share any ideas with me and the team, and we’ll do our best to make Data Science Digest a better place for all.

Regards,
Dmitry Spodarets.

Read more
Rating0
Comments0

Authors' contribution