Machine learning *

The basis of artificial intelligence

ArticlesPostsNewsAuthors

m31 May 28 2021 at 11:29

DataScience Digest — 28.05.21

5 min

883

Python*Algorithms*Big Data*Machine learning*Artificial Intelligence

The new issue of Data Science Digest is here! Hop to learn about the latest news, articles, tutorials, research papers, and event materials on DataScience, AI, ML, and BigData. All sections are prioritized for your convenience. Enjoy!

alexpatel Apr 26 2021 at 09:29

Flitter Your Business With AI Integrated Flutter App Development

5 min

3.1K

Flutter*Artificial IntelligenceMachine learning*Development of mobile applications*Software

From sandbox

As we all are aware of the fact that the digital market is heavily leaning towards a reliable UX-driven process, app development has become quite complex, especially for targeting the industry for mobile platforms.

For every organization, creating a product that is beneficial for their customer needs always comes up with a plethora of challenges.

From the technical point of time, there are various challenges that every business faces, including selecting the right platform for the app, the right technology stack or framework, and creating an app that fulfills the needs and expectations of customers.

Similarly, there are more challenges that every business faces and needs to cope with while creating its dream product.

So, what to do??

Well, what if I say that the answer to all your queries and questions is Flutter app development with Artificial Intelligence (AI) integration……

Surprised? Wondering how?

Well, AI in Flutter app development is one of the best advancements in the software market. The concept of AI was first introduced during the 20th century with loads of innovations and advancements that we are still integrating into our mobile app development.

But, what are Artificial Intelligence and Flutter app development?

m31 Apr 21 2021 at 09:38

Data Science Digest — 21.04.21

3 min

Artificial IntelligenceMachine learning*Big Data*Algorithms*Python*

Hi All,

I’m pleased to invite you all to enroll in the Lviv Data Science Summer School, to delve into advanced methods and tools of Data Science and Machine Learning, including such domains as CV, NLP, Healthcare, Social Network Analysis, and Urban Data Science. The courses are practice-oriented and are geared towards undergraduates, Ph.D. students, and young professionals (intermediate level). The studies begin July 19–30 and will be hosted online. Make sure to apply — Spots are running fast!

If you’re more used to getting updates every day, follow us on social media:

Telegram
Twitter
LinkedIn
Facebook

Regards,
Dmitry Spodarets.

gui_tar_gz Apr 19 2021 at 12:53

Neural network Telegram bot with StyleGAN and GPT-2

3 min

5.4K

Social networks and communitiesMachine learning*Artificial IntelligencePython*

The Beginning

So we have already played with different neural networks. Cursed image generation using GANs, deep texts from GPT-2 — we have seen it all.

This time I wanted to create a neural entity that would act like a beauty blogger. This meant it would have to post pictures like Instagram influencers do and generate the same kind of narcissistic texts. \

Initially I planned to post the neural content on Instagram but using the Facebook Graph API which is needed to go beyond read-only was too painful for me. So I reverted to Telegram which is one of my favorite social products overall.

The name of the entity/channel (Aida Enelpi) is a bad neural-oriented pun mostly generated by the bot itself.

One of the first posts generated by Aida

m31 Apr 15 2021 at 19:34

Data Science Digest — We Are Back

5 min

1.2K

Artificial IntelligenceMachine learning*Big Data*Algorithms*Python*

Hi All,

I have some good news for you…

Data Science Digest is back! We’ve been “offline” for a while, but no worries — You’ll receive regular digest updates with top news and resources on AI/ML/DS every Wednesday, starting today.

If you’re more used to getting updates every day, follow us on social media:

Telegram - https://t.me/DataScienceDigest
Twitter - https://twitter.com/Data_Digest
LinkedIn - https://www.linkedin.com/company/data-science-digest/
Facebook - https://www.facebook.com/DataScienceDigest/

And finally, your feedback is very much appreciated. Feel free to share any ideas with me and the team, and we’ll do our best to make Data Science Digest a better place for all.

Regards,
Dmitry Spodarets.

lukyanchikov Apr 3 2021 at 12:09

Distributed Artificial Intelligence with InterSystems IRIS

7 min

1.1K

InterSystems corporate blogMachine learning*Distributed systems*Artificial Intelligence

Author: Sergey Lukyanchikov, Sales Engineer at InterSystems

What is Distributed Artificial Intelligence (DAI)?

Attempts to find a “bullet-proof” definition have not produced result: it seems like the term is slightly “ahead of time”. Still, we can analyze semantically the term itself – deriving that distributed artificial intelligence is the same AI (see our effort to suggest an “applied” definition) though partitioned across several computers that are not clustered together (neither data-wise, nor via applications, not by providing access to particular computers in principle). I.e., ideally, distributed artificial intelligence should be arranged in such a way that none of the computers participating in that “distribution” have direct access to data nor applications of another computer: the only alternative becomes transmission of data samples and executable scripts via “transparent” messaging. Any deviations from that ideal should lead to an advent of “partially distributed artificial intelligence” – an example being distributed data with a central application server. Or its inverse. One way or the other, we obtain as a result a set of “federated” models (i.e., either models trained each on their own data sources, or each trained by their own algorithms, or “both at once”).

Distributed AI scenarios “for the masses”

We will not be discussing edge computations, confidential data operators, scattered mobile searches, or similar fascinating yet not the most consciously and wide-applied (not at this moment) scenarios. We will be much “closer to life” if, for instance, we consider the following scenario (its detailed demo can and should be watched here): a company runs a production-level AI/ML solution, the quality of its functioning is being systematically checked by an external data scientist (i.e., an expert that is not an employee of the company). For a number of reasons, the company cannot grant the data scientist access to the solution but it can send him a sample of records from a required table following a schedule or a particular event (for example, termination of a training session for one or several models by the solution). With that we assume, that the data scientist owns some version of the AI/ML mechanisms already integrated in the production-level solution that the company is running – and it is likely that they are being developed, improved, and adapted to concrete use cases of that concrete company, by the data scientist himself. Deployment of those mechanisms into the running solution, monitoring of their functioning, and other lifecycle aspects are being handled by a data engineer (the company employee).

snakers4 Mar 30 2021 at 03:33

High-Quality Text-to-Speech Made Accessible, Simple and Fast

8 min

11K

Natural Language Processing*SoundMachine learning*

There is a lot of commotion in text-to-speech now. There is a great variety of toolkits, a plethora of commercial APIs from GAFA companies (based both on new and older technologies). There are also a lot of Silicon Valley startups trying to ship products akin to "deep fakes" in speech.

But despite all this ruckus we have not yet seen open solutions that would fulfill all of these criteria:

Naturally sounding speech;
A large library of voices in many languages;
Support for 16kHz and 8kHz out of the box;
No GPUs / ML engineering team / training required;
Unique voices not infringing upon third-party licenses;
High throughput on slow hardware. Decent performance on one CPU thread;
Minimalism and lack of dependencies. One-line usage, no builds or coding in C++ required;
Positioned as a solution, not yet another toolkit / compilation of models developed by other people;
Not affiliated by any means with ecosystems of Google / Yandex / Sberbank;

We decided to share our open non-commercial solution that fits all of these criteria with the community. Since we have published the whole pipeline we do not focus much on cherry picked examples and we encourage you to visit our project GitHub repo to test our TTS for yourself.

chdan Feb 25 2021 at 16:35

Chatbox on Top of SIEM Solution

2 min

Information Security*Machine learning*

One of the most time-consuming steps while implementing a SIEM solution is writing and tuning "Playbook" – a set of reaction procedures SOC Team has to follow in case of alert triggering.

So during one of our projects we stoped for a moment and thought: "How can we optimize (ideally automate) the Playbook?"

kryma Jan 20 2021 at 09:16

Doing «Data Science» even if you have never heard the words before

12 min

1.4K

Python*Algorithms*Mathematics*Machine learning*Artificial Intelligence

There’s a lot of talk about machine learning nowadays. A big topic – but, for a lot of people, covered by this terrible layer of mystery. Like black magic – the chosen ones’ art, above the mere mortal for sure. One keeps hearing the words “numpy”, “pandas”, “scikit-learn” - and looking each up produces an equivalent of a three-tome work in documentation.

I’d like to shatter some of this mystery today. Let’s do some machine learning, find some patterns in our data – perhaps even make some predictions. With good old Python only – no 2-gigabyte library, and no arcane knowledge needed beforehand.

Interested? Come join us.

fralik Jan 14 2021 at 13:21

CLIP from OpenAI: what is it and how you can try it out yourself

7 min

12K

Machine learning*

Neural networks (NN) and computer vision models in particular are known to perform well in specific tasks, but often fail to generalize to tasks they have not been trained on. A model that performs well on a food data may perform poorly on satellite images.

A new model from OpenAI named CLIP claims to close this gap by a large margin. The paper Open AI wrote presenting CLIP demonstrates how the model may be used on a various classification datasets in a zero-shot manner.

In this article, I will explain the key ideas of the model they proposed and show you the code to use it.

snakers4 Jan 14 2021 at 07:09

Modern Portable Voice Activity Detector Released

3 min

Open source*SoundMachine learning*

Currently, there are hardly any high quality / modern / free / public voice activity detectors except for WebRTC Voice Activity Detector (link). WebRTC though starts to show its age and it suffers from many false positives.

Also in some cases it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically personal data is considered to be private / sensitive if it contains (i) a name (ii) some private ID. Name recognition is a highly subjective matter and it depends on locale and business case, but Voice Activity and Number Detection are quite general tasks.

Key features:

Modern, portable;
Low memory footprint;
Superior metrics to WebRTC;
Trained on huge spoken corpora and noise / sound libraries;
Slower than WebRTC, but fast enough for IOT / edge / mobile applications;
Unlike WebRTC (which mostly tells silence from voice), our VAD can tell voice from noise / music / silence;
PyTorch (JIT) and ONNX checkpoints;

Typical use cases:

Spoken corpora anonymization;
Can be used together with WebRTC;
Voice activity detection for IOT / edge / mobile use cases;
Data cleaning and preparation, number and voice detection in general;
PyTorch and ONNX can be used with a wide variety of deployment options and backends in mind;

oduvan Dec 28 2020 at 13:42

9 Reasons Why Students Don’t Want You as a Teacher

1 min

1.5K

Learning languagesMachine learning*Programming*Social networks and communitiesStudying in IT

Teaching is hard! Finding a way to explain ideas and concepts, finding an approach to each individual among your students, each having a unique mind and learning capabilities. Being patient and creative, friendly but respective, kind but fair. You have to understand complex stuff and be able to present them in the simplest of ways. There are so many things that you must balance and consider in your work. Teachers, you are heroes, the every-day heroes! With this heroic work comes a responsibility. A responsibility of keeping yourself accountable for your student’s education. Some teachers forget about that and stay oblivious to the mistakes they are making. We’ve compiled a list of 9 Reasons Why Students Don’t Want You as a Teacher. We sincerely hope that it will help you to self-reflect, better connect with your students and achieve better results during your lessons.

-1

snakers4 Dec 5 2020 at 09:55

Playing with Nvidia's New Ampere GPUs and Trying MIG

11 min

4.4K

Big Data*Natural Language Processing*Computer hardwareMachine learning*Image processing*

Every time when the essential question arises, whether to upgrade the cards in the server room or not, I look through similar articles and watch such videos.

Channel with the aforementioned video is very underestimated, but the author does not deal with ML. In general, when analyzing comparisons of accelerators for ML, several things usually catch your eye:

The authors usually take into account only the "adequacy" for the market of new cards in the United States;
The ratings are far from the people and are made on very standard networks (which is probably good overall) without details;
The popular mantra to train more and more gigantic models makes its own adjustments to the comparison;

The answer to the question "which card is better?" is not rocket science: Cards of the 20* series didn't get much popularity, while the 1080 Ti from Avito (Russian craigslist) still are very attractive (and, oddly enough, don't get cheaper, probably for this reason).

All this is fine and dandy and the standard benchmarks are unlikely to lie too much, but recently I learned about the existence of Multi-Instance-GPU technology for A100 video cards and native support for TF32 for Ampere devices and I got the idea to share my experience of the real testing cards on the Ampere architecture (3090 and A100). In this short note, I will try to answer the questions:

Is the upgrade to Ampere worth it? (spoiler for the impatient — yes);
Are the A100 worth the money (spoiler — in general — no);
Are there any cases when the A100 is still interesting (spoiler — yes);
Is MIG technology useful (spoiler — yes, but for inference and for very specific cases for training);

sismetanin Nov 5 2020 at 11:13

Toxic Comments Detection in Russian

17 min

8.2K

VK corporate blogSocial networks and communitiesMachine learning*Python*

Currently, social network sites tend to be one of the major communication platforms in both offline and online space. Freedom of expression of various points of view, including toxic, aggressive, and abusive comments, might have a long-term negative impact on people’s opinions and social cohesion. As a consequence, the ability to automatically identify and moderate toxic content on the Internet to eliminate the negative consequences is one of the necessary tasks for modern society. This paper aims at the automatic detection of toxic comments in the Russian language. As a source of data, we utilized anonymously published Kaggle dataset and additionally validated its annotation quality. To build a classification model, we performed fine-tuning of two versions of Multilingual Universal Sentence Encoder, Bidirectional Encoder Representations from Transformers, and ruBERT. Finetuned ruBERT achieved F₁ = 92.20%, demonstrating the best classification score. We made trained models and code samples publicly available to the research community.

+16

fralik Sep 24 2020 at 05:44

Machine learning in browser: ways to cook up a model

12 min

2.2K

Python*Node.JS*Machine learning*

With ML projects still on the rise we are yet to see integrated solutions in almost every device around us. The need for processing power, memory and experimentation has led to machine learning and DL frameworks targeting desktop computers first. However once trained, a model may be executed in a more constrained environment on a smartphone or on an IoT device. A particularly interesting environment to run the model on is browser. Browser-based solutions may be used on a wide range of devices, desktop and mobile, online and offline. The topic of this post is how to prepare a model for the in-browser usage.

This post presents an end-to-end implementations of a model creation in Python and Node.js. The end goal is to create a model and to use it in a browser. I'll use TensorFlow and TensorFlow.js as main frameworks. One could train a model in Python and convert it to JS. Alternative is to train a model directly in javascript, hence omitting the conversion step.

I have more experience in Python and use it in my everyday work. I occasionally use javascript, but have very little experience in the contemporary front-end development. My hope from this post that python developers with little JS experience could use it to kick start their JS usage.

lukyanchikov Sep 20 2020 at 20:33

InterSystems IRIS – the All-Purpose Universal Platform for Real-Time AI/ML

22 min

1.1K

InterSystems corporate blogData Engineering*DevOps*Artificial IntelligenceMachine learning*

Author: Sergey Lukyanchikov, Sales Engineer at InterSystems

Challenges of real-time AI/ML computations

We will start from the examples that we faced as Data Science practice at InterSystems:

A “high-load” customer portal is integrated with an online recommendation system. The plan is to reconfigure promo campaigns at the level of the entire retail network (we will assume that instead of a “flat” promo campaign master there will be used a “segment-tactic” matrix). What will happen to the recommender mechanisms? What will happen to data feeds and updates into the recommender mechanisms (the volume of input data having increased 25000 times)? What will happen to recommendation rule generation setup (the need to reduce 1000 times the recommendation rule filtering threshold due to a thousandfold increase of the volume and “assortment” of the rules generated)?
An equipment health monitoring system uses “manual” data sample feeds. Now it is connected to a SCADA system that transmits thousands of process parameter readings each second. What will happen to the monitoring system (will it be able to handle equipment health monitoring on a second-by-second basis)? What will happen once the input data receives a new bloc of several hundreds of columns with data sensor readings recently implemented in the SCADA system (will it be necessary, and for how long, to shut down the monitoring system to integrate the new sensor data in the analysis)?
A complex of AI/ML mechanisms (recommendation, monitoring, forecasting) depend on each other’s results. How many man-hours will it take every month to adapt those AI/ML mechanisms’ functioning to changes in the input data? What is the overall “delay” in supporting business decision making by the AI/ML mechanisms (the refresh frequency of supporting information against the feed frequency of new input data)?

snakers4 Sep 17 2020 at 16:48

Modern Google-level STT Models Released

2 min

5.5K

Start-up developmentMachine learning*SoundBig Data*

We are proud to announce that we have built from ground up and released our high-quality (i.e. on par with premium Google models) speech-to-text Models for the following languages:

English;
German;
Spanish;

You can find all of our models in our repository together with examples, quality and performance benchmarks. Also we invested some time into making our models as accessible as possible — you can try our examples as well as PyTorch, ONNX, TensorFlow checkpoints. You can also load our model via TorchHub.

	PyTorch	ONNX	TensorFlow	Quality
English (en_v1)	✓	✓	✓	link
German (de_v1)	✓	✓	✓	link
Spanish (es_v1)	✓	✓	✓	link

HMS-service Aug 11 2020 at 10:21

Как с помощью HUAWEI ML Kit самостоятельно создать апплет для фото на документы

5 min

2.1K

DIYDevelopment for Android*Image processing*Machine learning*

Общая информация

В предыдущей статье мы рассказали о том, как создать камеру для улыбок с помощью HUAWEI ML Kit. В этот раз я собираюсь представить вам новую функцию HUAWEI ML Kit.

Вас когда-нибудь просили на учебе или работе принести фотографию определенного размера с цветным фоном для документов? В большинстве случаев у человека не окажется под рукой подходящей фотографии. Однажды в институте нам решили оформить персональные пропуска, но фотостудия оказалась закрыта. Тогда я сфотографировался на телефон, использовав простыню в качестве фона. И получил выговор от преподавателя. Но с помощью инструмента HUAWEI ML Kit вы сможете интегрировать SDK для сегментации изображений в ваше приложение и разработать апплет, чтобы создавать фото на документы самостоятельно и решить проблему отсутствия нужных фотографий.

Самое главное, что этот SDK абсолютно бесплатный и работает на всех телефонах на базе Android.

Разработка апплета для фото на документы самостоятельно

1. Подготовка

1.1 Добавьте репозиторий Maven Huawei в файл на уровне проекта build.gradle

Откройте файл build.gradle в корневом каталоге вашего проекта Android Studio.

KRRGuest Jul 14 2020 at 08:55

Objects Representations for Machine Learning system based on Lattice Theory

5 min

1.4K

Mathematics*Machine learning*

This is a fourth article in the series of works (see also first one, second one, and third one) describing Machine Learning system based on Lattice Theory named 'VKF-system'. The program uses Markov chain algorithms to generate causes of the target property through computing random subset of similarities between some subsets of training objects. This article describes bitset representations of objects to compute these similarities as bit-wise multiplications of corresponding encodings. Objects with discrete attributes require some technique from Formal Concept Analysis. The case of objects with continuous attributes asks for logistic regression, entropy-based separation of their ranges into subintervals, and a presentation corresponding to the convex envelope for subintervals those similarity is computed.

got idea!

KRRGuest Jul 12 2020 at 17:50

Mathematics of Machine Learning based on Lattice Theory

7 min

Mathematics*Machine learning*

This is a third article in the series of works (see also first one and second one) describing Machine Learning system based on Lattice Theory named 'VKF-system'. It uses structural (lattice theoretic) approach to representing training objects and their fragments considered to be causes of the target property. The system computes these fragments as similarities between some subsets of training objects. There exists the algebraic theory for such representations, called Formal Concept Analysis (FCA). However the system uses randomized algorithms to remove drawbacks of the unrestricted approach. The details follow…
Areas of Formal Concept Analysis

1 2

4 5 6