DataScience Digest — 24.06.21 / Habr

The new issue of DataScience Digest is here!

The impact of NLP and the growing budgets to drive AI transformations. How Airbnb standardized metric computation at scale. Cross-Validation, MASA-SR, AgileGAN, EfficientNetV2, and more.

If you prefer daily updates, subscribe to our Telegram channel or follow us on social media: Twitter, LinkedIn, Facebook.

NEWS

AI loan apps ruin credit scores. AI’s inroads into chemistry and drug development, including vaccines against Covid-19. NASA’s use of AI to diagnose faults in spacecraft. Latest trends in the IT market. The impact of NLP and the growing budgets to drive AI transformations.

India’s technology market is booming, especially in fintech and insurance. The impact of at-scale digitalization is mostly positive, but AI-powered loan applications may be doing more harm than good to many young Indians. This is another reminder that AI is a double-edged sword that must be handled with caution.

On the other, brighter edge of the sword, AI is widely used to drive research in chemistry and to translate the insights into concrete results in drug discovery and drug development. Snapdragon Chemistry does exactly that. Advances in AI have also helped develop, test, and administer vaccines against Covid-19, for example by helping to combat vaccine hesitancy and boosting efforts to reach the herd immunity threshold.

NASA is exploring yet another AI frontier. The agency uses AI and machine learning to speed up physical fault diagnosis in spacecraft and spaceflight systems, improving mission efficiency by reducing downtime.

Overall, enterprises are accelerating their AI transformations and ballooning their budgets. In less than a year, AI budgets have increased by 55% and now range from $500,000 to $5 million per year. The same holds for NLP-based transformations, which reportedly drive increasing value for organizations across industries.

Many of these transformations were spurred by the Covid-19 pandemic. And, as Gartner reports, they are not the only changes that will shape the IT market in the future.

ARTICLES

Deep Reinforcement Learning for AGV Routing
In this article by Samir Saci, you’ll learn how to use reinforcement learning to route automated guided vehicles (AGVs) that bring shelves directly to the operators, in order to maximize productivity. The article builds on the author’s previous research on the topic and outlines next steps that will help you with your own supply chain tasks.
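The article’s title refers to deep reinforcement learning. As a much simpler illustration of the same idea, here is a minimal tabular Q-learning sketch: an AGV on a hypothetical 4x4 warehouse grid learns a route from a start cell to a pick-up cell, paying a small cost per move. This is not the author’s method; the grid layout, rewards, hyperparameters, and function names are all assumptions chosen for illustration.

```python
import random

# Hypothetical 4x4 warehouse grid: 0 = free aisle, 1 = blocked (shelf rack).
GRID = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move the AGV one cell; bumping into a wall or rack keeps it in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])) or GRID[r][c] == 1:
        r, c = state
    reward = 10.0 if (r, c) == GOAL else -1.0  # every move costs time
    return (r, c), reward, (r, c) == GOAL


def best_action(q, state):
    """Greedy action index under the current Q-table (unseen pairs count as 0)."""
    return max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))


def train(episodes=1000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; returns a dict (state, action) -> value."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = best_action(q, state)  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            # Terminal states have no future value.
            future = 0.0 if done else max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * future - old)
            state = nxt
            if done:
                break
    return q


def greedy_route(q, max_steps=50):
    """Follow the learned policy from START and return the visited cells."""
    state, route = START, [START]
    while state != GOAL and len(route) <= max_steps:
        state, _, _ = step(state, ACTIONS[best_action(q, state)])
        route.append(state)
    return route


print(greedy_route(train()))
```

The per-move penalty is what pushes the agent toward short routes; a deep RL approach replaces the lookup table with a neural network so the policy can scale to warehouse layouts far too large to enumerate.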

How Airbnb Standardized Metric Computation at Scale
The Airbnb engineering team reveals the design principles behind Minerva’s compute infrastructure. Minerva is a single-source-of-truth metric platform that standardizes the way business metrics are created, computed, served, and consumed. The article also links to the first post on Minerva, which is worth checking out too.

How to Generate Automated PDF Documents with Python
In this tutorial by M Khorasani, you’ll learn how to automatically generate PDF documents with your own data, charts, and images, all bundled together with a polished look and structure. The directions are easy enough for beginners to follow and cover creating PDF documents, inserting images, text, and numbers, and visualizing data.

AI Can Now Emulate Text Style in Images in One Shot — Using Just a Single Word
In this article, the engineering team of Facebook AI presents TextStyleBrush, an AI research project that can copy the style of text in a photo using just a single word. With this AI model, you can edit and replace text in images. The team hopes to spur dialogue and research into detecting potential misuse of this type of technology, so make sure to contribute.

3x Times Faster Pandas with PyPolars
In this article by Satyam Kumar, you’ll learn how to use PyPolars, an open-source Python data frame library similar to Pandas, to accelerate your Pandas workflow. PyPolars can help you process data faster and more easily. It scales better than Pandas and makes large datasets more convenient to handle.

PAPERS

MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Resolution
In this paper, Liying Li et al. propose a Match & Extraction Module that significantly reduces the computational cost through a coarse-to-fine correspondence matching scheme. A Spatial Adaptation Module learns the difference in distribution between the low-resolution (LR) and reference (Ref) images, and remaps the distribution of Ref features to that of LR features in a spatially adaptive way. This design makes the network robust to different reference images.

EfficientNetV2: Smaller Models and Faster Training
EfficientNetV2 is a new family of convolutional networks with faster training speed and better parameter efficiency than previous models. They were developed using a combination of training-aware neural architecture search and scaling to jointly optimize training speed and parameter efficiency. EfficientNetV2 significantly outperforms previous models on ImageNet and on the CIFAR, Cars, and Flowers datasets.

Cross-Validation: What Does It Estimate and How Well Does It Do It?
The behavior of cross-validation is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error of the model at hand, fit to the training data. The authors prove that this is not the case for the linear model fit by ordinary least squares; rather, it estimates the average prediction error of models fit on other unseen training sets drawn from the same population.
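To make the distinction concrete, here is a minimal pure-Python sketch of k-fold cross-validation for a one-feature OLS model. Note that every fold refits the model on a different training subset, so the averaged held-out error describes models trained on other data rather than the single model fit on the full dataset. This only illustrates the mechanics the paper analyzes; it does not reproduce the paper’s results, and the function names are hypothetical.

```python
import random


def fit_ols(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_cv_error(xs, ys, k=5, seed=0):
    """Mean squared prediction error on held-out folds (k-fold CV)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all points
    sq_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Each fold refits the model on a different training subset.
        a, b = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        sq_errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in fold]
    return sum(sq_errors) / len(sq_errors)


# Usage on synthetic data (assumed: y = 2 + 3x plus Gaussian noise).
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
print(kfold_cv_error(xs, ys, k=5))
```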

Comparing Test Sets with Item Response Theory
In this paper, Clara Vania et al. use Item Response Theory (IRT) to evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples. They find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models.
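IRT models the probability that a test taker (here, a model) answers an item correctly as a function of the taker’s ability and the item’s parameters. The sketch below uses the textbook two-parameter logistic (2PL) form and its item information; the paper’s exact parameterization may differ, and the item parameters are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL IRT: probability that a model with ability theta answers an item correctly.

    a: discrimination (how sharply the item separates weaker from stronger models)
    b: difficulty (the ability at which the success probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical items as (discrimination, difficulty).
items = {"easy": (1.0, -2.0), "hard_and_sharp": (3.0, 1.5)}
for theta in (-1.0, 0.5, 2.0):
    probs = {name: round(p_correct(theta, a, b), 3) for name, (a, b) in items.items()}
    print(theta, probs)
```

An item is most informative near θ = b, so a dataset whose items are all easy relative to strong models gives them success probabilities close to 1 and can no longer tell them apart, which is one way to read “saturated”.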

AgileGAN: Stylizing Portraits by Inversion-Consistent Transfer Learning
In this paper, Guoxian Song et al. introduce AgileGAN, a framework that generates high-quality stylistic portraits via inversion-consistent transfer learning. A novel hierarchical variational autoencoder ensures that the inverse-mapped distribution conforms to the original latent Gaussian distribution, while augmenting the original space to a multi-resolution latent space.

BOOKS

Mathematics for Machine Learning
“Mathematics for Machine Learning” by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong introduces the mathematical foundations of basic ML concepts to readers who lack the mathematical background needed to read an ML textbook. The book is intended as a guide to the vast mathematical literature that forms the foundations of modern machine learning.

COURSES

Deep Learning Do It Yourself!
The website is a collection of more than 20 modules for learning deep learning. As a student, you can work through the modules at your own pace and interact with others. You can also contribute to the materials by adding new modules yourself.

PODCASTS & INTERVIEWS

The Banana Data Podcast
The new season of the Banana Data Podcast focuses on humanizing data science. This week, the hosts are joined by Emma Irwin, Solutions Engineer at Dataiku, to discuss differing methodologies and functionalities within the data science field.

Practical AI — Learning to learn deep learning
Chris and Daniel, hosts of the Practical AI podcast, discuss some exciting AI developments, including wav2vec-U, the new book “How To Learn Deep Learning And Thrive In The Digital World”, and engineering skills for AI developers.

VIDEOS

Conversational AI
In this video, Merve Noyan, Google Developer Expert on Machine Learning, gives an overview of the conversational AI niche. The talk is hosted by Alexey Grigorev, the founder of DataTalks.Club.

DataScience Digest is a collection of the best and latest articles, videos, datasets, events, books, and jobs on Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, and other aspects of Data Science. It’s the easiest way to stay in the know: follow us on Telegram, Twitter, or Facebook for your daily dose of news, or subscribe to our newsletter to receive weekly updates right in your inbox.