• ## Data Science Digest — We Are Back

Hi All,

I have some good news for you…

Data Science Digest is back! We’ve been “offline” for a while, but no worries — You’ll receive regular digest updates with top news and resources on AI/ML/DS every Wednesday, starting today.

If you’re more used to getting updates every day, follow us on social media:

And finally, your feedback is very much appreciated. Feel free to share any ideas with me and the team, and we’ll do our best to make Data Science Digest a better place for all.

Regards,
Dmitry Spodarets.

• ## Qrator Labs' Value Partnership Programs

Why is it valuable to get into the Qrator Labs partnership program?

In Qrator Labs, we firmly believe that working together brings a better result. Which is the reason why, for years, we were trying to find meaningful partnerships with all kinds of companies. They either seek to provide their existing customers with the top-notch DDoS mitigation technology developed at Qrator Labs with many additional ecosystem solutions or want to succeed the other way around. By getting their product available for Qrator Labs' customers by integrating into the Qrator anycast filtering network.

• ## Algebra of text. Examples

• Translation

The previous work from ref [1] describes the method of transforming a sign sequence into algebra through an example of a linguistic text. Two other examples of algebraic structuring of texts of a different nature are given to illustrate the method.

• ## Ugly API

In this article, I’d like to talk about the problems I faced while integrating an API for the HTTP protocol and share my experience in solving them.

- REST vs Non REST architecture

- Ignoring Header Accept: application/json

- Mixing JSON keys case types

- Different response to the same request

• ## Algorithms in Go: Bit Manipulation

This article is a part of Algorithms in Go series where we discuss common algorithmic problems and their solution patterns.

In this edition, we take a closer look at bit manipulations. Bit operations can be extremely powerful and useful in an entire class of algorithmic problems, including problems that at first glance does not have to do anything with bits.

Let's consider the following problem: six friends meet in the bar and decide who pays for the next round. They would like to select a random person among them for that. How can they do a random selection using only a single coin?

The solution to this problem is not particularly obvious (for me:), so let's simplify a problem for a moment to develop our understanding. How would we do the selection if there were only three friends? In other words, how would we "mimic" a three-sided coin with a two-sided coin?

• ## GDB Tutorial for Reverse Engineers: Breakpoints, Modifying Memory and Printing its Contents

• Tutorial

GDB is THE debugger for Linux programs. It’s super powerful. But its user-friendliness or lack thereof can actually make you throw your PC out of the window. But what’s important to understand about GDB is that GDB is not simply a tool, it’s a debugging framework for you to build upon. In this article, I’m gonna walk you through GDB setup for reverse engineering and show you all of the necessary commands and shortcuts for your debugging workflow.

• ## Pitfalls in String Pool, or Another Reason to Think Twice Before Interning Instances of String Class in C#

As software developers, we always want our software to work properly. We'll do everything to improve the software quality. To find the best solution, we are ready to use parallelizing or applying any various optimization techniques. One of these optimization techniques is the so-called string interning. It allows users to reduce memory usage. It also makes string comparison faster. However, everything is good in moderation. Interning at every turn is not worth it. Further, I'll show you how not to slip up with creating a hidden bottleneck in the form of the String.Intern method for your application.

AdBlock has stolen the banner, but banners are not teeth — they will be back

• ## 11 Kubernetes implementation mistakes – and how to avoid them

I manage a team that designs and introduces in-house Kubernetes aaS at Mail.ru Cloud Solutions. And we often see a lack of understanding as to this technology, so I’d like to talk about common strategic mistakes at Kubernetes implementation in major projects.

Most of the problems arise because the technology is quite sophisticated. There are unobvious implementation and operation challenges, as well as poorly used advantages, all of those resulting in money loss. Another issue is the global lack of knowledge and experience with Kubernetes. Learning its use by the book can be tricky, and hiring qualified staff can be challenging. All the hype complicates Kubernetes-related decision making. Curiously enough, Kubernetes is often implemented rather formally – just for it to be there and make their lives better in some way.

Hopefully, this post will help you to make a decision you will feel proud of later (and won’t regret or feel like building a time machine to undo it).
• ## Go Quiz

In this series, we will be discussing interesting aspects and corner cases of Golang. Some questions will be obvious, and some will require a closer look even from an experienced Go developer. These question will help to deeper the understanding of the programming language, and its underlying philosophy. Without much ado, let's start with the first part.

## Value assignment

What value y will have at the end of the execution?

func main() {
var y int
for y, z := 1, 1; y < 10; y++ {
_ = y
_ = z
}
fmt.Println(y)
}


According to the specification,

• ## Build (CI/CD) of non-JVM projects using gradle/kotlin

In some projects, the build script is playing the role of Cinderella. The team focuses its main effort on code development. And the build process itself could be handled by people who are far from development (for example, those responsible for operation or deployment). If the build script works somehow, then everyone prefers not to touch it, and noone ever is thinking about optimization. However, in large heterogeneous projects, the build process could be quite complex, and it is possible to approach it as an independent project. If however you treat the build script as a secondary unimportant project, then the result will be an indigestible imperative script, the support of which will be rather difficult.

In the previous post we looked at what criteria we used to choose the toolkit, and why we chose gradle/kotlin, and in this post we will take a look at how we use gradle/kotlin to automate the build of non-JVM projects. (There is also a Russian version.)

## Introduction

Gradle for JVM projects is a universally recognized tool and does not need additional recommendations. For projects outside of the JVM platform, it is also used. For instance, the official documentation describes usage scenarios for C++ and Swift projects. We use gradle to automate the build, test, and deployment of a heterogeneous project that includes modules in node.js, golang, terraform.

• ## Building projects (CI/CD), instruments

In some projects, the build script is playing the role of Cinderella. The team focuses its main effort on code development. And the build process itself could be handled by people who are far from development (for example, those responsible for operation or deployment). If the build script works somehow, then everyone prefers not to touch it, and no one ever is thinking about optimization. However, in large heterogeneous projects, the build process could be quite complex, and it is possible to approach it as an independent project.If you treat the build script as a secondary unimportant project, then the result will be an indigestible imperative script, the support of which will be rather difficult.

In this note we will take look at the criteria by which we chose the toolkit, and in the next one — how we use this toolkit. (There is also a Russian version.)

• ## Distributed Artificial Intelligence with InterSystems IRIS

Author: Sergey Lukyanchikov, Sales Engineer at InterSystems

What is Distributed Artificial Intelligence (DAI)?

Attempts to find a “bullet-proof” definition have not produced result: it seems like the term is slightly “ahead of time”. Still, we can analyze semantically the term itself – deriving that distributed artificial intelligence is the same AI (see our effort to suggest an “applied” definition) though partitioned across several computers that are not clustered together (neither data-wise, nor via applications, not by providing access to particular computers in principle). I.e., ideally, distributed artificial intelligence should be arranged in such a way that none of the computers participating in that “distribution” have direct access to data nor applications of another computer: the only alternative becomes transmission of data samples and executable scripts via “transparent” messaging. Any deviations from that ideal should lead to an advent of “partially distributed artificial intelligence” – an example being distributed data with a central application server. Or its inverse. One way or the other, we obtain as a result a set of “federated” models (i.e., either models trained each on their own data sources, or each trained by their own algorithms, or “both at once”).

Distributed AI scenarios “for the masses”

We will not be discussing edge computations, confidential data operators, scattered mobile searches, or similar fascinating yet not the most consciously and wide-applied (not at this moment) scenarios. We will be much “closer to life” if, for instance, we consider the following scenario (its detailed demo can and should be watched here): a company runs a production-level AI/ML solution, the quality of its functioning is being systematically checked by an external data scientist (i.e., an expert that is not an employee of the company). For a number of reasons, the company cannot grant the data scientist access to the solution but it can send him a sample of records from a required table following a schedule or a particular event (for example, termination of a training session for one or several models by the solution). With that we assume, that the data scientist owns some version of the AI/ML mechanisms already integrated in the production-level solution that the company is running – and it is likely that they are being developed, improved, and adapted to concrete use cases of that concrete company, by the data scientist himself. Deployment of those mechanisms into the running solution, monitoring of their functioning, and other lifecycle aspects are being handled by a data engineer (the company employee).

• ## Paper-based TOTP tokens

Enterprise policies are different, and in some cases weird. In this article, we will describe a very unusual problem raised by one of our customers. In a nutshell, the organization does not allow bringing any devices onsite, no smartphones, no mobile phones, and even no hardware tokens are allowed on-premises. At the same time, the organization is using Office 365 services from Microsoft and has enforced multi-factor authentication for all users to be activated.

To address this issue, our research and development team has spent some time and found a solution, which is a paper-based TOTP token. We are hereby presenting the solution, which is available for free (well, if you don't count the paper and ink cost).

Our solution is a web-based tool that generates the list of one-time passwords (OTPs) for an arbitrary seed. The list can be printed out and handed over to the end-users to serve as their second factor for authenticating in Azure AD with multi-factor authentication enabled. To associate this paper TOTP token with a user, you can follow the same procedure as with the regular TOTP tokens.

The procedure is simple, you enter the seed and click on submit to get the list generated. You will get a printable list similar to the one shown below for the next few days. By changing the number of future OTPs you can make the list longer or shorter.

• ## Twilio vs Sendbird vs CONTUS MirrorFly Feature Comparison | Twilio vs Competitors

Looking out for the best in app chat solution providers to enhance your business.

Getting confused with so many options to choose among  Sendbird and Twilio competitors.

Then, let’s have some clarity with a detailed discussion over the feature comparison to go for the best Twilio and Sendbird alternative.

• ## High-Quality Text-to-Speech Made Accessible, Simple and Fast

There is a lot of commotion in text-to-speech now. There is a great variety of toolkits, a plethora of commercial APIs from GAFA companies (based both on new and older technologies). There are also a lot of Silicon Valley startups trying to ship products akin to "deep fakes" in speech.

But despite all this ruckus we have not yet seen open solutions that would fulfill all of these criteria:

• Naturally sounding speech;
• A large library of voices in many languages;
• Support for 16kHz and 8kHz out of the box;
• No GPUs / ML engineering team / training required;
• Unique voices not infringing upon third-party licenses;
• High throughput on slow hardware. Decent performance on one CPU thread;
• Minimalism and lack of dependencies. One-line usage, no builds or coding in C++ required;
• Positioned as a solution, not yet another toolkit / compilation of models developed by other people;
• Not affiliated by any means with ecosystems of Google / Yandex / Sberbank;

We decided to share our open non-commercial solution that fits all of these criteria with the community. Since we have published the whole pipeline we do not focus much on cherry picked examples and we encourage you to visit our project GitHub repo to test our TTS for yourself.

• ## Shrinking .NET Console Application

• Translation

Building console application can produce an unexpectedly much larger file than you would think due to implicit references.

What is the problem and how it can be solved?

• ## MacOS Kernel, How Good Is This Apple?

At the very beginning of this year, Apple released the source code for macOS – Big Sur. It includes XNU, the kernel of the macOS operating system. A few years ago, PVS-Studio has already checked the kernel source code. It coincided with the analyzer release on macOS. It's been a while since then. The new kernel source code has been released. A second check? Why not?

• ## Converting text into algebra

• Translation

Algebra and language (writing) are two different learning tools. When they are combined, we can expect new methods of machine understanding to emerge. To determine the meaning (to understand) is to calculate how the part relates to the whole. Modern search algorithms already perform the task of meaning recognition, and Google’s tensor processors perform matrix multiplications (convolutions) necessary in an algebraic approach. At the same time, semantic analysis mainly uses statistical methods. Using statistics in algebra, for instance, when looking for signs of numbers divisibility, would simply be strange. Algebraic apparatus is also useful for interpreting the calculations results when recognizing the meaning of a text.

• ## Multiple violations of policies in RMS open letter

Author: Chris Punches (@cmpunches, Silo group). License: "Please feel free to share unmodified".

The following text is an unmodified copy of now removed issue #2250 on rms-open-letter.github.io repository. The text claims multiple violations of different policies, codes of conduct and other documents in creation, content and support of the "Open letter to remove Richard M. Stallman from all leadership positions". The issue has not been addressed.

• ## How to Start Reverse Engineering in 2021

Reverse engineering might seem so complex, that not everyone has the bravery required to tackle it. But is it really that hard? Today we are gonna dive into the process of learning how to reverse engineer.

First of all, try to answer yourself, what are you hoping to achieve with reverse engineering? Because reverse engineering is a tool. And you should choose the right tool for your task. So when reverse engineering might be useful?