
Statistics in IT

Statistics, research, trends.


2. Information Theory + ML. Mutual Information

Reading time: 11 min
Views: 1.1K

In Part 1, we introduced the concept of entropy.

In this part, we will delve into the concept of Mutual Information. It opens the door to error-correcting codes and compression algorithms, and it offers a fresh perspective on regression and Machine Learning tasks.

It is an essential building block that, in the next part, will let us treat Machine Learning problems as tasks of extracting mutual information between the features and the predicted variable.

This part contains three interesting and important visualizations.

The first one visualizes the entropy of two random variables and their mutual information.
The second one sheds light on the very concept of dependence between two random variables, emphasizing that zero correlation does not imply independence.
The third one demonstrates that the capacity of an information channel has a simple geometric interpretation in terms of the concavity of the entropy function.
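The point of the second visualization can be checked numerically. The sketch below uses plain Python, and the helper names and the example distribution are illustrative assumptions rather than code from the article. It computes the mutual information of a discrete joint distribution through the identity I(X;Y) = H(X) + H(Y) - H(X,Y). It then takes X uniform on {-1, 0, 1} and Y = X², a pair with zero covariance that is nonetheless fully dependent.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginals(joint):
    """Split a joint pmf {(x, y): p} into the marginal pmfs of X and Y."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py


def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    px, py = marginals(joint)
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())


def covariance(joint):
    """Cov(X, Y) = E[XY] - E[X] E[Y] for numeric outcomes."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey


# X uniform on {-1, 0, 1} and Y = X**2: uncorrelated, yet Y is a function of X.
joint = {(x, x * x): 1 / 3 for x in (-1, 0, 1)}
print(covariance(joint))          # 0.0
print(mutual_information(joint))  # ≈ 0.918 bits, equal to H(Y)
```

Because Y is fully determined by X, H(Y|X) = 0 and the mutual information equals H(Y) = log2 3 - 2/3 bits. Correlation alone would miss this dependence.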

Along the way, we will prove a simplified version of the Shannon-Hartley theorem on the maximum capacity of a noisy channel. Let's dive in!
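For reference, here is the standard textbook form of the Shannon-Hartley theorem. It is not necessarily the simplified version proved in the article. A channel with additive white Gaussian noise, bandwidth B (in hertz) and signal-to-noise power ratio S/N carries at most ½·log2(1 + S/N) bits per sample. Sampling at the Nyquist rate of 2B samples per second then gives the capacity C in bits per second:

```latex
C_{\text{sample}} = \tfrac{1}{2}\log_2\!\left(1 + \frac{S}{N}\right)\ \text{bits/sample},
\qquad
C = 2B \cdot C_{\text{sample}} = B\,\log_2\!\left(1 + \frac{S}{N}\right)\ \text{bits/s}.
```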


1. Information Theory + ML. Entropy

Reading time: 10 min
Views: 1.2K

I've long wanted to create educational materials on Information Theory + Machine Learning. I found some old drafts and decided to polish them up and publish them here on Habr.

Information Theory and Machine Learning strike me as an interesting pair of fields. The deep connection between them is often unknown to ML engineers, and their synergy has not yet been fully revealed.

Let's start with basic concepts such as Entropy, the information contained in a message, Mutual Information, and channel capacity. Next will come material on the similarity between maximizing Mutual Information and minimizing the loss in regression problems. Then there will be a section on Information Geometry: the Fisher metric, geodesics, gradient methods, and their connection to Gaussian processes (moving along the gradient with SGD is moving along a geodesic with noise).

I also need to touch on AIC and the Information Bottleneck, and to discuss how information flows in neural networks: Mutual Information between layers (Information Theory of Deep Learning, Naftali Tishby), and much more. I'm not sure I'll manage to cover everything listed, but I'll make a start.
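To make the first two concepts concrete, here is a small sketch of their standard definitions; the helper names are hypothetical. A message with probability p carries -log2 p bits of information. Entropy is the average information per message.

```python
import math


def self_information(p: float) -> float:
    """Information, in bits, carried by a message that occurs with probability p."""
    return -math.log2(p)


def entropy(probs) -> float:
    """Average information per message: H = sum of p * (-log2 p), in bits."""
    return sum(p * self_information(p) for p in probs if p > 0)


print(self_information(1 / 8))  # one of 8 equally likely messages: 3.0 bits
print(entropy([0.5, 0.5]))      # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))      # biased coin: ≈ 0.469 bits, since it is more predictable
```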


Methodology for calculating the results of a task set, taking its level of difficulty into account

Reading time: 3 min
Views: 2.4K

In academic assessment, objectively processing large volumes of results is a serious problem. Should a student who gets B grades in an Advanced Maths class be evaluated the same as a student who gets B grades in a General Maths class? Can we build a system that takes into account the level of difficulty those students face?

This article describes a system of independent evaluation that we have been using for school olympiads in five subjects (Mathematics, English Language, Russian Language, Tatar Language, Social Science) for students in grades 1 to 11. Each academic year we organise six qualification tournaments, with about 15,000 students from different regions of Russia. We then select the top ten participants in each subject and each grade for the final (seventh) tournament, where only the best of the best are chosen. This means that 550 participants (5 subjects × 11 grades × 10 participants) compete in the final tournament, which is about 5.5% of all participants in the academic year.

Naturally, these multiple tournaments cannot be perfectly homogeneous, and the difficulty of each task set inevitably varies. It is therefore critical for us to account for these variations in difficulty and to calculate the results as objectively as possible.
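The teaser does not describe the authors' method. As one simple illustration of the idea, raw scores from each tournament can be standardized into z-scores, so that a result is measured relative to how everyone else did on the same task set. This is a swapped-in technique and not necessarily what the authors use. It also assumes that the participant pools of different tournaments are comparable in strength. The function names and scores below are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Map each participant's raw score to a z-score within one tournament."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # everyone scored the same: no information about relative standing
        return {name: 0.0 for name in scores}
    return {name: (s - mu) / sigma for name, s in scores.items()}


def select_top(tournaments, k):
    """Pool difficulty-adjusted scores from several tournaments and keep the k best."""
    pooled = {}
    for scores in tournaments:
        pooled.update(standardize(scores))
    return sorted(pooled, key=pooled.get, reverse=True)[:k]


# Hypothetical raw scores: task set A was harder than task set B.
harder = {"Anna": 40, "Boris": 50, "Chulpan": 62}
easier = {"Dmitry": 70, "Elena": 80, "Farid": 88}
print(select_top([harder, easier], k=2))  # ['Chulpan', 'Farid']
```

Chulpan's raw 62 is below Elena's 80. Relative to the difficulty of her task set, however, it is the strongest result, so she ranks first.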


How to choose the appropriate level of statistical significance for an A/B test

Reading time: 9 min
Views: 1.1K

Nowadays many product managers have to back most of their decisions with A/B tests. Yet it is far from always clear how to choose the test parameters. The level of statistical significance is often a particularly difficult parameter to tune. If the threshold is too strict, tests will fail even when improvements do exist. If it is too lenient, we will get lots of "confirmations" of false improvements.

When we make decisions based on A/B tests, we will inevitably make mistakes from time to time. We can limit the losses caused by such mistakes by choosing an appropriate level of statistical significance.
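One way to make "choosing the appropriate level" concrete is to treat it as minimizing expected loss. The sketch below is a swapped-in illustration, not necessarily the article's method. It assumes the following:

- a one-sided z-test;
- a hypothetical prior probability `p_real` that a tested change is a genuine improvement;
- a true effect expressed in standard errors (`effect_in_se`);
- hypothetical costs of a false positive (`cost_fp`) and a false negative (`cost_fn`).

It scans candidate significance levels α and picks the one with the lowest expected loss.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(alpha, effect_in_se):
    """P(reject H0) for a one-sided z-test when the true effect is effect_in_se standard errors."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - effect_in_se)


def expected_loss(alpha, p_real, effect_in_se, cost_fp, cost_fn):
    """Expected cost of shipping a non-improvement plus missing a real improvement."""
    false_positive = (1 - p_real) * alpha * cost_fp
    false_negative = p_real * (1 - power(alpha, effect_in_se)) * cost_fn
    return false_positive + false_negative


def best_alpha(p_real, effect_in_se, cost_fp, cost_fn, grid=None):
    """Pick the significance level from a grid that minimizes expected loss."""
    grid = grid or [i / 1000 for i in range(1, 500)]  # alpha from 0.001 to 0.499
    return min(grid, key=lambda a: expected_loss(a, p_real, effect_in_se, cost_fp, cost_fn))


print(best_alpha(p_real=0.2, effect_in_se=2.5, cost_fp=1.0, cost_fn=1.0))
```

In this model, raising `cost_fp` pushes the chosen α down, while raising `cost_fn` or `p_real` pushes it up. This matches the trade-off described above.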


COVID-19 and the Internet

Reading time: 4 min
Views: 1.7K



Recent events caused by the spread of the coronavirus have exposed quite a few problem areas in society, the economy, and technology. And it is not only about the panic, which is inevitable and will return with every future global crisis. It is really about the consequences: overcrowded hospitals, empty supermarket shelves, and people forced to stay at home and use the Internet, which turns out not to have enough capacity for everyone going through the hard days and nights of #stayathome.

What has already happened


Habr — best articles, authors and statistics 2019

Reading time: 6 min
Views: 2.9K
2019 is coming to an end, and Christmas is almost here. It is also time to gather all the data and compile statistics and a ranking of the most interesting Habr articles of the year.



In this post, I will present the best articles and the best Habr authors of 2019, and also show some statistical graphs that I find interesting or unusual.

Let's get started.