
Statistics in IT

Statistics, research, trends.

Habr — best articles, authors and statistics 2019

Reading time: 6 min
Views: 2.8K
2019 is coming to an end, and Christmas is just around the corner. It is also the time to gather all the data and compile statistics and a rating of the most interesting Habr articles of the year.



In this post I will present the best articles and the best Habr authors of 2019, along with some statistical graphs that I find interesting or unusual.

Let's get started.
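As an illustration of the idea, here is a minimal sketch of how such a rating could be computed, assuming the article stats have already been scraped into a CSV; the column names and the rating formula are my assumptions for illustration, not the author's actual method.

```python
import numpy as np
import pandas as pd

# Hypothetical input: scraped Habr article stats for 2019,
# with assumed columns: title, votes, views.
df = pd.read_csv("habr_2019.csv")

# One possible rating: upvotes weighted by log-scaled view count.
df["rating"] = df["votes"] * np.log1p(df["views"])

# Top 10 "most interesting" articles under this rating.
print(df.sort_values("rating", ascending=False).head(10)[["title", "rating"]])
```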
Read more →
Total votes 23: ↑22 and ↓1 (+21)
Comments: 11

Ethicality of automatic contributions

Reading time: 2 min
Views: 1.8K
Hey Habr! Today I would like to talk about ethics, namely ethics in the professional field. We will discuss services that perform 'fake' (automated, human-like) activity, and the doubts such services can raise both for the ordinary user and for professionals in the development sphere.



So, let's start. What I mean by the phrase «fake activity» is not hard to guess: it is the manipulation and compromising of the data that represent your activity indicators, or, more simply, your actions on the Internet. Everyone who has used social networks at least once has come across this: on Facebook, Instagram, and so on.

I will describe the scheme using Instagram as an example: each person has their own account, and developers are given API access. So what did we do? We started launching bots that can perform all sorts of activity through a person's account (liking, subscribing, commenting on other people's posts, or even managing the owner's page independently, for example @neuralcat). Soon this opportunity came to be actively used in business: attracting a new audience by targeting it according to certain criteria and performing activity on its pages. Everything would be fine, but over time it went beyond all limits. Every day dozens of incomprehensible accounts like your photos, leave spam comments, tag you in advertising posts, and so on.
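Schematically, such a bot is nothing more than a loop over target accounts. The sketch below is a deliberately toy illustration: `FakeAccount` and its methods are invented stand-ins, not a real Instagram client, since the real services work through the private login/password API mentioned below.

```python
class FakeAccount:
    """Toy stand-in for a social network client; the API is hypothetical."""

    def like(self, post_id: int) -> None:
        print(f"liked post {post_id}")

    def comment(self, post_id: int, text: str) -> None:
        print(f"commented on post {post_id}: {text}")

    def follow(self, user: str) -> None:
        print(f"followed {user}")


# The whole "scheme" is just this loop, repeated across many bot accounts
# for users selected by the targeting criteria.
bot = FakeAccount()
for user, post_id in [("alice", 101), ("bob", 202)]:
    bot.like(post_id)
    bot.comment(post_id, "Great content!")  # generic spam comment
    bot.follow(user)
```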

Bot activity has so far exceeded all limits of prudence that today Instagram is closing its API, and here is one of the reasons: “Most of the services that work with auto-posting, likes and OML-like likes do it through the private API (login/password), not through the official API.”
Read more →
Total votes 18: ↑17 and ↓1 (+16)
Comments: 0

1. Information theory + ML. Entropy

Reading time: 10 min
Views: 991

I've long wanted to create educational materials on the topic of Information Theory + Machine Learning. I found some old drafts and decided to polish them up here, on Habr.

Information Theory and Machine Learning strike me as an interesting pair of fields whose deep connection is often unknown to ML engineers, and whose synergy has not yet been fully revealed.

Let's start with basic concepts like Entropy, Information in a message, Mutual Information, and channel capacity. Next, there will be materials on the similarity between tasks of maximizing Mutual Information and minimizing Loss in regression problems. Then there will be a section on Information Geometry: Fisher metric, geodesics, gradient methods, and their connection to Gaussian processes (moving along the gradient using SGD is moving along the geodesic with noise).
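As a small taste of the first of these concepts, here is a minimal sketch (my example, not from the series) computing the entropy H(X) = −Σ p(x) log₂ p(x) of a discrete distribution:

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy H(X) = -sum_x p(x) * log2 p(x), in bits."""
    p = p[p > 0]  # by convention, 0 * log(0) = 0
    return float(-np.sum(p * np.log2(p)))

print(entropy(np.array([0.5, 0.5])))    # fair coin: 1.0 bit
print(entropy(np.array([0.9, 0.1])))    # biased coin: ~0.47 bits
print(entropy(np.array([0.25] * 4)))    # uniform over 4 outcomes: 2.0 bits
```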

It's also necessary to touch upon AIC, Information Bottleneck, and discuss how information flows in neural networks – Mutual Information between layers (Information Theory of Deep Learning, Naftali Tishby), and much more. It's not certain that I'll be able to cover everything listed, but I'll try to get started.
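Of the items listed, AIC at least fits in a one-line sketch; here `loglik` is assumed to be the maximized log-likelihood of a fitted model with `k` parameters:

```python
def aic(loglik: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2*ln(L-hat); lower is better."""
    return 2 * k - 2 * loglik

# Hypothetical comparison of two fitted models:
print(aic(loglik=-120.5, k=3))  # simpler model -> 247.0
print(aic(loglik=-118.9, k=7))  # more flexible model -> 251.8
```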

Read more
Total votes 3: ↑3 and ↓0 (+3)
Comments: 0

2. Information Theory + ML. Mutual Information

Reading time: 11 min
Views: 751

In Part 1, we became familiar with the concept of entropy.

In this part, we will delve into the concept of Mutual Information, which opens the door to error-correcting codes and compression algorithms, and offers a fresh perspective on regression and Machine Learning tasks.

It is an essential component that will pave the way, in the next section, for tackling Machine Learning problems as tasks of extracting mutual information between features and the predicted variable.
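A minimal sketch of that framing, using scikit-learn's nonparametric mutual information estimator (my example, not the article's): the feature that actually determines the target carries far more mutual information with it than pure noise does.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 1000
informative = rng.normal(size=n)
noise = rng.normal(size=n)
y = (informative > 0).astype(int)   # the target depends only on `informative`

X = np.column_stack([informative, noise])
print(mutual_info_classif(X, y, random_state=0))
# First estimate is large, second is near zero.
```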

This part features three interesting and important visualizations:

The first one will visualize entropy for two random variables and their mutual information.
The second one will shed light on the very notion of dependence between two random variables, emphasizing that zero correlation does not imply independence (a numerical illustration follows this list).
The third one will demonstrate that the capacity of an information channel has a straightforward geometric interpretation through the convexity measure of the entropy function.
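As promised, a quick numerical illustration of the second point (my sketch, not one of the article's visualizations): for Y = X² with symmetric X, the Pearson correlation is near zero even though Y is a deterministic function of X, while a mutual information estimate does detect the dependence.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = x ** 2  # fully dependent on x, yet uncorrelated with it

print(np.corrcoef(x, y)[0, 1])                          # ~0.0
print(mutual_info_regression(x.reshape(-1, 1), y)[0])   # clearly > 0
```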

Along the way, we will prove a simplified version of the Shannon-Hartley theorem on the maximum capacity of a noisy channel. Let's dive in!
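For orientation, the full (non-simplified) statement of that theorem is C = B·log₂(1 + S/N); a minimal sketch with made-up numbers:

```python
import math

def channel_capacity(bandwidth_hz: float, snr: float) -> float:
    """Shannon-Hartley: C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr)

# A telephone-line-like example: 3 kHz bandwidth, SNR = 1000 (30 dB).
print(channel_capacity(3000, 1000))  # ~29.9 kbit/s
```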

Read more
Total votes 2: ↑2 and ↓0 (+2)
Comments: 0
