Author: Chris Punches (@cmpunches, Silo group). License: "Please feel free to share unmodified".
The following text is an unmodified copy of now removed issue #2250 on rms-open-letter.github.io repository. The text claims multiple violations of different policies, codes of conduct and other documents in creation, content and support of the "Open letter to remove Richard M. Stallman from all leadership positions". The issue has not been addressed.
Currently, social network sites tend to be one of the major communication platforms in both offline and online space. Freedom of expression of various points of view, including toxic, aggressive, and abusive comments, might have a long-term negative impact on people’s opinions and social cohesion. As a consequence, the ability to automatically identify and moderate toxic content on the Internet to eliminate the negative consequences is one of the necessary tasks for modern society. This paper aims at the automatic detection of toxic comments in the Russian language. As a source of data, we utilized anonymously published Kaggle dataset and additionally validated its annotation quality. To build a classification model, we performed fine-tuning of two versions of Multilingual Universal Sentence Encoder, Bidirectional Encoder Representations from Transformers, and ruBERT. Finetuned ruBERT achieved F1 = 92.20%, demonstrating the best classification score. We made trained models and code samples publicly available to the research community.
2019 is coming to an end, and it's Christmas soon. It is also the time to grab all data and collect statistics and a rating of the most interesting Habr's articles for this period.
In this post the best articles and best Habr authors 2019 will be presented, I also will show some statistical graphs that I find interesting or unusual.