Pull to refresh
VK
Building the Internet
Show first

Google News and Leo Tolstoy: visualizing Word2Vec word embeddings using t-SNE

Reading time7 min
Views13K

Everyone uniquely perceives texts, regardless of whether this person reads news on the Internet or world-known classic novels. This also applies to a variety of algorithms and machine learning techniques, which understand texts in a more mathematical way, namely, using high-dimensional vector space.

This article is devoted to visualizing high-dimensional Word2Vec word embeddings using t-SNE. The visualization can be useful to understand how Word2Vec works and how to interpret relations between vectors captured from your texts before using them in neural networks or other machine learning algorithms. As training data, we will use articles from Google News and classical literary works by Leo Tolstoy, the Russian writer who is regarded as one of the greatest authors of all time.

We go through the brief overview of t-SNE algorithm, then move to word embeddings calculation using Word2Vec, and finally, proceed to word vectors visualization with t-SNE in 2D and 3D space. We will write our scripts in Python using Jupyter Notebook.

Read more →
Total votes 28: ↑28 and ↓0+28
Comments0

Citymobil — a manual for improving availability amid business growth for startups. Part 3

Reading time8 min
Views1.1K


This is the next article of the series describing how we’re increasing our service availability in Citymobil (you can read the previous parts here and here). In further parts, I’ll talk about the accidents and outages in detail. But first let me highlight something I should’ve talked about in the first article but didn’t. I found out about it from my readers’ feedback. This article gives me a chance to fix this annoying shortcoming.
Read more →
Total votes 23: ↑23 and ↓0+23
Comments0

Citymobil — a manual for improving availability amid business growth for startups. Part 2

Reading time8 min
Views977


This is a second article out of a series «Citymobil — a manual for improving availability amid business growth for startups». You can read the first part here. Let’s continue to talk about the way we managed to improve the availability of Citymobil services. In the first article, we learned how to count the lost trips. Ok, we are counting them. What now? Now that we are equipped with an understandable tool to measure the lost trips, we can move to the most interesting part — how do we decrease losses? Without slowing down our current growth! Since it seemed to us that the lion’s share of technical problems causing the trips loss had something to do with the backend, we decided to turn our attention to the backend development process first. Jumping ahead of myself, I’m going to say that we were right — the backend became the main site of the battle for the lost trips.
Read more →
Total votes 23: ↑22 and ↓1+21
Comments0

Citymobil — a manual for improving availability amid business growth for startups. Part 1

Reading time4 min
Views1.3K


In this first part of an article series «Citymobil — a manual for improving availability amid business growth for startups» I’m going to break down the way we managed to dramatically scale up the availability of Citymobil services. The article opens with the story about our business, our task, the reason for this task to increase the availability emerged and limitations. Citymobil is a rapid-growing taxi aggregator. In 2018, it increased by more than 15 times in terms of number of successfully completed trips. Some months showed 50% increase compared with the previous month.

The business grew like a weed in every direction (it still does): there was an increase in server load, team size and number of deployments. At the same time the new threats to service availability emerged. The company faced a task of the most importance — how to increase availability without compromising company growth. In this article, I’ll talk about the way we managed to solve this task in a relatively short time.
Read more →
Total votes 24: ↑24 and ↓0+24
Comments0

Liza Alert: volunteers, who save lives

Reading time6 min
Views4.7K


Liza Alert search-and-rescue team has existed for eight years. It’s a volunteer organization, the fellowship of the ones who care, that searches for missing people effectively collaborating with the Ministry of Emergency Situations and the Ministry of Internal Affairs. Liza Alert collects the entries on missing people; they conduct various educational events, search management and search operations themselves. The team isn’t involved in any business activities, doesn’t have a checking account and doesn’t accept monetary donations.

Mail.Ru has recently provided Liza Alert with a free platform for services, extensively used for search-and-rescue operations. We decided to talk to Sergey Chumak — the head of Liza Alert IT branch — about the work of the volunteer emergency response group and how high-tech solutions aid them.
Read more →
Total votes 36: ↑34 and ↓2+32
Comments0

Tips and tricks from my Telegram-channel @pythonetc, February 2019

Reading time6 min
Views1.7K
image

It is new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

Previous publications.

Structures comparing


Sometimes you want to compare complex structures in tests ignoring some values. Usually, it can be done by comparing particular values with the structure:
Read more →
Total votes 19: ↑18 and ↓1+17
Comments0

VShard — horizontal scaling in Tarantool

Reading time14 min
Views2.3K


Hi, my name is Vladislav, and I am a member of the Tarantool development team. Tarantool is a DBMS and an application server all in one. Today I am going to tell the story of how we implemented horizontal scaling in Tarantool by means of the VShard module.

Some basic knowledge first.

There are two types of scaling: horizontal and vertical. And there are two types of horizontal scaling: replication and sharding. Replication ensures computational scaling whereas sharding is used for data scaling.

Sharding is also subdivided into two types: range-based sharding and hash-based sharding.

Range-based sharding implies that some shard key is computed for each cluster record. The shard keys are projected onto a straight line that is separated into ranges and allocated to different physical nodes.

Hash-based sharding is less complicated: a hash function is calculated for each record in a cluster; records with the same hash function are allocated to the same physical node.

I will focus on horizontal scaling using hash-based sharding.
Read more →
Total votes 19: ↑18 and ↓1+17
Comments1

Designing Sound for Pathfinder: Kingmaker

Reading time16 min
Views3.2K


Pathfinder: Kingmaker (PF:K for short) is a role-playing video game created by Owlcat Games, released in Fall 2018 on Steam and GoG. Inspired by classic Bioware games, this project uses a popular board game system ruleset, combat takes place in Real-Time with Pause, follows an isometric camera, and has a non-linear story with multiple unique endings.


In this article, I will share a little about how we worked on designing the audio throughout the game’s development including task management, the search for inspiration, and troubleshooting. An experienced specialist may not find anything particularly groundbreaking in this recap, but beginners and enthusiasts will definitely discover some points of interest.

Total votes 32: ↑32 and ↓0+32
Comments0

Lua in Moscow 2019 conference

Reading time1 min
Views1.6K


On the first Sunday of March, Mail.ru Group’s Moscow office will be hosting the third international Lua conference, Lua in Moscow 2019. The program features talks by Roberto Ierusalimschy and the leading experts in Lua and LuaJIT from Russia and other countries.

Lua is a unique programming language used not only in computer games, but also as an embedded language in such web-programming products as Redis, nginx, Tarantool, OpenResty. Lua is also used for big data analysis and scientific calculations. You can find Lua in many routers, printers and other devices.

You are welcome to join, even if you haven’t been writing in Lua so far. We bet the conference will give you unexpected insights!
Read more →
Total votes 33: ↑31 and ↓2+29
Comments0

Information

Website
vk.com
Registered
Founded
Employees
5,001–10,000 employees
Location
Россия
Representative
Миша Берггрен