Everyone uniquely perceives texts, regardless of whether this person reads news on the Internet or world-known classic novels. This also applies to a variety of algorithms and machine learning techniques, which understand texts in a more mathematical way, namely, using high-dimensional vector space.
This article is devoted to visualizing high-dimensional Word2Vec word embeddings using t-SNE. The visualization can be useful to understand how Word2Vec works and how to interpret relations between vectors captured from your texts before using them in neural networks or other machine learning algorithms. As training data, we will use articles from Google News and classical literary works by Leo Tolstoy, the Russian writer who is regarded as one of the greatest authors of all time.
We go through the brief overview of t-SNE algorithm, then move to word embeddings calculation using Word2Vec, and finally, proceed to word vectors visualization with t-SNE in 2D and 3D space. We will write our scripts in Python using Jupyter Notebook.