Pull to refresh
26.77
Rating

Natural Language Processing *

Computer analysis and synthesis of natural languages

Show first
  • New
  • Top
Rating limit
  • All
  • ≥0
  • ≥10
  • ≥25
  • ≥50
  • ≥100

Context category

Search engines *Semantics *Algorithms *Natural Language Processing *
Translation

The mathematical model of signed sequences with repetitions (texts) is a multiset. The multiset was defined by D. Knuth in 1969 and later studied in detail by A. B. Petrovsky [1]. The universal property of a multiset is the existence of identical elements. The limiting case of a multiset with unit multiplicities of elements is a set. A set with unit multiplicities corresponding to a multiset is called its generating set or domain. A set with zero multiplicity is an empty set.

Read more
Total votes 1: ↑1 and ↓0 +1
Views 650
Comments 0

High-Quality Text-to-Speech Made Accessible, Simple and Fast

Machine learning *Sound Natural Language Processing *

image


There is a lot of commotion in text-to-speech now. There is a great variety of toolkits, a plethora of commercial APIs from GAFA companies (based both on new and older technologies). There are also a lot of Silicon Valley startups trying to ship products akin to "deep fakes" in speech.


But despite all this ruckus we have not yet seen open solutions that would fulfill all of these criteria:


  • Naturally sounding speech;
  • A large library of voices in many languages;
  • Support for 16kHz and 8kHz out of the box;
  • No GPUs / ML engineering team / training required;
  • Unique voices not infringing upon third-party licenses;
  • High throughput on slow hardware. Decent performance on one CPU thread;
  • Minimalism and lack of dependencies. One-line usage, no builds or coding in C++ required;
  • Positioned as a solution, not yet another toolkit / compilation of models developed by other people;
  • Not affiliated by any means with ecosystems of Google / Yandex / Sberbank;

We decided to share our open non-commercial solution that fits all of these criteria with the community. Since we have published the whole pipeline we do not focus much on cherry picked examples and we encourage you to visit our project GitHub repo to test our TTS for yourself.

Total votes 5: ↑5 and ↓0 +5
Views 3K
Comments 5

Converting text into algebra

Search engines *Semantics *Algorithms *Natural Language Processing *
Translation

Algebra and language (writing) are two different learning tools. When they are combined, we can expect new methods of machine understanding to emerge. To determine the meaning (to understand) is to calculate how the part relates to the whole. Modern search algorithms already perform the task of meaning recognition, and Google’s tensor processors perform matrix multiplications (convolutions) necessary in an algebraic approach. At the same time, semantic analysis mainly uses statistical methods. Using statistics in algebra, for instance, when looking for signs of numbers divisibility, would simply be strange. Algebraic apparatus is also useful for interpreting the calculations results when recognizing the meaning of a text.

Читать далее
Total votes 1: ↑1 and ↓0 +1
Views 796
Comments 0

Playing with Nvidia's New Ampere GPUs and Trying MIG

Image processing *Big Data *Machine learning *Computer hardware Natural Language Processing *


Every time when the essential question arises, whether to upgrade the cards in the server room or not, I look through similar articles and watch such videos.


Channel with the aforementioned video is very underestimated, but the author does not deal with ML. In general, when analyzing comparisons of accelerators for ML, several things usually catch your eye:


  • The authors usually take into account only the "adequacy" for the market of new cards in the United States;
  • The ratings are far from the people and are made on very standard networks (which is probably good overall) without details;
  • The popular mantra to train more and more gigantic models makes its own adjustments to the comparison;

The answer to the question "which card is better?" is not rocket science: Cards of the 20* series didn't get much popularity, while the 1080 Ti from Avito (Russian craigslist) still are very attractive (and, oddly enough, don't get cheaper, probably for this reason).


All this is fine and dandy and the standard benchmarks are unlikely to lie too much, but recently I learned about the existence of Multi-Instance-GPU technology for A100 video cards and native support for TF32 for Ampere devices and I got the idea to share my experience of the real testing cards on the Ampere architecture (3090 and A100). In this short note, I will try to answer the questions:


  • Is the upgrade to Ampere worth it? (spoiler for the impatient — yes);
  • Are the A100 worth the money (spoiler — in general — no);
  • Are there any cases when the A100 is still interesting (spoiler — yes);
  • Is MIG technology useful (spoiler — yes, but for inference and for very specific cases for training);
Read more →
Total votes 5: ↑5 and ↓0 +5
Views 1.8K
Comments 0

How to find an English teacher. Part 2

Python *Programming *Data visualization Machine learning *Natural Language Processing *
image

This is a continuation of story about using Data Science for finding an English teacher. If you have not read it yet - there is an opportunity to become familiar with it

Briefly  -  we had information about language teachers and tried to apply some basic ideas using pandas and our expectations. Unfortunately we got stuck on the third step, because there is not enough information for resolving our the last requirements  -  we need not more 3 candidates at the end.

Disclaimer
It is an approach based on my own experience and can be unsuitable to your point of view, ideas, or principles.
Rating 0
Views 555
Comments 0

How to find an English teacher. Part 1

Python *Programming *Data Mining *Data visualization Natural Language Processing *


In the modern world, here and there ideas are arising about using data science for an extra benefit. For instance, Google can use a history of watched videos for providing recommendations about new ones. Online shops are using a recommendation system for increasing your receipt. However… if companies use the data for their benefit, could we do the same for own needs such as looking an online English teacher?


Disclaimer

It is an approach based on my own experience and can be unsuitable to your point of view, ideas, or principles.

Total votes 2: ↑1 and ↓1 0
Views 1.1K
Comments 0

Keyword Tree: graph analysis for semantic extraction

Data visualization Machine learning *Natural Language Processing *

image


This post is a small abstract of full-scaled research focused on keyword recognition. Technique of semantics extraction was initially applied in field of social media research of depressive patterns. Here I focus on NLP and math aspects without psychological interpretation. It is clear that analysis of single word frequencies is not enough. Multiple random mixing of collection does not affect the relative frequency but destroys information totally — bag of words effect. We need more accurate approach for the mining of semantics attractors.

Read more →
Total votes 8: ↑7 and ↓1 +6
Views 862
Comments 0

Authors' contribution