Voice user interfaces

Using voice input to control computers and devices


snakers4 Jun 30 2022 at 12:39

Multilingual Text-to-Speech Models for Indic Languages

5 min

2.7K

Machine learning, Natural Language Processing, Voice user interfaces

In this article, we shall provide some background on how multilingual multi-speaker models work and test an Indic TTS model that supports 9 languages and 17 speakers (Hindi, Malayalam, Manipuri, Bengali, Rajasthani, Tamil, Telugu, Gujarati, Kannada).

It seems a bit counter-intuitive at first that one model can support so many languages and speakers, given that each Indic language has its own alphabet, but we shall see how it was implemented.

We shall also list the specs of these models, such as the supported sampling rates, and try something cool: making speakers of different Indic languages speak Hindi. If you are a native speaker of any of these languages, please share your opinion on how these voices sound, both in their respective language and in Hindi.

snakers4 Apr 12 2022 at 18:08

Our new public speech synthesis in super-high quality, 10x faster and more stable

3 min

Natural Language Processing, Voice user interfaces, Machine learning


In our last article we made a bunch of promises about our speech synthesis.

After a lot of hard work, we have finally delivered on these promises:

Model size reduced 2x;
New models are 10x faster;
We added flags to control stress;
The models can now make proper pauses;
A high-quality voice added (and unlimited "random" voices);
All speakers squeezed into the same model;
Input length limitations lifted: the models can now work with paragraphs of text;
Pauses, speed and pitch can be controlled via SSML;
Sampling rates of 8, 24 or 48 kHz are supported;
Models are much more stable: they no longer omit words.

This is a true breakthrough for us, and we are not planning to stop anytime soon. We will be adding as many languages as possible shortly (the CIS languages, English, European languages, Indic languages). We are also still planning to make our models an additional 2-5x faster.

We are also planning to add phonemes and a new model for stress, as well as to reduce the minimum amount of audio required to train a high-quality voice to 5 to 15 minutes.

As usual, you can try our model in our repo or in Colab.
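For readers unfamiliar with SSML, here is a minimal sketch of what controlling pauses, speed and pitch through markup can look like. It only builds the SSML string with the Python standard library and does not call the models themselves. The helper names `build_ssml` and `check_sample_rate` are hypothetical, and which SSML tags and attribute values a given model accepts is an assumption to verify against the repo.

```python
from xml.sax.saxutils import escape, quoteattr

# Sampling rates named in the list above (8, 24 or 48 kHz), in Hz.
SUPPORTED_SAMPLE_RATES = (8000, 24000, 48000)


def build_ssml(sentences, pause_ms=300, rate="medium", pitch="medium"):
    """Wrap sentences in <speak>/<prosody>, with a <break> between them.

    Hypothetical helper for illustration only: it produces generic SSML
    and makes no claim about what a specific TTS engine supports.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    pause = f'<break time="{int(pause_ms)}ms"/>'
    # Escape &, < and > so the input text cannot break the markup.
    body = pause.join(f"<s>{escape(s)}</s>" for s in sentences)
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{body}</prosody></speak>"
    )


def check_sample_rate(hz):
    """Return hz if it is one of the listed rates, otherwise raise."""
    if hz not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {hz}")
    return hz


if __name__ == "__main__":
    print(build_ssml(["Hello & welcome.", "Second sentence."],
                     pause_ms=500, rate="slow", pitch="high"))
```

The resulting string would then be passed to whatever synthesis call the model exposes, together with one of the supported sampling rates.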


elena_zz Jul 8 2021 at 14:11

The benefits of offering VoIP to your customers under your own brand

3 min

1.9K

Zadarma corporate blog, Cloud services, Development of communication systems, API, Voice user interfaces

The potential of VoIP for your customers is simply phenomenal. Businesses are experiencing the advantages of VoIP’s cost-efficiency and reliability, and now you can pass these benefits on to your own customers very easily. Cloud telecommunication is sophisticated and easily integrated. Confidence in this technology is growing fast. There has never been a better time to start talking to your customers about adopting this solution. It will deliver huge business benefits for them and has the potential to increase business income and profitability.

elena_zz Dec 5 2019 at 12:34

Unleashing the Potential of CRM and Virtual Telephony Integration

3 min

Zadarma corporate blog, CRM systems, Voice user interfaces

Virtual phone systems and CRM need little introduction to anyone active in the business world. A look at a range of recent studies indicates that close to 40% of American businesses make at least partial use of VoIP today. The CRM software market is also growing rapidly, with estimates that it will be bringing in an astonishing $80 billion in revenue in five years' time. What do these two technologies have in common?