There is a lot of buzz around text-to-speech right now. There is a great variety of toolkits and a plethora of commercial APIs from GAFA companies (built on both new and older technologies). There are also many Silicon Valley startups trying to ship products akin to speech "deep fakes".
But despite all this commotion, we have not yet seen an open solution that fulfills all of these criteria:
- Natural-sounding speech;
- A large library of voices in many languages;
- Support for 8 kHz out of the box;
- No GPUs / ML engineering team / training required;
- Unique voices not infringing upon third-party licenses;
- High throughput on slow hardware, with decent performance on a single CPU thread;
- Minimalism and lack of dependencies: one-line usage, no builds, no C++ coding required;
- Positioned as a solution, not yet another toolkit or compilation of models developed by other people;
- Not affiliated in any way with the ecosystems of Google / Yandex / Sberbank.
We decided to share with the community our open, non-commercial solution that meets all of these criteria. Since we have published the whole pipeline, we do not focus much on cherry-picked examples; instead, we encourage you to visit our project's GitHub repo and test our TTS for yourself.