In our last article we made a bunch of promises about our speech synthesis.
After a lot of hard work, we have finally delivered on these promises:
Model size reduced 2x;
New models are 10x faster;
We added flags to control stress;
Now the models can make proper pauses;
High-quality voice added (and unlimited "random" voices);
All speakers squeezed into the same model;
Input length limitations lifted: the models can now work with whole paragraphs of text;
Pauses, speed and pitch can be controlled via SSML;
Sampling rates of 8, 24 or 48 kHz are supported;
Models are much more stable — they do not omit words anymore;
This is a true breakthrough achievement for us, and we are not planning to stop anytime soon. We will shortly be adding as many languages as possible (the CIS languages, English, European languages, Indic languages). We also still plan to make our models an additional 2-5x faster.
On top of that, we plan to add phonemes and a new model for stress, as well as to reduce the minimum amount of audio required to train a high-quality voice to 5-15 minutes.
As usual, you can try our models in our repo or in Colab.
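To make the list above concrete, here is a minimal Python sketch of SSML-driven synthesis through torch.hub, the loading route described in our repo. The model id ('v3_en'), the speaker name ('en_0') and the exact apply_tts signature are assumptions that may differ between releases, so check the repo for the current interface.

```python
import torch

# Load a TTS model from the repo via torch.hub
# (model id and language are assumptions; see the repo for current values).
model, _ = torch.hub.load(repo_or_dir='snakers4/silero-models',
                          model='silero_tts',
                          language='en',
                          speaker='v3_en')

# Standard SSML: <break> inserts a pause, <prosody> adjusts speed and pitch.
ssml = """
<speak>
  Input length limitations are lifted.
  <break time="500ms"/>
  <prosody rate="slow">Pauses, speed and pitch</prosody>
  can be controlled via SSML.
</speak>
"""

# apply_tts and its ssml_text/speaker/sample_rate parameters follow the
# repo's README; the speaker name here is an assumed example.
audio = model.apply_tts(ssml_text=ssml,
                        speaker='en_0',
                        sample_rate=48000)  # 8000, 24000 or 48000
```

The result is a 1-D tensor of samples at the requested rate, which you can save or play back with any audio library.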
After Epic released the UE5 technology demo at the beginning of 2021, the discussion around UE5 has never stopped. The related technical discussions have mainly centered on two new features: the global illumination technology Lumen and the extremely-high-model-detail technology Nanite. There are already some articles [1][2] analyzing Nanite technology in depth. Starting from a RenderDoc analysis and the UE5 source code, combined with existing technical material, this article aims to provide an intuitive, overview-level understanding of Nanite and to clarify its algorithm principles and design ideas, without going into too many source-code-level implementation details.
Lumen is UE5's GI system. Unlike traditional real-time GI, which only includes the contribution of indirect diffuse reflection, it covers both indirect diffuse and indirect specular reflection, providing a complete new set of indirect lighting. Lumen supports both hardware-based ray tracing (RTX) and software-based tracing algorithms. This article analyzes the process, algorithms and data structures of the indirect diffuse part of Lumen GI based on software tracing, in order to understand the basic principles and operating mechanism of Lumen from a macro perspective.
Today, we will share some knowledge related to resource memory leaks. A memory leak is the most common issue that we continuously see, and also one we are afraid of. Why? Because we cannot predict the extent of a leak before we locate its bottleneck, so we have no idea whether it will burst out at some moment in production. We have received feedback from developers whose players had no problems after half an hour of play, but got more and more stuck after 3 to 4 hours of playing, which they had never expected. How can it be solved? Today's sharing will answer such questions.
UWA's GOT Online Assets report includes a resource occupancy trend chart. If it shows a steadily rising trend, you must pay special attention.
Since 2011, all Intel GPUs (integrated and discrete Intel Graphics products) have included Intel Quick Sync Video (QSV), a dedicated hardware core for video encoding and decoding. Intel QSV is supported by all popular video processing applications across multiple OSes, including FFmpeg. This tutorial focuses on Intel QSV-based video encoding and decoding acceleration in native (desktop) Windows applications that use FFmpeg/libavcodec for video processing. To illustrate the concepts described, the open-source 3D Streaming Toolkit is used.
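To give a feel for the moving parts before touching the libavcodec API, here is a minimal sketch of a QSV-accelerated transcode driven from Python. `h264_qsv` is the real name of FFmpeg's QSV H.264 decoder and encoder; the file names and bitrate are placeholders, and the FFmpeg build must have QSV support compiled in.

```python
import subprocess

# Transcode with Intel Quick Sync Video on both ends:
# decode with the h264_qsv decoder, encode with the h264_qsv encoder.
subprocess.run([
    "ffmpeg",
    "-hwaccel", "qsv",    # request QSV hardware acceleration
    "-c:v", "h264_qsv",   # hardware H.264 decoder
    "-i", "input.mp4",    # placeholder input file
    "-c:v", "h264_qsv",   # hardware H.264 encoder
    "-b:v", "5M",         # placeholder target bitrate
    "output.mp4",         # placeholder output file
], check=True)
```

The tutorial itself works a level lower, wiring the same decoder and encoder up through the libavcodec API inside a native application.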
This article continues the series of articles on load tests. Today we will analyze the testing methodology and answer the question: "How many IP cameras can be connected to a WebRTC server?"
Do you remember how just a few years ago it was a disaster to lose a camera at the end of a vacation? All the memorable pictures and videos disappeared along with the lost device. This fact probably prompted great minds to invent cloud storage, so that the safety of recordings no longer depends on the devices they were made on.
We continue our review of load test variants. In this article we will go over the testing methodology and conduct a load test to determine how many users can watch and stream at the same time, that is, simultaneously publish and view streams.
This article is a continuation of our series of write-ups about load tests for our server. We have already discussed how to compile metrics and how to use them to choose the equipment, and we also provided an overview of various load testing methods. Today we shall look at how the server handles stream mixing.
So, I finally found a moment to write a bit about how we created the water for TReload. Our basic goal was to flood all of the levels with acid - a lot of acid, as the flooded area is massive :) Here's one of the results we got out of this process:
In the previous article we went over a load test whose data could be used to choose a load-appropriate server. In the course of the testing, we published a stream on one WCS and picked that stream up several times using a second WCS. The results could serve as a basis for conclusions about server operability.
Some would (justly) have concerns regarding possible biases in such a test: after all, one of our servers was used to test another one of our servers. Could it be that we were using specially optimized code that skewed the results in our favor?
In any project, a great deal of importance is placed on the selection of server hardware, and WebRTC streaming is no exception. One of the key principles of such a selection is balance: the hardware should be powerful enough to handle the streams with no drops in quality, but not so powerful that resources are wasted. So, how does one choose the right server?
In the comment sections of our articles about our server there are often users who say: "Why would you jump through so many hoops, when you can do the same with a single line of code in FFmpeg!?"
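For context, the kind of FFmpeg one-liner those commenters have in mind usually looks something like the sketch below: pull a stream from a source and push it to a destination without re-encoding. The URLs are placeholders, and `-c copy` simply repackages the stream.

```python
import subprocess

# The proverbial "single line of FFmpeg": restream an RTSP source to an
# RTMP endpoint without re-encoding. Both URLs are placeholders.
subprocess.run([
    "ffmpeg",
    "-i", "rtsp://camera.example/stream",  # placeholder source
    "-c", "copy",                          # repackage, do not re-encode
    "-f", "flv",                           # container expected by RTMP
    "rtmp://server.example/live/stream",   # placeholder destination
], check=True)
```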
In this article we will once again return to the tired topic of webinars and webinar hosting tools. And no, we're not about to code a whole new system for webinar hosting – there are already plenty of those. Instead, we will talk about connecting drawing software to the webinar, so that you could manually draw and broadcast the process.
A few weeks ago we wrote an article about Docker and WebRTC servers and talked about the intricacies of launching containers. Our readers (rightly) questioned whether Docker was a suitable tool for production, and not without reason.
PVS-Studio has a mascot that became inseparable from the brand - a unicorn. Lately we've been getting many questions about our magic steed: why the unicorn, why has he changed so much, does he have hooves, how come he doesn't wear pants, and how do we draw him. The answers are finally here, in this very article.
Attention: there will be a lot of pictures. And I mean A LOT.
The potential of VoIP for your customers is simply phenomenal. Businesses are experiencing the advantages of VoIP's cost-efficiency and reliability, and now you can pass these benefits on to your own customers very easily. Cloud telecommunication is sophisticated and easily integrated. Confidence in this technology is growing fast. There has never been a better time to start talking to your customers about adopting this solution. It will deliver huge business benefits for them and has the potential to increase business income and profitability.
The popularity of online education is increasing every month. And with the growth in popularity come growing competition and improving quality: many authors of online schools have realized that nowadays it's not enough to just make something mediocre using the front camera of an iPhone 4. You need to work on the quality of the picture, the sound, the presentation of the material, etc. Not only did the pandemic prove that you can teach salsa and even crochet online, it also gave an excellent push to improve the materials taught and diversify the labor market. Moms on maternity leave can rejoice: now you can truly earn a hundred thousand million per second while sitting at home.
During the lockdown, which keeps weakening and then coming back with a vengeance, a huge number of new professions have appeared. Specialists in creating engaging Instagram Stories, course producers, e-learning specialists and other previously unknown creatures: all of them have mastered a new job that earned them their daily bread during the pandemic. Once you hire a creator of engaging Instagram Stories (and let's all hope this person was taught by a professional, not a snake oil salesman), your business grows just by adding some creativity to advertising that will win the heart of even the most fastidious customer. The same works with e-learning specialists: you recruit them and they do all the important tasks, from writing articles that also help promote your classes to arranging lessons for your Coursera course. A complete flight of imagination: if it seems to you that some jobs don't exist, then you just don't spend enough time online. All in all, online work has become popularized as never before, and now hardly anyone will say: "He is engaged in some kind of garbage - he makes TikToks, but it would be better for him to toil at the factory!"