Sound

This "bzzz" is not for nothing

Audio API Quick Start Guide: Playing and Recording Sound on Linux, Windows, FreeBSD and macOS

Programming *C *Development for Linux *Development for Windows *Sound
Tutorial

Hearing is one of the few basic senses we humans have, along with sight, smell, taste and touch. If we couldn't hear, the world as we know it would be less interesting and colorful. It would be total silence, a scary thing even to imagine. And speaking makes our life so much more fun, because what could be better than talking to our friends and family? We can also listen to our favorite music wherever we are, thanks to computers and headphones. With the help of tiny microphones built into our phones and laptops, we can now talk to people around the world from any place with an Internet connection. But computer hardware alone isn't enough: it is software that really defines how and when the hardware should operate. Operating systems provide the means for that to apps that want to use a computer's audio capabilities. In real use cases, audio data usually goes a long way from one end to the other, being transformed, compressed and decompressed on the fly, attenuated, filtered, and so on. But in the end it all comes down to just two basic processes: playing sound or recording it.

Today we're going to discuss how to use the APIs that popular operating systems provide: this is essential knowledge if you want to create an app that works with audio I/O. There's just one problem standing in our way: there is no single API that all operating systems support. In fact, there are completely different APIs, different approaches, and slightly different logic. We could just use a library that solves all those problems for us, but then we wouldn't understand what's really going on under the hood, so what's the point? Sometimes we want to dig a little deeper and learn a little more than what lies on the surface. That's why we're going to learn the APIs that operating systems provide by default: ALSA (Linux), PulseAudio (Linux), WASAPI (Windows), OSS (FreeBSD), CoreAudio (macOS).
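Whichever of these APIs is used, playback typically ends with the application handing the system a buffer of PCM samples in an agreed format. The sketch below is not taken from the article: it only illustrates that common data format (interleaved signed 16-bit samples) using Python's standard library, and writes the result into an in-memory WAV container instead of a sound device. The sample rate and the function names are assumptions chosen for illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 48000  # frames per second (assumed; a common device rate)


def sine_pcm16(freq_hz, seconds, channels=2, amplitude=0.5):
    """Return interleaved signed 16-bit little-endian PCM for a sine tone."""
    frames = int(SAMPLE_RATE * seconds)
    out = bytearray()
    for n in range(frames):
        value = int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        # One frame holds one sample per channel, stored back to back.
        out += struct.pack("<h", value) * channels
    return bytes(out)


def to_wav_bytes(pcm, channels=2):
    """Wrap raw PCM in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # bytes per sample
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()


pcm = sine_pcm16(440.0, 0.1)
wav_data = to_wav_bytes(pcm)
```

A native audio API would consume a buffer laid out like `pcm` directly, after the application and the device have agreed on sample rate, channel count and sample format.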

Read more
Rating 0
Views 1.6K
Comments 0

Riddles of the fast Fourier transform

Programming *Algorithms *Mathematics *Sound Visual programming *
Tutorial

• The method of phase-magnitude interpolation (PMI)

• Accurate measure of frequency, magnitude and phase of signal harmonics

• Detection of resonances

The Fast Fourier Transform (FFT) algorithm is an important tool for analyzing and processing signals of various nature.

It reconstructs the magnitude and phase spectrum of a signal in the frequency domain from magnitude samples in the time domain, and the method is computationally efficient with modest memory consumption.

Although no information about the signal is lost during the conversion (the calculations are reversible up to rounding), the algorithm has some peculiarities that hinder high-precision analysis and fine processing of the results.

The article presents an effective way to overcome such "inconvenient" features of the algorithm.
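One such peculiarity is that a tone whose frequency falls between two bins cannot be read off the spectrum directly. The sketch below is not the article's PMI method. It uses a different, well-known technique (parabolic interpolation of log-magnitudes around the peak of a Hann-windowed spectrum) purely to illustrate the problem. A naive DFT stands in for the FFT so that the example needs only the standard library; the window length and test frequency are arbitrary assumptions.

```python
import cmath
import math

N = 64           # window length in samples (illustrative)
TRUE_BIN = 10.3  # tone frequency in units of bins, deliberately off-grid

# Hann-windowed test tone.
signal = [
    0.5 * (1 - math.cos(2 * math.pi * n / N)) * math.sin(2 * math.pi * TRUE_BIN * n / N)
    for n in range(N)
]


def dft_magnitudes(x):
    """Magnitudes of bins 0..N/2 of a naive O(N^2) DFT (same values an FFT gives)."""
    size = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size)))
        for k in range(size // 2 + 1)
    ]


mags = dft_magnitudes(signal)

# The raw spectrum can only point at an integer bin.
peak = max(range(1, len(mags) - 1), key=lambda k: mags[k])

# Fit a parabola through the log-magnitudes of the peak and its two neighbours.
a, b, c = (math.log(mags[peak + d]) for d in (-1, 0, 1))
offset = 0.5 * (a - c) / (a - 2 * b + c)
estimate = peak + offset  # lands much closer to TRUE_BIN than the integer peak does
```

The energy of the tone is spread over several neighbouring bins, and the interpolation uses the shape of that spread to recover a fractional frequency estimate.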

Read in Russian

Read in English
Rating 0
Views 716
Comments 0

Music on the Commodore PET and the Faulty Robots

Abnormal programming *Assembler *Demoscene Old hardware Sound

After completing System Beeps, I wasn’t planning to make another standalone album of pseudo-polyphonic music, as I felt the topic had been explored enough. This, however, didn’t mean I couldn’t apply the experience and skills gained to more utilitarian things, like an actual retro game OST or an old-school demoscene project. Such an opportunity arose in autumn 2020, when David Murray of The 8-Bit Guy YouTube channel fame announced that his new game, Attack of the PETSCII Robots for the Commodore PET and some other Commodore 8-bitters, was in development. I had worked with David on his previous big release, the Planet X3 game for MS-DOS, and this was a perfect opportunity to satisfy my interest in pre-graphics-era PCs as well as to apply my vast experience in both minimalistic computer music and 6502 assembly programming, so I offered my services, and they were accepted. Besides the sound code, I also hoped to participate as a music composer this time.

Unfortunately, this time the project didn’t go well on my side, and lots of issues of all kinds eventually turned it into a small-scale development hell (you can learn more from a series of posts on my Patreon blog). The end result was that my code and sound effects were only used in the VIC-20 port, and the music for the other versions was created by other people. However, I was left with the fully working code of the sound system for the PET and a number of music sketches. It would have been a pity to file them away: PET projects aren’t frequent these days, so another chance to use the material wouldn’t come any time soon. So I got the idea to develop my music sketches into full songs and release them as an alternative OST. With David’s approval, this was done and released in winter 2021 as Faulty Robots, a small music album for the PET that is available as a digital audio release and as a runnable program for the actual PET computer.

Read more
Total votes 5: ↑5 and ↓0 +5
Views 1.2K
Comments 2

Full motion video with digital audio on the classic 8-bit game console

Abnormal programming *Assembler *Old hardware Games and game consoles Sound

Back in 2016, a United States based music composer and performer, Sergio Elisondo, released a one-man-band music album, A Winner Is You (know your meme), with multi-instrumental cover versions of tunes from numerous memorable classic NES games. A special feature of this release was a version in NES cartridge format that would run on a classic unmodified console and play digitized audio of the full album instead of the typical chiptune sound you would expect from this humble console. I was involved in the software development part of this project.

This year Sergio returns with a brand new music release. This time it is an all-original music album, You Are Error, heavily influenced by video game music aesthetics. It also comes with a special extra. This time we have raised the stakes: the new NES cartridge release includes not only digitized audio but also full motion videos for each song, done in a silhouette cutout style similar to the famous Bad Apple video. Yet again, the project is crowdfunded via Kickstarter. It reached its funding goal in a mere 7 hours, but there is still a little time to jump on the bandwagon and get yourself a copy. In the meantime, I would like to share some insight into the technical side of both projects.

Read more
Total votes 4: ↑4 and ↓0 +4
Views 843
Comments 0

High-Quality Text-to-Speech Made Accessible, Simple and Fast

Machine learning *Sound Natural Language Processing *

There is a lot of commotion in text-to-speech now. There is a great variety of toolkits, a plethora of commercial APIs from GAFA companies (based both on new and older technologies). There are also a lot of Silicon Valley startups trying to ship products akin to "deep fakes" in speech.


But despite all this ruckus we have not yet seen open solutions that would fulfill all of these criteria:


  • Natural-sounding speech;
  • A large library of voices in many languages;
  • Support for 16kHz and 8kHz out of the box;
  • No GPUs / ML engineering team / training required;
  • Unique voices not infringing upon third-party licenses;
  • High throughput on slow hardware. Decent performance on one CPU thread;
  • Minimalism and lack of dependencies. One-line usage, no builds or coding in C++ required;
  • Positioned as a solution, not yet another toolkit / compilation of models developed by other people;
  • Not affiliated by any means with ecosystems of Google / Yandex / Sberbank;

We decided to share our open non-commercial solution that fits all of these criteria with the community. Since we have published the whole pipeline, we do not focus much on cherry-picked examples, and we encourage you to visit our project's GitHub repo to test our TTS for yourself.

Total votes 5: ↑5 and ↓0 +5
Views 5.3K
Comments 8

Short-lived Music or MuseScore Code Analysis

PVS-Studio corporate blog Open source *C++ *C *Sound

With only a programming background, it is impossible to develop software in some areas. Take the difficulties of medical software development as an example. The same goes for music software, which is discussed in this article. Here you need the advice of subject matter experts. However, that makes software development more expensive, which is why developers sometimes save on code quality. The check of the MuseScore project described in the article shows the importance of code quality expertise. Hopefully, programming and musical humor will brighten up the technical text.

Read more
Total votes 2: ↑1 and ↓1 0
Views 598
Comments 0

Modern Portable Voice Activity Detector Released

Open source *Machine learning *Sound

Currently, there are hardly any high quality / modern / free / public voice activity detectors except for the WebRTC Voice Activity Detector (link). WebRTC, though, is starting to show its age and suffers from many false positives.


Also in some cases it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically personal data is considered to be private / sensitive if it contains (i) a name (ii) some private ID. Name recognition is a highly subjective matter and it depends on locale and business case, but Voice Activity and Number Detection are quite general tasks.


Key features:


  • Modern, portable;
  • Low memory footprint;
  • Superior metrics to WebRTC;
  • Trained on huge spoken corpora and noise / sound libraries;
  • Slower than WebRTC, but fast enough for IoT / edge / mobile applications;
  • Unlike WebRTC (which mostly tells silence from voice), our VAD can tell voice from noise / music / silence;
  • PyTorch (JIT) and ONNX checkpoints;

Typical use cases:


  • Spoken corpora anonymization;
  • Can be used together with WebRTC;
  • Voice activity detection for IoT / edge / mobile use cases;
  • Data cleaning and preparation, number and voice detection in general;
  • PyTorch and ONNX can be used with a wide variety of deployment options and backends in mind;
Read more →
Rating 0
Views 2.5K
Comments 0

Modern Google-level STT Models Released

Big Data *Machine learning *Start-up development Sound


We are proud to announce that we have built from the ground up and released our high-quality (i.e. on par with premium Google models) speech-to-text models for the following languages:


  • English;
  • German;
  • Spanish;

You can find all of our models in our repository together with examples, quality and performance benchmarks. Also we invested some time into making our models as accessible as possible — you can try our examples as well as PyTorch, ONNX, TensorFlow checkpoints. You can also load our model via TorchHub.


PyTorch ONNX TensorFlow Quality Colab
English (en_v1) link Open In Colab
German (de_v1) link Open In Colab
Spanish (es_v1) link Open In Colab
Read more →
Total votes 9: ↑9 and ↓0 +9
Views 3.7K
Comments 1

Juggling work and study at ITMO University: CS edition

ITMO University corporate blog Programming *Studying in IT IT career Sound
We talked to the graduates of the Speech Information Systems MA program at ITMO about the ways our university helped jumpstart their careers. [More stories from our startups]:


Read more →
Total votes 6: ↑5 and ↓1 +4
Views 768
Comments 0

Bluetooth stack modifications to improve audio quality on headphones without AAC, aptX, or LDAC codecs

Development for Android *Wireless technologies *Sound
Before reading this article, it is recommended to read the previous one: Audio over Bluetooth: most detailed information about profiles, codecs, and devices / in Russian

Some wireless headphone users note low sound quality and a lack of high frequencies when using the standard Bluetooth SBC codec, which is supported by all headphones and other Bluetooth audio devices. A common recommendation for better sound quality is to buy devices and headphones with aptX or LDAC codec support. These codecs require licensing fees, which is why devices that support them are more expensive.

It turns out that the low quality of SBC is caused by artificial limitations in all current Bluetooth stacks and in headphone configuration, and these limitations can be circumvented on any existing device with a software modification alone.
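To make the idea of a stack-imposed limit concrete, here is a minimal sketch of how SBC bitrate follows from the negotiated parameters, using the frame-length formulas from the A2DP specification. The teaser does not say which parameters the article changes, so treating the bitpool value and channel mode as the relevant knobs is an assumption here, and the function names are made up for illustration.

```python
import math


def sbc_frame_length(bitpool, subbands=8, blocks=16, mode="joint_stereo"):
    """SBC frame length in bytes, per the A2DP frame-length formulas."""
    channels = 1 if mode == "mono" else 2
    header = 4 + (4 * subbands * channels) // 8
    if mode in ("mono", "dual_channel"):
        payload_bits = blocks * channels * bitpool
    else:  # "stereo" or "joint_stereo"
        join = 1 if mode == "joint_stereo" else 0
        payload_bits = join * subbands + blocks * bitpool
    return header + math.ceil(payload_bits / 8)


def sbc_bitrate(bitpool, sample_rate=44100, subbands=8, blocks=16, mode="joint_stereo"):
    """Bitrate in bits per second for the given SBC parameters."""
    frame_bytes = sbc_frame_length(bitpool, subbands, blocks, mode)
    return 8 * frame_bytes * sample_rate / (subbands * blocks)


# Bitpool 53 in joint stereo is the commonly cited "high quality" setting.
default_bps = sbc_bitrate(53)
# A higher-bitrate configuration: dual channel with bitpool 38 per channel.
dual_bps = sbc_bitrate(38, mode="dual_channel")
```

With these inputs the first configuration comes out at roughly 328 kbit/s and the second at roughly 452 kbit/s, which shows how much headroom depends on the values a stack agrees to negotiate.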
Read more →
Total votes 22: ↑21 and ↓1 +20
Views 34K
Comments 3

Audio over Bluetooth: most detailed information about profiles, codecs, and devices

Wireless technologies *Sound
XKCD comic. How standards proliferate. SITUATION: there are 14 competing standards. Geek: 14?! Ridiculous! We need to develop one universal standard that covers everyone's use cases. Geek's girlfriend: yeah! SOON: Situation: there are 15 competing standards.

This article is also available in Russian

The mass market of smartphones without a 3.5 mm audio jack has changed the headphone industry: wireless Bluetooth headphones have become the main way to listen to music and communicate in headset mode for many users.
Bluetooth device manufacturers rarely disclose detailed product specifications, and Bluetooth audio articles on the Internet are contradictory and sometimes incorrect. They do not cover all the features, and they often repeat the same false information.
Let's try to understand the protocol and the capabilities of Bluetooth stacks, headphones and speakers, and Bluetooth codecs for music and speech; find out what affects the quality and delay of the transmitted audio; and learn how to capture and decode information about supported codecs and other device features.

TL;DR:

  • SBC codec is OK
  • Headphones have their own per-codec equalizer and post processing configuration
  • aptX is not as good as the advertisements say
  • LDAC is marketing fluff
  • Voice audio quality is still low
  • Browsers are able to execute audio encoders compiled to WebAssembly from C using emscripten, and they won't even lag.

Total votes 24: ↑23 and ↓1 +22
Views 256K
Comments 9

Designing Sound for Pathfinder: Kingmaker

VK corporate blog Game development *Sound


Pathfinder: Kingmaker (PF:K for short) is a role-playing video game created by Owlcat Games, released in fall 2018 on Steam and GOG. Inspired by classic BioWare games, the project uses the ruleset of a popular board game system, features real-time-with-pause combat and an isometric camera, and has a non-linear story with multiple unique endings.


In this article, I will share a little about how we worked on designing the audio throughout the game’s development including task management, the search for inspiration, and troubleshooting. An experienced specialist may not find anything particularly groundbreaking in this recap, but beginners and enthusiasts will definitely discover some points of interest.

Total votes 32: ↑32 and ↓0 +32
Views 2.8K
Comments 0

PC Speaker To Eleven

Abnormal programming *Assembler *Demoscene Old hardware Sound
Known now as a «motherboard speaker», or just a «beeper», the PC Speaker was introduced in 1981 along with the first IBM personal computer. As the successor of big, serious computers for serious business, the PC was given a speaker designed to produce only very basic system beeps, so the speaker never really had a chance to shine as a music device in the numerous entertainment programs of the emerging home market. Overshadowed by the much more advanced sound chips of popular home game systems and quickly replaced by powerful sound cards, it mostly served as a fallback option, playing severely downgraded versions of content made for better sound hardware.

«System Beeps» is a music album in the shape of an MS-DOS program that features original music composed for the PC Speaker using the same basic old techniques as those found in classic PC games. It follows the usual retro computing demoscene formula (take something rusty and obsolete, and push it to eleven) and attempts to reveal the long-hidden potential of this humble little sound device. You can hear it in action and form an opinion on how successful this attempt was at Bandcamp, or in the video below. The following article is an in-depth overview of the original PC Speaker capabilities and the making of the project, for those who would like to know more.
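The teaser does not detail the techniques, but the most basic building block of PC Speaker sound is well documented: the speaker is driven by a square wave from the PC's interval timer, whose pitch is set by dividing a fixed input clock of about 1.193182 MHz by a 16-bit divisor. The sketch below only does that arithmetic. The function names are hypothetical, and no hardware access is attempted.

```python
PIT_CLOCK_HZ = 1_193_182  # input clock of the PC's programmable interval timer


def note_freq(semitones_from_a4):
    """Equal-tempered pitch in Hz, relative to A4 = 440 Hz."""
    return 440.0 * 2 ** (semitones_from_a4 / 12)


def pit_divisor(freq_hz):
    """Nearest timer divisor for the requested tone frequency."""
    divisor = round(PIT_CLOCK_HZ / freq_hz)
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError("frequency outside the range this sketch handles")
    return divisor


def actual_freq(divisor):
    """Frequency the timer really produces for a given divisor."""
    return PIT_CLOCK_HZ / divisor


a4_divisor = pit_divisor(note_freq(0))
a4_actual = actual_freq(a4_divisor)
```

Because the divisor is an integer, the produced pitch is only an approximation of the requested one, and very low tones are out of reach. Both are part of what makes this device a constrained instrument.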

Total votes 34: ↑32 and ↓2 +30
Views 33K
Comments 3
