Pull to refresh
76.62

Sound

This "bzzz" is not for nothing

Show first
Rating limit
Level of difficulty

Toward the January meetup on portable SystemVerilog examples in Silicon Valley

Reading time4 min
Views1.2K

The team developing a set of portable SystemVerilog examples decided to organize the first event in Silicon Valley on Sunday, January 14 from 2PM till 5PM at Hacker Dojo in Mountain View, CA. If the first event is successful we are going to make it recurrent. You can register for the event on Meetup or LinkedIn.

The current directions of the group:

Read more
Total votes 6: ↑4 and ↓2+6
Comments0

Wi-Fi internet radio from a router with station switching capability

Level of difficultyMedium
Reading time8 min
Views1.9K
Wi-Fi интернет-радиоприёмник.
ChatGPT was not used in writing this article.
The animated image uses the webp file format instead of gif.
In this article, you will find a complete description of how to make a Wi-Fi internet radio receiver from a router that can play mp3 streams from internet radio stations.
It is also possible to switch between two internet radio stations. We will use OpenWRT firmware installed on the router to create a Wi-Fi internet radio. It is possible to complete this project without using a soldering iron. All the components can be placed inside the router to create a finished device — a Wi-Fi internet radio.
To make this, you will need:
Read more →
Total votes 4: ↑4 and ↓0+4
Comments1

http://GuitarFX.Net Digital (SOFTWARE) modeling of analog guitar distortion, reverb, noise gate etc. effects

Reading time5 min
Views1.4K

The sound of rock music, in particular of hard rock and heavy metal, is largely based on a specially distorted guitar sound, for which electronic “distortion” devices, tube amplifiers in “overloaded” mode, computers with appropriate software and digital processors are used. increasingly using neural network algorithms.

The distorted sound of electric guitars began to gain popularity around the 1960s. Since that time, the sound of overloaded tube amplifiers, connected to powerful dedicated guitar speakers with large dedicated speakers, has been considered the benchmark in rock music. But tube amplifiers were relatively expensive and inconvenient to operate. Therefore, semiconductor distortion devices were developed.

At that time, the electrical circuitry of distortion devices was relatively simple and the signal output from their output only vaguely resembled the sound of an overloaded tube amplifier. Nevertheless, it was still somewhat similar to the “sound of a lamp” and this provided a powerful incentive for designers of analog semiconductor distortion circuits to continue their research, complicate circuits and propose new circuit solutions. The heyday of analog solid-state distortion was around 1995-2010. The most popular were electrical circuit diagrams like those shown in the figure below.

Read more
Total votes 3: ↑3 and ↓0+3
Comments4

Audio API Quick Start Guide: Playing and Recording Sound on Linux, Windows, FreeBSD and macOS

Reading time35 min
Views19K

Hearing is one of the few basic senses that we humans have along with the other our abilities to see, smell, taste and touch. If we couldn't hear, the world as we know it would be less interesting and colorful to us. It would be a total silence - a scary thing, even to imagine. And speaking makes our life so much fun, because what else can be better than talking to our friends and family? Also, we're able to listen to our favorite music wherever we are, thanks to computers and headphones. With the help of tiny microphones integrated into our phones and laptops we are now able to talk to the people around the world from any place with an Internet connection. But computer hardware alone isn't enough - it is computer software that really defines the way how and when the hardware should operate. Operating Systems provide the means for that to the apps that want to use computer's audio capabilities. In real use-cases audio data usually goes the long way from one end to another, being transformed and (un)compressed on-the-fly, attenuated, filtered, and so on. But in the end it all comes down to just 2 basic processes: playing the sound or recording it.

Today we're going to discuss how to make use of the API that popular OS provide: this is an essential knowledge if you want to create an app yourself which works with audio I/O. But there's just one problem standing on our way: there is no single API that all OS support. In fact, there are completely different API, different approaches, slightly different logic. We could just use some library which solves all those problems for us, but in that case we won't understand what's really going on under the hood - what's the point? But humans are built the way that we sometimes want to dig a little bit deeper, to learn a little bit more than what just lies on the surface. That's why we're going to learn the API that OS provide by default: ALSA (Linux), PulseAudio (Linux), WASAPI (Windows), OSS (FreeBSD), CoreAudio (macOS).

Read more
Rating0
Comments0

Riddles of the fast Fourier transform

Reading time10 min
Views1.6K

• The method of phase-magnitude interpolation (PMI)

• Accurate measure of frequency, magnitude and phase of signal harmonics

• Detection of resonances

The Fast Fourier Transform (FFT) algorithm is an important tool for analyzing and processing signals of various nature.

It allows to reconstruct magnitude and phase spectrum of a signal into the frequency domain by magnitude sample into the time domain, while the method is computationally optimized with modest memory consumption.

Although there is not losing of any information about the signal during the conversion process (calculations are reversible up to rounding), the algorithm has some peculiarities, which hinder high-precision analysis and fine processing of results further.

The article presents an effective way to overcome such "inconvenient" features of the algorithm.

Read in Russian

Read in English
Rating0
Comments0

Music on the Commodore PET and the Faulty Robots

Reading time26 min
Views2.9K

After completion of the System Beeps, I wasn’t planning to make another stand alone album release with the pseudo polyphonic music, as I felt the topic had been explored enough. This, however, wouldn’t mean I couldn’t apply the experience and skills gained to make more utilitarian stuff, like an actual retro game OST or an old school demoscene project. Such an opportunity arose in Autumn 2020, as David Murray of The 8-bit Guy Youtube channel fame announced his new game to be in development, the Attack of The PETSCII Robots for Commodore PET and some other Commodore 8-bitters. As I previously worked with David on his previous big release, Planet X3 game for MS-DOS, and this was a perfect opportunity to satisfy my interest towards the pre-graphics era PCs as well as apply my vast experience both in the minimalistic computer music and 6502 assembly programming, I offered my services that had been accepted. Besides the sound code I also had hopes to participate as a music composer this time.

Unfortunately, this time the project didn’t went well on my side, and lots of issues of all kinds eventually turned it into a small scale development hell (you can learn more from a series of posts at my Patreon blog)  The end result was that my code and sound effects were only used in the VIC-20 port, and music for other versions has been created by other people. However, I was left with the full working code of the sound system for PET, and a number of music sketches. It would be a pity to file it into the archive, PET projects aren’t a frequent thing these days, so another chance to use the stuff wouldn’t come any time soon. So I got the idea to develop my music sketches into full songs, and release it as an alternative OST, and having David’s approval it has been done and released in the Winter 2021 as Faulty Robots, a small music album for PET that is available as a digital audio release and a runnable program for the actual PET computer.

Read more
Total votes 5: ↑5 and ↓0+5
Comments1

Full motion video with digital audio on the classic 8-bit game console

Reading time13 min
Views1.4K

Back in 2016 an United States based music composer and performer Sergio Elisondo released an one-man band music album A Winner Is You (know your meme), with multi-instrumental cover versions of tunes from numerous memorable classic NES games. A special feature of this release has been its version released in the NES cartridge format that would run on a classic unmodified console and play digitized audio of the full album, instead of the typical chiptune sound you would expect to come from this humble console. I was involved with the software development part of this project.

This year Sergio makes a return with a brand new music release. This time it is all original music album You Are Error, heavily influenced with the video game music aesthetics. It also comes with a special extra. This time we have raised the stakes, and a new NES cartridge release includes not only the digitized audio, but full motion videos for each song, done in the silhouette cutout style similar to the famous Bad Apple video. Yet again, this project is crowdfunded via Kickstarter. It already got the asked amount in a mere 7 hours, but there is still a little time to jump on the bandwagon and get yourself a copy. In the meantime I would like to share an insight on the technical side of both projects.

Read more
Total votes 4: ↑4 and ↓0+4
Comments0

High-Quality Text-to-Speech Made Accessible, Simple and Fast

Reading time8 min
Views10K

image


There is a lot of commotion in text-to-speech now. There is a great variety of toolkits, a plethora of commercial APIs from GAFA companies (based both on new and older technologies). There are also a lot of Silicon Valley startups trying to ship products akin to "deep fakes" in speech.


But despite all this ruckus we have not yet seen open solutions that would fulfill all of these criteria:


  • Naturally sounding speech;
  • A large library of voices in many languages;
  • Support for 16kHz and 8kHz out of the box;
  • No GPUs / ML engineering team / training required;
  • Unique voices not infringing upon third-party licenses;
  • High throughput on slow hardware. Decent performance on one CPU thread;
  • Minimalism and lack of dependencies. One-line usage, no builds or coding in C++ required;
  • Positioned as a solution, not yet another toolkit / compilation of models developed by other people;
  • Not affiliated by any means with ecosystems of Google / Yandex / Sberbank;

We decided to share our open non-commercial solution that fits all of these criteria with the community. Since we have published the whole pipeline we do not focus much on cherry picked examples and we encourage you to visit our project GitHub repo to test our TTS for yourself.

Total votes 5: ↑5 and ↓0+5
Comments8

Short-lived Music or MuseScore Code Analysis

Reading time11 min
Views828

Having only programming background, it is impossible to develop software in some areas. Take the difficulties of medical software development as an example. The same is with music software, which will be discussed in this article. Here you need an advice of subject matter experts. However, it's more expensive for software development. That is why developers sometimes save on code quality. The example of the MuseScore project check, described in the article, will show the importance of code quality expertise. Hopefully, programming and musical humor will brighten up the technical text.

Читать далее
Total votes 2: ↑1 and ↓10
Comments0

Modern Portable Voice Activity Detector Released

Reading time3 min
Views5.4K

image


Currently, there are hardly any high quality / modern / free / public voice activity detectors except for WebRTC Voice Activity Detector (link). WebRTC though starts to show its age and it suffers from many false positives.


Also in some cases it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically personal data is considered to be private / sensitive if it contains (i) a name (ii) some private ID. Name recognition is a highly subjective matter and it depends on locale and business case, but Voice Activity and Number Detection are quite general tasks.


Key features:


  • Modern, portable;
  • Low memory footprint;
  • Superior metrics to WebRTC;
  • Trained on huge spoken corpora and noise / sound libraries;
  • Slower than WebRTC, but fast enough for IOT / edge / mobile applications;
  • Unlike WebRTC (which mostly tells silence from voice), our VAD can tell voice from noise / music / silence;
  • PyTorch (JIT) and ONNX checkpoints;

Typical use cases:


  • Spoken corpora anonymization;
  • Can be used together with WebRTC;
  • Voice activity detection for IOT / edge / mobile use cases;
  • Data cleaning and preparation, number and voice detection in general;
  • PyTorch and ONNX can be used with a wide variety of deployment options and backends in mind;
Read more →
Rating0
Comments0

Modern Google-level STT Models Released

Reading time2 min
Views5.2K


We are proud to announce that we have built from ground up and released our high-quality (i.e. on par with premium Google models) speech-to-text Models for the following languages:


  • English;
  • German;
  • Spanish;

You can find all of our models in our repository together with examples, quality and performance benchmarks. Also we invested some time into making our models as accessible as possible — you can try our examples as well as PyTorch, ONNX, TensorFlow checkpoints. You can also load our model via TorchHub.


PyTorch ONNX TensorFlow Quality Colab
English (en_v1) link Open In Colab
German (de_v1) link Open In Colab
Spanish (es_v1) link Open In Colab
Read more →
Total votes 9: ↑9 and ↓0+9
Comments0

Juggling work and study at ITMO University: CS edition

Reading time3 min
Views1.2K
We talked to the graduates of the Speech Information Systems MA program at ITMO about the ways our university helped jumpstart their careers. [More stories from our startups]:


Read more →
Total votes 5: ↑4 and ↓1+4
Comments0

Bluetooth stack modifications to improve audio quality on headphones without AAC, aptX, or LDAC codecs

Reading time7 min
Views69K
Before reading this article, it is recommended to read the previous one: Audio over Bluetooth: most detailed information about profiles, codecs, and devices / по-русски

Some wireless headphone users note low sound quality and lack of high frequencies when using the standard Bluetooth SBC codec, which is supported by all headphones and other Bluetooth audio devices. A common recommendation to get better sound quality is to buy devices and headphones with aptX or LDAC codecs support. These codecs require licensing fees, that's why devices with them are more expensive.

It turns out that the low quality of SBC is caused by artificial limitations of all current Bluetooth stacks and headphones' configuration, and this limitation can be circumvented on any existing device with software modification only.
Read more →
Total votes 22: ↑21 and ↓1+20
Comments3

Audio over Bluetooth: most detailed information about profiles, codecs, and devices

Reading time24 min
Views311K
XKCD comic. How standards proliferate. SITUATION: there are 14 competing standards. Geek: 14?! Ridiculous! We need to develop one universal standard that covery everyone's use cases. Geek's girlfriend: yeah! SOON: Situation: there are 15 competing standards.

This article is also available in Russian / Эта статья также доступна на русском языке

The mass market of smartphones without the 3.5 mm audio jack changed headphones industry, wireless Bluetooth headphones have become the main way to listen to music and communicate in headset mode for many users.
Bluetooth device manufacturers rarely disclose detailed product specifications, and Bluetooth audio articles on the Internet are contradictory and sometimes incorrect. They do not tell about all the features, and often publish the same false information.
Let's try to understand the protocol, the capabilities of Bluetooth stacks, headphones and speakers, Bluetooth codecs for music and speech, find out what affects the quality of the transmitted audio and the delay, learn how to capture and decode information about supported codecs and other device features.

TL;DR:

  • SBC codec is OK
  • Headphones have their own per-codec equalizer and post processing configuration
  • aptX is not as good as the advertisements say
  • LDAC is a marketing fluff
  • Voice audio quality is still low
  • Browsers are able to execute audio encoders compiled to WebAssembly from C using emscripten, and they won't even lag.

Total votes 24: ↑23 and ↓1+22
Comments8

Designing Sound for Pathfinder: Kingmaker

Reading time16 min
Views3.3K


Pathfinder: Kingmaker (PF:K for short) is a role-playing video game created by Owlcat Games, released in Fall 2018 on Steam and GoG. Inspired by classic Bioware games, this project uses a popular board game system ruleset, combat takes place in Real-Time with Pause, follows an isometric camera, and has a non-linear story with multiple unique endings.


In this article, I will share a little about how we worked on designing the audio throughout the game’s development including task management, the search for inspiration, and troubleshooting. An experienced specialist may not find anything particularly groundbreaking in this recap, but beginners and enthusiasts will definitely discover some points of interest.

Total votes 32: ↑32 and ↓0+32
Comments0

PC Speaker To Eleven

Reading time12 min
Views35K
Known now as a «motherboard speaker», or just «beeper», PC Speaker has been introduced in 1981 along with the first personal IBM computer. Being a successor of the big serious computers for serious business, it has been designed to produce very basic system beeps, so it never really had a chance to shine bright as a music device in numerous entertainment programs of the emerging home market. Overshadowed by much more advanced sound chips of popular home game systems, quickly replaced with powerful sound cards, it mostly served as a fallback option, playing severely downgraded content of better sound hardware.

«System Beeps» is a music album in shape of an MS-DOS program that features original music composed for PC Speaker using the same basic old techniques like ones found in classic PC games. It follows the usual retro computing demoscene formula — take something rusty and obsolete, and push it to eleven — and attempts to reveal the long hidden potential of this humble little sound device. You can hear it in action and form an opinion on how successful this attempt was at Bandcamp, or in the video below. The following article is an in-depth overview of the original PC Speaker capabilities and making of the project, for those who would like to know more.

Total votes 34: ↑32 and ↓2+30
Comments3

Authors' contribution