PC Speaker To Eleven
«System Beeps» is a music album in the form of an MS-DOS program. It features original music composed for the PC Speaker using the same basic techniques found in classic PC games. It follows the usual retro computing demoscene formula (take something rusty and obsolete, and push it to eleven) and attempts to reveal the long-hidden potential of this humble little sound device. You can hear it in action at Bandcamp or in the video below, and form your own opinion on how successful the attempt was. The following article is an in-depth overview of the original PC Speaker capabilities and the making of the project, for those who would like to know more.
PC Speaker Internals
The PC Speaker is a small magnetically driven loudspeaker or, in more recent times, a piezoelectric buzzer. It is driven directly by a channel of the 8253 Programmable Interval Timer (PIT), which divides an incoming clock frequency of 1.19 MHz by a programmed 16-bit value. To produce sound, the timer is set to the square wave generator mode. This is a small but handy improvement over contemporary low-cost home computers such as the Apple II and ZX Spectrum, where the speaker was connected directly to an I/O line and all sound had to be produced by the CPU. With the PIT, the CPU just sets up a frequency and continues with other tasks while the timer generates the sound. The trade-off is that a CPU-driven speaker can produce much more interesting multi-timbral polyphonic sounds, while the PC Speaker on its own, without major aid from the CPU, cannot sound loud or quiet, soft or sharp: it is limited to a plain square wave without volume control.
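The relation between a desired pitch and the programmed 16-bit value can be sketched as follows. This is only an illustration: the exact clock value of 1,193,182 Hz and the clamping range are assumptions of the sketch, and the function names are hypothetical.

```python
PIT_CLOCK_HZ = 1_193_182  # assumed exact value of the ~1.19 MHz PIT input clock

def pit_divider(freq_hz: float) -> int:
    """Return the 16-bit divider that best approximates freq_hz."""
    divider = round(PIT_CLOCK_HZ / freq_hz)
    # Assumption of this sketch: keep the value within 2..65535.
    return max(2, min(65535, divider))

def actual_frequency(divider: int) -> float:
    """Frequency the timer really produces for a given divider."""
    return PIT_CLOCK_HZ / divider

if __name__ == "__main__":
    d = pit_divider(440.0)
    print(d, actual_frequency(d))
```

Because the divider is an integer, the produced frequency is only an approximation of the requested one.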
To play sound effects or music, the CPU updates the state of the timer, that is, sound on/off and pitch, at evenly spaced intervals. Usually this is done in the interrupt handler of Timer Channel 0 (IRQ0), which is driven by the same PIT chip. The default interrupt rate is 18.2 Hz, but games often raise it to anywhere from 30 to 200 Hz, depending on the game. This helps to improve sound complexity a little.
The PC Speaker can effectively play tones in the 100...2000 Hz range. It can go well below or above that, but this brings out some issues. Small loudspeakers and piezos can't reproduce frequencies outside this range well enough. Above 2000 Hz the pitch also starts to deviate noticeably from musical note frequencies because of the timer resolution. Below 100 Hz the tone frequency can drop below the update rate, so some sound updates get lost: the timer only loads a new frequency divider once the current count with the previous divider is done.
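The effect of the timer resolution on tuning can be estimated with a short calculation. The sketch below assumes equal temperament with A4 = 440 Hz and a PIT clock of 1,193,182 Hz (neither is stated above), and measures the pitch error in cents after rounding the divider to an integer.

```python
import math

PIT_CLOCK_HZ = 1_193_182  # assumed exact PIT input clock

def note_frequency(midi_note: int) -> float:
    """Equal-tempered frequency, assuming MIDI note 69 = A4 = 440 Hz."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)

def cents_error(freq_hz: float) -> float:
    """Pitch error in cents caused by rounding the divider to an integer."""
    divider = round(PIT_CLOCK_HZ / freq_hz)
    actual = PIT_CLOCK_HZ / divider
    return 1200 * math.log2(actual / freq_hz)

def worst_error_in_octave(first_midi_note: int) -> float:
    """Largest absolute error among the 12 notes starting at first_midi_note."""
    return max(abs(cents_error(note_frequency(n)))
               for n in range(first_midi_note, first_midi_note + 12))

if __name__ == "__main__":
    for first in (48, 72, 96):  # octaves starting at C3, C5, C7
        print(first, round(worst_error_in_octave(first), 3))
```

The higher the note, the smaller the divider, so a rounding step of one timer tick becomes a larger fraction of the period.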
The PC Speaker is a monophonic device: it has only one sound channel or «voice», that is, it can play only one frequency at a time. This is very limiting for musical purposes, so game developers and music composers kept looking for ways to achieve true or fake polyphony. Three common approaches have been developed and used over time.
In the first approach the PC Speaker plays a square wave as usual, and the player code updates its pitch in the timer interrupt at a set rate, putting almost no load on the CPU. However, there are 2-3 virtual sound channels playing different parts, only one of which is output to the actual speaker at a given moment, and they alternate with each update. With two channels, for example, on one update the first channel is directed to the speaker (its frequency divider and speaker on/off state are sent to the sound registers), and on the next update the second one is. That's how a kind of polyphony is achieved in Lotus III and Xenon 2. Commonly one channel plays the bass part while the other plays the melody, and these parts are far apart in frequency, which causes large pitch jumps; or one channel plays a note while the other is muted. In both cases the result is an unpleasant constant buzz. It can be reduced by omitting muted notes from the music altogether, as Golden Axe does, but this severely limits musical expression (pauses are as important as notes in music). Another way is to avoid alternating channels whenever possible, for example when only one of the channels plays a note. This makes such parts sound cleaner, as heard in Stunts.
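The channel alternation described above can be sketched as follows. This is an illustrative model, not code from any of the named games: each channel is a hypothetical list of per-update frequencies, with None meaning a muted channel, and the `skip_muted` flag models the refinement of not alternating when only one channel plays.

```python
def interleave_channels(channels, ticks, skip_muted=False):
    """Pick one virtual channel per update tick for the single speaker.

    channels: list of per-tick frequency lists (None = channel muted).
    skip_muted: if True, alternate only among channels that are playing.
    """
    output = []
    for tick in range(ticks):
        if skip_muted:
            active = [ch[tick] for ch in channels if ch[tick] is not None]
            output.append(active[tick % len(active)] if active else None)
        else:
            # Plain alternation: muted slots reach the speaker as silence.
            output.append(channels[tick % len(channels)][tick])
    return output

if __name__ == "__main__":
    bass = [110, 110, 110, 110]
    melody = [880, 880, None, None]
    print(interleave_channels([bass, melody], 4))
    print(interleave_channels([bass, melody], 4, skip_muted=True))
```

With plain alternation the muted melody slot chops the bass note, which is one source of the buzz; with `skip_muted` the bass plays uninterrupted once the melody stops.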
The second approach is seen in many LucasArts games. The company often put extra effort into making the PC Speaker versions of their music sound better. These usually play a mostly monophonic line in which the melodic, bass, and rhythmic parts are separated in time. A second channel plays very short staccato notes, such as arpeggios or drums, that take priority and briefly mute the main channel. Examples can be found in Monkey Island, Loom, and Indiana Jones. One of the most advanced uses of this technique is heard in Zak McKracken and the Alien Mindbenders. This creates a much more realistic and pleasant illusion of polyphony, although it is quite limited in capabilities.
In the third and last approach, the PC Speaker hardware does not generate sound on its own; instead it is used as a crude DAC that outputs sound generated entirely in software by the CPU. This takes a lot of CPU time, nearly maxing out an 8086, so it only gained popularity with the arrival of the 80386. The sound quality of this method is rather poor and objectively inferior to that of the simplest Covox replica made of a handful of resistors. Nevertheless, it was a truly impressive achievement for its time, so much so that one of its implementations, RealSound, was patented and sub-licensed.
The PC Speaker can be turned into a DAC in two ways. One is to disable the timer count and then switch the sound output on and off, which makes it a very basic 1-bit DAC. The other is to use the sound channel of the PIT to generate short pulses of variable width (PWM) at a carrier frequency set by the timer interrupt, which effectively gives a much better 6-bit DAC. The former can be heard in Fantasy World Dizzy and Hard Drivin'; the latter was most often employed to play digitized sample-based music, as in Pinball Dreams. It was also used in such fascinating tools of the past as TEMU and VSB, software emulators of the 3-channel Tandy sound chip and of the digital part of the Sound Blaster, which provided sound through the PC Speaker in programs that supported those devices without the devices themselves being present (a 386SX or better was required).
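The 6-bit figure can be motivated with simple arithmetic: the pulse width is counted in PIT clock ticks, so the number of distinct widths cannot exceed the clock divided by the carrier frequency. The sketch below assumes a carrier of about 18.6 kHz and a clock of 1,193,182 Hz; both numbers, and the linear sample mapping, are assumptions for illustration rather than details of any particular player.

```python
PIT_CLOCK_HZ = 1_193_182  # assumed exact PIT input clock

def pwm_levels(carrier_hz: int) -> int:
    """Number of PIT ticks, and thus pulse widths, in one carrier period."""
    return PIT_CLOCK_HZ // carrier_hz

def sample_to_pulse_width(sample_8bit: int, carrier_hz: int) -> int:
    """Map an unsigned 8-bit sample (0..255) linearly to a width of 1..levels ticks."""
    levels = pwm_levels(carrier_hz)
    return 1 + (sample_8bit * (levels - 1)) // 255

if __name__ == "__main__":
    carrier = 18_643  # hypothetical carrier frequency in Hz
    print(pwm_levels(carrier))                  # about 64 levels, i.e. 6 bits
    print(sample_to_pulse_width(128, carrier))
```

A lower carrier would give more levels but would move the carrier whine into the audible range, which is one way to read the 6-bit compromise.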
Album and How It Was Made
Originally I had no intention of making a music album. I was designing a game project that would be styled as a pseudo-graphics game of the XT era. To complete the vision, I came up with the idea of making not just a stylization, but actual PC Speaker music that would also follow the common formula of 8-bit era games: a looped track about a minute long. I didn't want to use digital samples or software synthesis, as that would be out of place for the intended aesthetics and wouldn't have a unique character of its own. I had felt a special appeal in monophonic music ever since I first heard it on the ZX Spectrum in games such as Ping Pong, Stardust, and Score 3020. Given such an opportunity, I was eager to try making actual monophonic music and to squeeze out some previously unheard result through arrangement alone, an appealing challenge to me as a music composer.
The work on the music carried me away and interested me more than the game project itself. Having a blast, I developed and published PCSPE, a VSTi plugin, and made many music sketches and a few cover versions of my older works in order to develop composing techniques suitable for monophonic arrangement and to shape the general sound design. Eventually I decided that the result did not fit the game project, which by that time had lost its appeal to me and was put on hold. However, I had some music done, and feeling that it had some release-worthy potential, I thought it would be nice to do a quick prod and release it to the public as a standalone MS-DOS music disk that would also serve as a demo of PCSPE's capabilities.
The quick prod turned out to be not so quick. A few songs became a few dozen, a collection of random songs evolved into a conceptualized album, plans kept changing, and deadlines kept slipping. It took almost 1.5 years to finish the project, from July 2017 to January 7, 2019. During this time I also managed to develop AONDEMO and compose a song for it (the AON sound hardware is a close match to the PC Speaker), as well as participate in the Planet X3 game project as a sound code and tools developer. That game also ended up using the very first track I composed for the album as its title song.
Finishing the project was delayed a lot: it kept getting more and more songs until I had used up all of the original ideas and drafts made while working on it. The final track list includes 23 songs divided into three groups.
Side A includes 16 primary songs, more or less connected in both theme and sound design. Most of these were composed for this album from scratch, or were based on old unused material that suited monophonic arrangement well. For this category I picked the songs that were the most original and felt the most successful within the chosen limitations.
Side B includes 6 songs supposedly of B-side quality. Most of these are cover versions of my older works for other platforms, or were based on more developed unused backlog material that was initially targeted at very different media, ranging from an XM tracker module to a pop punk song. This side also includes the song that was used in the Planet X3 game.
Side X includes a bonus track, adapted from AONDEMO with minimal changes. It was separated from the rest into a third side, because humor.
The album is presented as a program for MS-DOS 3.3 or above that will run on a classic IBM PC XT with at least 256K of RAM and a CGA video card. The original CGA is subject to so-called «snow», a visual artifact that occurs whenever a program writes to the video memory during the raster scan. Thus the album program comes in two flavors: a cut-down sbx.com without snow, but also without the spectrum analyzer effect, and a fully featured sb.com with complete visuals. A faster CPU is recommended to make the visuals smoother. You can also use a modern PC with FreeDOS loaded from a USB stick, or DosBox emulation.
The album does not employ any advanced tricks such as software sound synthesis or sample playback. It uses the most traditional technology: a classic monophonic square wave generated by the PIT channel. The sound update rate is 120 Hz, a tad higher than usual, but nothing extreme. A classic XT is totally up to the task, with enough CPU time left over for other tasks besides playing the music.
The real trick is a special approach to composing and arranging the music. Past experience in related areas was a great help: making music for classic sound chips, which often required putting several parts into a single channel by interleaving and overlaying elements; developing game sound engines for old game systems, where sound effects usually have to steal sound channels from the music; and composing original music to be played on a floppy drive. As it turned out, I was actually developing the approach that had been used by LucasArts, although I wasn't aware of it and had only heard the Monkey Island theme at the time.
One interesting problem that needed to be solved right away was drums and percussion. In traditional chiptune they are normally made with a mixture of tone and white noise. However, there is no way for the PC Speaker to produce white noise while keeping the sound update rate within the range of a dozen to a hundred Hz.
Kick drum and toms worked just fine without any noise, simply as a downward tone slide with different speed and duration: the kick is a rapid slide down from a lower note, the toms are a slower and longer slide down from a higher note. The most important one, the snare drum, did not work well as a simple tone slide, however: it sounds weird and does not cut through a mix busy with other elements. A trick that is common in SID and AY-3-8910 chiptune worked well here: a brief moment of silence inserted just after the slide down starts, which creates a short bouncing drum roll effect. This makes the snare sound different enough from the other percussion sounds and improves its audibility in the mix. It works especially well given the strong resonances of a real PC Speaker.
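The drum sounds described above can be modeled as per-update frequency lists. The following is a minimal sketch: the start frequencies, slide ratios, and gap position are made-up illustration values, not the ones used on the album, and the function names are hypothetical.

```python
def slide_down(start_hz, ratio_per_tick, ticks):
    """Tone slide: the frequency is multiplied by ratio_per_tick (< 1) each update."""
    return [start_hz * ratio_per_tick ** t for t in range(ticks)]

def snare(start_hz, ratio_per_tick, ticks, gap_start=1, gap_ticks=1):
    """Slide down with a brief silence (None) inserted right after the start."""
    freqs = slide_down(start_hz, ratio_per_tick, ticks)
    for t in range(gap_start, min(ticks, gap_start + gap_ticks)):
        freqs[t] = None  # speaker off for this update
    return freqs

if __name__ == "__main__":
    kick = slide_down(200.0, 0.7, 6)    # rapid slide from a lower note
    tom = slide_down(600.0, 0.92, 16)   # slower, longer slide from a higher note
    print(kick)
    print(snare(500.0, 0.8, 8))
```

At a 120 Hz update rate each list element lasts about 8.3 ms, so a one-element gap is a very short silence.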
As there is no way to produce white noise, hi-hats were a no-go. To make the percussion more interesting, drum sounds vary between songs. Some tracks have longer drums, some have short and punchy ones, sometimes they are pitched low or high, and extra percussion elements are present in places.
A number of general techniques were developed while working on the music. They can be applied to writing arrangements that give an impression of polyphony on any monophonic device without volume control, ranging from a musical postcard to a CNC machine to a Tesla coil. The arranging tricks include:
- A hearing perception trick: a presumably louder sound, such as a kick or snare drum or a note in the primary melody, mutes all other parts, but the brain does not pay much attention to the brief absence of the other sound elements.
- An arrangement that leaves enough pauses between sounds in general, especially during intro parts. This lets the separate elements be heard better, so once the arrangement gets more intense, the brain still considers those elements to be there, even though they are barely heard.
- Playing notes slightly off beat, or composing melodies that put most of their notes on weak beats or off-beats, as these places tend to have gaps, so melody notes won't interfere with the bass or other sounds. This makes melodies and backings highly syncopated, adding a special funky feel to the music.
- The usual chiptune arpeggios at different speeds, including blazing fast 120 Hz ones. A wide range of arpeggio speeds is very useful for adding variety to the sound, given the major lack of timbral variety when everything plays with a plain square wave.
- Gaps in continuous sounds to allow other parts to cut through, or series of gaps of increasing duration to imitate volume decays.
- Major variety in note durations, including extremely short notes, to imitate differences in volume. This was used to put emphasis on the bass groove and for echo effects. Echo is done by repeating the same part with a delay and with much shorter notes.
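One of the tricks listed above, a series of gaps of increasing duration to imitate a volume decay, can be sketched as an on/off gate applied to a sustained note. The tick counts and the linear growth of the gaps are assumptions chosen for illustration.

```python
def decay_gate(total_ticks, on_ticks=2, first_gap=1, gap_growth=1):
    """Return a per-update on/off pattern whose silent gaps grow over time."""
    gate = []
    gap = first_gap
    while len(gate) < total_ticks:
        gate.extend([True] * on_ticks)   # note sounding
        gate.extend([False] * gap)       # speaker off
        gap += gap_growth                # each gap is longer than the last
    return gate[:total_ticks]

if __name__ == "__main__":
    pattern = decay_gate(12)
    print("".join("#" if on else "." for on in pattern))
```

The share of time the note sounds shrinks with each repetition, which the ear can read as a fading tail even though the square wave itself never changes level.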
Another key element of the production was the use of modern tools that made the workflow much more efficient and streamlined. The tools included a modern DAW, Reaper (an alternative to FL Studio, Cubase, and the like), along with a set of custom-developed VST plugins.
This approach is experimental and quite unconventional for making «true chiptune», that is, music intended to be played on actual sound devices of the past. Usually making such music involves special software called music trackers, or manually preparing and entering bytes of music data into source code. The very same result could certainly be achieved by these classic means as well (which is exactly what I did for Planet X3), but this would take much more effort and shift the focus from creating music to overcoming technical issues along the way.
To make PC Speaker music, I created a VSTi plugin called PCSPE. It emulates the PC Speaker internals, making it possible to hear realistic sound with all of its quirks and limitations, and implements a chiptune instrument system similar to that of music trackers for various sound chips. It has envelopes for virtual volume (priority), arpeggio, and pitch, defined through text strings in a simple format much like MML (a close relative of the text strings used by the PLAY statement in BASIC). These envelopes make it possible to design different instruments, for example drum sounds, or a solo instrument with slowly increasing vibrato depth.
The main duty of the plugin, however, is mixing several incoming MIDI tracks that contain various song parts into a single final track. The virtual volume mentioned above serves as the basis of the priority system. Only the instrument with the highest current virtual volume plays at a given moment. For example, suppose a bass instrument has volume 2, the main melody has volume 6, and all drum instruments have volume 8. In this case melody notes take priority over bass notes, muting the bass part while the melody plays, and drums mute all other sounds. That's how several simultaneously playing musical parts get mixed into the monophonic channel of the PC Speaker.
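The priority rule can be illustrated with a few lines of code. This is a simplified model, not PCSPE's actual implementation: each voice is represented as a hypothetical (virtual_volume, frequency) pair or None when silent, and ties are resolved in favor of the first voice listed, which is an assumption of the sketch.

```python
def mix_tick(voices):
    """Return the frequency of the voice with the highest virtual volume.

    voices: list of (virtual_volume, frequency_hz) tuples, or None for silence.
    """
    active = [v for v in voices if v is not None]
    if not active:
        return None  # nothing plays, speaker off
    # max() returns the first maximal element, so earlier voices win ties.
    return max(active, key=lambda v: v[0])[1]

if __name__ == "__main__":
    bass = (2, 55.0)
    melody = (6, 440.0)
    drum = (8, 150.0)
    print(mix_tick([bass, None, None]))    # only the bass plays
    print(mix_tick([bass, melody, None]))  # melody mutes the bass
    print(mix_tick([bass, melody, drum]))  # drum mutes everything else
```

Running this once per sound update over all tracks yields the single stream of frequencies that the speaker can actually play.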
The plugin also features an export function that makes it possible to use the music in actual MS-DOS programs. It works like a simple data logger: the plugin knows which frequency is output to the speaker at any given moment, and the timing of the frequency changes. When a song is played from beginning to end with the export feature enabled, this data gets captured and written to a file in real time. To replay the music, all that is needed is to read the data from the file and send it to the actual PC Speaker using the recorded timings.
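The logger idea can be sketched as recording only the moments when the output changes, and then expanding that log back into a per-update stream on playback. The (tick, value) log format used here is invented for illustration; the actual file format of the plugin is not described above.

```python
def record_changes(per_tick_output):
    """Log (tick, value) pairs only when the speaker output changes."""
    log = []
    previous = object()  # sentinel that differs from any real value
    for tick, value in enumerate(per_tick_output):
        if value != previous:
            log.append((tick, value))
            previous = value
    return log

def replay(log, total_ticks):
    """Expand the change log back into one value per update tick."""
    output = []
    current = None
    pending = list(log)
    for tick in range(total_ticks):
        while pending and pending[0][0] == tick:
            current = pending.pop(0)[1]
        output.append(current)
    return output

if __name__ == "__main__":
    stream = [440, 440, None, None, 220]  # None = speaker off
    log = record_changes(stream)
    print(log)
    print(replay(log, len(stream)) == stream)
```

On real hardware, the replay loop would write a divider to the timer instead of appending to a list.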
The classic chiptune arpeggio can be created in PCSPE the classic way, using an arpeggio envelope that defines a sequence of semitone offsets from the base note. However, this requires switching between instruments quite often and keeping in mind which instrument is needed for a particular chord. This is not a very convenient workflow in a modern DAW.
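An arpeggio envelope of semitone offsets translates into frequencies as sketched below. The looping behavior and the `ticks_per_step` parameter are assumptions of this example; PCSPE's own envelope syntax is not reproduced here.

```python
def arpeggio_frequencies(base_hz, offsets, ticks, ticks_per_step=1):
    """Cycle through semitone offsets from the base note, one value per update."""
    freqs = []
    for tick in range(ticks):
        step = (tick // ticks_per_step) % len(offsets)
        # One semitone is a frequency ratio of 2 ** (1 / 12).
        freqs.append(base_hz * 2 ** (offsets[step] / 12))
    return freqs

if __name__ == "__main__":
    # Major chord on A4: root, major third (+4), fifth (+7)
    print([round(f, 2) for f in arpeggio_frequencies(440.0, [0, 4, 7], 6)])
```

With `ticks_per_step=1` and a 120 Hz update rate this gives the fastest arpeggio the player can produce; larger values slow it down.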
Another VSTi of mine, an arpeggiator called ChipArp, makes classic chiptune arpeggios much easier. It takes normal polyphonic chords from an incoming MIDI track and turns them into an arpeggio of the required configuration (all up, all down, up-down, at the required BPM) on the fly: you can just play chords on a MIDI keyboard and hear a proper arpeggio right away. Unlike modern arpeggiators designed for electronic music, my plugin does not restart a note for each arpeggio step, but implements the arpeggio as a series of quick pitch bends from the base note. This way it does not break instruments with a continuous sound flow, and you can still hear them evolve over time while being arpeggiated. The catch is that the synth plugin needs to support wide pitch bends and react to them immediately, which is not common for modern synths. However, all of my synth VSTs support this.
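The conversion from a held chord to a sequence of bends from one base note can be sketched as follows. This is a guess at the general idea rather than ChipArp's actual code: the chord is a list of MIDI note numbers, the lowest note is taken as the base (an assumption), and the result is the list of semitone offsets to bend to, in the chosen order.

```python
def chord_to_offsets(chord, pattern="up"):
    """Return (base_note, semitone offsets) for one arpeggio cycle.

    chord: MIDI note numbers held at the same time.
    pattern: "up", "down", or "up-down".
    """
    notes = sorted(chord)
    base = notes[0]
    offsets = [n - base for n in notes]
    if pattern == "down":
        offsets = offsets[::-1]
    elif pattern == "up-down":
        # Go up, then back down without repeating the top and bottom notes.
        offsets = offsets + offsets[-2:0:-1]
    return base, offsets

if __name__ == "__main__":
    c_major = [64, 60, 67]  # E4, C4, G4 in any order
    print(chord_to_offsets(c_major, "up"))
    print(chord_to_offsets(c_major, "up-down"))
```

Only one note-on for the base note is needed; every later step is a pitch bend by the listed offset, so the instrument's envelopes keep running across the arpeggio.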
Nearly all emulators, including PCSPE and things like DosBox, produce an idealized version of the square wave that sounds quite different from the real hardware. The tiny loudspeakers, intended only to produce basic beeps, have lots of strong resonances and distortions. This puts major emphasis on the transients, that is, the moments when the sound turns on or off, or when the frequency changes rapidly. Among other things, this makes drums and short notes sound way more punchy on an actual PC Speaker. To control this quirk and employ it for the greater good, I recorded impulse responses of a number of small loudspeakers and used them with a free convolver plugin called NadIR, much like impulse responses of guitar cabinets are used these days when recording heavy guitars.
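Applying an impulse response is a convolution of the clean signal with the recorded response. The sketch below uses plain direct convolution and a made-up two-sample impulse response purely for illustration; a real loudspeaker response is thousands of samples long, and a convolver plugin would normally use a faster FFT-based method.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

if __name__ == "__main__":
    square = [1.0, 1.0, -1.0, -1.0]  # one period of an idealized square wave
    toy_ir = [1.0, 0.5]              # hypothetical, not a measured response
    print(convolve(square, toy_ir))
```

Even with this toy response the output is no longer a flat-topped square: each edge of the wave leaves a tail in the following samples, which is how the speaker's coloration enters the signal.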
The project has been released under the CC-BY license. This includes the music itself, the player source code, and the Reaper projects for all the songs, so you are free to make any derivative projects, be they music or code related. All the plugins used to create the album are also freely available along with their source code:
MS-DOS album player
Source code and Reaper projects
PCSPE with source code
ChipArp with source code
Tiny speakers impulse responses