Music is math, yeah, sure… but can math reproduce analog audio signals?

A bit of background: I am, by academic training, a failed poet. On the basis of that training, I became a hippie musician/songwriter. I stand second to few in my embrace of willfully undisciplined, fly-by-the-seat-of-intuition artsy-fartsyness. But, somehow, in all that liberal education, I also picked up an appreciation of logic and the Scientific Method. Two very different hats.

When I’m making music, I’m all about vague and indefinable stuff. I act like I believe the artistic muses are real entities (for me, it’s a very useful model; in this realm, wearing my ‘artist’s hat,’ I don’t much care about the literal truth, as long as my perhaps ‘fanciful’ concept helps me describe/predict actual experience). I play it by feelings and hunches and seemingly wild shots in the dark.

But when I’m dealing with gear and technology, clearly that artiste’s beret is a bad fit. And that’s where I bust out the ol’ logic and clear thinking.

OK, that’s out of the way.

Recently, in the social media arena revolving around studio recording, a community member posted his thoughts about how mathematics is inadequate for describing and dealing with analog audio electrical signals. Not unusual in that milieu, he seemed to be largely ignorant of how integral mathematics is to the design and building of analog audio gear. Worse, he seemed to have little understanding that mathematics is a key component to how we describe and understand the entirety and complexity of all of our universe, from the microcosm to the macrocosm.

People who don’t understand the underlying mathematics tend to equate the individual samples of digital audio with the individual frames of film, which ‘fools the brain’ with a succession of still images; the faster the frame rate (up to a point), the smoother the motion appears. But film never creates a truly continuous moving image. It is a series of discrete images presented to the eye.

Digital audio, on the other hand, does produce a continuous wave. Where’s the ‘missing data’ ‘between the samples’? It’s above the agreed-upon band limit. If we want to serve up a signal accurate to, say, 30 kHz (to ‘entertain’ our cats and dogs), we need a sample rate of more than twice that, a minimum of something over 60 kHz, typically with extra margin above the band limit to accommodate a practical, realizable antialias filter.
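To make the ‘no missing data between the samples’ point concrete, here’s a minimal sketch of band-limited reconstruction (Whittaker–Shannon sinc interpolation). The 64 kHz sample rate, the 5 kHz test tone, and the buffer length are all hypothetical numbers chosen for illustration; the point is that a tone inside the band limit can be recovered between the sample instants, not just at them.

```python
import math

def sinc(x):
    # normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

fs = 64_000.0   # hypothetical sample rate, comfortably above 2 x 30 kHz
f = 5_000.0     # hypothetical test tone, well inside the band limit
N = 400         # number of samples kept (a short buffer, for illustration)

samples = [math.sin(2 * math.pi * f * n / fs) for n in range(N)]

def reconstruct(t):
    # Whittaker-Shannon interpolation: the band-limited signal between
    # sample instants is a sinc-weighted sum of the samples themselves
    return sum(s * sinc(t * fs - n) for n, s in enumerate(samples))

# evaluate exactly halfway between two sample instants, mid-buffer
t = 200.5 / fs
error = abs(reconstruct(t) - math.sin(2 * math.pi * f * t))
print(error)  # tiny: only finite-buffer truncation error remains
```

The small residual error here comes purely from using a finite buffer; with an infinite sum the reconstruction is mathematically exact for any signal below half the sample rate.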

There is no ‘fooling the ear’ with such a signal — the ear is not creating the ‘illusion’ of continuous analog signal coming out the speakers — the analog signal arriving at the speakers is, indeed, just as continuous as the original signal. If the process was performed properly and accurately, the results should be a precise duplicate — up to the filter boundary. If we set that filter boundary above the threshold of the listener’s hearing, we have a system capable of reproducing an electrical analog audio signal with far greater precision than any previous audio transcription system.

(In addition to conflating digital audio with film, some also seem to mix up digital transcription with the very different, and very complex, issues revolving around lossy perceptual encoding, as with MP3, Vorbis, AAC, etc. Those systems very definitely are designed to ‘fool’ the human auditory system by discarding data the governing algorithms predict to be ‘unnecessary’ to recreate a perceived semblance of the original signal. The more data that is thrown out, the greater the chance the changes will be perceived. Lower rates can be almost painfully obvious, but once we get to around 1/5 to 1/4 retention of the data (320 kbps is roughly 1/4.4 of the data bandwidth of CD-A), the ability to differentiate the lossy format from the original is found to be quite rare, even among trained listeners. But it IS, indeed, ‘fooling’ the ear. Unlike non-lossy PCM audio.)
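The 1/4.4 figure above is simple arithmetic on the bit rates. CD-A carries uncompressed PCM at 44,100 samples per second, 16 bits per sample, two channels:

```python
# CD audio (lossless PCM transcription): sample rate x bit depth x channels
cd_bps = 44_100 * 16 * 2   # = 1,411,200 bits per second
mp3_bps = 320_000          # a high-quality lossy rate, in bits per second

print(cd_bps)              # 1411200
print(cd_bps / mp3_bps)    # ~4.41, i.e. the lossy file retains ~1/4.4 of the data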

Roughly a century of testing has gone into the current scientifically accepted understanding of the upper limits of human hearing. There is a large body of direct perception testing (can one hear a given tone, by itself, at a given level?) as well as indirect testing (comparing program material against the same material with narrower frequency band limits applied, to see whether the difference between the signals can be perceived).

The scientists who have studied our hearing during the modern era have set the nominal limit of human audio perception at 20 kHz, although by adulthood most humans fall considerably under that, while the very young may perceive somewhat above that threshold. However, I’m not aware of any accepted work offering solid evidence of an ability to perceive above 22 kHz. There have been some seemingly ‘tantalizing’ outlier experimental findings, but the circumstances of those findings have drawn much questioning and not a little criticism, when they haven’t been rejected outright in professional review.

The overwhelming mountain of data so far collected suggests the nominally accepted limits are reasonable and realistic. If someone wants to overturn that paradigm, it behooves them to build a more persuasive body of evidence supporting their claims. Until then, the reasonable assumption flows from currently held data.

What is RMS?

Earlier I was reminded by a Facebook friend of ’60s/’70s blues rockers Canned Heat’s old “RMS is Truth!” bumper stickers. Someone in the thread asked for a translation, and the original poster recounted the Heat’s advocacy of RMS amplifier power-output ratings vis-à-vis other, more ‘marketable’ spec standards.

Root-mean-square: square the signal, average the squares, and take the square root. It’s a power-averaging method. As noted, it was used by electronics manufacturers with some integrity, while many others used ‘BS’ ratings like ‘peak power.’
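The definition really is just those three steps in order. A minimal sketch, using a pure sine and a square wave as test signals:

```python
import math

def rms(samples):
    # root-mean-square: square each sample, average, take the square root
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# A sine wave's RMS is its peak divided by sqrt(2), i.e. ~0.707 of peak
one_cycle = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
print(rms(one_cycle))  # ~0.7071

# A square wave at the same peak has RMS equal to the peak
square = [1.0 if s >= 0 else -1.0 for s in one_cycle]
print(rms(square))  # 1.0
```

The square-wave case hints at why heavily limited ‘loudness wars’ masters measure hotter: the squarer the waveform, the higher the RMS for the same peak level.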

One place where you may see it increasingly is in ‘smart levelling’ systems for multimedia playback. 

We’ve all likely had the experience of listening to some nice, classic Billie Holiday at a comfortable level on shuffle play, and then some Skrillex comes on and pins our eyeballs into the backs of our skulls and our ears into the next dimension.

The deal is that digital formats have a ‘maximum peak loudness’ (0 dB FS [full scale]) that is essentially the loudest the device can put out. Some material is loud all the way through (or mostly), like Skrillex — the producers have intentionally used extreme settings on one or more compressors/limiters to make all the quiet bits as loud, or nearly as loud, as the loud bits. (Now, Skrillex has ‘dynamics’ in the sense that it’s almost full loudness through most of the song but has big level ‘drops’ inserted strategically through the course of the song, presumably to keep attention in focus, since the music is otherwise so mind-numbing. J/K.)

So, even if that Billie Holiday record’s loudest bits are every bit as loud as the Skrillex, on average, the level is much much lower.

How much? Such average levels are typically measured in dB RMS. A difference of 6 dB corresponds to a doubling of signal voltage, and is often used as a rough rule of thumb for a doubling of perceived volume.

That Billie Holiday record might have an RMS average level of, let’s say, -18 dB RMS (particularly if it was mastered for CD before the ‘loudness wars’ era of the late ’90s on). Skrillex’s big hit (something about Sprites I think) has an RMS average of around -6 dB RMS.

That means that the Skrillex record will seem to you to be about FOUR TIMES as loud as the Billie Holiday. (12 dB difference; each 6 dB is about equivalent to doubling perceived loudness.)
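The arithmetic behind that FOUR TIMES can be sketched in a couple of lines. The -18 and -6 dB RMS figures are the illustrative numbers from above, and the 6-dB-per-doubling factor is the rule of thumb this article uses, not a precise psychoacoustic law:

```python
def loudness_ratio(db_difference, db_per_doubling=6.0):
    # rule of thumb from the text: every 6 dB of RMS difference
    # is treated as roughly one doubling of perceived loudness
    return 2 ** (db_difference / db_per_doubling)

billie_rms_db = -18.0   # hypothetical quiet-era Billie Holiday master
skrillex_rms_db = -6.0  # hypothetical loudness-wars Skrillex master

diff = skrillex_rms_db - billie_rms_db   # 12 dB
print(loudness_ratio(diff))              # 4.0 -> about four times as loud
```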

Awesome, eh?

Audio Production Neologism: Homodulation

Homodulation – noun – In modern recording-studio practice, the removal of any idiosyncrasy, character, individuality, or other perceived ‘imperfection’ from the recorded product.


You heard it here first. (Actually, I just posted this on Gearslutz, but you heard it from me first. Well, unless you’re coming here from somewhere else long in the future when my new coinage is a commonplace in the studio world. Uh huh.)