How (and Why) Your Digital Audio Sounds Awful
What on Earth does sample-rate mean? What’s a bit depth? Am I adopted? I’m here to answer most of these questions to the best of my knowledge.
Hi. My name is Mosh, and I’m a hopeless audiophile. I’ll spare you the details, but mostly it just means that I have super strong opinions and complain a lot about music production and sound quality. I didn’t use to be like this. I was always the life of the party, from a safe corner looking awkwardly out. But then I went to college and everything changed. The moral of the story is: if you want to be forever branded as a nerd, get an engineering degree. Who would have thought?
At any rate, I’m here today to try and educate you all (and possibly clear up some misconceptions) on the basics of something you, the readers of this hallowed blog, most likely make use of every day: digital audio files. This isn’t going to be some in-depth nerdfest, I just want to cover the basics so that everyone knows what they’re putting into their ear canals… I mean, up to a point and within certain limitations. You filthy individuals. Here we go…
This is an analog audio wave. It is the visual representation of how sound exists in the real world; what your ears actually hear. An analog wave is a continuous line, and every line is made up of an infinite number of points (along the “time” axis) that can take an infinite number of values (along the “amplitude” axis). When you want to convert an analog wave into digital, this presents a slight problem. See, you think computers are smart, but in reality they’re dumb enough that they can only work with finite things: bits. To turn something infinite into bits, analog-to-digital conversion is based around two key concepts: sampling and quantization.
Sampling
In short: sampling means taking your analog signal and checking the amplitude every x seconds. If you take 100 values over 10 seconds, you’re sampling at 10 samples per second (or at a 10 Hz rate). Now, the important detail: this process is fully reversible. In order to play back digitized files for your ears to comprehend, you need to be able to convert the files back into analog signals, and this can be done without losing any information whatsoever as long as you sample above what’s known as the Nyquist rate. The gist of it is, if you sample with a rate that’s more than double the maximum frequency in your original signal, you will be able to recover it flawlessly. Since the human ear can hear up to about 20 kHz (usually less), the standard sampling rate for files that go on CDs is 44.1 kHz, a bit over double that. Makes sense, right? Alright, stay with me for a bit longer…
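If you’d rather see sampling than read about it, here is a minimal Python sketch. The function name `sample_tone` and the test frequencies are made up for illustration, and it samples an idealized cosine rather than real audio. It reproduces the “100 values over 10 seconds” example, and then shows what goes wrong when you ignore the double-the-frequency rule: at a 10 Hz rate, a 9 Hz tone produces exactly the same samples as a 1 Hz tone, so there is no way to tell them apart afterwards.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, duration_s):
    """Sample an ideal cosine tone once every 1/sample_rate_hz seconds."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

# 10 seconds at a 10 Hz rate gives 100 values.
samples = sample_tone(freq_hz=1, sample_rate_hz=10, duration_s=10)
print(len(samples))  # 100

# A 1 Hz tone is safely below half the sample rate (5 Hz); a 9 Hz tone is not.
low = sample_tone(1, 10, 10)
high = sample_tone(9, 10, 10)

# Both tones land on the same sample values: the 9 Hz tone "aliases" to 1 Hz.
indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(low, high)
)
print(indistinguishable)  # True
```

That second result is the whole reason the sample rate has to be chosen with the highest frequency in mind.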
Quantization
So, now we have our analog wave made into a bunch of samples… but these samples still have amplitudes with an infinite number of values (not good for your computer). This leads to a non-reversible process known as quantization. Let’s say you would like each amplitude to be matched to a specific value, or a combination of bits that your computer could use. So you would take each infinite-amplitude’d sample and compare it to your pre-defined set of different values (levels) to see which one it’s closest to: the closest finite amplitude is now that sample’s amplitude, simple as that. In the image you can see a sampled wave before and after quantization to get a clearer idea of what I’m talking about.
This means that we’re purposefully introducing a degree of error into the process, which translates into (controllable) noise. The thing is, the higher the number of bits you use for this process, the more levels you’ll have and the more accurate (less noisy) your quantized samples will be. CDs use 16 bits (this is what’s known as bit-depth), which means there are two to the power of 16 different levels (65,536 to be exact) to quantize samples to, with virtually no discernible noise. For comparison’s sake, an 8-bit song only has 256, which is why 8-bit songs sound so characteristically… well, bad.
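To put some numbers on that, here is a rough Python sketch of a quantizer. It assumes amplitudes between -1 and 1 and evenly spaced levels, which is a simplification (real converters lay out their levels a little differently), and the names `quantize` and `worst_error` are just for illustration. It snaps each sample to the nearest level and reports the biggest error it sees over one cycle of a sine wave.

```python
import math

def quantize(x, bits):
    """Snap an amplitude in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)               # distance between neighbouring levels
    index = round((x + 1.0) / step)         # nearest level, counted from the bottom
    index = max(0, min(levels - 1, index))  # stay inside the available levels
    return index * step - 1.0

def worst_error(bits, n_samples=1000):
    """Largest quantization error over one cycle of a full-scale sine wave."""
    worst = 0.0
    for n in range(n_samples):
        x = math.sin(2 * math.pi * n / n_samples)
        worst = max(worst, abs(quantize(x, bits) - x))
    return worst

print(2 ** 8, 2 ** 16)  # 256 65536
print(worst_error(8))   # never more than half a step, i.e. 1/255
print(worst_error(16))  # far smaller: 256 times as many levels
```

The error can never exceed half the gap between two levels, so every extra bit halves the worst case.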
To wrap this up: after this whole process, a non-compressed stereo audio file (that is, two separate channels) with a bit-depth of 16 and a sample-rate of 44,100 Hz would be a series of 16-bit blocks arranged one after the other, adding up to 2 x 16 x 44,100 = 1,411,200 bits per second (roughly 1.4 Mbit/s). This is the bit-rate of a CD-quality WAV file, and it makes files take up space on your hard drive reeeeeeal quick, which is why compression is used… but that’s for another time.
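The arithmetic above is easy to check for yourself. This is a small Python sketch with hypothetical helper names (`pcm_bit_rate`, `pcm_size_bytes`), and the three-minute song is an assumed example. It only counts the raw audio data, ignoring the small header a real WAV file adds on top.

```python
def pcm_bit_rate(channels, bit_depth, sample_rate_hz):
    """Bits per second of uncompressed audio."""
    return channels * bit_depth * sample_rate_hz

def pcm_size_bytes(channels, bit_depth, sample_rate_hz, seconds):
    """Bytes of raw audio data for a clip of the given length (8 bits per byte)."""
    return pcm_bit_rate(channels, bit_depth, sample_rate_hz) * seconds // 8

# Stereo, 16-bit, 44.1 kHz: the CD-quality case from the text.
cd_rate = pcm_bit_rate(channels=2, bit_depth=16, sample_rate_hz=44_100)
print(cd_rate)  # 1411200

# An assumed three-minute (180 s) song at that rate.
song_bytes = pcm_size_bytes(2, 16, 44_100, seconds=180)
print(song_bytes)  # 31752000, i.e. nearly 32 MB
```

So a single uncompressed three-minute track already eats up almost 32 MB.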
Hopefully you’ve made it this far having understood the key concepts, and I think this is more than enough for a first foray into perpetual dorkery. Thanks for reading, Mosh out.