A few words before moving on to other topics…
We’ve looked at why digital samples represent ideal impulses, and why any point between samples represents a value of zero. As a result, audio samples don’t represent the audio itself, but a modulated version of the audio.
Why is it helpful to understand these points?
First, it gives clearer and more intuitive answers for why digital audio behaves the way it does than typical explanations do. For instance, it makes this puzzle trivial:
People ask why the sample rate needs to be double the highest signal frequency that we want to preserve. Often the reply is that it needs to be just above double the highest frequency of interest, to avoid aliasing. But why? And how much higher? At this point, someone mentions something about wagon wheels turning the wrong way in movies. Or shows a graph with two sine waves of different frequencies intersecting the same sample points. So unsatisfying.
If you consider that the signal is amplitude modulated in the digitization process, you need only see that the sidebands would start overlapping at exactly half the sample rate. To keep them from overlapping, all frequencies must be below half the sample rate, giving each cycle more than two samples.
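To see the overlapping sidebands in action, here’s a minimal sketch in plain Python (the sample rate and frequencies are example values): a cosine above half the sample rate produces exactly the same samples as its fold-down image. That’s what sideband overlap means in practice.

```python
import math

def sample_cosine(freq_hz, fs_hz, n_samples):
    """Sample a cosine at the given frequency and sample rate."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

fs = 48000.0     # sample rate (example value)
f = 20000.0      # in-band frequency, below fs/2
image = fs - f   # lower edge of the first sideband image (28 kHz)

a = sample_cosine(f, fs, 16)
b = sample_cosine(image, fs, 16)

# The two frequencies yield identical samples—the 28 kHz tone has
# "folded" down onto 20 kHz, indistinguishable after sampling.
assert all(abs(x - y) < 1e-9 for x, y in zip(a, b))
```

Push `f` above `fs / 2` and the roles simply reverse: the sidebands have crossed, and the alias lands in the audio band.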
And integer sample rate conversion choices become easier to make, especially for multistage conversion, which we often use to improve efficiency—like performing 8x upsampling as three 2x stages. If that sounds like three times the work, it isn’t, because the higher relative cutoffs of the filters allow far fewer coefficients, roughly balancing the total operations for a single 8x stage. But we can do better than break even by optimizing each stage—the earlier stages can be a bit sloppy as long as everything is tidy by the last stage’s output. Somewhat like cleaning a house in multiple passes versus one.
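As a rough sanity check on the multistage savings, we can estimate FIR lengths with the familiar approximation N ≈ atten / (22 · Δf/fs). The attenuation and passband figures below are assumptions for illustration, not a spec from the text—each 2x stage only needs to stop its own first image, which is what lets the later stages be so short.

```python
# Rough FIR length estimate: N ≈ atten_dB / (22 * (transition_width / fs))
def fir_length(atten_db, trans_hz, fs_hz):
    return atten_db / (22.0 * trans_hz / fs_hz)

atten = 96.0        # stopband attenuation in dB (assumed spec)
passband = 20000.0  # protect frequencies up to 20 kHz (assumed)

# Single-stage 8x from 48 kHz: stopband must begin at 24 kHz
single = fir_length(atten, 24000.0 - passband, 8 * 48000.0)

# Three 2x stages: each stage only needs to stop its first image,
# which begins at (input rate - passband)
stages = []
rate = 48000.0
for _ in range(3):
    stop = rate - passband   # where this stage's image begins
    rate *= 2                # output rate of this stage
    stages.append(fir_length(atten, stop - passband, rate))

print(round(single), [round(n) for n in stages])  # → 419 [52, 15, 11]
```

About 78 taps total versus roughly 419 for one brute-force stage, before any polyphase tricks—the wide transition bands of the later stages do most of the work.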
Perhaps this is a good place to note that you might see chatroom posts where someone says that instead of inserting zeros and filtering, they prefer to use a polyphase filter. There is no difference—a polyphase filter in this case is simply an algorithm that upsamples and filters. Any seasoned programmer will notice that there is no need to explicitly place zeros between samples, then run all samples through an FIR, because the zero samples result in a zero product; optimizing the code to skip the zero operations results in a polyphase filter.
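Here’s a minimal sketch of that equivalence in plain Python: a naive zero-stuff-then-FIR upsampler and a polyphase version that skips the multiplies by zero produce identical output. The filter and signal values below are arbitrary.

```python
def upsample2_naive(x, h):
    """Insert a zero after every sample, then run a direct-form FIR."""
    zs = []
    for s in x:
        zs.extend([s, 0.0])
    return [sum(h[k] * zs[n - k] for k in range(len(h)) if 0 <= n - k < len(zs))
            for n in range(len(zs))]

def upsample2_polyphase(x, h):
    """Same result: the even and odd taps form two half-length branches,
    each fed the original samples—the zeros never enter the math."""
    even, odd = h[0::2], h[1::2]
    out = []
    for n in range(len(x)):
        out.append(sum(even[k] * x[n - k] for k in range(len(even)) if n - k >= 0))
        out.append(sum(odd[k] * x[n - k] for k in range(len(odd)) if n - k >= 0))
    return out

x = [1.0, 0.5, -0.25]
h = [1.0, 2.0, 3.0, 4.0]
a = upsample2_naive(x, h)
b = upsample2_polyphase(x, h)
assert len(a) == len(b) and all(abs(p - q) < 1e-12 for p, q in zip(a, b))
```

The polyphase version does half the multiplies for the same answer—exactly the optimization a seasoned programmer would arrive at by skipping the zero products.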
An understanding of why we need to filter rate conversions can help us optimize DSP processes. For example, someone posted a question on a DSP board recently. They were using EQ filters designed by the bilinear transform, which have a pinching effect near half the sample rate (due to zeros at the Nyquist frequency). They didn’t need additional frequency headroom per se—the filters are linear—but they wanted to oversample by 2x to avoid the shape distortion of peaking EQ filters. (Note there are methods to reduce or avoid such distortion of the filter shape, but this is a good example.)
Let’s say we’re using a 48 kHz sample rate. Typically we’d raise the sample rate to 96k by inserting zeros every other sample and then lowpass filtering below 24k. Then we’d do our processing (EQ filtering). Finally, we’d take it back to 48k by lowpass filtering below 24k and discarding every other sample. But in this case, our processing step is linear (linear EQ filters), so it doesn’t create new frequencies. That means we can skip one of the lowpass filtering stages. It doesn’t matter whether we lowpass filter before or after the EQ processing, but we don’t need both. That’s a substantial savings.
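The claim that the lowpass can sit on either side of the EQ follows from linear time-invariant processes commuting. A toy sketch in plain Python (the filter coefficients are made up for illustration):

```python
def convolve(a, b):
    """Direct convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

x = [1.0, -0.5, 0.25, 0.0, 0.125]  # arbitrary test signal
lpf = [0.25, 0.5, 0.25]            # toy lowpass (assumed, for illustration)
eq = [1.0, 0.3]                    # toy EQ filter

# LPF→EQ and EQ→LPF give the same signal, so only one lowpass stage
# is needed in the oversampled chain when the EQ is linear.
a = convolve(convolve(x, lpf), eq)
b = convolve(convolve(x, eq), lpf)
assert all(abs(p - q) < 1e-9 for p, q in zip(a, b))
```

Since the order doesn’t matter, we keep whichever single lowpass is cheapest to run; the point is that one suffices when the in-between processing creates no new frequencies.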
Let’s say we create an audio effect such as an amp simulator, which has a non-linear process that requires running at a higher sample rate to reduce audible aliasing. We run our initial linear processes, such as tone controls, then upsample and run our non-linear processes (an overdriven tube simulation!). But in this case we conclude with a speaker cabinet simulator, which is a linear process (via convolution or another type of filtering). Guitar and bass cabinets use large speakers (typically 8” and up, often 10” or 12” for guitar), with frequency responses that drop steeply above around 5 kHz. Understanding how the downsampling process works, we might choose to eliminate the downsampling filter stage altogether, as superfluous, or at least save cycles with a simplified filter with relaxed requirements.