Here we explain how sample rate conversion works. As an essential prerequisite, you must understand the principles of sampling. Even if you understand sampling already, read our explanation of the process here. The viewpoint and terms used there are mirrored here, and are key to this explanation. Read it now, or take a quick look to refresh your memory if you’ve read it before, then come back here for the rest.
Sometimes we need to change the sample rate after the fact. We might need to match converter hardware running at a different rate, or we might want to increase the frequency headroom; for instance, frequency modulation, pitch shift, and distortion algorithms can add higher frequency components that might otherwise alias into the passband if we don’t raise the ceiling.
Let’s consider the common case of increasing the sample rate by a factor of two. Maybe we have some music recorded at 48 kHz, and want to combine it with tracks recorded at 96 kHz, with 96 kHz playback. This means we need to generate a new sample between each pair of existing samples. One way might be to set the new sample to the average of the two existing samples surrounding it. This is linear interpolation, which is essentially a poor low-pass filter, and the results aren’t very good, leaving us with a reduced high-frequency response and aliasing. There are better ways to interpolate the new sample points, but there is one that is essentially perfect, creating no new frequency components and not changing the frequency response. We can do this because the original signal was band-limited to begin with, so we know the characteristics of what lies between the sampled points: it’s essentially the shape dictated by the original low-pass filter we used in the sampling process. By reversing the process, we can get the original band-limited signal back, and resample it at twice the original rate, having lost nothing.
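To make the weakness of linear interpolation concrete, here is a minimal Python sketch. The function name `upsample2_linear` and the 12 kHz test tone are our own illustrative choices, not part of any library.

```python
import math

def upsample2_linear(x):
    """Double the rate by placing the average of each neighboring pair between them."""
    if not x:
        return []
    out = []
    for a, b in zip(x, x[1:]):
        out.append(a)
        out.append((a + b) / 2)  # midpoint of the straight line from a to b
    out.append(x[-1])  # the last sample has no right-hand neighbor
    return out

# Hypothetical test tone: a 12 kHz cosine sampled at 48 kHz (values 1, 0, -1, 0, ...).
tone = [math.cos(2 * math.pi * 12000 * n / 48000) for n in range(8)]
linear = upsample2_linear(tone)

# What a perfect interpolator would give: the same cosine sampled at 96 kHz.
ideal = [math.cos(2 * math.pi * 12000 * n / 96000) for n in range(15)]

print(round(linear[1], 4), round(ideal[1], 4))  # 0.5 0.7071
```

The first in-between sample should be about 0.7071, but the straight line between 1 and 0 gives 0.5. The existing samples are untouched, so the error lands entirely on the new samples, and it grows as the tone approaches half the original sample rate.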
We don’t need to know the exact characteristics of the original low-pass filter. It was certainly one of high enough quality to get the job done (or we’d already be suffering from the resulting aliasing), so we just use another. The obvious approach is to convert back to analog, letting the low-pass filter reconstruct the continuous signal, then resample it at twice the original rate.
Converting to analog and back to digital seems a waste—we’re already digital and would like to avoid running through hardware—and filters can be done mathematically in the digital domain. Looking at the diagram, we see that we don’t need two filters. The one with the higher cut-off frequency doesn’t do anything interesting, since the other filter removes those frequencies anyway, so we discard it and keep the lower filter.
The filter is a linear process, and we suspect that we can move it to one side, putting the D/A and A/D components next to each other, where they cancel except for the change in sample rate. Then, conceptually, we just need to double the number of samples to upsample by a factor of two, and low-pass filter. But what does that mean—should we simply repeat each sample? Average the surrounding samples? And does it make a difference whether we filter before or after? We suspect that we need to filter after the rate change, since running a full-band low-pass filter doesn’t really do anything.
We can see how to double the samples and what the filter does by looking back at how we sampled the signal in the first place. Recall that the existing samples represent the pulse-amplitude modulated result of the analog to digital conversion stage. Let’s look at that signal again:
With that in mind, the in-between samples are obvious—they are all the value zero. The result of inserting a zero between each of our existing samples is that the signal we have doesn’t change at all—and therefore neither does the spectrum—but the sample rate does. Here’s our spectrum again, before inserting zeros:
Here’s what we have after inserting a zero between each sample:
Essentially, we’ve doubled the number of samples, doubling our available bandwidth, but in doing so we’ve revealed an aliased image in the widened passband. Now it’s apparent why we need the low-pass filter—to remove the alias from the passband:
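The effect of inserting zeros can be checked numerically. The sketch below is only an illustration under our own assumptions: a 16-sample test cosine, a naive DFT written out by hand, and hypothetical helper names `dft` and `insert_zeros`.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform; fine for a short demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def insert_zeros(x, factor=2):
    """Place factor - 1 zeros after every sample."""
    out = [0.0] * (len(x) * factor)
    out[::factor] = x
    return out

# Hypothetical test signal: a cosine that completes 2 cycles in 16 samples.
x = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
y = insert_zeros(x)

mag_x = [abs(v) for v in dft(x)]
mag_y = [abs(v) for v in dft(y)]

# Bins holding significant energy.
print([k for k, m in enumerate(mag_x) if m > 1.0])  # [2, 14]
print([k for k, m in enumerate(mag_y) if m > 1.0])  # [2, 14, 18, 30]
```

The 32-bin spectrum of the zero-stuffed signal is the original 16-bin spectrum repeated twice, so nothing in the signal has changed. What has changed is the passband: it used to end at bin 8 and now runs up to bin 16, so the component at bin 14 is the image that the low-pass filter has to remove.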
That’s it—to double the sample rate, we insert a zero between each sample, and low-pass filter to clear the extended part of the audio band. Any low-pass filter will do, as long as you pick one steep enough to get the job done, removing the aliased copy without removing much of the existing signal band. Most often, a linear phase FIR filter is used—performance is good at the relatively high cut-off frequency, phase is maintained, and we have good control over its characteristics.
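As a rough sketch of this recipe, the Python below inserts zeros and then applies a linear phase FIR low-pass. The specifics are our assumptions rather than anything prescribed above: a 31-tap windowed-sinc design with a Hamming window, the hypothetical names `lowpass_fir`, `convolve`, and `upsample2`, and a gain of 2 to make up for the level lost to the inserted zeros.

```python
import math

def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass; cutoff is a fraction of the sample rate (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        taps.append(ideal * window)
    return taps

def convolve(x, h):
    """Direct-form FIR filtering; returns len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # zero-valued samples contribute nothing
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def upsample2(x, num_taps=31):
    """Double the sample rate: insert zeros, then low-pass at the old Nyquist."""
    stuffed = [0.0] * (2 * len(x))
    stuffed[::2] = x  # original samples at even positions, zeros between
    # Cutoff 0.25 of the new rate is half the old rate. The factor of 2
    # restores the level that the inserted zeros took away.
    taps = [2.0 * c for c in lowpass_fir(num_taps, 0.25)]
    return convolve(stuffed, taps)
```

With an odd tap count the output is delayed by `(num_taps - 1) / 2` samples at the new rate (15 for the default). Because the cutoff sits at exactly one quarter of the new rate, every second tap is essentially zero, so the original samples come through unchanged and only the in-between samples are computed from their neighbors. A longer filter or a different window would trade computation for a steeper transition.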
The process of reducing the sample rate—downsampling—is similar, except we low-pass filter first, to reduce the bandwidth, then discard samples. The filter stage is essential, since the signal will alias if we try to fit it into a narrower band without removing the portion that can’t be encoded at the lower rate. So, we set the filter cut-off to half the new, lower sample rate, then simply discard every other sample for a 2:1 downsample ratio. (Yes, the result will be slightly different depending on whether you discard the odd samples or even ones. And no, it doesn’t matter, just as the exact sampling phase didn’t matter when you converted from analog to digital in the first place.)
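Downsampling by two can be sketched the same way. As before, this is an illustration under our own assumptions (a 31-tap Hamming-windowed sinc filter and the hypothetical names `lowpass_fir` and `downsample2`), not a tuned implementation.

```python
import math

def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass; cutoff is a fraction of the sample rate (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        taps.append(ideal * window)
    return taps

def downsample2(x, num_taps=31):
    """Halve the sample rate: low-pass at half the new rate, keep every other sample."""
    # Half the new rate is one quarter of the current rate.
    taps = lowpass_fir(num_taps, 0.25)
    out = []
    # Step by 2 so that only the outputs we keep are ever computed.
    for m in range(0, len(x) + num_taps - 1, 2):
        acc = 0.0
        for j, hj in enumerate(taps):
            i = m - j
            if 0 <= i < len(x):
                acc += hj * x[i]
        out.append(acc)
    return out

# Hypothetical test tone at 0.4 of the original rate, too high for the new rate.
tone = [math.cos(2 * math.pi * 0.4 * n) for n in range(200)]
unfiltered = tone[::2]       # discarding alone: the tone aliases down to 0.2 of the old rate
filtered = downsample2(tone)  # filtering first: the tone is almost entirely removed
```

Filtering and discarding are written as one loop here because the discarded outputs never need to be calculated, which halves the work. Starting the loop at 1 instead of 0 would keep the other set of samples, which is the odd-versus-even choice mentioned above.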
We’ve gone through a lot to explain this, but if you understand the reasons, it’s easy to see what to do in other situations. For instance, if we need to upsample by a factor of four, we look at our pulse-amplitude modulated signal and note that there is nothing but a zero level between our existing impulses, so we insert three zeros between our existing samples. Then we low-pass filter the result, with the cut-off frequency set to include our original frequency band and block everything above it, removing the aliased copies from the new passband.
There are some added wrinkles. For larger conversion ratios, we can do the conversion in multiple smaller steps and exploit optimizations that need less computation than a single large-ratio conversion. For non-integer ratios, such as conversion from 44.1 kHz to 48 kHz, most textbooks suggest a combination of upsampling and downsampling as a ratio of integers (upsampling by a factor of 160 and downsampling by a factor of 147 in this case), but it can also be done in a single step, fractionally.
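The integer factors for the textbook approach come from reducing the ratio of the two rates to lowest terms. A small sketch, where the helper name `rational_ratio` is our own:

```python
from math import gcd

def rational_ratio(rate_in, rate_out):
    """Return (up, down) such that rate_in * up / down == rate_out, in lowest terms."""
    g = gcd(rate_in, rate_out)
    return rate_out // g, rate_in // g

up, down = rational_ratio(44100, 48000)
print(up, down)    # 160 147
print(44100 * up)  # 7056000, the intermediate sample rate in Hz
```

Taken literally, the intermediate signal would run at just over 7 MHz, which hints at why practical converters rely on the optimizations and fractional methods mentioned above instead of computing every intermediate sample.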