In this discussion, “oversampling” means oversampling on output, at the digital-to-analog conversion stage. There is also a technique for oversampling at the input (analog-to-digital) stage, but it is not nearly as interesting, and in fact is unrelated to oversampling as discussed here.
Motivation for oversampling
Most people have heard the term “oversampling” applied to digital audio devices. While it’s intuitive that sampling and playing back something at a higher rate sounds better than a lower rate—more points in the waveform for increased accuracy—that’s not what oversampling means.
In fact, the truth is much less intuitive: Oversampling means generating more samples from a waveform that has already been digitally recorded! How can we get more samples out than were recorded?!
For background, let’s look at the “classic” digital audio playback system, the Compact Disc: The digital audio samples (numbers) are sent at 44.1 kHz, the rate at which they were recorded, to a low-pass filter. By Nyquist’s Theorem, the highest frequency we can play back is less than half the recorded rate, so the upper limit is 22.05 kHz. Everything above that is aliased frequency components, where the audio “reflects” around the sampling frequency and its multiples like a hall of mirrors. The low-pass filter, also called a reconstruction filter or anti-imaging filter, is there to block the reflections and let the true signal pass.
One problem with this is that, ideally, we want to block everything above the Nyquist frequency (22.05 kHz), but let everything below it pass unaffected. Filters aren’t perfect, though. They have a finite slope as they begin attenuating frequencies, so we have to compromise. If we can’t keep 22 kHz while blocking everything above it, we’d certainly like to shoot for 20 kHz. That means the low-pass filter’s response must go from about 0 dB attenuation at 20 kHz to something like 90 dB at 22 kHz, a very steep slope.
While we can do this in an analog filter, it’s not easy. Filter components must be very precise. Even so, a filter this steep has a great deal of phase shift as it nears the cut-off point. Besides the expense of the filter, many people agree that the phase distortion of the upper audio frequencies is not a good thing.
Now, what if we had sampled at a higher rate to begin with? That would let us get away with a cheaper and gentler output filter. Why? Since the reflections are wrapped at the sampling frequency and its multiples, moving the sampling frequency up moves the reflected image far from the audio portion we want to preserve. We don’t need to record higher frequencies (the low-pass filter will get rid of them anyway), but simply having more samples of our audio signal would be a big help.
This is where interpolation comes in. We calculate what it would look like if we had sampled with more points to begin with. If we could have, for instance, eight times as many sample points running at eight times the rate (“8X oversampling”), we could use a very gentle filter, because instead of 2 kHz of room to get the job done, we’d have about 156 kHz (from 20 kHz up to the new Nyquist frequency of 176.4 kHz).
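To put numbers on that "room", here is a small sketch of the arithmetic. It assumes the convention used above (measuring from the top of the audio band, 20 kHz, up to the Nyquist frequency); the function and constant names are made up for illustration.

```python
# Room available to the output filter, measured from the top of the
# audio band up to the Nyquist frequency (the convention assumed here).
BASE_RATE_HZ = 44_100    # CD sampling rate
AUDIO_BAND_HZ = 20_000   # highest frequency we want to keep

def transition_band_hz(oversampling_factor):
    """Width in Hz between 20 kHz and the Nyquist frequency."""
    nyquist_hz = BASE_RATE_HZ * oversampling_factor / 2
    return nyquist_hz - AUDIO_BAND_HZ

for factor in (1, 2, 4, 8):
    room_khz = transition_band_hz(factor) / 1000
    print(f"{factor}X oversampling: {room_khz:.2f} kHz of room")
```

At 1X the filter has about 2 kHz to do its work; at 8X it has roughly 156 kHz.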
In practice, we do exactly this, following it with a linear-phase digital “FIR” (finite impulse response) filter, and a gentle and simple (and cheap) analog low-pass filter. If you buy the fact that giving ourselves more room to weed out the reflections (the alias components) solves our problems, then the only part that needs some serious explaining is…
Where do the extra samples come from?
First, let’s note that in the analog domain, the sampling rate is essentially infinite: the waveform is continuous, not a series of snapshots as with a digitized waveform. So, you could say that the low-pass reconstruction filter converts from the output sampling rate to an infinitely high sampling rate. It’s easy to see that we could sample the output of the low-pass filter at a higher rate to increase the sampling rate. In fact, since we don’t need to convert to the analog domain at this point, we could simply use a digital low-pass filter to reconstruct the digital waveform at a higher sampling rate directly.
There is more than one way to make a digital low-pass filter that will do the job. We have two basic classes of filters to choose from. One is called an IIR (infinite impulse response), which is based on feedback and is similar in principle to an analog low-pass filter. This type of filter can be very easy to construct and computationally inexpensive (few multiply-adds per sample), but has the drawback of phase shift. This is not a fatal flaw, since analog filters have the same problem, but the other type of digital filter, the FIR (finite impulse response), avoids the phase shift problem. (IIR filters can be made with zero relative phase shift, but it greatly increases complexity.)
FIR filters are phase linear, and it’s relatively easy to create any response. (In fact, you can create an FIR filter that has a response equal to that of a huge cathedral for impressive and accurate reverb.) The drawback (starting to get the idea that everything has a trade-off?) is that the more complex the response (steep cut-off slope, for instance), the more computation required by the filter. (And yes, unfortunately our “cathedral” would require an enormous number of computations, and in fact digital reverbs of today don’t work this way.)
Fortunately, we need only a gentle cut-off slope, and an FIR will handle that easily.
An FIR is a simple structure—basically a tapped delay line, where the taps are multiplied by coefficients and summed for the output. The two variables are the number of taps, and the values of the coefficients. The number of taps is based on a compromise between the number of coefficients we need to produce the desired result, and the number we can tolerate (since each coefficient requires a multiplication and addition).
How do we know what numbers to use to yield the desired result? Conveniently, the coefficients are equivalent to the impulse response of the filter we’re trying to emulate.
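Here is a minimal sketch of that tapped delay line in Python. The function name and the three coefficient values are made up for illustration. Feeding in a single impulse shows the point about the impulse response: the coefficients come straight back out, in order.

```python
from collections import deque

def fir_filter(samples, coefficients):
    """Run samples through a direct-form FIR filter (a tapped delay line)."""
    taps = len(coefficients)
    delay_line = deque([0.0] * taps, maxlen=taps)
    output = []
    for x in samples:
        delay_line.appendleft(x)  # newest sample enters, oldest falls off the end
        acc = 0.0
        for tap_value, coeff in zip(delay_line, coefficients):
            acc += tap_value * coeff  # one multiply and one add per tap
        output.append(acc)
    return output

# Illustrative coefficients only, not a useful low-pass filter.
coefficients = [0.5, 0.3, 0.2]
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(fir_filter(impulse, coefficients))
```

The inner loop is where the cost lives: every extra coefficient adds a multiply and an add for every output sample.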
So, we need to fill the coefficients with the impulse response of a low-pass filter. The impulse response of a low-pass filter is described by sin(x)/x. If you plot this function, you’ll see that it’s basically a sine wave that has full amplitude at time 0, and decays in both directions as it extends to positive and negative infinity.
If you’ve been following closely, you’ll notice that we have a problem. The number of computations for an FIR filter is proportional to the number of coefficients, and here we have a function for the coefficients that is infinite. This is where the “compromise” part comes in.
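To get a feel for the shape, this sketch samples sin(x)/x at a handful of points around zero. The scaling is an assumption: it puts the cutoff at one quarter of the Nyquist frequency, as you would for 4X oversampling, and the function names are made up.

```python
import math

def sinc(x):
    """sin(x)/x, using the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x

def lowpass_impulse_response(factor, half_length):
    """Samples of the ideal low-pass impulse response around time 0.

    The cutoff sits at 1/factor of the Nyquist frequency. Only
    2 * half_length + 1 points are returned; the true response never ends.
    """
    return [sinc(math.pi * n / factor)
            for n in range(-half_length, half_length + 1)]

h = lowpass_impulse_response(factor=4, half_length=8)
for n, value in zip(range(-8, 9), h):
    print(f"{n:+3d}  {value:+.4f}")
```

The values are symmetric around time 0, peak at 1 there, and cross zero at every multiple of the oversampling factor.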
If we truncate the series around zero, simply throwing away “extra” coefficients at some point, we still get a low-pass filter, though not one with a perfect cut-off slope, and with some ripple in the “stop band”. After all, the full sin(x)/x function emulates a perfect low-pass filter: a brick wall. Fortunately, we don’t need a perfect one, and our budget version will do. We also use some math tricks: artificially tapering the response off, even quickly, gives much better results than simply truncating. This technique is called “windowing”, or multiplying by a window function.
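Putting the pieces together, here is a rough sketch of the whole oversampling step: insert zeros between the recorded samples, then low-pass filter with windowed sin(x)/x coefficients. The Hann window, the 4X factor, the filter length, and all the names are assumptions chosen to keep the example short, not a description of any particular converter.

```python
import math

def windowed_sinc(factor, half_length):
    """Truncated sin(x)/x coefficients, tapered by a Hann window (assumed choice)."""
    coeffs = []
    for n in range(-half_length, half_length + 1):
        x = math.pi * n / factor
        ideal = 1.0 if n == 0 else math.sin(x) / x
        window = 0.5 + 0.5 * math.cos(math.pi * n / (half_length + 1))
        coeffs.append(ideal * window)
    return coeffs

def oversample(samples, factor, half_length):
    """Insert factor - 1 zeros after each sample, then low-pass filter."""
    coeffs = windowed_sinc(factor, half_length)
    stuffed = []
    for s in samples:
        stuffed.append(s)
        stuffed.extend([0.0] * (factor - 1))
    output = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            j = i + half_length - k  # filter centered on i, so no net delay
            if 0 <= j < len(stuffed):
                acc += c * stuffed[j]
        output.append(acc)
    return output

# A slow sine wave "recorded" at the original rate.
original = [math.sin(2 * math.pi * 0.05 * n) for n in range(64)]
upsampled = oversample(original, factor=4, half_length=32)
print(len(original), "samples in,", len(upsampled), "samples out")
```

Because sin(x)/x is zero at every multiple of the oversampling factor, the recorded samples pass through unchanged, and the filter fills in the new points between them.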
As a bonus, we can take advantage of the FIR to fix some other minor problems with the signal. For instance, Nyquist promised perfect reconstruction in an ideal mathematical world, not in our more practical electronic circuits. Besides the lack of an ideal low-pass filter that’s been covered here, there’s the fact we’re working with a stair-step shaped output before the filter—not an ideal series of impulses. This gives a little frequency droop—a gentle roll off. We can simply superimpose a complementary response on the coefficients and fix the droop for “free”.
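The size of that droop is easy to estimate if we assume the stair-step output behaves as an ideal sample-and-hold, whose frequency response is itself sin(x)/x with x = π·f/fs. This sketch (function names made up) computes the droop at 20 kHz and the gain the coefficients would need to cancel it.

```python
import math

def hold_droop_db(freq_hz, sample_rate_hz):
    """Gain in dB of an ideal sample-and-hold output at freq_hz (assumed model)."""
    x = math.pi * freq_hz / sample_rate_hz
    gain = 1.0 if x == 0 else math.sin(x) / x
    return 20 * math.log10(gain)

def droop_correction_gain(freq_hz, sample_rate_hz):
    """Linear gain that exactly cancels the droop at freq_hz."""
    return 10 ** (-hold_droop_db(freq_hz, sample_rate_hz) / 20)

for rate in (44_100, 8 * 44_100):
    print(f"{rate} Hz: {hold_droop_db(20_000, rate):.3f} dB at 20 kHz, "
          f"correction gain {droop_correction_gain(20_000, rate):.4f}")
```

Under this model the droop at 20 kHz is a little over 3 dB at 44.1 kHz, and only a few hundredths of a dB once the output runs at eight times that rate.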
While we’re at it, we can use the additional bits gained from the multiplies to help in noise shaping—moving some of the in-band noise up to the frequencies that will be removed later by the low-pass filter, and to frequencies the ear is less sensitive to.
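Noise shaping comes in many forms. The simplest one to sketch is first-order error feedback: when rounding to a coarser step, remember the rounding error and subtract it from the next sample. This is an assumed, minimal illustration of the idea (real converters typically use more elaborate filters), and the names are made up.

```python
def requantize_with_noise_shaping(samples, step):
    """Round samples to multiples of step, feeding each rounding error forward."""
    previous_error = 0.0
    output = []
    for x in samples:
        wanted = x - previous_error            # undo the error made last time
        quantized = round(wanted / step) * step
        previous_error = quantized - wanted    # error to correct on the next sample
        output.append(quantized)
    return output

# A constant level that falls between two coarse steps.
fine = [0.3] * 10
coarse = requantize_with_noise_shaping(fine, step=1.0)
print(coarse)
print("average:", sum(coarse) / len(coarse))
```

Each output is still a coarse value, but the errors cancel over time instead of piling up, so the noise is pushed away from low frequencies and toward the high frequencies that the later low-pass filter removes.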
More cool math tricks to give us better sound!