Sampling in-depth

Here we lay the foundation. We’ll look at analog to digital conversion, and we’ll look at the spectrum of the resulting digital signal. We’ll use that knowledge to help understand the conversion process back to analog. Though we can build real-world converters that work as described, we’re not building practical converters here; we’re interested in the mathematical concepts involved and their relationships to digital signal processing.

Analog to digital conversion

The sampling theorem says that we can describe, uniquely, a continuous (analog) signal with evenly-spaced discrete (digital) samples, as long as the sampling rate is greater than twice the highest frequency component in the continuous signal. To ensure that we don’t violate that condition, we preface the analog to digital conversion with a low pass filter, set to pass frequencies below half the sample rate, and block those above:

The sampling process—basically taking repeated measurements as a fixed interval—can be viewed in a number of ways; here’s one that can help us understand some important aspects of sampling:

The input signal, after filtering to ensure our bandwidth condition, is modulated (multiplied) by a pulse train of unit amplitude. The result is a pulse-amplitude modulated signal. We measure the height of each modulated pulse, and store the numerical values as our samples.

The spectrum of a sampled signal

While our original signal can be anything, we must limit the frequency spectrum of the signal we sample to half the sampling rate—again, we ensure this condition by placing a low-pass filter before the sampling stage. So, we can represent the spectral limits of the signal by showing the response curve of the low-pass filter:

Here we’re showing positive and negative frequencies, in order to make the following more obvious.

The spectrum of the impulse train we’re using to multiply the input signal looks like this:

That is, the spectrum of an impulse train in time is an impulse train in frequency.In the time domain, the impulses are spaced at multiples of the sampling period, and in the frequency domain at multiples of the sampling frequency.

The result of the multiplication in the time domain is a convolution in the frequency domain—the result in the frequency domain is that we have spectral reflections of the original signal, centered on multiples of the sampling rate:

We don’t mind the reflections, since they stay out of the way of the signal band we care about, but if we had used a poorer low-pass filter or set its frequency too high, the reflections would overlap into the band we’re interested in—this is called aliasing (from here on we’ll look at the positive frequencies only, to save space, noting that the negative side is just a reflection):


A consequence of aliasing is that frequencies that exceed half the sample rate reflect back into our band of interest. For instance, a sine wave that sweeps in frequency above half the sample rate will seem to reverse and turn back down. High harmonics that cross over will be inverted, imparting an inharmonic tone to the sound. Once we let this happen, there is no way to tell the good from the bad, so we must minimize aliasing by using acceptable filters.

Now that we understand what we’ve done to the signal by sampling it, we can look at the process of converting the signal back to a continuous signal.

Digital to analog conversion

To convert from samples to the continuous domain, we perform what is pretty much the inverse process. In principal, we first turn the numbers into a pulse-amplitude modulated signal. This signal has the information we need, but carries some extra that we don’t want—images of our signal, aliased copies that appear at multiples of the sample rate. So, we strip off the high-frequency clones with a low-pass filter. Effectively, this smoothes the pulse train into the signal we desire.

Note that the digital to analog conversion is similar to our analog to digital conversion, almost in reverse, except that in the case of converting from analog, the low-pass filter is precautionary—to guarantee we limit the bandwidth to avoid aliasing—while in the conversion from digital, the filter plays an integral role in reconstructing the signal. For that reason, the latter is often called the reconstruction filter.

Posted in Digital Audio, Sampling Theory | Leave a comment

The bilinear z transform

The bilinear transform is the most popular method of converting analog filter prototypes in the s domain to the z domain so we can implement them as digital filters. The reason we are interested in these s domain filters is that analog filter theory has been around longer than digital filter theory, so we have available to us a number of popular and useful filters in the s domain. The standard biquads used heavily in audio processing are direct implementations from the s domain using the bilinear transform.

The theory of the bilinear transform is well documented in DSP texts and on the web, so rather than spend time going into the theory, we’ll cut to the chase and show how to do the transform, using a second order lowpass filter as an example.

The s domain transfer function of a second order lowpass filter is

To move to the z domain, we need to substitute for s in terms of z. At the same time, we “pre-warp” the response using the tan function in order to wrap the infinite s plane onto the circular z plane. The combined substitution is

Here’s our z domain transfer function after substituting for s:

In order to solve for our filter coefficients, we need to get the equation into the form of a second order polynomial in the numerator (the zeros) and another second order polynomial in the denominator (the poles). That is, we need it to look something like this:

It helps to use a math package to cut down on some of the tedious work. If we take the right side of the equation and ask MathCad to “simplify”, it gives

The first thing we see is that there are too many Q’s, so divide the top and bottom by Q, resulting in

Now combine terms in the numerator and rearrange in descending powers of z; do the same for the denominator:

The z operator means delay, and its power tell us how many samples. Positive powers of z are in the future, and negative powers are in the past. Our filter can’t see into the future (we call such filters “non-causal”), so we need to reduce the powers of z to zero and below (to make it causal). To do this we multiply the numerator and denominator by , which subtracts two from each power of z, and gives us our final form:

Our coefficients are now evident:

Because the b0 coefficient is associated with y[n], the filter output, we divide everything by b0 to normalize the filter. Here we also note some convenient relationships between the “a” coefficients:

As a further efficiency, you can compute the common numerator term once and use it where needed. When calculating the coefficients, Q is the filter Q value (for a Butterworth lowpass, where the passband is flat with no corner peaking, use 0.7071, which is the reciprocal of the square root of two). Recall that we defined K as

where Fc is the filter corner frequency you want, and Fs is the sample rate.

Posted in Digital Audio, Filters, IIR Filters | Leave a comment

The digital state variable filter

The digital state variable filter was described in Hal Chamberlin’s Musical Applications of Microprocessors. Derived by straight-forward replacement of components from the analog state variable fiter with digital counterparts, the digital state variable is a popular synthesizer filter, as was its analog counterpart.

The state variable filter has several advantages of biquads as a synthesizer filter. Lowpass, highpass, bandpass, and band reject are available simultaneously. Also, frequency and Q control are independent and their values calculated easily.

The frequency control coefficient, f, is defined as

where Fs is the sample rate and Fc is the filter’s corner frequency you want to set. The q coefficient is defined as

where Q normally ranges from 0.5 to inifinity (where the filter oscillates).

Like its analog counterpart, and biquads, the digital state variable has a cutoff slope of 12 dB/octave.

The main drawback of the digital state variable is that it becomes unstable at higher frequencies. It depends on the Q setting, but basically the upper bound of stability is about where f reaches 1, which is at one-sixth of the sample rate (8 kHz at 48 kHz). The only way around this is to oversample. A simple way to double the filter’s sample rate (and thereby double the filter’s frequency range) is to run the filter twice with the same input sample, and discard one output sample.

As a sine oscillator

The state variable makes a great low frequency sine wave oscillator. Just set the Q to infinity, and make sure it has an impulse to get it started. Simply preset the delays (set the “cosine” delay to 1, or other peak value, and the other to 0) and run, and it will oscillate forever without instability, with fixed point or floating point. Even better, it gives you two waves in quadrature—simultaneous sine and cosine.

Simplified to remove unecessary parts, the oscillator looks like this:

For low frequencies, we can reduce the calculation of the f coefficient equation to

Here’s an example in C to show how easy this oscillator is to use; first initialize the oscillator amplitude, amp, to whatever amplitude you want (normally 1.0 for ±1.0 peak-to-peak output):

// initialize oscillator
sinZ = 0.0;
cosZ = amp;

Then, for every new sample, compute the sine and cosine components and use them as needed:

// iterate oscillator
sinZ = sinZ + f * cosZ;
cosZ = cosZ – f * sinZ;

The sine purity is excellent at low frequencies (becoming asymmetrical at high frequencies).

Posted in Digital Audio, Filters, IIR Filters | 1 Comment

Biquads

One of the most-used filter forms is the biquad. A biquad is a second order (two poles and two zeros) IIR filter. It is high enough order to be useful on its own, and—because of coefficient sensitivities in higher order filters—the biquad is often used as the basic building block for more complex filters. For instance, a biquad lowpass filter has a cutoff slope of 12 dB/octave, useful for tone controls; if you need a 24 dB/octave slope, you can cascade two biquads, and it will have less coefficient-sensitivity problems than a single fourth-order design.

Biquads come in several forms. The most obvious, a direct implementation of the second order difference equation (y[n] = a0*x[n] + a1*x[n-1] + a2*x[n-2] – b1*y[n-1] – b2*y[n-2]), called direct form I:

Direct form I

Direct form I is the best choice for implementation in a fixed point processor because it has a single summation point (fixed point DSPs usually have an extended accumulator that allows for intermediate overflows).

We can take direct form I and split it at the summation point like this:

We then take the two halves and swap them, so that the feedback half (the poles) comes first:

Now, notice that one pair of the z delays is redundant, storing the same information as the other. So, we can merge the two pairs, yielding the direct form II configuration:

Direct form II

In floating point, direct form II is better because it saves two memory locations, and floating point is not sensitive to overflow in the way fixed point math is. We can improve this a little by transposing the filter. To transpose a filter, reverse the signal flow direction—output becomes input, distribution nodes become summers, and summers become nodes. The characteristics of the filter are unchanged, but in this case it happens that the floating point characteristics are a little better. Floating point has better accuracy when intermediate sums are with closer values (adding small numbers to large number in floating point is less precise than with similar values). Here is the transposed direct form II:

Transposed direct form II

Notes and recommendations

Again, direct form I is usually the best choice for fixed point, and transposed direct form II for floating point.

At low frequency settings, biquads are more susceptible to quantization error, mainly from the feedback coefficients (b1 and b2) and the delay memory. Lack of resolution in the coefficients makes precise positioning of the poles difficult, which is particularly a problem when the poles are positioned near the unit circle. The second problem, delay memory, is because multiplication generates more bits, and the bits are truncated when stored to memory. This quantization error is fed back in the filter, causing instability. 32-bit floating point is usually good enough for audio filters, but you may need to use double precision, especially at very low frequencies (for control filtering) and at high sample rates.

For fixed point filters, 24-bit coefficients and memory work well for most filters, but start to become unstable below about 300 Hz at 48 kHz sample rate (or twice that at 96 kHz). Double precision is always costly on a fixed point processor, but fortunately there is a simple technique for improving stability. Looking at the direct form I drawing, the quantization occurs when the higher precision accumulator is stored in the lower precision delay memory on the right side. By taking the quantization error (the difference between the full accumulated value and its value after storing it to memory) and adding it back in for the next sample calculation, the filter performs nearly as well as using full double precision calculations, but at a much lower computational cost. This technique is called first order noise shaping. There are higher order noise shapers, but this one works well enough to handle almost all audio needs, even at high sample rates.

Direct form I with first-order noise shaping

In general, 16-bit fixed point processing is not suitable for audio without double precision coefficients and computation.

Finally, biquads are just one of a DSP programmers tools—they aren’t always the best filter form. There are other filters that don’t share the biquad’s low-frequency sensitivities (in general, biquad coefficient precision is very good at high frequencies, and poor at low ones; there are other filter forms that spread the precision out more evenly, or trade off reduced high frequency performance for better low frequency performance). However, biquads are well known and design tools are plentiful, so they are usually the first choice for an IIR filter unless you find a reason to use another.

There are too many filter forms to cover, but one other filter form that is popular for synthesizers is the state variable filter. It has very excellent low frequency performance, and limitations in the high frequencies that have to be worked around, but most importantly frequency and Q coefficients are separate and easy to change for dynamic filtering. It also make a great low frequency sine wave generator.

Posted in Digital Audio, Filters, IIR Filters | Leave a comment

Pole-Zero placement

Here’s a Java applet that illustrates pole-zero placement. It lets you design a filter with two poles and two zeros, while showing the resulting frequency response and filter coefficients. It’s also handy for learning more about how poles and zeros work.

You can set the two poles (or zeros) independently, or as complex conjugate pairs by using the Pair checkboxes. Note that the frequency response plot and the coefficients are gain compensated automatically, so that the maximum output is 0 dB.

A pole or zero located at the origin has no effect, so position them there if you want to disable them (to examine single pole or zero filters, for instance).


Experiments with standard biquads

Here are some experiments that show how the standard biquads (derived with the bilinear transform) relate to the z plane.

For each of these filters, the pole angle dictates filter frequency, and the pole radius dictates Q:

Experiment with pole and zero placement to better understand how these filters work. See what happens when you swap the poles and zeros. Change the pole angle and radius and see how it affects the frequency response. Think about why the poles and zeros are positioned where they are: For lowpass, the zeros areate -1 to pull down the response at the highest frequency; for highpass, they are at 1 to pull down the lowest; for bandpass, they pull down the response at each end of the spectrum. For bandreject (notch), the zeros are on the unit circle at the notch frequency to completely remove it, and the poles are at the same angle; as the poles move closer to the zeros, they get closer to canceling them, and the notch narrows.

Posted in Digital Audio, Filters, IIR Filters | 5 Comments

A gentle introduction to the FFT

Some terms: The Fast Fourier Transform is an algorithm optimization of the DFT—Discrete Fourier Transform. The “discrete” part just means that it’s an adaptation of the Fourier Transform, a continuous process for the analog world, to make it suitable for the sampled digital world. Most of the discussion here addresses the Fourier Transform and it’s adaptation to the DFT. When it’s time for you to implement the transform in a program, you’ll use the FFT for efficiency. The results of the FFT are the same as with the DFT; the only difference is that the algorithm is optimized to remove redundant calculations. In general, the FFT can make these optimizations when the number of samples to be transformed is an exact power of two, for which it can eliminate many unnecessary operations.

Background

From Fourier we know that periodic waveforms can be modeled as the sum of harmonically-related sine waves. The Fourier Transform aims to decompose a cycle of an arbitrary waveform into its sine components; the Inverse Fourier Transform goes the other way—it converts a series of sine components into the resulting waveform. These are often referred to as the “forward” (time domain to frequency domain) and “inverse” (frequency domain to time domain) transforms. For most people, the forward transform is the baffling part—it’s easy enough to comprehend the idea of the inverse transform (just generate the sine waves and add them). So, we’ll discuss the forward transform; however, it’s interesting to note that the inverse transform is identical to the forward transform (except for scaling, depending on the implementation). You can essentially run the transform twice to convert from one form to the other and back!

Probing for a match

Let’s start with one cycle of a complex waveform. How do we find its component sine waves? (And how do we describe it in simple terms without mentioning terms like “orthogonality”? oops, we mentioned it.) We start with an interesting property of sine waves. If you multiply two sine waves together, the resulting wave’s average (mean) value is proportional to the sines’ amplitudes if the sines’ frequencies are identical, but zero for all other frequencies.

Take a look: To multiply two waves, simply multiply their values sample by sample to build the result. We’ll call the waveform we want to test the “target” and the sine wave we use to test it with the “probe”. Our probe is a sine wave, traveling between -1.0 and 1.0. Here’s what happens when our target and probe match:

See that the result wave’s peak is the same as that of the target we are testing, and its average value is half that. Here’s what happens when they don’t match:

In the second example, the average of the result is zero, indicating no match.

The best part is that the target need not be a sine wave. If the probe matches a sine component in the target, the result’s average will be non-zero, and half the component’s amplitude.

In phase

The reason this works is that multiplying a sine wave by another sine wave is balanced modulation, which yields the sum and difference frequency sine waves. Any sine wave averaged over an integral number of cycles is zero. Since the Fourier transform looks for components that are whole number multiples of the waveform section it is analyzing, and that section is also presumed to be a single cycle, the sum and difference results are always integral to the period. The only case where the results of the modulation don’t average to zero is when the two sine waves are the same frequency. In that case the difference is 0 Hz, or DC (though DC stands for Direct Current, the term is often used to describe steady-state offsets in any kind of waveform). Further, when the two waves are identical in phase, the DC value is a direct product of the multiplied sine waves. If the phases differ, the DC value is proportional to the cosine of the phase difference. That is, the value drops following the cosine curve, and is zero at pi/2 radians, where the cosine is zero.

So this sine measurement doesn’t work well if the probe phase is not the same as the target phase. At first it might seem that we need to probe at many phases and take the best match; this would result in the ESFT—the Extremely Slow Fourier Transform. However, if we take a second measurement, this time with a cosine wave as a probe, we get a similar result except that the cosine measurement results are exactly in phase where the sine measurement is at its worst. And when the target phase lies between the sine and cosine phase, both measurements get a partial match. Using the identity

for any theta, we can calculate the exact phase and amplitude of the target component from the sine and cosine probes. This is it! Instead of probing the target with all possible phases, we need only probe with two. This is the basis for the DFT.

Completing the series

Besides probing with our single cycle sine (and cosine), the presumed fundamental of the target wave, we continue with the harmonic series (2x, 3x, 4x…) through half the sample rate. At that point, there are only two sample points per probe cycle, the Nyquist limit. We also probe with 0x, which is just the average of the target and gives us the DC offset.

We can deduce that having more points in the “record” (the group of samples making up our target wave cycle) allows us to start with a lower frequency fundamental and fit more harmonic probes into the transform. Doubling the number of target samples (higher time resolution) doubles the number of harmonic probes (higher frequency resolution).

Getting complex

By tradition, the sine and cosine probe results are represented by a single complex number, where the cosine component is the real part and the sine component the imaginary part. There are two good reasons to do it this way: The relationship of cosine and sine follows the same mathematical rules as do complex numbers (for instance, you add two complex numbers by summing their real and complex parts separately, as you would with sine and cosine components), and it allows us to write simpler equations. So, we refer to the resulting average of the cosine probe as the real part (Re), and the sine component as the imaginary part (Im), where a complex number is represented as Re + i*Im.

To find the magnitude (which we have called “amplitude” until now—magnitude is the same as amplitude when we are only interested in a positive value—the absolute value):

In the way we’ve presented the math here, this is the magnitude of the average, so again we’d have to multiply that value by two to get the peak amplitude of the component we’re testing for.

You might notice that Im can be zero, which would lead to a divide-by-zero error on your computer. In that case, notice that the result of the division becomes very large for non-zero Re as Im approaches zero, and the atan for very large numbers approaches pi/2. This would tell us that the target component is approaching an exact match with the cosine phase, which we already know to be true with a near-zero imaginary part.

Making it “F”

Viewing the DFT in this way, it’s easy to see where the algorithm can be optimized. First, note that all of the sine probes are zero at the start and in the middle of the record—no need to perform operations for those. Further, all the even-numbered sine probes cross zero at one-fourth increments through the record, every fourth probe at one-eighth, and so on. Note the powers of two in this pattern. The FFT works by requiring a power of two length for the transform, and splitting the the process into cascading groups of two (that’s why it’s sometimes called a radix-2 FFT). Similarly, there are patterns for when the sine and cosine are at 1.0, and multiplication is not needed. By exploiting these redundancies, the savings of the FFT over the DFT are huge. While the DFT needs N^2 basic operations, the FFT needs only NLog2(N). For a 1024 point FFT, that’s 10,240 operations, compared to 1,048,576 for the DFT.

Let’s take a look at the kinds of symmetry exploited by the FFT. Here’s an example showing even harmonics crossing at zero for integer multiples of pi/2 on the horizontal axis:

Here we see that every fourth harmonic meets at 0, 1, 0, and -1, at integer multiples of pi/2:

Caveats and Extensions

The Fourier transform works correctly only within the rules laid out—transforming a single cycle of the target periodic waveform. In practical use, we often sample an arbitrary waveform, which may or may not be periodic. Even if the sampled waveform is exactly periodic, we might not know what that period is, and if we did it may not exactly fit our transform length (we may be using a power-of-two length for the FFT).

We can still get results with the transform, but there is some “spectral leakage.” There are ways to reduce such errors, such as windowing to reduce the discontinuities at the ends of the group of sample points (where we snipped the chunk to examine from the sampled data). And for arbitrarily long signals (analyzing a constant stream of incoming sound, for instance), we can perform FFTs repeatedly—much in the way a movie is made up of a constant stream of still pictures—and overlap them to smooth out errors.

There is a wealth of information on the web. Search for terms used here, such as Fourier, FFT, DFT, magnitude, phase… The purpose here is to present the transform in an intuitive way. With an understanding that there is no black magic involved, perhaps the interested reader is encouraged to dig deeper without fear when it’s presented in a more rigorous and mathematical manner. Or maybe having a basic idea of how it works is good enough to feel more comfortable with using the FFT. You can find efficient implementations of the FFT for many processors, and links to additional information, at http://www.fftw.org. For another source on the transform and basic C code, try Numerical Recipes in C.

Posted in Digital Audio, FFT | 3 Comments

A bit about reverb

Reverb is one of the most interesting aspects of digital signal processing effects for audio. It is a form of processing that is well-suited to digital processing, while being completely impractical with analog electronics. Because of this, digital signal processing has had a profound affect on our ability to place elements of our music into different “spaces.”

Before digital processing, reverb was created by using transducers—a speaker and a microphone, essentially—at two ends of a physical delay element. That delay element was typically a set of metal springs, a suspended metal plate, or an actual room.The physical delay element offered little variation in the control of the reverb sound. And these reverb “spaces” weren’t very portable; spring reverb was the only practically portable—and generally affordable—option, but they were the least acceptable in terms of sound.

First a quick look at what reverb is: Natural reverberation is the result of sound reflecting off surfaces in a confined space. Sound emanates from its source at 1100 feet per second, and strikes wall surfaces, reflecting off them at various angles. Some of these reflections meet your ears immediately (“early reflections”), while others continue to bounce off other surfaces until meeting your ears. Hard and massive surfaces—concrete walls, for instance—reflect the sound with modest attenuation, while softer surfaces absorb much of the sound, especially the high frequency components. The combination of room size, complexity and angle of the walls and room contents, and the density of the surfaces dictate the room’s “sound.”

In the digital domain, raw delay time is limited only by available memory, and the number of reflections and simulation of frequency-dependent effects (filtering) are limited only by processing speed.

Two possible approaches to simulating reverb

Let’s look at two possible approaches to simulating reverb digitally. First, the brute-force approach:

Reverb is a time-invariant effect. This means that it doesn’t matter when you play a note—you’ll still get the same resulting reverberation. (Contrast this to a time-variant effect such as flanging, where the output sound depends on the note’s relationship to the flanging sweep.)

Time-invariant systems can be completely characterized by their impulse response. Have you ever gone into a large empty room—a gym or hall—and listened to its characteristic sound? You probably made a short sound—a single handclap works great—then listened as the reverberation tapered off. If so, you were listening to the room’s impulse response.

The impulse response tells everything about the room. That single handclap tells you immediately how intense the reverberation is and how long it takes to dies out, and whether the room sounds “good.” Not only is it easy for your ears to categorize the room based on the impulse response, but we can perform sophisticated signal analysis on a recording of the resulting reverberation as well. Indeed, the impulse response tells all.

The reason this works is that an impulse is, in its ideal form, an instantaneous sound that carries equal energy at all frequencies. What comes back, in the form of reverberation, is the room’s response to that instantaneous, all-frequency burst.


An impulse and its response

In the real world, the handclap—or a popping balloon, an exploding firecracker, or the snap of an electric arc—serves as the impulse. If you digitize the resulting room response and look at it in a sound-editing program, it looks like decaying noise. After some density build-up at the beginning, it decays smoothly toward zero. In fact, smoother sounding rooms show a smoother decay.

In the digital domain, it’s easy to realize that each sample point of the response can be viewed as a discrete echo of the impulse. Since, ideally, the impulse is a single non-zero sample, it’s not a stretch to realize that a series of samples—a sound played in the room—would be the sum of the responses of each individual sample at their respective times (this is called superposition).

In other words, if we have a digitized impulse response, we can easily add that exact room characteristic to any digitized dry sound. Multiplying each point of the impulse response by the amplitude of a sample yields the room’s response to that sample; we simply do that for each sample of the sound that we want to “place” into that room. This yields a bunch—as many as we have samples—of overlapping responses that we simply add together.

Easy. But extremely expensive computationally. Each sample of the input is multiplied individually by each sample of the impulse response, and added to the mix. If we have n samples to process, and the impulse response is m samples long, we need to perform n+m multiplications and additions. So, if the impulse response is three seconds (a big room), and we need to process one minute of music, we need to do about 350 trillion multiplications and the same number of additions (assuming a 44.1KHz sampling rate).

This may be acceptable if you want to let your computer crunch the numbers for a day before you can hear the result, but it’s clearly not usable for real-time effects. Too bad, because its promising in several aspects. In particular, you can accurately mimic any room in the world if you have its impulse response, and you can easily generate your own artificial impulse responses to invent your own “rooms” (for instance, a simple decaying noise sequence gives a smooth reverb, though one with much personality).

Actually, there’s a way to handle this more practically. We’ve been talking about time-domain processing here, and the process of multiplying the two sampled signals is called “convolution.” While convolution in the time domain requires many operations, the equivalent in the frequency domain requires drastically reduced computation (convolution in the time domain is equivalent to multiplication in the frequency domain). I won’t elaborate here, but you can check out Bill Gardner’s article, “Efficient Convolution Without Input/Output Delay” for a promising approach. (I haven’t tried his technique, but I hope to give it a shot when I have time.)

A practical approach to digital reverb

The digital reverbs we all know and love take a different approach. Basically, they use multiple delays and feedback to built up a dense series of echoes that dies out over time. The functional building blocks are well known; it’s variations and how they are stacked together that give an digital reverb units its characteristic sound.

The simplest approach would be a single delay with part of the signal fed back into the delay, creating a repeating echo that fades out (the feedback gain must be less than 1). Mixing in similar delays of different sizes would increase the echo density and get closer to reverberation. For instance, using different delay lengths based on prime numbers would ensure that each echo fell between other echoes, enhancing density.

In practice, this simple arrangement doesn’t work very well. It takes too many of these hard echoes to make a smooth wall of reverb. Also, the simple feedback is the recipe for a comb filter, resulting in frequency cancellations that can mimic room effects, but can also yield ringing and instability. While useful, these comb filters alone don’t give a satisfying reverb effect.


Comb filter reverb element

By feeding forward (inverted) as well as back, we fill in the frequency cancellations, making the system an all-pass filter. All-Pass filters give us the echoes as before, but a smoother frequency response. They have the effect of frequency-dependent delay, smearing the harmonics of the input signal and getting closer to a true reverb sound. Combinations of these comb and all-pass recirculating delays—in series, parallel, and even nested—and other elements, such as filtering in the feedback path to simulate high-frequency absorption, result in the final product.


All-Pass filter reverb element

I’ll stop here, because there are many readily available texts on the subject and this is just an introduction. Personally, I found enough information for my own experiments in “Musical Applications of Microprocessors” by Hal Chamberlin, and Bill Gardner’s works on the subject, available here on the web.

Posted in Convolution, Digital Audio, Impulse Response, Reverb | 5 Comments

The Fourier series

Experiment with harmonic (Fourier) synthesis with this Java applet! The sliders represent the levels of the first eight harmonics in the harmonic series. The second harmonic is twice the frequency of the first, the third is three times that of the first, and so on. The graph shows one cycle of the resulting waveform.






If you had a Java-equiped browser, you’d see as applet here that looks like this.

Press the Sawtooth button to get an eight-harmonic approximation of a sawtooth waveform. A sawtooth waveform contains all harmonics; the second harmonic is one-half the level of the first, the third harmonic is one-third the level of the first, and so on. (Continuing the series yields a more accurate sawtooth.)

Similarly, press the Square button for a square-wave approximation. A square wave is made of only odd-numbered harmonics, in the same relationship as those of the sawtooth.

One way of looking at this is that the sliders represent the frequency domain of a waveform (the level of its frequency components—how we hear), and the graph represents its conversion to the time domain (the signal as it is routed through audio equipment and speakers, only to be converted back to the frequency domain by our ears!).

Posted in Digital Audio, FFT, Fourier | Leave a comment

A question of phase

If you’ve paid attention for long enough, you’ve seen heated debate in online forums and letters to the editor in magazines. One side will claim that it has been proven that people can’t hear the effects of phase errors in music, and the other is just as adamant that the opposite is true.

Much of the confusion about phase lies with the fact that there are several facets to this issue. Narrow arguments on the subject can be much like the story of the blind men and the elephant—one believes that the animal is snake-like, while another insists that it’s more like a wall. Both sides may be right, as far as their knowledge allows, but both are equally wrong because they’re hampered by a limited understanding of the subject.

What is phase?

Phase is a frequency dependent time delay. If all frequencies in a sound wave (music, for instance), are delayed by the same amount as they pass through a device, we call that device “phase linear.” A digital delay has this characteristic—it simply delays the sound as a whole, without altering the relationships of frequencies to each other. The human ear is insensitive to this kind of phase change of delay, as long as the delay is constant and we don’t have another signal to reference it to. The audio from a CD-player is always delayed due to processing, for instance, but it has no effect on our listening enjoyment.

Relative phase

Now, even if the phase is linear (simply an overall delay), we can easily detect a phase difference if we have a reference. For instance, you can get closer to one of your stereo speakers than the other; even if you use the stereo balance control to even the relative loudness between speakers, it won’t sound the same as being equidistance between them.

Another obvious case is when we have a direct reference to compare to. When you delay music and mix it with the un-delayed version, for instance, it’s easy to hear the effect; short delays cause frequency-dependent cancellation between the two signals, while longer delays result in an obvious echo.

If you connect one of your stereo speakers up backwards, inverting the signal, you’ll get phase cancellation between many harmonic components simultaneously as they cancel in the air. This is particularly noticeable with mono input and at low frequencies, where the distance between the speakers has less effect.

The general case

Having dispensed with linear phase, let’s look at the more general case of phase as a frequency-dependent delay.

Does it seem likely that we could hear the difference between a music signal and the same signal with altered phase?

First, I should point out that phase error, in the real world, is typically constant and affects a group of frequencies, usually by progressive amounts. By “constant”, I mean that the phase error is not moving around, as in the effect a phase shifter device is designed to produce. By “group of frequencies”, I mean that it’s typically not a signal frequency that’s shifted, or unrelated frequencies; phase shift typically “smears” an area of the music spectrum.

Back to the question: Does it seem likely that we could hear the difference between an audio signal and the same signal with altered phase? The answer is… No… and ultimately Yes.

No: The human ear is insensitive to a constant relative phase change in a static waveform. For instance, you cannot here the difference between a steady sawtooth wave (which contains all harmonic frequencies) and a waveform that contains the same harmonic content but with the phase of the harmonics delayed by various (but constant) amounts. The second waveform would not look like a sawtooth on an oscilloscope, but you would not be able to hear the difference. And this is true no matter how ridiculous you get with the phase shifting.

Yes: Dynamically changing waveforms are a different matter. In particular, it’s not only reasonable, but easy to demonstrate (at least under artificially produced conditions) that musical transients (pluck, ding, tap) can be severely damaged by phase shift. Many frequencies of short duration combine to produce a transient, and phase shift smears their time relationship, turning a “tock!” into a “thwock!”.

Because music is a dynamic waveform, the answer has to be “yes”—phase shift can indeed affect the sound. The second part is “how much?” Certainly, that is a tougher question. It depends on the degree or phase error, the area of the spectrum it occupies, and the music itself. Clearly we can tolerate phase shift to a degree. All forms of analog equalization—such as on mixing consoles—impart significant phase shift. It’s probably wise, though, to minimize phase shift where we can.

Posted in Digital Audio, Phase | 2 Comments

The jitters

When samples are not output at their correct time relative to other samples, we have clock jitter and the associated distortion it causes. Fortunately, the current state of the art is very good for stable clocking, so this is not a problem for CD players and other digital audio units. And since the output from the recording media (CD, or DAT, for instance) is buffered and servo-controlled, transport variations are completely isolated from the digital audio output clocking.

Clocking external sources

Clock jitter can arise when we combine multiple units, though. When each unit runs on its own clock, compensating for small differences between the clocks can cause output errors. For instance, even if both clocks are at exactly the same frequency, they will almost certainly not be in phase.

For example, consider connecting the digital output of your computer-based digital recording system to a DAT recorder, and monitoring the analog output of the DAT unit. Because the digital output (S/PDIF or AES/EBU) doesn’t carry a separate clock signal, the DAT unit must output the audio using its own clock.

Since the DAT player can’t synchronize its clock to that of the source, it has to either derive a clock signal from the digital input (using a Phase Locked Loop—PLL), or make the digital input march to its own clock (buffering and reclocking, or sample rate conversion). The PLL method will certainly be subject to jitter on playback, dependent on the quality of the digital signal at the input. In other words, poor cables would make the audio sound worse! It’s important to note that this will only affect monitoring; if you record the signal and play it back, there will be no change from the original (barring serious problems with the cabling or other transfer factors). This because the recorder will store the correct sample values, despite jitter, then reclock the digital stream on playback.

If the clock rate of the input digital stream and the playback unit differ (44.1 KHz and 48 KHz, for instance), the playback unit has no choice but to sample rate convert. If they are the same, the playback unit may use sample rate conversion to oversample the input, then pick the samples that “line up” with its own clock, or it may simply buffer the incoming digital stream and reclock it for output. Either method will not be subject to jitter, since the D/A convertor is using its own local clock.

Note that the resampling (sample rate conversion) techniques actually change the digital stream before converting it to analog, whereas buffering does not. This is a particularly important distinction when making digital copies and transfers.

Be sure to check out Bob Katz’s web article on the subject for a more detailed look.

Posted in Digital Audio, Jitter | Leave a comment