Floating point denormals

There’s another issue with floating point hardware that can easily cause serious performance problems in DSP code. Fortunately, it’s also easy to guard against if you understand the issue. I covered this topic a few years ago in A note about de-normalization, but I’m giving it a fresh visit as a companion to Floating point caveats.

Floating point hardware is optimized for maximum use of the mantissa, for best accuracy. (It also allows for more efficient math processing, since the mantissas are always aligned.) This is called normalization—all numbers are kept in the binary range of plus or minus 1.1111111 (keep going, 16 more 1s) times 2 raised to the power of the exponent. To increase accuracy near zero, floating point implementations let the number become “denormalized”, so instead of the smallest number being 1.0 times 2 raised to the most negative exponent, the mantissa can become as small as 0.000…1 (24 digits).

The penalty is that floating point math operations become considerably slower. The penalty depends on the processor, but certainly CPU use can grow significantly—in older processors, a modest DSP algorithm using denormals could completely lock up a computer.

But these extremely tiny values are of no use in audio, so we can avoid using them, right? Not so easily—recursive algorithms such as IIR lowpass filters can decay into near-zero numbers in typical use, so they will happen. For instance, when you hit Stop on your DAW’s transport, it likely sends a stream of zeros (to let reverbs decay out, and any live input through effects to continue). The memory of a lowpass filter then decays exponentially towards zero. We can slow it a bit by using double precision floats, but it’s only a matter of time till the processing meter climbs abruptly when your processing algorithm bogs down in denormalized computation.

Modern processors can mitigate the problem with flush-to-zero and denormals-are-zero modes. But this too can be tricky—you’re sharing the processor with other functions that might fail without the expected denormal behavior. You could flip the switch off and back during your main DSP routine, but you need to be careful that you’re not calling a math function that might fail. Also, this fix is processor dependent, and might be the wrong thing to do if you move your code to a new platform. A DSP framework might handle this conveniently for you, but my goal here is to show you that it’s pretty easy to work around, even without help from the processor. Let’s look at what else we could do.

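For what it’s worth, here’s a minimal sketch of that mode switch for x86 processors with SSE (the intrinsics are processor specific, and the function is just an illustration of the idea, not portable code):

#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE and friends (SSE)
#include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE and friends (SSE3)

void processWithFlushToZero(float *buf, int numSamples) {
    // save the current modes so we can restore them afterwards
    const unsigned int ftz = _MM_GET_FLUSH_ZERO_MODE();
    const unsigned int daz = _MM_GET_DENORMALS_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

    // ... the DSP work on buf goes here ...

    // restore, in case other code on this thread expects normal denormal behavior
    _MM_SET_FLUSH_ZERO_MODE(ftz);
    _MM_SET_DENORMALS_ZERO_MODE(daz);
}
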
We could test the output of any susceptible math operation—there is no need to test all operations, because most won’t create a denormal without a denormal as input.

There’s a surprisingly easy solution, though, and it’s more efficient than testing and handling each susceptible operation. At points that have the threat of falling into denormals, we can add a tiny value to ensure it can’t get near zero. This value can be so small that it’s hundreds of dB down from the audio level, but it’s still large compared with a denormal.

In fact, it’s actually possible to flush denormals to zero with no other error, due to the same properties of floating point we discussed earlier. If you add a relatively large number to a denormal, then subtract it back out, the result is zero. Still, the subtraction step is pointless, because such a tiny offset will never be heard anyway.

But there is a catch to watch out for. A constant value in the signal path is the same as a DC offset. Some components, such as a highpass filter, remove DC offsets. I get around this by using a tiny number to add as needed in the signal path—often, one time is enough for an entire plugin channel—and alternating its sign each time a new buffer is processed. A value such as 1e-15 (-300 dB) completely wipes out denormals while having no audible effect.

Why did I pick that value? It needs to be at least the smallest normalized number (2^-126, about 1e-38, or about -760 dB), to ensure it wipes out the smallest denormal. That would work, but it’s better to use a larger number so that you don’t have to add it in as often. A much larger number would likely be fine to add just once—just add it to the input sample, for instance. (But consider your algorithm—if you have a noise gate internal to your effect, you might need to add the “fixer” value after its output as well.) And clearly it needs to be small enough that it won’t be heard, and won’t affect algorithms such as spectrum analysis. A number like 1e-15 is -300 dB—each additional decimal place is -20 dB, so 1e-20 is -400 dB, for instance. Let your own degree of paranoia dictate the value, but numbers in those ranges are unlikely to affect DSP algorithms, while wiping out any chance of denormals through the chain.

To recap: Take a small number (mDenormGuard = 1e-15;). Change its sign (mDenormGuard = -mDenormGuard;) once in a while, so any DC blocker you might have in your code doesn’t remove it. A handy place is the beginning of your audio buffer handling routine; one sign change for an entire buffer should be fine. Add it in (input + mDenormGuard).

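Here’s a minimal sketch of that recap in code (the class and buffer names are placeholders, not from any particular framework):

class MyEffect {
    double mDenormGuard = 1e-15;   // about -300 dB; alternate its sign each buffer
public:
    void processBuffer(float *buf, int numSamples) {
        mDenormGuard = -mDenormGuard;              // flip once per buffer so DC blockers can't remove it
        for (int n = 0; n < numSamples; n++) {
            double samp = buf[n] + mDenormGuard;   // adding it at the input is often enough
            // ... recursive processing (filters, reverbs, etc.) goes here ...
            buf[n] = (float)samp;
        }
    }
};
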

Floating point caveats

The equivalent of “When I was your age, we used to walk to school in the snow. Barefoot”, in DSP, is to say we used to do DSP in fixed point math. The fixed-point system could be made of integer and shift operations in software, or built into fixed-point DSP chips such as the 56K family, or other hardware implementations. Floating point was not available in dedicated DSP chips, and took too many cycles on a CPU. But the general usefulness of floating point led to it being optimized in CPUs, making it the overwhelming choice for DSP today.

A floating binary point allows floating point math a huge range. It essentially gives us a magic ruler that can be stretched or shrunk at will, to measure huge or tiny distances with the same relative accuracy. The catch is the word “relative”. There are the same number of tick marks on the ruler at any size, and that makes for some properties that might not be immediately obvious. Besides the effects of math involving quantized samples, significant errors to watch out for are in addition, multiplication, and denormals. We’ll cover denormals in the next post.

Definitions

Floating point is well defined in IEEE standards that are widely adopted in order to give consistent results from one computer and processor to another. There are many ways to do floating point, but these are the only ones we need to consider.

“Single precision” (32-bit) floating point, “float” in C, is ample to hold samples of a digital audio stream. It has a sign bit, 23 bits of mantissa, and 8 bits of exponent. But floating point execution units typically keep results in a “normalized” state, where the leading digit is always a “1” and the exponent is adjusted accordingly. This makes best use of the available mantissa bits. Since the leading bit is always 1, it’s omitted, yielding 24 mantissa bits, effectively. But we have a sign bit too, so it’s equivalent to 25-bit audio. It matches up conveniently to 24-bit audio converters.

Double precision (64-bit), “double” in C, gives substantially more of everything, at little cost in processing time (though twice the memory use). A sign bit, 52 bits of mantissa, and 11 bits of exponent. The increase in exponent isn’t important for audio, but the 54 bits of precision is a big plus to minimize accumulated errors.

For an idea of how the two compare, precision-wise, consider that there are 31557600 seconds in a year. The 25 bits of effective precision (if using the full range of negative to positive) in a 32-bit float has a range of 33554432 (2^25), a resolution of about a second out of a year. A 64-bit double has a range of 18014398509482000 (2^54), for a resolution of about two nanoseconds.

For fixed point, it depends on the implementation but the standard is the 56k family, which multiplies 24-bit (leading decimal point) numbers for a result with 48 bits to the right of the decimal point and 8 to the left. The extra 8 bits allow headroom for multiply-accumulate cycles to grow greater than one.

Which size to use

32-bit floats are ample for storage and for the sample busses between processing—the 64-bit sample size used for buffers by some DAWs is overkill. 32-bit floats are adequate for many computations, on par with 24-bit fixed point systems (better in most regards but falling short in some—the 56K’s extended-precision accumulators allow long FIRs to be more precise). Either size has far more range in the exponent than we’ll use—audio simply doesn’t require that incredible dynamic range.

But CPUs are heavily optimized for double-precision floating point, and the added precision is often necessary in audio DSP algorithms. For typical modern Mac/PC host processors, stick with doubles in your DSP calculations; the performance difference is small, and your life will be a lot easier, with less chance of a surprise. But you may need to use single precision, especially on DSP chips, with SIMD, or on processors where the performance difference is large, so you should understand the limitations.

Errors in multiplication

When you multiply two arbitrary numbers of n digits of precision, the result has 2n digits of precision. 9 x 9 = 81, 99 x 99 = 9801, .9 x .9 = .81. But floating point processors do not give results in a higher precision—the result is truncated to fit in the same number of bits as the operands (essentially, .1111 x .9999 = .1110 instead of the precise .11108889, for an example of 4-digit decimal math). This means an algorithm using single precision floats may have significantly more error than the same algorithm done in a fixed point DSP like the 56K family, which allows accumulation of double-precision results. You need to consider the effects of this error on your algorithm. Or, just stick with double precision computation, which has so much precision that you can usually ignore this truncation error for audio (but watch out for things like raising values to high powers—you can’t freely code arbitrarily high-order IIR filters even with doubles).

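If you’d like to see the truncation directly, here’s a quick check (not from any particular codebase): the exact product of two floats fits in a double, so computing the product both ways shows what single precision throws away.

#include <iostream>
#include <iomanip>
using namespace std;

int main() {
    float a = 0.333333f, b = 0.777777f;
    float productF = a * b;                     // rounded back to a 24-bit mantissa
    double productD = (double)a * (double)b;    // exact: the full product fits in a double
    cout << setprecision(17);
    cout << "float  product: " << productF << endl;
    cout << "double product: " << productD << endl;
    cout << "truncation:     " << productD - (double)productF << endl;
}
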
Errors in addition

If you’ve worked with fixed-point math, you’re used to the idea that you discard precision from multiplying when storing back to the original sample size, but addition is basically perfect (as long as you don’t overflow, by either guarding against it, or using an accumulator with headroom). With floating point, you have the same fundamental issues as with multiplication (but with no choice), and a whole new problem with addition. I think this is something many don’t consider, so this section might be my most important point.

Floating point doesn’t do well when adding two values of vastly different sizes.

If our magic ruler is stretched to measure the distance from here to the sun, each tick is about 5 km, so adding a measurement of a bug’s wing to it will do nothing—it’s too small of a distance to make it to the next tick, the next highest possible number. (With zero placed at the midpoint of the span, the effective mantissa range of a 32-bit float is 2^25, or 33554432. The scale is our distance to the sun, 150 million km—150000000 km / 33554432 is about 5 km per “tick”. For doubles, it’s half the width of the finest human hair!)

This can be a huge problem in iterative math. Consider that a 32-bit unsigned integer has a maximum value of 4,294,967,295. As long as you don’t exceed that amount, you can count through every value from 0 to that maximum by adding one for each count. A 32-bit float has a maximum value of 3.402823 × 10^38, but can count through a maximum of 16,777,216 steps incrementally, no matter the size of the increment. After that, the increment is too small compared with the size of the running total, and has no effect.

For instance, a big mistake would be to use a 32-bit float as a timer, incrementing every sample period, to use in an automation ramp calculation. Even if the time increment is a very small number, the timer will stop incrementing after six minutes because the increment’s mantissa can no longer be aligned with that of the running total. 6.3 minutes * 60 seconds/minute * 44100 samples/second is about 2^24 (16777216) samples—we’re assuming only positive values here. The good news is that when using doubles, it would take a few thousand years to hit that point.

Try this:

// Incrementing integer and float
#include <iostream>
#include <cstdint>
using namespace std;

int main() {
    const long objectives[] = { 100, 10000, 1000000, 100000000, 1000000000 };
    for(const long objective : objectives) {
        float counterFp = 0;
        uint32_t counterInt = 0;
        for (long idx = 0; idx < objective; idx ++) {
            counterInt += 1;
            counterFp += 1;
        }
        cout.precision(20);
        cout << counterInt << ",  " << counterFp << endl;
    }
}
The output:

100,  100
10000,  10000
1000000,  1000000
100000000,  16777216
1000000000,  16777216

Note that fixed point math is not without this problem, but for fixed point it’s obvious that small numbers have fewer significant digits after the leading zeros, and obvious when they are too small to fit in the fixed point representation. With floating point, you can be easily fooled into thinking you have more precision in a value than you can use for an addition operation, because the usable precision is dependent on the value you’re adding it to.


Wavetable signal to noise ratio

In our wavetable series, we discussed what size our wavetables needed to be in order to give us an appropriate number of harmonics. But since we interpolated between adjacent table entries, the table size also dictates the signal to noise ratio of playback. A bigger (and therefore more oversampled) table will give lower interpolation error—less noise. We use “signal to noise ratio”—SNR for short—as our metric for audio.

SNR has a precise definition—it’s the RMS value of the signal divided by the RMS value of the noise, and we usually express the ratio in dB. We’ll confine this article to sine tables, because they are useful and the ear is relatively sensitive to the purity of a sine wave.

We could derive the relationship of table size and SNR analytically, but this article is about measurement, and it’s easily extended to other types of waveforms and audio.

Calculating SNR

To calculate SNR, we need to know what part of the sampled audio is signal and what part is noise. Since we’re generating the audio, it’s pretty easy to know each. For example, if we interpolate our audio from a sine table, the signal part is the best-precision sine calculation we can make, and the noise is that minus the wavetable-interpolated version. RMS is root-mean-square: square all the samples, take the mean of those squares, then take the square root of that mean. We do that for both the signal and the noise, then divide the two—and convert to dB. The greatest error will be somewhere between adjacent table entries; picking the halfway point is a good guess for the sine, but we can easily take more measurements per segment and see.

It ends up that the table size can be relatively small for an excellent SNR with linear interpolation. This shouldn’t be surprising, since a sine wave is smooth and therefore the error of drawing a line between two points gets small quickly with table size. A 512 sample table is ample for direct use in audio. It yields a 97 dB SNR. While some might think that’s fine for 16-bit audio but not so impressive for 24 bit, a closer look reveals just how good that SNR is.

Keep in mind, this is a ratio of signal to noise. While the noise floor is -97 dB compared with the signal, that’s not the same as saying we have a noise floor of -97 dB. The noise floor is -97 dB when the signal is 0 dB (actually, this is RMS, so a full-code sine wave is -3 dB and the noise is -100 dB). But people don’t record and listen to sine waves at the loudest possible volume. When the signal is -30 dB, the noise floor is -127 dB. When the signal is disabled, the noise floor is non-existent.

However, if that’s still not good enough for you, every doubling of the table size yields a 12 dB improvement.

Code

Here’s a simple C++ example that calculates the SNR of a sine table. Set the tableSize variable to check different table sizes (typically a power of 2, but not enforced). The span variable is the number of measurements from one table entry to the next. You can copy and paste, and execute, this code in an online compiler (search for “execute c++ online” for many options).

#include <iostream>
#include <cmath>
#if !defined M_PI
const double M_PI = 3.14159265358979323846;
#endif
using namespace std;

int main(void) {
    const long tableSize = 512;
    const long span = 4;
    const long len = tableSize * span;
    double sigPower = 0;
    double errPower = 0;
    double sin0 = 0, sin1 = 0;   // segment endpoints; these must persist across loop iterations
    for (long idx = 0; idx < len; idx++) {
        long idxMod = idx % span;

        double sig = sin((double)idx / len * 2 * M_PI);

        if (!idxMod) {
            sin0 = sig;
            sin1 = sin((double)(idx + span) / len * 2 * M_PI);
        }

        double err = (sin1 - sin0) * idxMod / span + sin0 - sig;

        sigPower += sig * sig;
        errPower += err * err;
    }
    sigPower = sqrt(sigPower / len);
    errPower = sqrt(errPower / len);

    cout << "Table size: " << tableSize << endl;
    cout << "Signal: " << 20 * log10(sigPower) << " dB RMS" << endl;
    cout << "Noise:  " << 20 * log10(errPower) << " dB RMS" << endl;
    cout << "SNR:    " << 20 * log10(sigPower / errPower) << " dB RMS" << endl;
}

Quantifying the benefit of interpolation

This is a good opportunity to explore what linear interpolation buys us. Just change the error calculation line to “double err = sin0 - sig;”, and set span to a larger number, like 32, to get more readings between samples. Without linear interpolation, the SNR of a 512-sample table is about 43 dB, down from 97 dB, and we gain only 6 dB per table doubling.

You can extend this comparison to other interpolation methods, but it’s clear that linear interpolation is sufficient for a sine table.

Extending to other waveforms

OK, how about other waveforms? A sawtooth wave is not as smooth as a sine, so as you might expect, it will take a larger table to yield a high SNR number. Looking at it another way, the sawtooth is made up of a sine fundamental plus harmonics. The next harmonic is at half the amplitude, which alone would contribute half the signal and half the noise, but it’s also double the frequency—the equivalent of half the sine table size, and therefore 12 dB worse than if the fundamental were taken alone. It’s a little more complicated than just summing up the errors of the component sines, though, because positive and negative errors can cancel.

But the measurement technique is basically the same as with the sine example. The signal would be a high-resolution bandlimited sawtooth (not a naive sawtooth), and noise would be the difference between that and the interpolated values from your bandlimited sawtooth table. Left to you as an exercise, but you may be surprised at the poor numbers of a 2048 or 4096 sample table in the low octaves (where there is no oversampling). But again, the noise only occurs when you have signal, particularly when you have a bright waveform, and remains that far below it at any amplitude. It’s still hard to hear the noise through the signal!

Checking the SNR of a wavetable generated by our wavetable oscillator code is a straightforward extension of the sine table code. For a wavetable of size 2048 and a given number of harmonics, for instance, create a table of size 2048 times span. Then subtract each entry of the wavetable, our “signal”, from the corresponding interpolated value for the “noise”. For instance, if tableSize is 2048 and span is 8, create a table of 16384 samples. For each signal sample n, from 0 to 16383, compare it with the linearly interpolated value between span points (compare samples 0-7 with the corresponding linear interpolation of samples 0 and 8, etc., using modulo or counters).

It’s more code than I want to put up in this article, especially if I want to give a lot of options for waves and interpolations, but it’s easy. You might want to make a class that lets you specify a waveform, including number of harmonics and wavetable size, which creates the waveform. Create a function to do the linear interpolation (“lerp”) and possibly others (make it a class in that case); input the wavetable and span, output the computed signal and noise numbers. Then main simply makes the call to build the waveform, and another call to analyze it, and displays the results.

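As a starting point, here’s a sketch of the measurement piece (the names are hypothetical, and building the fine-grained “signal” table is left to your own oscillator code):

#include <cmath>

// linear interpolation between two values
double lerp(double a, double b, double frac) {
    return a + (b - a) * frac;
}

// signal[] holds tableSize * span high-resolution samples of the waveform;
// the wavetable proper is every span-th entry, and we measure the error of
// linearly interpolating between those entries
double measureSNR(const double *signal, long tableSize, long span) {
    const long len = tableSize * span;
    double sigPower = 0, errPower = 0;
    for (long idx = 0; idx < len; idx++) {
        long base = (idx / span) * span;              // start of this segment
        double a = signal[base];
        double b = signal[(base + span) % len];       // wrap at the end of the table
        double interp = lerp(a, b, (double)(idx - base) / span);
        double err = interp - signal[idx];
        sigPower += signal[idx] * signal[idx];
        errPower += err * err;
    }
    return 20 * log10(sqrt(sigPower / errPower));     // RMS signal over RMS error, in dB
}
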

Sampling theory, the best explanation you’ve ever heard—End notes

A few words before moving on to other topics…

We’ve looked at why digital samples represent ideal impulses, and why any point between samples represents a value of zero. And, as a result, audio samples don’t represent the audio itself, but a modulated version of the audio.

Why is it helpful to understand these points?

Critical sampling

First, it gives clearer and more intuitive answers for why digital audio behaves in certain ways than more typical explanations do. For instance, it makes this puzzle trivial:

People ask why the sample rate needs to be double the frequency of the highest signal frequency that we want to preserve. Often the reply is that it needs to be just above double the highest frequency of interest, to avoid aliasing. But why? And how much higher? At this point, someone mentions something about wagon wheels turning the wrong way in movies. Or shows a graph with two sine waves of different frequencies intersecting the same sample points. So unsatisfying.

If you consider that the signal is amplitude modulated in the digitization process, you need only see that the sidebands would start overlapping at exactly half the sample rate. To keep them from overlapping, all frequencies must be below half the sample rate, giving each cycle more than two samples.

Multistage conversion

And integer sample rate conversion choices are easier to make. Especially for multistage conversion. We often use multistage conversion to improve efficiency. Like performing 8x upsampling as three 2x stages. If that sounds like three times the work, it isn’t, because the higher relative cutoffs of the filters make for fewer coefficients, balancing out with the total operations for 8x. But we can do more than break even by optimizing each stage—the earlier stages can be a bit sloppy as long as everything is tidy by the last stage’s output. Somewhat like doing a big cleanup on a house in multiple passes versus one.

Perhaps this is a good place to note that you might see chatroom posts where someone says that instead of inserting zeros and filtering, they prefer to use a polyphase filter. There is no difference—a polyphase filter in this case is simply an algorithm that upsamples and filters. Any seasoned programmer will notice that there is no need to explicitly place zeros between samples, then run all samples through an FIR, because the zero samples result in a zero product; optimizing the code to skip the zero operations results in a polyphase filter.

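Here’s a tiny sketch of that optimization (hypothetical function names, boundary handling omitted). The two functions compute the same output sample of a 2x upsampler, but the second never touches the inserted zeros:

// h[] is a lowpass FIR designed at the 2x rate; u[] is the zero-stuffed input
float naiveUpsampleFIR(const float *u, const float *h, int numTaps, int m) {
    float acc = 0;
    for (int k = 0; k < numTaps; k++)
        acc += h[k] * u[m - k];                 // half of these are multiplies by zero
    return acc;
}

// x[] is the original input, without the zeros; same result, half the work
float polyphaseUpsampleFIR(const float *x, const float *h, int numTaps, int m) {
    float acc = 0;
    for (int k = m & 1; k < numTaps; k += 2)    // only the taps that line up with real samples
        acc += h[k] * x[(m - k) / 2];
    return acc;
}
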
Optimization example

An understanding of why we need to filter rate conversions can help us optimize DSP processes. For example, someone posted a question on a DSP board recently. They were using EQ filters designed by the bilinear transform, which have a pinching effect near half the sample rate (due to zeros at the Nyquist frequency). They didn’t need additional frequency headroom per se—the filters are linear—but they wanted to oversample by 2x to avoid the shape distortion of peaking EQ filters. (Note there are methods to reduce or avoid such distortion of the filter shape, but this is a good example.)

Let’s say we’re using a 48 kHz sample rate. Typically we’d raise the sample rate to 96k by inserting zeros every other sample and then lowpass filtering below 24k. Then we’d do our processing (EQ filtering). Finally, we’d take it back to 48k by lowpass filtering below 24k and discarding every other sample. But in this case, our processing step is linear (linear EQ filters), so it doesn’t create new frequencies. That means we can skip one of the lowpass filtering stages. It doesn’t matter whether we lowpass filter before or after the EQ processing, but we don’t need both. That’s a substantial savings.

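Here’s a rough sketch of that optimized 2x path (the peakingEQ and halfbandLowpass objects are placeholders for whatever filters you use). Because the EQ is linear, it can run directly on the zero-stuffed signal, and a single lowpass serves as both the anti-image and anti-alias filter:

// EQ and LP stand for any per-sample filter types with a double process(double) method
template <typename EQ, typename LP>
void oversampledEQ(const double *input, double *output, long numSamples,
                   EQ &peakingEQ, LP &halfbandLowpass) {
    for (long n = 0; n < numSamples; n++) {
        double hi0 = 2.0 * input[n];   // zero-stuff to 2x; the factor of 2 restores the level
        double hi1 = 0.0;              // the inserted zero
        hi0 = peakingEQ.process(hi0);  // linear EQ at the high rate
        hi1 = peakingEQ.process(hi1);
        output[n] = halfbandLowpass.process(hi0);   // one lowpass below 24 kHz; keep this sample
        halfbandLowpass.process(hi1);               // discard this one, but keep the filter state moving
    }
}
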
Another example

Let’s say we create an audio effect such as an amp simulator, which has a non-linear process that requires running at a higher sample rate to reduce audible aliasing. We run our initial linear processes, such as tone controls, then upsample and run our non-linear processes (an overdriven tube simulation!). But in this case we conclude with a speaker cabinet simulator, which is a linear process (via convolution or other type of filtering). Guitar and bass cabinets use large speakers (typically 8” and up, often 10” or 12” for guitar), with frequency responses that drop steeply above around 5 kHz. Understanding how the downsampling process works, we might choose to eliminate the downsampling filter stage altogether, as superfluous, or at least save cycles with a simplified filter with relaxed requirements.


Sampling theory, the best explanation you’ve ever heard—Part 3

We look at what Pulse Amplitude Modulation added to our analog source audio.

What did PAM add?

Earlier, we noted that the PAM signal represents the source signal plus some additional high frequency content that we need to remove with a lowpass filter before we listen back.

Again, PAM is amplitude modulation of the source signal with a pulse train. Mathematically, we know precisely what amplitude modulation produces—the sums and differences of every frequency component between the two input signals. That is, if you multiply a 100 Hz sine wave by a 6 Hz sine wave, the result is the sum of 106 Hz and 94 Hz sine waves. For signals with more frequency components, there are more sums and differences in the result.

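(For the record, that’s just the trig product-to-sum identity: sin(2π·100·t) × sin(2π·6·t) = ½ cos(2π·94·t) - ½ cos(2π·106·t), giving the 94 Hz and 106 Hz components at half amplitude each.)
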
To answer our question, “What got added?”, we need to understand the frequency content of a pulse train. One way to know that would be to use a Fourier transform on the pulse train. But I want to use intuitive reasoning to eliminate as much math as possible. Fortunately, I already know what the extra frequency content is—it’s the spectral images in sampled systems, as described in classic DSP textbooks. That coupled with knowledge of amplitude modulation tips me off that we’ll need a frequency component at 0 Hz (DC—we need that to keep our original source band), at the sample rate, and at every integer multiple of the sample rate. Through infinity.

OK, we’ll lighten up on the infinity requirement. We can’t produce a perfect impulse in the analog world anyway. And we don’t need to. However, once in the digital domain, samples represent perfect impulses. While their values may have deviated slightly from a perfect representation of the analog signal, due to sampling time jitter and quantization, any math we do to them is “perfect” (again, subject to quantization and any other approximations). In the digital realm, the images do go to infinity.

Indeed, as you add cosine waves of 0, 1, 2, 3, 4…times the sample rate, the result gets closer and closer to the shape of an impulse. (Cosine instead of sine so that the peaks of the different frequencies line up.)

And that means we’ll have a copy of the source signal mirrored around 0 Hz, around the sample rate, twice the sample rate, three times the sample rate…to infinity. (In both directions, but we can ignore negative frequencies—for real signals, the negative spectrum mirrors the positive.)

What we’ve learned

Revisiting my “secrets”, with added comments:

1. Individual digital samples are impulses. Not bandlimited impulses, ideal ones.

Bothered that ideal impulses are impossible? Only in the physical world. There, we accept limitations. For instance, try gathering together an infinity of something. Anything—I’ll wait. Meanwhile, in the mathematical world, infinity fits easily on this page: ∞

2. We know what lies between samples—virtual zero samples.

Think there’s really a continuous wave, implied, between samples? If so, you probably think it’s because samples represent a bandlimited impulse. No—you’re getting confused with what will come out of the DAC’s lowpass filter later, when we play back audio.

3. Audio samples don’t represent the source audio. They represent a modulated version of the audio. We modulated the audio to ensure points #1 and #2.

This is a frequency-domain observation that follows from the first two points, which are time domain. If you understand this point, you’ll never be confused about sample rate conversion.


Sampling theory, the best explanation you’ve ever heard—Part 2

In this article, we explore the origins of sampling.

Discrete time

For many, discrete time and digital sampling are synonymous, because most people have little experience with discrete time analog. But perhaps you’ve used an old-style analog delay stompbox, with “bucket brigade” delay chips. Discrete time goes back a lot farther, though. When we talk of the sampling theorem, attributed to people like Nyquist, Shannon, and others, it applies to discrete time signals, not digital signals in particular.

The origins of discrete time theory are in communications. A single wire can support multiple simultaneous telegraph messages, if you synchronize a commutator between sender and receiver and slice time into sections to interleave the messages—this is called Time Division Multiplexing, or TDM. Following later with voice, using TDM to fit multiple voice calls on a line, it was found that the sampling rate had to be around 3500-4300 Hz for satisfactory results.

Traveling over a wire, analog signals can’t be “discrete” per se–there is always something being sent, no gaps in time. But the signal information is discrete, sending zero in between, and that leaves room to interleave other signals in TDM.

The most common method of making an analog signal discrete in this way is through Pulse Amplitude Modulation, or PAM. This means we multiply the source signal continuously with a pulse train of unit amplitude.

While the benefit of PAM for analog communications is that we can interleave multiple signals, for digital, the benefit is that we don’t need to store the “blank” (zero) space between samples. For digital sampling, we simply measure the height of each impulse of the PAM result, and encode it as a number. Pulse Amplitude Modulation and encoding—we call the combined process Pulse Code Modulation. Now you know what PCM means.

Impulses, really

Some might look at that last diagram and think, “But I’ve seen this process depicted as a staircase wave before, not spiky impulses.” In fact, measuring voltage quickly and with precision, which we must do for the encoding step, is not easy. Fortunately, we intend to discard the PAM waveform anyway, and keep just the digital values. We don’t need to maintain the empty spaces between impulses, since our objective is not time division multiplexing analog signals. So, we perform a “sample and hold” process on the source signal, which charges a capacitor at the moment of sampling and stretches the voltage value out, allowing a more leisurely measurement.

This results only in a small shift in time, functionally identical to instantaneous sampling—digital samples represent impulses, not a staircase. If you have a sample value of 0.73, think of it as an impulse of height 0.73 units.

The step of digitizing the analog PAM signal introduces quantization, and therefore quantization error. But it’s important to understand that issues related to aliasing are not a property of the digital domain—aliasing is a property of discrete time systems, so is inherent in the analog PAM signal as well. That’s why we took this detour—I believe I can explain aliasing to you in a simpler way, from the analog perspective.

Next: We’ll look at exactly what frequency content is added by the PAM (and therefore PCM) process, in Part 3


Sampling theory, the best explanation you’ve ever heard—Part 1

I’ll start by giving away secrets first:

  1. Individual digital samples are impulses. Not bandlimited impulses, ideal ones.
  2. We know what lies between samples—virtual zero samples.
  3. Audio samples don’t represent the source audio. They represent a modulated version of the audio. We modulated the audio to ensure points #1 and #2.

Well, not secrets, but many smart people—people who’ve done DSP programming for years—don’t know these points. They have other beliefs that have served them well, but have left gaps.

Let’s see why

Analog audio, to digital for processing and storage, and back to analog

Component details—first the analog-to-digital converter (ADC)

The digital-to-analog converter (DAC)

Analog to digital conversion, and conversion back to analog are symmetrical processes—not surprising.

But we can make another important observation: We know that the bandlimiting lowpass filter of the ADC is there as a precaution, to ensure that the source signal is limited to frequencies below half the sample rate. But we have an identical filter at the output of the DAC—why do we need that, after eliminating the higher frequencies at the beginning of the ADC? The answer is that conversion to discrete time adds high frequency components not in the original signal.

Stop and think about this—it’s key to understanding digital audio. It means that the digital audio samples do not represent the spectrum of the bandlimited analog signal—the samples represent the spectrum of the bandlimited analog signal and additional higher frequencies.

To understand the nature of the higher frequencies added in the sampling process, it helps to look at the origins of sampling.

Next: We explore the origins of sampling in Part 2


Sampling theory, the best explanation you’ve ever heard—Prologue

I’ve been working on a new video, with the goal of giving the best explanation of digital sampling you’ve ever heard. The catch is I started on it three years ago. I’m not that slow, it’s just that I’ve been busy with projects, so time passes between working on it. And each time I get back to it, I rewrite it from scratch. You see, I want to give you a solid, useful theoretical framework that will both help you understand and help you make good decisions, with the goal of being intuitive. So, each time I have a little different viewpoint to try.

The video is still a few months off—I’m still busy for the next few months—but I’m going to present the idea first as a collection of short articles. And in the end, if I like it, the articles will serve as the script for the video.

I believe this is a unique explanation of sampling. Please follow it from the start, even if you feel you know the subject well. This isn’t something I read or was taught, but came from thinking about a way to explain it without descending into either mathematical symbolism or hand waving.

Next up: My first crack at the best explanation of sampling theory you’ve ever heard.


Amp simulation oversampling

In tandem with our last article on Guitar amp simulation, this article gives a step by step view of the sampling and rate conversion processes, with a look at the frequency spectrum.

From guitar to digital

The first two charts embody the initial sampling of the analog signal. It’s done in one step, from your analog-to-digital converter, but it’s a two-part process. First, a lowpass filter clears everything from half the sample rate up—something we must do to avoid higher frequencies aliasing into our audio band when sampled.

Then the signal is digitized. This creates repeating images of the positive and negative frequencies that extend upward without end. After this, we’ll look only at frequencies between 0 Hz and the sampling frequency, but it’s important to understand that these images are there, nonetheless.

If we don’t need more frequency headroom, we can do our signal processing on the samples we have. In fact, we want to do as much as we can at the lower sample rate. In the case of a guitar amp, we would process any tone controls that come before the “tube”, and other things like DC blocking and noise gating. And we can do our (non-saturating) gain stage here (assuming floating point).

Higher rate processing

After initial processing and gain, it’s time for saturation. For this non-linear process, we need frequency headroom. The first step is to increase the rate by inserting zero-magnitude samples. Though I suggested starting with 8x upsampling in the guitar amp article, this exercise shows 4x, in order to better accommodate page constraints. We place three zero-samples between each existing sample to bring it up 4x. This only raises the sample rate—we have four samples in the same period that we used to have one. Since the spectrum is not altered, the aliases are still there.

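As a quick sketch of just that zero-insertion step (the function name is for illustration only; in practice it’s folded into the filtering, as the next paragraph notes):

// 4x zero insertion (sketch): each input sample is followed by three zeros
void zeroStuff4x(const float *in, float *out, long inLen) {
    for (long n = 0; n < inLen; n++) {
        out[4 * n] = in[n];
        out[4 * n + 1] = 0.0f;
        out[4 * n + 2] = 0.0f;
        out[4 * n + 3] = 0.0f;
    }
}
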
Part two of the sample rate conversion process is to use a lowpass filter to clear everything above our original audio band. (In reality, we optimize the zero insertion and filtering into a single process, to take advantage of multiplies by zero that we can skip. Note we usually use a linear-phase FIR for this step, in order to preserve the wave shape for the saturator.) Now we see our headroom.

After our saturation stage, we’ve created new harmonics. As long as they are at a sufficiently low level by the time they reach the sample rate minus our final audio bandwidth, the aliased version won’t pollute our audio band.

Back to normal

Done with our more expensive high-rate signal processing, we can drop back to our original sample rate for the rest. The first step is to run our lowpass filter again, to clear everything above our audio band.

Part two is the downsampling process—we simply keep one sample, discard three, and repeat. Why did we bother calculating them? Because we needed them ahead of the lowpass filter. But, here also, we can optimize these two down-conversion steps into one, and save needless calculation.

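And the corresponding sketch of the decimation step (again, a hypothetical function, shown before folding it into the filtering):

// 4:1 decimation (sketch): after the lowpass, keep every fourth sample
void decimate4x(const float *in, float *out, long outLen) {
    for (long n = 0; n < outLen; n++)
        out[n] = in[4 * n];
}
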
Final processing

From here, we handle other linear processes—any filtering and tone controls that follow the tube stage, effects such as spring reverb, and finally the speaker cabinet simulation. In the end, we send it to our digital-to-analog converter, which itself has a lowpass filter to remove the aliased copies.

And enjoy listening.


Guitar amp simulation

In this article, I’ll sketch a basic guitar amp simulator. For one, questions on the topic come up often, and also, it will be a good example of a typical use of working at a higher sample rate.

The most basic guitar amp simulator has gain, with saturation, tone controls, and a speaker cabinet simulator. Because saturation is a non-linear process, the results are different whether the tone controls come before or after—more on this later. The speaker cabinet, of course, comes last, and is an important part of the tone. Gain with saturation (the overdriven “tube”) is the tricky part—we’ll start there.

Gain with saturation

Gain is simply a multiply. But we like to overdrive guitar amps. That means gain and some form of clipping or softer limiting—that’s what generates the overdrive distortion harmonics we want. Typically, we’d ease into the clipping, more like tube saturation behaves, but the more overdrive gain you use, the closer it gets to hard clipping, as more of the signal is at the hard limit.

That’s where we hit our first DSP problem. Clipping creates harmonics, and there’s no way to say, “Please, only generate harmonics below half the sample rate.” We will have aliasing. The added harmonics fall off in intensity (in much the same way as harmonics of a rectangular wave do), as they extend higher in frequency, but at typical digital audio sample rates, the Nyquist Frequency comes too soon. Aliased images extend back down into the audio range. We can’t filter them out, because they mix with harmonics we want to keep. And because the guitar notes played aren’t likely to be an integer multiple of the sample period, the aliased harmonics are out of tune. Worse, if you bend a guitar note up, the aliased harmonics bend down—that’s where aliasing becomes painfully apparent.

Can we calculate clipping overtones and create just the ones we want? Can we analyze the input and output and remove frequencies that we don’t want? That is not an easy task (left as an exercise for the reader!).

The oversampled solution

To mitigate the aliasing issue, the most practical solution is to give ourselves more frequency headroom before generating distortion harmonics with our saturation stage. If the sample rate is high enough, aliased images are spread far enough apart that the tones extending into our audio range from above are diminished to the point they are obscured by the din of our magnificent, thick, overdriven guitar sound. And when we play delicately or with little overdrive gain, so is the aliasing lessened.

How much headroom?

How much headroom do we need? I’d like to leave that to your situation and experimentation—mostly. But since I started doing this in the days when every DSP cycle was a precious commodity, I can tell you that—for 44.1 kHz sample rate, typically the worst case that we care about—the minimum acceptable oversampling factor is 6x. 8x is a good place to start, and will be adequate for many uses, at a reasonable cost. (Do multistage oversampling for efficiency…but that’s another story.)

Raising the sample rate “8x” sounds like we’ll have eight times the bandwidth, but it’s better than that. At our original 44.1 kHz sample rate, we have a usable bandwidth of about 20 kHz (allowing for the conversion filters), and little additional frequency headroom. If we go past 24.1 kHz (44.1 kHz – 20 kHz), aliasing extends below 20 kHz. But by raising the rate to 8 times the original, we have headroom of 352.8 kHz – 20 kHz, or 332.8 kHz. That’s more than 80 times our original headroom.

The idea is that we upsample the signal (removing anything not in our original audio band), run it through our tube simulator (clipper/saturator), then drop the sample rate back to the original rate (part of this process is again removing frequencies higher than our original audio band).

How much gain?

Real guitar amps have a lot of gain. Don’t think that if your overdrive is adjustable from 0-2 gain factor, you’ll get some good overdrive distortion. That’s only a maximum of 6 dB. More like 60 dB (0-1024)! Maybe up to something like 90 dB for modern, screaming high-gain amps. That’s equivalent to a shift of 15 bits, so I hope you’re using something better than a 16-bit converter (or tracks) to input your raw guitar sound.

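(For reference, a gain factor is 10^(dB/20), so 60 dB is a factor of 1000, and 1024 = 2^10 works out to about 60.2 dB.)
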
The tube

My intent here is not to guide you down the road of yet another guitar amp simulator plugin, but to give an example of a task that needed more frequency headroom, requiring processing at a higher sample rate. But it’s worth going into just a bit of detail on the tube (saturation) element. Again, we’re talking about “first approximations”—a hard clipper, or a soft one.

A hard clipper is trivial. If a sample is greater than 1, change it to 1. If less than -1, change it to -1.

if (samp > 1.0)
    samp = 1.0;
else if (samp < -1.0)
    samp = -1.0;

Here’s the transfer function—for input sample values along the x axis, the output is on the y axis. For input { 0.5, 0.8, 1.0, 1.3, 4.2 }, the output is { 0.5, 0.8, 1.0, 1.0, 1.0 }.

But for this use we’d probably like a softer clip—a transfer function that eases into the limit. samp = samp > 1 ? 1 : (samp <= -1 ? -1 : samp * (2 - fabs(samp))); This is just a compact way of saying,

if (samp > 1.0)
    samp = 1.0;
else if (samp < -1.0)
    samp = -1.0;
else
    samp = samp * (2 - fabs(samp));

This is a very mild, symmetrical x-squared curve. It’s just an example, but you’ll find it produces useful results. You could try a curve that stays straighter, then curves quicker near +/-1. Or a curve that’s not symmetrical for positive and negative excursions. The harder the curve, the closer we get to our original hard-clipping.

One detail worth noting: If you’ve spent much time looking at discussions of this sort of non-linear transfer function on the web, usenet, or mailing lists, invariably someone points out that we know exactly what order of harmonics will be created, and therefore how much oversampling we might need, based on the polynomial degree. This is wrong—in fact we’re only using the polynomial between the bounds of x from -1 to 1, and substituting a hard clip for all other points. For a use such as this, the input is driven far into the hard clip region, so the polynomial degree is no longer relevant.

Tone controls

We know IIR filters well, so this part is easy. Typically, a guitar amp might have bass, mid, and treble controls. One catch is that if you want to sound like a particular vintage amp, they used passive filters that result in control interaction—that is, there is no active circuitry isolating the filter sections from each other. So, adjusting the bass band might affect the mid filtering as well. But that’s fairly easy to adjust for.

More about overdrive and tone

Something worth noting: if you really want to scream with gain, you need to make some architectural adjustments. If you’ve played much with raw distortion pedals with different instruments, you’ve probably noticed that you get a “flatulent” sound with heavy clipping of bass. And high-gain distortion of signals with a lot of highs can sound pretty shrill. Guitar solos really scream when they have a lot of midrange distortion. So, if you really want to go for a screaming lead with loads of distortion without it falling apart into useless grunge, one thing you can do is roll off the lows and highs before the tube sim (distortion) stage. But then the result lacks body—compensate by boosting the lows and highs back up, after the distortion stage. The point here is that you have three choices of where to put EQ for your amp tone: before the tube, after, or both.

Some of your tone choices can be a property of the amp model—not everything needs to be a knob for the user to control. These choices are what give a particular guitar amp its characteristics. A Fender Twin does not sound like a Marshall Plexi. The reason that well-known guitar amps are the basis of DSP-based simulation is that the choices of their designers have withstood the test of time. Countless competitors faded from memory, often because their choices weren’t as compelling.

Cabinet

Similarly, guitar speakers gained familiar configurations not because the industry got together and chose, but because these are the ones that worked out well.

One characteristic of speakers for guitar amps is that they are not full range—you’ll find no tweeters in guitar cabinets. The highs of the large speakers used (most often 10″ and 12″) drop off very quickly. A clean guitar tone doesn’t have strong high frequency harmonics, and the highest note on a guitar is typically below 1 kHz. Overdrive distortion creates powerful high frequency harmonics, but we really don’t want to hear them up very high—extremely harsh and fatiguing to listen to.

The first approximation of a speaker cabinet is simply a lowpass filter, set to maybe 5 kHz. I didn’t say it would be a great cabinet, but it would start to sound like a real guitar amp combo. The next step might be to approximate the response of a real speaker cabinet, miked, with multiple filters.

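As a sketch of just that first approximation, here is a pair of cascaded one-pole lowpasses standing in for the cabinet rolloff (a real cabinet falls off much faster, so treat this as a crude placeholder):

// Crude stand-in for a cabinet: two one-pole lowpasses around 5 kHz (about 12 dB/octave)
#include <cmath>

struct CrudeCab {
    double a = 0, z1 = 0, z2 = 0;
    void setup(double cutoffHz, double sampleRate) {
        a = 1.0 - exp(-2.0 * 3.14159265358979323846 * cutoffHz / sampleRate);
    }
    double process(double in) {
        z1 += a * (in - z1);    // first pole
        z2 += a * (z1 - z2);    // second pole
        return z2;
    }
};
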
But if you’re serious, a better start might be to generate impulse responses of various cabinets (4 x 12″, 2 x 10″, etc.), miked at typical positions (center, edge, close, far), with selected mics (dynamic, condenser). Then use convolution to recreate the responses.

Followup

To moderate the size of this article, I’ll follow with images depicting the oversampling process in the next article.
