A wavetable oscillator—end notes

I wrote the wavetable oscillator code from scratch as I wrote this series of articles, using just the thought process I gave. Hence, no references to other articles on wavetable oscillators. The concept of a phase accumulator is fairly obvious; one of the first books I recall reading about it was Musical Applications of Microprocessors, by Hal Chamberlin, about 30 years ago (I have an autographed first edition), though the book didn’t explore the concept of multiple tables per oscillator to control aliasing.

These end notes are an opportunity to touch on various aspects of the oscillator.

Oscillator examples

First, a few examples. Here’s a pulse-width modulation example, using a sine-wave instance of an oscillator to modulate the pulse width of another oscillator:

PWM 110 Hz mod 0.3 Hz sine, 4s

Three detuned and summed sawtooth oscillators:

Saw times 3 detuned 55 Hz, 4s

But we aren’t limited to computing our wavetables. Here, I recorded myself singing “aahh…” (originally, 110 Hz, the second A below middle C). I cut one cycle of that in an audio editor and stretched (resampled) it to 2048 samples, and did an FFT of that. From there it was easy to create higher octaves by eliminating harmonics in the upper octave and doing an inverse FFT to make the next higher wavetable—the same process as we did with the computed waves. Here it is, swept 20 Hz to 20 kHz (but keep in mind that the typical useful range is in the lower octaves):

Ahh sweep 20-20k, 20s

Sawtooth revisited

Because I wanted to show the harmonic series of sine waves building a sawtooth in the earlier articles, I didn’t mention that sawtooth waves traditionally ramp up (remember the charging capacitor in the analog oscillator?). Creating this wave from sines makes it ramp down—often called an inverted sawtooth. It really doesn’t make a difference in the audio range; in the sub-audio range for a control signal, you might prefer one or the other depending on the effect you’re after. But either way, it’s trivial to build the up-ramping sawtooth, and I did that in the final version just for fun. I left out the 20-40 Hz octave in the tutorial to make a point, but I’m including it here—a non-inverted sawtooth sweep, and you’ll notice that the lowest octave is brighter with a true 20 Hz table:

Saw sweep 2048 lin 20-20k (base=20Hz), 20s

More on low-end brightness

Though we don’t often listen to the audio output of an oscillator in the sub-audio frequency range, the harmonic content is apparent for the brighter waveforms such as sawtooth. Here’s what it sounds like when we sweep from audio range down to 1 Hz with our wavetables built for the audio range:

Saw sub-audio test (20-20k 2048), 20s

Note how the harmonic content drops as the ticks slow.

Because aliasing becomes less of a factor the lower the frequency (the sample resolution for a single cycle increases the lower we go), we could switch to a more straightforward sawtooth for sub-audio. Or, because our oscillator allows a variable table size for each subtable, we could simply place a long wavetable to handle all sub-audio. Here’s the same oscillator, but with a 32768-sample sawtooth table to handle everything below 20 Hz:

Saw sub-audio test (20-20k 2048 + 32768 ramp), 20s

Optimizations

My goal with the accompanying code was to make it simple and understandable. It’s pretty quick, though—the full audio sweep takes about 0.9 seconds to generate 1000 seconds of audio on my computer. And there are certainly ways to make it faster, though in general a little bit faster makes it a lot less readable, and often requires some loss of generality. For instance, we could make oscillator table selection faster by mandating octave spacing, then using a fast-log2 floating-point trick to get to the proper table in a single step—but that would put a limitation on the oscillator, the relevant code would become unreadable to many people, and in practice the speed gain is small.

Some people might prefer a fixed-point phase accumulator. The math is fast and precise, and you can get a free wrap-around when you increment. But the double-precision floating point phase accumulator of this oscillator is extremely precise, is plenty fast, and the code is very easy to follow.
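For the curious, here is a minimal sketch of the fixed-point alternative (the names are illustrative, not from the accompanying code): the full range of a 32-bit unsigned integer represents one cycle, so the wrap-around on increment is free.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-point phase accumulator sketch: the full uint32_t range maps to
// one cycle (phase 0..1), so integer overflow on the add is the wrap.
struct FixedPhasor {
    uint32_t phase = 0;  // current phase; 0..2^32-1 represents 0..1
    uint32_t inc = 0;    // phase increment per sample

    void setFrequency(double freq, double sampleRate) {
        inc = (uint32_t)((freq / sampleRate) * 4294967296.0);  // * 2^32
    }
    void update() { phase += inc; }  // wraps for free on overflow

    // top bits select the table entry; the low bits are the fraction
    int tableIndex(int tableLenLog2) const {
        return (int)(phase >> (32 - tableLenLog2));
    }
};
```

For example, a frequency of a quarter of the sample rate gives an increment of exactly 2^30, and four updates land back on phase zero.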

Interpolation

If you search the web for wavetable oscillators, you’ll find terrific educational papers and forum threads discussing various types of interpolation and their relative performance. It may leave you wondering why you should trust the lowly and relatively pathetic linear interpolation presented here. Good point.

It’s about table size. If your table is big enough, you don’t need any interpolation at all—truncation is fine. But linear interpolation is only a bit more costly, and gives a boost that’s probably worth the effort. Higher forms of interpolation require more sample points in the calculation (for instance, needing two points on each side of the point to be calculated, instead of one point on each side for linear interpolation). They are great for squeezing better quality out of a small table, but you need to ask yourself why you are using the small tables. If you’re implementing this on a modern computer, what is the point of increasing your calculations so that you can use a 512-sample table instead of a 2048-sample table—on a system that has gigabytes of free memory?

Constant versus variable table size

If we use a constant subtable size, then as each subtable moves up to cover the next octave—with half of the harmonic content—it becomes oversampled by another factor of 2. The top octave will always be a sine wave, and at the 2048 samples we’re using for our subtables, it’s extremely oversampled—way more than we need or will be able to discern.

And while I said that we want to be at least 2 x oversampled at our lowest subtable for linear interpolation to have good fidelity, it’s unlikely that we’ll hear any shortfall in the lowest octave. The reason is that low frequency wavetables for a sawtooth will be very crowded in the high harmonics—half of them are in the top octave of our hearing, spaced closely. So, we could go to 1024-sample tables for 40 Hz and up, or 2048 for 20 Hz and up (as in the “Saw sweep 2048 lin 20-20k” example above), and you probably won’t hear the difference. All of the higher tables will still be increasingly oversampled. Here’s the 1024 equivalent of the 2048 example in Part 3 of the wavetable oscillator series:

Saw sweep 1024 trunc 20-20k (base=20Hz), 20s

In a nutshell, we might suspect that it will be easier to hear the benefits of oversampling in the upper octaves, but at the same time the minimum table size to hold their required harmonic content is smaller.

We could just as easily go to a constant oversampling ratio by dropping the subtable length by a factor of two as we go to the next octave’s subtable. It’s easy enough to make all of these things variable in your code, so that you can play with tradeoffs.

Lesser interpolation experiments

Linear interpolation is referred to as “first order” interpolation. Truncation is “zero order” interpolation, implying none. So let’s hear how awful the oscillator would sound with no interpolation:

Saw sweep 2048 trunc 20-20k, 20s

Um, that’s not as bad as expected—or should we not have expected it to be bad? Remember, we’re using fixed table sizes so far, which makes the higher, more sensitive tables progressively more oversampled. And oversampling helps improve any interpolation, even none.

Let’s explore variable table sizes, to keep a constant oversampling ratio per octave. We’ll start with the worst-case of 1 x oversampling, and with truncation:

Saw sweep variable 1x trunc 20-20k, 20s

As expected, the aliasing is bad, and gets worse in the higher octaves. Here it is again, with linear interpolation:

Saw sweep variable 1x lin 20-20k, 20s

Even though 1 x oversampling is still inadequate, the improvement with linear interpolation is obvious. Let’s try again, but with 4 x oversampling; first with no interpolation (and let’s focus on the higher octaves):

Saw sweep trunc 1280-20k, constant 4x oversample, 20s

Now with linear interpolation:

Saw sweep lin 1280-20k, constant 4x oversample, 20s

You can still hear a little aliasing at the top, but 4 x is adequate for much of the range. We might consider 8 x, but a problem with this approach is that we’re wasting a lot of table space for the lower octaves, and not achieving the objective of saving memory by using variable tables. Instead of keeping a constant oversampling ratio, we can go with a variable table size, but impose a minimum length, so that just the top octaves are progressively more oversampled, where we need it most.

Here are constant 1 x oversampled tables, but with a minimum of 64 samples so that the top octaves are oversampled progressively higher; first truncated, then with linear interpolation:

saw sweep 2048 trunc 20-20k (base=20, variable limit 64), 20s

saw sweep 2048 lin 20-20k (base=20, variable limit 64), 20s

A couple of things we’ve learned from these exercises: The upper octaves need progressively more oversampling to sound as good as the lower octaves—at least for a waveform like the sawtooth. And linear interpolation is a noticeable improvement over truncation.

But I hope you’ve also realized that truncation is a viable alternative, as long as the table size is big enough. And the “big enough” qualification is not difficult if you’re using fixed table sizes, which yield progressively higher oversampling in higher octaves.

While variable table size is helpful in memory-restricted environments, it’s not very important in most modern computing environments. I included the ability to handle variable table sizes within an oscillator mainly for educational experimentation.

Note: Some people are thinking…if truncation might be acceptable, why not use rounding instead? The answer is…there is no difference in noise/distortion between truncation and rounding—use whichever is most convenient (which may depend on the computer language you use or rounding mode of the processor). If you doubt that truncation and rounding are equivalent for our use, consider that rounding is equivalent to adding 0.5 and truncating; this means that rounding in our lookup table is equivalent to truncation except for a phase shift of half a sample—something we can’t hear (because there’s no reference to make it matter, but if it really bothers you, you could always create the wavetables pre-shifted by half a sample!).
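That equivalence is easy to demonstrate in code. Here are two hypothetical index functions (not part of the oscillator) that always agree:

```cpp
#include <cassert>
#include <cmath>

// Rounding a table index is the same as truncating an index computed
// from a phase shifted by half a sample.
int roundedIndex(double phase, int tableLen) {
    return (int)std::floor(phase * tableLen + 0.5) % tableLen;
}

int truncatedIndexHalfShift(double phase, int tableLen) {
    double shifted = phase + 0.5 / tableLen;  // half-sample phase shift
    if (shifted >= 1.0)
        shifted -= 1.0;                       // wrap back into 0..1
    return (int)(shifted * tableLen) % tableLen;
}
```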

Setting oscillator frequency

We set the pitch by setting the increment that gets added to our phasor for every new sample lookup. An increment of 0.5 steps half-way through the table. For a single-cycle table, that would correspond to a frequency of half the sample rate—22.05 kHz, half of our 44.1 kHz sample rate. Another way to say that is that the increment for a frequency of 22.05 kHz is 22050 / 44100, or 0.5.

So, the increment for any frequency is that frequency divided by the sample rate. For 440 Hz, it’s 440 / 44100, or 0.00997732426303855.
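In code, the conversion is a one-liner. Here’s a small helper (the name is mine, not from the accompanying source):

```cpp
#include <cassert>
#include <cmath>

// Convert frequency in Hz to the normalized phase increment that the
// oscillator's setFrequency expects; 0.5 corresponds to Nyquist.
double freqToInc(double freqHz, double sampleRate) {
    return freqHz / sampleRate;
}
```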

On the subject of frequency, one simple and useful addition to the oscillator might be the ability to adjust the phase, for quadrature and phase modulation effects.

Hearing

OK, I hear better than I thought. 15k loud and clear, 16k is plenty clear too, but I could tell it wasn’t quite as loud. 17k is getting pretty tough, needing plenty of volume. But in this case I was listening to naked sine waves at pretty decent volume—I stand by my reasoning that it would be awfully tough to hear those aliased harmonics—even if we were playing ridiculously high notes with no synth filter, at high volume, and soloed. But, if you disagree, just go with closer wavetable spacing!


A wavetable oscillator—the code

The wavetable oscillator code follows the basic structure of most audio processing blocks, with three types of functions:

  • at the start: initialization—create the oscillator and its wavetables
  • as needed: control—change the oscillator frequency, and pulse width for PWM
  • every sample: processing—get next output sample, update phase

At the start

Initialization—Most of our work for the oscillator is here. Fortunately, this is the part that’s not time-critical. We start by initializing the oscillator’s phase to zero, and creating all of the wavetables we need—starting with the lowest frequency wavetable, followed by its progressively bandwidth-reduced copies for successively higher frequency ranges. In this implementation, each wavetable is accompanied by its table length, and the highest pitch that the table is built to handle without noticeable aliasing.

As needed

Control—You may change the oscillator frequency only for new notes, or as often as every sample if you want modulation with updates at the sample rate (the same goes for pulse width if you want pulse-width modulation). Fortunately, this task is the simplest and fastest of all for the oscillator, so it’s no problem supporting it for every sample.

Every sample

Processing—In general you need to update each oscillator’s phase for every sample, though you can choose to suspend updating of any oscillator that’s part of a voice that’s silent. And of course the oscillator won’t do you much good if you don’t get its output for every sample and use it.

Updating the phase is trivial—just add the phase increment, and wrap it back into the range of 0-1 if necessary.
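A sketch of that update, using the oscillator’s phasor and phaseInc members (shown here as free variables for brevity):

```cpp
#include <cassert>

double phasor = 0.0;    // phase accumulator, always in 0..1
double phaseInc = 0.0;  // phase increment per sample

// Per-sample phase update: add the increment, wrap into 0..1. One
// conditional subtraction is enough because the increment is below 1.
void updatePhase(void) {
    phasor += phaseInc;
    if (phasor >= 1.0)
        phasor -= 1.0;
}
```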

Retrieving the current output is a bit more work. First, we need to determine which table to use, by comparing the current frequency control (phase increment) with the highest-frequency table that can accommodate it. Then we use the current phase value, from the phase accumulator, to determine the wavetable positions and weighting for our linear interpolation to yield the oscillator output.

For instance, if the phase accumulator is 0.51, and the length of the appropriate wavetable is 2048, then the offset into the table is 0.51 x 2048, or 1044.48. So, we use table samples at indices 1044 and 1045, and use the fractional part—0.48—in the interpolation.
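Written out, that interpolated lookup is only a few lines. A sketch (the function name is mine; the real getOutput also selects the appropriate table first):

```cpp
#include <cassert>
#include <cmath>

// Linear-interpolated wavetable read for a phase in 0..1. With phase
// 0.51 and a 2048-sample table: offset 1044.48, so blend samples 1044
// and 1045 with a weight of 0.48 on the second.
float tableLookupLin(const float *waveTable, int waveTableLen, double phase) {
    double temp = phase * waveTableLen;   // e.g. 0.51 * 2048 = 1044.48
    int intPart = (int)temp;              // 1044
    double fracPart = temp - intPart;     // 0.48
    float samp0 = waveTable[intPart];
    if (++intPart >= waveTableLen)
        intPart = 0;                      // wrap for the last sample pair
    float samp1 = waveTable[intPart];
    return (float)(samp0 + (samp1 - samp0) * fracPart);
}
```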

Key oscillator components

Variables

The oscillator is implemented in C++. We’ll start with the oscillator’s private member variables, but keep in mind that you won’t access these directly—you’ll interact with the oscillator through the member functions.

First, each oscillator needs a phase accumulator and a phase increment (frequency control):

double phasor;      // phase accumulator
double phaseInc;    // phase increment

And since we’re supporting variable pulse width via the difference between two saws, we need a phase offset control:

double phaseOfs;    // phase offset for PWM

And each oscillator needs its list of wavetables and how many there are. We pick a maximum number of wavetables to support—this information doesn’t take up much memory, and it’s more efficient than allocating the list dynamically; we can cover 20 Hz to 20 kHz with ten tables—I picked 32 as the maximum in case you want to try half- or third-octave tables, but you can set numWaveTableSlots to whatever value you want:

// list of wavetables
static const int numWaveTableSlots = 32;    // maximum number of subtables
int numWaveTables;
waveTable waveTables[numWaveTableSlots];

Here’s the wavetable structure—the length of the wavetable, the top frequency it’s designed for, and the wavetable itself (which is allocated dynamically):

typedef struct {
    double topFreq;
    int waveTableLen;
    float *waveTable;
} waveTable;

Why does each wavetable have its own length variable? Because this oscillator doesn’t require a constant size. For example, for a memory-sensitive environment, you could use longer tables for low frequencies in order to accommodate more harmonics, and shorter ones for higher frequency tables. For constant oversampling, you could cut the table size in half (along with the number of harmonics) for each higher octave.

Functions

Now for the oscillator functions. Besides the constructor—which simply initializes the phase accumulator, phase increment, phase offset, and the empty wavetable list—and the destructor, we have this function to initialize the oscillator:

void addWaveTable(int len, float *waveTableIn, double topFreq);

The addWaveTable function specifies the length of a wavetable to add to the oscillator, its contents, and the top frequency (phase increment) it’s designed for. Call it once for each wavetable you need to add—from lowest frequency tables to highest.
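A plausible implementation looks something like this sketch (details such as error handling may differ from the actual WaveTableOsc.cpp):

```cpp
#include <cassert>
#include <cstring>

const int numWaveTableSlots = 32;

typedef struct {
    double topFreq;
    int waveTableLen;
    float *waveTable;
} waveTable;

int numWaveTables = 0;
waveTable waveTables[numWaveTableSlots];

// Add one wavetable: copy the caller's samples, record the length and
// the top frequency (as a phase increment) the table is built for.
// Returns 0 on success, or the current table count if out of slots.
int addWaveTable(int len, const float *waveTableIn, double topFreq) {
    if (numWaveTables >= numWaveTableSlots)
        return numWaveTables;
    waveTable &wt = waveTables[numWaveTables++];
    wt.waveTableLen = len;
    wt.topFreq = topFreq;
    wt.waveTable = new float[len];
    std::memcpy(wt.waveTable, waveTableIn, len * sizeof(float));
    return 0;
}
```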

We have these functions for setting frequency (normalized to the sample rate—it’s the phase increment) and the phase offset (for setting pulse width, and perhaps other uses in the future):

void setFrequency(double inc);
void setPhaseOffset(double offset);

Here is the function to update the oscillator phase by adding the phase increment to the phase accumulator—call this once per sample:

void updatePhase(void);

Finally, the routines that retrieve the current oscillator output. Use getOutput, normally. But for variable pulse width, initialize it as a sawtooth oscillator, and get the oscillator output via getOutputMinusOffset, where the pulse width is set with setPhaseOffset, with a value between 0.0 and 1.0 corresponding to the pulse width:

float getOutput(void);
float getOutputMinusOffset(void);

The code

Here’s the wavetable oscillator source code, in a .zip file—WaveTableOsc.cpp and WaveTableOsc.h, along with a file, main.c, that gives a few oscillator examples and generates .wav file output:

WaveTableOsc source code

More comments and output examples coming soon.


A wavetable oscillator—Part 3

In Part 2, we looked at how to use subtables designed to cover one octave each, removing upper harmonics for each higher octave to avoid aliasing. Here’s a view of the nine wavetables for our sawtooth oscillator, with the starting frequency and number of harmonics for each octave:

The table size for each subtable is 2048 samples of single precision floating point. We’re using a double precision floating point phase accumulator. Here’s the result, with an exponential sweep from 20 Hz to 20 kHz:

Sawtooth sweep 20Hz-20kHz, 20s

That sounds…good! Some observations:

  • The wave starts out a little short of harmonics—we knew it would, because we built our lowest table for 40 Hz and up.
  • You probably can’t hear the very end of the sweep, because the frequency is too high.
  • If you play it back on a good audio system and listen carefully, you can hear at least two “ticks” in the audio, nearing the end of the sweep, about 2 seconds apart.

The ticks are due to the abrupt change in energy as we switch to the next table with fewer harmonics. It’s much easier to notice in the high range, since the relative power lost is larger (stronger harmonics are dropped). This might be especially noticeable with a strong vibrato that spans tables on a very high note. You can see the transitions in our sweep:

It would be easy to fix—with a higher sample rate, or by crossfading the switch. Or, you can ask yourself how often you’ve ever played sustained notes that high…with strong vibrato…soloed in a mix…with the synth filter missing. The fact is that all synthesizer oscillators make tradeoffs such as this. Either way, we’ll carry on and keep this oscillator simple and easy to understand.

Other waveforms

So far, we’re building the waves computationally. We could sample a single cycle of anything, resample it into a wavetable length of our choice, then use an FFT to get the spectrum, and modify the spectrum as needed for our additional octave tables. This is where the wavetable oscillator has a huge advantage over analog oscillators—any combination of harmonics is possible. For now, we’ll just deal with completing the classic synthesizer waveforms. For most analog synthesizers, that means a sawtooth and a variable-width pulse (rectangle) wave.

The Minimoog had just three choices of pulse width, but most synthesizers have voltage control of pulse width for pulse width modulation (PWM) effects. At first it might seem that we’d need to generate many variations of pulses, but fortunately we can create the same thing by subtracting one sawtooth from another at the same frequency but different phase. Then we get our pulse width modulation by controlling the phase offset. Here’s a square wave (50% pulse) generated using this method:

Square wave sweep 20Hz-20kHz, 20s
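The subtraction trick is easy to see with an ideal (non-band-limited) ramp standing in for the oscillator’s table lookup; this sketch is for illustration only:

```cpp
#include <cassert>
#include <cmath>

// Stand-in for the oscillator's sawtooth table lookup: an ideal ramp
// from -1 up to +1 over one cycle (phase 0..1).
double idealSaw(double phase) {
    return 2.0 * phase - 1.0;
}

// Pulse from two saws: one at the current phase, minus the same saw
// shifted by phaseOfs. phaseOfs = 0.5 gives a square; other values give
// a mean-zero rectangle whose two levels depend on the width.
double pulseFromSaws(double phasor, double phaseOfs) {
    double offsetPhase = phasor + phaseOfs;
    if (offsetPhase >= 1.0)
        offsetPhase -= 1.0;  // wrap the shifted phase
    return idealSaw(phasor) - idealSaw(offsetPhase);
}
```

Sweeping phaseOfs slowly over time is exactly the PWM effect heard in the first example of the end notes.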

The triangle wave was also found in the Minimoog, and is routine for modular analog synths, but was sacrificed in most classic non-modular analog synths because it’s not used as often as sawtooth and pulse. Compared with a sawtooth, a triangle has only odd harmonics, and the harmonics fall off as the inverse square of the harmonic number (and they alternate in sign as well), giving it a hollow sound like the square wave (also odd harmonics), but much more mellow sounding due to the fast rolloff of harmonic energy.

Triangle wave sweep 20Hz-20kHz, 20s

We’ll look more closely at how the oscillator is implemented, as well as give additional observations about wavetable oscillators, in upcoming articles.


A wavetable oscillator—Part 2

From Part 1, we have an oscillator. But we need to broaden it to allow scaling of harmonic content based on pitch so that we have all the harmonic content we need at the low frequency end, and, as we move up, eliminate those harmonics that would be above the range of hearing and mirror (alias) back into the audible range.

How many harmonics do we need?

To start, we pick a lowest wavetable frequency. Let’s say 40 Hz (it could be 80 Hz, it could be 20 Hz—it could be 1 Hz). The highest harmonic that fits under half our sample rate is 551 (22050 / 40 is 551.25). So, we could make a sawtooth table of 551 harmonics. The top harmonic would be only 10 Hz from the aliasing threshold—Nyquist. If we shift that wavetable up one octave to 80 Hz, the top harmonic would double—which means it would alias back down to 20 Hz. And the aliases of the top two hundred harmonics would land below 16 kHz…Ouch! That means as we glide the wavetable frequency from 40 Hz to 80 Hz, we’ll be inundated with harmonics aliasing downward throughout the audio range.
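The arithmetic in this discussion is easy to verify with a couple of helper functions (throwaway calculations, not oscillator code):

```cpp
#include <cassert>

// Highest harmonic of fundamental f0 that stays below Nyquist.
int maxHarmonics(double f0, double sampleRate) {
    return (int)(sampleRate / 2.0 / f0);
}

// Where a harmonic between Nyquist and the sample rate folds back to.
double aliasedFreq(double freq, double sampleRate) {
    return sampleRate - freq;
}
```

For a 40 Hz fundamental at 44.1 kHz that gives 551 harmonics; shift the table to 80 Hz and harmonic 551 sits at 44080 Hz, aliasing to 20 Hz.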

Of course, we could go the other direction, using that table from 40 Hz downward (and each higher table handling one octave downward similarly). But that means that as we shift down an octave, all of those harmonics at the top of the audio range will shift down to half the audio range—our highest harmonics would now top out at about 11 kHz before the next table takes over and restores harmonics to the top of the audio range. We’ll hear this shortcoming.

OK, we can cheat some and go for the middle ground. Most people don’t really hear to 20 kHz—adults are lucky if they can hear 14 kHz. If we go for 368 harmonics, that puts the top harmonic at 14.72 kHz for 40 Hz, and the alias would fold back to 14.66 kHz (44100 – 14720 x 2). I think this will work fine for me—I did a quick and dirty check a while back and could hear a 14 kHz tone, but not 15 kHz. The rest of you are on your own. Just kidding (and it ends up my hearing’s not that bad anyway—more later, in the “end notes” installment). Instead of requiring each subtable to cover an octave, we could use twice as many and have them cover half of an octave. Or an even smaller increment. Or we could just oversample the oscillator and expand our shallow frequency headroom greatly, from a few kHz to tens of kHz.

I’ll continue to develop this oscillator in octaves in order to keep the explanation simple (and because I’m going for “good” quality, to leave you to decide if you can sacrifice some quality for performance or memory, or whether you want to improve quality), but the extension to use more closely-spaced subtables is trivial. Oversampling 2x is also easy, but will complicate the explanation—you’re on your own if you want to go that way, but it’s a good learning exercise.

Wavetable size

First, let’s back up and figure out how long our tables need to be. Recalling that we need to sample a signal at greater than twice the highest frequency component, that means that for 368 harmonics, we need at least 368 x 2 + 1 samples, or a table length of at least 737 samples. But also remember that we’ll be using linear interpolation, so we need to oversample for good results. That means at least 2x, for 1474 samples.

We have good reasons to choose a table length that’s a power of two, however. There are optimizations we can do that take advantage of binary arithmetic, such as zero-cost wrapping of the table index for fixed-point indices, for instance. Plus, we get a huge boost in building these tables if we use the FFT, compared to summing sine waves individually. Also, we’ll definitely want to use the FFT if we let users define their own waves—and the FFT is best suited for tables whose lengths are a power of two. From 1474, our next highest power of two is 2048, which gives us a bit more oversampling as well. Memory’s cheap, so no problem going bigger, but let’s see where this gets us.
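That sizing logic reduces to a few lines (a sketch; the numbers match the text):

```cpp
#include <cassert>

// Minimum power-of-two table length for a harmonic count: 2*h + 1
// samples to satisfy Nyquist, doubled for 2x oversampling (for linear
// interpolation), then rounded up to a power of two for the FFT.
int tableLengthPow2(int numHarmonics) {
    int minLen = 2 * (2 * numHarmonics + 1);  // e.g. 368 -> 1474
    int len = 1;
    while (len < minLen)
        len <<= 1;
    return len;                               // e.g. 1474 -> 2048
}
```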

The bottom subtable will be used for that first octave and all pitches below. The second subtable takes over from 80 Hz to 160 Hz, and so on, as frequency doubles (and pitch goes up one octave). We use a suitable, progressively smaller range of harmonics for each table as we go up—dropping the upper half of our harmonics for each higher octave. The top table will be a sine wave, always, so we can just add tables until the last one has one harmonic.
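A quick sketch of that planning step: for each octave subtable, compute its top frequency (expressed as a phase increment) and its harmonic count, halving until a single harmonic remains (names and layout are mine):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Plan the subtable stack: each entry is (top phase increment, number
// of harmonics). Harmonics halve per octave until one (a sine) is left.
std::vector<std::pair<double, int>> planTables(double baseFreq,
                                               double sampleRate,
                                               int numHarmonics) {
    std::vector<std::pair<double, int>> plan;
    double topFreq = 2.0 * baseFreq / sampleRate;  // top of first octave
    while (numHarmonics >= 1) {
        plan.push_back(std::make_pair(topFreq, numHarmonics));
        numHarmonics /= 2;  // drop the upper half of the harmonics
        topFreq *= 2.0;     // next table covers the next octave up
    }
    return plan;
}
```

Starting from 40 Hz with 368 harmonics, this yields nine subtables, with harmonic counts 368, 184, 92, 46, 23, 11, 5, 2, 1.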

Next: let’s hear the results in Part 3


A wavetable oscillator—Part 1

There are many ways to make an oscillator. Without looking for further motivation, I’ll propose a wavetable oscillator. Wavetables are a fairly obvious extension of the general playback of digital audio. Such oscillators are easy to understand, and their extreme flexibility makes them a very popular choice among synthesizer oscillators.

Making a tone from a wavetable

If we start with one cycle of a waveform, we can output the samples one after another, at the sample rate, and repeat the process after getting to the end of the table. Here’s a sine wave, the most “fundamental” (if boring) of all waveforms, at about 440 Hz (“concert A”); I say “about” because a single cycle at 44.1 kHz would be 100.227 (44100 / 440) samples, so we round to a whole number to give us 100 samples (441 Hz):

Sine wave, 441 Hz

OK, that gives us one tone at excellent quality. We could have a separate wavetable for each note that we want to play, but that wouldn’t let us vary the pitch by arbitrary amounts, such as for vibrato or frequency sweeps. Also, we’d continue to run into the problem of our table needing to be an integer length—a problem that becomes extreme at high frequencies (due to shorter tables).

Arbitrary pitch from a wavetable

A first guess at a solution might be to mimic changing the playback rate on a tape recorder by varying the sample rate. This works quite well, and has the advantage that any errors in the waveform (due to a short wavetable or short sample size) remain harmonically related, so they’re heard as harmonic distortion instead of noise (hiss), and aliasing is not a threat because we’re raising the sample rate to play higher notes. This technique was used in some early digital synthesizers (such as the Fairlight CMI), but has a glaring weakness: we can’t multiplex this type of oscillator—each independent voice requires its own variably-clocked DAC.

However, we can do the equivalent shifting with DSP by using sample rate conversion techniques (sometimes called resampling). Because we can resample a wavetable at multiple rates with DSP, we can generate multiple tones digitally, before sending them to a common DAC.

Basically, we simulate the different clocking rates by using a fixed clock and scanning through the wavetable at a different rate. (We’ll use the CD sample rate of 44.1 kHz for our tests—higher rates are easier, since they give more audio headroom before aliasing, making 44.1 kHz a good test as we develop the algorithm.) So, instead of outputting the first sample, then the second—stepping through the table with an increment of 1—we can step through with a smaller increment for lower pitches and a larger increment for higher pitches.
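In code, the scan is just an accumulator and a fractional increment. A minimal sketch, using truncation for now:

```cpp
#include <cassert>

// Minimal wavetable playback: step through the table by a fractional
// increment, truncating the index. inc = 1.0 plays the table at its
// natural pitch; smaller is lower, larger is higher.
void renderTruncated(const float *table, int tableLen, double inc,
                     float *out, int numSamples) {
    double phase = 0.0;  // position within the table, in samples
    for (int i = 0; i < numSamples; i++) {
        out[i] = table[(int)phase];  // truncate the fractional index
        phase += inc;
        while (phase >= tableLen)
            phase -= tableLen;       // wrap at the end of the table
    }
}
```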

But we need to decide which sample to use for a fractional offset. The simplest choice is to truncate the fractional index by taking only the integer portion. This has been done in popular instruments of the past, especially early samplers. I’ll cut straight to a recommendation and suggest that we use linear interpolation—it’s not much more computationally expensive, and gives us enough improvement in sound quality to be worth the effort. We’ll discuss linear interpolation more later.

Here’s our sine wave at a different pitch, using a fractional increment and linear interpolation—middle C (261.626 Hz):

Sine wave, 261.626 Hz

That sounds promising—and note that fractional increments let us get our target pitch exactly this time. To prove it works at any pitch we’re interested in, here’s the same technique used with an exponential sweep, changing the wavetable index increment at each cycle to cover a wide range:

Sine sweep, 20-20,000 Hz

That sounds like we may have solved the problem of creating any pitch we want from a wavetable! But let’s test further…

A classic synthesizer waveform

Sine waves are boring. A staple for classic synthesizers is the sawtooth wave—a harmonically rich waveform that’s excellent for subtractive synthesis. A single cycle of a sawtooth climbs in value from the lowest value to highest, then resets instantly to the lowest. We can’t generate a perfect sawtooth in a wavetable (we can’t reset instantly, for instance—we must wait till the next sample output), so the proper thing to do is to build a band-limited sawtooth from sine wave harmonics. A band-limited sawtooth wave is made up of a sine wave for the first harmonic, a sine at twice the frequency but half the amplitude for the second harmonic, a sine at three times the frequency but a third of the amplitude, etc. Because our wavetable is 100 samples, we can fit a sawtooth of 49 harmonics.
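Building such a table is a direct sum of sines; a sketch matching the 100-sample, 49-harmonic example (overall amplitude scaling is left arbitrary):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Band-limited sawtooth: harmonic n at amplitude 1/n, summed for as
// many harmonics as fit below Nyquist for this table length.
std::vector<float> makeSawTable(int tableLen, int numHarmonics) {
    const double twoPi = 6.283185307179586;
    std::vector<float> table(tableLen, 0.0f);
    for (int n = 1; n <= numHarmonics; n++)
        for (int i = 0; i < tableLen; i++)
            table[i] += (float)(std::sin(twoPi * n * i / tableLen) / n);
    return table;
}
```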

Here’s what a sawtooth of 441 Hz sounds like:

Sawtooth wave, 441 Hz

And here we sweep the range as before:

Sawtooth sweep, 20-20,000 Hz

Yikes! At the higher pitches, the tone is too harmonic-rich, causing strong aliasing. Also, notice that at the lower pitches the sawtooth starts out sounding a bit dull, lacking the highest harmonics.

Clearly, we do not have a suitable wavetable oscillator yet. We need the ability to scale the harmonic content to the pitch required—starting with a wavetable tailored to the lowest pitch we want to produce, and reducing harmonic content as we move up in pitch before it has a chance to fold back as aliasing at half the sample rate.

Next: working on a solution in Part 2


A wavetable oscillator—Introduction

Years ago, Christoph Kemper told me how the Access Virus came to be. He had coded a filter on a DSP and wanted to test it. Of course he needed an oscillator, so he coded that…pretty soon he had a synthesizer…

Oscillators are an important DSP topic, especially for the musically inclined. Plus…we may want to look more at filters, and need a flexible signal source.

There are many ways to implement oscillators. Yamaha’s DX7 was a huge early digital synthesizer success using FM (frequency modulation of digital sine waves). There was no classic synthesizer filter in the DX7, because it built the harmonic complexity up from sine wave modulation, unlike the classic subtractive synthesis technique of using filters to change the tone of harmonically complex oscillators.

We’re going to look at emulating classic analog synthesizer waveforms. For all their simplicity, it’s hard not to like the sound.

The simplicity comes from simple circuits. Analog synthesizers usually generate a sawtooth by charging a capacitor with a constant current source. The capacitor charges until it hits a trigger point that discharges it, and the process repeats. Increasing the current makes it charge faster, and decreasing the current makes it charge slower, controlling the frequency of the sawtooth oscillator. Much of the work is in implementing exponential voltage control, so that a linear voltage change (1 volt per octave is the standard) produces an exponential frequency change (which sounds linear to us), and in particular in working around temperature dependencies in the circuit to keep it stable and in tune.

It’s easy enough for us to use a counter (an accumulator) to take the place of the capacitor, but we have a problem in that we can’t “discharge” (reset) it exactly when we want—only on sample boundaries. This results in aliasing—I won’t go into that further now, but the point is that it won’t give us adequate results without a very high sample rate. We need to find another way to emulate a classic analog oscillator.
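As a sketch (the function name and values are mine, not from the article), the counter approach looks like this, with the sample-boundary reset that causes the aliasing:

```javascript
// Naive (aliasing) sawtooth via an accumulator, the digital analog of
// the charging capacitor. The reset can only happen on a sample
// boundary, which is exactly the source of the aliasing described above.
function naiveSawtooth(freq, sampleRate, numSamples) {
    var out = new Array(numSamples);
    var phase = 0;                  // 0..1, our "capacitor"
    var inc = freq / sampleRate;    // the "charging current"
    for (var i = 0; i < numSamples; i++) {
        out[i] = 2 * phase - 1;     // map 0..1 to -1..1
        phase += inc;
        if (phase >= 1)             // "discharge", but only on a sample boundary
            phase -= 1;
    }
    return out;
}
```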

Next: we’ll look at implementing a wavetable oscillator in Part 1

Posted in Digital Audio, Oscillators, Wavetable Oscillators | Leave a comment

Convolution—in words

Convolution is a convoluted topic—and that’s what it means (convoluted, from Merriam-Webster: “Extremely complex and difficult to follow. Intricately folded, twisted, or coiled.”).

Really, it’s more difficult to explain why you would want to use convolution than it is to explain the mathematical function itself. I wrote a more technical article nearly a year ago, and it went unpublished because I didn’t have time to write the interactive and animated graphs that I wanted to accompany it. Revisiting the topic, I decided it was better to explain it in words from an intuitive point of view, and to follow up later with an article on the mathematical implementation, with audio examples.

Hello!

I hope that most people are familiar—from either personal experience, or maybe a cartoon—with the effect of an echo off a distant canyon wall. You shout, and moments later you hear your shout repeated back to you, though not as loud (the original in red, the quieter echo in blue):

If we gave an impulse—perhaps firing a starter piston—we’d hear a response that has the same spacing and amplitude:

Note that we don’t need to go to that canyon to get the same results in a recording studio—we could mix together a shout of “Hello!” with an attenuated and delayed copy of it. Our impulse response tells us, precisely, how much to attenuate and to delay the copy.
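The studio version of that single echo is only a few lines of code (a sketch; the function name and the example values are made up for illustration):

```javascript
// The canyon echo as a two-point impulse response applied by hand:
// mix the dry signal with an attenuated, delayed copy of itself.
function addEcho(signal, delaySamples, gain) {
    var out = new Array(signal.length + delaySamples);
    for (var i = 0; i < out.length; i++) {
        var dry = i < signal.length ? signal[i] : 0;
        var wet = i >= delaySamples ? signal[i - delaySamples] : 0;
        out[i] = dry + gain * wet;
    }
    return out;
}
```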

Now, consider what happens when you continue to shout instead of pausing to hear the reflection:

A nearby listener would hear the original speech, starting at the beginning (the first pop), and a delayed, quieter copy starting at the time of the second pop. The two speeches would be jumbled together.

Now consider being inside an empty gymnasium, where you hear not just one discrete echo, but many, including echoes of echoes as the sound bounces between walls. We could get an impulse response of the gym with a starter pistol, and it too would tell us where to overlap copies—of whatever speech or sound might go on in the gym—and their relative volumes.

As you can imagine, piecing together a facsimile from a signal (speech, music…) and the room’s known impulse response gets more complicated (convoluted!) as the impulse response has more features. In the digital realm, our “features” are individual samples, so the complexity is determined by how many samples there are in the impulse response—the longer the impulse response, the greater the number of computations required to scale and add in “copies”. You won’t want to simulate the results of a reverberant room manually like we did with the single-echo example. Fortunately, we can do much better—we can compute the results exactly, given an exact impulse response. We say that we convolve the signal with an impulse response—the process is called convolution (just like we multiply two numbers in a process called multiplication).
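In code, direct convolution is short (a sketch; the function name is mine, not from the article):

```javascript
// Direct convolution: every output sample is a sum of scaled,
// overlapped copies of the impulse response, exactly the
// "piecing together" described above.
function convolve(signal, impulseResponse) {
    var outLen = signal.length + impulseResponse.length - 1;
    var out = new Array(outLen);
    for (var i = 0; i < outLen; i++)
        out[i] = 0;
    for (var i = 0; i < signal.length; i++)
        for (var j = 0; j < impulseResponse.length; j++)
            out[i + j] += signal[i] * impulseResponse[j];
    return out;
}
```

The nested loop also makes the cost obvious: the work grows with the product of the two lengths, which is why long impulse responses get expensive.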

More on getting the impulse response

There are many ways to generate an impulse. Have you ever gone into a near-empty gym or warehouse and clapped your hands together sharply once, to hear the “sound” of the room? You were analyzing its impulse response. Popping a balloon is another way. A perfect impulse has equal amounts of all frequencies—like white noise condensed into a spike. It’s impossible to attain this ideal impulse, but we can get close enough to handle the audio band. Often, however, impulse responses of large rooms are taken by sweeping a sine wave through the audio band—a “chirp”—because it’s easier to get a more accurate result, and a better signal-to-noise ratio, than by trying to make a loud impulse that’s practically ideal. In essence, a chirp is an impulse spread over time.

In the digital realm, an impulse can be readily approximated by its band-limited version—a single unit sample in the midst of zero samples. To get the impulse response of a digital filter, for instance, run this single-sample impulse through the filter—the impulse response is its output. For an FIR filter, the impulse response is equal to its coefficients (because, conversely, standard FIR filters are normally implemented by convolution).
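To illustrate (a sketch, not the article’s code): run a unit impulse through a small FIR filter, and the output is the coefficient list:

```javascript
// A minimal FIR filter: each output sample is the dot product of the
// coefficients with the most recent input samples.
function firProcess(coefs, input) {
    var out = new Array(input.length);
    for (var i = 0; i < input.length; i++) {
        var acc = 0;
        for (var j = 0; j < coefs.length; j++) {
            if (i - j >= 0)
                acc += coefs[j] * input[i - j];
        }
        out[i] = acc;
    }
    return out;
}

// a unit impulse in a field of zeros
var impulse = [1, 0, 0, 0, 0];
var response = firProcess([0.25, 0.5, 0.25], impulse);
// response is [0.25, 0.5, 0.25, 0, 0]: the coefficients themselves
```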

And, of course, we can compute an impulse response instead of measuring it. We do this routinely for FIR filters. And to combine two serial FIR filters into one, just convolve their impulse responses (which is to say, their taps). We could calculate the response of an imagined room as well, for use as a reverb effect.

Using convolution for audio effects

For changing signals such as music, longer delays have less correlation and sound like echoes, while shorter delays cause more frequency cancellation and sound like filtering. This gives us a wide range of tonal and spatial effects for audio via convolution.

And while I use the term “impulse response” throughout this article, there’s nothing stopping you from convolving any two sounds, including instruments—a trumpet note convolved with a bowed cymbal, for example.

Limitations

Convolution is a useful tool for reproducing linear, time-invariant effects.

Linear means that the output simply scales with the input at a constant ratio. An identical input signal half as loud produces the same output half as loud. Examples of linear effects are typical fixed filters and echoes. A distortion pedal is non-linear—playing louder creates not just a louder version of the same sound, but a different sound.

Time-invariant means that the impulse response doesn’t change over time. If you input a signal to a time-invariant system right now, the output will sound the same as doing it five minutes from now—nothing changes except the five minutes. A flanger is not time-invariant. Playing right now, your signal might start at the top of the sweep, while playing at an arbitrary time later it might start mid-sweep or at the bottom.

Convolution is not convenient for time-varying effects, as they would require that the impulse response change constantly. You could do this—cycle through changing, possibly interpolated impulse responses—but it’s not a practical solution for most effects.

Likewise, convolution for non-linear effects would require a different impulse response for each instantaneous level at the input. To be fully general, that would mean one for every possible input level (65,536 for 16-bit resolution), though in practice most effects could be done with far fewer levels and interpolation, because good-sounding audio processes are not completely random—the saturation of a distortion effect comes on gradually and monotonically, it doesn’t jump all over the place. Still, convolution loses much of its appeal for non-linear effects, because most of them can be done more simply in other ways.

Convolution reverb

Even though convolution has been used in filtering since the dawn of digital audio, most musicians know the term from convolution reverb. Convolution reverb is a boon for giving people access to “realistic” acoustic spaces, but it shares all of the limitations of algorithmic reverb, and more. It’s an exaggeration to say that it puts your instruments in a real space—it’s more that it puts your instruments through a speaker (or speakers) in a physical space, and gives you the sound mic’d at a point in that space (with multiple mics capturing multiple impulse responses for stereo and other multi-channel effects processing). And you lose the chance of capturing non-linearities and time variations, which may play a part in some spaces.

Want the effect of a sound coming from within a closed cardboard box? Generate an impulse inside the box, and capture it outside the box. Need the effect of someone shouting for help from inside a storm drain for a movie without making the actor climb into the storm drain? Maybe you can lower a sound generator into a storm drain and mic it from the outside, then convolve the actor’s voice with that impulse response—and no one needs to get dirty. The sound of a telephone or other small speaker? Wire an electrical impulse directly, and mic the output to get the response.

The web has many pre-made impulse responses, so we can use spaces that we don’t have access to. Play your pipe organ sample via the room response of a large cathedral—or play your guitar through a classic spring reverb, played through an antique radio…

Posted in Convolution, Digital Audio, FIR Filters, Impulse Response, Reverb | Leave a comment

Biquad formulas

For fixed filters, we can plug biquad coefficients into our programs. But often, we need to calculate them on the fly, to user settings or changes in sample rate. As a companion to the biquad calculator, here are the formulas used, in JavaScript; refer to our article on biquads for implementation diagrams:

function calcBiquad(type, Fc, Fs, Q, peakGain) {
    var a0,a1,a2,b1,b2,norm;
    
    var V = Math.pow(10, Math.abs(peakGain) / 20);
    var K = Math.tan(Math.PI * Fc / Fs);
    switch (type) {
        case "lowpass":
            norm = 1 / (1 + K / Q + K * K);
            a0 = K * K * norm;
            a1 = 2 * a0;
            a2 = a0;
            b1 = 2 * (K * K - 1) * norm;
            b2 = (1 - K / Q + K * K) * norm;
            break;
        
        case "highpass":
            norm = 1 / (1 + K / Q + K * K);
            a0 = 1 * norm;
            a1 = -2 * a0;
            a2 = a0;
            b1 = 2 * (K * K - 1) * norm;
            b2 = (1 - K / Q + K * K) * norm;
            break;
        
        case "bandpass":
            norm = 1 / (1 + K / Q + K * K);
            a0 = K / Q * norm;
            a1 = 0;
            a2 = -a0;
            b1 = 2 * (K * K - 1) * norm;
            b2 = (1 - K / Q + K * K) * norm;
            break;
        
        case "notch":
            norm = 1 / (1 + K / Q + K * K);
            a0 = (1 + K * K) * norm;
            a1 = 2 * (K * K - 1) * norm;
            a2 = a0;
            b1 = a1;
            b2 = (1 - K / Q + K * K) * norm;
            break;
        
        case "peak":
            if (peakGain >= 0) {    // boost
                norm = 1 / (1 + 1/Q * K + K * K);
                a0 = (1 + V/Q * K + K * K) * norm;
                a1 = 2 * (K * K - 1) * norm;
                a2 = (1 - V/Q * K + K * K) * norm;
                b1 = a1;
                b2 = (1 - 1/Q * K + K * K) * norm;
            }
            else {    // cut
                norm = 1 / (1 + V/Q * K + K * K);
                a0 = (1 + 1/Q * K + K * K) * norm;
                a1 = 2 * (K * K - 1) * norm;
                a2 = (1 - 1/Q * K + K * K) * norm;
                b1 = a1;
                b2 = (1 - V/Q * K + K * K) * norm;
            }
            break;
        case "lowShelf":
            if (peakGain >= 0) {    // boost
                norm = 1 / (1 + Math.SQRT2 * K + K * K);
                a0 = (1 + Math.sqrt(2*V) * K + V * K * K) * norm;
                a1 = 2 * (V * K * K - 1) * norm;
                a2 = (1 - Math.sqrt(2*V) * K + V * K * K) * norm;
                b1 = 2 * (K * K - 1) * norm;
                b2 = (1 - Math.SQRT2 * K + K * K) * norm;
            }
            else {    // cut
                norm = 1 / (1 + Math.sqrt(2*V) * K + V * K * K);
                a0 = (1 + Math.SQRT2 * K + K * K) * norm;
                a1 = 2 * (K * K - 1) * norm;
                a2 = (1 - Math.SQRT2 * K + K * K) * norm;
                b1 = 2 * (V * K * K - 1) * norm;
                b2 = (1 - Math.sqrt(2*V) * K + V * K * K) * norm;
            }
            break;
        case "highShelf":
            if (peakGain >= 0) {    // boost
                norm = 1 / (1 + Math.SQRT2 * K + K * K);
                a0 = (V + Math.sqrt(2*V) * K + K * K) * norm;
                a1 = 2 * (K * K - V) * norm;
                a2 = (V - Math.sqrt(2*V) * K + K * K) * norm;
                b1 = 2 * (K * K - 1) * norm;
                b2 = (1 - Math.SQRT2 * K + K * K) * norm;
            }
            else {    // cut
                norm = 1 / (V + Math.sqrt(2*V) * K + K * K);
                a0 = (1 + Math.SQRT2 * K + K * K) * norm;
                a1 = 2 * (K * K - 1) * norm;
                a2 = (1 - Math.SQRT2 * K + K * K) * norm;
                b1 = 2 * (K * K - V) * norm;
                b2 = (V - Math.sqrt(2*V) * K + K * K) * norm;
            }
            break;
    }

    return [ a0, a1, a2, b1, b2 ];
}
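As a sketch of putting the returned coefficients to work (this function is mine, not part of the article), here’s a transposed direct form II biquad using the same [a0, a1, a2, b1, b2] ordering that calcBiquad returns:

```javascript
// A transposed direct form II biquad processor. Takes a coefficient
// array [a0, a1, a2, b1, b2] and returns a per-sample process function.
function makeBiquadProcessor(coefs) {
    var a0 = coefs[0], a1 = coefs[1], a2 = coefs[2], b1 = coefs[3], b2 = coefs[4];
    var z1 = 0, z2 = 0;    // delay state
    return function (x) {
        var y = x * a0 + z1;
        z1 = x * a1 + z2 - b1 * y;
        z2 = x * a2 - b2 * y;
        return y;
    };
}

// example: a Butterworth lowpass at 1 kHz, 44.1 kHz sample rate,
// coefficients computed with the "lowpass" case from above
var K = Math.tan(Math.PI * 1000 / 44100);
var Q = 0.7071;
var norm = 1 / (1 + K / Q + K * K);
var lp = makeBiquadProcessor([
    K * K * norm, 2 * K * K * norm, K * K * norm,
    2 * (K * K - 1) * norm, (1 - K / Q + K * K) * norm
]);
```

A quick sanity check on any lowpass built this way: feed it a constant (DC) signal, and the output should settle to the same value, since a lowpass has unity gain at 0 Hz.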
Posted in Biquads, Digital Audio, Filters, IIR Filters | 40 Comments

A biquad calculator

Something useful: a biquad filter coefficient calculator. Set the filter Type, the Sample rate (or 1.0 for “normalized” frequency), and the cutoff or center frequency Fc. You can adjust Q for lowpass, highpass, bandpass, notch, and peak filters (use 0.7071, which is 1 divided by the square root of 2, for Butterworth lowpass and highpass), and Gain for peak and shelving filters. View the frequency response with a linear or log scale. You can copy and paste the resulting coefficients into your programs.

More info on biquads in previous posts, here and here.

Note that there is an improved version of this calculator here.

Posted in Biquads, Digital Audio, Filters, IIR Filters, Widgets | 36 Comments

Sample rate conversion: down

In doubling the sample rate, we inserted zeros between existing samples, then used a lowpass filter to remove the resulting alias in the audio band. To resample at half the current rate, we use a lowpass filter to remove audio above half of the new audio band, which would otherwise alias, then remove every other sample.

Here’s an example: First, I created a linear sweep of a sine wave, from 0 Hz to 44 kHz over 4 seconds, at a sample rate of 88.2 kHz. This audio clip is the result of downsampling by discarding every other sample without filtering first, and playing at a sample rate of 44.1 kHz; note how the sweep folds back and heads downward in the second half—aliasing:

Sine sweep, downsampled without filtering

This time, we do it right; the clip is filtered first, removing everything above 22 kHz, then every other sample is discarded, and played back at 44.1 kHz:

Sine sweep, downsampled after filtering

The filter was created with our windowed-sinc calculator, set to remove signal above 0.25 of the 88.2 kHz sample rate. Since we intend to drop every other sample after filtering, we can step iterations of the filter by two, and calculate only the samples we’ll keep. We use each of the coefficients on consecutive samples of the signal, but advance the input index by two instead of one before calculating the next output sample.

The practice of dropping samples is called “decimation”. The term means to remove one in every ten—a gruesome practice employed by the Roman army. In signal processing, it’s used to mean the removal of an arbitrary number of samples in a group. For instance, here we are decimating by a factor of two when we drop every other sample. Decimating by a factor of four would mean retaining one sample, dropping three, and repeating the process for all samples.
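The filter-and-decimate step described above can be sketched in a few lines (the function name is mine; a real coefficient set would come from the windowed-sinc calculator):

```javascript
// Filter and decimate by 2 in one pass: apply the FIR coefficients as
// usual, but advance the input index by two per output sample,
// computing only the samples we keep.
function downsampleBy2(input, coefs) {
    var out = [];
    for (var i = 0; i < input.length; i += 2) {   // step by two: decimate
        var acc = 0;
        for (var j = 0; j < coefs.length; j++) {
            if (i - j >= 0)
                acc += coefs[j] * input[i - j];
        }
        out.push(acc);
    }
    return acc = out;
}
```

Besides avoiding the dropped samples, this halves the number of multiply-adds compared with filtering everything and discarding afterward.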

Note: The above sound examples are mp3 format, for improved browser compatibility. If you want to take a look at the files in a sound editor, you’ll get a more accurate view with the original uncompressed files (right-click and “save as”): non-filtered, filtered.

Posted in Aliasing, Digital Audio, Filters, Sample Rate Conversion | Leave a comment