I wrote the wavetable oscillator code from scratch as I wrote this series of articles, using just the thought process I gave. Hence, no references to other articles on wavetable oscillators. The concept of a phase accumulator is fairly obvious; one of the first books I recall reading about it was Musical Applications of Microprocessors, by Hal Chamberlin, about 30 years ago (I have an autographed first edition), though the book didn’t explore the concept of multiple tables per oscillator to control aliasing.
These end notes are an opportunity to touch on various aspects of the oscillator.
First, a few examples. Here’s a pulse-width modulation example, using a sine-wave instance of an oscillator to modulate the pulse width of another oscillator:
PWM 110 Hz mod 0.3 Hz sine, 4s
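As a rough sketch of the idea (my illustration, not the article's actual code), here's a naive, aliasing version of sine-modulated pulse width, with the oscillator boiled down to a bare phase accumulator; the 0.25 modulation depth and the 25–75% width range are arbitrary choices:

```cpp
#include <vector>
#include <cmath>

// Naive PWM sketch: a pulse whose width is swept by a sine LFO.
// This version aliases; the article's oscillator uses bandlimited tables.
std::vector<float> pwmNaive(double freq, double modFreq,
                            double sampleRate, int numSamples) {
    const double twoPi = 6.283185307179586;
    std::vector<float> out(numSamples);
    double phase = 0.0, modPhase = 0.0;
    const double inc = freq / sampleRate;        // cycles per sample
    const double modInc = modFreq / sampleRate;
    for (int i = 0; i < numSamples; i++) {
        // pulse width swings between 0.25 and 0.75 of a cycle
        double width = 0.5 + 0.25 * std::sin(twoPi * modPhase);
        out[i] = (phase < width) ? 1.0f : -1.0f;
        phase += inc;       if (phase >= 1.0) phase -= 1.0;
        modPhase += modInc; if (modPhase >= 1.0) modPhase -= 1.0;
    }
    return out;
}
```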
Three detuned and summed sawtooth oscillators:
Saw times 3 detuned 55 Hz, 4s
But we aren’t limited to computing our wavetables. Here, I recorded myself singing “aahh…” (originally, 110 Hz, the second A below middle C). I cut one cycle of that in an audio editor and stretched (resampled) it to 2048 samples, and did an FFT of that. From there it was easy to create higher octaves by eliminating harmonics in the upper octave and doing an inverse FFT to make the next higher wavetable—the same process as we did with the computed waves. Here it is, swept 20 Hz to 20 kHz (but keep in mind that the typical useful range is in the lower octaves):
Ahh sweep 20-20k, 20s
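Here's a sketch of the harmonic-trimming step described above (the FFT itself omitted). I'm assuming the same layout as the article's freqWaveRe/freqWaveIm arrays: a full complex spectrum of length N, with the conjugate-symmetric negative frequencies in the upper half:

```cpp
#include <vector>

// Zero all harmonics above maxHarmonic, in both the positive-frequency
// bins (low indices) and their negative-frequency mirrors (high indices),
// before the inverse FFT builds the next higher-octave table.
void truncateHarmonics(std::vector<double>& re, std::vector<double>& im,
                       int maxHarmonic) {
    const int n = (int)re.size();
    for (int idx = maxHarmonic + 1; idx <= n - 1 - maxHarmonic; idx++) {
        re[idx] = 0.0;
        im[idx] = 0.0;
    }
}
```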
Because I wanted to show the harmonic series of sine waves building a sawtooth in the earlier articles, I didn't mention that sawtooth waves traditionally ramp up (remember the charging capacitor in the analog oscillator?). Creating this wave from sines makes it ramp down, often called an inverted sawtooth. It really doesn't make a difference in the audio range; in the sub-audio range, as a control signal, you might prefer one or the other depending on the effect you're after. But either way, it's trivial to build the up-ramping sawtooth, and I did that in the final version just for fun. I left out the 20-40 Hz octave in the tutorial to make a point, but I'm including it here in a non-inverted sawtooth sweep, and you'll notice that the lowest octave is brighter with a true 20 Hz table:
Saw sweep 2048 lin 20-20k (base=20Hz), 20s
More on low-end brightness
Though we don’t often listen to the audio output of an oscillator in the sub-audio frequency range, the harmonic content is apparent for the brighter waveforms such as sawtooth. Here’s what it sounds like when we sweep from audio range down to 1 Hz with our wavetables built for the audio range:
Saw sub-audio test (20-20k 2048), 20s
Note how the harmonic content drops as the ticks slow.
Because aliasing becomes less of a factor the lower the frequency (the sample resolution for a single cycle increases the lower we go), we could switch to a more straightforward sawtooth for sub-audio. Or, because our oscillator allows a variable table size for each subtable, we could simply add one long wavetable to handle all of sub-audio. Here's the same oscillator, but with a 32768-sample sawtooth table to handle everything below 20 Hz:
Saw sub-audio test (20-20k 2048 + 32768 ramp), 20s
My goal with the accompanying code was to make it simple and understandable. It's pretty quick, though: the full audio sweep takes about 0.9 seconds to generate 1000 seconds of audio on my computer. And there are certainly ways to make it faster, though in general a little bit faster makes it a lot less readable, and often requires some loss of generality. For instance, we could make oscillator table selection faster by mandating octave spacing, then using a fast-log2 floating-point trick to get to the proper table in a single step. But that would put a limitation on the oscillator, the relevant code would become unreadable to many people, and in practice the speed gain is small.
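For the curious, the floating-point trick alluded to reads the integer part of log2 straight from the IEEE-754 exponent field. This sketch is my own illustration, not code from the article:

```cpp
#include <cstdint>
#include <cstring>

// floor(log2(x)) for x >= 1, read from the float's exponent bits.
// With octave-spaced tables, fastFloorLog2(inc / lowestTableInc)
// would give the subtable index in a single step.
int fastFloorLog2(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);      // safe type pun
    return (int)((bits >> 23) & 0xFF) - 127;  // remove the exponent bias
}
```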
Some people might prefer a fixed-point phase accumulator. The math is fast and precise, and you can get a free wrap-around when you increment. But the double-precision floating point phase accumulator of this oscillator is extremely precise, is plenty fast, and the code is very easy to follow.
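For comparison, a minimal fixed-point accumulator might look like this (my sketch, assuming a 32-bit phase and a 2048-sample table):

```cpp
#include <cstdint>

// Fixed-point phase accumulator: the 32-bit phase wraps for free on
// unsigned overflow, and the top 11 bits index a 2048-sample table.
struct FixedPhase {
    uint32_t phase = 0;
    uint32_t inc = 0;          // set to freq / sampleRate * 2^32
    int nextIndex() {
        int idx = (int)(phase >> 21);  // top 11 bits: 0..2047
        phase += inc;                  // overflow == free wrap-around
        return idx;
    }
};
```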
If you search the web for wavetable oscillators, you’ll find terrific educational papers and forum threads discussing various types of interpolation and their relative performance. It may leave you wondering why you should trust the lowly and relatively pathetic linear interpolation presented here. Good point.
It's about table size. If your table is big enough, you don't need any interpolation at all: truncation is fine. But linear interpolation is only a bit more costly, and gives a boost that's probably worth the effort. Higher forms of interpolation require more sample points in the calculation (for instance, needing two points on each side of the point to be calculated, instead of one point on each side for linear interpolation). They are great for squeezing better quality out of a small table, but you need to ask yourself why you are using the small tables. If you're implementing this on a modern computer, what is the point of increasing your calculations so that you can use a 512-sample table instead of a 2048-sample table, on a system that has gigabytes of free memory?
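In code, the difference between truncation and linear interpolation is just a couple of lines (a sketch, not the article's exact implementation):

```cpp
#include <vector>

// Table lookup at a fractional position; phase is in [0, 1).
double truncLookup(const std::vector<double>& table, double phase) {
    return table[(int)(phase * table.size())];  // zero order: drop the fraction
}

double lerpLookup(const std::vector<double>& table, double phase) {
    double pos = phase * table.size();
    int i0 = (int)pos;
    int i1 = (i0 + 1) % (int)table.size();      // wrap at the end of the table
    double frac = pos - i0;
    return table[i0] + frac * (table[i1] - table[i0]);
}
```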
Constant versus variable table size
If we use a constant subtable size, then as each subtable moves up to cover the next octave, with half of the harmonic content, it becomes oversampled by another factor of 2. The top octave will always be a sine wave, and at the 2048 samples we're using for our subtables, it's extremely oversampled: far more than we need or will be able to discern.
And while I said that we want to be at least 2 x oversampled at our lowest subtable for linear interpolation to have good fidelity, it's unlikely that we'd hear the shortfall in the lowest octave. The reason is that low-frequency wavetables for a sawtooth are very crowded in the high harmonics: half of them sit, closely spaced, in the top octave of our hearing. So, we could go to 1024-sample tables for 40 Hz and up, or 2048 for 20 Hz and up (as in the "Saw sweep 2048 lin 20-20k" example above), and you probably won't hear the difference. All of the higher tables will still be increasingly oversampled. Here's the 1024 equivalent of the 2048 example in Part 3 of the wavetable oscillator series:
Saw sweep 1024 trunc 20-20k (base=20Hz), 20s
In a nutshell, we might suspect that it will be easier to hear the benefits of oversampling in the upper octaves, but at the same time the minimum table size to hold their required harmonic content is smaller.
We could just as easily go to a constant oversampling ratio by dropping the subtable length by a factor of two as we go to the next octave’s subtable. It’s easy enough to make all of these things variable in your code, so that you can play with tradeoffs.
Lesser interpolation experiments
Linear interpolation is referred to as “first order” interpolation. Truncation is “zero order” interpolation, implying none. So let’s hear how awful the oscillator would sound with no interpolation:
Saw sweep 2048 trunc 20-20k, 20s
Um, that’s not as bad as expected—or should we not have expected it to be bad? Remember, we’re using fixed table sizes so far, which makes the higher, more sensitive tables progressively more oversampled. And oversampling helps improve any interpolation, even none.
Let’s explore variable table sizes, to keep a constant oversampling ratio per octave. We’ll start with the worst-case of 1 x oversampling, and with truncation:
Saw sweep variable 1x trunc 20-20k, 20s
As expected, the aliasing is bad, and gets worse in the higher octaves. Here it is again, with linear interpolation:
Saw sweep variable 1x lin 20-20k, 20s
Even though 1 x oversampling is still inadequate, the improvement with linear interpolation is obvious. Let’s try again, but with 4 x oversampling; first with no interpolation (and let’s focus on the higher octaves):
Saw sweep trunc 1280-20k, constant 4x oversample, 20s
Now with linear interpolation:
Saw sweep lin 1280-20k, constant 4x oversample, 20s
You can still hear a little aliasing at the top, but 4 x is adequate for much of the range. We might consider 8 x, but a problem with this approach is that we’re wasting a lot of table space for the lower octaves, and not achieving the objective of saving memory by using variable tables. Instead of keeping a constant oversampling ratio, we can go with a variable table size, but impose a minimum length, so that just the top octaves are progressively more oversampled, where we need it most.
Here are constant 1 x oversampled tables, but with a minimum of 64 samples so that the top octaves are oversampled progressively higher; first truncated, then with linear interpolation:
Saw sweep 2048 trunc 20-20k (base=20, variable limit 64), 20s
Saw sweep 2048 lin 20-20k (base=20, variable limit 64), 20s
A couple of things we’ve learned from these exercises: The upper octaves need progressively more oversampling to sound as good as the lower octaves—at least for a waveform like the sawtooth. And linear interpolation is a noticeable improvement over truncation.
But I hope you’ve also realized that truncation is a viable alternative, as long as the table size is big enough. And the “big enough” qualification is not difficult if you’re using fixed table sizes, which yield progressively higher oversampling in higher octaves.
While variable table size is helpful in memory-restricted environments, it's not very important in most modern computing environments. I included the ability to handle variable table sizes within an oscillator mainly for educational experimentation.
Note: Some people are thinking: if truncation might be acceptable, why not use rounding instead? The answer is that there is no difference in noise/distortion between truncation and rounding; use whichever is most convenient (which may depend on the computer language you use or the rounding mode of the processor). If you doubt that truncation and rounding are equivalent for our use, consider that rounding is equivalent to adding 0.5 and truncating. This means that rounding in our table lookup is equivalent to truncation except for a phase shift of half a sample, something we can't hear (because there's no reference to make it matter; but if it really bothers you, you could always create the wavetables pre-shifted by half a sample!).
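The equivalence is easy to see in code (illustrative only, for non-negative positions):

```cpp
// Rounding an index is just truncation of pos + 0.5: the same lookup,
// shifted by half a sample of phase.
int truncIndex(double pos) { return (int)pos; }
int roundIndex(double pos) { return (int)(pos + 0.5); }
```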
Setting oscillator frequency
We set the pitch by setting the increment that gets added to our phasor for every new sample lookup. An increment of 0.5 steps halfway through the table. For a single-cycle table, that corresponds to a frequency of half the sample rate: 22.05 kHz, half of our 44.1 kHz sample rate. Another way to say that is that the increment for a frequency of 22.05 kHz is 22050 / 44100, or 0.5.
So, the increment for any frequency is that frequency divided by the sample rate. For 440 Hz, it’s 440 / 44100, or 0.00997732426303855.
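In code, that's the whole calculation:

```cpp
// Phase increment in cycles per sample: frequency over sample rate.
double incrementFor(double freqHz, double sampleRate) {
    return freqHz / sampleRate;
}
```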
On the subject of frequency, one simple and useful addition to the oscillator might be the ability to adjust the phase, for quadrature and phase modulation effects.
OK, I hear better than I thought. 15k loud and clear, 16k is plenty clear too, but I could tell it wasn’t quite as loud. 17k is getting pretty tough, needing plenty of volume. But in this case I was listening to naked sine waves at pretty decent volume—I stand by my reasoning that it would be awfully tough to hear those aliased harmonics—even if we were playing ridiculously high notes with no synth filter, at high volume, and soloed. But, if you disagree, just go with closer wavetable spacing!
Thanks for sharing this series; extremely useful for beginners.
One nice trick to avoid aliasing is to store an integrated waveform. After the interpolation step, the original spectrum is restored by differentiating, which mostly amplifies high-frequency aliased harmonics that are less problematic. The higher aliased frequencies reflected into the low range are still attenuated by k * 6 dB per octave (with kth-order integration).
Higher-Order Integrated Wavetable Synthesis
Hi, thanks a lot for this tutorial and the code. I tried implementing it in my synth and it works GREAT. Only problem is the CPU usage is way too much if I play a few chords and have two waveforms per voice. I think the problem may be with the setFrequency calculations (especially when you modulate the pitch with something). Any thoughts on how to make it less CPU hungry?
The setFrequency call is just an inline assignment, so I assume you mean your calculation of the frequency parameter that you pass to it. One aspect is that calculating frequency involves an exponential, which can be costly; you can use a faster approximation, since the ear is satisfied with lower precision than you might need for other mathematical calculations.
For the oscillator itself, most of its computing cost is in computing the next sample (getOutput). That can be improved a bit with some tricks, but not a large amount. Probably the biggest single improvement in the big picture that is used heavily in soft synths is to do things by the buffer-full. Calculate a buffer of oscillator output, then pass it to the filter to process the whole buffer, and so on (instead of sample by sample). Here you can take better advantage of running your control calculations at a lower rate. There is no easy way to get a huge performance boost with a single change, but there are ways to get many small improvements that add up (you can make your wavetable one sample longer, and avoid the branch; you can remove the function call overhead…). And this is the key to making a soft synth that’s not CPU hungry. Notice that the most flexible soft synths are also the most CPU intensive. One reason is that you can’t do feedback between modules when calculating each module by the buffer.
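A sketch of the per-buffer idea (the oscillator and filter callables here are hypothetical stand-ins, not the article's classes):

```cpp
#include <vector>
#include <functional>

// Fill a whole buffer from the oscillator, then run the whole buffer
// through the next module, instead of alternating module calls per sample.
void renderBlock(std::vector<float>& buf,
                 const std::function<float()>& osc,
                 const std::function<float(float)>& filter) {
    for (float& s : buf) s = osc();      // oscillator pass, whole buffer
    for (float& s : buf) s = filter(s);  // then the filter pass, in one go
}
```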
Nigel, I just wanted to say thank you for this series. Your WTO is great. Nice work!
I want to use your WTO to implement Hard Sync – yes, I know this may use lots of CPU but I’m just experimenting. I have a question here:
However, I’m prepared to work to get there but have a couple of questions regarding your WTO.
I watched some nice videos showing the Real and Complex Fourier Series of a Sawtooth and the guy in the video gives:
freqWaveRe[idx] = (2.f/idx) * pow(-1, idx+1);
for the Real way to calc a sawtooth and this indeed works (though presumably this is out of phase with your simpler Real definition in sawOsc:
freqWaveRe[idx] = 1.0 / idx;
He then shows the Complex way, which is, I understand, identical to his Real way:
freqWaveIm[idx] = (i/idx) * pow(-1, idx);
My question is, what should I put for i ? I mean, I know that i is the square root of minus 1 but how would that translate to your code?
Of course, my ultimate goal is to answer my Stack Exchange question and on that topic, is it possible to use your WTO for Hard Sync in this way? I will later try to convert the Complex Fourier Series given in the question to the Real version so I can use:
freqWaveIm[idx] = 0.0;
freqWaveRe[idx] =some real expression;
Will this work?
Sorry for all the questions!
First, as you probably know, the difference between alternating sign for the harmonics and not is whether the result is a rising ramp or falling ramp. Second, about the “i”: not having the complete context, I have to assume that you would ignore it because the i is assumed, being in the imaginary part of the array, and just use 1/idx.
I’m on a pretty tight schedule, for a while now, and haven’t had time to look at the paper on hard sync. I’m interested, and even started a paper some time back on adding hard sync, but ran out of time. I hope to get back active on this blog by the start of the new year.
OK! Cool. Thanks for writing.
As I am trying to program my own software synthesizer, I am trying to wrap my head around the sampling part.
What I can't seem to understand is that at a sampling rate of 44100 per second, if I want to "play" a note at around the frequency of 11025, it seems to me that I have only 4 sampling points to "describe"/make the desired sound, and if that is correct it is nowhere near enough to recreate complex waveforms. So what am I missing?
First, consider listening to any perfectly steady, noise-free (not a flute), harmonically pure (not struck metal, just the harmonic series) note at 11025 Hz in the real world. It will sound like a sine wave, because the next harmonic up is already at 22050 Hz and beyond hearing. So, in that regard, four samples for a sine is more than enough—we only need two (plus a little).
However, it's true that's not enough for arbitrary resampling, because we need to take aliasing into account. While we can play any 11k sine back fine, with resampling we might want to record a note at 5k and pitch it up to 11k. The problem happens when the 5k note is not a sine but a sawtooth. We record it digitally, and have a signal with harmonics at 5k, 10k, and 20k; anything higher is removed by the input antialiasing filter. Now, an ideal sawtooth pitched up to 11k has harmonics at 11k, 22k, 33k. We don't have enough samples for the 33k; it will alias down to 44.1k - 33k = 11.1k. We'll end up hearing 11k, plus 11.1k at one-third the amplitude (we won't hear the 22k harmonic; it's too high).
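The arithmetic in that example generalizes: for a harmonic between Nyquist and the sample rate, the aliased frequency is simply the sample rate minus the harmonic's frequency:

```cpp
// Aliased (folded) frequency, valid for Nyquist < f < sampleRate.
double aliasedFreq(double f, double sampleRate) {
    return sampleRate - f;
}
```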
That’s why all the work on my wavetable oscillator, so we can interpolate using tables over ranges that don’t alias (or have tolerable aliasing). It’s also why most samplers of the past limited the amount of shifting—you needed to sample two or more notes per octave.
just curious: on your words, it seems that oversampling the tables will reduce aliasing.
Isn't this unrelated? I mean: oversampling is done because of the linear interpolation (later), which affects the noise floor/distortion and reduces it, rather than "aliasing". I'm confused about this, because to my knowledge these are two different beasts 🙂
This might be tricky to answer by phone (I'm away); hope it comes out clearly. Oversampling means using more samples than the minimum needed, so in that sense all the wavetables are oversampled, because we need a little room to pitch them up. The more oversampled they are, the more they can be pitched up (we can use fewer wavetables).
But we also typically use “oversampling” to mean running at a higher sample rate than absolutely needed for our 20 kHz hearing range (96 kHz, for instance).
There is another benefit for the oscillator to run at a higher sample rate internally. Linear interpolation is a sample rate converter with a poor lowpass filter (it has a triangular impulse response). It aliases, but fortunately the aliasing is typically hard to hear under the intended harmonics of the oscillator. Running at a higher internal rate would help a bit with the aliasing. We could also use a better interpolator (which might require running at a higher rate).
But this isn’t an instrument grade oscillator, we only need to satisfy our ears and not a measurement specification.
Not sure I follow this point 🙂
So Nyquist was wrong? 😛
Isn't one of the main topics in DSP being able to reproduce a digital signal at a sampling frequency double the maximum component of the signal? Is this unrelated to aliasing?
I mean, if the max component is the 368th harmonic, and the base frequency is 20 Hz, isn't a table of 736 samples enough to preserve the "frequency" domain?
I always thought oversampling this way would reduce the noise floor, not reduce aliasing :O (i.e., does Nyquist not count here?)
I think I know what you’re asking, but in an interactive conversation it would be easier to be exactly sure. So, again, I’ll just make some general comments. But it will be more clear if you specifically address anything from these comments that you don’t think is right or have questions about.
If you’re playing out a wavetable by stepping through it a sample at a time, that’s one thing—the harmonics are exactly what you put in the wavetable. But a wavetable oscillator does sample rate conversion (SRC) on the fly in order to play a different pitch. Linear interpolation with non-unit step size is SRC. Two surrounding samples go in, a new sample comes out, a linear guess for the new sample location. But linear interpolation is a mediocre interpolator—it’s a weak lowpass filter. A windowed sinc interpolator would be much better, for instance, but require many more surrounding samples in the calculation. But we can help the linear interpolator by starting with a more heavily oversampled wavetable. It will have fewer high frequencies relative to its sample rate. Or, you could look at it by noting that oversampled waveforms are smoother, so closer to a straight line between samples, which is more friendly for a linear interpolator.
As far as "I always think oversampling this way will reduce noisefloor, not to reduce aliasing": that's a valid viewpoint. But in this case the noise floor (or "error") happens to have a spectrum that is related to the frequency content; in other words, the main error is aliasing. That is, it's not wrong to call it error, but it's more helpful to call it aliasing. For instance, if you slide a pitch up with a lot of this error happening, is the problem you hear similar to tape hiss? No, you'll be hearing frequencies sliding in the opposite direction. Yes, it's error, but more specifically it's aliasing. Make sense?