WaveTableOsc optimized

The wave table oscillator developed here in 2012 is pretty lightweight, but I never took a close look at optimization at the time. An efficient design is the number one optimization, and it already had that. I was curious how much minor code tweaks would help.

Free wrap

For instance, to avoid added complexity in the original article, I didn’t employ a common trick of extending the wave table an extra sample in order to avoid the need for a wrap-around test in the linear interpolation. A rewrite was a good opportunity to add it, and check on the improvement. The code change was trivial—add one to the allocation size for each table, and duplicate the first sample to the last when filling them, in AddWaveTable. Then remove the bounds check in the linear interpolation, saving a test and branch for every sample generated.

// old
float samp0 = waveTable->waveTable[intPart];
if (++intPart >= waveTable->waveTableLen)
    intPart = 0;
float samp1 = waveTable->waveTable[intPart];

// new
float samp0 = waveTable->waveTable[intPart];
float samp1 = waveTable->waveTable[intPart + 1];

Factor

And, I spotted another optimization opportunity. The oscillator needs to select the appropriate wavetable each time it calculates the next sample (GetOutput), but the choice of wavetable only changes when the frequency changes. So, the selection should be moved to SetFrequency. In general use, it should make no difference—an analog-like synth would usually have continuous modulation of the oscillator, for instance. But moving the calculation to SetFrequency would be a substantial win for steady tones.

I’ll go through the changes and their contributions, but a little explanation first. Unlike most of my DSP classes that provide a new sample on each Process call, the oscillator requires two calls—GetOutput and UpdatePhase—every time. It was just a design choice, I could have also provided a Process call that does both. In addition, SetFrequency might also be called every time, if the modulation is constant. For a steady tone, we’d call SetFrequency once, then the other two calls repeatedly for each sample. For a sweep, all three would be called for each sample. Of course I could trivially add a Process call that couples the existing UpdatePhase and GetOutput.

The wave table selection code iterates from lowest to highest, until it finds the right table, so it’s slower for higher frequencies. So, we’d expect an improvement in moving that code to SetFrequency for the steady tone case, and that the amount of improvement would be greater for high tone settings. We’d expect the case of a sweep to give no change in performance, since it still requires the wave table selection on each sample.

// old
inline void WaveTableOsc::setFrequency(double inc) {
  phaseInc = inc;
}

// new: the added code was simply repositioned from GetOutput
void SetFrequency(double inc) {
  mPhaseInc = inc;

  // update the current wave table selector
  int curWaveTable = 0;
  while ((mPhaseInc >= mWaveTables[curWaveTable].topFreq) && (curWaveTable < (mNumWaveTables - 1))) {
    ++curWaveTable;
  }
  mCurWaveTable = curWaveTable;
}

After moving the table selection from GetOutput, implementing the “one extra sample” optimization gave about a 20% improvement to the steady-tone case (SetFrequency not included) on its own. Unexpectedly, changing the variable temp from double to float gave a similar individual improvement. For steady tones, these changes together yielded a 30% improvement to the loop (SetFrequency not included), and 60% for high tones.

At first test, the sweep case did not see the improvement, as the biggest change simply moved the calculation between the two routines called on each sample. I was disappointed, because I expected at least a small improvement from the other changes, but instruction pipelines can be fickle. The minor tweak of using a temporary variable to calculate mCurWaveTable changed that—yielding a a 24% improvement. This is in a loop that sweeps a sawtooth exponentially through the audible range (SetFrequency, GetOutput, UpdatePhase, calculate the next frequency with a multiply, loop overhead). Basically, all the same code is executed, but apparently it worked out better for the processor. So, a great improvement overall.

Another goal of the change was to put everything in a single header file—because I like that. GetOutput was formerly not inline, so there was a small improvement moving it. I also changed the case of the function names (setFrequency to SetFrequency) to align with the way I currently write code, so it’s not a drop-in replacement for the old version.

I tried one other improvement. The wave table selection “while” loop takes longer, of course, as frequency moves to higher tables. I tried to special-case octave tables by using clever manipulation of a floating point number, essentially getting a cheap log conversion. The advantage would be a single calculation for any frequency, instead of an iterative search. It ended up not being worth it for the case of octave tables, the iterative solution was more efficient that I expected, and more importantly it’s far more general.

Using picobench to benchmark, the new code is about 25% faster for the case of a 20-20kHz sawtooth sweep. Not a huge change, because the code was already very efficient. But as part of single-header-file rewrite without increasing complexity, that’s a pretty good bonus.

WaveTableOsc.h

6 Responses to WaveTableOsc optimized

DKDiveDude says:

January 4, 2020 at 2:26 pm

How about doing the wavetable frequency lookup, similar to a wavetable done in advance, then no calculations needed during retrieval of the sample, only one float to int to get an array pointer. Now this would require an int array of every frequency, so about 20 thousand, but memory is cheap. Here is what I was thinking.

waveTableFreqLookupArray[20000];

Then populate each with what wave table to use for a particular frequency.
Populate from 0Hz to 19Hz, even though they may not be needed too so no addition is needed at all.
Example;
waveTableFreqLookupArray[20] = 1; // Wave table 1
waveTableFreqLookupArray[21] = 1; // Wave table 1
…
waveTableFreqLookupArray[40] = 2; // Wave table 2
etc.

- Nigel Redmon says:
  
  January 5, 2020 at 10:18 am
  
  Or, with a little more computation, do an exponential conversion and go straight to the appropriate table. For the special case of octave tables that info is already available in the floating point frequency’s exponent and just needs a little bit manipulation. But it ended up being slower than the way I’m doing it now, and less flexible.
  
  On modern processors you really have to code and measure how long the process takes. A cast seems like a trivial idea, but it takes processor time, and giant tables won’t be in the cache.
  
  - Claudio says:
    
    September 8, 2020 at 7:59 am
    
    Or, storing all topFreq (which are created already sorted and not randomly) variables inside a std::vector and then use std::lower_bounds, which applies binary search algorithm
    For a vector of 20 floats, you only need at maximum 5 evalutions
    
  - markzzz says:
    
    April 27, 2021 at 9:40 am
    
    Not sure if this will work for every sampleRate. On original code it scales automatically on factor x samplerate (i.e. 441 at 44100 phaseInc is 0.01, at 88200 is 0.005, which will reach topFreq “later”). If you match freqs array this way you would need a waveTableFreqLookupArray for each sampleRate…
    
markzzz says:

April 11, 2021 at 2:54 am

Another improvement would be phasor -= floor(phasor), so you can skip the if/branch on updatePhase() function.

What do you think about this trick?

- Nigel Redmon says:
  
  April 11, 2021 at 9:27 am
  
  Historically, conversion from float to int is relatively slow. I remember this being a big processing hit 20 years ago, one of the worst simple instructions common to DSP, I’d avoid it whenever possible. Here you’d be saying “convert this double to int, then the result value back to double, and subtract it from the original double”. I imagine it’s better now, but at the same time, trivial branches in modern processors are highly optimized. You could give it a try though.

Free wrap

Factor

6 Responses to WaveTableOsc optimized

Leave a Reply Cancel reply

Recent Posts

Archives

Categories