How I write code

Next article I’ll post an update of our wave table oscillator, but first I’ll take the opportunity to discuss how I write code these days. Maybe it will help make sense of some of the choices in the code I post going forward.

I tend to build all DSP units as inlines in header files

True story: Recently, I moved an audio plug-in project I was developing on the Mac in Xcode, to Windows and Visual Studio. I was shocked to see that my source files had disappeared! There was only the main implementation cpp file (not counting the plugin framework), and my header files. All files were backed up, of course, but it was still unsettling—what could have happened? Then it sank in—I’d written most of the good stuff in header files, so that outside of the plug-in framework, there was indeed only one cpp file—leveraging 28 header files.

The main reason my DSP functions reside in header files is that I make my basic functions inline-able for speed. In a perfectly orderly world, that still might be a header file for the smallest and most critical functions, and a companion C++ source file (.cpp) for the rest. But it’s faster to code and make changes to a single file instead of bouncing between two. And I need only include the header where I use it, instead of also pulling in and marking companion cpp files for compilation.
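
As a sketch of what that looks like (a simplified stand-in, not the actual contents of my Gain.h), everything lives in the class definition in the header, so it’s implicitly inline:

// Gain.h (a simplified stand-in, not the actual contents of my Gain.h)
#pragma once

class Gain {
public:
    void setGain(double g) { gain = g; }
    double process(double in) { return in * gain; }  // defined in-class, so implicitly inline
private:
    double gain = 1.0;
};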

Further, I write “atomic” DSP components that handle basic functions, and build more complex components from these atomic functions. For instance, I have a delay line function, from which I make an allpass delay. Writing a reverb function can be very short and clear, combining these basic functions with other filter functions and feedback. And feedback is easy because all components process a sample at a time instead of blocks. Examples of my library files: OnePole.h, Biquad.h, StateVar.h, DelayLine.h, FIR.h, Noise.h, Gain.h, ADSR.h, WaveTableOsc.h.
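
To sketch the composition idea (simplified stand-ins, not the actual contents of my DelayLine.h), an allpass delay can be built directly on a delay line primitive:

// simplified stand-ins for the idea of building an allpass from a delay line
#pragma once
#include <vector>

class DelayLine {
public:
    DelayLine(int length) : buf(length, 0.0) {}
    double read() const { return buf[idx]; }   // oldest sample
    void write(double in) {                    // overwrite oldest, advance
        buf[idx] = in;
        if (++idx >= (int)buf.size()) idx = 0;
    }
private:
    std::vector<double> buf;
    int idx = 0;
};

class AllpassDelay {
public:
    AllpassDelay(int length, double g) : delay(length), gain(g) {}
    double process(double in) {
        double delayed = delay.read();
        double v = in + gain * delayed;   // feedback into the delay
        delay.write(v);
        return delayed - gain * v;        // feedforward around it
    }
private:
    DelayLine delay;
    double gain;
};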

Note that “inline” is a request to the compiler. But compilers are pretty good about honoring it. Remember, inline matters most for small functions, where function call overhead is a bigger consideration. And small functions are easiest to inline, so there’s little reason for a compiler not to comply. If you’re concerned, examine the compiler’s generated assembly (or a disassembly of the object code) to see whether a call was actually inlined.

By the way, the usual argument against inlines—“they lead to bloated code”—doesn’t apply much in the DSP context. These are not large functions used many places in your code. They are built for efficiency. The process routines are localized to your audio processing function, and the setting routines mostly in your plug-in’s parameter handling code.

My DSP units are designed for individual samples, not blocks of samples

Dedicated DSP chips usually process audio a sample at a time. But DSP running on a host computer must handle audio a buffer at a time, to minimize the overhead of context switching. So, if you look at open source DSP libraries, you’ll see that many are written to operate on a buffer of samples.

I don’t do that—my inline functions process a single sample at a time. Of course, you can easily wrap one in a for loop, perhaps partially unrolled to minimize loop overhead, so that it processes the entire buffer; then the next process acts on the entire buffer, then the next. Or you can string them together, one after the other, to complete your entire algorithm one sample at a time, with an outer loop to iterate through the buffer. The former might work better due to caching advantages, at the expense of more loop overhead. But it’s easier to make this choice with single-sample processes than with a library that’s entirely optimized for buffer processing.
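
As a sketch of the two arrangements (lp, hp, in, out, temp, and nFrames are just placeholders for single-sample process objects and buffers):

// one loop per process: each stage runs over the whole buffer
for (int i = 0; i < nFrames; i++)
    temp[i] = lp.process(in[i]);
for (int i = 0; i < nFrames; i++)
    out[i] = hp.process(temp[i]);

// or one loop overall: the whole chain runs one sample at a time
for (int i = 0; i < nFrames; i++)
    out[i] = hp.process(lp.process(in[i]));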

I usually write my DSP units as templates

Mainly, I template them to handle float or double. I use double when developing, but have the option of float available. Filters are a case in which I’ll always use double. For a wavetable oscillator, I want double parameters but float wavetables. A delay line element might be float or double depending on the need. I’d rather build the choice into the DSP unit than run into a different need later and have to take the time to rewrite the unit or make a new version of it.
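
The pattern looks roughly like this (a simplified one-pole smoother as a stand-in, not my actual OnePole.h):

// a simplified one-pole smoother, templated on the sample type
template <typename T>
class OnePoleLP {
public:
    void setCoef(T coef) { a1 = coef; }                 // 0 <= a1 < 1; closer to 1 is slower
    T process(T in) { return z = in + a1 * (z - in); }
private:
    T a1 = T(0.99);
    T z = T(0);
};

// OnePoleLP<double> while developing; OnePoleLP<float> where float is enough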

I tend to avoid posting templated stuff on my website, because it can be a distraction from what I’m trying to show.

No virtual functions

I don’t want a vtable. I’m not going to inherit from these basic DSP functions anyway; they’re built to do one thing efficiently.

Minor detail: I wrap each DSP header file in a namespace. (I use namespace ESP—which stands for “EarLevel Signal Processing”.) Then I can be lazy with my class names without concern about one day having a name collision (my “Biquad” versus another “Biquad” built into the plug-in library, for instance).
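
In the header, the wrapping is just this (contents elided):

// Biquad.h (wrapping only; the class contents are elided here)
#pragma once

namespace ESP {
    class Biquad {
        // ...coefficients, state, and the inline process function
    };
}

// at the point of use, ESP::Biquad is distinct from any other Biquad:
// ESP::Biquad lowpass;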


18 Responses to How I write code

  1. Eric Nichols says:

    Hi Ni!
    Just happened to catch your site right after you made a new post….that doesn’t happen often. 🙂
    What? You aren’t writing everything in LISP? 🙂
    Have a great week!

    • Nigel Redmon says:

      Hi Eric! At least someone reads these things…Well, LISP is not an easy choice for real time DSP, as you probably know, and even tougher for wedging into constraints of things like audio processing plug-ins. I’m fully thankful to no longer be constrained to 56k assembly language, so C++ isn’t so bad 😉 Plus, I’m also pretty happy for C++11 and later improvements, and enjoying it more these days…

  2. Miles Egan says:

    I see the advantages of doing things this way. I’ve been working on my own DSP library and one of my goals was to make the components modular, so they could in theory be linked together at run time like a software modular. I haven’t figured out a way to do this without vtables though. Any ideas?

    And yes, someone is definitely reading your articles. Thanks for taking the time to write them and share your knowledge!

  3. Bart says:

    I understand you inline for 2 reasons: (1) no function call overhead, (2) you don’t need to think about which cpp files to add to your projects, as the primitives consist of inline functions only. No duplicate-symbol complaints from your linker when using inline.
    I fail to get it…
    (1) This benefit is thrown completely out of the window with sample-based processing, a technique comparable to driving to the shop 10 times when you want to fill your fridge with 10 items.
    (2) Just throw all your cpp files into a library that you link in. The linker will happily leave any unused code out of the executable. Yes, your function prototypes are redundant, but they are a handy place to fully comment the usage of the API. Function prototypes (or classes) are a nice TOC/summary in themselves.
    But I guess… probably I misunderstand the full context or goal/target of your code…

    • Nigel Redmon says:

      Don’t like your analogy—if my atomic action is equivalent to placing a can of tuna into your cart, it does not imply driving to the store each time you want to do that…

      • Bart says:

        I agree the analogy is over-the-top, I should have used more nuance, I apologize.
        In my embedded DSP world, adding loops at a higher level than where the action is has the effect of frequent pipeline flushing and memory I/O stalling, doing away with the benefits of DSP (multiple MACs per cycle). Driving to the store would indeed be a lot worse even.

  4. Bart says:

    PS about building upon primitives: that I fully endorse. It makes your code easy to maintain (limited areas of change) and smarter (abstraction is powerful), and enhances re-use and portability.

  5. Bart says:

    On second thought… when you rely on optimizations done mostly by the compiler itself (as opposed to hand-optimizing), inlining can make sense… The compiler should be smart enough to replace calls to inlines by their code and then use optimisation techniques like loop unrolling, vector processing (SIMD operations), re-ordering, pipeline fill enhancing, branch prediction, io serializing, etc… on the higher level code. Thinking of it even more… this could be quite powerful when using multiple layers (inline functions calling inlines), the compiler has several degrees of freedom to optimize depending on the specific use of or order of primitives.

  6. Hasan Murod says:

    I used to code with low-level audio APIs on the Windows platform in the past, then VST plugins using JUCE, but recently I’m getting more work using low-cost microcontrollers, trying to develop cheap guitar effect pedals. Now I’m moving my code from an ARM Cortex M3 with 32-bit fixed-point math in C to single-precision floating-point math with the ESP32-A1S in C++, and also from sample processing to block processing mode. I released my code at github.com/hamuro80 , just wanna say thanks for sharing many useful tips and keep up the good work!

    • Nigel Redmon says:

      Thanks for sharing! As a reminder of how fast time passes, I have a STM32F4DISCOVERY board from 2012, did some experimenting with it back then…

    • robert bristow-johnson says:

      i’ve used JUCE a long time ago. in 2013.

      i really only used AudioSampleBuffer() as a framework to hold and pass audio around. i had an email discussion with Jules Storer about the problem that AudioSampleBuffer() contained essentially all of the physical data of the audio **except** for one glaring omission: the SampleRate parameter. so when we would pass audio around to various processes (usually filters, but also delay lines, and i had written a pitch detector), we had to also pass the sample rate along as another argument. that parameter really should exist in the AudioSampleBuffer().

      i didn’t use any of the JUCE utilities.

  7. robert bristow-johnson says:

    hay Nigel,

    i’m coming late to this party.

    even for low-latency, live, realtime audio processing or synth apps, processing samples in blocks (usually around 32 samples, but i have seen it as low as 4 and as many as 64) has several advantages. sure, you can take your single-sample processing and wrap it in a for loop, but there are small tasks (that can add up) that you need do only once before the loop (like loading states and coefficients) and once after the loop (like saving states).

    coefficients can be calculated from the user parameters (this is the process i like to call “coefficient cooking”) just once per block. and the corresponding process that i like to call “meter massaging” need be done once per block, also.

    all that work can be amortized over the number of samples per block and can cut down on real-time computation a lot. the DSP I/O is double-buffered so that at the beginning of a sample-block processing period (this would be once every 32 samples if the block size is 32 samples) *every* input sample is guaranteed to be good. and, because the output is also buffered, you need not guarantee the correctness of any of the output samples until the very end of the sample-block processing period.

    of course you can’t make the block too large. since it’s double-buffered, the delay is two block lengths (say, 64 samples or 1.333 ms). and if there is any feedback path, there is an inherent and implied delay of one block (32 sample delay). otherwise the execution of each primitive (inline) process should be done in the order that the signal flows (left to right).
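
    a rough sketch of that shape (the struct and the one-pole are just placeholders, not any particular library):

    struct OnePoleBlock {
     float state = 0.0f;       // filter state, lives across blocks
     float userParam = 0.01f;  // user parameter, read once per block

     void processBlock(const float* in, float* out, int blockSize) {
      float a1 = 1.0f - userParam;         // "coefficient cooking", once per block
      float z = state;                     // load state once, before the loop
      for (int n = 0; n < blockSize; n++)  // per-sample inner loop
       z = out[n] = in[n] + a1 * (z - in[n]);
      state = z;                           // store state once, after the loop
     }
    };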

    • Nigel Redmon says:

      Hi Robert!

      I definitely split out computations that can be done once per buffer, and also once per parameter change (which run at the UI level), including automation. So a filter’s calculation of coefficients normally happens only when the UI or automation changes frequency/Q, and of course on a sample rate change. The buffer-processing routine from the plugin framework does anything it needs to do in prep, then there’s a for loop executing all the single-sample primitives (filter, delay, upsample, etc.) and macros (also built of primitives—reverb, etc.).
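
      For instance, roughly (the callback and names are just stand-ins, not any particular plug-in framework):

      // coefficients cooked only when a parameter changes, not per sample or per buffer
      void MyPlugin::OnParamChange(int paramIdx) {
       if (paramIdx == kFreq || paramIdx == kQ)
        lowpass.setCoefs(GetParam(kFreq), GetParam(kQ), sampleRate);
      }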

      • robert bristow-johnson says:

        but even in those single-sample primitives, you gotta move state into register, process sample, store state from register back into state memory. the loading and storing of states can be done once per block.

        so it’s more efficient to write all your primitives (what appear as modules or “blocks” on a signal flow diagram) to process blocks of samples rather than single samples. stationary and time-variant filters and delays are examples. but any process will gain some efficiency, regarding overhead, with sample block processing.

        • Nigel Redmon says:

          But using an inline function is the same as writing the same lines of code explicitly. Wrapping the “primitive” (inline function call) in a for loop is the same as wrapping the same statements in a for loop. So, if you want to lowpass the entire buffer before highpass filtering it, you still can (with each wrapped in its own for loop). Or you can lowpass a sample then highpass its result, wrapped in a single for loop. The latter actually has less loop overhead, and is far more flexible (you can go either way—build buffer-based functions, or sample-based; if your primitives are buffer-based, you’re stuck). C++ is very convenient for this, as you don’t need to load states in the sense that you would pass variables to a function—they are part of the object either way.

          The historical advantage of buffer-based primitives is to spread the function-call hit over more samples. The shorter the function, the worse the hit. But modern compilers are very good about processing inline functions and optimizing the code they are used in.

      • robert bristow-johnson says:

        and, also, you might want to modulate the user parameter (like frequency or pitch-shift in cents), which happens *before* the coefficient cooking. like a tremolo or vibrato or just an ADSR envelope. that moving parameter is much slower than an audio waveform and need not be sampled at the same rate. but at the samplerate divided by the blocklength (which is the execution rate for the block processing) you can sample the user parameter once (at the beginning of the block process), cook the coefficients, and then slew the coefficients in a per-sample process in the innermost loop.

        • Nigel Redmon says:

          OK, but I don’t see how single-sample primitives hinder that. For sure, any native processing on a computer host will be done in blocks. It’s more about how the code library is organized.

          Just to make sure we’re talking about the same thing, I see it this way, block primitives versus singles; say the plugin framework supplies ProcessAudioBuffer, which your plugin overrides:

          void myPluggie::ProcessAudioBuffer(float** inputs, float** outputs, int nFrames) {
           for (int jdx = 0; jdx < numChans; jdx++) {
            float *in = inputs[jdx];  // simplifying: one buffer per channel
            float *out = outputs[jdx];
            // any other once-per-buffer work...
            LpFilter_buf(in, out, nFrames);   // lowpass the whole buffer
            HpFilter_buf(out, out, nFrames);  // then highpass the result, in place
           }
          }

          void myPluggie::ProcessAudioBuffer(float** inputs, float** outputs, int nFrames) {
           for (int jdx = 0; jdx < numChans; jdx++) {
            float *in = inputs[jdx];
            float *out = outputs[jdx];
            // any other once-per-buffer work...
            for (int idx = 0; idx < nFrames; idx++) {
             out[idx] = HpFilter_single(LpFilter_single(in[idx]));
            }
           }
          }


          The latter has another for loop, but the former has two hidden loops (in the functions), as well as two function call overheads if the _buf versions don’t meet the compiler’s criteria for inlines. But the overhead isn’t my main point, it’s the flexibility.
