What is dither?

To dither means to add noise to our audio signal. Yes, we add noise on purpose, and it is a good thing.

How can adding noise be a good thing??!!!

We add noise to make a trade. We trade a little low-level hiss for a big reduction in distortion. It’s a good trade, and one that our ears like.

The problem

The problem results from something Nyquist didn’t mention about a real-world implementation—the shortcoming of using a fixed number of bits (16, for instance) to accurately represent our sample points. The technical term for this is “finite wordlength effects”.

At first blush, 16 bits sounds pretty good—96 dB dynamic range, we’re told. And it is pretty good—if you use all of it all of the time. We can’t. We don’t listen to full-amplitude (“full code”) sine waves, for instance. If you adjust the recording to allow for peaks that hit the full sixteen bits, that means much of the music is recorded at a much lower volume—using fewer bits.
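The 96 dB figure is just the ratio between full scale and one quantization step, expressed in decibels, which works out to roughly 6 dB per bit. Here is a quick check in Python; the function name `dynamic_range_db` is ours, for illustration only.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in decibels."""
    return 20.0 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits} bits: {dynamic_range_db(bits):.1f} dB")
```

This prints about 48, 96, and 144 dB for 8, 16, and 24 bits.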

In fact, if you think about the quietest sine wave you can play back this way, you’ll realize it’s one bit in amplitude—and therefore plays back as a square wave. Yikes! Talk about distortion. It’s easy to see that the lower the signal levels, the higher the relative distortion. Equally disturbing, components smaller than the level of one bit simply won’t be recorded at all.
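You can see both problems with a few lines of Python. This is a minimal sketch, assuming a simple round-to-nearest quantizer where 1.0 stands for one least-significant bit; the helper names `quantize` and `sine` are made up for the example.

```python
import math

def quantize(x: float) -> int:
    """Round to the nearest integer step (1.0 == one least-significant bit)."""
    return math.floor(x + 0.5)

def sine(amplitude_lsb: float, n: int = 64, cycles: int = 2) -> list:
    """n samples of a sine whose peak amplitude is given in LSBs."""
    return [amplitude_lsb * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]

one_bit = [quantize(s) for s in sine(1.0)]   # a sine one bit in amplitude
sub_bit = [quantize(s) for s in sine(0.4)]   # a sine smaller than half a step

print(sorted(set(one_bit)))  # [-1, 0, 1]: only three levels survive, a stepped wave
print(sorted(set(sub_bit)))  # [0]: the quieter sine is not recorded at all
```

With this particular quantizer, anything that stays below half a step rounds to silence.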

This is where dither comes in. If we add a little noise to the recording process… well, first, an analogy…

An analogy

Try this experiment yourself, right now. Spread your fingers and hold them up a few inches in front of one eye, and close the other. Try to read this text. Your fingers will certainly block portions of the text (the smaller the text, the more you’ll be missing), making reading difficult.

Wag your hand back and forth (to and fro!) quickly. You’ll be able to read all of the text easily. You’ll see the blur of your hand in front of the text, but definitely an improvement over what we had before.

The blur is analogous to the noise we add in dithering. We trade off a little added noise for a much better picture of what’s underneath.

Back to audio

For audio, dithering is done by adding noise of a level less than the least-significant bit before rounding to 16 bits. The added noise has the effect of spreading the many short-term errors across the audio spectrum as broadband noise. We can make small improvements to this dithering algorithm (such as shaping the noise to areas where it’s less objectionable), but the process remains simply one of adding the minimal amount of noise necessary to do the job.
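As a rough sketch of that step, here is a float-to-16-bit conversion with dither added before rounding. The function name `to_int16` and the choice of rectangular (uniform) noise spanning one least-significant bit are assumptions made for illustration; many practical systems use a triangular distribution instead, and noise shaping is left out entirely.

```python
import math
import random

def to_int16(sample: float, rng: random.Random) -> int:
    """Convert a float sample in [-1.0, 1.0) to a dithered 16-bit integer."""
    scaled = sample * 32768.0                 # 1.0 now equals one 16-bit step
    noise = rng.random() - 0.5                # assumption: uniform dither, +/-0.5 LSB
    value = math.floor(scaled + noise + 0.5)  # add the noise, then round to nearest
    return max(-32768, min(32767, value))     # keep the result inside 16 bits

rng = random.Random(1)                        # seeded only so the sketch is repeatable
samples = [0.0, 0.25, -0.5, 0.00001, 0.99999, -1.0]
converted = [to_int16(s, rng) for s in samples]
print(converted)
```

Each output lands within one step of the ideal scaled value; which of the two neighboring steps it picks is what the noise decides.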

An added bonus

Besides reducing the distortion of the low-level components, dither lets us hear components below the level of our least-significant bit! How? By jiggling a signal that’s not large enough to cause a bit transition on its own, the added noise pushes it over the transition point for a fraction of the time that is statistically proportional to its actual amplitude level. Our ears and brain, skilled at separating such a signal from the background noise, do the rest. Just as we can follow a conversation in a much louder room, we can pull the weak signal out of the noise.
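We can check this statistical claim numerically. The sketch below quantizes a sine only a quarter of a bit in amplitude, with and without dither, then correlates the result with a reference sine to see how much of the signal survived. The name `recovered_amplitude` and the uniform one-LSB dither are assumptions for the example, and the correlation stands in (very crudely) for what our ears and brain do.

```python
import math
import random

def quantize(x: float) -> int:
    """Round to the nearest integer step (1.0 == one least-significant bit)."""
    return math.floor(x + 0.5)

def recovered_amplitude(amplitude_lsb, use_dither, n=100_000, period=100, seed=7):
    """Estimate how much of the sine remains after quantizing, by correlating
    the quantizer output with a unit sine of the same frequency."""
    rng = random.Random(seed)
    total = 0.0
    for i in range(n):
        ref = math.sin(2 * math.pi * i / period)
        noise = (rng.random() - 0.5) if use_dither else 0.0
        total += quantize(amplitude_lsb * ref + noise) * ref
    return 2.0 * total / n

print(recovered_amplitude(0.25, use_dither=False))  # 0.0: nothing was recorded
print(recovered_amplitude(0.25, use_dither=True))   # close to 0.25
```

Without dither the quarter-bit sine vanishes; with dither its amplitude comes back out of the noise, on average.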

Going back to our hand-waving analogy, you can demonstrate this principle for yourself. Pick a large text character (or an object around you), and view it by looking through a gap between your fingers. Close the gap so that you can see only a portion of the character in any one position. Now jiggle your hand back and forth. Even though you can’t see the entire character at any one instant, your brain will average and assemble the different views to put the character together. It may look fuzzy, but you can easily discern it.

When do we need to dither?

At its most basic level, dither is required only when reducing the number of bits used to represent a signal. So, an obvious need for dither is when you reduce a 16-bit sound file to eight bits. Instead of truncating or rounding to fit the samples into the reduced word size—creating harmonic and intermodulation distortion—the added dither spreads the error out over time, as broadband noise.
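Here is a minimal sketch of that 16-bit to 8-bit reduction done in integer arithmetic. The function name `reduce_16_to_8` is hypothetical, and the uniform dither one 8-bit step wide is an assumption; the point is only that the dithered result rounds up with a probability proportional to the part being thrown away.

```python
import random

def reduce_16_to_8(sample16: int, rng: random.Random) -> int:
    """Reduce a signed 16-bit sample to signed 8 bits with dither."""
    dithered = sample16 + rng.randrange(256)   # assumption: uniform dither, one 8-bit step wide
    return max(-128, min(127, dithered >> 8))  # >> floors, for negative values too

rng = random.Random(3)          # seeded only for repeatability
sample = 1000                   # 1000 / 256 = 3.90625 in 8-bit steps
truncated = sample >> 8         # plain truncation always gives 3
trials = [reduce_16_to_8(sample, rng) for _ in range(50_000)]
average = sum(trials) / len(trials)

print(truncated)            # 3
print(sorted(set(trials)))  # [3, 4]
print(average)              # near 3.906
```

Truncation always makes the same error for the same input, which is what ties the error to the signal. The dithered version is sometimes 3 and sometimes 4, and averages out to the value we started with.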

But there are less obvious reductions in wordlength happening all the time as you work with digital audio. First, when you record, you are reducing from an essentially unlimited wordlength (an analog signal) to 16 bits. You must dither at this point, but don’t bother to check the specs on your equipment—noise in your recording chain typically is more than adequate to perform the dithering!

At this point, if you simply played back what you recorded, you wouldn’t need to dither again. However, almost any kind of signal processing causes a reduction of bits, and prompts the need to dither. The culprit is multiplication. When you multiply two 16-bit values, you get a 32-bit value. You can’t simply discard or round off the extra bits; you must dither.
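The growth in wordlength is easy to confirm. In this small sketch, `bits_needed` is a helper we made up to count the bits a signed two's-complement value requires.

```python
def bits_needed(value: int) -> int:
    """Bits needed to store a signed two's-complement integer."""
    magnitude = value if value >= 0 else -value - 1
    return magnitude.bit_length() + 1      # one extra bit for the sign

INT16_MIN = -32768
sample = INT16_MIN
gain = INT16_MIN              # worst case: both operands at the negative extreme
product = sample * gain       # 2**30

print(bits_needed(sample))    # 16
print(bits_needed(product))   # 32
```

Most products need fewer bits than the worst case, but the result register has to allow for all 32.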

Any form of gain change uses multiplication, so you need to dither. This means not only when the volume level of a digital audio track is something other than 100%, but also when you mix multiple tracks together (which generally has an implied level scaling built in). And any form of filtering uses multiplication and requires dithering afterwards.
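To make the gain-change case concrete, here is a sketch of a fixed-point volume control. The Q15 gain format (32768 represents 1.0), the name `apply_gain`, and the uniform dither are all assumptions chosen for illustration, not a description of any particular product.

```python
import random

Q15_ONE = 1 << 15   # fixed-point scale: 32768 represents a gain of 1.0

def apply_gain(sample16: int, gain_q15: int, rng: random.Random) -> int:
    """Scale a 16-bit sample by a Q15 gain, dithering on the way back to 16 bits."""
    product = sample16 * gain_q15                # needs up to 32 bits
    dithered = product + rng.randrange(Q15_ONE)  # assumption: uniform dither, one output step wide
    return max(-32768, min(32767, dithered >> 15))

rng = random.Random(5)               # seeded only for repeatability
half = Q15_ONE // 2                  # a gain of 0.5, about 6 dB down
results = [apply_gain(1001, half, rng) for _ in range(50_000)]

print(sorted(set(results)))          # [500, 501]
print(sum(results) / len(results))   # near 500.5
```

Half of 1001 is 500.5, which 16 bits can’t hold. The dithered output flips between 500 and 501 so that the average stays at the true value.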

The process of normalizing (adjusting a sound file’s level so that its peaks are at full level) is also a gain change and requires dithering. Some people normalize a signal after every digital edit they make, mistakenly thinking they are maximizing the signal-to-noise ratio. In fact, they are doing nothing except increasing noise and distortion, since the noise level is “normalized” along with the signal and the signal has to be redithered or suffer more distortion. Don’t normalize until you’re done processing and wish to adjust the level to full code.

Your digital audio editing software should know this and dither automatically when appropriate. One caveat is that dithering does require some computational power itself, so the software is more likely to take shortcuts when doing “real-time” processing as compared to processing a file in a non-real-time manner. So, an application that presents you with a live on-screen mixer with live effects for real-time control of digital track mixdown is likely to skimp in this area, whereas an application that must complete its process before you can hear the result doesn’t need to.

Is that the best we can do?

If we use high enough resolution, dither becomes unnecessary. For audio, this means 24 bits (or 32-bit floating point). At that point, the dynamic range is such that the least-significant bit is equivalent to the amplitude of noise at the atomic level, so there is no sense going further. Audio digital signal processors usually work at this resolution, so they can do their intermediate calculations without fear of significant errors, and dither only when it’s time to deliver the result as 16-bit values. (That’s OK, since there aren’t any 24-bit accurate A/D converters to record with. We could compute a 24-bit accurate waveform, but there are no 24-bit D/A converters to play it back on either! Still, a 24-bit system would be great because we could do all the processing and editing we want, then dither only when we want to hear it.)


13 Responses to What is dither?

  1. Michael Thomas says:

    Thats Nice….

    • Vesna says:

      Finally an excellent audio demonstration of why dithering is used. I’ve been dithering my final mixes at the last mastering stage on the basis that it was recommended by professionals. I knew that dithering had something to do with improving the quality of the final mastered audio but I couldn’t really hear why. With these audio examples you can actually hear the disturbance to the waveform in the tail of the non-dithered sine wave as it fades away. Thanks for this excellent explanation.

  2. Torben@orevox.dk says:

    Very nice straight forward explanation,
    thank you.

    • jeslapriya says:

      u say in audio signal what is dither but in digital image processing, means how its suitable 16 bit can reduce 8bit please mail me

      what is dither in digital image processing

      • Nigel Redmon says:

        In image processing, the bit depth controls color; dither is used to approximate the correct color over several pixels when reducing bit depth to the point where the target bit depth can’t represent the image’s true colors accurately. We see this most often in print media; take a close look at an image in a newspaper. An extreme example of reducing bit depth is to convert a full color image to black and white (not just gray scale, but pure black or white pixels).

  3. tut tut says:

    That’s all good and i understand now ..But Does The internet radio stations do dithering such as shout cast or microsoft station guide or radio that is pumping the net too???

  4. Nigel Redmon says:

    I can’t say specifically, not knowing those formats, where or if dithering is done. In general, it should be done when there is a reduction in resolution. For instance, such a format might allow compression of 24-bit fixed, or 32-bit float source, and play it back on a computer system that can accept those bit depths, and no dithering is needed. If the format itself reduces the bit depth, then the encoder should dither; if the playback system requires a smaller bit depth, then the decoder should dither.

  5. Mark says:

    Ehh, is this info still accurate, i mean: it was posted in 1996?
    I am pretty shure I have a 24Bit 96Khz interface:S.

    I’m not a pro or something so i’m just asking.

    thx

    • Nigel Redmon says:

      Yes, Mark, it’s true that we have 24-bit converters now. But not really.

      That is, each additional bit gives us half of what the previous bit gave us. By the time you get to the 24th bit, its contribution is so small, that it’s now below the thermal noise of any real circuit we can put it in. For instance, if one bit gives us one volt, the 24th bit would give us one ten-millionth of a volt. Considering that the background noise of your circuit needs to be quieter than that in order for it to be effective, you’ll need to resort to cryogenics.

      But that’s not to say that we might as well use 20-bit converters. Converters are typically spec’d based on the accuracy of their least-significant bit. So, the 20th bit of a 24-bit converter is likely to be more accurate (maybe 16 times more accurate, at the same spec) than the 20th bit of a 20-bit converter. So we might as well use 24-bit converters, whether we think we can hear the last couple of bits or not.

      Other than that, we’re at the limits of physics here—the thermal noise of atoms isn’t going to change any time soon. So if someone tries to sell you a 23-bit converter 5 years from now, claiming a breakthrough in knowledge, just have a good chuckle about it.

  6. Shay says:

    I think a great example of dither is that of determining whether a die is fair by rolling it many times and examining the distribution. Each roll alone tells you absolutely nothing about whether it’s fair; you have no expectation of any particular side coming up. But since the process is random (uncorrelated), multiple rolls can tell you to high accuracy just how fair it is, and which sides come up ever so slightly more often than others.

  7. Shamefully Ignorant says:

    Nigel- I’m still a little confused. I am pretty sure that your explanation is really thorough and easy to understand for most people who are reading this, but I spent too much time in my life obsessing over learning instruments and not enough time trying to understand physics, so please forgive my total obtuseness. Simply put, I am merely trying to determine whether or not it is beneficial to use one of the dithering options when I bounce a completed track from Logic Pro 8. My resolution is 24 bit and the sample rate is 44100. The program defaults to no dithering. Should I change that to get a higher quality result?

    • Nigel Redmon says:

      No problem—I know this is a confusing issue…

      No dithering is needed on a 24-bit result; it would be a waste of CPU cycles. While some may be inclined to argue that any word-length reduction should be dithered, at a point it’s just silly to add noise that you won’t hear to avoid distortion that you won’t hear. At 24 bits (144 dB dynamic range), distortion in the lowest bit will be buried under the noise floor of any electronics that you will play it through.

      To put it another way: When you shorten the word length, you create a distortion; we dither to change the nature of the distortion to one that’s less irritating to us. So, if the distortion level is too low to hear, why bother changing its form?

  8. M. says:

    This is seriously a great response to a question I’ve had for a long time. Thank you!
