I’ve been involved in several discussions on timing resolution in digital audio, recently—honestly, I never knew it was a concern before. For example, in a video on MQA, the host explained that standard audio sample rates (44.1 and 48 kHz) were unable to match human perception of time resolution. He gave little detail, but the human limits for time resolution apparently came from a study how finely an orchestra conductor could discern timing events. At least, that’s the impression I got, with little said and no references given. But the main point is that there was a number—7 µs (or a range of 5-10 µs) for human timing resolution.
The argument against lower sample rates was that 44.1 kHz has a period (1/SR) of 22.68 µs, about three times larger than this supposed resolution of the ear. It’s also noted that a sample rate of 192 kHz would match the ear’s ability.
However, this is a fundamental misunderstanding of how digital audio works. Audio events don’t start on sample boundaries. You can advance a sine wave—and by extension, any audio event—in time by much smaller increments.
But certainly, since digital audio uses finite values, there must be some limitation, no? In a sense, there is, but it’s based on bit depth. This should be intuitive—if a signal gets small enough, below one bit, it won’t register in the digital domain and we cannot know what time it happened. In his excellent article, Time resolution of digital audio, Mans Rullgard derives an equation for time resolution. The key finding is that it depends only on the number of bits in a sample and the audio frequencies involved, and has no connection to sample rate. And 16-bit audio yields far greater time resolution than we can resolve as humans. Please read the article for details.
Here’s where I tell you it doesn’t matter
There is no time resolution limitation for dithered audio at any bit depth. Not due to sample rate, not due to bit depth.
The linked article shows why sample rate isn’t a factor, and that the time resolution is far better than the 7 µs claim. I’ll take that further and show you why even bit depth isn’t a factor—in dithered audio.
I wrote earlier that, intuitively, we can understand that if a signal is below one bit, it won’t register, and we could not know when it happened from the signal information. Does the topic of sub-bit-level error sound familiar? Isn’t that what we solved with dither? Yes, it was…
Proof by demonstration
We typically dither higher resolution audio when truncating to 16-bit. We do this to avoid audible quantization errors. This supposed timing resolution issue is a quantization issue—as Mans’ article points out. I assert that dither fixes this too, and it’s easy to demonstrate why.
You can do this experiment in your DAW, or an editor such as Audacity. It’s easiest to discuss the most common dither, TPDF dither, which simply adds a low, fixed level of white noise (specifically, triangular PDF, which is just random plus random—still sounds like white noise).
Take any 24-bit audio file, and dither it (TPDF) to 16-bit. In your DAW or editor, subtract one from the other. In your DAW, you can have each file on a separate track. If you did this right and didn’t change boundaries or alignment, both will be lined up exactly. Keep both track faders at 0 dB. Invert one—you probably have a “trim” plugin or something that lets you invert the signal. Now play both together. You should hear white noise. Actually, you’ll probably hear nothing at all, unless you’re monitoring very loudly. Be careful, but if you turn up you sound system you’ll hear faint white noise for the duration of the audio.
This is expected, of course—the whole idea of dither is to add noise to decorrelate quantization error from the music.
Now, think about what this means. The difference between the dithered 16-bit track and the original is white noise. If you record this white noise onto a third track via bussing, and play just the 24-bit original and the noise track, it is indistinguishable from the dithered 16-bit track. It’s basic mathematics. A – B = n, therefore B = A – n. (See, for this test to be 100% mathematically correct, you’ll need to invert the resulting noise track. But practically speaking, the noise isn’t correlated, so it will sound the same either way. But yeah, invert it if it worries you.)
Let’s run that by again. The 16-bit dithered audio is precisely a very low level (typically -93 dB full scale rms) white noise floor summed with the original 24-bit audio. If you don’t hear timing errors from the 24-bit audio, you won’t hear it from that with low-level white noise added, which means you won’t hear it from dithered 16-bit audio. Ever, under any circumstances.
Consider that the same is true even with 12-bit audio, or 8-bit audio—even 4-bit. If you prefer, though, you could say that there are terrible timing resolution errors in 4-bit audio, but all that dither noise is drowning them out!