When dealing with audio processing modules it might be important to know how much delay they introduce. This parameter is often called “latency.” Typically we need to care about latency when using processing modules for real-time performance, or when they need to be synchronized with other audiovisual streams. Examples from my everyday applications are:
The module’s delay is called “round trip” because the audio signal entering the processing module must eventually return. With digital signal processing, the typical sources of delay are filters and the various buffers used to reduce processing load and prevent glitching.
Measuring the round trip delay manually is a relatively easy task. The typical approach is to send a pulse through the processing box, capture it on the output, and somehow lay out the input pulse and the output pulse on the same timeline for measurement. This can be done either by using an oscilloscope, or with audio recording software, like Audacity. Below is an example of input and output impulses as seen on MOTU Microbook’s bundled digital oscilloscope:
Here, by eye we can estimate the delay to be about 25 ms. Needless to say, we need to use a pulse that doesn’t get filtered out or severely distorted by the processing box. We also need to check the group delay of the box for uniformity; otherwise, measuring latency at one particular frequency would not reveal the whole picture.
However, the manual approach is not always convenient, and I’ve spent some time researching automated solutions. From my experience with Android, I’m aware of several mobile applications: the Dr. Rick’o’Rang Loopback app, the AAudio loopback command-line app, and the Superpowered Audio Latency Test app. On computers, there is a latency tester for the Jack framework, jack_delay. All these apps come with source code. Interestingly, they all use different approaches to performing the measurements.
Yet another automatic delay measurement is bundled into ARTA and RoomEQ Wizard (REW), but their source code is not open. At least for ARTA, it’s known that the delay estimation is based on cross-correlation between the reference and measured channels.
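For illustration only, here is a minimal Python sketch of the general cross-correlation idea: the estimated delay is the lag that maximizes the correlation between the reference and the measured channel. The function name and parameters are mine; this is not ARTA’s or REW’s actual code.

```python
import numpy as np

def delay_by_cross_correlation(reference, measured, sample_rate):
    """Estimate the delay of `measured` relative to `reference` as the lag
    that maximizes their cross-correlation. Illustrative sketch only."""
    corr = np.correlate(measured, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)  # lag in frames
    return lag / sample_rate                           # delay in seconds
```

This works well when the returned signal keeps its shape; strong echoes, noise, or severe distortion can produce spurious correlation peaks.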
I decided to compare the different automatic approaches. The purpose is to figure out how reliable they are, and how robust they are against the typical kinds of distortion occurring in audio equipment: noise, DC offset, echoes, non-uniform group delay, and signal truncation or distortion.
Let’s start with the app that uses the most straightforward approach to round trip latency measurement. The source code and the description of the algorithm are located in this file. I’m referring to the state of the code tagged as “Version 1.7”. The algorithm is designed to measure latency on the acoustical audio path of the device, from the speaker to the microphone, but it can be used with an electrical or a digital loopback as well.
First, the algorithm measures the average noise level of the environment. It does so over a 1-second interval (for some reason, the code calls the average of absolute sample values “energy,” although energy is actually defined in the time domain as the sum of squared sample values). The noise level is then translated into decibels, padded by 24 dB, and the resulting value is translated back into a 16-bit sample value, which is called the threshold.
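As a rough illustration of this step, here is a Python sketch under my own naming, not the app’s actual code, assuming the samples are already scaled to the 16-bit range:

```python
import numpy as np

def detection_threshold(noise_samples, pad_db=24.0):
    """Derive the detection threshold from ~1 s of background noise.

    `noise_samples` are assumed to be scaled to the 16-bit range
    (-32768..32767). The average of absolute sample values (the "energy"
    in the app's terminology) is converted to dB, padded by 24 dB, and
    converted back into a sample value.
    """
    avg_abs = max(np.mean(np.abs(noise_samples)), 1.0)  # avoid log of zero
    noise_db = 20.0 * np.log10(avg_abs)
    return 10.0 ** ((noise_db + pad_db) / 20.0)
```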
Then the program outputs a pulse formed by a ramped-down 1 kHz sine wave, 20 ms in duration, at maximum loudness (the output level is set manually via the media volume on the device). On input, the algorithm waits for the first block of data whose average of absolute sample values exceeds the threshold, and within that block finds the first sample exceeding the threshold. The index of this sample, counted from the moment the test pulse was emitted, is taken as the round trip latency (in frames).
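The following Python sketch illustrates the pulse and the detection logic as described above. It is my own code, not the app’s, and the 48 kHz sample rate is an assumption made for the example:

```python
import numpy as np

SAMPLE_RATE = 48000  # assumed for this sketch

def make_pulse(duration_s=0.02, freq_hz=1000.0, sample_rate=SAMPLE_RATE):
    """A 20 ms, 1 kHz sine burst with a linear ramp-down, at full 16-bit scale."""
    n = int(duration_s * sample_rate)
    t = np.arange(n) / sample_rate
    ramp = np.linspace(1.0, 0.0, n)  # amplitude ramps down to zero
    return 32767.0 * ramp * np.sin(2.0 * np.pi * freq_hz * t)

def find_first_crossing(captured_blocks, threshold):
    """Return the index (in frames, counted from the moment the pulse was
    emitted) of the first sample exceeding the threshold, inside the first
    block whose average of absolute values exceeds the threshold."""
    offset = 0
    for block in captured_blocks:
        if np.mean(np.abs(block)) > threshold:
            # If the block average exceeds the threshold, at least one
            # sample does too, so argmax finds the first such sample.
            return offset + int(np.argmax(np.abs(block) > threshold))
        offset += len(block)
    return None  # pulse not detected
```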
This process is repeated 10 times; the running minimum and maximum of the measured latency are tracked, and the measurement is abandoned if the maximum exceeds the minimum by more than a factor of two. Otherwise, the resulting latency is reported as the average of all measurements.
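A sketch of this aggregation step, again my own code following the description above rather than the app’s implementation:

```python
def aggregate_latency(latencies_frames):
    """Average repeated measurements, abandoning the run if the maximum
    exceeds twice the minimum (i.e., the results are too inconsistent)."""
    lo, hi = min(latencies_frames), max(latencies_frames)
    if hi > 2 * lo:
        return None
    return sum(latencies_frames) / len(latencies_frames)
```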
Leaving the implementation details aside, what can we say about the approach? The idea here is to reduce the test signal to a burst of energy and try to find it in the time domain. What are the potential issues with this?
What about resilience to distortions?
In fact, the last issue is a serious one. When doing manual testing, I always check the returned pulse visually, but the algorithm is “blind” to the signal being ramped up. And ramping up can actually happen in mobile devices, where sophisticated processing is used for power saving and speaker protection purposes. Note that the algorithm can’t use a “warm up” signal to put the system under measurement into a steady state, because the warm-up signal could be mistaken for the test pulse.
So, although a straightforward time domain approach has its strengths, it can be fooled, and a manual check of the results is required anyway.
I’m going to consider the methods used by the other apps in the following posts.