Understanding Loudspeaker Acoustic Measurements
By Jeff Bagby
The Impulse Response –
The most effective measurement method is to use an Impulse Response, and because of this nearly
every measurement system will use it (or a very short sine sweep similar to it). The Impulse Response is a
loudspeaker’s response to a sharp, narrow, short pulse that contains a uniform distribution of all
frequencies in the audio band. Because the pulse lasts only a few milliseconds, it is less likely to excite
room modes and reflections than longer-term signals like Pink Noise or Warble tones.
The Impulse Response is a time domain response plotted along a time scale. This allows you to see the
arrival of later reflections being picked-up by the microphone as ripples in the Impulse Response plot. By
examining the impulse response you can “window” the data to remove these reflections. Once the
software has captured the speaker’s Impulse Response the frequency response, phase response,
cumulative spectral decay, and energy storage data are calculated from it using a Fast Fourier
Transform. All of this information is contained in the single impulse response. The following discussion
will be based on using the Impulse Response method.
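As a rough illustration of how the frequency and phase response fall out of the impulse response, here is a minimal Python sketch. It uses a plain discrete Fourier transform rather than a true FFT (same result, just slower). The function name `impulse_to_response` and the 48 kHz sample rate are assumptions made for the example and are not taken from any particular measurement package.

```python
import cmath
import math

def impulse_to_response(impulse, sample_rate):
    """Return a list of (frequency_hz, magnitude_db, phase_deg) up to Nyquist."""
    n = len(impulse)
    response = []
    for k in range(n // 2 + 1):
        # Plain DFT of bin k; an FFT gives the same values, only faster.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(impulse))
        magnitude_db = 20.0 * math.log10(max(abs(acc), 1e-12))
        phase_deg = math.degrees(cmath.phase(acc))
        response.append((k * sample_rate / n, magnitude_db, phase_deg))
    return response

# An ideal one-sample pulse contains every frequency at the same level,
# so its magnitude response comes out flat.
ideal_impulse = [1.0] + [0.0] * 63
ideal_response = impulse_to_response(ideal_impulse, 48000)
```

A real loudspeaker impulse would replace `ideal_impulse`; the deviations from flat in the result are the speaker's frequency response.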
Far-Field Measurements –
To be in the far-field is not as far as you may think. You are effectively in the far-field when your
microphone is at a distance that is 3-5 times the radiating diameter of the driver. So, for a 6.5” woofer
that has a radiating diameter of approximately 5” you will be in the far-field once your mic is placed 18”
or so out from the driver. In order to make sure that you are picking up the baffle step correctly you
need to make sure your mic is at least twice the width of the baffle away from the speaker as well. I
bring this up because usually smaller drivers are on smaller baffles, but if the baffle is wide you need to
take this into consideration. Also, large speakers with multiple drive units that are spaced quite a bit
apart need some distance to integrate. So, bigger speakers can become more problematic when
measured in rooms and may require a greater measuring distance to be effective.
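The two distance rules above can be combined into one small calculation. This is only a sketch: the function name and the choice of 3 as the default multiple (the low end of the 3-5 range) are assumptions for illustration.

```python
def min_far_field_distance(radiating_diameter_in, baffle_width_in,
                           driver_multiple=3.0):
    """Smallest mic distance (inches) that satisfies both rules of thumb."""
    driver_limit = driver_multiple * radiating_diameter_in  # 3-5x driver diameter
    baffle_limit = 2.0 * baffle_width_in                    # 2x baffle width
    return max(driver_limit, baffle_limit)

# 5" radiating diameter on a hypothetical 8" wide baffle:
# the baffle rule (16") governs, not the driver rule (15").
distance = min_far_field_distance(5.0, 8.0)
```

On a wide baffle the second rule quickly dominates, which is the reason it is worth checking both.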
Here’s a tip - if you are measuring in your room then closer is usually better. There is no rule that says
you need to measure at one meter to have good measurements, and usually the opposite turns out to
be true – measuring at one meter may instead allow for much more room interaction in your
measurements. For small two-way monitor size speakers I usually measure at a distance of around 20”.
Near-Field Measurements –
Near-field measurements are usually done to overcome the effects of standing waves and reflections in
the room. For a near-field measurement to work correctly certain rules are usually followed. The mic
needs to be placed as close to the center of the driver as possible. The mic then needs to be spaced 0.10
times the effective radiating radius of the driver from the cone. So, for our 6.5” woofer, the 5” radiating
diameter has a radius of 2.5”, so the mic should be positioned 0.25” off the dust cap. The reason we aim
for the center of the cone is because at higher frequencies sound will arrive from different areas of the
cone out of phase and create cancellation nulls and peaks that you won’t see in the far-field. Because of
this the usable upper limit of the near field response is defined as Fmax = 4311 / radiating diameter (in
inches). In our example this would be 4311 / 5” or 862Hz. Below this frequency the near field response
will be perfectly accurate and free of room and cabinet diffraction effects.
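The near-field numbers above are simple enough to compute for any driver. A minimal sketch, with a function name chosen only for this example:

```python
def near_field_setup(radiating_diameter_in):
    """Return (mic_distance_in, f_max_hz) for a near-field measurement."""
    radius = radiating_diameter_in / 2.0
    mic_distance = 0.10 * radius            # 0.10 x effective radiating radius
    f_max = 4311.0 / radiating_diameter_in  # usable upper limit, diameter in inches
    return mic_distance, f_max

# The 6.5" woofer example with its 5" radiating diameter.
mic_distance, f_max = near_field_setup(5.0)
```

For the 5” example this gives the 0.25” spacing and the roughly 862 Hz limit quoted in the text.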
Gating the Impulse Response –
This sounds so technical, but it’s not that complex. We have already seen that the Impulse Response is a
time domain response, so “gating” or “windowing” simply sets a time window. It defines how many
milliseconds of the impulse response we keep before we stop capturing.
Long or short, what are the pros and cons? When you view the impulse response the high frequencies
are at the beginning of the impulse due to their short wavelengths, whereas low frequencies make up
the tail of the impulse because they take longer to fully develop. Therefore, the longer your time
window is open, the lower in frequency you allow the wavelengths to develop. In fact, the lowest useful
frequency is the inverse of the gate time in seconds; for example, 5 mSec gives 1 / 0.005 = 200 Hz. So, a 5
mSec Gate time is technically good down to 200 Hz. The “pro” here is that the longer you leave the gate
open the lower in frequency your data covers. The “con” is the flip side of this coin. The longer your gate
time, the lower in frequency you can go, but you also allow more time for reflections of the impulse to
bounce off objects in the room and reflect back to the mic. We will see these as late ripples in an
otherwise smooth impulse response. We can then set a limit on the Impulse Response window; this is
called a “Gate”. A gated response is also called a “quasi-anechoic response” because we are attempting
to capture only the first arrival sound pressure without any echoes (reflections) included in the data.
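The relationship between gate time and low frequency limit can be sketched as follows. The gate here is a simple hard cut that assumes the impulse starts at the first sample; actual measurement software typically lets you place the markers and may taper the end of the window, so treat `apply_gate` as an illustration only.

```python
def gate_low_frequency_hz(gate_ms):
    """Lowest useful frequency for a gate of the given length in milliseconds."""
    return 1.0 / (gate_ms / 1000.0)

def apply_gate(impulse, sample_rate, gate_ms):
    """Keep only the samples inside the gate (assumes the impulse starts at index 0)."""
    keep = int(round(sample_rate * gate_ms / 1000.0))
    return impulse[:keep]

low_limit = gate_low_frequency_hz(5.0)               # the 5 mSec example
gated = apply_gate([0.0] * 1000, 48000, 5.0)         # placeholder impulse data
```

At a 48 kHz sample rate (an assumed value) a 5 mSec gate keeps 240 samples, which is all the data the FFT has to work with.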
Here’s another tip - The compromise solution is to take impulse measurements closer to the speaker,
like we are doing, and if needed raise the test volume (SPL) a little (I like it around 85 dB), and this helps
to make the first arrival from the loudspeaker more dominant over the room modes to a lower
frequency. One interesting feature in the Omnimic software is the “blended” response. It uses the gated
response down to the frequency corresponding to the gate time and below that “blends” the room
curve with it. This can be quite useful, but if the room is making a mess of things then we can make our
own blended response by merging near and far field measurements.
Gating and Smoothing the Response –
It should be noted that Gating the time window of the Impulse Response results in a form of response
smoothing. You will easily see this in the frequency response, even at higher frequencies, if you shorten
the gate marker to shorter and shorter time intervals. In addition to this there is fractional-octave
smoothing of the response as well. This is a mathematical process where the frequency response is
averaged over a broader and broader band of frequencies. It is usually expressed in terms like 1/3rd
Octave smoothing or 1/24th Octave smoothing, etc. Our brains tend to perceive the tonal balance of a
speaker based on 1/6th Octave smoothing. In other words, a response plot shown with 1/6th Octave
smoothing will correlate most closely with our perceived tonal balance of the speaker. But, this is too
smooth for good crossover design work because it can allow narrow band peaks and dips to be hidden
from our view.
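To make the averaging idea concrete, here is a minimal sketch of fractional-octave smoothing. It assumes a simple rectangular window that is 1/N octave wide and centered on each frequency, and it averages power rather than dB values. Real programs may use differently shaped windows, so this is one plausible approach and not the method of any specific software.

```python
import math

def smooth_fractional_octave(freqs_hz, levels_db, fraction=6):
    """Average each point over a window 1/fraction octave wide (power average)."""
    half_width = 2.0 ** (1.0 / (2.0 * fraction))
    smoothed = []
    for f0 in freqs_hz:
        lo, hi = f0 / half_width, f0 * half_width
        powers = [10.0 ** (db / 10.0)
                  for f, db in zip(freqs_hz, levels_db) if lo <= f <= hi]
        smoothed.append(10.0 * math.log10(sum(powers) / len(powers)))
    return smoothed

# A flat response sampled at 1/48th octave steps with one narrow 6 dB peak at 1 kHz.
freqs = [1000.0 * 2.0 ** (i / 48.0) for i in range(-48, 49)]
levels = [0.0] * len(freqs)
levels[48] = 6.0

sixth = smooth_fractional_octave(freqs, levels, fraction=6)
forty_eighth = smooth_fractional_octave(freqs, levels, fraction=48)
```

With 1/6th octave smoothing the narrow peak is averaged down noticeably, while 1/48th octave leaves it essentially intact, which is the hiding effect described above.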
Here’s a tip - I typically use 1/48th Octave smoothing when I design a crossover because I want to see
everything that is going on and make sure nothing is sneaking past me. I would recommend 1/24th
Octave as a bare minimum when designing a crossover.
Merging Near-Field and Far-Field Responses –
Above we discussed the proper methods for collecting the near-field frequency response and the far-field
frequency response. We have also discussed Gating the Impulse Response window. From this we
have seen that the near-field response is limited to a maximum frequency for accuracy and the gated
far-field response is limited in the minimum frequency for accuracy. Wouldn’t it be great if we could get
the best of each? We can! Using our 6.5” woofer example we saw that the near-field response was
limited to an upper limit of 862 Hz. And using our gated impulse response with a 5 mSec gate time our
far-field response was limited to a low frequency limit of about 200 Hz. This means that these two
response curves should both be good in the two octave range of 200 – 800 Hz, and this gives us the
range in which we can look to merge this data into one “spliced” or “blended” frequency response. You
will typically find that the near-field (NF) data is much higher in SPL than the far-field (FF), and it is
always best to lower it to match the far-field.
Here’s another big tip - Traditionally, NF and FF are merged with a hard splice at a specific frequency.
For example, say 300 Hz is chosen; the two response lines will be adjusted to meet at 300 Hz and then
spliced with a hard cut at this frequency. This can work fine, but it can also be difficult to get a good
match that does not leave some degree of discontinuity in the response plot. Most software works this way,
including my Response Modeler. However, recently Charlie Laub and I have worked together to create a
spreadsheet that rather than performing a hard splice provides a softer splice and “blends” the response
over a user-defined frequency band. This eliminates the risk of a discontinuity and provides a much
smoother transition from far-field to near field response data. This program also allows for a modeled
bass response to be blended if desired, and sometimes this is more than adequate and very similar to
merging the near field data on sealed systems and maybe a little better than working with the
complexities of vented box data where the port and cone have to be summed as well.
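The idea of a soft splice can be sketched in a few lines. This is not the algorithm used in the spreadsheet mentioned above, which is not described here; it is a minimal illustration that assumes magnitude-only data in dB, level-matches the near-field curve to the far-field curve inside the blend band, and then crossfades linearly on a log-frequency scale. All names are hypothetical.

```python
import math

def blend_responses(freqs_hz, nf_db, ff_db, f_start, f_stop):
    """Merge near-field (below f_start) into far-field (above f_stop)."""
    # Lower (or raise) the near-field data so it matches the far-field
    # level on average inside the blend band.
    band = [i for i, f in enumerate(freqs_hz) if f_start <= f <= f_stop]
    offset = sum(ff_db[i] - nf_db[i] for i in band) / len(band)

    merged = []
    for f, nf, ff in zip(freqs_hz, nf_db, ff_db):
        if f <= f_start:
            weight = 0.0   # all near-field
        elif f >= f_stop:
            weight = 1.0   # all far-field
        else:
            weight = math.log(f / f_start) / math.log(f_stop / f_start)
        merged.append((1.0 - weight) * (nf + offset) + weight * ff)
    return merged

# Made-up data: the near-field curve sits 10 dB above the far-field curve.
freqs = [100.0, 200.0, 400.0, 800.0, 1600.0]
near = [95.0, 96.0, 96.0, 96.0, 90.0]
far = [70.0, 86.0, 86.0, 86.0, 85.0]
merged = blend_responses(freqs, near, far, 200.0, 800.0)
```

Below the band the result follows the level-matched near-field data, above it the far-field data, and in between the two are mixed gradually so no step appears at a single splice frequency.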
Near Field Cone and Port Summation –
In a vented speaker we may want to sum the output of the port and the cone into one single response.
My Passive Crossover Designer spreadsheet can do this very easily if you do a little work in advance. It
works like this: Measure the port and cone responses using our near-field technique described earlier
and save these responses as FRD files. Once you have taken your two responses, unless the port
and cone have the same area (which is unlikely, unless the port is a passive radiator) we will have to
adjust their relative SPLs. The smaller of the two, usually the port, will need to be lowered in SPL (dB) by
the ratio of their areas using this formula: Lower the port by 20 Log (Port Diameter / Cone Diameter),
which is the same as 10 Log of the area ratio. So, for our 6.5” woofer, with its 5” radiating diameter,
working with a 2” port, the port output will need to be
lowered by 20 Log (2 / 5) = -7.96 dB, or roughly 8 decibels.
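The level correction is easy to check numerically. The sketch below computes it both from the diameters and from the areas to show that the two forms agree; the function names are chosen only for this example.

```python
import math

def port_correction_from_diameters(port_diameter, cone_diameter):
    """20 Log of the diameter ratio, in dB (negative when the port is smaller)."""
    return 20.0 * math.log10(port_diameter / cone_diameter)

def port_correction_from_areas(port_diameter, cone_diameter):
    """10 Log of the area ratio, in dB; the same number as the diameter form."""
    port_area = math.pi * (port_diameter / 2.0) ** 2
    cone_area = math.pi * (cone_diameter / 2.0) ** 2
    return 10.0 * math.log10(port_area / cone_area)

# 2" port with a 5" radiating diameter cone.
correction_db = port_correction_from_diameters(2.0, 5.0)
```

For the 2” port and 5” cone this reproduces the roughly -8 dB figure from the text.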
Next take both near-field measurements and import them into PCD with the cone response as the
woofer and the port response as the tweeter in a two-way configuration. Turn on the active section for
the tweeter and adjust the tweeter’s (port’s) level down by 8 dB. Now, if you look at the summed
response (black line) in PCD it will be correct for the summed port and cone response for both
amplitude and phase response. You might note that the Impulse Response on the port measurement is
negative, so you don’t have to change the phase in PCD; it will already be correct. Finally, save the
summed response as an FRD file and you can now use it to merge with the far-field response if you
choose.
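For readers who want to see what the summation amounts to outside of PCD, here is a minimal sketch of a complex (amplitude and phase) sum. It is not PCD's implementation. It assumes both measurements share the same frequency points and are given as (level in dB, phase in degrees) pairs, and the function names are hypothetical.

```python
import cmath
import math

def to_complex(level_db, phase_deg):
    """Convert a level and phase into a complex pressure value."""
    return cmath.rect(10.0 ** (level_db / 20.0), math.radians(phase_deg))

def sum_cone_and_port(cone, port, port_correction_db):
    """Sum two responses point by point, including phase."""
    summed = []
    for (cone_db, cone_phase), (port_db, port_phase) in zip(cone, port):
        total = (to_complex(cone_db, cone_phase)
                 + to_complex(port_db + port_correction_db, port_phase))
        level = 20.0 * math.log10(max(abs(total), 1e-12))
        summed.append((level, math.degrees(cmath.phase(total))))
    return summed

# Two equal, in-phase sources add to about 6 dB more than either alone.
in_phase = sum_cone_and_port([(80.0, 0.0)], [(80.0, 0.0)], 0.0)
# Two equal sources 180 degrees apart cancel almost completely.
out_of_phase = sum_cone_and_port([(80.0, 0.0)], [(80.0, 180.0)], 0.0)
```

This is why the phase data matters: the same two levels can add or cancel depending on their relative phase.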
Note: for dual ports, take the measurement at only one port. Calculate the correction from the ratio of
that port’s diameter to the cone diameter as above, and then add 6 dB to account for the second port.
So, for our example, if there had been two 2” ports
then the port adjustment would be -7.96 + 6 dB = -1.96 dB. For dual woofers and a single port we
will need to do it the other way around and subtract another 6 dB from the port output.
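The 6 dB adjustments in this note can be folded into the same correction. This is a minimal sketch under stated assumptions: the ports are identical, the woofers are identical, the near-field measurement is taken at just one of each, and the correction starts from the diameter of the single measured port (which is what reproduces the -7.96 + 6 dB figure above). The 6 dB in the text is the rounded value of 20 Log (2).

```python
import math

def port_correction_db(port_diameter, cone_diameter, n_ports=1, n_woofers=1):
    """Port level correction in dB for identical ports and identical woofers."""
    size_term = 20.0 * math.log10(port_diameter / cone_diameter)
    count_term = 20.0 * math.log10(n_ports / n_woofers)  # +6 dB per doubling of ports
    return size_term + count_term

two_ports = port_correction_db(2.0, 5.0, n_ports=2)      # about -1.9 dB
two_woofers = port_correction_db(2.0, 5.0, n_woofers=2)  # about -14 dB
```

The unrounded result for two 2” ports is about -1.94 dB, which is close to the -1.96 dB obtained with the rounded 6 dB figure.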
Here’s a tip – what is the best indicator of the tuning frequency (Fb) from this data? The peak in the
port’s output is usually a little higher in frequency than the actual tuning frequency. However, the deep
notch in the woofer’s response will be right on the actual tuning frequency. Do a near-field
measurement at your cone and you will find the tuning frequency very quickly.
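If the near-field cone data has been exported, locating the notch can be automated. A minimal sketch, assuming the data is a list of frequencies and levels; the 15-100 Hz search range is a hypothetical default that keeps the search in the region where box tuning usually falls.

```python
def tuning_frequency_from_notch(freqs_hz, cone_nf_db, f_lo=15.0, f_hi=100.0):
    """Return the frequency of the deepest point in the near-field cone response."""
    candidates = [(db, f) for f, db in zip(freqs_hz, cone_nf_db)
                  if f_lo <= f <= f_hi]
    return min(candidates)[1]

# Made-up near-field cone data with a notch at 35 Hz.
freqs = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 60.0]
levels = [88.0, 86.0, 82.0, 70.0, 80.0, 85.0, 88.0, 90.0]
fb = tuning_frequency_from_notch(freqs, levels)
```

The accuracy of the result is limited by the frequency spacing of the measurement data.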
Phase Response –
All frequencies are cyclical; they involve an outward (compression) and inward (rarefaction) movement
of the cone at X number of times a second. These show up in the impulse in the time domain. If the
cone is moving forward the spike will be in the upward or positive direction and when it is oscillating
backward the impulse will go downward in the negative direction. When the FFT transforms the impulse
response it will not only show the frequency response but also the phase response, since this is captured
in the time domain data. Loudspeaker drivers are what are called “minimum phase devices”. This means
that the phase response can be directly extracted from the frequency response because there is direct
causality. The mathematical means to perform this extraction is called a Hilbert-Bode Transform.
Minimum phase means there is no excess delay affecting the phase, only what is attributable to the
frequency response, since it is the phase at the cone. Since drivers and crossover components are
minimum phase, how they sum is predictable and clearly mathematically defined. If this was not the
case then crossover programs would never be able to predict the final summed response. Measurement
systems can yield different versions of phase data depending on how it is captured, whether there is a
distinct time marker used to begin the impulse, and whether there is time of flight included in the phase
data. All of these factor in to determine whether it is best to use measured phase or extracted phase in
crossover development. Programs like Passive Crossover Designer can work with either, but if extracted
minimum phase data is used with accurate offsets to define the relative locations of the drivers, then
the program can calculate the frequency response on various axes and show their effect on the
crossover summation. This can add a powerful dimension to the crossover design process. A comment
should be made about Group Delay. Group Delay is a measurement of the rate of change of the phase
response with frequency. Since frequency response and phase are tied together, so is Group Delay. When the frequency
response is rolling off rapidly the Group Delay will be greater. Studies show that large amounts of Group
Delay can be audible, even when the relative phase is not.
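Group Delay can be estimated directly from phase data as the negative slope of phase versus frequency. A minimal sketch, assuming the phase is in degrees and already unwrapped (no jumps of 360 degrees); the function name is chosen for this example.

```python
import math

def group_delay_ms(freqs_hz, phase_deg):
    """Group delay between adjacent points: -d(phase)/d(omega), in milliseconds."""
    delays = []
    for i in range(len(freqs_hz) - 1):
        d_phase = math.radians(phase_deg[i + 1] - phase_deg[i])
        d_omega = 2.0 * math.pi * (freqs_hz[i + 1] - freqs_hz[i])
        delays.append(-d_phase / d_omega * 1000.0)
    return delays

# A pure 1 millisecond delay has phase = -360 * f * 0.001 degrees.
freqs = [100.0, 200.0, 300.0]
phase = [-36.0, -72.0, -108.0]
delays = group_delay_ms(freqs, phase)
```

A pure time delay gives the same Group Delay at every frequency; a steep roll-off shows up as a region where the value rises.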
Additional Topics of Interest
Ground Plane Measurements –
Ground plane measurements are a bit different. For these we place the mic and the speaker on a flat,
smooth, highly reflective surface, with the mic lying on the ground. The speaker will sit on the ground at
least the minimum far-field distance from the mic. You may need to tilt the speaker downward toward
the mic in order to maintain the proper axis during the measurement. The reflection from the ground
creates a virtual image of the speaker, so ground plane measurements are always 6 dB higher than
free-field (full-space) measurements taken at the same distance.
One of the problems with ground plane measurements is that the virtual baffle
is now twice as large as the real one, and this will change the low frequency response some. Another
problem is that a surface that is not perfectly flat or not highly reflective will alter the response too.
The advantage of the ground plane method is that, if performed outdoors, it can provide a response that is
accurate to 20 Hz or below. There are techniques that can be used to get around some of the issues but
they require some very precise distances and placement that the other methods don’t require. For this
reason I tend to stick with near-field and far-field and merging the data.
Real Time Analyzer Data –
Real Time Analyzers (RTA) will use either Pink Noise or White Noise. White noise has equal energy
at every frequency (per hertz), whereas Pink Noise has equal energy in every octave. Our hearing
perception is balanced by octave, so Pink Noise is distributed the way we hear and white noise will tend
to sound like it is dominated by high frequencies. RTAs will use a series of bandpass filters to derive
center frequencies and these will be spaced by octaves or some fraction of an octave (like a 1/3rd Octave
analyzer). An RTA will give the average SPL in each of the bands it uses at these center frequencies. We
lose resolution with RTAs but they are useful in giving us a good picture of a speaker’s tonal balance and
room response. They are not, however, very useful in crossover development.
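The difference between the two noise types shows up clearly when you compute how much energy lands in each analyzer band. The sketch below uses exact base-2 1/3rd octave centers around 1 kHz (actual analyzers label their bands with rounded nominal values) and idealized noise spectra; the function names are assumptions for the example.

```python
import math

def third_octave_centers():
    """Exact base-2 1/3rd octave centers from about 20 Hz to about 20 kHz."""
    return [1000.0 * 2.0 ** (n / 3.0) for n in range(-17, 14)]

def band_energy_db(center_hz, kind, fraction=3):
    """Relative energy in one band for ideal white or pink noise."""
    half_width = 2.0 ** (1.0 / (2.0 * fraction))
    lo, hi = center_hz / half_width, center_hz * half_width
    if kind == "white":
        energy = hi - lo            # constant energy per hertz
    elif kind == "pink":
        energy = math.log(hi / lo)  # energy per hertz falls as 1/f
    else:
        raise ValueError("kind must be 'white' or 'pink'")
    return 10.0 * math.log10(energy)

centers = third_octave_centers()
white_rise = band_energy_db(2000.0, "white") - band_energy_db(1000.0, "white")
pink_rise = band_energy_db(2000.0, "pink") - band_energy_db(1000.0, "pink")
```

White noise reads about 3 dB higher in each successive octave on a band analyzer, while Pink Noise reads the same in every band, which is why it is the natural test signal for an RTA.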
Cumulative Spectral Decay –
The Cumulative Spectral Decay (CSD) gives a detailed presentation of a loudspeaker’s resonances.
Ideally, a speaker’s impulse response should die away very rapidly and evenly. Real loudspeakers,
however, have resonances and stored energy which take time to dissipate. The way the CSD works is
that the FFT first transforms the whole impulse into a frequency response. Then it moves the start of the
window a very short time later on the impulse and transforms the data again to produce another frequency
response plot. It repeats this process, each time showing the
frequency response later and later in time. Each of these response calculations is presented in a three
axis graph with Frequency as the X axis, decibels as the Y axis, and time as the Z axis, with each response
shown as a slice in time. The visual presentation will then show how certain frequencies take longer to
die out than other frequencies do. This is often referred to as a “Waterfall Plot” because of its
appearance. The CSD is also bounded by the gate time used to window the Impulse response. As a result
it can only show frequencies down to the point corresponding to the gate time. It should be noted that
anything that changes the frequency response of a driver will also change the CSD response. Because of
this it is easy for the CSD to be misinterpreted. For example, an edge diffraction causing a dip and peak
in a tweeter’s response will show a ridge in the CSD that would not show up on a different baffle. These
may be confused as resonances when they are not.
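The slicing process behind a CSD can be sketched as follows. This is a simplified illustration: it uses a plain DFT in place of an FFT, a hard-edged window whose start simply moves later in time, and zero padding so every slice has the same frequency points. Real software usually applies shaped windows, so the details here are assumptions.

```python
import cmath
import math

def dft_magnitude_db(samples, n_fft):
    """Magnitude in dB per bin, with the samples zero-padded to n_fft."""
    padded = list(samples) + [0.0] * (n_fft - len(samples))
    magnitudes = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n_fft)
                  for i, x in enumerate(padded))
        magnitudes.append(20.0 * math.log10(max(abs(acc), 1e-12)))
    return magnitudes

def cumulative_spectral_decay(impulse, step, n_slices):
    """One frequency response per slice, each starting `step` samples later."""
    n_fft = len(impulse)
    return [dft_magnitude_db(impulse[s * step:], n_fft) for s in range(n_slices)]

# A made-up decaying resonance with a period of 8 samples (DFT bin 8 of 64).
resonance = [math.exp(-i / 8.0) * math.cos(2.0 * math.pi * i / 8.0)
             for i in range(64)]
slices = cumulative_spectral_decay(resonance, step=8, n_slices=3)
```

Plotting the slices one behind another gives the waterfall; at the resonant frequency each later slice is lower than the one before it, and how slowly it drops shows how long the energy is stored.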
Linear and Nonlinear Distortion –
Linear distortion is any deviation from flat frequency response, and it is arguably the most audible even
though we don’t usually think of it as distortion. Nonlinear distortion occurs when the speaker produces
output that is not in the original signal. These outputs usually occur at specific multiples of the
frequency of the signal; these multiples are called “harmonics”, so this form of distortion is usually
referred to as Harmonic Distortion. The audibility of harmonic distortion is hotly debated. Tests indicate
that distortion with simple signals like Sine waves is easily detected at just a few percent. However, tests
also show that with complex music material it is very difficult to detect even fairly high levels of
distortion. To further complicate things nonlinear distortion is tied heavily to the output level of the
speaker, which makes comparing distortion data from one speaker to another nearly impossible unless
there were strict controls to make the tests identical. Along these lines, it is generally believed that a
speaker’s maximum acceptable SPL is determined by the amount of distortion the listener can tolerate.
Harmonic distortion is measured by applying a frequency sweep to a speaker. At each frequency the
analyzer will measure the SPL of not just the fundamental frequency, but also its second, third, fourth,
and fifth harmonic. At the end of the sweep the graph will display the fundamental frequency response
along with lines corresponding to these harmonics. The most objectionable form of harmonic distortion
is 3rd order distortion which adds to the sound in an unnatural, “grating” and usually unpleasing way.
Second order distortion, on the other hand, adds to the sound in a way that tends to add “warmth”, and in
some cases people do not find this type of distortion bothersome at all. The chief source of third order
distortion is in the design of the speaker’s motor and the changing voice coil inductance. Modern driver
design will often include copper sleeves and shorting rings in the voice coil gap to cancel inductive
distortion. As a result, in these speakers, 3rd order distortion can be significantly reduced. If harmonic
distortion is measured it is best to keep Total Harmonic Distortion below 1% and for third order
distortion to remain below second order distortion whenever possible.
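If your measurement system reports the harmonics as SPL curves, Total Harmonic Distortion at a given frequency can be worked out from the levels. A minimal sketch, assuming the fundamental and harmonic levels are all in dB SPL at the same test frequency and that the harmonics are listed in order starting with the second; the function names are hypothetical.

```python
import math

def thd_percent(fundamental_db, harmonic_dbs):
    """THD as a percentage of the fundamental, from levels in dB."""
    power_ratio_sum = sum(10.0 ** ((h - fundamental_db) / 10.0)
                          for h in harmonic_dbs)
    return 100.0 * math.sqrt(power_ratio_sum)

def within_guideline(fundamental_db, harmonic_dbs):
    """True if THD is below 1% and the 3rd harmonic is below the 2nd."""
    second, third = harmonic_dbs[0], harmonic_dbs[1]
    return thd_percent(fundamental_db, harmonic_dbs) < 1.0 and third < second

# A single harmonic 40 dB below the fundamental corresponds to 1% distortion.
one_percent = thd_percent(90.0, [50.0])
```

A handy reference point is that -40 dB relative to the fundamental equals 1%, and every further 20 dB down divides the percentage by ten.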
Jeff Bagby
12/4/2013