Masking Final
Masking refers to how sensitivity for one sound is affected by the presence of another sound,
and also to psychoacoustic phenomena that are for one reason or another typically
associated with masking.
Most of the time our world presents us with a multitude of sounds simultaneously. We
automatically accomplish the task of distinguishing each of the sounds and attending to the
ones of greatest importance. It is often difficult to hear one sound when a much louder sound
is present. This process seems intuitive, but on the psychoacoustic and cognitive levels it
becomes very complex. The term for this process is masking, and it is probably the most
researched phenomenon in audition (Zwislocki 1978). Masking has been defined in two ways:
1) The process by which the threshold for detecting one sound (signal) is raised in the
presence of another sound (masker).
2) The amount by which the threshold of audibility of a sound is raised by the presence of
another (masking) sound (American Standards Association, 1960).
For example, a loud car stereo could mask the car's engine noise. The term was originally
borrowed from studies of vision, meaning the failure to recognize the presence of one
stimulus in the presence of another at a level normally adequate to elicit the first perception
(Schubert 1978).
We may use the word “masking” to denote either the threshold shift, per se, or the amount (in
dB) by which the threshold of a sound is raised due to the presence of another sound. Thus,
sound A in our example has been masked by sound B, and the amount of masking due to the
presence of B is equal to 26−10 dB, or 16 dB. In this case, 10 dB is the unmasked threshold
of sound A, 26 dB is its masked threshold, and 16 dB is the amount of masking. These
notions are illustrated in Fig. 10.1. We will adopt the convention of calling sound B the
masker, and sound A the signal. (The signal is often referred to as the test signal or probe
signal, and occasionally as the maskee.)
As will become obvious, masking not only tells us about how one sound affects another, but
also provides insight into the frequency-resolving power of the ear. This is the case because
the masking pattern to a large extent reflects the excitation pattern along the basilar
membrane.
The basic masking experiment is really quite straightforward.
First, the unmasked threshold of the test stimulus is determined and recorded. This unmasked
threshold becomes the baseline. Next, the masker is presented to the subject at a fixed level.
The test stimulus is then presented to the subject and its level is adjusted (by whatever
psychoacoustic method is being used) until its threshold is determined in the presence of the
masker. This level is the masked threshold. As just described, the amount of masking is
simply the difference in decibels between this masked threshold and the previously
determined unmasked (baseline) threshold. This procedure may then be repeated for all
parameters of the test stimulus and masker. An alternative procedure is to present the test
stimulus at a fixed level and then to vary the masker level until the stimulus is just audible (or
just masked).
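The three steps of the basic experiment can be sketched in code. This is a toy illustration only: the perfectly consistent listener, the ascending 2-dB steps, and all function names are assumptions made for the example and do not correspond to any particular psychoacoustic method. The 10-dB and 26-dB thresholds are the values used in the example earlier in this section.

```python
def ascending_threshold(is_audible, start_db=0.0, step_db=2.0, max_db=120.0):
    """Raise the signal level in fixed steps until the listener first reports hearing it."""
    level = start_db
    while level <= max_db:
        if is_audible(level):
            return level
        level += step_db
    raise ValueError("signal never became audible")


def make_listener(true_threshold_db):
    """Hypothetical, perfectly consistent listener: hears the signal at or above threshold."""
    return lambda level_db: level_db >= true_threshold_db


# Step 1: baseline (unmasked) threshold of the test signal.
unmasked = ascending_threshold(make_listener(10.0))
# Step 2: threshold of the same signal with the masker presented at a fixed level.
masked = ascending_threshold(make_listener(26.0))
# Step 3: the amount of masking is the difference in decibels.
amount_of_masking = masked - unmasked
print(unmasked, masked, amount_of_masking)  # 10.0 26.0 16.0
```

A real listener is not deterministic, so practical methods average over many trials; the sketch only shows where the two thresholds and their difference come from.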
Similar effects take place in most pieces of music. One instrument may be masked by another
if one of them produces high levels while the other remains faint. If the loud instrument
pauses, the faint one becomes audible again. These are typical examples of simultaneous
masking. To measure the effect of masking quantitatively, the masked threshold is usually
determined. The masked threshold is the sound pressure level of a test sound (usually a
sinusoidal test tone), necessary to be just audible in the presence of a masker. Masked
threshold, in all but a very few special cases, always lies above threshold in quiet; it is
identical with threshold in quiet when the frequencies of the masker and the test sound are
very different.
Masking effects can be measured not only when masker and test sound are presented
simultaneously, but also when they are not simultaneous. In the latter case, the test sound has
to be a short burst or sound impulse which can be presented before the masker stimulus is
switched on. The masking effect produced under these conditions is called pre-stimulus
masking, shortened to “premasking” (the expression “backward masking” is also used). This
effect is not very strong, but if the test sound is presented after the masker is switched off,
then quite pronounced effects occur. Because the test sound is presented after the termination
of the masker, the effect is called post-stimulus masking, shortened to “postmasking” (the
expression “forward masking” is also used).
Nature of masking
The masking produced by a particular sound is largely dependent upon its intensity and
spectrum. Let us begin with pure tones, which have the narrowest spectra. As early as 1894,
Mayer had reported that, while low-frequency tones effectively mask higher frequencies,
higher frequencies are not good maskers of lower frequencies. Masking, then, is not
necessarily a symmetrical phenomenon. This spread of masking to frequencies higher than
that of the masker has been repeatedly demonstrated for tonal maskers (Wegel and Lane,
1924; Ehmer, 1959a; Small, 1959; Finck, 1961). We must therefore focus our attention not
only upon the amount of masking, but also upon the frequencies at which masking occurs.
Figure 10.2 shows a series of masking patterns (sometimes called masking audiograms)
obtained by Ehmer (1959a). Each panel shows the amount of masking produced by a given
pure tone masker presented at different intensities. In other words, each curve shows as a
function of signal frequency how much the signal threshold was raised by a given masker
presented at a given intensity. Masker frequency is indicated in each frame and masker level
is shown near each curve. Several observations may be made from these masking patterns.
First, the strongest masking occurs in the immediate vicinity of the masker frequency; the
amount of masking tapers with distance from this “center” frequency. Second, masking
increases as the intensity of the masker is raised.
The third observation deals with how the masking pattern depends upon the intensity and
frequency of the masker. Concentrate for the moment upon the masking pattern produced by
the 1000-Hz masker. Note that the masking is quite symmetric around the masker frequency
for relatively low masker levels (20 and 40 dB). However, the masking patterns become
asymmetrically wider with increasing masker intensity, with the greatest masking occurring
for tones higher than the masker frequency, but with very little masking at lower frequencies.
Thus, as masker intensity is raised, there is considerable spread of the masking effect upward
in frequency but only a minimal effect downward in frequency. This phenomenon is aptly
called upward spread of masking. Note too that there are peaks in some of the masking
patterns corresponding roughly to the harmonics of the masker frequency. Actually, however,
these peaks are probably not due to aural harmonics because they do not correspond precisely
to multiples of the masker (Ehmer, 1959a; Small, 1959). Small (1959) found that these peaks
occurred when the masker frequency was about 0.85 times the test tone frequency.
Finally, notice that the masking patterns are very wide for low-frequency maskers and are
considerably more restricted for high-frequency maskers. In other words, high-frequency
maskers are effective only over a relatively narrow frequency range in the vicinity of the
masker frequency, but low frequencies tend to be effective maskers over a very wide range of
frequencies.
These masking patterns reflect the activity along the basilar membrane, as illustrated in Fig.
10.3. The traveling wave envelope has gradually increasing amplitude along its basal (high-
frequency) slope, reaches a peak, and then decays rapidly with a steep apical (low-frequency)
slope. It is thus expected that higher (more basal) frequencies would be most affected by the
displacement pattern caused by lower-frequency stimuli. In addition, the high-frequency
traveling wave peaks and “decays away” fairly close to the basal turn, so that its masking
effect would be more restricted. Lower frequencies, on the other hand, produce basilar
membrane displacements along most of the partition. In addition, the excitation pattern
becomes wider as the signal level increases.
Although a great deal of information about masking has been derived from studies using
tonal maskers, difficulties become readily apparent when both the masker and test stimulus
are tones. Two major problems are due to the effects of beats and combination tones. Beats
are audible fluctuations that occur when a subject is presented with two tones differing in
frequency by only a few cycles per second (e.g., 1000 and 1003 Hz) at the same time.
Consequently, when the masker and test tones are very close in frequency, one cannot be sure
whether the subject has responded to the beats or to the test tone. These audible beats can
result in notches at the peaks of the masking patterns when the masker and signal are close in
frequency (Wegel and Lane, 1924). The situation is further complicated because combination
tones are also produced when two tones are presented together. Combination tones are
produced at frequencies equal to numerical combinations of the two original tones (f1 and f2),
such as f2 − f1 or 2f1 − f2.
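The arithmetic behind beats and combination tones can be made concrete with a short sketch. The function names are hypothetical, and the 1000/1200-Hz pair is an assumed illustration; only the 1000/1003-Hz pair comes from the text. The sketch computes where these components would fall in frequency, not whether a listener would actually hear them.

```python
def beat_rate_hz(f1, f2):
    """Beats between two simultaneous tones occur at their frequency difference."""
    return abs(f2 - f1)


def combination_tones_hz(f1, f2):
    """The two combination tones named in the text, for primaries f1 < f2."""
    low, high = sorted((f1, f2))
    return {"f2 - f1": high - low, "2*f1 - f2": 2 * low - high}


print(beat_rate_hz(1000, 1003))          # 3
print(combination_tones_hz(1000, 1200))  # {'f2 - f1': 200, '2*f1 - f2': 800}
```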
Beats may be partially (though not totally) eliminated by replacing the tonal maskers with
narrow bands of noise centered around given frequencies; however, the elimination of
combination tones requires more sophisticated manipulations (Patterson and Moore, 1986).
The results of narrow-band noise masking experiments have essentially confirmed the
masking patterns generated in the tonal masking studies (Egan and Hake, 1950; Ehmer,
1959b; Greenwood, 1961).
We have seen that upward spread of masking is the rule as masker level is increased.
However, a very interesting phenomenon appears when the stimulus level is quite high, for
example, at spectrum levels of about 60 to 80 dB. Spectrum level refers to the power in a
one-cycle-wide band. In other words, spectrum level is level per cycle. It may be computed
by subtracting 10 times the log of the bandwidth from the overall power in the band. Thus:
spectrum level (dB) = overall level (dB) − 10 log (bandwidth in Hz).
If the bandwidth is 10,000 Hz and the overall power is 95 dB, then the spectrum level will be
95 − 10 log (10,000), or 95 − 40 = 55 dB.
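The spectrum-level calculation is easy to check numerically. The following is a minimal sketch; the function name is an assumption, and the logarithm is taken to base 10, as is usual for decibel calculations.

```python
import math


def spectrum_level_db(overall_level_db, bandwidth_hz):
    """Level per cycle: overall level minus 10 times the log of the bandwidth."""
    return overall_level_db - 10.0 * math.log10(bandwidth_hz)


# The worked example from the text: 95 dB overall in a 10,000-Hz band.
print(round(spectrum_level_db(95.0, 10_000.0), 1))  # 55.0
```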
Higher-frequency maskers presented at intense levels can also produce masking at low
frequencies (Bilger and Hirsh, 1956; Deatherage et al., 1957a, 1957b). This is called remote
masking because the threshold shifts occur at frequencies below and remote from the masker.
In general, the amount of remote masking increases when the bandwidth of the masking noise
is widened or its spectrum level is raised (Bilger, 1958). Although the acoustic reflex can
cause a threshold shift at low frequencies, it is unlikely that this is the cause of remote
masking because remote masking has been shown to occur in the absence of the acoustic
reflex (Bilger, 1966). Instead, remote masking is most likely due primarily to envelope
detection of distortion products generated within the cochlea at high masker intensities
(Spieth, 1957; Deatherage et al., 1957a, 1957b).
It is apparent from Fig. 10.2 that masking increases as the level of the masker is raised. We
may now ask how the amount of masking relates to the intensity of the masker. In other
words, how much of a threshold shift results when the masker level is raised by a given
amount? This question was addressed in the classical studies of Fletcher (1937) and Hawkins
and Stevens (1950). Since the essential findings of the two studies agreed, let us concentrate
upon the data reported by Hawkins and Stevens in 1950. They measured the threshold shifts
for pure tones and for speech produced by various levels of a white noise masker. (It should
be pointed out that although white noise connotes equal energy at all frequencies, the actual
spectrum reaching the subject is shaped by the frequency response of the earphone or
loudspeaker used to present the signal. Therefore, the exact masking patterns produced by a
white noise depend upon the transducer employed, as well as on bandwidth effects that will
be discussed in the next section.)
Figure 10.4 shows Hawkins and Stevens’ data as masked threshold contours. These curves
show the masked thresholds produced at each frequency by a white noise presented at various
spectrum levels. The curves have been idealized in that the actual results were modified to
reflect the masking produced by a true white noise. The actual data were a bit more irregular,
with peaks in the curves at around 7000 Hz, reflecting the effects of the earphone used. The
bottom contour is simply the unmasked threshold curve. The essential finding is that these
curves are parallel and spaced at approximately 10-dB intervals, which is also the interval
between the masker levels. This result suggests that a 10-dB increase in masker level
produces a 10-dB increase in masked threshold, a point which will become clearer soon.
The actual amount of masking may be obtained by subtracting the unmasked threshold (in
quiet) from the masked threshold. For example, the amount of masking produced at 1000 Hz
by a white noise with a spectrum level of 40 dB is found by subtracting the 1000-Hz
threshold in quiet (about 7 dB SPL) from that in the presence of the 40-dB noise spectrum
level (roughly 58 dB). Thus, the amount of masking is 58−7=51 dB in this example.
Furthermore, because the masked thresholds are curved rather than flat, the white noise is not
equally effective at all frequencies. We might therefore express the masking noise in terms of
its effective level at each frequency. We may now show the amount of masking as a function
of the effective level of the masking noise (Fig. 10.5).
As Fig. 10.5 shows, once the masker attains an effective level, the amount of masking is a
linear function of masker level. That is, a 10-dB increase in masker level results in a
corresponding 10-dB increase in the masked threshold of the test signal. Hawkins and
Stevens demonstrated that this linear relationship between masking and masker level is
independent of frequency (as shown in the figure), and that it applies to speech stimuli as
well as to pure tones.
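The linear growth of masking can be sketched using the 1000-Hz example above as a reference point. The constants are the approximate values quoted in the text. Extrapolating from that single point to other masker levels is an assumption based on the reported 10-dB spacing of the contours, and it applies only once the masker is in the linear region.

```python
QUIET_THRESHOLD_DB = 7.0        # 1000-Hz threshold in quiet (approximate, from the text)
REF_SPECTRUM_LEVEL_DB = 40.0    # reference noise spectrum level
REF_MASKED_THRESHOLD_DB = 58.0  # masked threshold at that spectrum level (approximate)


def masked_threshold_db(spectrum_level_db):
    """Linear region only: a 1-dB increase in masker level raises threshold by 1 dB."""
    return REF_MASKED_THRESHOLD_DB + (spectrum_level_db - REF_SPECTRUM_LEVEL_DB)


def amount_of_masking_db(spectrum_level_db):
    """Masked threshold minus the threshold in quiet."""
    return masked_threshold_db(spectrum_level_db) - QUIET_THRESHOLD_DB


for level in (40.0, 50.0, 60.0):
    print(level, masked_threshold_db(level), amount_of_masking_db(level))
```

At the 40-dB spectrum level this reproduces the 51 dB of masking worked out above; each further 10-dB step in the masker adds 10 dB.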
Masking Patterns
In many masking experiments, the signal frequency has been varied while the masker
frequency has been held constant. Typically, the masker level has been held constant and the signal
level required for threshold has been measured as a function of signal frequency.
The function relating masked threshold to signal frequency is known as a masking pattern.
It is sometimes called a masked audiogram.
Masking patterns are plots of the amount of masking or the threshold of a signal as a
function of signal frequency in the presence of a masker with fixed frequency and level.
The masking patterns of narrow-band sounds, either sinusoids or bands of noise, have
been measured in many experiments.
Early work on masking patterns was particularly concerned with using the patterns as a
tool for estimating the spread of excitation of the masker within the cochlea (Wegel and
Lane, 1924; Fletcher and Munson, 1937; Egan and Hake, 1950; Zwicker, 1956).
It was thought that the signal threshold might be directly related to the amount of
excitation evoked by the masker at the place whose characteristic frequency corresponds
to the signal frequency. However, it soon became apparent that the masking patterns
showed a variety of complex features that could not be readily explained in terms of
spread of excitation. Also, there were systematic differences between the shapes of
masking patterns obtained using sinusoidal and narrow-band noise maskers, even though
these are generally assumed to produce similar long-term-average excitation patterns.
Wegel and Lane (1924) reported the first systematic investigation of the masking of one tone by
another. They determined the threshold of a signal of adjustable frequency in the presence of a
masker of fixed frequency and intensity.
The results obtained in this experiment were complicated by the occurrence of beats when the
signal and masker were close together in frequency.
To avoid this problem, later experimenters (Egan and Hake, 1950) have used a narrow
band of noise as either the signal or the masker. Such a noise has built-in amplitude and
frequency variations and does not produce regular beats when added to a tone.
Different kinds of noises are common in psychoacoustics. White noise represents a broad-
band noise most easily defined in physical terms. The spectral density of white noise is
independent of frequency; it produces no pitch and no rhythm. The frequency range of white
noise in auditory research is limited to the 20 Hz to 20 kHz band. Besides white noise, there
exist noises such as pink noise, in which high frequencies are attenuated. Another important
broad-band noise discussed in this section is called uniform masking noise. Strong frequency
dependence in the spectral density of noise leads to narrow-band noise and to low-pass or
high-pass noise. If masking effects on the slopes of such noises are sought, then care has to
be taken to produce slopes of the attenuation of the noise as a function of frequency, that are
at least as steep as the frequency selectivity of our hearing system.
White noise is defined as having a frequency-independent spectral density. Figure 4.1 shows
threshold level as a function of the frequency of the test tone, in the presence of a white noise
with several different density levels.
Threshold in quiet is indicated by the broken line. Although white noise has a
frequency-independent spectral density, the masked thresholds, indicated by solid lines, are horizontal
only at low frequencies. Above about 500 Hz, the masked thresholds rise with increasing
frequency. The slope of this increase corresponds to about 10 dB per decade, illustrated by
the dotted line. At low frequencies, the masked thresholds lie about 17 dB above the given
density level. Thus numbers representing the values of spectral density, l_WN, indicate that
even negative values of the density level produce masking. Increasing the density level by 10
dB shifts the masked threshold upwards by the same 10 dB. This interesting result indicates
the linear behaviour of masking produced by broad-band noises. At very low and very high
frequencies, masked thresholds are the same as the threshold in quiet. It is interesting to note
that the strong individual differences in the dependence of threshold in quiet on frequency
almost completely disappear when thresholds masked by broad-band noises are measured, an
effect that is based on the ear’s frequency selectivity representing masker and test tone within
the same band.
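The behaviour described for white noise can be summarized in a small idealized model. It uses only the figures quoted above: a masked threshold about 17 dB above the density level up to about 500 Hz, rising by about 10 dB per decade beyond that. The sharp corner at 500 Hz and the neglect of threshold in quiet (which takes over at very low and very high frequencies) are simplifying assumptions, and the function name is hypothetical.

```python
import math


def white_noise_masked_threshold_db(freq_hz, density_level_db):
    """Idealized masked threshold of a test tone in white noise.

    Assumes 17 dB above the density level up to 500 Hz, then +10 dB per decade.
    Threshold in quiet is ignored, so very low and very high frequencies are not modelled.
    """
    rise = 10.0 * math.log10(freq_hz / 500.0) if freq_hz > 500.0 else 0.0
    return density_level_db + 17.0 + rise


print(white_noise_masked_threshold_db(250.0, 0.0))              # 17.0
print(round(white_noise_masked_threshold_db(5000.0, 0.0), 1))   # 27.0
print(round(white_noise_masked_threshold_db(5000.0, 10.0), 1))  # 37.0
```

The last two lines show the linear behaviour noted in the text: raising the density level by 10 dB shifts the masked threshold by the same 10 dB.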
For some measurements, a masked threshold independent of frequency over the entire
audible frequency range is required. Such a masking curve can be produced by a special noise
with a density level that depends on frequency. Such a noise represents a mirror image of the
frequency dependence of the masked threshold for white noise. The attenuation of a network,
which has to be put in series with a white-noise generator to produce such a uniform masking
noise, is shown in the upper panel of Fig. 4.2. The resulting noise is called uniform masking
noise, because it produces – as shown in the lower panel of Fig. 4.2 – a masked threshold that
is independent of frequency. In this case, the parameter is given as the density level of white
noise from which the attenuation of the network is subtracted. Because this attenuation is
zero at frequencies below about 500 Hz, the masked thresholds indicated in Fig. 4.2 are the
same as those shown in Fig. 4.1 for this low frequency range.
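The mirror-image idea behind uniform masking noise can be sketched as follows. The attenuation curve used here (0 dB up to 500 Hz, then 10 dB per decade) is an assumption inferred from the white-noise description, not the actual network of Fig. 4.2, and the 17-dB offset is the low-frequency value quoted earlier. All names are hypothetical.

```python
import math


def white_noise_rise_db(freq_hz):
    """Assumed rise of the white-noise masked threshold: flat to 500 Hz, then 10 dB/decade."""
    return 10.0 * math.log10(freq_hz / 500.0) if freq_hz > 500.0 else 0.0


def network_attenuation_db(freq_hz):
    """Attenuation of the shaping network: the mirror image of the rise above."""
    return white_noise_rise_db(freq_hz)


def uniform_noise_masked_threshold_db(freq_hz, white_density_level_db):
    """Masked threshold after the white noise has passed through the network."""
    shaped_density = white_density_level_db - network_attenuation_db(freq_hz)
    return shaped_density + 17.0 + white_noise_rise_db(freq_hz)


for f in (100.0, 500.0, 2000.0, 8000.0):
    print(f, round(uniform_noise_masked_threshold_db(f, 30.0), 1))
```

Because the attenuation cancels the rise, the modelled threshold is the same at every frequency, which is the defining property of uniform masking noise.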
Pure Tones Masked by Narrow-Band Noise
In this context, narrow-band noise means a noise with a bandwidth equal to or smaller than
the critical bandwidth (about 100 Hz below and 0.2f above 500 Hz, as outlined in Chap. 6). It
is more meaningful when narrow-band noise is used to give data in terms of the total level of
the noise instead of its density level. Using the equations given in Sect. 1.1, it is easy to
transform the density level into the total level once the bandwidth is known. Figure 4.3 shows the
thresholds of pure tones masked by critical-band wide noise at centre frequencies of 0.25, 1,
and 4 kHz. The level of each masking noise is 60 dB and the corresponding bandwidths of
the noises are 100, 160, and 700 Hz, respectively. The slopes of the noises above and below
the centre frequency of each filter are very steep (more than 200 dB/octave), in order to
exceed the frequency selectivity of our hearing system. The frequency dependence of the
threshold masked by the 1-kHz narrow-band noise is very similar on the axes of Fig. 4.3 to
that produced by the 4-kHz narrow-band noise. The frequency dependence of the threshold
masked by the 250-Hz narrow-band noise, however, seems to be broader. A second effect is
also noticeable: the maximum of the masked threshold shows the tendency to be lower for
higher centre frequencies of the masker, although the level of the narrow-band masker is 60
dB at all centre frequencies. The difference between the maximum of the masked thresholds
and the horizontal dashed line in Fig. 4.3, indicating the 60-dB test-tone level, amounts to 2
dB for 250-Hz, 3 dB for 1-kHz, and 5 dB for 4-kHz centre frequency. Ascending from low
frequencies, masked thresholds show a very steep increase, and after reaching the maximum,
a somewhat flatter decrease. The increase amounts to about 100 dB per octave. This steep
rise indicates the need for very steep filters, otherwise the frequency response of the filter and
not that of our hearing system is measured.
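Two relations used in this section can be sketched numerically: the rule of thumb for the critical bandwidth and the conversion from density level to total level. The conversion is written here as the inverse of the usual level-per-cycle relation, which is an assumption about the equations of Sect. 1.1, and the 40-dB density level is an illustrative value. The function names are hypothetical.

```python
import math


def critical_bandwidth_hz(center_hz):
    """Rule of thumb from the text: about 100 Hz below 500 Hz, about 0.2*f above."""
    return 100.0 if center_hz < 500.0 else 0.2 * center_hz


def total_level_db(density_level_db, bandwidth_hz):
    """Total level of a flat band of noise from its density level and bandwidth."""
    return density_level_db + 10.0 * math.log10(bandwidth_hz)


for fc in (250.0, 1000.0, 4000.0):
    print(fc, critical_bandwidth_hz(fc))

# A 100-Hz-wide band with an assumed density level of 40 dB has a total level of 60 dB.
print(round(total_level_db(40.0, 100.0), 1))  # 60.0
```

The rule of thumb gives 200 Hz and 800 Hz at 1 and 4 kHz, somewhat wider than the 160-Hz and 700-Hz bandwidths quoted for Fig. 4.3, so it should be read only as an approximation.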
Figure 4.4 shows the dependence of masked threshold on the level of a noise centred at 1
kHz. All masked thresholds show a very steep rise from low to higher frequencies before the
maximum masking is reached. The slope of this rise seems to be independent of the level of
the noise masker, and the maximum always is reached 3 dB below the level of the masking
noise. Beyond the maximum, the masked thresholds decay towards lower levels quite
quickly for low and medium masker levels. At higher masker levels, however, the slope
towards high frequencies becomes increasingly shallow. Therefore, the frequency
dependence of the masked threshold is level-dependent or non-linear. The nonlinear rise of
the upper slope of the masked threshold with masker level is an interesting effect which plays
an important part both in masking and in other auditory phenomena. The dips indicated in
Fig. 4.4 for masker levels of 80 and 100 dB stem from nonlinear effects in our hearing
system, which lead to audible difference noises created by interaction between the test tone
and the narrow-band noise. With increasing test-tone level, the subject reaches threshold by
listening for anything additional; in this case, it is the difference noise and not the test tone
that is heard. The latter only becomes audible when the test-tone level is increased to the
values indicated by the dotted lines.
Pure Tones Masked by Low-Pass or High-Pass Noise
The masking of pure tones by white noise limited by a steep low-pass filter (solid lines) or
a very steep high-pass filter (dotted lines) with cut-off frequencies of 1.1 and 0.9 kHz,
respectively, is shown in Fig. 4.5. The parameter, as for white noise, is the density level. The
masked thresholds decrease at the cut-off frequency not with the steepness of the attenuation
of the noise, but in the form shown in Fig. 4.4 for the masking of narrow-band noise. Below
the cut-off frequency of the low-pass noise, the masked thresholds are the same as found
using white noise as masker. The same holds true for frequencies of the test tone above the
cut-off frequency of the high-pass noise. There, the masked threshold increases with the test-
tone frequency by about 10 dB per decade. This means that masking on the slopes produced
by band-limited noises can be approximated by the sound pressure levels of the masker
falling within the critical band at the cut-off. The slopes found with narrow-band maskers
show up again in the masked thresholds produced by low-pass (solid) and high-pass noises
(dotted). This result indicates that the masked thresholds produced by narrow-band noises
and shown in Figs. 4.3 and 4.4, play an important part in describing masking effects of noise
maskers with different spectral shapes.
Pure Tones Masked by Pure Tones
Although the stimuli needed to study the masking of pure tones by pure tones are simple,
such masking experiments have many difficulties especially at medium and higher levels of
the masker. Figure 4.6 shows the threshold of a test tone as a function of its frequency when
masked by a 1-kHz masker at a level of 80 dB. As in all the measurements described in the
preceding sections, the subject responds as soon as the presence of the test tone produces
some sensation in addition to the sensation of the steady-state masker (detection of anything).
An effect that appears to be quite dominant in this case is that beats are audible when the
frequency of the test tone is in the neighbourhood of the 1-kHz masker. For example, a test
tone presented at a frequency of 990 Hz and a level of 60 dB produces a beating quality at 10
Hz. The subject listening to such a beating tone hears something different from the
steady-state masker and therefore responds, although the criterion is very different from hearing an
additional tone. Considering the whole frequency range from 500 Hz to 10 kHz, it is clear that
beating becomes audible in two regions around 2 and 3 kHz in addition to the region around
1 kHz.
In addition to the problem of beats, another difficulty arises for inexperienced subjects. At
test-tone frequencies near 1.4 kHz, the subject indicates audibility of an additional tone at the
relatively low test-tone level of 40 dB. A careful examination of these results and discussions
with experienced subjects show that inexperienced subjects do not hear the test tone at that
frequency and level, but a difference tone near 600 Hz. This difference tone is produced
through nonlinear distortions that originate in our own hearing system. The threshold of this
difference tone is not the threshold of the test tone we are seeking. The test tone with its
appropriate pitch is only detected at levels above about 50 dB. Only experienced subjects can
differentiate between the threshold of the difference tone and the threshold of the test tone.
To explain this complicated situation, different regions in the plane outlined in Fig. 4.6 are
marked with indications of which sounds are heard by the subject. Below threshold in quiet
of the test tone (broken line), nothing but the masker is audible. Below about 700 Hz,
increasing the level of the test tone above threshold in quiet produces a region in which the
masker tone and the test tone are audible. At frequencies between about 700 Hz and 9 kHz,
the 80-dB 1-kHz masker produces a region in which only the masker tone is audible, even
though threshold of the test tone in quiet is much lower. Areas of audible beats are marked by
hatching. The region in which only the masker tone and the difference tone (but not the test
tone) are audible is marked by stippling. Above masked threshold of the test tone at
frequencies between 1 and 2 kHz, difference tones are also audible. All these results indicate
that thresholds of tones masked by tonal maskers are far more difficult to measure than
thresholds of tones masked by noise.
Nonetheless, with well-trained subjects and some special equipment to reduce the audibility
of the difference tones, thresholds of the test tones masked by tonal maskers can be measured
or at least estimated. The region of beats cannot be avoided but one data point, where the
frequency of the test tone is identical to that of the masker, can be measured. For the point
shown, the test tone was 90° out of phase with the masker. Figure 4.7 shows average results
from many subjects using this method. Individual differences are larger for such
measurements relative to those obtained with noise maskers. In contrast with the results
shown in Fig. 4.4, the data in Fig. 4.7 indicate a clear tendency for the slope towards lower
frequencies to become less steep with decreasing masker level. On the other hand, slopes
towards higher frequencies become shallower with increasing level of the masker. The
pronounced maximum in the neighbourhood of the masker tone occurs at similar frequencies
to those found for masking by narrow-band noise. However, the peak of masked threshold is
reduced with tonal maskers.
The different behaviour of the high and low frequency slopes at low levels produces an effect
that is somewhat unexpected. At low levels, a greater spread of masking towards the lower
frequencies than towards the higher frequencies occurs. At high levels, this behaviour is
reversed, so that a greater spread of masking is found towards higher frequencies than
towards lower frequencies. While the effect at higher levels is well known from masking with
narrow-band maskers, the effect at low levels is rather unexpected. Between these two level
ranges, i.e. near a masker level of about 40 dB, the masking patterns are approximately
symmetrical. This effect is found at all frequencies for which it is sensible to distinguish
between different low- and high-frequency slopes. Figure 4.8 illustrates the findings in more
detail. The sensation level of the test tone, i.e. the level above threshold in quiet, is used as
the ordinate and is indicated by solid lines. The dotted lines show exactly the same data with
an inverse frequency scale (upper abscissa) mirrored at 1 kHz. This superposition illustrates
the inversion of the masking characteristic with increasing level. At a 20-dB masker level,
more spread of masking towards lower frequencies occurs; at 40 dB, masking is nearly
symmetrical; and at 60 dB, more spread of masking towards higher frequencies shows up.
The spread of masking towards higher frequencies shows a strong dependence on masker
level as already indicated in Fig. 4.4. This effect can be illustrated more clearly if the abscissa
and the parameter of Figs. 4.7 and 4.8 are exchanged. In this way, Fig. 4.9 is created with the
level of the test tone again as ordinate, but with the frequency of the test tone as the
parameter and the level of the masker as the abscissa. In such a display, an identical
increment of masker level and test-tone level would produce a 45° line, which is only
approximated by the data for the 1-kHz test tone 90° out of phase with the masker (broken
line). However, the increment in this case is a little less over the whole range of masker level.
The failure to produce a 45° line exactly is called the near-miss of Weber’s law, which
describes the audibility of an increment in level for tones. The higher the test-tone frequency,
the more the slopes of the rising curves deviate from the 45° slope. The solid lines in Fig. 4.9
represent test-tone frequencies above the masker frequency. The lines remain flat at low
masker levels at the threshold in quiet but rise more and more steeply with increasing test-
tone frequency. Instead of a slope of 1, the curve for the test-tone frequency of 6 kHz shows a
slope as high as 3. Hence the increase in threshold level of the test tone is three times larger
than the increase in masker level. The data given in Fig. 4.9 represent average values.
Individual data for single subjects sometimes yield slopes as steep as 6. This means that an
increment in masker level of 1 dB can produce an increment in the masked threshold of the
test tone of up to 6 dB.
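The meaning of these slopes is simple arithmetic, sketched below. The function name is hypothetical, and treating the slope as constant over the increment is a simplification, since the curves in Fig. 4.9 are flat at low masker levels and only become steep at higher ones.

```python
def masked_threshold_increment_db(masker_increment_db, slope):
    """Change in masked threshold for a given change in masker level on a straight segment."""
    return slope * masker_increment_db


print(masked_threshold_increment_db(10.0, 1.0))  # 10.0  (the 45-degree line)
print(masked_threshold_increment_db(10.0, 3.0))  # 30.0  (average 6-kHz test tone)
print(masked_threshold_increment_db(1.0, 6.0))   # 6.0   (steepest individual data)
```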
The results displayed in Figs. 4.7 to 4.9 also hold for other masker frequencies if appropriate
scales are chosen. Here, the effects shown with narrow-band noise maskers appear again:
except at frequencies of the masker below 500 Hz where the masked thresholds as a function
of the test-tone frequency appear to be broader, the shape of the curves may be predicted by
shifting the whole curves in Figs. 4.4, 4.7 and 4.8 horizontally until the maximum appears at
the masker frequency.
Pure Tones Masked by Complex Tones
Pure tones appear relatively rarely in nature. Only some bird songs and the sounds produced
by a flute can be considered to be pure tones. Most of the instrumental sounds in music are
composed of a fundamental tone and many harmonics. The difference in timbre produced by
different musical instruments depends on the frequency spectra of their harmonics. Whereas a
flute produces primarily one single component, the fundamental, a trumpet produces many
harmonic partials and therefore elicits a much broader masking effect than a flute. Figure
4.10 shows thresholds of pure tones, masked by a complex tone composed of a 200-Hz
fundamental frequency and nine higher harmonics, all with the same amplitude but random in
phase. The masked thresholds are given for sound pressure levels of 40 and 60 dB of each
partial. On the logarithmic frequency scale, the distance between the partials is relatively
large at low frequencies, but becomes very small between the ninth and tenth harmonic.
Accordingly, the dips between the harmonics become smaller and smaller with increasing
frequency of the test tone. In the frequency range between 1.5 and 2 kHz, the maxima and the
minima can hardly be distinguished. At frequencies above the last harmonic, in our case 2
kHz, the masked thresholds are flatter towards higher frequencies at higher levels of the
masking complex. At frequencies one to two octaves above the highest spectral component,
masked thresholds approach threshold in quiet. In music, many complex tones, each
composed of many harmonics, are used at the same time. This means that the corresponding
masking effect can be assumed to produce shapes similar to those outlined in Fig. 4.10.
However, the minima between the lines become even smaller because the density of the lines
is higher.
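The shrinking distance between partials on a logarithmic frequency scale can be checked numerically. The sketch below lists the ten partials of the 200-Hz complex described above and the spacing of neighbouring partials in octaves; the helper names are illustrative only.

```python
import math

FUNDAMENTAL_HZ = 200.0
N_PARTIALS = 10  # the fundamental plus nine higher harmonics


def partial_frequencies(f0_hz, n_partials):
    """Frequencies of the first n_partials harmonics of f0_hz."""
    return [f0_hz * k for k in range(1, n_partials + 1)]


def spacing_in_octaves(freqs_hz):
    """Distance between neighbouring partials on a log2 (octave) scale."""
    return [math.log2(hi / lo) for lo, hi in zip(freqs_hz, freqs_hz[1:])]


partials = partial_frequencies(FUNDAMENTAL_HZ, N_PARTIALS)
spacings = spacing_in_octaves(partials)

for lo, hi, octaves in zip(partials, partials[1:], spacings):
    print(f"{lo:6.0f} Hz -> {hi:6.0f} Hz: {octaves:.3f} octave")
```

The first pair of partials is a full octave apart, while the ninth and tenth are only about 0.15 octave apart, which is why the dips between harmonics fill in towards 2 kHz.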
It should be noted here that non-random phase conditions of the components can lead to
temporal envelopes of the sound that can be described as impulsive. Consequently, temporal
effects in masking may become a crucial factor in determining masked thresholds.
The masking patterns produced by narrow-band noise maskers or by pure-tone maskers show
differences despite the same level and the same (centre) frequency: prominent differences
occur with respect to the level dependence of the slope towards lower frequencies. It is
possible to approximate noise by a relatively small number of equal-sized pure tones, the
frequencies of which are spread randomly within the bandwidth of the “noise”. Thus, it may
be reasonable to measure masking with an increasing number of tones and to compare the
effects with the masking effects produced by narrow-band noise. Figure 4.11 gives an
example for a centre frequency of 2 kHz (left) and an overall level of 70 dB. The critical
bandwidth at that frequency is about 330 Hz. The approximation of the critical-band wide
noise starts with just one tone at 2 kHz, continues with two tones at 1910 and 2100 Hz, or
1840 and 2170 Hz, and ends with five tones at frequencies of 1840, 1915, 2000, 2080, and
2170 Hz (right). The corresponding masking produced at the low frequency side is illustrated
in Fig. 4.11. Again, the test-tone level is shown as a function of frequency (upper scale) or of
critical-band rate (lower scale). The masking effect produced by the tone or the combination
of tones is shown by open symbols connected with solid lines.
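To keep the overall level at 70 dB while the number of equal-level tones changes, the level of each tone has to drop as tones are added. The sketch below assumes that the tones add in power (incoherently), which is the usual assumption for components with unrelated phases; the text does not state how the levels were set, so the per-tone values are only an illustration.

```python
import math


def per_tone_level_db(overall_level_db, n_tones):
    """Level of each of n equal-level tones whose powers sum to overall_level_db."""
    return overall_level_db - 10.0 * math.log10(n_tones)


def overall_level_db(tone_levels_db):
    """Overall level of several tones, assuming their powers simply add."""
    total_power = sum(10.0 ** (level / 10.0) for level in tone_levels_db)
    return 10.0 * math.log10(total_power)


for n in (1, 2, 5):  # the one-, two-, and five-tone approximations in the text
    level = per_tone_level_db(70.0, n)
    print(f"{n} tone(s): {level:.2f} dB each")
```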
The data displayed in Fig. 4.11 show very clearly that a single masker tone is an
inappropriate approximation of a narrow-band noise masker. Two tones produce masking
effects relatively close to those produced by narrow-band noise, provided the distance
between the two tones is chosen in such a way that the tone frequencies correspond closely to
the lower and higher cut-off frequency of the narrow-band noise. However, there still remain
differences of up to 7 dB between the two masking curves. An approximation of the narrow-
band noise by five tones produces an almost identical masking curve. The remaining
differences are within the accuracy of measurement.
Because five tones produce the same masking as a narrow-band noise, but one single tone
produces a much steeper masking slope, it may be possible to find the reason for this
difference using the five-tone complex. Using a special procedure, the level of the difference
tones produced by the five-tone complex can be estimated. The difference tones of odd order
play the most important role. The level of all these difference tones, as estimated through
subjective measurements, is displayed in Fig. 4.12. The difference tones of third, fifth,
seventh, and ninth order are indicated by different symbols. The frequencies of the five-tone
complex are indicated by vertical lines. The threshold of a pure tone masked by the five-tone
complex is given as a solid curve.
A comparison between frequency and level of the difference tones on the one hand, and the
masked threshold on the other, suggests that the masked threshold in the frequency range of
1300 to 1700 Hz is due to the difference tones which also produce masking. Therefore, it can
be assumed that the frequency selectivity of the ear remains the same, irrespective of whether
a narrow-band noise or a tone is used as the masker. However, the internally produced
nonlinear components, either difference tones or difference noises (in the case of narrow-
band noise as the masker) change the physical stimulus into an internal stimulus which is
broader and therefore produces more masking at the low-frequency side of the masker. At the
high-frequency side, this effect does not appear to play a role because masking is already
spread much more towards higher frequencies. Therefore, it can be assumed that the
frequency selectivity measured with a pure tone as the masker, although level dependent, is
the largest that is possible. Masking produced by a narrow-band noise is somewhat less
selective at the low-frequency side. Due to the appearance of distortion products (in this case
continuous spectra) it is almost level independent.
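The frequencies at which odd-order difference tones can appear follow from the component frequencies. The sketch below uses the standard expression for the lower-side difference tone of order 2n+1 produced by two primaries f1 < f2, namely (n+1)·f1 − n·f2 (so the third-order tone is 2·f1 − f2). This expression is not given in the text, and the sketch says nothing about the levels of these tones, which in the text were estimated from subjective measurements.

```python
from itertools import combinations

FIVE_TONE_COMPLEX_HZ = [1840, 1915, 2000, 2080, 2170]  # from the text


def odd_order_difference_tones(f1_hz, f2_hz, max_order=9):
    """Lower-side odd-order difference tones of two primaries with f1 < f2.

    Returns {order: frequency}, where order = 2n + 1 and
    frequency = (n + 1) * f1 - n * f2. Non-positive frequencies are dropped.
    """
    if not f1_hz < f2_hz:
        raise ValueError("expected f1_hz < f2_hz")
    tones = {}
    for n in range(1, (max_order - 1) // 2 + 1):
        freq = (n + 1) * f1_hz - n * f2_hz
        if freq <= 0:
            break
        tones[2 * n + 1] = freq
    return tones


for f1, f2 in combinations(FIVE_TONE_COMPLEX_HZ, 2):
    print(f1, f2, odd_order_difference_tones(f1, f2))
```

Every such tone lies below the lower primary of its pair, so the pairs that include the 1840-Hz component place energy below the physical spectrum of the complex, on the low-frequency side of the masker.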
Factors influencing masking patterns
Similar Frequencies:
How effective the masker is at raising the threshold of the signal depends on the
frequency of the signal and the frequency of the masker.
The graphs in Figure B below are a series of masking patterns. Each graph shows the amount
of masking produced by a masker at the frequency shown in its top corner: 250, 500, 1000, or
2000 Hz. For example, in the first graph the masker is presented at a frequency of 250 Hz at the
same time as the signal. The amount by which the masker increases the threshold of the signal is
plotted, and this is repeated for different signal frequencies, shown on the X axis. The frequency
of the masker is kept constant. The masking effect is shown in each graph at various masker sound
levels.
Figure B
The figure shows the amount of masking along the Y axis. The greatest masking occurs when the
masker and the signal are at the same frequency, and it decreases as the signal frequency moves
further away from the masker frequency (Gelfand 2004). This phenomenon is called on-frequency
masking and occurs because the masker and signal are within the same auditory filter.
This means that the listener cannot distinguish between them, and they are perceived as one
sound with the quieter sound masked by the louder one.
The amount by which the masker raises the threshold of the signal is much less in off-frequency
masking, but it does have some masking effect because some of the masker overlaps into the
auditory filter of the signal (Moore 1998).
Off-frequency masking requires the level of the masker to be greater in order to have a
masking effect; this is shown in Figure F.
This is because only a certain amount of the masker overlaps into the auditory filter of the
signal, and more masker is needed to cover the signal (Moore 1998).
Figure F
Lower Frequencies:
The masking pattern changes depending on the frequency of the masker and its intensity
(Figure B).
For low levels on the 1000 Hz graph, such as the 20-40 dB range, the curves are relatively
parallel. As the masker intensity increases the curves separate, especially for signals at a
frequency higher than the masker (Gelfand 2004). This shows that there is a spread of the
masking effect upward in frequency as the intensity of the masker is increased. The curve
is much shallower on the high-frequency side than on the low-frequency side. This flattening is
called upward spread of masking and is why an interfering sound masks high frequency
signals much better than low frequency signals (Gelfand 2004).
Figure B also shows that as the masker frequency increases, the masking patterns become
increasingly compressed. This demonstrates that high frequency maskers are only effective
over a narrow range of frequencies, close to the masker frequency. Low frequency maskers
on the other hand are effective over a wide frequency range (Gelfand 2004).
Fletcher (1940) carried out an experiment to discover how much of a band of noise contributes
to the masking of a tone.
In the experiment, a fixed tone signal had various bandwidths of noise centred on it. The
masked threshold was recorded for each bandwidth.
His research showed that there is a critical bandwidth of noise which causes the
maximum masking effect and energy outside that band does not affect the masking.
This can be explained by the auditory system having an auditory filter which is centered
over the frequency of the tone. The bandwidth of the masker that is within this auditory
filter effectively masks the tone but the masker outside of the filter has no effect.
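The band-widening result can be illustrated with a very simplified power-spectrum model. The sketch below assumes a rectangular auditory filter, a noise with constant spectrum level, and detection at a fixed signal-to-noise ratio at the filter output; these are modelling assumptions made here, not claims from the text. The 160-Hz critical bandwidth is the value quoted elsewhere in this document for 1000 Hz, and the 40-dB spectrum level is a hypothetical number.

```python
import math


def masked_threshold_db(noise_bandwidth_hz, critical_bandwidth_hz,
                        spectrum_level_db, ratio_at_threshold_db=0.0):
    """Masked threshold of a tone centred in a band of noise.

    Only the noise power that falls inside the (rectangular) auditory
    filter counts, so widening the noise beyond the critical bandwidth
    leaves the threshold unchanged.
    """
    effective_bandwidth_hz = min(noise_bandwidth_hz, critical_bandwidth_hz)
    noise_in_filter_db = spectrum_level_db + 10.0 * math.log10(effective_bandwidth_hz)
    return noise_in_filter_db + ratio_at_threshold_db


CRITICAL_BANDWIDTH_HZ = 160.0  # approximate value at 1000 Hz quoted in this document
SPECTRUM_LEVEL_DB = 40.0       # hypothetical noise spectrum level

for bandwidth in (20, 40, 80, 160, 320, 640):
    threshold = masked_threshold_db(bandwidth, CRITICAL_BANDWIDTH_HZ, SPECTRUM_LEVEL_DB)
    print(f"noise bandwidth {bandwidth:4d} Hz -> masked threshold {threshold:.1f} dB")
```

In this sketch the threshold rises by about 3 dB per doubling of bandwidth up to the critical bandwidth and is flat beyond it.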
Application:
This principle is used in MP3 files to reduce the size of audio files. Parts of the signals
which are outside the critical bandwidth are cut out leaving only the parts of the signals
which are perceived by the listener (Sellars 2000).
Effects of intensity
The lower end of the filter becomes flatter with increasing decibel level, whereas the
higher end becomes slightly steeper (Moore 1998).
Changes in slope of the high frequency side of the filter with intensity are less consistent
than they are at low frequencies.
At the medium frequencies (1-4 kHz) the slope increases as intensity increases, but at the
low frequencies there is no clear trend with level, and the filters at high centre
frequencies show a small decrease in slope with increasing level (Moore 1998).
The sharpness of the filter depends on the input level and not the output level to the filter.
The lower side of the auditory filter also broadens with increasing level (Moore 1998).
These observations are illustrated in figure above.
Experiments have been carried out to see the different masking effects when using a masker
which is either in the form of a narrow band noise or a sinusoidal tone.
The masking pattern of a tone has been used, for example by Wegel and Lane (1924), to infer
the pattern of activity set up by the tone in the cochlea. They used one of six sinusoids in the
frequency range from 200 Hz to 3,500 Hz as the masker. With the masker at some fixed
intensity level, they measured the change in threshold of another sinusoidal signal at various
frequencies. A summary of the results is shown in the figure below.
The solid line indicates for each frequency how many decibels the signal must be raised
above its absolute threshold (SL) to be just detectable in the presence of a 1,200 Hz, 80 dB
SL masker. Below the solid line only the masker can be heard. Above the solid line both
masker & signal are audible. Most masking occurs when the signal frequency is close to the
masker frequency. As the signal and masker frequency diverge, the amount of masking
diminishes.
Disadvantages
a. Beats and
b. Combination tones.
When a sinusoidal signal and a sinusoidal masker (tone) are presented simultaneously, the
envelope of the combined stimulus fluctuates in a regular pattern described as beats. Beats are
audible fluctuations that occur when a subject is presented with two tones that differ only
slightly in frequency, e.g., 1 kHz and 1.03 kHz (30 Hz apart) at the same time. When the masker
and the test tone are very close in frequency, it is difficult to say whether the subject has
responded to the beats or to the test tone. These audible beats can result in notches at the peaks
of the masking patterns when the masker and the maskee are close in frequency (Wegel & Lane
1924). Beats can be a cue to the presence of a signal even when the signal itself is not audible.
The influence of beats can be reduced by using a narrowband noise rather than a sinusoidal tone
for either signal or masker (Moore 1986).
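The beat rate for the example pair of tones follows directly from their frequency difference. The sketch below assumes two tones of equal amplitude, for which sin(2πf1t) + sin(2πf2t) has the envelope 2·|cos(π(f2 − f1)t)|; the function names are illustrative.

```python
import math


def beat_rate_hz(f1_hz, f2_hz):
    """Number of envelope fluctuations (beats) per second for two tones."""
    return abs(f2_hz - f1_hz)


def beat_envelope(t_s, f1_hz, f2_hz):
    """Envelope of the sum of two unit-amplitude tones at time t_s (seconds)."""
    return 2.0 * abs(math.cos(math.pi * (f2_hz - f1_hz) * t_s))


rate = beat_rate_hz(1000.0, 1030.0)
print(f"beat rate: {rate:.0f} Hz")

# Sample the envelope over one beat period: it starts at its maximum,
# falls to zero half-way through, and returns to the maximum.
period_s = 1.0 / rate
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    value = beat_envelope(fraction * period_s, 1000.0, 1030.0)
    print(f"t = {fraction:.2f} period: envelope = {value:.3f}")
```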
The situation is further complicated because combination tones are also produced when
two tones are presented together.
A typical set of results for the masking pattern of a narrowband noise (NBN) centered at 410 Hz
is shown in the figure below.
The masking patterns obtained in these experiments show steep slopes on the low-frequency
side of between 55 and 190 dB/octave for NBN (80 and 240 dB/octave for pure-tone masking). The
slopes on the HF side are less steep and depend to some extent on the level of the masker.
On the high-frequency sides of the patterns, the slopes become shallower at high levels. Thus,
if the level of a LF masker is increased by, say, 10 dB, the masked threshold of a HF signal is
elevated by more than 10 dB; the amount of masking grows nonlinearly (in an expansive way)
on the HF side. This has been called the upward spread of masking.
The masking patterns do not reflect the use of a single auditory filter.
Rather, for each signal frequency the listener uses a filter centered close to the signal
frequency.
Thus the auditory filter is shifted as the signal frequency is changed.
Simultaneous masking
Comodulation masking release
Recall that only a certain critical bandwidth of noise around a signal tone is involved in the
masking of that tone: The masked threshold of the signal will not be changed by widening the
noise bandwidth beyond the CB or adding one or more other bands outside of the CB.
However, a different situation occurs when the masking noise is amplitude modulated, as
illustrated by the following example.
It will be convenient for the noise band centered on the test tone to be called the on-signal
band, and for any other bands of noise to be called flanking or off-frequency bands. We will
begin by masking a pure tone signal by an on-signal band of noise that is being amplitude
modulated, as illustrated by the waveform in the panel labelled “on-signal band alone” in the
upper portion of Fig. 10.13. The graph in the lower part of the figure shows that the masked
threshold of the signal is 50 dB in the presence of this amplitude modulated on-signal noise. We
will now add another band of noise that is outside of the CB of the test tone. The off-signal
band will be amplitude modulated in exactly the same way as the on-signal band, as
illustrated in the panel labelled “comodulated bands” in Fig. 10.13. These two noise bands
are said to be comodulated bands because the envelopes of their modulated waveforms
follow the same pattern over time even though they contain different frequencies. We do not
expect any change in masking with the comodulated bands because the added off-frequency
band is outside of the signal’s critical band. However, we find that the masked threshold of
the signal actually becomes better (lower) for the comodulated bands compared to what it
was for just the on-signal band alone. This improvement is called comodulation masking
release (CMR) (Hall, Haggard, and Fernandes, 1984).
In the hypothetical example of Fig. 10.13, the masked threshold of the tone improved from 50
dB in the presence of just the on-signal band (left bar) to 39 dB for the comodulated bands
(middle bar), amounting to a CMR of 11 dB. Notice that the masked threshold does not
improve (right bar) if the on-signal and off-signal noise bands are not comodulated (panel
labelled “uncomodulated bands”). Comodulation masking release also occurs for complex
signals (made up of, e.g., 804, 1200, 1747, and 2503 Hz) even if there is some spectral
overlap between the signal and the masker (Grose, Hall, Buss, and Hatch, 2005).
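The size of the release in the hypothetical example is just the difference between two masked thresholds. A minimal sketch of that calculation, with an illustrative function name:

```python
def comodulation_masking_release_db(threshold_on_signal_alone_db,
                                    threshold_comodulated_db):
    """CMR: how much the masked threshold improves (drops) when comodulated
    flanking bands are added to the on-signal band."""
    return threshold_on_signal_alone_db - threshold_comodulated_db


# Values from the hypothetical example in the text: 50 dB with the on-signal
# band alone and 39 dB with comodulated bands.
cmr = comodulation_masking_release_db(50.0, 39.0)
print(f"CMR = {cmr:.0f} dB")
```

A result of zero corresponds to the uncomodulated case in the example, where adding the flanking band does not improve the threshold.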
Comodulation masking release reveals that the auditory system is able to capitalize upon
information provided across critical band filters, although a cohesive model explaining CMR
is not yet apparent. One type of explanation suggests that the information provided by the off-
signal band(s) helps the subject know when the troughs or “dips” occur in the modulating
noise. Listening for the signal during these dips would result in a lower threshold (less
masking) compared to times when the noise level is higher. Another type of model suggests
that the auditory system compares the modulation patterns derived from the outputs of
auditory filters in different frequency regions. This pattern would be similar for the filters that
do not contain a signal but would be modified for the filter that contains a signal.
Detecting a disparity between the outputs of the filters would thus indicate the presence of a
signal. The interested student should see the informative review by Moore (1990) and the
many contemporary discussions of CMR parameters, models, and related effects (e.g., Buus,
1985; Hatch et al., 1995; Hall and Grose, 1990; Moore et al., 1990; Hicks and Bacon, 1995;
Bacon et al., 1997; Grose et al., 2005).
Overshoot
The masked threshold of a brief signal can be affected by the temporal arrangement of the
signal and a masking noise. The typical experiment involves a very brief signal and a longer
duration masker, with various timing relationships between them. For example, the signal
onset might be presented within a few milliseconds of the masker onset (as in Fig. 10.14a), in
the middle of the masker (Fig. 10.14b), or the signal onset might trail the masker onset by
various delays between these extremes (as in Fig. 10.14c to d). Compared to the amount of
masking that takes place when the signal is in the middle of the masker, as much as 10 to 15
dB more masking takes place when the signal onset occurs at or within a few milliseconds of
the masker onset. In other words, a brief signal is subjected to a much larger threshold shift at
the leading edge of a masker compared to when it is placed in the temporal middle of the
masker. This phenomenon was originally described by Elliott (1965) and Zwicker (1965a,
1965b) and is known as overshoot.
The amount of masking overshoot decreases as the signal delay gets longer, usually
becoming nil by the time the delay reaches about 200 ms (e.g., Elliott, 1969; Zwicker, 1965a;
Fastl, 1976). Masking overshoot is maximized for signals with high frequencies (above 2000
Hz) and short durations (under 30 ms) (e.g., Elliott, 1967, 1969; Zwicker, 1965a; Fastl, 1976;
Bacon and Takahashi, 1992; Carlyon and White, 1992), and when the masker has a very wide
bandwidth, much broader than the critical band (e.g., Zwicker, 1965b; Bacon and Smith, 1991).
Figure 10.14 Timing relationships between a masker and brief signal. The signal onset is
within a few milliseconds of the masker onset in the first frame and occurs at increasing delays
in the subsequent frames. The signal is presented in the temporal middle of the masker in the
last frame.
In addition, overshoot becomes greater as the masker increases from low to moderate levels,
but it declines again as the masker continues to increase toward high levels (Bacon, 1990).
The different amounts of overshoot produced by narrow versus broad band maskers have been
addressed by Scharf, Reeves, and Giovanetti (2008), who proposed that overshoot is caused
(or at least affected) by the listener’s ability to focus on the test frequency at the onset of the
noise. This is disrupted by the wide range of frequencies in a broadband masker, but is
focused by the narrow band masker because its spectrum is close to the signal frequency.
Consistent with their explanation, they found that narrow band maskers caused little if any
overshoot when the test tone always had the same frequency (stimulus certainty), as in the
typical overshoot experiment. In contrast, narrow maskers produced more overshoot when
the test frequency was changed randomly between trials (stimulus uncertainty).
The opposite effect occurred with wide band maskers, in which case stimulus uncertainty
produced less overshoot. Although the precise origin of overshoot is not definitively known,
the most common explanation is based on adaptation in auditory neurons (e.g., Green, 1969;
Champlin and McFadden, 1989; Bacon, 1990; McFadden and Champlin, 1990; Bacon and
Healy, 2000). That is, the initial response of an auditory neuron involves a high discharge rate,
which declines over a period of roughly 10 to 20 ms. The neural response produced by
the masker would thus be greatest at its leading edge and would be weaker thereafter. As a
result, more masking would be produced at the beginning of the masker than in the middle of
it. Other hypotheses suggest that the basis for masking overshoot may be related to processes
associated with the basilar membrane input–output function (von Klitzing and Kohlrausch,
1994; Strickland, 2001; Bacon and Savel, 2004), or a disruption in the listener’s ability to
attend to the signal frequency (Scharf et al., 2008).
Masking overshoot can also occur when the signal is very close to the offset of the masker,
although it is considerably smaller than the onset effect (e.g., Elliott, 1969; Bacon and
Viemeister, 1985; Bacon and Moore, 1986; Bacon et al., 1989; Formby et al., 2000). A
peripheral origin for offset overshoot is unlikely because the increased spike rate seen at the
onset of auditory neuron firing patterns does not also occur at offset. Hence, overshoot at
masker offset has been attributed to central processes (e.g., Bacon and Viemeister, 1985;
Bacon and Moore, 1986).
Non-simultaneous masking
Masking can also occur when the test signal and masker do not overlap in time; this is referred
to as temporal or nonsimultaneous masking.
This phenomenon may be understood with reference to the diagrams in Fig. 10.15, which
show the basic arrangements used in masking experiments. In Fig. 10.15a, the signal is
presented and terminated, and then the masker is presented after a brief time delay following
signal offset. Masking occurs in spite of the fact that the signal and masker are not presented
together. This arrangement is called backward masking or pre-masking because the masker is
preceded by the signal, that is, the masking effect occurs backward in time (as shown by the
arrow in the figure). Forward masking or post-masking is just the opposite (Fig. 10.15b).
Here, the masker is presented first, and then the signal is turned on after an interval following
masker offset. As the arrow shows, the masking of the signal now occurs forward in time.
The amount of masking of the test signal produced under backward, forward, or combined
forward/backward masking conditions is determined while various parameters of the probe
and masker are manipulated. These parameters may be the time interval between signal and
masker, masker level, masker duration, etc.
Figure 10.16 shows some examples of temporal masking data from Elliott’s (1962a) classic
paper. The ordinate is the amount of masking produced by 50-ms noise bursts presented at 70
dB SPL for a test signal of 1000 Hz lasting 10 ms. The abscissa is the time interval between
masker and test signal for the backward and forward masking paradigms. Finally, the solid
lines show the amount of masking produced when the masker and signal are presented to the
same ear (monotically), and the dotted lines reveal the masking that results when the noise
goes to one ear and the signal goes to the other (dichotic masking). Notice that considerably
more masking occurs monotically than dichotically.
The amount of temporal masking is related to the time gap between the signal and the
masker. More masking occurs when the signal and masker are closer in time, and less
masking occurs as the time gap between them widens. However, the backward and forward
masking functions are not mirror images of each other. As the figure shows, Elliott found
greater threshold shifts for backward masking than for forward masking. Similar findings
were reported by some investigators (Lynn and Small, 1977; Pastore et al., 1980), but others
found the opposite to occur (Wilson and Carhart, 1971).
Backward masking decreases dramatically as the delay between the signal and masker
increases from about 15 to 20 ms, and then continues to decrease very slightly as the interval
is lengthened further. Forward masking also decreases with increasing delays, but more
gradually. It declines linearly with the logarithm of the masker-signal delay (Fig. 10.17) and
exists for intervals as long as about 200 ms depending on the study (e.g., Wilson and Carhart,
1971; Smiarowski and Carhart, 1975; Fastl, 1976, 1977, 1979; Widin and Viemeister, 1979;
Weber and Moore, 1981; Jesteadt et al., 1982).
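The statement that forward masking declines linearly with the logarithm of the masker-signal delay can be sketched as a straight line on a log-delay axis. The function below is only an illustration of that shape: it assumes masking reaches zero at 200 ms (the approximate limit mentioned above), and the 40 dB of masking at a 1-ms delay is a hypothetical starting value, not a measured one.

```python
import math


def forward_masking_db(delay_ms, masking_at_1ms_db, max_delay_ms=200.0):
    """Amount of forward masking as a straight line in log10(delay).

    The line passes through masking_at_1ms_db at a 1-ms delay and reaches
    0 dB at max_delay_ms; beyond that delay no masking is assumed.
    """
    if delay_ms <= 0:
        raise ValueError("delay_ms must be positive")
    if delay_ms >= max_delay_ms:
        return 0.0
    slope_db_per_decade = masking_at_1ms_db / math.log10(max_delay_ms)
    return max(0.0, masking_at_1ms_db - slope_db_per_decade * math.log10(delay_ms))


for delay in (1, 2, 5, 10, 20, 50, 100, 200):
    masking = forward_masking_db(delay, masking_at_1ms_db=40.0)
    print(f"delay {delay:3d} ms -> {masking:5.1f} dB of masking")
```

Each tenfold increase in delay removes the same number of decibels of masking, which is what a linear decline on a logarithmic time axis means.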
The amount of temporal masking increases as the level of the masker is increased, but not in
the linear manner seen for simultaneous masking (Elliott, 1962a; Babkoff and Sutton, 1968;
Jesteadt et al., 1981). Rather, with temporal masking, increasing the masker level by 10 dB
may result in an additional threshold shift on the order of only about 3 dB. The duration of
the masker influences the amount of forward masking, but this does not appear to occur for
backward masking (Elliott, 1967). The amount of forward masking increases as the masker
duration gets longer up to about 200 ms (Zwicker, 1984; Kidd and Feth, 1982).
The combined effects of forward and backward masking may be found by placing the signal
between the two maskers, as shown in Fig. 10.15c. More masking occurs when backward and
forward masking are combined than would result if the individual contributions of backward
and forward masking were simply added together (Pollack, 1964; Elliott, 1969; Wilson and
Carhart, 1971; Robertson and Pollack, 1973; Penner, 1980; Pastore et al., 1980; Cokely and
Humes, 1993; Oxenham and Moore, 1994, 1995). Such findings suggest that forward and
backward masking depend upon different underlying mechanisms. The underlying
mechanisms of temporal masking are not fully resolved. Duifhuis (1973) suggested that the
steep segments of the monotic temporal masking curves (Fig. 10.16) may be associated with
cochlear processes, while the shallower segments at longer delays may be related to neural
events. Several findings implicate some degree of central processing in temporal masking.
For example, we have already seen that some degree of temporal masking occurs under
dichotic conditions for forward and backward masking. In addition, masking level differences
have been shown to occur for forward, backward, and combined forward-backward masking
(e.g., Deatherage and Evans, 1969; Robertson and Pollack, 1973; Berg and Yost, 1976; Yost
and Walton, 1977).
Several lines of evidence in addition to the material described above suggest that central
processes are the principal factors in backward masking. Providing the listener with
contralateral timing cues supplies information that affects the uncertainty of the task and has
been shown to influence backward masking but not forward masking (Pastore and Freda,
1980; Puleo and Pastore, 1980). Moreover, performance decrements in backward masking
(but not forward or simultaneous masking) have been found to be associated with disorders
such as specific language impairment and dyslexia (Wright et al., 1997; Wright, 1998; Wright
and Saberi, 1999; Rosen and Manganari, 2001).
Frequency resolution
The ability of the ear to resolve the individual components of a complex sound has a significant
impact on a listener’s ability to perceive the important subtleties of spectral shape in nearly all
sound discriminations and identifications. When frequency resolution is very good, all or
most frequency components can be detected, and the internal spectral representation of the
signal will be faithful to the spectral characteristics of the sound itself: sharp and full of detail.
When resolution is poor, as is often the case in sensorineural hearing impairment, the
internal spectrum of the signal may lack clarity and detail, making discrimination of similar
signals difficult.
The encoding of spectral information is commonly modeled as though the periphery were
composed of a bank of overlapping band-pass filters. Each hypothetical filter corresponds to
the highly frequency-specific movement of the basilar membrane. The narrower these filters,
the finer will be the internal spectral representation of sounds. Frequency resolving abilities,
or estimates of the widths and shapes of the theoretical auditory filters, are most often studied
behaviourally in humans through masking experiments: one sound, usually a pure tone, is
presented in the presence of another sound, usually a noise masker, and the degree of
interference from the masker on detection of the signal is measured.
In a series of masking studies first published many years ago (Fletcher, 1940), it was noted that
the detectability of a tonal signal decreases as the bandwidth of a noise masker, centered at the
signal frequency, increases. This decrease in detectability continues until a critical masker
bandwidth is reached, at which point, although masker loudness continues to grow, the
masker has no additional effect on the detection of the tone. The bandwidth at which masking
effects asymptote is termed the critical band. Estimates of the critical band show that
it increases with increasing signal frequency and is ~160 Hz at 1000 Hz (Scharf, 1970).
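The growth of the critical band with frequency can be sketched with a published analytic approximation. The formula below is the one usually attributed to Zwicker and Terhardt (1980); it is not taken from this text, and it is only one of several approximations, so it will not reproduce every bandwidth quoted in this document exactly.

```python
def critical_bandwidth_hz(centre_frequency_hz):
    """Approximate critical bandwidth (Hz) at a given centre frequency.

    Analytic approximation attributed to Zwicker and Terhardt (1980):
    CB = 25 + 75 * (1 + 1.4 * f_kHz**2) ** 0.69
    """
    f_khz = centre_frequency_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


for frequency in (250, 500, 1000, 2000, 4000):
    bandwidth = critical_bandwidth_hz(frequency)
    print(f"{frequency:5d} Hz -> critical bandwidth of about {bandwidth:.0f} Hz")
```

At 1000 Hz this approximation gives a value close to the ~160 Hz cited above, and the bandwidth grows steadily at higher centre frequencies.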
Recent studies and knowledge of basilar membrane mechanics suggest that the auditory filter
is neither rectangular nor symmetrical as critical band data may lead one to believe. The
shape of the auditory filter at low signal levels is more like a rounded exponential with
very steep slopes above the signal and more shallow slopes below it. At higher intensities, it
becomes much more symmetrical and more broadly tuned (Patterson, 1976).
There are several methods by which the frequency resolving ability of the auditory system
and the shape of the auditory filter can be measured behaviourally. Two procedures, the
measurement of psychoacoustic tuning curves (PTCs) and notched-noise masking, will be
discussed here. Though not all-inclusive, they provide an example of ways in which
frequency resolution can be measured with relative ease.
However, one must avoid the temptation to think of the PTC as the psychoacoustic analogy
of an individual neural tuning curve. It is clear that much more than a single neuron is being
sampled, and that PTCs are wider than neural tuning curves. Moreover, the earlier
discussions dealing with the implications of beats, combination tones, and off-frequency
listening in masking are particularly applicable to PTCs. For example, PTCs become wider
when off-frequency listening is minimized by the use of notched noise (Moore et al., 1984).
A notched noise is simply a band-reject noise in which the band being rejected is centered
where we are making our measurement. Therefore, the notched noise masks the frequency
regions above and below the one of interest, so that off-frequency listening is reduced.
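The band-reject structure of a notched noise can be described by the frequency regions it covers. The sketch below simply tests whether a given frequency falls inside one of the two noise bands that flank the notch; the edge frequencies in the example are hypothetical and chosen only for illustration.

```python
def in_notched_noise(freq_hz, centre_hz, notch_width_hz,
                     low_edge_hz, high_edge_hz):
    """True if freq_hz lies in the noise (outside the notch, inside the overall band).

    The notch is centred on centre_hz, the frequency being measured, so the
    noise covers only the regions below and above it.
    """
    notch_low_hz = centre_hz - notch_width_hz / 2.0
    notch_high_hz = centre_hz + notch_width_hz / 2.0
    inside_overall_band = low_edge_hz <= freq_hz <= high_edge_hz
    inside_notch = notch_low_hz < freq_hz < notch_high_hz
    return inside_overall_band and not inside_notch


# Hypothetical example: measurement at 1000 Hz, a 400-Hz-wide notch,
# and noise extending from 200 Hz to 1800 Hz.
for frequency in (100, 600, 900, 1000, 1100, 1500, 2000):
    masked = in_notched_noise(frequency, 1000.0, 400.0, 200.0, 1800.0)
    print(f"{frequency:5d} Hz: {'noise present' if masked else 'no noise'}")
```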
Figure 10.12 shows two sets of individual PTCs at four test-tone frequencies (500–4000 Hz)
from a more recent experiment. These PTCs were obtained by using simultaneous masking
(triangles) versus forward masking (squares). As the figure shows, PTCs generated under
forward masking conditions generally show sharper tuning than those described for
simultaneous masking conditions (e.g., Houtgast, 1972; Duifhuis, 1976; Wightman et al.,
1977; Weber, 1983; Lutfi, 1984; Moore et al., 1984). These differences go beyond the current
scope, but the interested reader will find several informative discussions on this topic (e.g.,
Weber, 1983; Jesteadt and Norton, 1985; Patterson and Moore, 1986; Lutfi, 1988).
Application of psychoacoustic tuning curves
Classical masking describes an effect that is more precisely defined as simultaneous spectral
tone-on-tone masking. This effect has already been discussed in detail. Figure 4.13 shows two
possibilities for measuring it using the method of tracking: the classical masking
curve and the psychoacoustical tuning curve. The latter shows a set of psychoacoustical
tuning curves, i.e. the level L_M of the masking tone necessary to just mask a test tone of level
L_T (the parameter), as a function of the frequency of the masker. This masker frequency can be
expressed as ∆z, the separation between masker and test-tone frequencies measured in
critical-band rate.
The method of tracking is relatively easy even for an untrained subject. However, for clinical
use it is more convenient to measure step-wise, not continuously. To achieve this, an
electronic device was developed with which seven points of the tuning curve could be
measured, three points below and three points above the test-tone frequency. The seventh
point is obtained from the level of the test tone, normally set 5 to 10 dB above threshold in
quiet. In most clinical cases, data from both the low-frequency and high-frequency regions
are needed. Therefore, the apparatus provides test frequencies of 500 Hz and 4 kHz, and six
masker frequencies in the neighbourhood of each of the two test frequencies. The frequency
spacings of the six masker frequencies in relation to the test frequency are chosen so that the
simplified tuning curve, consisting of the seven measured points, becomes a useful
approximation of the continuously measured tuning curve.
The procedure begins with the determination of threshold in quiet for the test-tone frequency.
In order to make the test tone more clearly audible, it is switched on and off smoothly every
600 ms. After determination of threshold in quiet, the test-tone level is set to a fixed value
about 10 dB above threshold in quiet. For determining tuning curves, the subject again listens
to the interrupted test tone, although a continuous masker tone of different frequencies is now
simultaneously presented. Masker frequencies of 215, 390, 460, 540, 615, and 740 Hz are
used for a test-tone frequency of 500 Hz. These frequencies are spaced unevenly so that the
tuning curve is captured as well as possible with only seven measurements. The level of the
masker is adjusted at each masker frequency so that the continuous masker tone just masks the
interrupted test tone. Therefore, the hearing-impaired listener always listens to the same
interrupted tone and signals when the tone is heard. In this way, six masker levels at which
the test tone is just masked are obtained; together with the test-tone level they give the seven
points of the simplified tuning curve. The same procedure is
used for a test-tone frequency of 4 kHz, where masker frequencies of 1.72, 3.12, 3.68, 4.32,
4.92, and 5.92 kHz are used in order to also obtain information about the “tail” of the
tuning curve towards low frequencies.
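The masker spacing described above can be checked numerically. The following is a minimal sketch, not part of the original procedure: it expresses each masker frequency as a ratio to the test-tone frequency and as a separation ∆z in critical-band rate. The function `bark` uses the Zwicker and Terhardt arctangent approximation, which is an assumption here (the text does not say how critical-band rate was computed), and the names `MASKERS` and `masker_table` are hypothetical.

```python
import math

def bark(f_hz):
    """Critical-band rate in Bark.
    Zwicker & Terhardt approximation; assumed here, not taken from the text."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Masker frequencies quoted in the text (Hz), keyed by test-tone frequency (Hz)
MASKERS = {
    500: [215, 390, 460, 540, 615, 740],
    4000: [1720, 3120, 3680, 4320, 4920, 5920],
}

def masker_table(test_hz):
    """Return (masker_hz, ratio to test frequency, delta_z in Bark) per masker."""
    z_test = bark(test_hz)
    return [(fm, fm / test_hz, bark(fm) - z_test) for fm in MASKERS[test_hz]]

for test_hz in MASKERS:
    for fm, ratio, dz in masker_table(test_hz):
        print(f"test {test_hz} Hz  masker {fm} Hz  ratio {ratio:.2f}  dz {dz:+.2f} Bark")
```

Both sets of maskers use the same frequency ratios relative to the test tone (0.43 to 1.48), with three maskers below and three above it.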
Simplified psychoacoustical tuning curves have been measured for normally hearing subjects
using the method described above. Because threshold in quiet varies somewhat for normally
hearing subjects and varies even more for hearing-impaired listeners, a meaningful averaging
procedure has to be introduced. Such a procedure leads to the masker levels that are plotted
as open circles in Fig. 16.21 for n = 33 normally hearing subjects. The data connected by
straight lines represent the simplified tuning curves. The frequency scaling drawn on the
abscissa is chosen so that equal distance on the abscissa corresponds to equal distance along
the basilar membrane, i.e., critical-band rate.
Pathological ears show tuning curves that may differ from those produced by normally
hearing subjects. To make this comparison possible, the tuning curve of normally hearing
subjects is also indicated in the data of listeners with pathological hearing. The first example
is of a group with conductive hearing loss (Fig. 16.22). It can be clearly seen, by comparing
the tuning curves of normally hearing subjects with those produced by listeners with
conductive hearing loss, that in the latter case the whole tuning curve is shifted upwards by
about 30 dB. These 30 dB correspond to the hearing loss. This means that although there
exists a hearing loss of 30 dB, the frequency resolution outlined in the psychoacoustical
tuning curve does not change for the group with conductive hearing loss.
The group with degenerative hearing loss also shows a threshold shift of about 30 dB at
500 Hz (Fig. 16.23); however, the form of the tuning curve is very different. It is very
shallow towards lower frequencies of the masker and flatter still towards higher frequencies,
although there remains an increment of about 10 dB between the masker frequency of 540 Hz
and that of 740 Hz. At the higher test frequency of 4 kHz, the hearing loss is even greater, the
form of the tuning curve is again very flat towards low frequencies, and the slope towards
high frequencies is, on these coordinates, only one quarter that of normally
hearing subjects. This means that the frequency selectivity of patients in this group is
distinctly worse. Under such conditions, an amplifying hearing aid may bring back the
sensitivity as such, but cannot restore the impaired frequency selectivity.
The last example given here is that of noise-induced hearing loss. The tuning curve in the
500-Hz frequency range outlined in Fig. 16.24 shows a normal configuration, while the form
of the tuning curves measured at 4 kHz depends on the size of the hearing loss. For this
reason two groups have been separated, one with a hearing loss less than 55 dB and another
one with a hearing loss greater than 55 dB. This separation makes it clear that frequency
resolution becomes worse with larger hearing loss. It should be pointed out, however, that
noise-induced hearing loss, which produces mostly a threshold shift only at frequencies
above about 1500 or 2000 Hz, produces conditions by which cubic or quadratic difference
tones may become audible. Therefore, an additional masking noise should be added in the
low frequency range in order to mask this possible difference tone which, if heard, would
alter the shape of the tuning curve. The form of the tuning curve of this group indicates
clearly that it is not easy to restore normal hearing conditions through a hearing aid if the
listener suffers from a noise-induced hearing loss.
Another procedure for measuring the width and shape of the auditory filter is the notched-noise
masking technique (e.g., Patterson, 1976). A listener is asked to detect a tonal signal in the
presence of a noise masker with either a flat or a notched amplitude spectrum. The masker level is
fixed at a fairly moderate intensity, for example 30 to 40 dB sensation level, and the signal level is
varied in a search for a detection threshold.
Thresholds should approach quiet detection levels when the notch is wide enough that it no longer
interferes with the detection of the tone, that is, when the excitation patterns of the masker and
the signal no longer interfere with one another. Results are analysed in terms of the slope of the
function relating threshold to notch width. If the function drops quickly as the notch is widened, it
is assumed that the frequency resolution of the listener is quite good. If instead the function is
shallow, with little change in threshold as notch width is increased, it is assumed that frequency
resolving abilities are poor (or that the hypothetical filters are wide). Efficiency, as with the
psychoacoustic tuning curve, is indicated by the overall threshold levels (intercept of the
function) and is independent of the slope of the function.
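The link between filter width and the slope of the threshold function can be illustrated with a small calculation. This is a sketch under stated assumptions, not the analysis used in the cited work: the auditory filter is taken to be a symmetric rounded exponential, W(g) = (1 + pg)exp(-pg), where g is the deviation from the signal frequency divided by the signal frequency and p sets the sharpness; threshold is assumed to be proportional to the noise power passing the filter; and the noise is treated as extending indefinitely beyond the notch. The function names and the two values of p are hypothetical.

```python
import math

def noise_through_filter(g0, p):
    """Relative noise power passed by a symmetric roex(p) filter
    W(g) = (1 + p*g) * exp(-p*g) when the noise has a notch of normalized
    half-width g0 on each side of the signal frequency.
    Closed form of 2 * integral from g0 to infinity of W(g) dg."""
    return 2.0 * (2.0 + p * g0) * math.exp(-p * g0) / p

def threshold_drop_db(g0, p):
    """Drop in masked threshold (dB) relative to the no-notch condition,
    assuming threshold is proportional to the noise power passed."""
    return 10.0 * math.log10(noise_through_filter(0.0, p) / noise_through_filter(g0, p))

notch_widths = [0.0, 0.1, 0.2, 0.3, 0.4]
for p in (25.0, 10.0):  # hypothetical sharp and broad filters
    drops = [round(threshold_drop_db(g0, p), 1) for g0 in notch_widths]
    print(f"p = {p:4.0f}: threshold drop (dB) = {drops}")
```

With the larger p (narrower filter) the threshold falls much faster as the notch widens, which corresponds to the steep function associated with good frequency resolution. The sketch does not model the floor set by the threshold in quiet.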
When notched-noise masking procedures are used with the noise bands and spectral notches
placed asymmetrically around the signal frequency, not only the width of the auditory filter but
also its shape can be mapped. When this is done, it can be seen that the shape of the auditory
filter is best approximated by a rounded exponential with steeper high-frequency sides and
shallower low-frequency tails, consistent with data obtained using other masking procedures and
with neural tuning curve data (Patterson, 1976).
Excitation patterns
It is common to regard the cochlea as a bank of overlapping band-pass filters. Because we
have a reasonable understanding of the characteristics of these filters, and how these characteristics
change with frequency and level, we can produce an estimate of how an arbitrary sound is
represented in the cochlea by plotting the output of each filter as a function of its center
frequency (equivalent to the characteristic frequency of each place on the basilar membrane). This
plot is called an excitation pattern.
One way of interpreting a masking pattern is as a crude indicator of the excitation pattern of
the masker.
In other words, the excitation pattern of a sound is a representation of the activity or
excitation evoked by that sound as a function of CF or ‘place’ in the auditory system
(Zwicker, 1970).
The signal is detected when the excitation produced is some constant proportion of the
excitation produced by the masker in the frequency region of the signal.
Thus the threshold of the signal as a function of the frequency is proportional to the masker
excitation level.
Excitation patterns provide a good model of auditory frequency selectivity and masking.
Frequency components that are resolved by the auditory system produce distinct peaks in the
excitation pattern.
Using the filter shapes and bandwidths derived from masking experiments, we can compute the
excitation pattern produced by a sound. The excitation pattern shows how much energy comes
through each filter in a bank of auditory filters. It is analogous to the pattern of
vibration on the basilar membrane.
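As a rough illustration of this calculation, the sketch below passes a single pure tone through a bank of filters and reports the output of each one against its center frequency. Several things here are assumptions rather than content of these notes: the filter shape is a symmetric rounded exponential, the bandwidth follows the Glasberg and Moore equivalent rectangular bandwidth formula (which gives a somewhat narrower band at 1000 Hz than the critical-band figure quoted earlier), and the filters do not change with level. All function names are hypothetical.

```python
import math

def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula; assumed)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def filter_gain_db(f_tone, fc):
    """Gain (dB, <= 0) applied to a tone at f_tone by a symmetric roex(p) filter at fc."""
    p = 4.0 * fc / erb_hz(fc)        # sharpness implied by the ERB
    g = abs(f_tone - fc) / fc        # normalized deviation from the center frequency
    return 10.0 * math.log10((1.0 + p * g) * math.exp(-p * g))

def excitation_pattern(f_tone, level_db, centers):
    """Output level of each filter, listed in order of center frequency."""
    return [level_db + filter_gain_db(f_tone, fc) for fc in centers]

centers = list(range(500, 2001, 100))
pattern = excitation_pattern(1000.0, 60.0, centers)
for fc, e in zip(centers, pattern):
    print(f"{fc:5d} Hz  {e:6.1f} dB")
```

The pattern peaks at the filter centered on the tone and, on this linear frequency axis, falls off more slowly on the high-frequency side because the filters there are broader. Because the filters in this sketch are fixed, it does not reproduce any change of the pattern with tone level.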
[Figure: excitation pattern plotted along the basilar membrane, from base to apex.]
Figure 5.12 shows the excitation pattern for a 1000 Hz pure tone as a function of the level of the
tone. It is seen that:
If the center frequency of the auditory filter matches the frequency of the tone, then the
auditory filter output has a high value, and there is a peak in the excitation pattern at that
frequency.
For center frequencies higher or lower than the frequency of the tone, the output of the
auditory filter is less (the tone is attenuated) and, hence, the excitation level is not as high.
Because the filters are broader at high frequencies, and have steeper high-frequency slopes, a
filter that has a center frequency below the frequency of the tone will let less energy through
than a filter with a center frequency the same deviation above the frequency of the tone.
Therefore, on a linear center frequency axis such as this, the excitation pattern has a shallower
high-frequency slope than low-frequency slope.
The excitation pattern of a complex tone is simply the sum of the patterns of the sine
waves that make up the complex tone (since the model is a linear one).
We can hear out a tone at a particular frequency in a mixture if there is a clear peak in
the excitation pattern at that frequency.
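The last two points can be sketched for a harmonic complex. The example below is illustrative only and rests on assumptions not stated in these notes: symmetric rounded-exponential filters, the Glasberg and Moore bandwidth formula, summation of component powers at each filter output, and a hypothetical complex of ten equal-level harmonics of 200 Hz. It compares the peak-to-valley depth of the excitation pattern between two low harmonics with that between two high harmonics.

```python
import math

def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula; assumed)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def roex_weight(f, fc):
    """Power weighting of a component at f by a symmetric roex(p) filter at fc."""
    p = 4.0 * fc / erb_hz(fc)
    g = abs(f - fc) / fc
    return (1.0 + p * g) * math.exp(-p * g)

def excitation_db(components, fc):
    """components: list of (freq_hz, level_db). Sum the filtered component powers."""
    power = sum(10.0 ** (lvl / 10.0) * roex_weight(f, fc) for f, lvl in components)
    return 10.0 * math.log10(power)

# Hypothetical stimulus: harmonics 1 to 10 of 200 Hz, each at 60 dB
harmonics = [(200.0 * n, 60.0) for n in range(1, 11)]

def peak_to_valley_db(f_lo, f_hi):
    """Excitation at the lower harmonic minus excitation midway to the next one."""
    return excitation_db(harmonics, f_lo) - excitation_db(harmonics, 0.5 * (f_lo + f_hi))

print(f"200-400 Hz:   {peak_to_valley_db(200.0, 400.0):.1f} dB")
print(f"1800-2000 Hz: {peak_to_valley_db(1800.0, 2000.0):.1f} dB")
```

Under these assumptions the low harmonics are separated by a deep valley and so produce distinct peaks (resolved), whereas the high harmonics are separated by a dip of only about 1 dB and merge into a nearly smooth pattern (unresolved).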
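[Figure: excitation pattern of a complex tone on an approximately logarithmic frequency axis
running from base to apex (1600, 800, 600, 400, and 200 Hz marked), with the unresolved and
resolved components indicated.]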
Nonlinearities, such as suppression and distortion products, are not accounted for.
Abstract
Excessive noise is becoming a significant problem for intensive care units (ICUs). This paper
first reviews the impact of noise on patients’ sleep in ICUs. Five previous studies have
demonstrated such impacts, whereas six other studies have shown other factors to be more
important. Staff conversation and alarms are generally regarded as the most disturbing noises
for patients’ sleep in ICUs. Most research in this area has focused purely on noise level, but
work has been very limited on the relationships between sleep quality and other acoustic
parameters, including spectrum and reverberation time. Sound-absorbing treatment is a
relatively effective noise reduction strategy, whereas sound masking appears to be the most
effective technique for improving sleep. For future research, there should be close
collaboration between medical researchers and acousticians.
Sound masking
Sound masking systems are often used to increase speech privacy and to minimize
distractions from other sounds. Such systems are being introduced in hospitals as patient
confidentiality becomes more of an issue and responsible handling of personal details
forms an essential part of a data protection policy. Limited case studies have also shown that
using these systems in hospital wards could improve patient satisfaction [58-60]. In Gragert’s
study [58] the masking signal proved to be an effective intervention that should be
considered a viable method of enhancing the sleep quality of patients in noisy ICU
environments. Patients receiving the sound masking intervention believed that they slept better
and that it was quieter than did those in the control group. Williamson [59] investigated the
influence of ocean sounds (white noise) on the night sleep pattern of postoperative coronary
artery bypass graft patients after being transferred from an ICU. The group receiving ocean
sounds reported higher scores in sleep depth, awakening, return to sleep, quality of sleep, and
total sleep scores, indicating better sleep than the control group. The study by Stanchina and
colleagues [60] suggested that white noise increased arousal thresholds in healthy individuals
exposed to recorded ICU noise. The change in sound from baseline to peak, rather than the
peak sound level, determined whether an arousal occurred. From the table it can be seen that
sound masking has the most significant effect in promoting ICU patients’ sleep, producing an
improvement of 42.7%.
Conclusion
Based on a number of original papers, the impact of noise on patients’ sleep and the
effectiveness of noise reduction strategies in ICUs have been reviewed. These have shown:
noise is just one of a number of factors that may disrupt the sleep of patients on the ICU; staff
conversation and alarms are generally regarded as the most disturbing noises for patients’
sleep in ICUs; no research has been done on the relationships between ICU patients’ sleep
quality and the other room acoustic parameters besides sound level; and there are generally
four interventions for sleep improvement, including earplugs, behavioural modification,
sound masking, and acoustic absorption. Sound-absorbing treatment is a relatively effective
noise reduction strategy, whereas sound masking appears to be the most effective technique
for improving sleep.
There are some limitations of the existing studies, including the lack of attention to other
room acoustic conditions in addition to sound level, the combined effects of different sleep
disturbing factors, and the effects of noise on staff. For future research, there should be close
collaboration between medical researchers and acousticians to examine the different
characteristics of sound.
Abstract
Future scientific and diagnostic interest in frequency resolution requires an evaluation of the
different methods that are available to measure it. We compared three methods: (1) pure-tone
thresholds in broadband noise, (2) pure-tone thresholds in the presence of a fixed pure-tone
masker and (3) psychoacoustical tuning curves. We additionally obtained estimates of
temporal integration and of speech intelligibility in noise. Three subject groups were tested:
10 normals, 13 subjects with a noise-induced hearing loss and 18 subjects with a cochlear
hearing loss but no history of noise exposure. Generally the three measures of frequency
resolution show moderate agreement with each other. Poor frequency resolution is invariably
associated with a pure-tone threshold loss. Temporal integration appears unrelated either to
the pure-tone threshold loss or frequency resolution. Some of the measures of frequency
resolution display significant correlation with speech intelligibility in noise. However, since
both variables are correlated with pure-tone threshold loss the exact relationship between
frequency resolution and speech intelligibility cannot be clearly established.
Abstract
We measured masked thresholds for pulsed pure tones (.5, 1.0, 2.0, and 4.0 kHz) in the
presence of different levels of broad-band noise (nominally 0, 20, 40, and 60 dB/Hz). Several
of the 16 cochlear-impaired listeners displayed masked thresholds that were considerably
higher than those obtained from 10 normal listeners. At the 60 dB/Hz noise level the
correlation coefficients between thresholds in noise and thresholds in quiet were r =
.36, .44, .63 and .64 for signal frequencies of .5, 1.0, 2.0, and 4.0 kHz, respectively. The
growth of masking, as masker level was increased, was linear for the normal listeners but was
disproportionate and nonlinear in some cochlear-impaired listeners. In these data and data
from other studies, it is clear that thresholds in noise cannot be predicted from thresholds in
quiet. Masked thresholds are related to other measures of frequency resolution and to speech
intelligibility in noise, but it is argued that psychoacoustic tuning curves provide more direct
measures of the auditory-filter characteristics.
Abstract
The correlation between classic masking patterns and psychoacoustical tuning curves is
discussed quantitatively. A simplified method to measure such tuning curves in clinical use is
described. They are shown to be insensitive to the frequency dependence of the hearing loss.
Tuning curve data of six different groups including normal and hard-of-hearing observers are
given: normal hearing, conductive hearing loss, degenerative hearing loss, noise-induced
hearing loss, otosclerosis and Menière's disease. The resulting tuning curve data indicate that
the frequency resolving power of the four groups mentioned last is greatly reduced but not
completely absent, especially in the range of greater hearing loss. The correspondence
between the frequency-resolving power measured by the tuning curve method and the result
of speech discrimination tests is demonstrated. The measured data indicate that more than
50% of the patients with otosclerosis show reduced frequency selectivity although
otosclerosis is commonly regarded as a conductive hearing loss.
Frequency selectivity in normally-hearing and hearing-impaired
observers.
Florentine M, Buus S, Scharf B, Zwicker E., 1980
Abstract
Author: Oberfeld D, Stahn P.
Source: PLoS One. 2012;7(10):e48054. doi: 10.1371/journal.pone.0048054. Epub 2012 Oct 24.
Abstract
The presence of non-simultaneous maskers can result in strong impairment in
auditory intensity resolution relative to a condition without maskers, and causes a complex
pattern of effects that is difficult to explain on the basis of peripheral processing. We suggest
that the failure of selective attention to the target tones is a useful framework for
understanding these effects. Two experiments tested the hypothesis that the sequential
grouping of the targets and the maskers into separate auditory objects facilitates selective
attention and therefore reduces the masker-induced impairment in intensity resolution. In
Experiment 1, a condition favoring the processing of the maskers and the targets as two
separate auditory objects due to grouping by temporal proximity was contrasted with the
usual forward masking setting where the masker and the target presented within each
observation interval of the two-interval task can be expected to be grouped together. As
expected, the former condition resulted in a significantly smaller masker-induced elevation of
the intensity difference limens (DLs). In Experiment 2, embedding the targets in an
isochronous sequence of maskers led to a significantly smaller DL-elevation than control
conditions not favoring the perception of the maskers as a separate auditory stream. The
observed effects of grouping are compatible with the assumption that a precise representation
of target intensity is available at the decision stage, but that this information is used only in a
suboptimal fashion due to limitations of selective attention. The data can be explained within
a framework of object-based attention. The results impose constraints on physiological
models of intensity discrimination. We discuss candidate structures for physiological
correlates of the psychophysical data.
References