The Psychoacoustics of Sound-Quality Evaluation
Institute of Man-Machine-Communication, Technical University Munich, Arcisstr. 21, D - 80333 Munich, Germany
Summary In this paper, the concepts of psychoacoustics which form the basis of sound-quality evaluation are assessed. The relations between stimuli and perceptual judgements are discussed. In particular, the auditory field as well as spectral masking and critical bands are described. Psychoacoustic facts and the correlated models are given for loudness, sharpness, fluctuation strength, and roughness. The paper concludes with some hints for practical applications.
1. Stimuli and perceptual judgements
One goal of the scientific field of psychoacoustics is the quantitative description of listening judgements elicited by physically well defined stimuli. This means that a solid electroacoustic background is a prerequisite for meaningful psychoacoustic experiments. With respect to sound-quality evaluation, carefully controlled electroacoustic parameters are indispensable.
Concerning, for example, the free-field response of headphones, the standard DIN 45 619 gives some guidelines. In this standard, a loudness comparison of sounds from loudspeaker versus headphone is proposed. However, some manufacturers of dummy heads realize their free-field equalization by means of tiny probe microphones placed in the ear canal of subjects.
The level differences measured in the ear canal that can occur for the same perceived loudness from a headphone or in a free field are displayed in Figure 1. The results from Fastl et al. clearly show that frequency-dependent differences ΔL of up to 5 dB occur. This means that the quality of the reproduced sound depends considerably on the procedure used for free-field equalization. This holds true for the playback of recordings made with conventional microphones as well as with dummy heads.
The data displayed in Figure 1 can be taken as an example that meaningful sound-quality evaluation by psychoacoustic methods is only possible if the equalization of the transducers has been performed with ultimate care. For the time being, the presentation of sounds by loudspeakers cannot be recommended (see e.g. ).
The psychophysical procedures used in the experiments can influence the results of sound evaluations considerably. Although the books referenced at the end of this paper (a.o. ) give some valuable hints, some basic expertise, usually gained in psychoacoustically skilled university labs, is necessary to arrive at meaningful results.
Psychoacoustic data represent a solid basis for the assessment of sound quality problems. However, in addition cognitive and in particular aesthetic effects have to be considered (see ) for cost effective sound quality solutions.
2. Auditory field
The results displayed in Figure 2 illustrate the auditory field. The sound pressure level as well as sound intensity level, sound intensity, and sound pressure are given as a function of frequency. The solid line denotes, as a lower limit, the threshold in quiet, i.e. the level necessary for a pure tone to be just audible. The dashed line represents the threshold of pain as an upper limit of the auditory field. However, it is recommended that long-term sounds should not exceed the limit of damage risk. Hatched areas indicate the regions of speech and of music played on conventional instruments. With electronic amplification, higher levels can of course be reached. The dotted curve in Figure 2 shows as an example the elevated threshold in quiet of a student who frequently listens to extremely loud music (see also e.g. [11]).
The human hearing system is extremely sensitive. An acoustic power of only 0.0001 mW may already exceed the limit of damage risk.
3. Masking and critical bands
Masking represents one of the most basic effects in psychoacoustics. Usually the audibility of pure tones in the presence of masking sounds is determined. Figure 3 gives an example for white noise as masker. The level of the just audible test tone is given as a function of its frequency. The dashed curve represents the threshold in quiet, i.e. the audibility of test tones without masker. The solid curves represent masking patterns of white noise at different spectral density levels.
With increasing masker level, the masking patterns of white noise are shifted in parallel towards higher test tone levels. Up to a test tone frequency of about 500 Hz, the masking patterns are horizontal; at higher frequencies, they rise with a slope of about 10 dB per decade. Since white noise has a spectral density level which is independent of frequency, the shape of the masking pattern is somewhat unexpected. However, it can be explained on the basis of critical bands described later in this section.
Figure 4 shows the masking pattern of a narrow-band noise centered at 1 kHz with a bandwidth of 160 Hz. The test tone level is given as a function of its frequency. Again the dashed curve represents the threshold in quiet. The solid curves illustrate masking patterns for different levels of the narrow-band noise.
At low levels of the narrow-band masker, the masking pattern shows a symmetrical shape. However, when the masker level is increased above 40 dB, the lower slope is shifted in parallel, whereas the upper slope gets flatter and flatter. This effect is called the "non-linear upward spread of masking". The slope of the masking pattern towards lower frequencies reaches values of about 100 dB per octave. Bearing in mind that a first-order filter shows a slope of only 6 dB per octave, it becomes clear that meaningful measurements of masking patterns for narrow-band sounds can be achieved only if filters with extremely steep slopes are used.
A basic feature of psychoacoustics is the concept of critical bands. Simply speaking, it is assumed that the sound is analyzed in the hearing system by a bank of filters. Figure 5 shows the bandwidth of these filters (critical bandwidth) as a function of frequency. The dashed lines illustrate useful approximations: Up to a frequency of about 500 Hz, the critical band has a constant bandwidth of 100 Hz, and at higher frequencies, a constant relative bandwidth of about 20% shows up. At frequencies above 500 Hz, the critical bands can be compared with 1/3-oct-band filters, which have a constant relative bandwidth of 23%.
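The two approximations above also suggest why the masking patterns of white noise are flat up to about 500 Hz and then rise by about 10 dB per decade. The following is a minimal sketch, not a model from this paper: it assumes that the masked threshold follows the noise level collected within one critical band, and it ignores the constant offset between that band level and the actual threshold. The function names are hypothetical.

```python
import math

def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth: 100 Hz up to 500 Hz, 20 % of f above."""
    return 100.0 if f_hz <= 500.0 else 0.2 * f_hz

def white_noise_band_level_db(density_level_db, f_hz):
    """Level of white noise falling into the critical band centred at f_hz.

    Assumption: band level = spectral density level + 10*log10(bandwidth in Hz).
    """
    return density_level_db + 10.0 * math.log10(critical_bandwidth_hz(f_hz))

# Band level relative to a spectral density level of 0 dB
for f in (100, 500, 1000, 10000):
    print(f, "Hz:", round(white_noise_band_level_db(0.0, f), 1), "dB")
```

Below 500 Hz the band level is constant because the bandwidth is constant. Above 500 Hz the bandwidth grows in proportion to frequency, so the band level rises by 10 dB for each tenfold increase in frequency.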
When placing the critical bands next to each other, the critical band-scale or Bark-scale is produced. The name "Bark" is chosen in honor of the late famous acoustician Barkhausen from Dresden.
Figure 6 enables a comparison of the Bark-scale with the frequency scale. In the left panel, the frequency axis shows a linear scaling, in the right panel a logarithmic scaling. The solid curves describe the relations between Bark-scale and frequency scale. The dashed curves indicate useful approximations with formulas to calculate Bark values from frequency values. The formula in the left panel describes a linear relation between Bark and frequency, which is valid up to frequencies of about 500 Hz. A frequency of 200 Hz, for example, corresponds to a value of 2 Bark. The formula given in the right panel holds for higher frequencies, where a logarithmic relation between frequency and Bark value shows up. For example, a frequency of 2 kHz corresponds to a Bark value of 13 Bark.
An additional advantage of the Bark-scale is its linear relation to physiological features of the human hearing system, namely the length of the basilar membrane in the inner ear. However, these fascinating relations cannot be elaborated on in this paper.
One of the many advantages of the Bark-scale is illustrated in Figure 7. The masking patterns of narrow-band noises centred at different frequencies are plotted along the Bark-scale as solid curves; the threshold in quiet is given as a dashed curve.
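The formulas themselves appear only in Figure 6 and are not reproduced in the text. As a hedged sketch, the following assumes the commonly quoted approximations z = f / (100 Hz) below 500 Hz and z = 9 + 4 log2(f / 1 kHz) above, which reproduce both worked values given above. The function name is hypothetical.

```python
import math

def hz_to_bark_approx(f_hz):
    """Approximate critical-band rate in Bark (assumed piecewise formulas).

    Linear branch up to 500 Hz, logarithmic branch above.
    The two branches meet at 500 Hz (5 Bark).
    """
    if f_hz <= 500.0:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)

print(hz_to_bark_approx(200.0))   # linear branch
print(hz_to_bark_approx(2000.0))  # logarithmic branch
```

These are approximations only: at 1 kHz the logarithmic branch yields 9 Bark, whereas the measured scale places 1 kHz at about 8.5 Bark.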
The patterns shown in Figure 7 can be regarded as filter characteristics installed in the human hearing system. Their arrangement along the Bark-scale has the advantage that their shape is independent of the centre-frequency. Only at extremely low centre-frequencies (70 Hz), some deviations from this rule show up.
In the following, some temporal effects of masking will be described. When a masker impulse is switched off, decay effects show up in the human hearing system, which are called post-masking (the term forward-masking is also sometimes used; see e.g. ).
Figure 8 illustrates post-masking effects for different masker levels. The level of a just audible short impulse, presented with a delay time td after the end of the masker, is given as a function of the delay time. The temporal pattern of the stimuli is illustrated by the inset in Figure 8.
The post-masking curves displayed in Figure 8 as solid lines show a strong level dependence: the decay starting from a level of 80 dB is much steeper than the decay starting from 40 dB. This nonlinearity contrasts with the dashed lines, which correspond to an exponential decay with a time constant of 10 ms. A comparison of solid and dashed curves in Figure 8 reveals that the decay processes in the human hearing system are not exponential. Irrespective of the starting level, the decay process is finished after a delay time of about 200 ms.
Another nonlinearity of post-masking in the human hearing system is illustrated in Figure 9. The level of the just audible test-tone burst is given as a function of the delay time. Two masker durations, namely 200 ms and 5 ms, are considered.
After a longer masker (200 ms, solid) the decay in the human hearing system is more gradual than after a short masker (5 ms, dotted). These results again reveal that the decay processes in the human hearing system are highly nonlinear.
The combined effects of spectral masking and temporal masking can be demonstrated by means of transient masking patterns (e.g. ). An example is given in Figure 10. The level of the just-audible test-tone is plotted as a function of its frequency as well as its temporal relation to the masker.
The masker is illustrated by the line of arrows extending from 0 to 300 ms. With respect to spectral masking it becomes clear that the masker influences the audibility of the test-tone not only at frequencies where masker energy is physically present, but also at lower and higher frequencies. For times later than 300 ms, effects of post-masking show up, illustrating the complicated decay processes in the human hearing system. The effects for negative values of the time scale are called pre-masking. They are assumed to represent complex effects in the neural processing of sounds.
Transient masking patterns as displayed in Figure 10 are taken as a quantitative description of the representation of sounds in the human hearing system.
4. Loudness
Loudness represents a dominant feature for sound-quality evaluation. The dependence of loudness of narrow-band sounds on frequency is illustrated in Figure 11. The level of pure tones with the same loudness is given as a function of their frequency. The solid curves in Figure 11 are frequently called "equal loudness contours". They are labelled by the loudness level LN in phon, indicating the level of a 1 kHz tone which produces the same loudness as the test-tone of frequency fT. These curves demonstrate that the hearing system is most sensitive for frequencies around 4 kHz and shows reduced sensitivity at higher and lower frequencies. In particular at low frequencies, the equal loudness contours are not shifted in parallel, but show a level dependence.
The dashed line in Figure 11 illustrates the widely used A-weighting. For extremely soft sounds (20 phon) there is fair agreement between A-weighting and the equal loudness contour. At higher levels, e.g. 80 phon, however, the attenuation of the A-weighting curve at low frequencies is much too high in comparison to the corresponding equal loudness contour. This means that for everyday sounds, A-weighting underestimates the loudness of their low-frequency components.
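To make the size of this low-frequency attenuation concrete, the A-weighting curve can be evaluated numerically. The sketch below uses the standard analytic form of the A-weighting as specified for sound level meters; the formula and its constants are not given in this paper and are brought in here as an outside assumption. The function name is hypothetical.

```python
import math

def a_weighting_db(f_hz):
    """A-weighting in dB at frequency f_hz (standard analytic form, assumed)."""
    f2 = f_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    # The +2.00 dB term normalises the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00

for f in (50.0, 100.0, 1000.0, 4000.0):
    print(f, "Hz:", round(a_weighting_db(f), 1), "dB")
```

At 100 Hz the weighting attenuates by roughly 19 dB regardless of level, whereas the equal loudness contours at 80 phon call for a considerably smaller correction.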
Another drawback of A-weighting is that it does not consider the dependence of loudness on bandwidth (see e.g. ). Figure 12 illustrates the loudness level of noises as a function of their bandwidth for the case that their A-weighted level LA is kept constant. The solid curve shows that the perceived loudness increases with increasing bandwidth despite the fact that the A-weighted level (dashed curve) is kept constant.
The results displayed in Figure 12 clearly indicate that when using A-weighted levels, the loudness of broadband sounds is systematically underestimated.
The spectral effects of loudness can be assessed by a multi-channel analysis. Usually three procedures for physical loudness measurements are used. The procedure by Stevens  is based originally on oct-band-analysis of the sounds, whereas the procedures of Kryter [16] and Zwicker  are based on 1/3 oct-band-analysis. Kryter's procedure is frequently used for certification of aircraft noise (see e.g. ) while Zwicker's procedure can be used for all kinds of noises as well as speech (e.g. ) or music (e.g. ).
Figure 13 shows the processing of sounds in the loudness model according to Zwicker. The left panel shows the spectral distribution of a narrow-band noise centred at 1 kHz corresponding to 8.5 Bark. In the middle panel, the corresponding masking pattern is displayed. The right panel shows the specific loudness pattern. With some simplification it can be said that its ordinate scale is proportional to the square root of sound pressure of the stimulus.
The most important feature of Zwicker's loudness model is that the area under the specific loudness curve (hatched) is directly proportional to the perceived loudness. This direct relation is the great advantage of loudness patterns in comparison to alternative spectral representations like FFT spectra or 1/3-octave-band spectra.
The loudness procedure illustrated in Figure 13 has been standardized in DIN 45 631, and computer programs, which run on IBM-compatible PCs, are at hand.
In Figure 14 the dependence of loudness on the duration of sounds is illustrated. The solid line shows the dependence of loudness level on the duration of 1 kHz tone-impulses. For comparison, the readings of A-weighted level on a sound level meter are given when using the time constants "impulse", "fast" or "slow".
The perceived loudness remains constant for sounds with durations larger than approximately 100 ms. For shorter sounds, the loudness level decreases by 10 phon per decade. None of the traditionally used time constants is in complete agreement with features of the human hearing system. However, the time constant "fast" can be regarded as a fair compromise.
Figure 17. Sharpness of narrow-band noise (solid), high pass noise (dashed), and low-pass noise (dotted).
The temporal processing of loudness in a loudness meter [21], simulating the human hearing system, can be illustrated by means of Figure 15. The upper panel shows the temporal envelope of a 5 kHz tone impulse of 100 ms (solid) or 10 ms (dashed) duration. The middle panel illustrates an essential feature of temporal processing: At the end of the 10 ms sound impulse (dashed), the decay of specific loudness is much steeper than at the end of the 100 ms sound impulse (solid). This temporal nonlinearity was already illustrated in Figure 9. The lower panel of Figure 15 shows the time patterns of overall loudness. The 10 ms impulse reaches only half the loudness of the 100 ms impulse, corresponding to a reduction of loudness level by 10 phon as displayed in Figure 14.
FM-tones represent an excellent tool to study both spectral and temporal effects of masking and loudness. An example is given in Figure 16. Masking and loudness patterns are displayed for an FM-tone centred at 1500 Hz with a frequency deviation of ±700 Hz. "Snapshots" for instantaneous frequencies of 800 Hz, 1500 Hz, and 2200 Hz are given. Panels a and b show psychoacoustically measured masking patterns and physically measured loudness patterns for a modulation frequency of 0.5 Hz. At such low modulation frequencies, the hearing system can follow the pitch excursions, and accordingly shifts in the masking patterns or loudness patterns are visible. Figure 16 c and d show the masking patterns and loudness patterns for a modulation frequency of 128 Hz. In this case, the hearing system can no longer follow the pitch excursions and unpleasant sounds with broad spectral distributions show up.
The patterns displayed in Figure 16 illustrate that FM-tones represent an excellent tool to check the accuracy of simulations of the nonlinear spectral and temporal behaviour of the human hearing system. These simulations can also be used as measurement tools in sound quality research, assessing the distribution of specific loudness to achieve a desired product sound.
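The relation between duration, loudness level, and loudness described above can be written down in a few lines. This is a simplified sketch rather than the loudness meter itself: it assumes a sharp knee at exactly 100 ms, a slope of 10 phon per decade below it, and a doubling of loudness for every 10 phon, which is the relation implied by the 10 ms example. The function names are hypothetical.

```python
import math

def loudness_level_change_phon(duration_ms, knee_ms=100.0):
    """Change of loudness level relative to a long sound.

    Assumption: no change at or above knee_ms, and a drop of
    10 phon per decade of duration below it.
    """
    if duration_ms >= knee_ms:
        return 0.0
    return 10.0 * math.log10(duration_ms / knee_ms)

def loudness_ratio(delta_phon):
    """Loudness ratio for a change in loudness level, assuming a factor of 2 per 10 phon."""
    return 2.0 ** (delta_phon / 10.0)

delta = loudness_level_change_phon(10.0)
print(delta, loudness_ratio(delta))
```

For a 10 ms impulse the sketch yields a drop of 10 phon and hence half the loudness of the 100 ms impulse, in line with the lower panel of Figure 15.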
5. Sharpness
Sharpness represents an attribute for the evaluation of timbre (e.g. ). Figure 17 shows the dependence of sharpness on frequency. The solid line gives the results for narrow-band noise with a bandwidth of 1 Bark. With increasing centre frequency of the narrow-band noise, the perceived sharpness increases. The dashed curve gives the results for high-pass noise of different cut-off frequencies, the dotted curve shows the data for low-pass noise.
The results displayed in Figure 17 clearly reveal that for high values of sharpness significant spectral components at high frequencies are necessary.
A model of sharpness can be illustrated by means of Figure 18. In the left panel, the spectral distributions of a narrow-band noise (solid), a broadband noise (dashed), and a high-pass noise (dotted) are displayed. Essentially, the model of sharpness is based on the loudness patterns as shown in Figure 13. However, because of their great relevance for the perception of sharpness, the higher frequencies are boosted by a weighting function g. The arrows in the right panel of Figure 18 illustrate that the sharpness corresponds to the centre of gravity of the weighted loudness patterns.
The patterns displayed in the right panel of Figure 18 indicate interesting possibilities for sound-quality design: By adding low frequencies, the sharpness of sounds can be reduced. Although loudness is somewhat increased by such modifications, in many cases the resulting sound is preferred because of the reduced sharpness. This holds in particular for sounds which are already relatively soft.
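The centre-of-gravity idea, and the effect of adding low frequencies, can be tried out with a toy calculation. The sketch below is only illustrative: the weighting function g used here (unity up to 16 Bark, then rising linearly to 4 at 24 Bark) is a hypothetical stand-in for the curve in Figure 18, the first moment is normalised by the total loudness as an assumed convention, and the result is in arbitrary units.

```python
def weighting_g(z_bark):
    """Hypothetical weighting: 1 up to 16 Bark, rising linearly to 4 at 24 Bark."""
    if z_bark <= 16.0:
        return 1.0
    return 1.0 + 3.0 * (z_bark - 16.0) / 8.0

def sharpness_centroid(specific_loudness, dz=1.0):
    """Weighted first moment of a specific-loudness pattern (arbitrary units).

    specific_loudness[i] is the specific loudness of the band centred
    at (i + 0.5) * dz Bark.
    """
    moment = 0.0
    total = 0.0
    for i, n in enumerate(specific_loudness):
        z = (i + 0.5) * dz
        moment += n * weighting_g(z) * z * dz
        total += n * dz
    return moment / total if total > 0.0 else 0.0

# High-pass-like pattern: loudness only in the bands from 16 to 24 Bark
high_pass = [0.0] * 16 + [1.0] * 8
# Same pattern with low-frequency components added (0 to 8 Bark)
with_lows = [1.0] * 8 + [0.0] * 8 + [1.0] * 8

print(sharpness_centroid(high_pass), sharpness_centroid(with_lows))
```

With the low-frequency bands added, the total loudness doubles in this toy example while the centre of gravity moves towards lower Bark values, so the computed sharpness drops.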
6. Fluctuation strength
Temporal variations of sounds can lead to two different perceptions: Fluctuation strength at low frequencies of variation, and roughness at higher frequencies of variation. Although there is no strict limit between fluctuation strength and roughness, for modulation frequencies larger than about 20 Hz, fluctuation strength vanishes and roughness takes over.
The dependence of fluctuation strength on relevant stimulus parameters is illustrated in Figure 19. Data displayed in the left panel reveal that, as a function of modulation frequency, fluctuation strength shows a bandpass characteristic with a maximum around a modulation frequency of 4 Hz. This bandpass characteristic holds true for amplitude-modulated as well as frequency-modulated sounds. It is also valid for sounds with stochastic amplitude variations, for example narrow-band noise with a bandwidth Δf. In this case, the effective modulation frequency has to be taken into account, which can be calculated by the formula fmod = 0.64 Δf (see ).
The middle panel shows the dependence of fluctuation strength on modulation depth. Up to variations in the temporal envelope of about 3 dB, no fluctuation strength is perceived. For higher modulation depth, d, fluctuation strength increases linearly. This dependence holds true for amplitude modulated broad-band as well as narrow-band sounds. The right panel in Figure 19 shows the dependence of fluctuation strength on level. For a level increase by 40 dB, fluctuation strength increases approximately by a factor of 3. This result holds true for amplitude modulated as well as frequency modulated sounds.
A model of fluctuation strength can be illustrated by means of Figure 20. The hatched areas indicate the variation of the temporal envelope of a sinusoidally amplitude modulated sound, when plotted in terms of level. The solid curve indicates the related temporal masking pattern. Because of post-masking effects, the hearing system is not able to "listen completely into the valleys". Therefore, the depth, ΔL, of the masking pattern is smaller than the physical modulation depth, d, of the sound. In principle, fluctuation strength can be calculated on the basis of the masking patterns by the formula displayed in the right panel of Figure 20. For the description of fluctuation strength, the depth, ΔL, of the temporal masking pattern as well as the modulation frequency in relation to a modulation frequency of 4 Hz are of relevance.
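The formula itself is shown only in Figure 20. As a hedged sketch, the code below assumes a form in which fluctuation strength is proportional to ΔL divided by (fmod / 4 Hz + 4 Hz / fmod), which yields a bandpass characteristic centred at 4 Hz; the constant of proportionality is omitted, so only relative values are meaningful. The effective modulation frequency of narrow-band noise uses the relation fmod = 0.64 Δf quoted earlier in this section. Function names are hypothetical.

```python
def effective_modulation_frequency_hz(bandwidth_hz):
    """Effective modulation frequency of narrow-band noise: fmod = 0.64 * bandwidth."""
    return 0.64 * bandwidth_hz

def fluctuation_strength_relative(delta_l_db, f_mod_hz):
    """Relative fluctuation strength (assumed form, arbitrary units).

    delta_l_db: depth of the temporal masking pattern in dB
    f_mod_hz:   modulation frequency in Hz (must be > 0)
    """
    return delta_l_db / (f_mod_hz / 4.0 + 4.0 / f_mod_hz)

for f_mod in (1.0, 4.0, 16.0):
    print(f_mod, "Hz:", fluctuation_strength_relative(10.0, f_mod))
```

In this form the value at 1 Hz equals the value at 16 Hz, i.e. the bandpass is symmetrical around 4 Hz on a logarithmic modulation-frequency axis.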
7. Roughness
Figure 21 shows the dependence of roughness on some relevant stimulus parameters. The left panel in Figure 21 illustrates that roughness shows a bandpass characteristic as a function of modulation frequency with a maximum around 70 Hz. Data displayed in the middle panel of Figure 21 reveal that with increasing degree of modulation, roughness increases. This holds true for narrow-band as well as broadband sounds. Data displayed in the right panel of Figure 21 show that with a level increase by 40 dB, roughness increases by approximately a factor of 3. This holds true for amplitude-modulated as well as frequency-modulated sounds.
Figure 22 illustrates the main features of a model of roughness. Again the depth ΔL of the temporal masking pattern plays a crucial role. As indicated in the right panel, roughness is proportional to the depth of the masking pattern multiplied by the modulation frequency. This magnitude can be interpreted as the speed of variations in the temporal masking pattern.
A more detailed model of course has to take into account the distribution of the depth, ΔL, of the temporal masking pattern along the critical band rate scale. In particular for frequency modulated sounds, the ΔL-values show a strong variation along the Bark scale, leading to large values of roughness for these sounds.
8. Practical examples
In this section, a few hints to practical examples will be given. More information about the usefulness of psychoacoustic principles in general and also for sound-quality evaluation can be found on the web (see reference  at the end of this paper).
Figure 23 shows the loudness patterns of the sounds produced by a circular saw. The left panel gives the data for a circular saw blade with normal spacing of the teeth, whereas the right panel shows the data for irregular spacing of the teeth. The comparison of the two patterns reveals two main differences:
1. The area under the curve in the left panel is much larger than the area under the curve in the right panel, indicating a higher loudness of the conventional circular saw blade.
2. The spectral components at high frequencies are much smaller in the right panel than in the left panel, indicating that the modified circular saw blade produces less sharpness.
Although the noise-reduced circular saw blade does not yet sound really pleasant, both its loudness and sharpness could be reduced substantially in comparison to a conventional blade, leading to significantly improved acceptance by the customer.
In many practical cases, noises containing tonal components are considered to be particularly annoying. As an example, Figure 24 shows loudness patterns of broadband sounds with a tonal component of different strength. These types of sound were assessed in psychoacoustic experiments with respect to the prominence of their tonal components. One goal of these studies was to give hints for the magnitude of tone corrections, which are used in several countries for the evaluation of industrial noise.
A computer program for IBM-compatible PCs is at hand, which calculates the magnitude of the tone correction on the basis of critical bands. Psychoacoustic experiments with experts from the consulting business revealed that the magnitude of the tone correction is frequency dependent. A corresponding weighting function was developed and is proposed for inclusion into DIN 45 681.
A final example describes an application of psychoacoustic principles in sound-quality design which, however, is somewhat questionable. Figure 25 shows the loudness patterns of two motorcycles, which both fulfill the legal limit of 70 dB(A). However, when comparing the areas under the patterns, it is easily seen that the motorcycle with a loudness distribution as shown in the right panel is more than 30% louder than the other motorcycle. In addition, the spectral components at high frequencies are boosted, increasing the sharpness considerably, which leads to an "aggressive sound". The motorcycle with the loudness pattern displayed in the right panel could increase its market share substantially within a few years. However, as university people striving for better sound quality, we doubt whether this can be regarded as a positive example of sound-quality design based on psychoacoustic principles.
References
[1] J. Blauert: Spatial hearing. MIT Press, Cambridge MA, 1996.
[2] H. Fastl: Beschreibung dynamischer Hörempfindungen anhand von Mithörschwellen-Mustern. Hochschulverlag, Freiburg, 1982.
[3] J. Hellbrück: Hören. Hogrefe, Göttingen, 1993.
[4] B. Moore: An introduction to the psychology of hearing. Academic Press, London, 1997, IV. Edition.
[5] A. Schick: Schallbewertung. Springer-Verlag, Heidelberg, 1990.
[6] E. Zwicker, H. Fastl: Psychoacoustics - Facts and models. Springer-Verlag, Heidelberg, 1990.
[7] DIN 45 619: Teil 1, Kopfhörer. 1986.
[8] H. Fastl, W. Schmid, G. Theile, E. Zwicker: Schallpegel im Gehörgang für gleichlaute Schalle aus Kopfhörern oder Lautsprechern. Fortschritte der Akustik, DAGA'85. Verl.: DPG-GmbH, Bad Honnef, 1985, 471-474.
[9] C. Fend, H. Prante, C. Maschke, N. Boemak: Untersuchung zur subjektiven Übereinstimmung von Lautsprecher- und Kopfhörerdarbietungen. Fortschritte der Akustik, DAGA 95, Verl.: Dt. Gesell. für Akustik e. V., Oldenburg, 1995, 915-918.
[10] J. Blauert: Cognitive and aesthetic aspects of noise engineering. Proc. Inter-noise'86, 1986, Vol. I, 5-14.
[11] F. Rudloff, H. von Specht, S. Penk, J. Pethe, G. Schuschke: Sozioakusis infolge exzessiven Musikkonsums? Fortschritte der Akustik, DAGA 95, Verl.: Dt. Gesell. für Akustik e. V., Oldenburg, 1995, 187-190.
[12] R. Plomp: Rate of decay of auditory sensation. J. Acoust. Soc. Am. 36 (1964)
[13] H. Fastl: Temporal masking effects: III. Pure tone masker. Acustica 43 (1979) 282-294.