ENGINEERING REPORTS

Evaluating a Measurement System*

T. SPORER, AES Member, AND U. GBUR**

Lehrstuhl Technische Elektronik, Universität Erlangen-Nürnberg, Erlangen, Germany

AND

J. HERRE, AES Member, AND R. KAPUST

Fraunhofer-Institut für Integrierte Schaltungen, Erlangen, Germany

Noise-to-mask ratio (NMR) is a perceptual measurement scheme which gives information about the distance between actual noise and masking threshold. It has been shown to be a useful tool in the development and comparison of perceptual coding schemes. Some "perceptual experiments" carried out with the measurement system as a "test subject" are presented. The results of these measurements are compared with the results obtained with human listeners.

0 INTRODUCTION

Perceptual coding schemes have entered the consumer market. Digital Compact Cassette (DCC) and Mini Disk (MD) have recently been introduced; Digital Audio Broadcasting (DAB) will follow soon. The audio quality of the consumer equipment used in this field has reached such a level that evaluating it has become very difficult and expensive. Classical measurement techniques such as signal-to-noise ratio have proved to be useless in measuring perceptual coding schemes [ Several perceptual measurement schemes have been presented in recent years [ They all use a more or less complex model of the human ear. The key problem of perceptual measurement systems is to prove whether they can imitate the behavior of the human ear. In this engineering report we present several psychoacoustic experiments which have been carried out with the noise-to-mask ratio (NMR), a novel measuring system, serving as the "test subject."

1 PSYCHOACOUSTIC REQUIREMENTS

In this section we give a short overview of the kind of psychoacoustic effects that might be important for the measurement of audio systems.

1.1 Critical Band and Critical Rate

In the human auditory system the sound intensities within one critical band are added to form a sum intensity. This sum intensity creates a combined masking threshold [ The width of one critical band has been defined as 1 bark. The size of a critical band can be approximated by the formula of Zwicker and Feldtkeller [8]

where f is the center frequency and z the bandwidth. The frequency f can be in the range of 20 Hz to about 16 kHz [ The size of the critical bands depends partly on the structure of the stimulus. Different authors give different results for Δf [7], [9]-[11].
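As a rough numerical illustration of the critical-band concept, the following sketch evaluates the widely quoted analytic approximations of Zwicker and Terhardt for critical rate and critical bandwidth. Using these particular expressions is an assumption made here for illustration; they are not necessarily the formula of [8] referred to above, and the function names are hypothetical.

```python
import math


def critical_rate_bark(f_hz):
    """Approximate critical rate z (in bark) at frequency f_hz.

    Assumption: Zwicker-Terhardt analytic approximation, which may
    differ from the formula of [8] referred to in the text.
    """
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth (in Hz) around center frequency f_hz."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


# Evaluate at the three masker frequencies used later in this report.
for f in (250.0, 1000.0, 4000.0):
    print(f, round(critical_rate_bark(f), 2), round(critical_bandwidth_hz(f), 1))
```

With these approximations the bandwidth stays close to 100 Hz at low frequencies and grows with center frequency, which is the qualitative behavior the text relies on.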

1.2 Simultaneous Masking

If masker and test signal are present at the same time, the test signal might be inaudible, even if it is considerably above the listening threshold in quiet. The threshold at which the test tone is masked depends on:

1) the intensity of the masker and

2) the frequency structure of the masker and the test tone.

In general we can distinguish between three different masking situations: noise-masking tone, tone-masking noise, and tone-masking tone. If noise signals are involved, we have to look at the bandwidth of the noise. The masking effects of white noise, low-pass noise, and high-pass noise can be explained by the superposition of the masking effects of several narrow-band noise signals with a bandwidth of 1 bark. Other problems occur if tones are involved. Especially at higher sound pressure levels, sum and difference signals are generated in the human ear. Even if the test signal may be masked, the sum and difference signals can be audible [ Figs. 1-5 show masking thresholds found in the literature.

1.3 Temporal Masking

All the effects discussed so far are valid for stationary signals. For transients, or for signals having a variable structure, additional effects occur.

1) The masking threshold after the start of a masker is increased as compared to simultaneous masking. This is known as the overshoot effect [

2) After the end of a masker, the masking threshold needs some time to fall down to the threshold in quiet (forward masking, also called postmasking). The slope and the fall time of the forward masking curve depend on the frequency-domain structure of the masker and the test signal, the intensity of the masker, and the duration of the masker.

3) Prior to the start of the masker, the masking threshold changes from the threshold in quiet to the threshold for simultaneous masking. The backward masking effect (also called premasking) depends on the frequency-domain structure of the masker and the test signal and the intensity of the masker.

Figs. 6-8 show masking thresholds for temporal masking found in the literature.

1.4 Threshold in Quiet and Listening Conditions

Fig. 9 shows the masking threshold in quiet. Looking at audio transmission systems we normally do not know the sound pressure level of the final reproduction. There are two possibilities to cope with that problem in perceptual measurement systems.

1) Exact approach. One parameter of the measurement scheme is the sound pressure. All the masking effects are calculated dependent on the exact signal levels. In this approach it is even possible to introduce noise levels of certain rooms [6].

2) Worst-case approach. For all masking effects a worst-case approach is used. To construct the worst-case masking curve for simultaneous masking, all masking curves are shifted along the sound pressure axis so that the maskers of the different levels meet at the same point. Now for each frequency the lowest masking threshold is taken. This leads to worst-case masking thresholds with a steep slope closely above the masker and a not so steep slope at higher frequencies. The threshold in quiet is replaced by an estimate of the noise level on the input signal (for instance, the quantization noise of the 16-bit audio signal) [ A measurement scheme using such a criterion is more sensitive to errors than the human ear. If such a system detects no audible errors, one can assume that no error is audible for every listening level.
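The construction of the worst-case masking curve can be sketched as follows. This is a minimal illustration under stated assumptions: the masking curves are given in decibels on a common frequency grid, the function name is hypothetical, and the numbers in the example are invented for illustration and are not measured data.

```python
def worst_case_masking_curve(curves_db, reference_level_db=0.0):
    """Combine masking curves measured at different masker levels.

    curves_db maps a masker level (dB) to a list of masking thresholds (dB),
    all sampled on the same frequency grid. Each curve is shifted along the
    level axis so that all maskers meet at reference_level_db; the lowest
    threshold per frequency point is then kept.
    """
    shifted = []
    for masker_level_db, thresholds_db in curves_db.items():
        offset = reference_level_db - masker_level_db
        shifted.append([t + offset for t in thresholds_db])
    # Pointwise minimum over all shifted curves.
    return [min(column) for column in zip(*shifted)]


# Invented example values (not measured data): two masker levels, four points.
example_curves = {
    40.0: [10.0, 36.0, 20.0, 5.0],
    80.0: [45.0, 76.0, 68.0, 55.0],
}
print(worst_case_masking_curve(example_curves))
```

Taking the minimum per frequency point is what makes a scheme built on this curve at least as sensitive as a listener at any of the levels considered.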

1.5 Binaural Effects

Today all known audio measurement systems use only one channel for the measurements. Binaural effects, such as binaural masking level differences (BMLD) [ and stereo unmasking [ are therefore not discussed here.

2 GENERAL STRUCTURE OF NMR

The NMR technique was developed in parallel with the optimum coding in the frequency domain (OCF) algorithm [ It proved to be a helpful tool in the development of the perceptual coding schemes of the ISO/MPEG audio standard. NMR uses a very simple perceptual model. This facilitates implementation on real-time hardware [ The perceptual model used in NMR was frozen several years ago in order to get comparable results with different audio coding schemes over the years. The developers of audio coding schemes became familiar with the interpretation of the measurement results of NMR and learned what "coding artifacts sound and look like."

Fig. 4. Level of test tone just masked by masking tone (1 kHz, 80 dB). Crosshatched areas characterize regions of beating [7]

Fig. 3. Level of critical-band-wide noise (center frequency 250 Hz) just masked by sine tones (sound pressure level 80 dB) [

Fig. 6. Temporal masking. Masking of noise by Gaussian impulse. Noise has same spectrum as Gaussian impulse [ Gaussian impulse centered around t = 0. Left half (backward masking): x-axis shows start time of noise; noise ends at t = 0. Right half (forward masking): x-axis shows stop time of the noise; noise starts at t = 0.

Fig. 7. Temporal masking. Masking of noise by Gaussian impulse (see Fig. 6). In addition both reference and test signals have been filtered with low pass at 300 Hz [

Computation of the difference signal. The noise signal is calculated in the time domain as the difference between the reference signal and the signal under test. For certain applications it is better to take the difference between the power spectra instead of time-domain differences. This reduces the effect of phase errors on the estimated audible noise, but leads to a worse time resolution [20].

Overlap, windowing. The window length is 1024 samples (23.2 ms at 44.1-kHz sampling rate, 21.3 ms at 48-kHz sampling rate). The calculation is done every 512 samples. A Hann window is used.
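The block structure described in this step can be sketched as follows. The window length and the hop size are taken from the text; the particular (periodic) Hann definition and all function names are assumptions made for illustration, since the exact window variant used in NMR is not stated.

```python
import math

WINDOW_LENGTH = 1024  # samples per analysis window (from the text)
HOP_SIZE = 512        # a new block starts every 512 samples (from the text)


def hann_window(n):
    """Periodic Hann window of length n (assumed variant)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def windowed_blocks(signal, window_length=WINDOW_LENGTH, hop=HOP_SIZE):
    """Split a signal into overlapping, Hann-windowed blocks."""
    window = hann_window(window_length)
    blocks = []
    for start in range(0, len(signal) - window_length + 1, hop):
        frame = signal[start:start + window_length]
        blocks.append([x * w for x, w in zip(frame, window)])
    return blocks


def window_duration_ms(sample_rate_hz, window_length=WINDOW_LENGTH):
    """Duration of one analysis window in milliseconds."""
    return 1000.0 * window_length / sample_rate_hz


print(round(window_duration_ms(44100), 2), round(window_duration_ms(48000), 2))
```

With a hop of 512 samples, consecutive windows overlap by 50%, so every input sample (apart from the signal edges) contributes to two blocks.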

Computation of the power spectra of both reference signal and noise. A fast Fourier transform (FFT) is used to map both the reference signal and the noise signal into the frequency domain.
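A minimal sketch of this step is given below, using a plain recursive radix-2 FFT from the standard library only. It assumes that the blocks have already been windowed and that the noise is formed as the time-domain difference between the signal under test and the reference; the function names are hypothetical, and a production implementation would use an optimized FFT routine instead.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(block):
    """Squared magnitudes of FFT lines 0 .. N/2 of one (windowed) block."""
    spectrum = fft(block)
    return [abs(c) ** 2 for c in spectrum[: len(block) // 2 + 1]]


def noise_and_reference_spectra(ref_block, sut_block):
    """Power spectra of the reference and of the noise (sut minus ref)."""
    noise_block = [s - r for r, s in zip(ref_block, sut_block)]
    return power_spectrum(ref_block), power_spectrum(noise_block)
```

Because windowing is a linear operation, taking the difference of two windowed blocks gives the same result as windowing the difference signal.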

Grouping into bands. The signal density S per band is calculated by summing the spectra Y(i),

where u(cb) is the lower band edge and s(cb) the size of the band. The sizes and positions of the bands are similar to the bark scale [7]. It is not possible to reproduce the exact bark scale with this transform. The bandwidth of the lower bands is smaller than 1 bark. The exact values of u(cb) and s(cb) can be found in [ The corresponding grouping is done for the noise density noise_denCb.
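The grouping step can be illustrated with the sketch below. It implements a plain sum of the spectral lines Y(i) from u(cb) to u(cb) + s(cb) - 1; whether the original formula applies an additional normalization is not assumed here. The example band table is hypothetical and covers only four two-line bands, chosen to match the band limits of 86, 172, 258, and 345 Hz quoted later in this report for a 1024-point transform at 44.1 kHz.

```python
def group_into_bands(power_spectrum, band_edges, band_sizes):
    """Sum the spectral lines of each band.

    band_edges[cb] is the index u(cb) of the first line of band cb and
    band_sizes[cb] is its number of lines s(cb).
    """
    densities = []
    for u, s in zip(band_edges, band_sizes):
        densities.append(sum(power_spectrum[u:u + s]))
    return densities


# Spacing of FFT lines for a 1024-point transform at 44.1 kHz (about 43 Hz).
BIN_SPACING_HZ = 44100.0 / 1024.0

# Hypothetical excerpt of a band table: four bands of two lines each.
EXAMPLE_EDGES = [0, 2, 4, 6]
EXAMPLE_SIZES = [2, 2, 2, 2]

print([round(u * BIN_SPACING_HZ) for u in EXAMPLE_EDGES])
```

The same function would be applied to the noise spectrum to obtain the noise density per band.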

Fig. 8. Temporal masking. Masking of noise by Gaussian impulse (see Fig. 6). In addition both reference and test signals have been filtered with low pass at 2 kHz [13].

Applying masking functions. The signal density SCb is used in the following steps to estimate the masking threshold in each band:

1) Masking within each band and between bands. Fig. 11 shows the spreading function used in NMR. For in-band masking a distance of only 3 dB is used. This seems to be in contradiction with the theory (6 dB for noise-masking tones, even more for tone-masking noise), but proved to be correct, as will be seen later. No additivity of masking is assumed. The shape of the spreading function is a worst-case approach to avoid problems with changing listening levels.

Fig. 10 shows a block diagram of the NMR technique; the steps described in this section follow that diagram.

2) Applying an estimate of the absolute threshold. The absolute threshold is assumed to be at the energy level of a signal with 1/2 LSB in one line of the FFT in each band. Only for frequencies above 12.5 kHz is a somewhat higher absolute threshold (by 12 dB) used.

3) Temporal masking. Premasking occurs within the window length used in NMR. Therefore no additional premasking is taken into account. The threshold of postmasking is estimated to decrease by 6 dB per 512-sample block for each band independently. Pre- and postmasking are estimated only very roughly.

The result of these three steps is the mask density mask_denCb.
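The three steps above can be sketched as follows, working with band levels in decibels. This is a simplified illustration and not the NMR implementation: the spreading function of Fig. 11 is replaced by two straight slopes (28 dB per band toward lower frequencies and 15 dB per band toward higher frequencies, values borrowed from the slopes reported in Section 4.1.1), each band is treated as 1 bark wide, and the absolute threshold of -96 dB is a placeholder. All names and default values are assumptions.

```python
def masking_threshold_db(signal_db, previous_mask_db=None,
                         in_band_offset_db=3.0,
                         lower_slope_db=28.0, upper_slope_db=15.0,
                         absolute_threshold_db=-96.0,
                         postmasking_decay_db=6.0):
    """Estimate the masking threshold per band for one block.

    signal_db holds the signal level per band; previous_mask_db is the
    threshold of the preceding block (or None for the first block).
    """
    n_bands = len(signal_db)
    mask_db = []
    for target in range(n_bands):
        candidates = []
        for masker in range(n_bands):
            distance = target - masker
            if distance >= 0:
                drop = upper_slope_db * distance      # target above the masker
            else:
                drop = lower_slope_db * (-distance)   # target below the masker
            candidates.append(signal_db[masker] - in_band_offset_db - drop)
        # Step 1: no additivity of masking, so only the strongest masker counts.
        level = max(candidates)
        # Step 2: the threshold never falls below the absolute threshold.
        level = max(level, absolute_threshold_db)
        # Step 3: postmasking, decaying by 6 dB per block in each band.
        if previous_mask_db is not None:
            level = max(level, previous_mask_db[target] - postmasking_decay_db)
        mask_db.append(level)
    return mask_db


first = masking_threshold_db([-100.0, -20.0, -100.0])
second = masking_threshold_db([-200.0, -200.0, -200.0], previous_mask_db=first)
print(first, second)
```

In the second call the input is silent, so the threshold of every band is carried over from the previous block and lowered by 6 dB, as described in step 3.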

Calculation of NMR and masking flag. For each band the level of the error signal is compared to the masking threshold of the band. If in at least one band of a block the masking threshold is exceeded, the so-called masking flag is set. The local NMR (NMR value of the current block) in decibels is defined as:
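As a rough illustration of this step, the sketch below computes a masking flag and a local NMR value from the noise and mask densities of one block. The flag follows the description above directly. For the local NMR, averaging the noise-to-mask ratios over all bands before converting to decibels is an assumption made here and should be checked against the original definition; the function names are hypothetical.

```python
import math


def masking_flag(noise_den, mask_den):
    """True if the noise exceeds the masking threshold in at least one band."""
    return any(n > m for n, m in zip(noise_den, mask_den))


def local_nmr_db(noise_den, mask_den):
    """Local NMR of one block in dB (assumed form: mean ratio over bands)."""
    ratios = [n / m for n, m in zip(noise_den, mask_den)]
    mean_ratio = sum(ratios) / len(ratios)
    if mean_ratio <= 0.0:
        return float("-inf")  # no noise at all in this block
    return 10.0 * math.log10(mean_ratio)


# Noise 10 dB below the threshold in both bands: flag not set.
print(masking_flag([1.0, 1.0], [10.0, 10.0]), local_nmr_db([1.0, 1.0], [10.0, 10.0]))
```

Under this assumed form, a negative local NMR means that the noise is on average below the masking threshold, while the masking flag reports whether any single band exceeds it.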

3 MEASURING A MEASUREMENT SYSTEM

For the purpose of measuring the masking threshold of NMR we used the following approach:

1) A sound file (ref) with the reference signal was created.

2) A sound file (sut) containing the sum of the reference signal and the error signal was created. Note: In this context the error signal is the test signal of the psychoacoustic experiment.

3) Ref and sut are compared by means of NMR. For calculating the "listening threshold of NMR" the masking flag is used.

3.1 Masking in the Frequency Domain

All measurements are made with reference signals and test signals in the range of 20 Hz to 20 kHz. The band-limited noise is created from white noise with FIR filters (4096 taps) with a stopband attenuation of at least 90 dB and a bandwidth according to the critical bandwidth [ The ref/sut configuration is updated every 200 ms. At each transition between succeeding configurations the masking flags of at least two blocks are ignored. For each ref/sut configuration the fraction of blocks with masking flags set is calculated to find the probability of detection. The masking threshold is interpreted as the lowest level with more than 50% of blocks having masking flags set (within the same ref/sut configuration). This models the statistical behavior of human test subjects.
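The threshold criterion described above can be sketched as follows. The sketch assumes that the ignored blocks are the first two blocks of each ref/sut configuration and that the masking flags have already been collected per test level; the function names and the example flags are hypothetical.

```python
def detection_probability(flags, skip_blocks=2):
    """Fraction of blocks with the masking flag set in one configuration.

    The first skip_blocks flags are ignored to exclude the transition
    between succeeding configurations (assumed position of the ignored blocks).
    """
    used = flags[skip_blocks:]
    if not used:
        return 0.0
    return sum(1 for flag in used if flag) / len(used)


def masking_threshold_level(flags_by_level, skip_blocks=2):
    """Lowest level whose configuration has more than 50% of flags set.

    flags_by_level maps a test level (dB) to the list of masking flags of
    the corresponding 200-ms configuration. Returns None if no level qualifies.
    """
    detected = [level for level, flags in flags_by_level.items()
                if detection_probability(flags, skip_blocks) > 0.5]
    return min(detected) if detected else None


# Invented example: flags for three consecutive 1-dB steps.
example_flags = {
    -60.0: [True, True] + [False] * 10,
    -59.0: [False, False] + [True] * 6 + [False] * 4,
    -58.0: [True] * 12,
}
print(masking_threshold_level(example_flags))
```

At a sampling rate of 44.1 kHz a 200-ms configuration spans roughly 17 blocks of 512 samples, so the detection probability is estimated from about 15 blocks once the transition blocks are removed.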

For sine tones, 12 frequency steps per octave are used. For critical-band-wide noise, three frequency steps are used. Fig. 12 shows a typical configuration for the tests in this section. Note: This figure is an example of both a tone-masking noise and a noise-masking tone setup. The only difference between the two is whether the sine tone or the noise is used as the reference signal.

3.1.1 Noise-Masking Tone

For the noise-masking tone the level of the reference signal was chosen as -16.7 dB. The test signal increases every 200 ms by 1 dB starting at -90 dB.

3.1.2 Tone-Masking Noise

For the tone-masking noise the level of the test signal was chosen as -40.8 dB. The level of the reference signal decreases by 1 dB every 200 ms beginning at -6 dB. Note that this will result in the so-called psychoacoustic tuning curve [ Both frequency and level axes look inverted as compared to the "normal" masking curves.

3.1.3 Tone-Masking Tone

For the tone-masking tone the level of the reference signal was chosen as -15 dB. The test signal increases every 200 ms by 1 dB starting at -90 dB.
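The stepped level ramps used in the three experiments above can be sketched as follows. The step size and step duration are taken from the text; the sampling rate of 44.1 kHz, the convention that 0 dB corresponds to an amplitude of 1.0, and the function names are assumptions made for illustration.

```python
import math


def ramp_level_db(sample_index, sample_rate_hz=44100,
                  start_db=-90.0, step_db=1.0, step_ms=200):
    """Level (dB) of the ramped signal at a given sample index."""
    samples_per_step = sample_rate_hz * step_ms // 1000
    return start_db + step_db * (sample_index // samples_per_step)


def ramped_sine(frequency_hz, n_samples, sample_rate_hz=44100, **ramp):
    """Sine tone whose level follows the stepped ramp (0 dB = amplitude 1.0)."""
    out = []
    for n in range(n_samples):
        amplitude = 10.0 ** (ramp_level_db(n, sample_rate_hz, **ramp) / 20.0)
        out.append(amplitude * math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz))
    return out


# Rising test tone (noise-masking tone, tone-masking tone): -90 dB, +1 dB per step.
print(ramp_level_db(0), ramp_level_db(8820))
# Falling reference (tone-masking noise): starts at -6 dB, -1 dB per step.
print(ramp_level_db(3 * 8820, start_db=-6.0, step_db=-1.0))
```

Indexing the ramp by sample number rather than by time in seconds avoids rounding problems at the step boundaries.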

3.2 Masking in the Time Domain

For masking in the time domain Gaussian impulses are used for the masker and for the test signal. Gaussian impulses provide greater energy than simple clicks. The equivalent rectangular duration of the impulses used is 56.4 µs. The centers of the impulses are used as the reference position for the time scale. Fig. 13 shows one impulse, Fig. 14 the power density spectrum of the impulse. It can be seen that the spectrum is nearly flat in its auditorily relevant parts. In order to test whether the position of the impulses in the analysis window of NMR is important for the sensitivity, eight time shifts of the test sequence (64 samples, 1.45 ms) are used. The lowest, the highest, and the average thresholds are shown in Fig. 18. The masker was chosen with maximum amplitude (16 bit). Note that for the measurements in the time domain 0 dB is defined as the maximum amplitude of a Gaussian impulse. This shifts the results by about 40 dB as compared to the measurements in the frequency domain.
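The test stimuli of this section can be sketched as follows. The sketch assumes a sampling rate of 44.1 kHz (consistent with 64 samples corresponding to 1.45 ms) and a Gaussian of the form exp(-pi (t/T)^2), for which the area under the normalized impulse equals T; interpreting the equivalent rectangular duration in this way is an assumption, as are the function names.

```python
import math


def gaussian_impulse(sample_rate_hz=44100, duration_s=56.4e-6,
                     half_width=8, peak=32767.0):
    """Samples of peak * exp(-pi * (t / duration_s)**2), centered at half_width.

    With this shape the area of the normalized impulse equals duration_s,
    one possible definition of the equivalent rectangular duration (assumed).
    """
    return [peak * math.exp(-math.pi * ((k - half_width) / sample_rate_hz / duration_s) ** 2)
            for k in range(2 * half_width + 1)]


def time_shifts(n_shifts=8, shift_samples=64):
    """Start offsets (in samples) of the shifted test sequences."""
    return [i * shift_samples for i in range(n_shifts)]


impulse = gaussian_impulse()
print(len(impulse), time_shifts())
```

Eight shifts of 64 samples span 512 samples in total, that is, exactly the distance between two successive analysis blocks, so the shifts sample every position of the impulse relative to the block grid in 64-sample steps.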

4 RESULTS

4.1 Masking in the Frequency Domain

4.1.1 Noise-Masking Tone

Fig. 15 shows the level of a sine tone just masked by critical-band-wide noise with a level of -16.7 dB and center frequencies of 250 Hz, 1 kHz, and 4 kHz. Comparable results can be obtained with maskers in the range of 100 Hz to 12 kHz. The shape of the masking curve gets flatter for center frequencies below 100 Hz. The minimum distance between masker level and test tone test subjects [8]

Fig. 14. Spectrum of Gaussian impulses used for measurements.

At high frequencies of the test tone a kind of "masking threshold in quiet" can be seen. For the calculation of the masking flag only frequencies below 16 kHz are taken into account. A sine tone above 16 kHz can only cause masking flags if its level is very high. Due to the finite frequency selectivity (finite steepness of the prototype filters) of the FFT a loud signal can produce significant components in the FFT spectrum at lower frequencies.

The masking curve to lower frequencies below 120 Hz is increased for the 250-Hz masker. This is caused by the limited frequency resolution of the filter bank. There is a significant part of the energy of the masker not only in the third and fourth bands of the NMR (172 to 258 Hz and 258 to 345 Hz [ but also in the second band (86 to 172 Hz).

The slope to lower frequencies is 28 dB/bark as proposed by Brandenburg [15]. The slope to higher frequencies is -15 dB/bark near the masker and -7 dB at a greater distance. In [ similar values are proposed in accordance with measurements in [21].

Fig. 12. Typical configuration used for masking in frequency domain. Critical-band-wide noise at 1 kHz (bandwidth 200 Hz) and sine tone at 2 kHz.

Fig. 13. Example of Gaussian impulse used for measurements.

4.1.2 Tone-Masking Noise

Fig. 16 shows some examples of tone-masking noise. A critical-band-wide noise of constant level (-40.8 dB) is masked by sine tones of different frequencies. In this measurement the amplitude of the masker was changed whereas the level of the test signal was held constant. In this figure the threshold is the level of the masker at which the noise was masked. The narrow-band noises in the figure are placed at center frequencies of 250 Hz, 1 kHz, and 4 kHz. With all other center frequencies in the range of 100 Hz to 12 kHz comparable results can be obtained. The shape of the masking curves becomes flatter for frequencies below 100 Hz. The minimum distance between masker level and test signal level depends on the center frequency of the noise: 6.7 dB at 250 Hz, 10.8 dB at 1 kHz, and 14.8 dB at 4 kHz. Measured thresholds for human test subjects are in the range of 14 dB (at 1 bark) to about 28 dB (at 23 bark) [ The slopes of the curves are in good agreement with [8].

The curve measured for the 4-kHz noise is not centered around the noise. This does not correspond to results from psychoacoustic experiments with human listeners.

4.1.3 Tone-Masking Tone

Fig. 17 shows results of the tone-masking tone. A sine tone of variable frequency is masked by a sine tone with 250 Hz, 1 kHz, or 4 kHz. With all other sine tones in the range of 100 Hz to 16 kHz (as the masker) comparable results can be obtained. The minimum distance between masker and test signal depends on the frequency of the masker: 7 dB at 250 Hz, only 2 dB at 1 kHz and 4 kHz. These differences are very small and not in accordance with psychoacoustic results as obtained by Zwicker and Fastl [7].  Beating, difference, and sum tones are not modeled.

4.2 Masking in the Time Domain

Fig. 18 shows some results on forward masking and backward masking. As expected, the forward masking is 6 dB per block. The backward masking is very short, about 53.4 dB per block. The average backward masking is very near the psychoacoustic requirements for masking caused by Gaussian impulses [ However, there is a big difference between maximum and minimum curves. This may cause a kind of uncertainty in the detection of pre-echoes. For other maskers the backward masking may be too sensitive. The forward masking fits the requirements for most kinds of maskers [ For Gaussian impulses the forward masking time should be assumed shorter.

Fig. 15. Noise-masking tone (NMR as test subject). Masking of sine tones by narrow-band noise signals at center frequencies at 250 Hz, 1 kHz, and 4 kHz and -16.7-dB level.

5 CONCLUSIONS

It has been shown that NMR is able to model the temporal resolution of the human ear in an adequate way. On the premise that the measurement technique should not be less sensitive than human perception, the time-domain behavior of NMR meets the requirements of psychoacoustic experiments.

Looking at the frequency-domain behavior, NMR can model noise-masking tones and tone-masking noise better than could be expected from its simple algorithmic structure. There are some limitations, especially at low frequencies. Beating, difference, and sum tones, which occur with tone-masking noise in particular, are not modeled in NMR.

These results are in good agreement with the experience of the users of the real-time implementation of NMR. For most kinds of signals the measurement technique can be used to avoid time-consuming listening tests. Especially for low-frequency signals, listening tests should be carried out in addition to the measurement process.

Future work on NMR will focus on the implementation of an explicit model of the tonality (to distinguish between noise and tone signals) and a masking scheme for temporal masking, which includes the temporal and frequency structure of the masker. Further improvements should also include the extension of NMR to binaural measurements.

6 REFERENCES

[1] C. Grewin and T. Ryden, "Subjective Assessments on Low Bit-Rate Audio Codecs," in Proc. 10th Int. AES Conf. (London, UK, 1991).

[2] B. Paillard, P. Mabilleau, S. Morissette, and J. Soumagne, "PERCEVAL: Perceptual Evaluation of the Quality of Audio Signals," J. Audio Eng. Soc., vol. 40, pp. 21-31 (1992 Jan./Feb.).

[3] R. Kapust, "A Human Ear Related Objective Measurement Technique Yields Audible Error and Error Margin," in Proc. 11th Int. AES Conf. (Portland, OR, 1992).

[4] K. Brandenburg and T. Sporer, "'NMR' and 'Masking Flag': Evaluation of Quality Using Perceptual Criteria," in Proc. 11th Int. AES Conf. (Portland, OR, 1992).

[5] J. G. Beerends and J. A. Stemerdink, "A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation," J. Audio Eng. Soc., vol. 40, pp. 963-978 (1992 Dec.).

[6] J. R. Stuart, "Noise: Methods for Estimating Detectability and Threshold," presented at the 94th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 41, p. 387 (1993 May), preprint 3477.

[7] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models (Springer-Verlag, Berlin, Germany, 1990).

[8] E. Zwicker and R. Feldtkeller, Das Ohr als Nachrichtenempfänger (Hirzel-Verlag, Stuttgart, Germany, 1967).

[9] B. Scharf, "Critical Bands," in Foundations of Modern Auditory Theory (Academic Press, New York, 1970), pp. 159-203.

[10] H. Scholl, "Das dynamische Verhalten des Gehörs bei der Unterteilung des Schallspektrums in Frequenzgruppen," Acustica, vol. 12, pp. 101-107 (1962).

[11] B. C. Moore, An Introduction to the Psychology of Hearing (Academic Press, London, UK, 1989).

[12] T. Sporer and H. Schröder, "Measuring Tone Masking Noise," presented at the 93rd Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 40, p. 1038 (1992 Dec.), preprint 3349.

[13] J. Spille, "Messung der Vor- und Nachverdeckung bei Impulsen unter kritischen Bedingungen," Intern. Rep., Thomson Consumer Electronics, Hanover, Germany (1992).

[14] E. Zwicker and W. Heinz, "Zur Häufigkeitsverteilung der menschlichen Hörschwelle," Acustica, vol. 5, pp. 75-80 (1955).

[15] K. Brandenburg, "Ein Beitrag zu den Verfahren und der Qualitätsbeurteilung für hochwertige Musiksignale," Ph.D. dissertation, Universität Erlangen-Nürnberg, Erlangen, Germany (1989).

[16] J. Blauert, Räumliches Hören (Hirzel-Verlag, Stuttgart, Germany, 1974).

[17] J. Herre, E. Eberlein, and K. Brandenburg, "Combined Stereo Coding," presented at the 93rd Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 40, p. 1041 (1992 Dec.), preprint 3369.

[18] K. Brandenburg, "OCF: A New Coding Algorithm for High Quality Sound Signals," in Proc. ICASSP (IEEE, New York, 1987), pp. 141-144.

[19] J. Herre, E. Eberlein, H. Schott, and C. Schmidmer, "Analysis Tool for Real Time Measurements Using Perceptual Criteria," in Proc. 11th Int. AES Conf. (Portland, OR, 1992).

[20] T. Sporer and K. Brandenburg, "Constraints of Filter Banks Used for Perceptual Measurement," J. Audio Eng. Soc., vol. 43, pp. 107-116 (1995 Mar.).

[21] E. Zwicker, Psychoakustik (Springer, Berlin, Heidelberg, New York, 1982).

[22] J. D. Johnston, "Estimation of Perceptual Entropy Using Noise Masking Criteria," in Proc. ICASSP (IEEE, New York, 1988), pp. 2524-2527.

Fig. 17. Tone-masking tone (NMR as test subject). Masking of sine tones by sine tones at 250 Hz, 1 kHz, and 4 kHz and -6-dB level.