Objective Measure of Sound Colouration in Rooms
X. Meynial, O. Vuichard
Centre Scientifique et Technique du Batiment, 24 rue Joseph Fourier, 38400 Saint Martin d'Heres, France
Summary The growing interest in assisted reverberation systems for multipurpose halls has highlighted the need for new objective measures of particular artifacts due to these systems, such as colouration caused by acoustic feedback from loudspeakers to microphones. A simple method based on the analysis of the distribution of the modulus of the frequency response is presented. The distribution is compared to the theoretical Rayleigh distribution and the difference is expressed in terms of a limited number of indices. A numerical analysis of this method is presented, and the accuracy of the indices is estimated. Results obtained from objective and subjective evaluation of colouration with measured impulse responses show that the objective indices can clearly distinguish diffusive from non-diffusive passive rooms, and are a relevant measure of colouration in active rooms equipped with assisted reverberation systems.
PACS no. 43.55.H, 43.55.J
1. Introduction
Active reverberation systems (ARS) are installed in an increasing number of venues in order to adapt the acoustic characteristics of a room to various types of programme materials ranging from speech to symphonic music.
Thanks to recent progress in digital signal processing technology, most of these systems are now quite efficient (see e.g. a review by Kleiner [1]). One of the major difficulties concerns the installation and tuning of a given system in a given auditorium. This problem is partly due to the difficulty of characterising objectively and thoroughly the acoustical quality of the result. Traditional indices such as TR, EDT, C80, etc. were devised for "passive acoustics", and do not give information on particular features of "active acoustics" such as colouration, naturalness, and localisation of the reverberation.
New indices are therefore necessary to complement the description given by traditional ones. They would be helpful tools at the design and installation stages.
Using theoretical and perceptive data, Kuttruff [2] has studied the relation between colouration and the gain of an ARS cell. The threshold of colouration detection is reached when the margin to instability is approximately 12 dB. No actual measure of colouration is proposed. Behler [3] has presented a theoretical and numerical analysis of statistical parameters of frequency responses of rooms equipped with an ARS, but does not propose a method for measuring colouration either.
Poletti [4] also analyses colouration of simulated transfer functions of rooms equipped with different ARSs by comparing their distribution with the theoretical Rayleigh distribution, but he does not present results from actual measurements.
Recently, Nielsen [5] proposed a very interesting method based on the Modulation Transfer Function to characterise the colouration due to an ARS. In view of monitoring colouration during a performance, he uses music (or speech) as an input signal. In addition, his method is compatible with the use of time variant filters in the ARS. But, as pointed out by Nielsen, the difficulty of capturing the source signal is not solved, and the measurement is quite long.
As is well known, colouration is more accurately detected when listening to impulse responses (pistol shots, MLS measurements, hand claps, etc.) than when listening to music or speech. In the following, we describe the approach we derived for evaluating colouration from impulse responses (IRs) measured in rooms.
Note that in this paper, we shall talk about impulse responses even when the system is not strictly time-invariant, as for rooms equipped with an ARS which uses time-variant filters.
2. Principle
Let h(t) be an impulse response captured in a room, and H(ω) the Fourier transform of h(t). In a passive diffusive room, the modulus |H| of the frequency response between a source and a receiver situated in the reverberant field is distributed according to a Rayleigh law, as explained by Kuttruff [6]. The ratio of the standard deviation σ of |H| to the mean value m of |H| is
σ/m = √(4/π − 1) ≈ 0.523
Applying this result to a measured response implies time windowing the impulse response h(t) in order to
 remove the beginning, which obviously does not correspond to the diffuse field hypothesis,
 remove the end, which is always contaminated by the background noise.
Let [t1; t2] be the time window. Time t1 should be greater than the mixing time tm [7]. As mentioned by Kuttruff, the mean distance between adjacent peaks in |H| is 4/T. Consequently, the time window [t1; t2] should be such that the frequency resolution is sufficient:
t2 − t1 > T/4
This duration corresponds to a 15 dB decay in the IR. It makes sense to set the time window relative to the decay rate. In the following, we shall denote by d1 and d2 the decay values corresponding to times t1 and t2, respectively. The choice of the time window is discussed further in section 3. Let hw(t) be the windowed IR:
hw(t) = h(t) for t1 ≤ t ≤ t2, and hw(t) = 0 otherwise.
Colouration is of course better detected when listening to the late part of the IR. The ear performs a kind of "autogain" operation so that it detects even very low energy colouration in the tail of the room response to extinction transients. We have therefore chosen to analyse the amplitude spectrum of the IR after it has been weighted by a function which compensates for the decay, so that the signal level is constant over time. This operation improves the sensitivity of our set of indices (see below). The signal is now
hww(t) = hw(t) exp(6.91 t/T)
where T is the reverberation time. Of course, this decay compensation is not perfect in practice because the decay constant is frequency dependent, or the decay may be double-sloped. However, experience has shown that the uncertainty on the compensation is not critical.
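As an illustration of the decay compensation, the following minimal Python sketch applies the weighting exp(6.91 t/T) to a synthetic signal. The test signal (Gaussian noise with an ideal single-slope exponential decay), the sampling rate and the value of T are assumptions chosen for the example, not measured data.

```python
import math
import random

def compensate_decay(h, fs, T):
    """Multiply each sample by exp(6.91 t / T), where t = n / fs."""
    k = 3.0 * math.log(10.0)  # 6.9078..., i.e. 60 dB of amplitude decay over T
    return [x * math.exp(k * n / (fs * T)) for n, x in enumerate(h)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

# Hypothetical test signal: white Gaussian noise with an ideal exponential decay.
random.seed(1)
fs, T = 8000, 1.0          # assumed sampling rate (Hz) and reverberation time (s)
n = 8000
k = 3.0 * math.log(10.0)
h = [random.gauss(0.0, 1.0) * math.exp(-k * i / (fs * T)) for i in range(n)]

hc = compensate_decay(h, fs, T)

# Level of the second half relative to the first half, before and after compensation.
raw_change_db = 20.0 * math.log10(rms(h[n // 2:]) / rms(h[:n // 2]))
comp_change_db = 20.0 * math.log10(rms(hc[n // 2:]) / rms(hc[:n // 2]))
```

Before compensation the second half of this signal lies about 30 dB below the first half; after compensation the two halves have nearly the same level, which is the "constant level over time" property described above.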
Then, the spectrum Hww(ω) of hww(t) should be equalized in order to compensate for the effect of the frequency response of the transducers used for measurements and for the acoustical properties of the room on a large frequency scale (e.g. dependence of mean absorption upon frequency):
|Hww(ω)| = M(ω) G(ω)   (1)
where M is the above-mentioned slowly varying "mean" value, G is the rapidly varying part, and ω is the angular frequency. M can be estimated by a logarithmic smoothing of |Hww| of width 1/3rd to 1/10th octave typically (see next section). Figure 1 shows an example of a 0.15 octave wide smoothing performed on an IR captured with the MLS technique. It illustrates how necessary this smoothing can be. Then, as the mean of G equals unity by construction, the distribution of |Hww| can be examined by simply looking at the distribution of G(ω). If |H| is Rayleigh distributed, the standard deviation of G equals 0.523. A larger value is obtained if there is colouration (smaller values can be obtained only if there is a high energy direct sound within the time window).
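The splitting of the modulus into a slowly varying part M and a rapidly varying part G can be sketched as follows in Python. This is only one possible reading of "logarithmic smoothing": a running mean over a band of constant width in octaves. The function name `octave_smooth`, the synthetic spectrum (Rayleigh samples multiplied by an arbitrary spectral tilt) and the 1 Hz frequency grid are assumptions made for the example.

```python
import math
import random

def octave_smooth(mag, freqs, width_oct):
    """Mean of mag over a band width_oct octaves wide centred on each frequency.

    freqs must be positive and sorted in increasing order.
    """
    prefix = [0.0]
    for v in mag:
        prefix.append(prefix[-1] + v)
    half = 2.0 ** (width_oct / 2.0)
    out, lo, hi = [], 0, 0
    for f in freqs:
        while freqs[lo] < f / half:
            lo += 1
        while hi < len(freqs) and freqs[hi] <= f * half:
            hi += 1
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out

def split_stats(mag, freqs, width_oct):
    """Return (mean of G, standard deviation of G) with G = mag / M."""
    M = octave_smooth(mag, freqs, width_oct)
    G = [a / m for a, m in zip(mag, M)]
    mean = sum(G) / len(G)
    std = math.sqrt(sum((g - mean) ** 2 for g in G) / len(G))
    return mean, std

# Hypothetical spectrum: Rayleigh-distributed modulus times a slow tilt.
random.seed(2)
freqs = [50.0 + k for k in range(3951)]            # 50 Hz to 4 kHz, 1 Hz steps
mag = [(f / 1000.0) ** -0.5 * math.hypot(random.gauss(0, 1), random.gauss(0, 1))
       for f in freqs]

mean_wide, std_wide = split_stats(mag, freqs, 0.2)       # 0.2 octave
mean_narrow, std_narrow = split_stats(mag, freqs, 0.02)  # far too narrow
```

With the 0.2 octave width, the mean of G stays close to 1 and its standard deviation close to the Rayleigh value despite the tilt. With the much narrower band, M starts to follow the individual samples, so the standard deviation of G is underestimated.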
Figure 1. Modulus of the frequency response obtained by FFT of the late part of the decay of an IR captured in an amphitheatre with the MLS technique (grey). A 0.15 octave smoothing is also shown (black). Vertical: arbitrary scale.
M (ω) is linked to a broadband aspect of colouration, maybe something like "timbre". This aspect can probably be accounted for by traditional indices such as reverberation time or loudness as a function of frequency for example. We shall not talk further about it here.
A frequency windowing must now be applied to G(ω):
Gw(ω) = G(ω) for f1 ≤ ω/2π ≤ f2
Frequency f1 should be greater than or equal to the Schroeder frequency of the room (fs = 2000 √(T/V), where V is the volume of the room), but fs is usually so small that there are only very few frequency samples below it. Frequency f2 should be less than or equal to the cutoff frequency of the transducers used for measurements.
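To give an order of magnitude, the short Python sketch below evaluates the Schroeder frequency for illustrative values (T = 1.1 s, V = 6000 m³, of the order of a large hall) and counts the FFT bins that fall below it for a 16384-sample window at 22.05 kHz. These numerical values are assumptions for the example.

```python
import math

def schroeder_frequency(T, V):
    """Schroeder frequency in Hz for reverberation time T (s) and volume V (m^3)."""
    return 2000.0 * math.sqrt(T / V)

f_schroeder = schroeder_frequency(1.1, 6000.0)      # about 27 Hz

# Frequency resolution of a 16384-sample window at 22.05 kHz (assumed values).
window_duration = 16384 / 22050.0                   # about 0.74 s
bin_spacing = 1.0 / window_duration                 # about 1.35 Hz
bins_below_schroeder = int(f_schroeder / bin_spacing)
bins_up_to_4khz = int(4000.0 / bin_spacing)
```

In this example only about 20 of the roughly 2970 bins up to 4 kHz lie below the Schroeder frequency, which illustrates why the exact choice of f1 has little effect on the statistics.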
Finally, the distribution of Gw is examined in terms of its fit with a Rayleigh law. Let D be the backward integrated histogram (BIH) of Gw: point (x, y) of D means that x percent of the samples in Gw exceed y. Let E be the quadratic error between D and the backward integrated histogram DR of a theoretical Rayleigh distribution R.
Figure 2. Colouration indices versus time window position for an uncoloured IR "a" (o) and a slightly coloured IR "b" (+). Window width is 0.74 s (16384 samples). Horizontal: upper boundary of the time window (t2). Smoothing width used is 1/33rd octave. Frequency window is [50 Hz; 4 kHz].
We shall also look at the standard deviation σG of Gw. Note that E and σG are not completely redundant: E tells to what extent the distribution of Gw differs from R, and σG tells about the nature of the difference. Finally, we shall also consider index L1%, which is the level (in dB) exceeded by 1 percent of the samples in Gw, and index Lmax, which is the maximum value of Gw (in dB).
If the signal is not coloured (i.e. Gw is Rayleigh distributed), then
E = 0, σG = 0.523, L1% = 7.62 dB, Lmax ≈ 10.5 dB
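The indices can be sketched in Python as follows, and checked on simulated Rayleigh data. Two points are assumptions of this sketch rather than statements of the method: the exact form of E is not given in the text, so a mean squared difference between the two BIHs (in percent squared) is used as a stand-in, and L1% is taken directly from the sorted samples rather than read off the histogram. For a Rayleigh law of unit mean, the BIH is 100 exp(−π y²/4).

```python
import bisect
import math
import random

def rayleigh_bih(y):
    """Percentage of samples exceeding y for a Rayleigh law of unit mean."""
    return 100.0 * math.exp(-math.pi * y * y / 4.0)

def colouration_indices(G, step=0.05, bins=200):
    """Return (sigma_G, E, L1%, Lmax) for samples G of unit mean."""
    n = len(G)
    s = sorted(G)
    mean = sum(s) / n
    sigma = math.sqrt(sum((g - mean) ** 2 for g in s) / n)
    l1 = 20.0 * math.log10(s[int(0.99 * n)])   # level exceeded by about 1 % of samples
    lmax = 20.0 * math.log10(s[-1])
    # Backward integrated histogram: percentage of samples exceeding k * step.
    levels = [k * step for k in range(bins + 1)]
    bih = [100.0 * (n - bisect.bisect_right(s, y)) / n for y in levels]
    # ASSUMED form of E: mean squared difference between the two BIHs.
    err = sum((d - rayleigh_bih(y)) ** 2 for d, y in zip(bih, levels)) / len(levels)
    return sigma, err, l1, lmax

# Simulated uncoloured case: modulus of complex Gaussian samples, unit mean.
random.seed(3)
raw = [math.hypot(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20000)]
m = sum(raw) / len(raw)
G = [r / m for r in raw]
sigma, err, l1, lmax = colouration_indices(G)

# Closed-form Rayleigh values for comparison.
sigma_ref = math.sqrt(4.0 / math.pi - 1.0)                                 # 0.5227
l1_ref = 20.0 * math.log10(math.sqrt(4.0 * math.log(100.0) / math.pi))     # about 7.7 dB
```

The closed-form 1 % level is within about 0.1 dB of the value quoted above. The expected maximum depends on the number of samples, so Lmax from this simulation need not match the quoted 10.5 dB.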
Processing an IR involves the following steps:
a. the IR is filtered in the [f1; f2] frequency band;
b. the signal amplitude is integrated over 50 ms (this somewhat arbitrary value is not critical and yields adequately smooth decays);
c. the reverberation time T is estimated, and the dynamic range is checked: if the decay level d2 does not exceed the background noise by at least a margin m, the calculation is aborted;
d. times t1 and t2 corresponding to decay values d1 and d2 are calculated;
e. the IR is multiplied by exp(6.91 t/T) to compensate for the decay;
f. an FFT is performed;
g. the spectrum is smoothed in order to obtain Gw;
h. the distribution of Gw is analysed, and the indices (E, σG, L1%, Lmax) are computed.
Note that the histogram of Gw is required for the calculation of E and L1% only. The choice of the time window and other parameters is discussed in the next section.
3. Numerical analysis
The method described above uses several parameters whose influence should be analysed. These parameters are: the width and position of the time window [t1; t2], the signal-to-noise margin m, the smoothing width, the frequency window [f1; f2], and the number of bins used in the BIH of Gw.
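Steps d to h of the processing chain listed at the end of section 2 can be sketched in Python as below. This is a simplified illustration under stated assumptions, not a reference implementation: steps a to c are skipped (T is taken as known and the decay is assumed ideal), the FFT is a plain recursive radix-2 routine, the smoothing is a running mean over a band of constant octave width, and both test signals are synthetic. The "ringing" signal adds one deliberately exaggerated, slowly decaying 1 kHz tone to mimic a mode with much lower damping than the others.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [e + t for e, t in zip(even, tw)] + [e - t for e, t in zip(even, tw)]

def octave_smooth(mag, freqs, width_oct):
    """Mean of mag over a band width_oct octaves wide around each frequency."""
    prefix = [0.0]
    for v in mag:
        prefix.append(prefix[-1] + v)
    half = 2.0 ** (width_oct / 2.0)
    out, lo, hi = [], 0, 0
    for f in freqs:
        while freqs[lo] < f / half:
            lo += 1
        while hi < len(freqs) and freqs[hi] <= f * half:
            hi += 1
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out

def sigma_g(h, fs, T, d1=15.0, d2=35.0, f1=50.0, f2=4000.0, width_oct=0.2):
    """Return (sigma_G, mean of G) for an IR h sampled at fs, with T assumed known."""
    # Step d: window limits, assuming an ideal exponential decay starting at t = 0.
    n1, n2 = int(d1 * T / 60.0 * fs), int(d2 * T / 60.0 * fs)
    # Step e: decay compensation of the windowed part.
    k = 3.0 * math.log(10.0)
    seg = [h[n] * math.exp(k * n / (fs * T)) for n in range(n1, n2)]
    # Step f: zero-pad to a power of two and transform.
    nfft = 1
    while nfft < len(seg):
        nfft *= 2
    spec = fft(seg + [0.0] * (nfft - len(seg)))
    df = fs / nfft
    idx = [i for i in range(1, nfft // 2) if f1 <= i * df <= f2]
    freqs = [i * df for i in idx]
    mag = [abs(spec[i]) for i in idx]
    # Step g: split the modulus into smooth part M and fluctuating part G.
    M = octave_smooth(mag, freqs, width_oct)
    G = [a / m for a, m in zip(mag, M)]
    # Step h (sigma_G only).
    mean = sum(G) / len(G)
    return math.sqrt(sum((g - mean) ** 2 for g in G) / len(G)), mean

# Hypothetical signals: diffuse decay, and the same decay plus one ringing tone.
random.seed(4)
fs, T = 16000, 1.0
k = 3.0 * math.log(10.0)
diffuse = [random.gauss(0.0, 1.0) * math.exp(-k * n / (fs * T))
           for n in range(int(0.7 * fs))]
ringing = [x + 0.5 * math.exp(-k * n / (fs * 3.0)) * math.sin(2 * math.pi * 1000.0 * n / fs)
           for n, x in enumerate(diffuse)]

sigma_diffuse, mean_diffuse = sigma_g(diffuse, fs, T)
sigma_ringing, _ = sigma_g(ringing, fs, T)
```

In this sketch the diffuse signal gives a standard deviation near the Rayleigh value, while the added tone raises it markedly. A real measurement would also require the band filtering, the 50 ms integration, the estimation of T and the dynamic range check of steps a to c.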
Figure 2 shows σG, E, L1% and Lmax as a function of the position of the time window for two IRs. Impulse response "a" is uncoloured, and impulse response "b" is subjectively slightly coloured (see section 5). These IRs were measured with a pistol shot and an omnidirectional electret microphone in a 1200-seat auditorium. The sampling rate was 22.05 kHz. The dynamic range is about 60 dB, as can be seen in Figure 3, which shows the 50 ms integrated decay for the same two IRs (80 dB peak-to-noise dynamic range). Reverberation time RT30 equals 1.27 s for IR "a" and 1.74 s for IR "b". In Figure 2, the horizontal axis gives the upper boundary of the time window (t2).
It can be seen for the uncoloured IR "a" that when the time window approaches the background noise, the indices deviate from their normal values, which indicates that the background noise distribution is not Gaussian. The limit is around t2 ≈ 1.3 s, for which the background noise level is about 10 dB below the level of the decay (see Figure 3). The minimum margin m is therefore set at 10 dB. It is also clear in Figure 2 that for the coloured IR "b", the indices reveal colouration more clearly as the time window is shifted towards the late part of the IR. The time window should therefore be chosen as late as possible in the IR, but not so late that the signal is contaminated by the background noise.

The decay compensation operation obviously makes the process very sensitive to background noise. Index E has a more chaotic behaviour than σG (even for a larger number of bins in the histogram), and its evolution versus the time window position is not as smooth as that of σG. Computation of E is slower than computation of σG because σG does not require calculating the BIH of Gw. Furthermore, the unit of E is difficult to interpret.
Index L1% does not distinguish the two IRs as clearly as σG because of a larger uncertainty. The quantization effect visible on L1% values results from the number of bins used in the BIH. Increasing the number of bins reduces the quantization steps (and increases the computation time) but not the uncertainty on L1% values. The Lmax index behaves very similarly to σG, but its dependence upon the time window position is a bit more chaotic.
Figure 4 shows the influence of the time window width on σG. The IR used is impulse response "b". It can be seen that the uncertainty on σG (i.e. local variance) rises for window widths smaller than one third of the reverberation time or so. The effect of the window width on the other indices is similar to the effect on σG.
Considering the discussion above, we think that d1 = 15 dB and d2 = 35 dB (relative to the maximum value of the 50 ms integrated decay) is a sensible choice both for pistol shot measurements and MLS measurements. With m = 10 dB, this value of d2 requires at least 45 dB of dynamic range for the integrated IR, i.e. about 60 to 65 dB peak dynamic range. This is easy to achieve with pistol shots, but requires good quality MLS measurements. It would have been possible to place the window later in the decay for (high dynamic range) pistol shot measurements, which would have resulted in higher sensitivity of the colouration indices, but we prefer to keep the same set of parameters for pistol shots and MLS measurements. In addition, as will be seen later, the sensitivity of the colouration indices obtained with this set of parameters is adequate.
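The arithmetic behind this choice can be checked with a few lines of Python. The sketch assumes an ideal single-slope decay of 60 dB over T, so that a decay of d dB is reached at time d·T/60 after the maximum; the value T = 1.27 s is the RT30 quoted for IR "a".

```python
def window_times(T, d1=15.0, d2=35.0):
    """Times (s) at which an ideal decay of reverberation time T has dropped by d1 and d2 dB."""
    return d1 * T / 60.0, d2 * T / 60.0

T = 1.27                               # RT30 of IR "a" in seconds
t1, t2 = window_times(T)               # about 0.32 s and 0.74 s after the maximum
width = t2 - t1                        # 20 dB of decay, i.e. T / 3
meets_resolution = width > T / 4.0     # condition t2 - t1 > T/4
required_range_db = 35.0 + 10.0        # d2 + m for the integrated IR
```

With d1 = 15 dB and d2 = 35 dB the window always spans T/3, which satisfies the frequency resolution condition t2 − t1 > T/4 whatever the reverberation time.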
Figure 5 shows the influence of the smoothing width upon σG and the mean value of Gw. One can see that for smoothing widths smaller than 0.1 octave, σG decreases very rapidly because of a bad splitting of |Hww(ω)| into M(ω) and G(ω) (equation 1). An excessively large smoothing width also causes a bad splitting, which results in an overestimation of σG. Index σG varies by only 0.01 for a smoothing width within the range [0.15; 0.35] octaves. Note that the efficiency of the smoothing can be checked by looking at the average of Gw, which should be very close to 1: if it goes below 0.98 or so, the smoothing is not good.
On the other hand, the smoothing width must be kept as low as possible, especially if the spectrum of the sound source is not smooth (i.e. for MLS measurements made with loudspeakers). The smoother the amplitude spectrum of the source, the more efficient the smoothing, and the lower the sensitivity to the smoothing width value. This is one of the reasons why we prefer using pistol shot measurements to MLS measurements (which use loudspeakers). A smoothing width around 0.2 octaves is a good choice both for pistol shot and MLS measurements.
The choice of the lower frequency f1 of the frequency window is not critical, provided of course that it is smaller than the lowest colouration frequencies of interest. f1 = 50 Hz is always adequate in practice.
The upper frequency f2 of the frequency window should be chosen high enough so that it includes all colourations that are encountered in practice, but not too high, because the dynamic range is limited at high frequencies. In addition, including many points at very high frequencies, where colouration is unlikely to appear, brings many Rayleigh-distributed points into the statistics, which lowers the sensitivity of the indices to colouration. f2 = 4 kHz is a sensible choice.
Results presented in this paper were obtained with a 200-bin histogram dividing the amplitude of Gw into 0.05 steps (so that the last bin corresponds to 10, i.e. 20 dB). Values obtained are not significantly altered by the number of bins as long as it is not less than 100 or so. Choosing a very large number of bins will lengthen the computation time without any benefit.
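The figures quoted for the histogram can be verified with a short Python calculation, which also gives the size in dB of one amplitude step near the Rayleigh 1 % level. The evaluation point of 2.40 (in units of the mean) is an assumption chosen close to that level.

```python
import math

step, bins = 0.05, 200
top_level = step * bins                          # upper edge of the last bin: 10
top_level_db = 20.0 * math.log10(top_level)      # 20 dB

# Size of one histogram step, in dB, around an amplitude of 2.40 (assumed point).
quantization_db = 20.0 * math.log10((2.40 + step) / 2.40)
```

One step is about 0.18 dB at that amplitude, which gives an idea of the size of the quantization steps of L1% when it is read from a histogram with 0.05 bins.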
Considering the results in this section, the set of parameters used in the following is:
decay window: d1 = 15 dB; d2 = 35 dB;
signal-to-noise margin: m = 10 dB;
smoothing width: 0.2 octave;
200-bin histogram, with step 0.05 (affects E and L1% only);
frequency window: f1 = 50 Hz; f2 = 4 kHz.
The uncertainty (i.e. local variance) on the indices, estimated from the plots of Figure 2 around t2 = 1.1 s, is
±0.01 for σG, ±1.5 for E, ±0.2 dB for L1%, and ± 0.5 dB for Lmax,
and is essentially the same with the recommended set of parameters given above. These values apply in the usual range of variation of the indices. They may increase for very coloured IRs.
4. Results
If the room comprises time-variant elements such as those frequently used in ARSs, the IR must be measured using an impulse source rather than steady noise like MLS (or any other method assuming a time-invariant system). Furthermore, as stated above, pistol shots generally have the advantage of a smoother amplitude spectrum than most loudspeakers, which makes the smoothing more efficient and less sensitive to the smoothing width value. Finally, it is generally quite easy to achieve higher dynamic ranges on IRs obtained from pistol shots than those commonly obtained with MLS. Thus, we recommend pistol shots as sources for measuring colouration. However, both types of sources will be used in the next two sections.
4.1. Passive rooms
Figure 6 shows the BIH of Gw for different rooms, and the theoretical Rayleigh distribution R. Table I lists the values of σG, E, and L1% obtained from these plots (the IR # will be used in section 5).
It is remarkable that the results for the auditorium (T ≈ 1.1 s) fit the Rayleigh distribution better than those for the reverberation chamber (parallelepipedic, with some absorbent inside: T ≈ 2.8 s), which is consistent with the subjective evaluation, as will be seen in the next section. The "2D chamber" case is an example of a non-diffusing room: it is basically a small parallelepipedic room with two absorbing walls facing each other (the acoustic field is therefore essentially two-dimensional). In this case, the BIH (Figure 6) clearly reveals something "abnormal", especially in its lower part; the values of the indices are even more eloquent. These IRs were measured with a loudspeaker and the MLS technique, and an omnidirectional electret microphone.
4.2. Active rooms
As explained by Kuttruff (see Figures 3 and 5 in [2]), in an ARS, acoustic feedback from loudspeakers to microphones tends to raise the ratio σ/m of |H|. As the loop gain of the channels of the ARS is increased and approaches instability, colouration (ringing tones) is perceived. Table II shows the values of σG, E, L1% and Lmax obtained for different values of the loop gain in a 1200-seat multipurpose hall equipped with a 24-channel ARS (V = 6000 m³, passive RT = 1.1 s). GBI is the gain before instability (in dB) of the ARS. As indicated by the time-variance column, time variance (phase modulation) was used in the last IR. The IRs were measured with pistol shots and an omnidirectional electret microphone. IRs #4 and #7 correspond to the IRs noted "a" and "b" respectively in section 3.
Data in Table II show that all four indices reveal colouration very clearly: their variation is much larger than the uncertainty as estimated in section 3. A standard deviation σG of 0.54 reliably indicates that the IR is slightly coloured. Index σG is again the best indicator, as its evolution with the GBI is very smooth and monotonic.
Note that the colouration indices of IR #10 lie somewhere between those of IR #6 and IR #7. In other words, this time variance enables boosting the loop gain of the ARS by 2 to 3 dB, which is very consistent with what can be found in the literature [1].
5. Subjective test
The colouration of all three IRs from section 4.1 (MLS measurements) and seven IRs from section 4.2 (pistol shot measurements) was rated from 0 to 100% by 10 untrained subjects. The IRs were pink filtered and presented to the subjects on headphones. Subjects were first taught the difference between timbre and colouration, that 0% colouration corresponds to a normal uncoloured IR, and that 100% colouration corresponds to a tone ringing with an infinitely long decay (threshold of stability). Each subject taking the test was free to listen to the IRs as many times as he wished, and in the order he wished. Total test duration was of the order of 15 minutes per subject.
Results are presented in Figure 7, which shows the subjective evaluation vs. the objective index σG. Error bars correspond to the standard deviation of the replies among the subjects. The correlation between objective index σG and subjective evaluation is very good (correlation coefficient = 0.91), except for IR #2. This IR is judged as coloured, but not as much as the value of σG suggests. This is probably linked to the nature of the colouration, which is due to the low mode density in the 2D chamber. For the other coloured IRs, the mode density is high, with some modes having significantly lower damping than others.
IR #10 is also not quite as well correlated as the others because subjects were confused by the artefact of the time variance used, which was slightly audible. This artefact is in our vocabulary a matter of "naturalness" of the reverberation, rather than colouration of the reverberation, but subjects were not sufficiently familiar with these subtleties. The fact that IRs #2 and #10 have the largest error bars could indicate that the subjects were not very comfortable with them. The correlation of subjective rating with the other three indices (not shown here) is not quite as good as with σG, although L1% is almost as good.
It is worth noting that the IR of the reverberation chamber (which should be Gaussian, i.e. not coloured) is marked as being slightly coloured, which is very consistent with the values of the objective indices.
Finally, concerning the relation between perceived colouration and GBI in ARSs, it seems that a GBI of 5 dB corresponds to the threshold of perception of colouration. It must be noted that this figure should not be compared with the 12 dB threshold obtained by Kuttruff [2], as he only considered a single channel. In our case, the gain applies to all 24 cells.
6. Conclusion
We have presented a simple and fast method for characterising colouration of room responses in a wide frequency band. An analysis of the distribution of the modulus of the frequency response obtained from the late part of the impulse response is conducted, and several indices are defined. This method involves a few parameters whose influence has been analysed, especially the choice of the time window, which is the most important one. A set of parameter values has been proposed, which contains a certain part of arbitrary choice (as do most traditional room acoustic indices), but the method is nevertheless robust.
The indices have been shown to be a relevant measure of colouration, as evidenced by the subjective evaluation presented. Diffusive rooms (Rayleigh distributed) are clearly distinguished from non-diffusive ones, and colouration in active rooms (e.g. due to acoustic feedback in an ARS) is efficiently assessed. Index σG is the best one because it has a small uncertainty and a good correlation with subjective rating of colouration. Index L1% is almost as good, but it has a larger uncertainty and the disadvantage of a longer calculation because it requires computing the BIH of Gw. A more extensive subjective investigation is now needed to check the limits of this method. For example, the effect of the frequency dependence of the colourations should be analysed. Finally, other artefacts of ARSs such as naturalness and localisation of the reverberation should also be studied, and objective indices derived.
References
[1] M. Kleiner: Review of active systems in room acoustics and electroacoustics. Proc. Active 95, Newport Beach, CA, 1995, 39-54.
[2] H. Kuttruff, N. Hesselmann: Zur Klangfärbung durch akustische Rückkopplung bei Lautsprecheranlagen. Acustica 36 (1976) 105-112.
[3] G. Behler: Untersuchungen an mehrkanaligen Lautsprecheranlagen zur Verlängerung der Nachhallzeit in Räumen. Acustica 69 (1989) 95-108.
[4] M. A. Poletti: Colouration in assisted reverberation systems. Proc. ICASSP, Adelaide, 1994.
[5] J. L. Nielsen: Detection of colouration in reverberation enhancement systems. Proc. Active 95, Newport Beach, CA, 1995, 1213-1222.
[6] H. Kuttruff: Room acoustics. Elsevier, 1994, 3rd edition.
[7] J. D. Polack: Modifying the chambers to play billiards, or the foundations of reverberation theory. Acustica 76 (1992) 257-270.