2. Audibility of Phase Distortion in Audio Signals

The physiological basis of how phase changes in audio signals may be perceived is now presented. The physiological foundations and previous research on the audibility of phase distortion may provide further insight into the human temporal auditory process. Furthermore, an examination of previous research results assists in the design of further research.

2.1 Physiological Foundations

Figure 2.1

Fig. 2.1. The external, middle, and inner ears in man. [J. O. Pickles, An Introduction to the Physiology of Hearing (Academic Press, New York, 1982), pp. 11, Fig. 2.1]

Fig. 2.1 shows the outer, middle, and inner ear in humans. The external portion of the ear is the pinna. The "hole in the head" is the external auditory meatus. The tympanic membrane separates the external and middle ear. In humans, resonances of the external ear increase the sound pressure at the tympanic membrane in the frequency range of 2 to 7 kHz, as shown in Fig. 2.2.

Figure 2.2

Fig. 2.2. The average pressure gain of the external ear. The gain in pressure at the eardrum over that in the free field is plotted as a function of frequency, for different orientations of the source in the horizontal plane ipsilateral to the ear. Zero degrees is straight ahead. [J. O. Pickles, An Introduction to the Physiology of Hearing (Academic Press, New York, 1982), pp. 12, Fig. 2.2]

The Eustachian tube helps the eardrum move evenly by maintaining equal air pressure on both sides of the tympanic membrane. Three small bones, the malleus, incus, and stapes, connect the tympanic membrane to the oval window, which separates the middle and inner ear.

The motions of the oval window create wavelike movements in the fluid contained in the cochlea, called the perilymph. The basilar membrane vibrates in response to these fluid movements. Where, when, and to what degree the basilar membrane is excited provides the foundation for auditory sensation.

2.2 Temporal Processes

Timbre [2] is a multidimensional perceived attribute of tone that differentiates tones of identical pitch, loudness, and duration. It is influenced by the steady-state waveform, by transient characteristics (especially the onset), and by slower spectral changes over a series of tones. For example, a piano and a trumpet can play the note A440 with identical frequency, sound pressure, and duration, yet sound clearly different. Although it was once believed, in accordance with Ohm’s acoustical law [2], that the human ear is "phase deaf," more recent research has shown that relative phase has subtle effects on timbre, particularly when phase relationships change within a continuously sounding tone.

For timbral sensation, the onset and other transient characteristics of the dynamic waveform are especially important. The onset is the opening portion of a tone, in which the energy supplied exceeds the energy expended. Tones produced by continuous excitation of the vibrating source, such as a blown reed or mouthpiece or a bowed string, have an onset followed by a steady-state section in which the energy supplied and the energy expended are largely in balance. Tones produced by impulsive excitation of the vibrating source, such as plucked-string and piano tones, have no steady state. A tone concludes with the offset, or decay, in which the energy expended exceeds any energy supplied.

A vibrating system cannot reach a steady-state vibration pattern instantaneously, and onset times vary among instruments: the trumpet has an onset time of about 20 msec, while the flute requires 200 to 300 msec. Even onset times this short in absolute terms are significant in the perception of timbre. This was demonstrated in an experiment [4] in which the initial portion of each tone was removed: even experienced musicians had considerable difficulty identifying common orchestral instruments.

Most music consists of superpositions of complex tones, and the timbres of two (or more) simultaneous complex tones are discriminated at a higher level of the auditory nervous system. Fig. 2.3 shows a hypothetical superposition of complex tones: a monaural signal composed of instrument 1 playing exactly the note A4 and instrument 2 playing exactly A5, one octave higher, at similar intensity levels. The length of each vertical bar represents the total intensity of that harmonic actually reaching the ear.
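The overlap of partials in Fig. 2.3 can be checked with simple arithmetic. The sketch below assumes ideal harmonic partials at exact integer multiples of 440 Hz and 880 Hz (real instruments are only approximately harmonic), and the helper `harmonics` is hypothetical. It shows that every partial of the upper tone falls on an even-numbered partial of the lower one, which suggests why spectral content alone may not separate the two instruments.

```python
# Ideal harmonic partials (an assumption: real instruments are only approximately harmonic).
def harmonics(f0_hz, n_partials):
    """Return the first n_partials integer multiples of f0_hz."""
    return [f0_hz * k for k in range(1, n_partials + 1)]

# Instrument 1 plays A4, instrument 2 plays A5 (one octave higher).
a4 = harmonics(440.0, 20)
a5 = harmonics(880.0, 10)

# Every partial of A5 coincides with an even-numbered partial of A4,
# so the combined spectrum contains no partial that belongs to A5 alone.
shared = sorted(set(a4) & set(a5))
a5_only = sorted(set(a5) - set(a4))

print("shared partials (Hz):", shared)
print("partials unique to A5:", a5_only)
```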

Figure 2.3

Fig. 2.3. Resulting spectrum of two complex tones of different timbre (spectrum) an octave apart. [J. G. Roederer, The Physics and Psychophysics of Music: An Introduction (Springer-Verlag, New York, 1995), pp. 163, Fig. 5.2]

Although the mechanism of human timbre discrimination is not yet well established, time does seem to play a key role. Neither the attacks nor the buildups of two supposedly simultaneous tones are ever exactly synchronized, especially if the tones come from separate sources (stereo effect). During the onset transient of a tone, the brain seems able to lock onto characteristic features of each instrument’s vibration pattern and keep track of them, even though they are blurred by the other instrument. Periodic variations in pitch (vibrato) may also provide important cues for discriminating tone quality.

Figure 2.4

Fig. 2.4. Period histograms of a fiber activated by a low-frequency tone show that spikes are evoked in only one half of the cycle. The histograms have been fitted with the sinusoid of best-fitting amplitude but fixed phase. Note that although the number of spikes increases little above 70 dB SPL, meaning that the firing is saturated, the histogram still follows the sinusoid without any tendency to square. [J. O. Pickles, An Introduction to the Physiology of Hearing (Academic Press, New York, 1982), pp. 83, Fig. 4.8]

Auditory nerve fibers [3] provide a direct synaptic connection between the hair cells of the cochlea and the cochlear nucleus. In humans, there are about 30,000 of these fibers. Above about 5 kHz, the fibers fire with equal probability in every part of the stimulus cycle. At lower frequencies, however, the spike discharges are clearly locked to one phase of the stimulating waveform. This phase-locking can be shown by means of a period histogram (Fig. 2.4). To construct a period histogram, the time of occurrence of each spike discharge is plotted, with the time axis reset in every cycle at a constant point on the stimulus waveform, for example at the positive-going zero crossings. As Fig. 2.4 shows, the period histogram follows a half-wave rectified version of the stimulus waveform. If hair cell activation is linked directly to the mechanical events, it is reasonable to assume that the half cycle that evokes spikes corresponds to deflection of the cochlear partition in the effective direction. Deflection in the opposite direction reduces the spontaneous activity of the nerve fiber.
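The construction of a period histogram can be illustrated with a toy simulation. The sketch below assumes, purely for illustration, that a fiber's instantaneous firing probability follows the half-wave rectified stimulus; the values of `fs`, `f_tone`, and `peak_rate` are hypothetical, not physiological measurements. Each spike time is folded modulo the stimulus period, which is the per-cycle reset of the time axis described above.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 100_000.0          # simulation sample rate (Hz), an assumption
f_tone = 500.0          # low-frequency stimulus (Hz), well below ~5 kHz
duration = 20.0         # seconds of simulated stimulus
peak_rate = 400.0       # hypothetical peak firing rate (spikes/s)

t = np.arange(0.0, duration, 1.0 / fs)
stimulus = np.sin(2 * np.pi * f_tone * t)

# Toy model: instantaneous firing rate follows the half-wave rectified stimulus.
rate = peak_rate * np.maximum(stimulus, 0.0)
spikes = rng.random(t.size) < rate / fs      # Bernoulli approximation per sample
spike_times = t[spikes]

# Period histogram: fold each spike time back into one stimulus cycle,
# i.e. reset the time axis at every positive-going zero crossing (phase 0).
period = 1.0 / f_tone
phase = (spike_times % period) / period      # 0 <= phase < 1
counts, edges = np.histogram(phase, bins=20, range=(0.0, 1.0))

print(counts)   # spikes only in the first half cycle, peaking near a quarter cycle
```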

Phase-locking is a sensitive indicator of the activation of a nerve fiber by a low-frequency tone: at low stimulus intensities, a tone can produce significant phase-locking even though the mean firing rate is not increased. Tuning curves based on a phase-locking criterion are similar to those based on an increase in firing rate, although for this reason they can be about 20 dB more sensitive. As Fig. 2.4 shows, phase-locking is preserved as the intensity is raised. Above 70 dB SPL the total number of spikes evoked no longer increases, meaning that the firing rate is saturated, yet the period histogram still follows the stimulus waveform and shows no sign of squaring. This may be because the hair cell’s AC response is still sinusoidal. It is also possible that a feedback mechanism keeps the mean firing rate constant in saturation.

A click, which lasts a short time but spreads its spectral energy over a wide frequency range, may be thought of as the complement of a tone, which lasts a long time but has a narrow frequency spread. Fig. 2.5 shows poststimulus-time histograms of auditory nerve fibers in response to click stimuli.
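The complementary nature of clicks and tones can be seen directly in their magnitude spectra. In the sketch below a one-sample impulse stands in for a click and a 1 s, 1 kHz sinusoid for a tone; the sample rate and durations are illustrative assumptions.

```python
import numpy as np

fs = 8000                      # sample rate (Hz), an assumption
n = fs                         # one second of signal -> 1 Hz bin spacing

# A click: all energy in a single sample (shortest possible duration).
click = np.zeros(n)
click[0] = 1.0

# A tone: 1 s of a 1 kHz sinusoid (long duration).
t = np.arange(n) / fs
tone = np.cos(2 * np.pi * 1000.0 * t)

click_mag = np.abs(np.fft.rfft(click))
tone_mag = np.abs(np.fft.rfft(tone))

# The click spreads equal magnitude over every frequency bin,
# while the tone concentrates its energy in the single bin at 1 kHz.
print("click: min/max magnitude", click_mag.min(), click_mag.max())
print("tone: bins above 1% of peak:", np.flatnonzero(tone_mag > 0.01 * tone_mag.max()))
```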

Figure 2.5

Fig. 2.5. The form of the poststimulus-time histograms to clicks depends on the characteristic frequency of the fiber. Low-frequency fibers show ringing (a to f); high-frequency fibers do not (g to i). High-frequency fibers also show a later phase of activation (f to h). [J. O. Pickles, An Introduction to the Physiology of Hearing (Academic Press, New York, 1982), pp. 84, Fig. 4.9]

Each auditory fiber has a characteristic frequency and behaves like a band-pass filter tuned to it. The histograms of low-frequency fibers display several decaying peaks that appear to be produced by a decaying oscillation, as if the cochlear transducer rings in response to the stimulus. The frequency of this ringing equals the characteristic frequency of the fiber, which is exactly what is expected if the tuning of auditory nerve fibers were produced by an approximately linear filter. One would also expect the duration of the ringing to be inversely proportional to the bandwidth of the tuning curve, so that a sharply tuned fiber would ring for a long time. However, this holds only to a certain extent. An exact comparison is difficult in practice, because the number of spikes in the early peaks tends to saturate, much as the response to tones saturates at high intensities.
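The expectation that a linear band-pass filter rings at its center frequency, and rings longer the narrower its bandwidth, can be checked with a two-pole digital resonator. This is a generic textbook filter used only as a stand-in for a fiber's tuning, not a model of the cochlea; the function names and the 1 kHz, 50 Hz, and 200 Hz values are illustrative assumptions. With pole radius r = exp(-πB/fs), the envelope decays with a time constant of about 1/(πB), so the ringing time scales as 1/B.

```python
import numpy as np

def resonator_impulse_response(fc_hz, bandwidth_hz, fs_hz, n_samples):
    """Impulse response of a two-pole digital resonator (a simple linear band-pass filter).

    Pole radius r = exp(-pi * B / fs) gives an envelope r**n, i.e. a decay time constant
    of about 1 / (pi * B) seconds for a -3 dB bandwidth B (narrow-band approximation).
    """
    r = np.exp(-np.pi * bandwidth_hz / fs_hz)
    theta = 2 * np.pi * fc_hz / fs_hz
    n = np.arange(n_samples)
    # Closed form of y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2] for a unit impulse x
    return r ** n * np.sin((n + 1) * theta) / np.sin(theta)

def ringing_time_s(h, fs_hz, floor=0.01):
    """Time until |h| last exceeds `floor` times its maximum."""
    above = np.flatnonzero(np.abs(h) > floor * np.abs(h).max())
    return above[-1] / fs_hz

fs = 16000.0
fc = 1000.0          # plays the role of the fiber's characteristic frequency
for bw in (50.0, 200.0):
    h = resonator_impulse_response(fc, bw, fs, 4000)
    print(f"bandwidth {bw:5.0f} Hz -> rings for about {1000 * ringing_time_s(h, fs):.1f} ms")
```

A fourfold narrower bandwidth gives roughly a fourfold longer ringing time, as the inverse proportionality predicts.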

As with the response to tones, only one phase of the basilar membrane movement appears to be effective: the histogram corresponds to the upper half cycle of the decaying oscillation produced in the transducer. Since a rarefaction click (Fig. 2.6 (a)) produces the earliest response at the highest intensities, upward motion of the basilar membrane appears to be responsible for excitation. A rarefaction click moves the oval window outwards, which in turn moves the basilar membrane upwards, as shown in Fig. 2.7 A (i). A condensation click (Fig. 2.6 (b)) reverses the positions of the peaks and troughs of the histogram, as though the basilar membrane were being driven in the opposite direction, as shown in Fig. 2.7 A (ii). An approximate picture of the excitatory oscillation can be obtained by inverting the histogram for a condensation click and plotting it below that for a rarefaction click, producing what is called a compound histogram (Fig. 2.7 B). This resulting pattern can be compared with the basilar membrane impulse responses of Fig. 2.8. Histograms to clicks also reveal that the suppression of activity during the less effective half cycle of the stimulating waveform is not due to refractoriness from previous activity, since the first sign of an influence on a fiber can sometimes be a suppression of spontaneous activity produced by the less effective half cycle. It is not yet known whether the decaying oscillation is purely mechanical (on the basilar membrane), mechanoelectrical, or purely electrical, as though the fiber were being driven by the decaying oscillation of an electrical filter.
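The logic of the compound histogram can be shown with idealized, noise-free histograms. The sketch assumes that each histogram is exactly the half-wave rectified drive (no spontaneous activity, no saturation) and that the drive is a decaying oscillation at the 450 Hz characteristic frequency of the fiber in Fig. 2.7; the 5 ms decay constant is an illustrative assumption. Under these assumptions, peaks of one histogram fall in the troughs of the other, and subtracting the condensation histogram from the rarefaction histogram recovers the full oscillation. With real spike data the recovery is only approximate.

```python
import numpy as np

fs = 20000.0
cf = 450.0                       # characteristic frequency of the fiber in Fig. 2.7 (Hz)
t = np.arange(0.0, 0.02, 1.0 / fs)

# Hypothetical decaying oscillation of the transducer after a rarefaction click.
# The 5 ms decay constant is an illustrative assumption.
oscillation = np.exp(-t / 0.005) * np.sin(2 * np.pi * cf * t)

# Toy assumption: the fiber responds only to the effective (positive) half cycle,
# so each histogram is a half-wave rectified copy of the drive.
rarefaction_hist = np.maximum(oscillation, 0.0)      # click drives the membrane one way
condensation_hist = np.maximum(-oscillation, 0.0)    # opposite-polarity click

# Compound histogram: the condensation histogram inverted below the rarefaction one.
compound = rarefaction_hist - condensation_hist

# Peaks of one histogram fall in the troughs of the other ...
print("overlap of the two histograms:", float(np.sum(rarefaction_hist * condensation_hist)))
# ... and together they recover the full decaying oscillation.
print("compound matches oscillation:", np.allclose(compound, oscillation))
```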

Figure 2.6 (a)

Figure 2.6 (b)

Fig. 2.6 (a). Rarefaction click (or impulse) which possesses negative amplitude.
(b). Condensation click (or impulse) which possesses positive amplitude.

Figure 2.7 A

Figure 2.7 B

Fig. 2.7 A. Poststimulus-time histograms to (i) rarefaction and (ii) condensation clicks show that the peaks and troughs occur in complementary places for the two stimuli. Fiber characteristic frequency: 450 Hz.
B. A compound histogram is formed by inverting the histogram to condensation clicks under that to rarefaction clicks. [J. O. Pickles, An Introduction to the Physiology of Hearing (Academic Press, New York, 1982), pp. 86 - 87, Fig. 4.10 A, B.]

Kim et al. [6] demonstrated the propagation of a combination tone by recording the responses of auditory nerve fibers to a two-tone stimulus. The responses of a large number of fibers were sampled to produce a ‘neurogram,’ a display of activity across the whole array of nerve fibers, and phase-locking to each of the two fixed primaries and to the combination tone was measured. Fig. 2.9 shows neurograms for the primaries and for the 2f1 - f2 combination tone. For each fiber, the number of spikes phase-locked to the stimulus tone or combination tone of interest was divided by the number of spontaneous spikes. These values were then averaged across fibers within a small range of characteristic frequencies to produce the running averages illustrated. Note that the two primaries produced separate peaks of activity at their own characteristic places in the cochlea, and the combination tone likewise produced a peak at its own characteristic place.
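Why a 2f1 - f2 component appears at all can be illustrated with any odd-order nonlinearity. The sketch below passes two primaries through a memoryless cubic term, a generic stand-in chosen for illustration and not a model of the cochlear mechanism; the primary frequencies and the coefficient `a` are assumptions. A purely linear system produces nothing at 2f1 - f2, whereas the cubic term produces a component there with amplitude 3a/4.

```python
import numpy as np

fs = 8000                     # sample rate (Hz); 1 s of signal gives 1 Hz FFT bins
t = np.arange(fs) / fs
f1, f2 = 1000.0, 1200.0       # two primaries, f1 < f2 (illustrative values)
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

def amplitude_at(signal, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (exact here: integer-Hz bins)."""
    spectrum = np.fft.rfft(signal)
    return 2.0 * np.abs(spectrum[int(round(freq_hz))]) / signal.size

# A memoryless cubic term is a generic stand-in for a cochlear nonlinearity (assumption).
a = 0.1
linear_out = x
nonlinear_out = x + a * x ** 3

f_ct = 2 * f1 - f2            # cubic difference tone: 800 Hz
print("linear system, 2f1-f2 amplitude:", amplitude_at(linear_out, f_ct))
print("cubic system,  2f1-f2 amplitude:", amplitude_at(nonlinear_out, f_ct))
```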

Figure 2.8

Fig. 2.8. Impulse responses of the basilar membrane show ringing.
A. Guinea-pig, measured by capacitive probe, at the 23 kHz point on the membrane.
B. Squirrel monkey, velocity of the impulse response, measured by the Mössbauer technique.
[J. O. Pickles, An Introduction to the Physiology of Hearing (Academic Press, New York, 1982), pp. 46, Fig. 3.12 A, B.]

The phase data in the neurogram provided very strong evidence that the distortion component was transmitted along the basilar membrane by a travelling wave, just as though it were a tone introduced externally. For externally introduced tones, the phase of the activation of auditory nerve fibers increases steadily along the cochlea because of the time taken by the travelling wave (line fs in Fig. 2.10). Apical to the site of generation, the phase of the response to the combination tone varied in exactly the same manner, once an arbitrary phase shift had been applied (necessary because there is no reference phase; line 2f1 - f2 in Fig. 2.10). It can be concluded that the distortion tone propagates along the cochlea just as though it were an externally introduced tone, producing a resonance at its characteristic place in the cochlea.

Figure 2.9

Fig. 2.9. Neurograms for the primaries and for the 2f1 - f2 combination tone. The activity phase-locked to the primaries (f1 and f2) was most prominent in fibers of those characteristic frequencies (arrows). Activity phase-locked to 2f1 - f2 was most prominent in fibers tuned to 2f1 - f2. (There was also phase-locking to f2 - f1, which was omitted for the purposes of the illustration.) The frequencies of f1, f2, and 2f1 - f2 are indicated by the arrows. Note that fibers of high characteristic frequency are plotted to the left of the figure, so that points on the left refer to the base of the cochlea. [J. O. Pickles, An Introduction to the Physiology of Hearing (Academic Press, New York, 1982), pp. 141, Fig. 5.14.]

Figure 2.10

Fig. 2.10. The phase of the neurogram of the 2f1 - f2 combination tone increases with distance apically to the site of generation, in just the same way as does the phase of an introduced tone (fs). The points for the individual neurons are shown as well as the means; in the center the mean curves have been shifted so as to coincide at the frequency indicated by the arrow. ‘CF’ stands for ‘characteristic frequency.’ The stapes, one of the three ossicles, is the bone attached to the oval window. [J. O. Pickles, An Introduction to the Physiology of Hearing (Academic Press, New York, 1982), pp. 142, Fig. 5.15.]

Since the amplitude of the combination tone depends strongly on the frequency separation of the primaries (f1 and f2), the combination tone seems to be generated after one stage of frequency filtering. It can therefore be supposed that the interaction occurs only if both primaries pass through the same first filter, possibly to be identified with the mechanical resonance of the cochlear partition. In that case, mechanical energy would have to be fed back to the basilar membrane for the combination tone to propagate as a travelling wave. A reverse flow of energy from the transducer has indeed been shown in the evoked cochlear mechanical response, where the energy was detected in the ear canal. Although the mechanism of re-emission of the distortion tone is not certain, it would have to act very quickly, since there is no indication that combination tones become weaker at high frequencies.

2.3 Previous Research

Although relatively few studies have investigated the audibility of phase distortion, they have shown that it can be an audible phenomenon. Lipshitz et al. [7] have shown that, on suitably chosen signals, even small midrange phase distortion can be clearly audible. Mathes and Miller [8] and Craig and Jeffress [9] showed that a simple two-component tone, consisting of a fundamental and its second harmonic, changed in timbre as the phase of the second harmonic was varied relative to the fundamental. Lipshitz et al. replicated this experiment with summed 200 Hz and 400 Hz components presented double-blind via loudspeakers, obtaining a 100% accuracy score. An experiment involving polarity inversion of both loudspeaker channels resulted in an audibility confidence rating in excess of 99% with the two-component tone, although the effect was very subtle on music and speech. Cabot et al. [10] tested the audibility of phase shifts in two-component octave complexes with fundamental and third-harmonic signals via headphones, and demonstrated that phase shifts of harmonic complexes were detectable.
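The two-component stimulus of these experiments makes the central point concrete: shifting the phase of the second harmonic leaves the magnitude spectrum unchanged but alters the waveform. The sketch below uses the 200 Hz and 400 Hz components mentioned above; the 90° shift and the helper `two_component_tone` are illustrative choices, not parameters of any of the cited studies.

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs                          # 1 s -> 1 Hz bins

def two_component_tone(phase_rad):
    """200 Hz fundamental plus 400 Hz second harmonic with a relative phase shift."""
    return np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 400 * t + phase_rad)

x0 = two_component_tone(0.0)
x90 = two_component_tone(np.pi / 2)

# Identical magnitude spectra ...
same_magnitudes = np.allclose(np.abs(np.fft.rfft(x0)), np.abs(np.fft.rfft(x90)), atol=1e-6)
# ... but clearly different waveforms (e.g. different peak values).
print("magnitude spectra equal:", same_magnitudes)
print("peak |x| at 0 deg: %.3f, at 90 deg: %.3f" % (np.max(np.abs(x0)), np.max(np.abs(x90))))
```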

Another very simple experiment by Lipshitz et al. demonstrated that the inner ear responds asymmetrically. Reversing the polarity of only one channel of a pair of headphones produces a marked, audible, and oppressive effect on both monaural and stereophonic material, predominantly affecting frequency components below 1 kHz. Because polarity reversal introduces no dispersive or time-delay effects into the signal, but merely turns compressions into rarefactions and vice versa, these audible effects are due only to the constant 180° phase shift it produces. Since interaural cross-correlation does not take place before the olivary complexes, to which the acoustic nerve bundles connect, what is changed by polarity reversal must be the acoustic nerve output from the cochlea. This change is due to two factors: the cochlea responds to the opposite-polarity half of the waveform, and that response has a shifted time relationship relative to the signal heard by the other ear. This reaffirms the half-wave rectifying nature of the inner ear.
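The argument can be illustrated numerically: polarity inversion leaves the magnitude spectrum untouched, yet a half-wave rectifier responds to a different part of the waveform. In the sketch below, `half_wave_rectify` is a crude, hypothetical stand-in for the one-sided cochlear response, and the asymmetric test waveform (200 Hz plus a 400 Hz component at half amplitude) is an illustrative assumption. Its positive peak is 0.75 and its negative peak is -1.5, so the rectified responses to the original and inverted signals differ markedly.

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
# An asymmetric test waveform chosen for illustration: positive peak 0.75, negative peak -1.5.
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.cos(2 * np.pi * 400 * t)
x_inv = -x                        # polarity reversal = 180 degree shift at all frequencies

def half_wave_rectify(signal):
    """Crude stand-in for the one-sided (half-wave rectifying) cochlear response."""
    return np.maximum(signal, 0.0)

spectra_match = np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(x_inv)))
rectified_differ = not np.allclose(half_wave_rectify(x), half_wave_rectify(x_inv))

print("magnitude spectra identical after inversion:", spectra_match)
print("half-wave rectified responses differ:", rectified_differ)
print("rectified peaks: original %.3f, inverted %.3f"
      % (half_wave_rectify(x).max(), half_wave_rectify(x_inv).max()))
```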

A frequent argument that phase distortion is insignificant for material recorded and/or reproduced in a reverberant environment is that reflections themselves cause gross, position-sensitive phase distortion. Although this is true, the first-arriving direct sound is not subject to these distortions, and directional and other analyses are performed during the first few milliseconds after its arrival, before the reverberation arrives. Lipshitz et al. do not believe that reverberation renders phase linearity irrelevant, and there is confirmatory evidence [12].

Lipshitz et al. used analog implementations of first- and second-order unity-gain all-pass networks, with frequencies switchable in steps from roughly 100 Hz to 3 kHz. The Q of the second-order networks was switchable in steps from 1/2 to 2. Electrostatic Stax headphones and Quad loudspeakers were used as transducers because of their notable phase linearity.
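The circuits of Lipshitz et al. are not described here, so the sketch below assumes the standard textbook second-order all-pass transfer function, with 1 kHz as an illustrative center frequency; the function names are hypothetical. It shows the defining all-pass property (unit magnitude at all frequencies) and how Q redistributes group delay: the delay at f0 is 4Q/ω0, while the delay near DC is 2/(Qω0). Low-Q sections therefore spread their delay toward low frequencies, and high-Q sections concentrate it around f0.

```python
import numpy as np

def allpass2_response(f_hz, f0_hz, q):
    """H(j*2*pi*f) for the analog second-order all-pass
    H(s) = (s^2 - (w0/Q) s + w0^2) / (s^2 + (w0/Q) s + w0^2)."""
    w0 = 2 * np.pi * f0_hz
    s = 1j * 2 * np.pi * np.asarray(f_hz, dtype=float)
    return (s**2 - (w0 / q) * s + w0**2) / (s**2 + (w0 / q) * s + w0**2)

def group_delay_s(f_hz, f0_hz, q, df_hz=0.01):
    """Group delay -d(phase)/d(omega) by a central difference (no phase unwrapping needed)."""
    ratio = allpass2_response(f_hz + df_hz, f0_hz, q) / allpass2_response(f_hz - df_hz, f0_hz, q)
    return -np.angle(ratio) / (2 * np.pi * 2 * df_hz)

f0 = 1000.0                                  # illustrative center frequency
w0 = 2 * np.pi * f0
freqs = np.linspace(20.0, 20000.0, 1000)
for q in (0.5, 1.0, 2.0):                    # the Q range quoted for the networks
    mag_err = np.max(np.abs(np.abs(allpass2_response(freqs, f0, q)) - 1.0))
    tau_f0 = group_delay_s(f0, f0, q)
    tau_low = group_delay_s(1.0, f0, q)      # 1 Hz approximates DC
    print(f"Q={q}: max ||H|-1| = {mag_err:.1e}; delay at f0 {1e3 * tau_f0:.3f} ms "
          f"(4Q/w0 = {1e3 * 4 * q / w0:.3f} ms); delay near DC {1e3 * tau_low:.3f} ms")
```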

Test material used and notable results include:

The conclusion drawn by Lipshitz et al. was that midrange phase distortion can be heard not only on simple combinations of sinusoids, but also on many common acoustical signals. This audibility was far greater on headphones than on loudspeakers in a reverberant listening environment.

Hansen and Madsen [13], [14] conducted several experiments on the audibility of phase distortion. A displaced sine wave (one with a non-zero DC component) has the time and frequency functions shown in Fig. 2.11.

Figure 2.11

Fig. 2.11. Single-sine pulses with differing displacements (left) and their common spectral plot (right). [V. Hansen and E. R. Madsen, "On Aural Phase Detection," J. Audio Eng. Soc., vol. 22, pp. 10-14 (1974 Jan./Feb.), pp. 12, Fig. 7.]

Frequency analysis showed no difference between the magnitude spectra of the signals. However, listening tests with an electrostatic loudspeaker revealed a clearly audible difference in timbre between them.

For a listening test, Hansen and Madsen’s second experiment [14] used a signal with a very narrow, three-bar spectrum, shown in Fig. 2.12.

 

Figure 2.12

Fig. 2.12. Time function giving a three-bar spectrum. [V. Hansen and E. R. Madsen, "On Aural Phase Detection: Part II," J. Audio Eng. Soc., vol. 22, pp. 783-788 (1974 Dec.), pp. 784, Fig. 3.]

Seven mutually overlapping ranges were selected, each of them containing three harmonics:

  1. 50 — 150 Hz
  2. 100 — 300 Hz
  3. 200 — 600 Hz
  4. 400 — 1200 Hz
  5. 1 — 3 kHz
  6. 2 — 6 kHz
  7. 5 — 15 kHz

The ratio h = A/B of the time functions between which the listeners could switch (Fig. 2.13) took the following values:

      h1 = 19/21
      h2 = 18/22
      h3 = 17/23
      h4 = 16/24
      h5 = 15/25

Figure 2.13

Fig. 2.13. Test signals. [V. Hansen and E. R. Madsen, "On Aural Phase Detection: Part II," J. Audio Eng. Soc., vol. 22, pp. 783-788 (1974 Dec.), pp. 783, Fig. 1.]

The greater the difference between A and B, the greater was the phase difference, as shown in Fig. 2.14.

Figure 2.14

Fig. 2.14. Phase relationship between harmonics for different h values. [V. Hansen and E. R. Madsen, "On Aural Phase Detection: Part II," J. Audio Eng. Soc., vol. 22, pp. 783-788 (1974 Dec.), pp. 784, Fig. 4.]

The test was conducted with a Quad electrostatic loudspeaker in a standard living room. The average results for all listeners, plotted as permissible phase distortion levels and phase deviations, are shown in Figs. 2.15 and 2.16. The five curves in Fig. 2.15 correspond to the values of the phase difference ratio h = A/B used as the plot parameter. The five curves in Fig. 2.16 represent the relative minimum sound pressure levels for just-noticeable detection of a phase change between the signals. Interestingly, loudspeaker tests in the reverberant room revealed noticeably greater phase sensitivity than headphone tests. This increased sensitivity may be due to reflections or standing waves in the reverberant room converting phase shifts into amplitude changes.

Figure 2.15

Fig. 2.15. Average results obtained from listening test in a reverberant environment with phase as a parameter. [V. Hansen and E. R. Madsen, "On Aural Phase Detection: Part II," J. Audio Eng. Soc., vol. 22, pp. 783-788 (1974 Dec.), pp. 787, Fig. 7.]

Figure 2.16

Fig. 2.16. Permissible phase deviation (living room) with sound pressure as a parameter. [V. Hansen and E. R. Madsen, "On Aural Phase Detection: Part II," J. Audio Eng. Soc., vol. 22, pp. 783-788 (1974 Dec.), pp. 787, Fig. 8.]

Suzuki et al. [15] conducted a phase distortion perception experiment with transient signals of short duration as shown in Fig. 2.17. The time interval T0 for each signal was chosen so that T0 = 2/f0, where f0 is the 90° phase shift frequency of an analog phase-lag type all-pass filter defined by

$$H(j\omega) = \frac{1 - j\omega/\omega_0}{1 + j\omega/\omega_0}, \qquad \omega_0 = 2\pi f_0 \qquad (2.1)$$

 

Figure 2.17

Fig. 2.17. Transient signals used for the hearing test. [H. Suzuki, S. Morita, and T. Shindo, "On the Perception of Phase Distortion," J. Audio Eng. Soc., vol. 28, pp. 570-574 (1980 Sep.), pp. 572, Fig. 4.]

These transient signals were then phase-shifted by a single-pole phase-lag type all-pass filter. The signal interval T0 was set according to the two values of f0 used, 300 Hz and 1 kHz.
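The relation between f0, the filter's phase, and the interval T0 can be checked numerically. The sketch takes the single-pole phase-lag all-pass to be the textbook form H(jω) = (1 - jω/ω0)/(1 + jω/ω0); this form is an assumption consistent with the description (0° at DC, -90° at f0), not a transcription from Suzuki et al. For each f0 used, it prints the phase at f0, the magnitude, the group delay at DC (2/ω0, the largest value for this filter), and T0 = 2/f0.

```python
import numpy as np

def allpass1_response(f_hz, f0_hz):
    """Single-pole phase-lag all-pass H(jw) = (1 - jw/w0) / (1 + jw/w0), w0 = 2*pi*f0."""
    x = 1j * np.asarray(f_hz, dtype=float) / f0_hz       # jw/w0 = j f/f0
    return (1 - x) / (1 + x)

for f0 in (300.0, 1000.0):
    h_at_f0 = allpass1_response(f0, f0)
    phase_deg = np.degrees(np.angle(h_at_f0))
    # Group delay -d(phase)/dw = 2 w0 / (w0^2 + w^2): largest at DC, where it equals 2/w0.
    tau_dc_ms = 1e3 * 2 / (2 * np.pi * f0)
    t0_ms = 1e3 * 2 / f0                                 # T0 = 2/f0 from the text
    print(f"f0={f0:.0f} Hz: phase at f0 = {phase_deg:.1f} deg, |H| = {abs(h_at_f0):.3f}, "
          f"DC group delay = {tau_dc_ms:.3f} ms, T0 = {t0_ms:.2f} ms")
```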

Filtered (A) and unfiltered (B) signals were presented in the sequence A·A·A·A·A·B·B·B·B·B·A·A·A·A·A···B·B·B·B·B, where ‘·’ denotes a one-second interval. The percentages of correct answers for each subject are shown in Fig. 2.18, for f0 = 300 Hz and loudspeaker presentation in a listening room. Fig. 2.19 shows the results of the listening test conducted in an anechoic chamber, with f0 = 300 Hz and the signals S2 and R2.

Figure 2.18

Fig. 2.18. Percentages of correct answers of loudspeaker listening for various signals in a listening room, where f0 = 300 Hz. [H. Suzuki, S. Morita, and T. Shindo, "On the Perception of Phase Distortion," J. Audio Eng. Soc., vol. 28, pp. 570-574 (1980 Sep.), pp. 572, Fig. 7.]

Some participants clearly heard the low-frequency phase distortion introduced by a single-pole all-pass filter when highly artificial signals were used. In this sense, phase distortion is not permissible in high-fidelity reproduction. Suzuki et al. also concluded that phase effects were highly individual and that headphone listening showed much greater sensitivity than loudspeaker listening.

Figure 2.19

Fig. 2.19. Percentages of correct answers of loudspeaker listening for S2 and R2 in an anechoic chamber, where f0 = 300 Hz. [H. Suzuki, S. Morita, and T. Shindo, "On the Perception of Phase Distortion," J. Audio Eng. Soc., vol. 28, pp. 570-574 (1980 Sep.), pp. 573, Fig. 8.]

Fincham [16] tested, under carefully controlled conditions, the effect of reducing group-delay distortion in the audio recording/reproduction chain by means of a minimum-phase-shift equalizer. The effects were clearly heard but quite subtle. In another test, an 8-cycle, 40-Hz tone burst was passed through an all-pass filter with significant group delay around 40 to 50 Hz and reproduced over loudspeakers. Most of the lecture theater audience observed distinct audible differences in sound quality.
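The all-pass used by Fincham is not described here, so the following is only an illustration of what such processing does to a low-frequency tone burst. It assumes a first-order digital all-pass whose coefficient comes from the bilinear transform with pre-warping, giving exactly -90° of phase at a hypothetical 45 Hz. The total energy of the 8-cycle, 40 Hz burst, and hence its magnitude spectrum, is unchanged, while the waveform, including its onset, is visibly altered.

```python
import numpy as np

def first_order_allpass(x, fc_hz, fs_hz):
    """Digital first-order all-pass y[n] = a x[n] + x[n-1] - a y[n-1].

    The coefficient comes from the bilinear transform with pre-warping, so the
    phase shift is exactly -90 degrees at fc_hz (0 at DC, -180 at Nyquist)."""
    k = np.tan(np.pi * fc_hz / fs_hz)
    a = (k - 1) / (k + 1)
    y = np.zeros_like(x)
    x_prev = y_prev = 0.0
    for n, xn in enumerate(x):
        y[n] = a * xn + x_prev - a * y_prev
        x_prev, y_prev = xn, y[n]
    return y

fs = 48000.0
f_burst = 40.0
n_burst = int(round(8 * fs / f_burst))             # 8 cycles of 40 Hz = 200 ms
burst = np.zeros(int(fs))                          # 1 s buffer leaves room for the filter tail
burst[:n_burst] = np.sin(2 * np.pi * f_burst * np.arange(n_burst) / fs)

filtered = first_order_allpass(burst, fc_hz=45.0, fs_hz=fs)  # 45 Hz is an illustrative choice

energy_in = np.sum(burst ** 2)
energy_out = np.sum(filtered ** 2)
print("energy in / out:", energy_in, energy_out)
print("max sample difference:", np.max(np.abs(filtered - burst)))
```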

Preis et al. [17] investigated the audibility of phase distortion produced by minimum-phase 4-kHz and 15-kHz anti-alias filters. In their experiment, group-delay distortion was progressively doubled until 67% mean correct discrimination was attained. Fig. 2.20 shows the mean correct discrimination between phase-distorted (minimum-phase) and undistorted (linear-phase) test signals for three low-pass systems (4-kHz elliptic and Butterworth, 15-kHz elliptic).

Figure 2.20

Fig. 2.20. Average correct discrimination between signals with no group-delay distortion and progressively doubled group-delay distortion. 52 presentations per subject of each of 11 test-signal pairs. 5 subjects; E - elliptic; B - Butterworth. [D. Preis and P. J. Bloom, "Perception of Phase Distortion in Anti-Alias Filters," J. Audio Eng. Soc., vol. 32, pp. 842-848 (1984 Nov.), pp. 844, Fig. 2.]

It was concluded that, for the impulsive test signals used and diotic presentation (the same signal in both ears) via headphones, the ear is significantly more sensitive to group-delay distortion in the middle of the audio band (4 kHz) than at its upper edge (15 kHz).

The experiment by Preis et al. has some limitations. First, no attempt was made to determine the detailed dependence of the perceptual threshold on frequency, on the peak value and width of the group-delay characteristic, on signal intensity, or on signal polarity. Second, other test signals, such as speech and music, were not used. Finally, other presentation methods, such as loudspeakers in non-reverberant or reverberant environments, were not tried.

Deer et al. [18] established perceptual thresholds for the audibility of phase distortion in all-pass filters, using broadband impulsive test signals presented diotically over headphones. They used an analog second-order all-pass filter centered at 2 kHz (assumed to be the frequency of greatest sensitivity), with adjustable peak value and bandwidth of group-delay distortion. Fig. 2.21 shows the mean correct discrimination between phase-distorted and undistorted (linear-phase) test signals for the second-order all-pass systems. It reveals a statistically significant perceptual effect when the peak group delay at 2 kHz exceeds 2 msec.

Figure 2.21

Fig. 2.21. Average correct discrimination between signals with no group-delay distortion and progressively doubled peak group-delay distortion. 26 presentations per subject, 6 subjects. [J. A. Deer, P. J. Bloom, and D. Preis, "Perception of Phase Distortion in All-Pass Filters," J. Audio Eng. Soc., vol. 33, pp. 782-786 (1985 Oct.), pp. 783, Fig. 1.]
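To put the notion of "statistically significant" discrimination in perspective, one can ask how many correct answers are needed before performance differs from chance. The source does not state which test was used, so the sketch assumes a one-sided exact binomial test against 50% chance, which suits a two-alternative discrimination; the function names are hypothetical. It uses the per-subject counts from the captions of Figs. 2.20 and 2.21 (52 and 26 presentations) and a pooled 26 x 6 = 156, where pooling assumes independent responses across subjects.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_correct(n, alpha=0.05, p_chance=0.5):
    """Smallest number of correct answers out of n that is significant at level alpha
    in a one-sided exact binomial test against chance performance p_chance."""
    for k in range(n + 1):
        if binomial_tail(k, n, p_chance) <= alpha:
            return k
    return None

for n in (26, 52, 156):    # per-subject counts from Figs. 2.21 / 2.20, and 26 x 6 pooled
    k = critical_correct(n)
    print(f"n={n}: need >= {k} correct ({100 * k / n:.1f}%) for p <= 0.05")
```

The percentages printed can be compared with the 67% criterion used by Preis et al.; smaller samples require a higher proportion correct to reach significance.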

Fig. 2.22 shows the results of the listening test in which the peak group delay of the all-pass filter was fixed at 4 msec and its delay bandwidth was varied. The pair [4A, 4C], with a Q difference of 0.71, was found to differ significantly.

Figure 2.22

Fig. 2.22. Average correct discrimination between signals with identical peak group delays and differing relative bandwidth. 26 presentations per subject, 6 subjects. [J. A. Deer, P. J. Bloom, and D. Preis, "Perception of Phase Distortion in All-Pass Filters," J. Audio Eng. Soc., vol. 33, pp. 782-786 (1985 Oct.), pp. 785, Fig. 4.]

A new and important contribution of this work was the examination of the interrelationship among group-delay distortion (a frequency-domain measure), the corresponding impulse response of each all-pass filter (a time-domain measure), particularly the envelope of that impulse response, and perceived phase distortion.

One limitation of this research was that only fc = 2 kHz was used for the all-pass filter, because this frequency was assumed to provide maximum sensitivity of the ear. The effects of signal intensity and polarity were not determined. Finally, loudspeakers in non-reverberant or reverberant environments were not tested.

Preis et al. [19] also performed Wigner distribution analysis of various filters with perceptible phase distortion, displaying the responses of several anti-alias and all-pass filters jointly in time and frequency. Four useful properties can be obtained from the Wigner distribution of a signal:

     1) Frequency response
     2) Group delay
     3) Instantaneous power
     4) Instantaneous frequency.

Each of these properties can be estimated visually by taking a "slice" through the elevation contours of the Wigner distribution parallel to the horizontal time axis or to the vertical frequency axis of the time-frequency plane. Figs. 2.23 and 2.24 show the interplay between the Wigner distribution and the impulse response, frequency response, envelope power (energy-time curve), group delay, and instantaneous frequency for a 4 kHz low-pass filter and for a 2 kHz all-pass filter with a peak group delay of 8 msec, respectively.
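How these quantities are read from a Wigner distribution can be sketched with a discrete pseudo-Wigner distribution (a Wigner distribution with a rectangular lag window). This is a generic textbook construction applied to a synthetic analytic chirp rather than to a filter response, and `pseudo_wigner` is a hypothetical helper, not the implementation of Preis et al. A slice at fixed time peaks at the instantaneous frequency, and summing a slice over frequency gives the instantaneous power (scaled by the FFT length). Because the lag product advances at twice the signal frequency, bin k corresponds to k·fs/(2·n_fft) Hz.

```python
import numpy as np

def pseudo_wigner(x, n_fft=256):
    """Discrete pseudo-Wigner distribution of a complex (analytic) signal.

    W[n, k] = sum_m x[n+m] conj(x[n-m]) exp(-j 2 pi k m / n_fft), |m| < n_fft/2.
    Because the lag product advances at twice the signal frequency, bin k
    corresponds to k * fs / (2 * n_fft) Hz."""
    length = len(x)
    max_lag = n_fft // 2 - 1
    w = np.zeros((length, n_fft))
    for n in range(length):
        lag_max = min(n, length - 1 - n, max_lag)
        m = np.arange(-lag_max, lag_max + 1)
        kernel = np.zeros(n_fft, dtype=complex)
        kernel[m % n_fft] = x[n + m] * np.conj(x[n - m])
        w[n] = np.real(np.fft.fft(kernel))   # kernel is Hermitian in m, so W is real
    return w

fs, n_fft = 1000.0, 256
t = np.arange(1000) / fs
# Analytic linear chirp whose instantaneous frequency rises from 50 Hz to 200 Hz in 1 s.
x = np.exp(1j * 2 * np.pi * (50.0 * t + 75.0 * t**2))
w = pseudo_wigner(x, n_fft)

for n in (250, 500, 750):
    est = np.argmax(w[n]) * fs / (2 * n_fft)   # slice at fixed time: peak frequency
    power = w[n].sum() / n_fft                 # sum over frequency: instantaneous power
    print(f"t={t[n]:.2f} s: IF estimate {est:.1f} Hz, true {50 + 150 * t[n]:.1f} Hz, "
          f"power {power:.3f}")
```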

This chapter introduced the audibility of phase distortion in audio signals. The physiology of the human ear was examined, time-based processes of sound perception such as timbral sensation were discussed, and the phase-locking property of the auditory process was presented. Finally, previous research on the audibility of phase distortion was reviewed. These physiological foundations and previous findings on phase distortion detection may provide further insight into the human temporal auditory process and assist in the design of further research. The results of this chapter inform part of the design of the listening test used in this thesis research, as will be shown in the next chapter.

Figure 2.23

Fig. 2.23. Interplay between Wigner distribution and impulse response, frequency response, envelope power or energy-time curve, group delay, and instantaneous frequency for a 4 kHz anti-alias filter. [D. Preis, F. Hlawatsch, P. J. Bloom, and J. A. Deer, "Wigner Distribution Analysis of Filters with Perceptible Phase Distortion," J. Audio Eng. Soc., vol. 35, pp. 1004-1012 (1987 Dec.), pp. 1011, Fig. 9]

Figure 2.24

Fig. 2.24. Interplay between Wigner distribution and impulse response, frequency response, envelope power or energy-time curve, group delay, and instantaneous frequency for a 2 kHz all-pass filter with a peak group delay of 8 msec. [D. Preis, F. Hlawatsch, P. J. Bloom, and J. A. Deer, "Wigner Distribution Analysis of Filters with Perceptible Phase Distortion," J. Audio Eng. Soc., vol. 35, pp. 1004-1012 (1987 Dec.), pp. 1011, Fig. 10]

