CONTENTS
1. Auditory Grouping Phenomena
  1.1. Parsing of Sounds of Complex Spectral Composition
    1.1.1. Harmonicity of Spectral Components
    1.1.2. Time-Variant Relationships
    1.1.3. Familiarity
  1.2. Grouping of Sound Sequences in Space
    1.2.1. Auditory Illusory Conjunctions
    1.2.2. The Scale Illusion
    1.2.3. Grouping of Nonsimultaneous Sound Sequences
    1.2.4. The Hypothesis of a Slow Switching Mechanism
    1.2.5. The Octave Illusion
    1.2.6. Grouping of Phase-Shifted Tones
  1.3. Grouping of Rapid Sound Sequences
    1.3.1. Grouping by Frequency
    1.3.2. Grouping by Frequency Proximity
    1.3.3. Temporal Coherence as a Function of Frequency Proximity and Tempo
    1.3.4. Grouping by Frequency Proximity in Relation to Repetition
    1.3.5. Frequency Proximity and the Perception of Temporal Relationships
    1.3.6. Grouping by Good Continuation
    1.3.7. Grouping by Sound Quality
    1.3.8. Grouping by Amplitude
    1.3.9. Grouping by Temporal Position
    1.3.10. Grouping by Spatial Location
    1.3.11. Closure: The Auditory Continuity Effect
  1.4. Grouping and Selective Attention
    1.4.1. Voluntary and Involuntary Grouping
    1.4.2. Consequences of Attention Focusing
2. Shape Analysis for Pitch Structures
  2.1. Auditory Shape Analysis as a Multileveled Process
  2.2. Passive Versus Active Processing
  2.3. Feature Abstraction
    2.3.1. Octave Equivalence
    2.3.2. Interval and Chord Equivalence
    2.3.3. Categorical Perception of Musical Intervals
    2.3.4. Global Cues
    2.3.5. Interval Class
  2.4. Higher-Order Abstractions
  2.5. Hierarchical Encoding of Pitch Sequences
  2.6. The Influence of Short-Term Memory on Perception of Pitch Patterns
    2.6.1. Interference Effects in Short-Term Memory for Pitch
    2.6.2. Facilitation Through Repetition in Short-Term Memory for Pitch
    2.6.3. The Influence of Relational Context on Pitch Comparison Judgments
  2.7. Contour as a Cue in Recognition of Pitch Patterns
  2.8. Scale and Key Structure in Recognition of Pitch Patterns
  2.9. Memory for Hierarchically Organized Pitch Patterns
3. Analysis of Timbre
  3.1. Timbre and Fourier Analysis
  3.2. Investigation of Timbre by Analysis and Synthesis
  3.3. Multidimensional Models of Timbre
  3.4. Role of Context in Timbre Perception
4. Perception of Temporal Relationships
  4.1. Perception of Temporal Order
    4.1.1. Modes of Order Perception
    4.1.2. Perception of the Order of Two Events
    4.1.3. Perception of the Order of Three or More Events
    4.1.4. Order Perception in Continuously Cycling Sound Patterns
    4.1.5. Theories of Order Perception
  4.2. Perception of Rhythm
    4.2.1. Subjective Rhythmic Grouping
    4.2.2. Grouping by Temporal Proximity
    4.2.3. Grouping by Accent
    4.2.4. Grouping by Other Principles
    4.2.5. The Run Principle and the Gap Principle
    4.2.6. Rhythmic Hierarchies
5. Summary
Notes
References
Research
on hearing has traditionally been concerned with simple detection,
discrimination, and scaling tasks. However, the last decade has seen a
flowering of interest in higher-level mechanisms concerned with auditory
grouping, shape perception, memory, and so on. This new development has been
due largely to technological advances that have enabled researchers to generate
complex auditory stimuli with precision and flexibility. Those entering the
field have been rewarded by the discovery of an elaborately structured and
highly differentiated system that possesses some remarkable properties.
Two
major influences on research into auditory pattern recognition may be
identified. The first stems from related work in perceptual and cognitive
psychology. For example, the multileveled
approach to auditory shape perception has been strongly motivated by
theoretical and experimental work on the perception
of visual shape. As another example, research into memory for sound structures
has been influenced by findings on memory for verbal materials.
A
second major influence derives from music theory. Fundamental concepts such as
octave equivalence and interval equivalence have been in the mainstream of
traditional music theory since the time of Pythagoras. Several developments in
contemporary music theory have also provided input. For example, the theory of
12-tone composition, developed by Schoenberg, is based on an implied theory of
shape analysis for pitch structures. Another example is the hierarchical theory
of tonal music, developed early in this century by Schenker, which has points
of similarity with the theory of transformational grammar developed later by
Chomsky. In addition, composers of electronic and computer music have provided
the major impetus to recent experimental work on the perception of sound
quality or timbre, an area of research with broad implications for auditory
perception in general.
This
chapter is divided into four main sections. In the first, auditory grouping
phenomena are investigated. This section deals with questions concerning the
perceptual fusion and separation of components of a
complex sound spectrum, the grouping of sound elements emanating from different
spatial locations, and the grouping of sounds that occur in rapid succession. The
second section is concerned with the perception and recognition of patterns
formed of pitch combinations. The third section deals with the perception of
timbre or sound quality. The fourth section is concerned with the perception of
temporal order and of rhythm. The final section summarizes the findings in
these different subfields.
1. AUDITORY GROUPING PHENOMENA
We may
distinguish two basic but interrelated questions in considering how the
auditory system groups stimuli into perceptual configurations. The first
involves the stimulus dimensions along which grouping principles operate. When
presented with a complex signal, the auditory system may group elements
according to some rule based on frequency, on amplitude, on temporal or spatial
position, or on some multidimensional attribute such as timbre. As will be
shown, any of these attributes may serve as a basis for grouping, and further,
there are complex and rigid rules determining which attribute will be used. Such
rules can often be well interpreted in terms of strategies that are most likely
to lead to the correct conclusions in interpreting our auditory environment. Second,
we may enquire into the principles that govern grouping along any given
dimension. The Gestalt psychologists proposed that we form groupings on the
basis of certain simple principles, such as proximity, good continuation,
similarity, and common fate (Wertheimer, 1923). As described elsewhere in this Handbook,
these have been shown to be important descriptive principles for grouping
in vision. We shall show here that this is true for hearing also. It may
plausibly be argued that grouping in conformity with such principles enables us
to interpret our environment most effectively (Bregman, 1978; D. Deutsch,
1975c; Gregory, 1970; Hochberg, 1974; Sutherland, 1973). Sounds that are
similar are likely to be coming from the same source, and sounds that are
dissimilar are likely to be coming from different sources. A sequence of sounds
is more likely to be coming from a single source if it contains frequency
transitions that are gradual rather than abrupt. Components of a sound spectrum
that modulate in synchrony are more likely to be coming from a single source
than those that modulate out of synchrony.
The view of auditory grouping as a process of unconscious inference
may be traced to Helmholtz (1859/1954) (see Note 1). He speculated how, given
the complex, time-variant spectrum produced by several musical instruments
playing simultaneously, the listener reconstructs the auditory environment so
that some components of the spectrum fuse perceptually to produce the
impression of a single sound, while others are heard as separate melodic lines
sounding in parallel. He wrote:
Now there are many circumstances which assist us
first in separating the musical tones arising from different sources, and
secondly, in
keeping together the partial tones of each separate source. Thus when one
musical tone is heard for some time before being joined by the second, and then
the second continues after the first has ceased, the separation in sound is
facilitated by the succession of time. We have already heard the first musical
tone by itself and hence know immediately what we have to deduct from the compound effect for the effect of
this first tone. Even when several parts proceed in the same rhythm in
polyphonic music, the mode in which the tones of different instruments and voices
commence, the nature of their increase in force, the certainty with which they
are held and the manner in which they die
off, are generally slightly
different for each ... but besides all this, in good part music,
especial care is taken to facilitate the
separation of the parts by the ear. In polyphonic music proper, where
each part has its own distinct melody, a principal means of clearly separating
the progression of each part has always consisted in making them proceed in
different rhythms and on different divisions of the bars.... All these helps fail in the resolution of musical tones into their constituent partials. When a
compound tone commences to sound, all its partial tones commence with the same
comparative strength; when it swells, all of them generally swell
uniformly; when it ceases, all cease simultaneously.
Hence no opportunity is generally given for hearing them separately and
independently. (pp. 59-60)
1.1. Parsing of Sounds of Complex Spectral Composition
A
basic task for auditory theory is to determine the relationships between
elements of an ongoing sound spectrum that give rise to the perception of a
single sound and those that give rise to the perception of several simultaneous
sounds. Without these processes of fusion and separation, intelligible
listening would not be possible. Presumably mechanisms have evolved that cause
us to fuse together those elements of the sound spectrum that are likely to be
coming from the same source and to separate out those elements that are likely
to be coming from different sources. Three factors will be considered here. The
first is harmonicity of spectral components; the second is synchronicity; the
third is familiarity with certain sound complexes.
1.1.1. Harmonicity of Spectral Components.
It has been
argued from various lines of evidence that harmonic sounds are more likely to
be perceived as fused than are nonharmonic sounds (see Note 2). Stringed and
blown instruments have partials that are harmonic or nearly harmonic, and such
partials unite to produce the impression of a single tone. In contrast, bells
and gongs have partials that are nonharmonic, and these produce more diffuse
sound impressions (Mathews & Pierce, 1980). De Boer (1976) has shown that
harmonic complexes tend to produce unitary and unequivocal pitch sensations,
whereas certain types of nonharmonic complex do not merge, but instead produce
multiple pitch sensations. Since most forced vibration systems such as the
voice have partials whose frequencies are harmonic or close to harmonic, such
findings are as expected on the hypothesis that our auditory system has evolved
to interpret sound patterns in terms of the sources from which they emanate.
We may
next enquire whether the phase relations between the partials of a tone affect
the fusion of its image. This question was investigated by Kubovy (Note 3). He
created a set of harmonically related sinusoids, all of equal amplitude, and
all beginning with a positive zero-crossing and
therefore having a common zero-crossing at the frequency of the fundamental. One
of these sinusoids was then moved out of phase for a few hundred milliseconds. It
was then moved back into phase, while another was moved out, and so on. A
perceptual segregation was produced by these means, so that a melody was heard
that corresponded to the out-of-phase sinusoids.
Later,
Kubovy and Jordan (1979) constructed stimuli consisting of the third to
fourteenth harmonics of a 200-Hz fundamental, which were played in the sine
phase. At intervals of roughly 300 msec, the phases of all components but one
were reset to 0°, and the phase of the
remaining component was set to a different phase angle. The out-of-phase
components formed a scale that either ascended or descended, and subjects
judged the direction of this scale. The results are shown in Figure 32.1. It
can be seen that for phase shifts greater than 40°, subjects showed
near-perfect identification of scale direction. These experiments therefore
demonstrate the perceptual effect of phase relationships on the fusion of
single tones composed of harmonically related complexes: Phase shifting a
component of the complex results in its perceptual segregation.
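
The stimulus construction described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the sample rate, the function names, and the choice of the fifth harmonic as the shifted component are hypothetical and are not taken from Kubovy and Jordan (1979). Only the 200-Hz fundamental, the range of harmonics, the sine starting phase, and the roughly 300-msec segment duration come from the text.

```python
import math

FUNDAMENTAL_HZ = 200.0       # from the text
HARMONICS = range(3, 15)     # third to fourteenth harmonics, from the text
SAMPLE_RATE = 16000          # assumed; not specified in the text


def complex_sample(t, target_harmonic=None, shift_deg=0.0):
    """Sum of equal-amplitude, sine-phase harmonics at time t (seconds).

    If target_harmonic is given, that one component starts shift_deg
    degrees out of phase; every other component starts at 0 degrees.
    """
    total = 0.0
    for h in HARMONICS:
        phase = math.radians(shift_deg) if h == target_harmonic else 0.0
        total += math.sin(2.0 * math.pi * h * FUNDAMENTAL_HZ * t + phase)
    return total


def synthesize(duration_s, target_harmonic=None, shift_deg=0.0):
    """Return the samples of one stimulus segment as a list of floats."""
    n = int(round(duration_s * SAMPLE_RATE))
    return [complex_sample(i / SAMPLE_RATE, target_harmonic, shift_deg)
            for i in range(n)]


# One segment with all components in phase, and one segment in which
# a single (hypothetically chosen) harmonic is shifted by 40 degrees.
in_phase = synthesize(0.3)
shifted = synthesize(0.3, target_harmonic=5, shift_deg=40.0)
```

Stepping the shifted component through successive harmonics from segment to segment would give an ascending or descending sequence of out-of-phase components of the kind the subjects judged.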
Tones
whose fundamental frequencies are related by simple ratios fuse more readily
than tones that are not so related. In a demonstration of this phenomenon,
Rasch (1978) presented two chords in succession. The lower tones of each chord
were identical, and the higher tones formed a sequence that either ascended or
descended. The subjects' task was to judge whether the higher tones formed a
"low-high" sequence or a "high-low" sequence. Detection
thresholds were taken as the measure of the extent to which the subjects could
separate out the component tones of each chord. The lower tones all had a
fundamental frequency of 250 Hz. The higher tones had fundamental frequencies
that either were 500 and 750 Hz or deviated slightly from these values.
The results of the experiment are shown in Figure 32.2. It can be seen that, as the relationships formed by the fundamental frequencies of the higher and lower tones deviated from simple ratios, detection performance gradually improved, indicating a decreased tendency to fuse together the higher and lower components of the chords.

Figure 32.1. Percentage of correct
identification of phase-shifted target tones as a function of phase shift in
degrees. The stimuli consisted of the third to fourteenth harmonics of a
200-Hz fundamental, which were played in the sine phase. At intervals of around
300 msec, the phases of all components except one were reset to 0°, and the phase of the remaining component was set to a
different phase angle. The out-of-phase components formed a scale that either
ascended or descended, and subjects identified the direction of the scale. Near-perfect
identification was shown for phase shifts greater than 40 deg. (From M. Kubovy
& R.

Figure 32.2.
Detection thresholds for higher tones in the presence of
lower tones. Two chords were presented in succession. The lower tones of
the chords were both at 250 Hz, and the higher tones formed either a
"low-high" sequence or a "high-low"
sequence. Either higher tones were at 500 and 750 Hz, or they deviated
slightly from these values. Subjects judged whether a "low-high"
sequence or a "high-low" sequence had been presented. Detection
thresholds fell gradually with increasing deviation from the 500-Hz and 750-Hz
values, in roughly symmetrical fashion.
1.1.2. Time-Variant Relationships. One factor that
may be hypothesized to contribute to the impression of a single fused sound is
coordinated modulation in the steady state. In forced vibration systems, any
perturbation of the driving force will result in perturbations of components of
the spectrum that are proportional to their frequencies. Thus a complex of sinusoids that is modulated
in correlation is likely to be emanating from a single source. McNabb and
Chowning (quoted by McAdams, 1982) have demonstrated informally that a harmonic
tone complex with a spectral power distribution conforming to that of a vowel
produces only a weak vocal sensation, and only weak perceptual
fusion. However,
if a small amount of frequency modulation is superimposed on all the spectral
components simultaneously, they sound strongly fused. Similar observations
have been reported informally by McAdams (1982).
By the same token, if we hear a complex of sinusoids
with uncorrelated modulation functions, the likelihood is that the components
of the complex are emanating from different sources. McAdams (1982) reports an
informal experiment employing a complex stimulus in which a transition was made
from perfectly correlated to two uncorrelated frequency modulation functions. For
harmonic tone complexes, the listener's percept shifted from a single fused
image to two distinct images. The effect was uncertain for inharmonic tone
complexes.
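
The reasoning behind correlated modulation as a cue to a common source can be illustrated numerically. In the sketch below, a common modulation factor applied to every component leaves the frequency ratios between components unchanged, whereas independent factors disturb them. The four-component complex, the 200-Hz base frequency, and the modulation factors of about 1 to 2 percent are hypothetical values chosen for illustration; they are not the parameters used by McAdams (1982).

```python
import math


def modulated_frequencies(base_freqs, factors):
    """Scale each component frequency by its own modulation factor."""
    return [f * k for f, k in zip(base_freqs, factors)]


def ratios_to_lowest(freqs):
    """Express each component as a ratio to the lowest component."""
    return [f / freqs[0] for f in freqs]


# Hypothetical harmonic complex: first four harmonics of 200 Hz.
base = [200.0 * h for h in (1, 2, 3, 4)]

# Correlated modulation: one common factor, so every component is
# perturbed in proportion to its frequency.
correlated = modulated_frequencies(base, [1.01] * 4)

# Uncorrelated modulation: each component gets an independent factor.
uncorrelated = modulated_frequencies(base, [1.01, 0.99, 1.02, 0.98])

harmonic_preserved = all(
    math.isclose(r, h, rel_tol=1e-9)
    for r, h in zip(ratios_to_lowest(correlated), (1, 2, 3, 4))
)
harmonic_broken = not all(
    math.isclose(r, h, rel_tol=1e-9)
    for r, h in zip(ratios_to_lowest(uncorrelated), (1, 2, 3, 4))
)
```

Under these assumptions the correlated complex stays harmonic at every instant, which is consistent with its being heard as a single fused image.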
A related finding was obtained by Rasch (1978),
using the sequence detection task described in Section 1.1.1. He showed that,
when the higher tones of the chords were frequency modulated while the lower
tones remained unmodulated, detection of whether the chords formed a
"low-high" sequence or a "high-low" sequence was enhanced,
so that uncorrelated modulation resulted in decreased fusion of the
simultaneously presented tones.
How does onset asynchrony of two simultaneous tones
affect perceptual fusion? Rasch (1978) used the same detection task to study
the effect of delaying the lower tones of the chords relative to the higher
tones. As shown in Figure 32.3, detection performance was strongly influenced
by this manipulation. Each 10 msec of delay was associated with roughly a 10-dB
downward shift of threshold. For a delay of 30 msec, threshold for perception
of the high tone was close to that for the high tone presented alone.
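
The approximate relationship just described can be written as a simple rule of thumb. The sketch below assumes that the shift is linear at about 10 dB per 10 msec and that it levels off at 30 msec, on the grounds that threshold at that delay was already close to that for the high tone alone. Both the linearity and the cutoff are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def threshold_shift_db(delay_ms, db_per_10_ms=10.0, max_delay_ms=30.0):
    """Approximate downward threshold shift (dB) for an onset delay.

    Negative values mean a lower (better) detection threshold.
    The linear slope and the 30-msec ceiling are assumptions.
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    effective_delay = min(delay_ms, max_delay_ms)
    return -db_per_10_ms * effective_delay / 10.0


shift_at_10 = threshold_shift_db(10)
shift_at_30 = threshold_shift_db(30)
```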
Rasch further noted that the phenomenological
effect of asynchrony was
very strong. Whereas in the synchronous conditions a single "sound object" was perceived, in the asynchronous conditions
the two tones stood apart very clearly. However, the onsets of the two tones
were not separately audible, so that they were perceived as two separate but
simultaneous sounds.
This is
an example of the continuity effect (see Section 1.3.11).
A related finding was obtained by Bregman and Pinker
(1978). These authors presented a two-tone complex in alternation with a third
tone and introduced various conditions of onset-offset asynchrony between the
simultaneous tones in the complex. As the degree of asynchrony increased, the
likelihood also increased that one of the simultaneous tones would form a
melodic stream with the third tone. Bregman and Pinker argued that the
asynchrony of the simultaneous tones resulted in a decreased tendency for these
tones to be treated as coming from the same source and so facilitated a sequential
organization by frequency proximity between one of these simultaneous tones and
the alternating tone.

Figure 32.3.
Detection thresholds for higher tones in the presence
of lower tones. The paradigm used was as described in Figure 32.2. The lower tones
were at 250 Hz,
and the higher tones were at 500 Hz and 750 Hz. Either the higher tones ended
simultaneously with the lower tones (solid line), or they ended immediately
following onset of the lower tones (dashed line). Thresholds were virtually
unaffected by amount of overlap but were strongly affected by delay of the
lower tones. Each 10 msec of delay produced roughly a 10-dB downward shift in
threshold. (From R. A. Rasch, The perception of simultaneous notes such as in
polyphonic music, Acustica, 1978, 40. Reprinted with permission.)
Dannenbring
and Bregman (1978) investigated the effects of several variables on the
tendency of one component of a complex tone either to fuse with the other
components or alternatively to be pulled out into a different melodic stream. The
stimuli consisted of a complex of three pure tones (at 500, 1000, and 2000 Hz)
that alternated repeatedly with a single "captor" pure tone (at 500,
1000, or 2000 Hz). The amplitudes of the components of the complex tone either
were equal or increased or decreased with frequency. The amplitude of the
"captor" tone was always equal to that of the "target"
component of the frequency with which it alternated. The relative onsets and
offsets of the components of the complex tone were also varied. Subjects judged
the repetition rate of the captor tone. If this rate was judged to be slow, the
components of the complex tone were considered to be fused into a single unit. However,
if this rate was judged to be fast, the target component of the complex tone
was considered to have been pulled into the same stream as the captor.
Various
findings emerged from this study. First, the tendency for the formation of
melodic streams was found to be greater when the repeating tone was at 500 Hz
than when the tone was at one of the other two frequencies. Second, the
tendency to fusion was greatest for tones in which the relative amplitudes of
the components decreased with frequency, a situation most like that commonly
encountered in the natural environment. Third, when the target components led
the other components of the complex tone at onset, there was an increased
tendency to produce melodic streams. This was also true when the target
component lagged the other components at offset. However, when the target
component lagged the others at onset or led them at offset, no such effect
occurred.
The
effects of fusion and separation of two gliding tones were studied by Steiger
and Bregman (1981). Here the tones glided in parallel on a log frequency scale,
and the glides were repeatedly presented in alternation with a pure tone
"captor" glide. Subjects judged whether the stimulus was
"fused" (i.e., whether the sequence appeared as an isochronous
alternation of a pure tone with a rich tone) or "decomposed" (i.e.,
whether the sequence appeared to contain three tones in each cycle). The
tendency for the stimulus to be judged as decomposed was enhanced when the
captor and target glides were in the same frequency range, and also when the
captor and target glides had the same orientation.
A
sudden change in the amplitude of a component of a tone complex can cause this
component to stand out perceptually. This was demonstrated by Kubovy (Note 4).
He presented subjects with an eight-tone chord whose components were successively
turned off abruptly for 80 msec and then restored to full amplitude. This
manipulation occurred at a rate of three per second. The subjects perceived a
melody that corresponded to the order in which the tones were subjected to this
momentary amplitude disparity. For this pitch segregation effect to occur, it
was necessary that the frequency spacing between successive tones be greater
than the critical band.
1.1.3.
Familiarity. Sounds with familiar spectral shapes, such as human voices and
musical instrument tones, appear to fuse more readily than sounds with
unfamiliar spectral shapes. Informal observations show that the percept of a
particular vowel is lost when its spectral envelope is shifted slightly in
frequency, even though the relative amplitudes are preserved. Other factors such
as the relative growth and decay of individual partials also appear to
contribute to familiarity. Unfortunately no quantitative data on the issue are
available at present.
1.2. Grouping of Sound Sequences in Space
A
useful technique for studying grouping phenomena in hearing is to present two
different pitch sequences in parallel, one to the left of the listener, and the
other to the right. In most experiments, stimuli have been presented
dichotically via headphones; however, in some experiments stimuli have been
presented via spatially separated loudspeakers. This technique enables
different stimulus dimensions to be set in opposition to each other as bases
for grouping. Thus, for example, grouping by frequency or by amplitude may be
opposed to grouping by spatial location. At the same time, different principles
governing grouping along any given dimension may be set in opposition to each
other. For example, grouping by proximity may be opposed to grouping by good
continuation. This section describes findings obtained with this technique and
discusses their theoretical implications.
1.2.1. Auditory Illusory Conjunctions. When two
sequences of tones emanate simultaneously from different regions of space, and
the onsets and offsets of these tones are synchronous, striking perceptual
illusions are generally produced. We may characterize a tonal stimulus as a
bundle of attribute values, that is, as having a pitch, a location, a loudness, and a timbre. In the situation just outlined,
these bundles of attribute values fragment and recombine, so that illusory
conjunctions result. (See also Treisman, Chapter 35.)
This anomalous recombination suggests that all auditory stimuli are at some
stage in the processing system fragmented into their separate attributes and
that this process of fragmentation is followed by a process of perceptual
synthesis in which the different attribute values are recombined. Under most
circumstances the stimuli are reconstructed correctly; however, we should not
assume that this
necessarily occurs.
Striking
individual differences are manifest in the types of illusion that are produced
in this situation. Further, these differences correlate strongly with
handedness and may be related to patterns of cerebral dominance. This implies
that they have an innate basis.
1.2.2. The Scale Illusion. One example of the
creation of strong illusory conjunctions is provided in the scale illusion (D.
Deutsch, 1975c, 1975e). The configuration that produced the illusion is
illustrated in Figure 32.4(a). It can be seen that this consisted of a major
scale (see Note 5), which was presented simultaneously in both ascending and
descending form. When a tone from the ascending scale was delivered to one ear,
a tone from the descending scale was simultaneously delivered to the other ear,
and successive tones in each scale alternated from ear to ear. This pattern was
repeatedly presented ten times without pause. All tones were sine waves of
equal amplitude and 250 msec in duration.
When
presented with this configuration, no subject perceived the sequence of tones
that was delivered to one ear or to the other, and none perceived a full
ascending or descending scale. Instead, the successive tones were always
grouped together on the basis of frequency range. All subjects perceived a
sequence of four tones that repeatedly descended and then ascended. Beyond
this, percepts were divisible into two categories. Most subjects also perceived
a second stream of lower tones that repeatedly ascended and then descended. The
second stream moved in contrary motion to the first [Figure 32.4(b)]. This
percept therefore included all the pitches in the configuration; however, these
were separated into two streams on the basis of frequency range.
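
The stimulus configuration and the most common percept can be sketched as follows. Tones are represented by scale degrees (1 to 8, with 8 the octave) rather than by specific pitches, since the text specifies only a major scale. The choice of which ear receives the first ascending tone, and the function names, are assumptions made for illustration. Grouping by frequency range is modeled simply as taking the higher and the lower tone at each position.

```python
ASCENDING = [1, 2, 3, 4, 5, 6, 7, 8]   # scale degrees; 8 is the octave
DESCENDING = ASCENDING[::-1]


def dichotic_sequence(first_ear_for_ascending="left"):
    """Return (left, right) ear sequences for the scale illusion.

    At each position one ear receives a tone from the ascending scale
    and the other ear a tone from the descending scale; the assignment
    alternates from ear to ear on successive tones.
    """
    left, right = [], []
    for i, (up, down) in enumerate(zip(ASCENDING, DESCENDING)):
        ascending_to_left = (i % 2 == 0) == (first_ear_for_ascending == "left")
        if ascending_to_left:
            left.append(up)
            right.append(down)
        else:
            left.append(down)
            right.append(up)
    return left, right


def group_by_frequency_range(left, right):
    """Split the simultaneous tone pairs into higher and lower streams."""
    higher = [max(l, r) for l, r in zip(left, right)]
    lower = [min(l, r) for l, r in zip(left, right)]
    return higher, lower


left, right = dichotic_sequence()
higher, lower = group_by_frequency_range(left, right)
# left   -> [1, 7, 3, 5, 5, 3, 7, 1]
# right  -> [8, 2, 6, 4, 4, 6, 2, 8]
# higher -> [8, 7, 6, 5, 5, 6, 7, 8]
# lower  -> [1, 2, 3, 4, 4, 3, 2, 1]
```

Neither ear's input forms a smooth melodic line, whereas the higher stream descends and then ascends and the lower stream moves in contrary motion, as in the percept most commonly reported.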

Table 32.1. Numbers of Right-Handers and Left-Handers Perceiving Both
the Higher and the Lower Pitch Sequences in the Scale Illusion ("Both"), and Those Perceiving Only
the Higher Pitches ("Single")
Streams
Handedness Both Single
The right-handers tended significantly to hear both
streams; however, the left-handers did not show such a tendency. (From
D. Deutsch, Two-channel listening to musical scales, Journal of the
Acoustical Society of America, 1975, 57. Reprinted with
permission.)
A
minority of subjects perceived instead only one stream of four tones that
repeatedly descended and then ascended. This corresponded to the higher
sequence of tones; little or nothing of the lower sequence was perceived.
Table
32.1 shows the numbers of right-handed and left-handed subjects who obtained
these two categories of percept. As can be seen, the two handedness groups
differed significantly on this measure. Further, in considering those subjects
who perceived both streams, significant differences between the two handedness
groups also emerged. Most right-handers obtained an illusion whereby the higher
tones all appeared localized in one ear and the lower tones in the other ear. As
shown in Table 32.2, there was a highly significant tendency to perceive the
higher tones in the right ear and the lower tones in the left ear, and also to
maintain a given localization pattern when the earphone positions were
reversed. The remaining right-handers obtained a variety of idiosyncratic
localization percepts, as did those who perceived only one stream. Most
left-handed subjects who perceived both streams also localized all the higher
tones in one ear and all the lower tones in the other ear. However, as shown in
Table 32.2, these subjects did not display the same localization tendencies as
did the right-handers. The remaining left-handed subjects reported a variety of
idiosyncratic localization percepts.
Table 32.2. Localization Patterns in the Scale Illusion, Displayed for Those
Subjects who Perceived All the Higher Tones in One Ear
and All the Lower Tones in the Other Ear

Figure 32.4.
(a) Stimulus configuration that produced the scale illusion. This consisted of
a major scale, presented simultaneously in both ascending and descending form. When
a tone from the ascending scale was delivered to one ear, a tone from the
descending scale was simultaneously delivered to the other ear, and successive
tones in each scale alternated from ear to ear. All tones were of equal
amplitude and 250 msec in duration. There were no pauses between tones. (b)
Percept most commonly obtained. This consisted of two melodic lines, a higher
one and a lower one, that moved in contrary motion. The higher tones all
appeared to be emanating from one earphone, and the lower tones from the other
earphone. (From D. Deutsch, Two-channel listening to musical scales, Journal
of the Acoustical Society of America, 1975, 57. Reprinted
with permission.)
To
summarize these findings, in considering what attribute was used as a
basis for grouping, organization by spatial location never occurred; rather,
organization was always on the basis of frequency (see also Kubovy, 1981). Second,
in considering which principle was used, organization was always on the
basis of frequency proximity. Either listeners heard two melodic lines, one
corresponding to the higher tones and the other to the lower tones, or they
heard the higher tones alone. Third, there were substantial individual
differences in the way that this configuration was perceived, both in terms of what
was perceived and in terms of where the sounds appeared to be coming
from. These individual differences correlated strongly with handedness.
Auditory
illusory conjunctions have been shown to occur under broader circumstances
also.
This
powerful illusion appears to be a good example of unconscious
inference in perception. Our auditory environment is very complex,
and the assignment of sounds to their sources is rendered difficult by the
presence of echoes and reverberation. So when a sound mixture is presented such
that both ears are stimulated simultaneously, we cannot judge from first-order
localization cues (see Note 7) alone which components of the total spectrum
should be assigned to which source. We therefore need to utilize other cues in
making such judgments. One such cue is similarity of frequency spectrum.
Similar sounds are likely to be coming from the same source, and different
sounds from different sources. It is therefore reasonable for the listener to
conclude that tones in one frequency range are coming from one source, and that
tones from a different frequency range are coming from another source. The
tones are therefore perceptually reorganized in space in accordance with this
interpretation (D. Deutsch, 1975c).
1.2.3.
Grouping of Nonsimultaneous Sound Sequences. If the above line of reasoning is correct, we
should expect that perceptual grouping of parallel pitch sequences would be
strongly influenced by the salience of the first-order localization cues. If,
in contrast to the conditions just described, such cues were strong and
unambiguous, channeling by spatial location would be expected to take
precedence over channeling by frequency range. One can produce such a situation
by employing sequences in which the tones at the two ears are clearly separated
in time.
To
examine this hypothesis, perceptual grouping was examined as a function of the
temporal relationships between the signals arriving at the two ears (D.
Deutsch, 1979a). Subjects were asked to identify rapid melodic patterns whose
component tones switched from ear to ear. In one set of conditions, input was to
one ear at a time; in another set, input was to both ears simultaneously. It
was predicted that when input was to one ear at a time identification of the
melody should be difficult, reflecting perceptual grouping by spatial location.
However, when both ears receive input simultaneously, identification of the
melody should be much easier.
Subjects
were presented with sequences of pure tones. Each sequence consisted of ten
repetitions of a basic eight-tone melody. All tones were of equal amplitude and
30 msec in duration, with tones within a melody separated by 100-msec pauses. Two
such melodies were employed, and the subjects identified on each trial which of
these had been presented.
The
experiment employed four conditions, which are illustrated in Figure 32.5. In
Condition A, all tones of the melody were presented simultaneously to both
ears. In Condition B, the component tones of the melody were distributed in random fashion between the ears. Condition C was
identical to Condition B except that the melody was accompanied by a drone. Whenever
a tone from the melody was presented to the right ear, the drone was
simultaneously presented to the left ear, and vice versa. Condition D was
identical to Condition C except that the drone was always presented to the same
ear as the tone from the melody.
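The four conditions can be summarized as an ear-assignment schedule. The following is a minimal sketch, not the procedure actually used: the function names, the "L"/"R" labels, and the use of Python's pseudo-random generator to distribute tones between the ears are all assumptions made for illustration. It records, for each melody tone, which ears receive the melody and which ear (if any) receives the drone, and then checks whether both ears are stimulated at once.

```python
import random

def build_schedule(n_tones, condition, seed=0):
    """Return one (melody_ears, drone_ear) pair per melody tone.

    melody_ears is a tuple of the ears receiving the melody tone;
    drone_ear is the ear receiving the drone, or None if there is no drone.
    The random ear assignment is an illustrative assumption.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_tones):
        if condition == "A":
            # Melody tone delivered to both ears simultaneously.
            schedule.append((("L", "R"), None))
            continue
        ear = rng.choice(["L", "R"])
        other = "R" if ear == "L" else "L"
        if condition == "B":
            # Melody tone to one ear only, no drone.
            schedule.append(((ear,), None))
        elif condition == "C":
            # Drone in the ear opposite to the melody tone.
            schedule.append(((ear,), other))
        elif condition == "D":
            # Drone in the same ear as the melody tone.
            schedule.append(((ear,), ear))
        else:
            raise ValueError("condition must be 'A', 'B', 'C', or 'D'")
    return schedule

def both_ears_stimulated(entry):
    """True if this time slot delivers sound to both ears at once."""
    melody_ears, drone_ear = entry
    ears = set(melody_ears)
    if drone_ear is not None:
        ears.add(drone_ear)
    return ears == {"L", "R"}

# Ten repetitions of an eight-tone melody give 80 melody tones per sequence.
for cond in "ABCD":
    sched = build_schedule(80, cond)
    print(cond, all(both_ears_stimulated(e) for e in sched))
```

Under these assumptions, input is to both ears simultaneously throughout Conditions A and C, and to one ear at a time throughout Conditions B and D.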
The
percentages of correct identifications of the melodies in the different
conditions of the experiment are shown in Figure 32.5. It can be seen that
excellent performance was obtained in Condition A, in which the melodies were
presented binaurally. In contrast, performance in Condition B, in which the
tones from the melodies were distributed between the ears, was very poor. The
procedure of switching the tones from ear to ear thus produced a considerable
decrement in identification performance. However, in Condition C, in which a
contralateral drone was presented so that input was to both ears
simultaneously, the performance level was again very high. This finding cannot
be attributed to processing the harmonic relationships between the drone and
the melody because in Condition D, in which the drone was presented to the same
ear as the melody component, performance was below chance. In this last
condition, input was no longer to the two ears simultaneously.
This
experiment demonstrates that temporal relationships between tones emanating
from different spatial locations are important factors in determining how the
tones are perceptually grouped. When signals are emanating from two locations
simultaneously, as in Conditions A and C, it is easy to integrate the
information arriving at the two ears into a single perceptual stream. However,
when the signals coming from the two locations are clearly separated in time,
as in Conditions B and D, grouping by spatial location is so powerful as to
prevent the listener from combining the tones to produce an integrated percept.
We may next enquire what happens in the intermediate case, where inputs to the two ears overlap but are not strictly synchronous. This condition brings us closer to normal listening, and also to the case where streams of speech are presented in parallel to both ears. A second experiment investigated the effects of onset-offset asynchrony between the components of the melody and the contralateral drone. In the asynchronous conditions, all tones were again 30 msec in duration, and the drone either led or lagged the melody components by 15 msec

Figure 32.5. Percentage of errors in
identification of melodic patterns when the component tones of the patterns
switched between ears. On each trial, ten repetitions of a basic
eight-tone pattern were presented. All tones were 30 msec in duration, and
tones within a pattern were separated by 100-msec pauses. Two such melodies
were employed, and subjects identified on each trial which of these had been
presented. In Condition A (melody presented
binaurally) excellent performance was obtained. In Condition B (melody distributed
between ears) performance was very poor. In Condition C (contralateral drone
accompanying melody) performance levels were again high. In Condition D
(ipsilateral drone accompanying melody) performance was below chance. (From D. Deutsch, Binaural integration of melodic patterns,
Perception and Psychophysics, 1979, 25. Reprinted with permission.)
or the right ear tones led or lagged the left ear tones by 15 msec. Performance
levels in these conditions were significantly lower than when the melody components
and the drone were strictly synchronous, and they were also significantly
higher than when the melody components switched between ears without an
accompanying drone. This is as expected on the present line of reasoning.
A
similar experiment was performed by Judd (1979). Two repeating stimulus
patterns were constructed from four square-wave tones, each 100 msec in
duration. The two patterns were as shown in Figure 32.6. It can be seen that,
taking each channel separately and treating the patterns as cyclically
repeating, the tones in the two patterns were identically ordered. However,
when the channels were combined, two different melodic patterns emerged
instead. Subjects were presented with pairs of these patterns and were required
to judge whether the members of each pair were the same or different. On half
of the trials, the silent gaps between the tones were replaced by noise. It was
found that performance was better in the noise-filler condition than in the
silent-gap condition. Judd interpreted this finding as due to the noise
degrading the localization information, which encouraged grouping of successive
tones on the basis of frequency range rather than spatial location.
Schubert
and Parker (1956) performed an experiment that may be interpreted similarly. These
authors measured the amount of interference in speech perception that was
produced by switching the signal from ear to ear. They found that adding noise
to the contralateral ear reduced this interference effect (Figure 32.7). It may
plausibly be argued that the ongoing speech-noise signal was interpreted by the
listener in terms of two sources, one emitting noise and the other emitting
speech, whereas the ongoing speech-silence signal was interpreted by the
listener in terms of two independent speech sources.
1.2.4. The Hypothesis of a Slow Switching Mechanism. The problem of degradation of processing
when information is switched from ear to ear has been addressed in other
contexts. For instance, Cherry and
A
related paradigm involves recall of lists of digits that are dichotically presented.
When two such dichotic lists were delivered at fast rates, recall was found to
be better by ear than by temporal order, the latter task requiring switching
between ears (Broadbent, 1954, 1958).
Figure 32.6. Stimulus configurations employed to investigate the effect of contralateral noise on the ability to discriminate melodic patterns whose component tones alternated between ears. Tones were 100 msec in duration, with fundamental frequencies of (1) 912 Hz, (2) 1024 Hz, (3) 1150 Hz, and (4) 1290 Hz. Discrimination performance was enhanced when the gaps between the tones were replaced by noise. (From T. Judd, Comments on Deutsch's musical scale illusion, Perception and Psychophysics, 1979, 26. Reprinted with permission.)

Further,
subjects showed poorer recall of successive lists of digits when these were
presented alternately to the two
ears than when they were presented binaurally (A. Treisman, 1971). This finding
cannot be ascribed to perceptual interference with the basic units of speech,
since there was no disruption of the verbal items in these experiments. Some
difficulty in the ability to switch attention between the ears was therefore
hypothesized.
In
contrast to the above arguments for a switching limitation, powerful general
arguments may be made against the idea that information from the two ears
cannot be dealt with in rapid succession.
Figure 32.7. Percentages of words correctly repeated as a function of rate at which the speech signal was switched from ear to ear. The lower curve shows the results for trials with silence in the contralateral ear. The upper (dotted) curve shows the results for trials in which noise was delivered to the contralateral ear. The contralateral noise resulted in enhanced speech intelligibility, especially at switching rates of around 4 Hz, where intelligibility was otherwise substantially reduced. (From E. D. Schubert & C. D. Parker, Addition to Cherry's findings on switching speech between two ears, Journal of the Acoustical Society of America, 1956, 27. Reprinted with permission.)

In
everyday listening, the information arriving at the two ears is never
identical, and the running cross-correlations performed on
this information are very important for several functions. One such
function is localization, and another is the suppression of echoes and
reverberation (Haas, 1951; Tobias & Schubert, 1959; Wallach, Newman, &
Rosenzweig, 1949). The auditory elements that are
compared for such functions may be separated by only a few microseconds. Such
an ability to utilize information entering the two ears in rapid succession is
not consistent with the notion of a slow switching mechanism.
Two
conflicting sets of phenomena have therefore been reported, one arguing for a
decrement in processing information where rapid switching between ears is
involved, and the other arguing against such a decrement. We may resolve this
conflict on the following line of reasoning. An important function of our
auditory system is to separate out the signals emanating from different
sources. If such perceptual separations were not accomplished we would not know
which elements of the acoustic spectrum to link together so as to form higher-order
abstractions. It is necessary, therefore, that there exist mechanisms that
inhibit the formation of higher-order linkages between acoustic elements that
are likely to be emanating from different sources. Since our acoustic
environment is very complex, such mechanisms must be flexible and employ
multiple criteria. Thus certain configurations involving input to the two ears
would be interpreted as coming from the same source,
so that integration of this information should be easy. Yet other
configurations would best be interpreted as emanating from different sources,
so that integration should be difficult. According to this hypothesis, when a
decrement in integrating information arriving at the ears occurs, this is due
not to capacity limitation, but rather to a mechanism that we have evolved to
prevent confusion in monitoring our auditory environment (see Bregman, 1978, 1981, for an analogous argument based on findings
involving various monaural tasks).
1.2.5.
The Octave Illusion. In the experiments described in Section 1.2.2, when tones
were presented to both ears simultaneously with synchronous onsets and
offsets, sequential grouping by frequency proximity was the rule. Grouping by
ear of input occurred only when there were temporal separations between the
stimuli presented to the two ears. We now turn to an examination of certain
situations in which grouping by ear of input occurs even though such input is
strictly simultaneous. It will be seen that this happens only under special
conditions of frequency relationship between the tones presented in sequence at
the two ears.
One
such situation is illustrated in Figure 32.8(a). This shows the stimulus
pattern that gives rise to the octave illusion (D. Deutsch, 1974, 1975c). It
can be seen that two tones that were spaced an octave apart (400 and 800 Hz)
were repeatedly presented in alternation. The identical sequence was delivered
to the two ears simultaneously; however, when the right ear received the high
tone the left ear received the low tone and vice versa. So in fact the listener
was presented with a single, continuous, two-tone chord, but the ear of input
for each component switched repeatedly.
This
configuration produced a number of illusory percepts, the most common of which
is illustrated in Figure 32.8(b). It can be seen that this consisted of a
single tone that alternated from ear to ear, and whose pitch simultaneously
alternated from one octave to another in synchrony with the localization shift.
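The stimulus configuration just described can be written down compactly. The sketch below is illustrative only: the function names, the tone duration, the sample rate, and the choice of which ear begins with the high tone are assumptions, since none of these are specified in the text here. It lists the frequency delivered to each ear in each time slot and renders the sequence as a two-channel sine-wave signal.

```python
import math

def octave_illusion_sequence(n_tones, low_hz=400.0, high_hz=800.0):
    """Return a list of (left_hz, right_hz) pairs, one per time slot.

    The right ear starts on the high tone (an arbitrary assumption);
    the two ears always receive opposite members of the octave pair.
    """
    seq = []
    for i in range(n_tones):
        if i % 2 == 0:
            seq.append((low_hz, high_hz))
        else:
            seq.append((high_hz, low_hz))
    return seq

def synthesize(seq, tone_dur_s=0.25, sample_rate=8000):
    """Render the sequence as a list of (left, right) sample pairs.

    tone_dur_s and sample_rate are assumed values chosen for illustration.
    """
    n = int(round(tone_dur_s * sample_rate))
    samples = []
    for left_hz, right_hz in seq:
        for k in range(n):
            t = k / sample_rate
            samples.append((math.sin(2 * math.pi * left_hz * t),
                            math.sin(2 * math.pi * right_hz * t)))
    return samples

seq = octave_illusion_sequence(4)
print(seq)
```

In every time slot the pair of frequencies is the same 400/800-Hz chord; only the ear receiving each component changes, which is the sense in which the listener is presented with a single continuous two-tone chord.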

When
the earphones were placed in reverse position, most listeners found that the apparent
locations of the high and low tones remained fixed. Thus it seemed to these
listeners that the earphone that had been producing the high tones was now
producing the low tones, and that the earphone that had been producing the low
tones was now producing the high tones.
If we
assume that there are two separate brain mechanisms, one for determining what
pitch we hear and the other for determining where the sound is located, we are
in a position to advance an explanation for this illusion. The model is diagrammed
in Figure 32.9. To determine the perceived pitch, the information arriving at
one ear is followed, and the information arriving at the other ear is
suppressed. However, each tone is localized in the ear receiving the
higher-frequency signal, regardless of which frequency is in fact perceived (D.
Deutsch, 1975c). The combined output of these two mechanisms, for the case of
the listener whose pitch percept corresponds to the frequencies presented to
the right ear, should result in the percept of a high tone to the right
alternating with a low tone to the left. For the case of the listener whose
pitch percept corresponds to the frequencies presented to the left ear instead,
the resultant percept should be that of a high tone to the left alternating
with a low tone to the right.
This
model received confirmation in a further experiment (D. Deutsch & Roll,
1976). Subjects were presented with the basic pattern shown in Figure 32.10(a).
This again employed tones standing in octave relation. It can be seen that one
ear received three high tones followed by two low tones, while simultaneously
the other ear received three low tones followed by two high tones. This basic
pattern was repeatedly presented ten times without pause.
As
expected from the model, most subjects perceived a pattern of pitches that
corresponded to the frequencies presented either to the right ear or to the
left ear. In other words, they heard a repeating sequence consisting either of
three high tones followed by two low tones, or of three low tones followed by
two high tones. However, each tone was localized in the ear that received the
higher frequency. This is illustrated in Figure 32.10(b). When Channel A was
presented to the right ear and Channel B to the left, the listener heard a
repeating sequence of three high tones to the right followed by two low tones
to the left. When, however, Channel A was presented to the left ear and Channel
B to the right, the listener now heard a repeating sequence of two high tones
to the right followed by three low tones to the left.
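The two-mechanism model can be stated as a small rule and checked against the pattern described above. This is a sketch under stated assumptions, not a published implementation: the function name, the "L"/"R" labels, and the `dominant_ear` parameter are hypothetical, and the rule assumes the two ears never receive the same frequency at the same time, as is true of these stimuli.

```python
def predict_percept(left_hz, right_hz, dominant_ear="R"):
    """Predict (pitch_hz, perceived_side) for each dichotic tone pair.

    Pitch mechanism: follow the frequency presented to the dominant ear.
    Localization mechanism: place the tone at the ear receiving the
    higher frequency, regardless of which pitch is heard.
    """
    percept = []
    for l, r in zip(left_hz, right_hz):
        if l == r:
            raise ValueError("the rule is defined only for differing frequencies")
        pitch = r if dominant_ear == "R" else l
        side = "R" if r > l else "L"
        percept.append((pitch, side))
    return percept

# Basic pattern: three high tones then two low tones in one channel,
# and the complementary sequence in the other channel.
channel_a = [800, 800, 800, 400, 400]
channel_b = [400, 400, 400, 800, 800]

# Channel A to the right ear, Channel B to the left ear.
print(predict_percept(left_hz=channel_b, right_hz=channel_a))
# Channel A to the left ear, Channel B to the right ear.
print(predict_percept(left_hz=channel_a, right_hz=channel_b))
```

For a listener whose pitch percept follows the right ear, the first call yields three high tones on the right followed by two low tones on the left, and the second yields three low tones on the left followed by two high tones on the right, which is the same repeating cycle as two high tones on the right followed by three low tones on the left.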
Most subjects in the D. Deutsch (1974) experiment perceived a single high tone in one ear alternating with a single low tone in the other ear.
Figure 32.9. Diagram showing how the outputs of the pitch and localization mechanisms combine to produce the octave illusion. Filled boxes indicate high tones (800 Hz) and unfilled boxes indicate low tones (400 Hz). The pitch mechanism follows the sequence of frequencies presented to one (dominant) ear rather than to the other. However, the localization mechanism follows the higher-frequency signal, regardless of whether the higher or the lower frequency is perceived. The outputs of these two mechanisms combine to produce the percept of a high tone in one ear alternating with a low tone in the other ear. (From D. Deutsch, The octave illusion and auditory perceptual integration, in J. V. Tobias & E. D. Schubert (Eds.), Hearing research

Figure 32.10. Stimulus patterns and percepts in experiment to test hypothesized basis for the octave illusion. Filled boxes represent tones of 800 Hz and unfilled boxes represent tones of 400 Hz. The basic patterns shown were presented ten times without pause. In accordance with the hypothesis, most subjects reported the pattern of pitches that was presented to the right ear; yet all subjects localized each tone to the ear receiving the higher-frequency signal. (From D. Deutsch & P. L. Roll, Separate 'what' and 'where' decision mechanisms in processing a dichotic tonal sequence, Journal of Experimental Psychology: Human Perception and Performance, 2. Copyright 1976 by American Psychological Association. Reprinted with permission.)

However,
some subjects instead perceived a single tone that alternated from
ear to ear, whose pitch either did not change or changed only slightly with a
shift in its apparent location. Other subjects heard more complex patterns,
such as two low tones that alternated from ear to ear with an intermittent high
tone in one ear. Such patterns were usually unstable, exhibiting frequent
changes with continued listening.
The
individual differences in perception of this illusion were found to correlate
with handedness. As shown in Table 32.3, the proportion of subjects reporting
complex percepts was substantially higher in the left-handed than in the
right-handed population (see also Craig, 1979). A second handedness correlate concerned the localization
patterns for the high and low tones. As shown in Table 32.4, most right-handers
heard the high tone on the right and the low tone on the left, regardless of
the positions of the earphones (see also Geffen & Reynolds, 1982; McClurkin
& Hall, 1981). In contrast, the left-handers did not show a significant
tendency to localize the high and low tones
Table 32.3.

Percentages
of right-handers and left-handers are displayed. "Octave"
indicates the percept of a single tone that alternates from ear
to ear, whose pitch simultaneously alternates from one octave to
the other. "Single Pitch" indicates the percept of a single
tone that alternates from ear to ear, whose pitch either does
not change or shifts slightly with a change in localization. "Complex"
comprises a number of different complex percepts. The proportion of
subjects obtaining complex percepts was considerably higher among
left-handers than among right-handers. (From D. Deutsch,
An auditory illusion, Nature, 251. Copyright 1974 by Macmillan Journals Ltd. Reprinted with permission.)
Table 32.4.

Each subject was given two
presentations of the sequence, for 20 sec each time, with earphones
placed first one way and then the other. The numbers of
right-handers and left-handers obtaining a given localization pattern
are displayed. RR: High tone localized in the right ear and low tone
in the left on both presentations. LL: High tone localized in the
left ear and low tone in the right on both presentations. Both: High tone
localized in the right ear and low tone in the left on one
presentation; and high tone localized in the left ear and low
tone in the right on the other. Right-handers tended strongly to hear
the high tone in the right and the low tone in the left; however,
left-handers did not display this tendency either way, and showed a
greater tendency to change their localization patterns.
Given the strong correlates
with handedness in perception of the octave illusion, it is interesting to
consider the neurological differences on which such correlates might be based. The
overwhelming majority of right-handers are left-hemisphere dominant,
but this is true of only about two-thirds of left-handers. Further, the
majority of right-handers have a clear dominance of the left hemisphere;
however, a substantial proportion of left-handers have some bilateral
representation (Goodglass & Quadfasel, 1954; Hécaen & de Ajuriaguerra,
1964; Hécaen & Piercy, 1956; Milner, Branch, & Rasmussen, 1966;
Subirana, 1969; and Zangwill, 1960). It appears reasonable to assume
that these
patterns of dominance are reflected in percepts of the octave illusion in two
ways. First, the localization of the high tone on the right and the low tone on
the left reflects left-hemisphere dominance, with the localization of the high
tone on the left and the low tone on the right reflecting right-hemisphere
dominance. Second, unambiguous localization patterns reflect clear dominance,
with complex percepts reflecting more cerebral equipotentiality.
Localization
patterns have been shown to correlate not only with handedness, but also with
familial handedness background. In a study by D. Deutsch (1983b), subjects
with left- or mixed-handed parents or siblings were found less likely to
localize the high tone on the right and the low tone on the left than were
subjects without left- or mixed-handed parents or siblings. This was found true
for right-handed, mixed-handed, and left-handed populations.
A
further question of interest is whether the interactions underlying the
localization and pitch effects in the octave illusion occur between pathways
conveying information from the two ears, or whether instead pathways conveying
information from different regions of auditory space are involved. To
investigate this question, the stimuli were presented through spatially separated
loudspeakers rather than earphones (D. Deutsch, 1974, 1975c). An analogous
illusion was obtained under these conditions: The subjects perceived a high
tone that appeared to be coming from one speaker, which alternated with a low
tone that appeared to be coming from the other speaker. This effect was
obtained even with the two speakers placed side by side, facing the listener,
which shows that highly specific regions of auditory space were involved here.
We
shall now consider only what sequence of pitches is perceived in the octave
illusion and leave aside the issue of where the tones appear to be located. In
the octave illusion, channeling of pitch sequences was always on the basis of
spatial location. However, in the scale illusion, channeling was always on the
basis of frequency proximity instead. Yet the stimuli producing these two
illusions were in several ways very similar. In both cases, repeating sequences
of sine-wave tones at equal amplitudes and durations were presented, with
synchronous onsets and offsets. Also in both cases, the frequencies presented
to one ear always differed from the frequencies simultaneously presented to the
other ear. Nevertheless, radically different channeling strategies arose in
response to these two stimulus patterns. It is particularly noteworthy that,
when two tones standing in octave relation were simultaneously presented in the
scale illusion, both these tones were generally perceived. But when two tones
standing in octave relation were simultaneously presented in the octave
illusion, only one of these tones was generally perceived. Such differences in
channeling strategy must therefore arise from differences in the patterns of
frequency relationship between successive tones.
Another
characteristic of the stimulus producing the octave illusion was that the
frequency emanating from one side of space was always the same as the frequency
that had just emanated from the opposite side. It therefore seemed plausible to
hypothesize that this sequential relationship was responsible for producing
channeling by spatial location. A further set of experiments was performed to
test this hypothesis (D. Deutsch, 1980a, 1981).
In the first experiment, listeners were presented with sequences consisting of 20 dichotic chords. Two conditions were compared, using the basic patterns illustrated in Figure 32.11(a).
Figure 32.11.
(a)
Configurations used in first experiment examining effects of sequential
interactions on ear dominance. Each sequence consisted of 20 dichotic chords.
In Condition 1, the two ears received the same frequencies in succession;
however, this was not true in Condition 2. (b) Percentage of following of
nondominant ear in these two conditions, as a function of amplitude differences
between the tones at the two ears. In Condition 1, the dominant ear was followed
until a critical level of amplitude relationship was reached, and the
nondominant ear was followed beyond this level. However, there was no following
on the basis of ear of input in Condition 2. (From D. Deutsch, Ear dominance
and sequential interactions, Journal of the Acoustical Society of America, 1980, 67. Reprinted with permission.)

The pattern
in Condition 1 consisted of the repetitive presentation of a single chord. The
tones comprising this chord stood in octave relation and alternated from ear to
ear in such a way that when the high tone was in the right ear the low tone was
in the left ear and vice versa. Here the two ears received the same frequencies in
succession. The sequence presented to the right ear began with the high tone and
ended with the low tone on half of the trials, while this order was reversed on the
other half. The subjects were asked to judge whether the sequence began with the
high tone and ended with the low tone or whether it began with the low tone
and ended with the high tone. It was thus possible to infer which ear was being
followed for pitch.
In Condition 2, the basic pattern consisted of the repetitive presentation of
two dichotic chords in alternation. The tones comprising the first chord formed
an octave and the second a minor third; thus the entire four-tone combination
constituted a major triad.
Note that here the two ears did not receive the same frequencies in succession.
The right ear received the higher tone of the first chord and the lower tone of
the last chord on half of the trials. The order was reversed on the other half
of the trials.
The relationship between
the amplitudes of the tones presented to the two ears was varied systematically
across trials, and plots were made of the extent to which each ear was followed
as
a function of these amplitude relationships. The results are displayed in
Figure 32.11(b). It is evident that in Condition 1 the frequencies presented to
one ear were followed until a critical level of amplitude relationship was
reached, and the frequencies presented to the other ear were followed beyond
this level. However, there was no following on the basis of ear of input in
Condition 2, even when the signals presented to the two ears differed
substantially in amplitude. Subjects instead followed on the basis of frequency
proximity: Three of the subjects consistently followed the low tones, and one
subject consistently followed the high tones. This result is in accordance with
the assumption that channeling by spatial location here occurs when the same
frequencies emanate in succession from different regions of auditory space.
In a
second experiment only two dichotic chords per trial were presented. The
comparison was again between two conditions. These employed the basic patterns
shown in Figure 32.12(a). In Condition 1, the basic pattern consisted of two
presentations of the identical chord. The component tones of this chord formed
an octave, in such a way that one ear received first
the high tone and then the low tone, while simultaneously the other ear
received first the low tone and then the high tone.
Figure 32.12. (a) Configurations used in second experiment examining effects of sequential interactions on ear dominance. Only two dichotic chords were presented on each trial. In Condition 1, the two ears received the same frequencies in succession, but this was not true in Condition 2. (b) Percentage of following of nondominant ear in these two conditions, as a function of amplitude differences between the tones at the two ears. In Condition 1, the dominant ear was followed until a critical level of amplitude relationship was reached, and the nondominant ear was followed beyond this level. However, there was no following on the basis of ear of input in Condition 2. (From D. Deutsch, Ear dominance and sequential interactions, Journal of the Acoustical Society of America, 1980, 67. Reprinted with permission.)

Throughout
this condition the identical frequencies were employed. The basic pattern in
Condition 2 consisted of two dichotic chords. In each case the component tones
of the chord formed an octave, but the tones in the two chords were of
different frequencies. Two pairs of chords were employed, and trials employing
these different chord pairs occurred in strict alternation. In this way, any
given chord was repeated only after a substantial time period during which
several other chords had been interpolated.
The
results are displayed in Figure 32.12(b). This again shows the extent to which
each ear was followed as a function of the amplitude relationships between the
tones at the two ears. In Condition 1, following was clearly on the basis of
ear of input. But such following did not occur in Condition 2, even when there
were substantial amplitude differences between the tones at the two ears. Instead,
these sequences were consistently followed on the basis of their overall
contour: The subjects' patterns of response indicated an ascending sequence
when the second chord was higher than the first, and a
descending sequence when the second chord was lower than the first. Such a
result held even when the tones at the two ears differed substantially in
amplitude.
Thus
in both experiments when the same frequencies emanated successively from
different spatial locations, channeling by spatial location always occurred. Otherwise,
channeling was on the basis of frequency range. It is noteworthy that relative
amplitude turned out not to be an important factor in either experiment. Following
by frequency proximity or contour occurred despite large amplitude differences
between the signals arriving at the two ears. When following was by ear of
input, a shift from following one ear to the other occurred not at the point
where the amplitude balance shifted from one ear to the other, but at some
other level of amplitude relationship that varied from subject to
subject (see Note 8). This finding lends support to Kubovy's (1981)
"Theory of Indispensable Attributes," in which it is argued that the
auditory system will organize stimuli on the basis of frequency, as opposed to
other attributes such as location or amplitude.
A
further question is whether the absence of following by ear of input in the
second condition of these two experiments was due to the delay between
successive presentations of the same frequencies to the two ears or to the
interpolation of different frequencies. A further experiment was performed to
study the effect of interpolated frequencies. The patterns employed are shown
in Figure 32.13(a). These two patterns were identical except that in Condition
2 a single tone was interpolated between the two presentations of the dichotic
chord. Listeners were asked to ignore this tone. As can be seen from Figure
32.13(b), following of the preferred ear was less pronounced in the condition
where the extra tone was interpolated than in the condition where there was no
interpolated tone.
To investigate the effect of temporal delay, the time interval between onsets of the successive tones was varied. Two methods of varying this temporal parameter were used. Either the durations of the tones were altered, or gaps were interpolated between them [Figure 32.14(a)]. The results, shown in Figure 32.14(b), demonstrated that the degree of following of the preferred ear lessened with increasing time between onsets of the identical frequencies at the two ears. Whether such a time increase was produced by lengthening the durations of the tones or by interpolating silent gaps between them did not matter. Thus channeling by preferred spatial location was shown to be reduced both by interpolated information and by temporal delay.
Figure 32.13. (a) Configurations used in
third experiment examining the effects of sequential interactions on ear
dominance. Conditions 1 and 2 were identical except that in Condition 2 a
single binaural tone was interpolated between the two dichotic chords and
subjects were asked to ignore this tone. (b) Percentage of following of
nondominant ear in these two conditions, as a function of amplitude differences
between the tones at the two ears. The interpolation of a single tone in
Condition 2 significantly reduced the size of the ear dominance effect. (From
D. Deutsch, Ear dominance and sequential interactions, Journal of the
Acoustical Society of
We may ask how a system producing such a set of perceptual phenomena could be useful
to us. It may be that these phenomena are of value in permitting us to follow
new, ongoing auditory information with a minimum of interference from echoes
and reverberation. Under natural conditions, when we hear the same frequency
emanating in close temporal succession from two regions of auditory space, the
second occurrence is in all probability an echo. This explanation becomes less
probable as the delay between two such occurrences is lengthened. Further, if
different frequencies are interpolated between two occurrences of the same
frequency, this interpretation also becomes less probable. It seems, therefore,
that the effects we have found are based on a mechanism that serves to
counteract misleading effects in our auditory environment (see Note 9). Another
such mechanism is the precedence effect, as described by Wallach, Newman, and
Rosenzweig (1949) and by Haas (1951). Here a single auditory image may be
obtained when the same frequency emanates from two different spatial locations,
with onset disparities of less than around 70 msec.
1.2.6. Grouping of Phase-Shifted Tones. Another approach to the issue of grouping by
frequency and by spatial location was developed by Kubovy and his co-workers. Kubovy,
Cutting, and McGuire (1974) presented a set of eight simultaneous and
continuous sine-wave tones to both ears. All tones were at equal amplitude, and their
frequencies were such that they comprised a major scale (see Note 5). The tones were
interaurally phase shifted in sequence, with the result that a melody was heard
that corresponded to the phase-shifted tones. However, the melody was not
detected when the stimulus was presented to either ear alone. At the
phenomenological level, the melody was heard as inside the head
but displaced to one side of the midline, while a background noise was heard as
displaced to the other side, so it appeared to the listener as though a source
in one spatial location was generating the melody and another source in a
different spatial location was generating the noise. A diagrammatic
illustration of this experimental situation is shown in Figure 32.15, taken
from Kubovy (1981). This effect is analogous to the Julesz stereogram (Julesz,
1971).
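The kind of stimulus described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of the apparatus used by Kubovy, Cutting, and McGuire (1974): the tuning (an equal-tempered C major scale starting at 261.63 Hz), the sample rate, the size of the interaural phase shift, and the function name phase_shifted_melody are all hypothetical choices made for the example. Only the 300-msec segment length and the use of eight equal-amplitude tones come from the text.

```python
import math

# Assumed tuning: eight tones of an equal-tempered C major scale (C4 to C5).
SEMITONES = [0, 2, 4, 5, 7, 9, 11, 12]
FREQS = [261.63 * 2 ** (s / 12) for s in SEMITONES]


def phase_shifted_melody(target_indices, sample_rate=44100,
                         segment_s=0.3, shift=math.pi / 2):
    """Return (left, right) sample lists for a sequence of chord segments.

    All eight tones sound continuously and at equal amplitude in both ears.
    In each segment, the right-ear copy of one tone (the target) is phase
    shifted by `shift` radians; every other tone is identical at the two ears.
    The shift value is an assumption made for illustration.
    """
    seg_len = round(sample_rate * segment_s)
    left, right = [], []
    for seg, target in enumerate(target_indices):
        for i in range(seg_len):
            # Global time, so the unshifted tones stay continuous across segments.
            t = (seg * seg_len + i) / sample_rate
            l = r = 0.0
            for k, f in enumerate(FREQS):
                phase = 2 * math.pi * f * t
                l += math.sin(phase)
                r += math.sin(phase + (shift if k == target else 0.0))
            left.append(l / len(FREQS))
            right.append(r / len(FREQS))
    return left, right


# A descending scale: the shifted tone steps down from the top of the scale.
left, right = phase_shifted_melody([7, 6, 5, 4, 3, 2, 1, 0], sample_rate=8000)
```

In this sketch the left channel is the same whichever tone is the target, so it carries no information about the melody. The right channel on its own is simply an eight-tone chord in which one component changes phase from segment to segment. The sequence of targets is specified only by the relationship between the two channels, in keeping with the report that the melody was not detected monaurally.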
As pointed out by Kubovy (1981), there are two potential interpretations of this effect. On one hand, the segregation of the melody from the noise could be based on concurrent-difference cues; that is, the target tone may appear segregated because its interaural disparity differs from that of the background tones. Alternatively, the segregation effect could be based on successive-difference cues; that is, the target tone may appear segregated because it has changed its interaural disparity whereas the others have not.
Figure 32.14. (a) Configurations used in
experiment to investigate the effects of temporal delay on ear dominance. (b)
Percentage of following of nondominant ear in the different conditions of this
experiment. The strength of ear dominance was reduced with increasing time
between onsets of the same frequencies at the two ears. There was no effect
depending on whether the differences between onsets were produced by
lengthening the durations of the tones or by interpolating silent gaps between
them. (From D. Deutsch, The octave illusion and auditory perceptual
integration, in J. V. Tobias & E. D. Schubert (Eds.), Hearing research
and theory, Vol. 1, Academic Press, Inc., 1981. Reprinted with permission.)
Figure 32.15. Stimulus configuration such as employed by Kubovy,
Cutting, and McGuire (1974) to demonstrate grouping of phase-shifted tones. Each
slab represents a segment of sound about 300 msec in duration. The abscissa
represents interaural time disparity, which produces a shift in the
localization of the phase-shifted tone. The ordinate represents frequency on a
logarithmic scale. When presented with such a configuration, the listener
perceives a descending scale. (From M. Kubovy, Concurrent pitch-segregation and
the theory of indispensable attributes, in M. Kubovy
& J. Pomerantz (Eds.), Perceptual organization, Lawrence Erlbaum
Associates, 1981. Reprinted with permission.)
Two further configurations were therefore devised to determine which of these two
factors was responsible. The first is illustrated in Figure 32.16, which
displays a sequence of stimuli in which the target is distinguished from the
background only by concurrent-difference cues. Essentially, all changes occur
in the background tones; the target tone itself does not change. The second
configuration is illustrated in Figure 32.17, which displays a sequence of
stimuli that contain only successive-difference cues. The first stimulus
consists of four tones to the right and three to the left. The second stimulus
is identical to the first except that one of the tones has shifted from right
to left. The third stimulus is identical to the second except that one of the
tones has shifted from left to right.
Subjects were presented with these three types of stimulus configuration, consisting of
confounded, successive-, and concurrent-difference cues. For each type, either
an ascending or a descending scale was presented, and subjects identified the
direction of the scale in a forced-choice task. The pure successive-difference cues
were found to be as effective as the confounded cues. The concurrent-difference
cues were less effective, though with these stimuli, performance levels were
still above chance. (The poorer performance was here attributed to the fact
that, in order to generate concurrent-difference cue stimuli,
successive-difference cues were necessarily applied to the background, thus providing
contradictory information.)
In another experiment, Kubovy and Howard (1976) studied the effect of
interpolating silent intervals between temporally adjacent chords in which
successive-difference cues had been introduced. The purpose of the experiment
was to measure the amount of time for which the auditory information in the
first chord persisted, so that it could be compared with that in the next
chord.
Figure 32.16. Stimulus configuration that has
only a concurrent-difference cue. All changes occur in the background
tones; these have their phases interaurally shifted. The target tone itself
does not change. The target tone appears segregated because its interaural disparity
differs from that of the background tones. (From M. Kubovy, Concurrent
pitch-segregation and the theory of indispensable attributes, in M. Kubovy
& J. Pomerantz (Eds.), Perceptual organization,
Lawrence Erlbaum Associates, 1981. Reprinted with permission.)
Figure 32.17. Stimulus configuration that has only a
successive-difference cue. The first stimulus consists of four tones to
the right and three to the left. The second stimulus is identical to the first,
except that one of the tones has been interaurally phase shifted from left to
right. The target tone appears segregated because it has changed its interaural
disparity whereas the others have not. (From M. Kubovy, Concurrent
pitch-segregation and the theory of indispensable attributes, in M. Kubovy
& J. Pomerantz (Eds.), Perceptual organization, Lawrence Erlbaum
Associates, 1981. Reprinted with permission.)
Figure 32.18. Stimulus configuration in experiment to study effect
of interpolating silent intervals between temporally adjacent chords in which
successive-difference cues had been introduced. Each tone had a different
interaural time disparity, and a variable pause was introduced between successive
tones. In this example, the listener perceives an ascending scale. An estimate
of roughly 1 sec for the persistence of this type of memory was obtained,
though there was considerable individual variation. (From M. Kubovy, Concurrent
pitch-segregation and the theory of indispensable attributes,
in M. Kubovy & J. Pomerantz (Eds.), Perceptual organization, Lawrence
Erlbaum Associates, 1981. Reprinted with permission.)
All chords consisted of six simultaneous tones, around 300 msec in duration, and
presented at equal amplitude. Each tone had a different interaural disparity, as
shown in Figure 32.18. A variable pause was introduced between successive
chords. Subjects judged whether an ascending or a descending scale had been
presented. The experiment yielded an estimate of roughly 1 sec for the
persistence of this type of memory, though considerable individual differences
were observed.
1.3. Grouping of Rapid Sound Sequences
1.3.1. Grouping by Frequency. In the auditory mode, frequency
appears to be the most sensitive dimension along which grouping principles
operate. This is particularly well illustrated in experiments involving rapid
sequences of tones. The next four sections investigate the consequences of
grouping by proximity along the frequency dimension and then describe evidence
for grouping by good continuation.
1.3.2. Grouping by Frequency Proximity. When a rapid sequence of tones is drawn
from more than one frequency range, the sequence tends to
split apart perceptually so that two or more melodic lines are perceived
in parallel. This phenomenon is exploited in musical composition by the
technique of pseudopolyphony, or compound melodic line. Here a single
instrument plays a rapid sequence of tones that are drawn from different pitch
ranges, with the result that more than one melodic stream is
perceived in parallel. Figure 32.19(a) shows a segment of music that exploits
this principle. In Figure 32.19(b) the same segment of music is depicted, with
log frequency and time mapped into two
dimensions of visual space. It is interesting to note that grouping by
proximity clearly emerges in the visual analogue, just as it does in the
perceived music.
G. A. Miller and Heise (1950) performed one of the first experimental demonstrations
of this grouping effect. They presented subjects with a sequence that
consisted of two tones alternating at a rate of 10 per second.
When the frequencies of these tones differed by less than
15%, the sequence was heard as a single coherent string. However, as the
frequency disparity between the alternating tones increased, the sequence was
heard instead as two repeating and unrelated tones. This phenomenon has come to
be termed fission. Heise and G. A. Miller (1951) examined the perception
of rapid sequences of tones that were composed of several different
frequencies. When the frequency of one of these tones differed sufficiently
from the rest, it was heard in isolation from them.
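As a rough illustration of the two-tone alternation just described, the following Python sketch builds the onset schedule for such a sequence and applies the reported 15% figure as if it were a sharp criterion. That is a simplification made for the example: the text reports a gradual change with increasing disparity, not a fixed cutoff. The function names are hypothetical, and computing the percentage relative to the lower frequency is an assumption.

```python
def alternating_sequence(f_low, f_high, rate_per_s=10.0, n_tones=20):
    """Return (onset_seconds, frequency_hz) pairs for two alternating tones."""
    period = 1.0 / rate_per_s
    return [(i * period, f_low if i % 2 == 0 else f_high)
            for i in range(n_tones)]


def percent_difference(f_low, f_high):
    """Frequency difference as a percentage of the lower frequency (assumed)."""
    return 100.0 * (f_high - f_low) / f_low


def reported_percept(f_low, f_high, threshold_percent=15.0):
    """Label the sequence using the 15% figure as an illustrative hard cutoff."""
    if percent_difference(f_low, f_high) < threshold_percent:
        return "coherent"
    return "fission"


schedule = alternating_sequence(1000.0, 1100.0)
print(schedule[:3])
print(reported_percept(1000.0, 1100.0), reported_percept(1000.0, 1300.0))
```

With these assumptions, a 1000-Hz tone alternating with an 1100-Hz tone (a 10% difference) falls on the coherent side of the criterion, whereas 1000 Hz alternating with 1300 Hz (30%) falls on the fission side.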
Later Dowling (1973a) presented two well-known melodies with their component tones
alternating at a rate of 8 per second. He found that recognition of these melodies
was very difficult when their pitch ranges overlapped; however, recognition was
easy when their pitch ranges differed. He explained this finding in terms of
the operation of the principle of proximity. When the components of the two
melodies were proximal in pitch, they were perceptually combined into a single
stream, with the result that they were difficult to disentangle. However, this
problem did not arise when the melodies were in different pitch ranges.
1.3.3. Temporal Coherence as a Function of Frequency Proximity
and Tempo. The term temporal coherence is employed to describe the
subjective impression that a tonal sequence forms a connected series. In an
experiment to study the conditions giving rise to this effect, Schouten (1962)
varied the frequency relationships between successive tones in a sequence and
also varied their presentation rate. He found that with an increase in the
frequency separation between successive tones a reduction in presentation rate
was required to maintain the impression of a coherent stream.
Later Van Noorden (1975) presented subjects with sequences of alternating tones and
instructed them either to attempt to hear temporal coherence or to attempt to
hear fission. He determined two boundaries by this method. The temporal
coherence boundary defined the threshold frequency separation as a function of presentation rate required for the subject to hear the sequence as
coherent. The fission boundary defined the threshold frequency
separation as a function of presentation rate required for the subject to hear
two disconnected series. These two boundaries are shown in Figure 32.20. It can
be seen that, when the subjects were attempting to hear coherence, decreasing
the presentation rate substantially increased the range of frequency separation
within which the sequence was heard as a single stream. However, when the
subjects were attempting to hear fission, decreasing the presentation rate had
little effect on threshold. In the region between these two boundaries,
subjects could alter their listening strategies at will and so hear either fission
or coherence.
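The relationship between the two boundaries and the three perceptual regions can be written out as a small decision rule. The Python sketch below is illustrative only. It assumes that, at a given presentation rate, the two boundary values are already known (the numbers used here are hypothetical placeholders, not Van Noorden's data), and it uses the region labels A, B, and C from the caption to Figure 32.20.

```python
def classify_percept(separation, fission_boundary, coherence_boundary):
    """Classify a frequency separation at one presentation rate.

    All three arguments are in the same units (for example semitones).
    `fission_boundary` and `coherence_boundary` are the threshold separations
    at that rate; the fission boundary is assumed not to exceed the temporal
    coherence boundary.
    """
    if fission_boundary > coherence_boundary:
        raise ValueError("fission boundary should not exceed coherence boundary")
    if separation > coherence_boundary:
        return "A: heard only as two streams"
    if separation < fission_boundary:
        return "C: heard only as a single stream"
    return "B: listener can hear either organization"


# Hypothetical boundary values at one presentation rate, for illustration only.
for sep in (1, 5, 12):
    print(sep, classify_percept(sep, fission_boundary=3, coherence_boundary=8))
```

Because the temporal coherence boundary varies strongly with presentation rate while the fission boundary varies little, the width of region B in such a rule would itself depend on rate.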
A later experiment by Bregman and Bernstein (quoted in Bregman, 1978) confirmed
the interaction between frequency separation and presentation rate for
judgments of temporal coherence and found that this effect was maintained
throughout a considerable frequency range.
1.3.4. Grouping by Frequency Proximity in Relation
to Repetition. It has been demonstrated in several experiments that the splitting of
tonal sequences into separate streams on the basis of frequency proximity
develops with repetition. Van Noorden (1975) determined the temporal coherence
boundaries for two-tone, three-tone, and long repetitive sequences. In the case
of three-tone sequences, the frequency change was either unidirectional or
bidirectional. The results are shown in Figure 32.21. For unidirectional
three-tone sequences, temporal coherence occurred at rates that were faster
than for two-tone sequences. However, for bidirectional three-tone sequences,
the upper limit for temporal coherence was lower than for two-tone sequences. For
long repetitive sequences, the upper limit was lower still.
Figure 32.19. Grouping of melodic stimuli on the basis of frequency
proximity. Two parallel melodic lines are perceived, each in a
different frequency range (passage is from Beethoven's Six Variations on the Duet
"Nel cor più non mi sento" from Paisiello's La Molinara). (a)
The passage in musical notation. (b) The passage with log frequency and time
mapped into two dimensions of visual space. (From D. Deutsch, The processing of
pitch combinations, in D. Deutsch (Ed.), The psychology of music,
Academic Press, Inc., 1982. Reprinted with permission.)
Bregman (1978) presented subjects with sequences that consisted of two "high"
tones and a single "low" tone. At rapid presentation rates this
sequence split into two streams such that the upper stream appeared as an
alternation of the two high tones and the lower stream as the repeating
occurrence of the low tone. The number of tones packaged between 4-sec silent
periods was varied, and subjects adjusted the presentation rate to correspond
to the point at which splitting into two streams occurred. The results are
shown in Figure 32.22. It can be seen that as the size of the package increased
the threshold presentation rate for splitting the sequence into two streams
decreased. This is in accordance with the results of Van Noorden (1975)
described earlier. Bregman interpreted these findings as follows. Stream
segregation (or fission) may be viewed as the result of a mechanism that
"parses" the auditory environment; that is, it groups together
components of the acoustic spectrum so as to reconstruct the original sources
of the sounds. It is reasonable to expect that such a mechanism would
accumulate evidence over time, and also with repeated presentation.
Further evidence for this view has been provided in an experiment by Bregman and
Rudnicky (1975). Here two test tones were embedded in a four-tone pattern and
so were flanked by two "distractor" tones. Subjects were required to
judge the order of the test tones, and it was found that the presence of the
distractor tones made this task difficult. However, when
another stream of tones, termed "captor" tones, was moved close in
frequency to the distractor tones, the distractors combined with the captors
to form a single stream. The test tones were therefore left in a stream
of their own. As a result, judgment of their order was facilitated. The authors
argued that in this situation the subject is presented with two simultaneous
streams of tones, and that the distractor can belong to either one of these,
but not to both simultaneously. This is in accordance with the hypothesis of an
auditory parsing mechanism: It is unlikely that any single tone would be
emanating from more than one source simultaneously.
The cumulation of effect over time reported by Bregman (1978) is analogous to
cumulation effects in the octave illusion. Here the strength of tendency to
follow the frequencies presented to one side of space rather than to the other
also cumulates with repeated presentation, and cumulates more rapidly as
repetition rate increases. The strength of tendency to localize the perceived
sound toward the source of the higher-frequency signal in this illusion also
builds with repetitive presentation (D. Deutsch, 1976, 1978c). Such findings
may also be well interpreted in terms of evidence accumulation.
1.3.5. Frequency Proximity and the Perception of
Temporal Relationships. When a rapid sequence of tones splits into two
separate streams, judgment of temporal relationships between elements of the
different streams is impaired. Bregman and
Figure 32.20. Temporal coherence boundary and fission boundary as a
function of frequency relationship between alternating tones, and of
presentation rate. When the subject was attempting to hear coherence,
decreasing the presentation rate substantially increased the range of frequency
separation within which the sequence was heard as a single stream. However,
when the subject was attempting to hear fission, decreasing the presentation
rate had little effect on threshold. In region A, the sequence could be heard
only as two streams. In region C, it could be heard only as a single stream. In
region B, the subject could choose to hear the sequence either way. (From L. P.
A. S. Van Noorden, Temporal coherence in the perception of
tone sequences. Unpublished doctoral dissertation, Technische
Hogeschool,
of six tones, three taken from a high-frequency range and three from a low-frequency
range. They found that when these tones occurred at a rate of 10 per second, subjects had difficulty in perceiving a pattern of high and
low tones that was embedded in the sequence. Dannenbring and Bregman (1976)
later reported that, when two tones alternated at high speeds so that they
split into two perceptual streams, the tones appeared to be overlapping in time.
Figure 32.21. Temporal coherence boundary for two-tone, three-tone
unidirectional, three-tone bidirectional, and continuous sequences. For
unidirectional three-tone sequences, temporal coherence occurred at rates that were
higher than for two-tone sequences. However, for bidirectional three-tone
sequences, the upper limit for temporal coherence was lower than for two-tone
sequences. For long repetitive sequences the upper limit was lower still. (From
L. P. A. S. Van Noorden, Temporal
coherence in the perception of tone sequences. Unpublished
doctoral dissertation, Technische Hogeschool,
Figure 32.22.
Threshold stream segregation as a function of number of
tones in a temporal group or "package." Two
"high" tones were presented in alternation with a single
"low" tone, in temporal groups or packages. As the size of the
package increased, the threshold rate for splitting the sequence into two
streams decreased. Thus the mechanism that produces stream segregation
accumulates evidence with repeated presentation. (From A. S. Bregman, The
formation of auditory streams, in J. Requin (Ed.), Attention and performance (Vol. 7), Lawrence
Erlbaum Associates, 1978. Copyright by International Association for the Study of Attention and Performance. Reprinted with
permission.)
In addition, Fitzgibbon, Pollatsek, and
Thomas (1974) studied the perception of temporal gaps between rapidly
presented tones. When these tones were in the same frequency range, the
interpolation of a 20-msec gap was easily detected. However, when this same gap
was interpolated between tones in different frequency ranges, its detection was
considerably impaired.
Van Noorden (1975) examined the detection of temporal displacement of a tone that
continuously alternated with another tone of different frequency. He found that
as the presentation rate increased the threshold for detection of temporal
displacement also increased. As shown in Figure 32.23, the greater the
frequency separation between the alternating tones, the greater the increase in
threshold.
This deterioration in temporal processing resulting from frequency disparity is not confined to continuous sequences
but occurs with two-tone sequences also. Divenyi and Hirsh (1972)
investigated the discrimination of size of a temporal gap between a pair of
tones and found that performance deteriorated with increasing frequency
separation between the tones. Williams and Perrott (1972) investigated the
minimum temporal gap detectable between tone pairs. For tones of 100- and
30-msec duration, the detection threshold rose with increasing frequency
separation between the tones. On the other hand, Van Noorden (1975) has
demonstrated that such deterioration of temporal processing is considerably
greater for continuous sequences than for two-tone sequences and may thus be
considered to result from the formation of separate perceptual streams. This is
illustrated in Figure 32.23.
1.3.6. Grouping by Good Continuation. The principle of good
continuation has been shown to influence the grouping of tones that occur in
rapid succession. Bregman and Dannenbring (1973) presented subjects with a
repeating sequence consisting of a high
tone alternating with a low tone.
Figure 32.23. Open circles represent the just noticeable temporal displacement ΔT/T of the
second tone of a two-tone sequence as a function of frequency separation in
semitones. Closed circles represent the just noticeable temporal displacement ΔT/T
of one tone in a continuous sequence of alternating
tones as a function of frequency separation in semitones. The greater the frequency separation between successive tones, the
higher the threshold for perception of temporal displacement. The effect
was more pronounced with continuous sequences than with two-tone sequences. (From
L. P. A. S. Van Noorden, Temporal coherence in the perception of
tone sequences. Unpublished
doctoral dissertation, Technische Hogeschool,
When the frequency disparity between these tones was
such that they tended to segregate into two streams, segregation was reduced
when the tones were connected by frequency glides. Also I. V. Nabelek, A. K.
Nabelek, and Hirsh (1973) studied perception of complex tone bursts and found
that when frequency glides were interpolated between the initial and final
tones of the burst there was more pitch fusion than when such glides were not
interpolated. Divenyi and Hirsh (1974) investigated identification of the order
of three-tone sequences. For sequences in which the frequency changes were
unidirectional, order perception was superior to that for sequences in which
frequency changes were bidirectional. Similar findings were obtained by
Nickerson and Freeman (1974), R. M. Warren and Byrnes (1975), and McNally and
Handel (1977), using four-tone sequences.
1.3.7. Grouping by Sound Quality. The formation of
perceptual groupings on the basis of sound quality is an example of the
application of the principle of similarity. A striking demonstration of this
phenomenon was created by R. M. Warren, Obusek, Farmer, and R. P. Warren
(1969). These authors constructed sequences of four unrelated sounds that were
repeatedly presented without pause. The sounds were a high tone (1000 Hz), a
hiss (2000-Hz octave band noise), a low tone (796-Hz sine wave), and a buzz
(400-Hz square wave). All sounds had a duration of 200 msec. Subjects were
found unable to name the orders in which these sounds occurred; however,
correct ordering was possible when the duration of each sound was increased
beyond 500 msec. This effect is discussed in detail in Section 4.1.4.
Grouping by sound quality was also demonstrated informally in an experiment by Wessel
(1979). A repeating three-tone ascending line was presented with two
alternating timbres. When the timbral difference between the adjacent tones was
small, percepts were dominated by the
ascending pitch lines. However, as the difference in spectral energy
distribution between adjacent tones increased, percepts were transformed into two
streams based on timbre. As a result, two interwoven descending lines were
formed, each with its own timbral identity.
1.3.8. Grouping by Amplitude. Grouping by amplitude has been
shown to occur in the perception of rapid sequences of tones. Dowling (1973a)
found that, when melodies were interleaved in time,
loudness differences between them enhanced the ability to hear the melodies as
separate. Van Noorden (1975) investigated the perception of sequences of tones
that were of identical frequency but whose amplitudes alternated between two
values. When the amplitude difference between the alternating tones was less
than 5 dB, a single coherent stream was perceived. However, as the amplitude
difference increased, two parallel streams of different loudness were perceived
instead. With substantial amplitude differences between the alternating tones,
the auditory continuity effect was obtained (Section 1.3.11).
1.3.9. Grouping by Temporal Position. Sound
sequences may be divided into subsequences on the basis of temporal position. Such
grouping is most readily achieved by interpolating gaps between subsequences,
and the evidence on this issue is described in Section 4.2.2.
1.3.10. Grouping by Spatial Location. Rapid sound
sequences are under certain conditions grouped by spatial location. The evidence
on this issue is discussed in detail in Section 1.2.3. Temporal relationships
between the sounds in the different locations can be important in determining
whether or not grouping by spatial location occurs, as can frequency
relationships between tones that occur in sequence at the different locations.
1.3.11. Closure: The Auditory Continuity Effect. Several
of the findings discussed above have demonstrated that the auditory system
reorganizes sound sequences in accordance with expectations derived from our
knowledge of the auditory environment. It has further been demonstrated that
sounds that are not actually present in a stimulus configuration may be
perceptually synthesized in accordance with such expectations.
When two sounds of differing amplitude are presented
in alternation, the weaker sound may be perceived as continuing through the
louder one (G. A. Miller & Licklider, 1950; Thurlow, 1957; Vicario, 1960). Furthermore,
when a phoneme in a sentence is replaced by a noise of greater amplitude, the
missing phoneme may be perceptually synthesized by the listener (R. M. Warren,
1970; R. M. Warren, Obusek, & Ackroff, 1972). Similar findings have been
obtained with nonverbal sounds. This "auditory induction effect"
occurs only under stimulus conditions in which one might reasonably conclude
that the substituted sound had masked the missing one (Plomp, 1981; R. M.
Warren, 1982).
Dannenbring (1976) presented a sine-wave tone whose
frequency repeatedly glided up and down. He then substituted a loud noise burst
for a portion of this ongoing tone and found that the tone still appeared to
glide through the noise. Dannenbring and Bregman (1976) report further that, if
the amplitude of the tone changed just before the noise burst, the tendency to
perceive the tone as continuing through the noise was reduced. As the authors
point out, the change in amplitude produced evidence that something had
happened to the tone itself rather than its simply being masked, so as to make
a masking hypothesis less likely (see, however, Steiger & Bregman, 1981).
1.4. Grouping and Selective Attention
1.4.1. Voluntary and Involuntary Grouping. In normal listening,
we have the impression that we can direct our attention at will to any feature
of the auditory environment. However, this impression may often be illusory,
and the conditions under which attention is indeed under voluntary control
require careful examination. Two issues need to be separated in this
discussion. First, we may enquire into the role of voluntary attention in the
initial division of an auditory configuration into groupings. Second, once such
groupings have been established, we may enquire into the role of voluntary attention
in determining which grouping is attended to.
Concerning the first issue, several configurations have been described in which a
particular grouping principle is so powerful that the listener is generally
unaware of alternative organizations. Thus most listeners when presented with
the scale illusion (D. Deutsch, 1975e) form groupings so strongly on the basis
of frequency proximity that they mislocalize the tones on this basis. As a
result, when attending to the higher or the lower melodic line, they believe
that they are attending to one spatial location rather than to another,
although this is not in fact the case (D. Deutsch, 1975e). The same phenomenon
exists with the contrapuntal patterns (see Note 6) devised by
However, once such groupings have been established, we find that voluntary factors play
a prominent role in determining which of two parallel groupings is attended to.
Thus when listeners hear the scale illusion as two melodic lines in parallel
(D. Deutsch, 1975e) they can direct their attention at will to either the
higher or the lower one. The same holds for the contrapuntal patterns devised
by
Strong involuntary factors are also involved in the formation of separate groupings
from rapid sequences of tones. Thus, for example, voluntary attention focusing
cannot readily overcome the difficulty in perceiving temporal relationships
between elements that belong to different groupings, when these are
configured as in Bregman and Campbell (1971), R. M. Warren et al. (1969), or D.
Deutsch (1979a). In these experiments, the configurations were such as to
induce powerful grouping on the basis of frequency proximity, sound type, or
spatial location.
For configurations in which groupings are only weakly induced, voluntary attention
focusing can exert an influence. For example, Van Noorden (1975) showed that,
within a given range of frequency relationships between two alternating tones,
and at certain presentation rates, the listener may direct his or her attention
at will so as to hear either a single grouping or two separate groupings. This
region of ambiguity is shown in Figure 32.20. The composer Robert Erickson
addressed this issue with regard to grouping by pitch or by timbre in a
composition entitled LOOPS (Note 10). A repeating melodic configuration was
performed by five instruments, with each instrument playing a different note in
the manner of a hocket. Each pitch was therefore eventually played by every
instrument. Although no formal data were collected, it is clear that on
listening to this piece one can direct one's attention at will and so form
configurations on the basis either of instrumental timbre or of pitch (see also
Erickson, 1975).
As
with sequences of simultaneous tones, once groupings are formed from rapid
sequences, the listener may voluntarily switch attention from one grouping to
another (e.g., see Van Noorden, 1975).
We may
conclude that the initial division of a configuration into groupings is often
outside the listeners' voluntary control, though
ambiguous situations may be generated in which attention focusing can play a
role. However, once such groupings have been established, voluntary attention
is important in determining which of these is attended to. We may note that
such a division of the attentional process into two stages corresponds in
certain ways to the stages termed preattentive and postattentive by
Neisser (1967) and by Kahneman (1973). These terms are generally taken to imply
different depths of analysis at the two stages; however the question of depth
of analysis is as yet unsettled (J. A. Deutsch &
D. Deutsch, 1963; Keele & Neill, 1979).
1.4.2. Consequences of Attention Focusing. Finally, we consider
the consequences of voluntary attention focusing on the processing of
unattended material. Cherry (1953) and Cherry and Taylor (1954) presented
subjects with two streams of speech, one to each ear, and asked them to shadow
one of the streams. Subjects were able to report very little about the speech
that had been presented to the nonattended ear (see also Kahneman, 1973). The
present author has informally obtained an analogous result using melodic
stimuli instead of speech. Two familiar melodies were generated on a piano, and
these were simultaneously presented, one to each ear. Subjects were asked to
shadow the melody presented to one ear by singing and later to report what had
been presented to the other ear. It was found that the subjects were unable to
name the second melody and could describe very little about it. Thus for
nonverbal sound sequences also, attention focusing may have the effect of
suppressing the unattended material from conscious perception.
It is particularly
interesting that such a result should have been obtained for the case of music,
since, in contrast to speech, music often consists of several streams of
information in parallel. The important question therefore arises as to the
extent to which the unattended signal is processed under these conditions. Broadbent
(1958) originally proposed that in selective listening a filter selects out
elements of a simultaneous configuration on the basis of gross physical
characteristics, such as frequency range or spatial location. Stimuli that
share a characteristic, so defining a relevant "channel," are then analyzed
further, the other stimuli being filtered out. This theory ran into
difficulties on experimental grounds. For example, semantic content may be a
basis for channel selection (Gray & Wedderburn, 1960; A. Treisman, 1960). To handle such findings, A. Treisman (1960, 1964)
suggested a modification of Broadbent's theory, in which the unattended message
is not completely filtered out, but rather attenuated. J. A. Deutsch and D.
Deutsch (1963) proposed alternatively that all input is perceptually analyzed by
the nervous system, whether attended to or not. The analyzed information is
weighted for pertinence, the weightings being determined both by long-term
factors and by the current situation. On this model, the information
with the highest pertinence weighting controls awareness. Recent studies have
provided evidence for the latter view (e.g., Corteen & Wood, 1972; Lewis, 1970; Shiffrin &
Schneider, 1977); however, the issue remains controversial.
2. SHAPE ANALYSIS FOR PITCH STRUCTURES
2.1. Auditory Shape Analysis as a Multileveled Process
The
analysis of auditory shape may be considered at several stages of abstraction. In
the case of shapes built of pitch structures, we may first enquire into the
types of abstraction that give rise to local features, such as intervals,
chords, and tone chroma. Such features may be considered analogous to orientation
and size of angle in vision. Other low-level abstractions give rise to global
features such as contour, overall pitch range, general distribution of interval
sizes, the proportion of ascending compared with descending intervals, and so
on. Such low-level features are combined at a higher level to form more complex
configurations, which are themselves abstracted so as to give rise to perceptual
equivalences and similarities. At the highest levels of analysis, pitch
structures are organized as hierarchies. Since sequential patterns of pitches
are spread out in time, short-term memory mechanisms play an important role in
determining how such patterns are perceived.
2.2. Passive Versus Active Processing
The
multileveled approach to auditory shape analysis does not imply that such
analysis proceeds serially from the lowest to the highest level; indeed, we
shall see that this is very unlikely to be true. Investigations into mechanisms
of visual shape analysis have led to a distinction between an early process, in
which many low-level abstractions occur in parallel, and a later process, in
which questions are asked of these low-level abstractions, based on hypotheses
about the scene to be analyzed (Hanson & Riseman, 1978). Such a distinction
between "bottom-up" and "top-down" processing is of
importance to auditory shape analysis also; indeed perhaps of greater
importance, since the auditory system is more prone to gross error than the
visual system and therefore relies more heavily on extraneous cues.
2.3. Feature Abstraction
2.3.1.
Octave Equivalence. Tones whose fundamental frequencies stand in the ratio of
2:1 are said to be in octave relation. Such tones possess a strong perceptual
similarity, which is evidenced in various ways. In western musical notation, a
tone is represented by a letter name, which specifies its position within the
octave, together with a number, which specifies the octave in which it occurs. For
example, the symbols C2, C3, and C4 represent tones that stand in octave
relation. In one version of Indian musical notation, a tone is also represented
by a letter, which specifies its position within the octave, together with a
dot or dots, which specify its octave placement. Thus the same letter written with
different numbers of dots represents tones that are separated by octaves. Indeed, octave equivalence
appears to be commonly assumed in most musical systems (Burns & Ward,
1982).
People
with absolute pitch (i.e., those who can identify musical notes by letter name
on hearing them) often place such notes in the wrong octave. This provides
further evidence for octave equivalence at the perceptual level (Bachem, 1954; Baird, 1917). Additional evidence comes from
conditioning studies, in which generalization of response to tones separated by
octaves has been observed both in humans (Humphreys, 1939) and in animals
(Blackwell & Schlosberg, 1942). Yet further evidence for octave equivalence
comes from the finding that certain interference effects that operate in pitch
recognition (D. Deutsch, 1972a, 1973a) also occur when the interference tones
are displaced to different octaves (D. Deutsch, 1973b).
In
similarity rating paradigms in which a large number of pitch values are
employed, octave equivalence effects are not necessarily apparent (Allen, 1967;
Kallman, 1982; Thurlow & Erchul, 1977). However, when explicit musical
contexts are provided to the subjects, tone pairs that are separated by octaves
are judged as closely similar (Krumhansl, 1979; Krumhansl & Shepard, 1979).
Because
of the existence of octave equivalence effects, both psychologists and music
theorists have argued that pitch should be analyzed in terms of at least two
dimensions, the first representing overall pitch level, and the second
defining the position of a tone within the octave. These two dimensions have
been termed tone height and tone chroma by psychologists (Bachem,
1948; Burns & Ward, 1982; M. Meyer, 1904, 1914; Revesz, 1913; Ruckmick,
1929; Shepard, 1964), and pitch and pitch class by music
theorists (Babbitt, 1960; Forte, 1973; Westergaard, 1975).
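The two-dimensional description can be illustrated with a short sketch. It assumes equal temperament with A4 = 440 Hz and a MIDI-style numbering in which C4 is note 60; these conventions, and the function names, are assumptions made for illustration and are not drawn from the sources cited above. The octave number serves here only as a coarse stand-in for tone height, which is in principle a continuous dimension.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_number(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note number (MIDI-style convention, A4 = 69)."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

def height_and_chroma(freq_hz):
    """Split a frequency into (octave number, pitch-class name)."""
    n = note_number(freq_hz)
    octave = n // 12 - 1          # C4 = note 60 -> octave 4
    chroma = NOTE_NAMES[n % 12]   # position within the octave
    return octave, chroma

# Tones whose frequencies stand in 2:1 ratios share a chroma
# and differ only in octave placement.
for f in (65.41, 130.81, 261.63):
    print(f, height_and_chroma(f))
```

On this representation C2, C3, and C4 map to the same chroma and to octave numbers 2, 3, and 4.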
The
subjective octave is slightly larger than the 2:1 ratio of the physical octave
(Stumpf & M. Meyer, 1898). In an experiment by Ward (1954), subjects were
repeatedly presented with two pure tones in succession and asked to adjust the
frequency of one until it was exactly an octave above the other. The subjects'
adjustments produced ratios that were reliably greater than 2:1. Further, the
amount of deviation from the physical octave increased in the higher frequency
ranges. Similar findings were obtained by Sundberg and Lindquist (1973) using
complex tones. Burns (1974b) obtained analogous results with professional
Indian musicians as subjects, showing that the phenomenon of octave stretch is
not confined to our culture. A basis for this phenomenon was proposed by
Terhardt (1971). He suggested that it is acquired early in life as a result of
exposure to complex sounds, such as speech. Due to a mutual masking effect, the
pitches of neighboring partials in such complex sounds move away from each
other slightly, and Terhardt argued that we generalize from experience with
such sounds in making octave judgments. Dowling (1973b) has suggested that
octave stretch might alternatively simply reflect innate properties of the
auditory system.
2.3.2. Interval and Chord Equivalence. A musical
interval is perceived when two tones are presented either simultaneously or in
succession. Further, intervals are perceived as the same in size when the
fundamental frequencies of their component tones stand in the same ratio. The
traditional western musical scale is based in part on this principle. The
semitone is the smallest unit of this scale, and it corresponds to a frequency
ratio of approximately 1:1.059. Intervals that comprise the same number of
semitones are given the same name. Thus, an interval consisting of four
semitones is called a major third; an interval consisting of seven semitones is
called a perfect fifth; and so on (Figure 32.24). The perceptual equivalence of
intervals composed of tones whose fundamental frequencies stand in the same
ratio is also assumed by contemporary music theorists (Babbitt, 1960; Forte, 1973;
Westergaard, 1975).
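The ratio principle can be checked numerically. The sketch below assumes twelve-tone equal temperament, in which the semitone ratio is the twelfth root of 2, so that interval size in semitones is 12 times the base-2 logarithm of the frequency ratio. The function name and the example frequencies are illustrative assumptions.

```python
import math

SEMITONE_RATIO = 2 ** (1 / 12)   # about 1.0595

def semitones(f1_hz, f2_hz):
    """Interval size in equal-tempered semitones between two frequencies."""
    return 12 * math.log2(f2_hz / f1_hz)

# The same frequency ratio gives the same interval size at any pitch level.
print(round(SEMITONE_RATIO, 4))        # 1.0595
print(round(semitones(200, 300), 2))   # 7.02 (ratio 2:3, close to a perfect fifth)
print(round(semitones(400, 600), 2))   # 7.02
print(round(semitones(400, 500), 2))   # 3.86 (ratio 4:5, close to a major third)
```

The small departures from 7 and 4 semitones reflect the fact that the simple ratios are only approximated by the equal-tempered scale.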
PERCEPTUAL ORGANIZATION AND COGNITION
[Figure 32.24: interval size plotted along log F1 - log F2, with rows for approximate ratio, number of semitones, and musical interval.]

Number of semitones    Musical interval
 1                     Minor second
 2                     Major second
 3                     Minor third
 4                     Major third
 5                     Perfect fourth
 6                     Tritone
 7                     Perfect fifth
 8                     Minor sixth
 9                     Major sixth
10                     Minor seventh
11                     Major seventh
12                     Octave
Figure 32.24.
The interval size continuum. This figure gives the
number of semitones corresponding to each musical interval, together with the
approximate frequency ratio to which it corresponds.
When three or more tones are presented
simultaneously, there results the perception of a chord. One may characterize a chord in
terms of its component intervals. For example, the major triad is composed of
intervals corresponding approximately to the frequency ratios 2:3, 4:5, and
5:6, that is, to 7, 4, and 3 semitones, respectively. Such a characterization
may however lead one to assume that two chords are perceptually equivalent when
in fact they are not. Thus the minor triad is composed of the same set of
intervals as the major triad (Figure 32.25), yet the major and minor triads
sound quite different. It is therefore of perceptual importance that, in the
major triad, the upper components form a ratio of 5:6 and the lower components
a ratio of 4:5, while the reverse is true in the minor triad. It is interesting
that certain contemporary music theorists do characterize chords that
contain the same set of intervals as equivalent (Babbitt, 1960, 1965; Forte,
1973). This characterization has, however, been challenged on perceptual
grounds by other music theorists (e.g., Browne, 1974).
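The point that the two triads share an interval set but differ in ordering can be made explicit in a brief sketch. Tones are written as note numbers with C4 = 60, which is an assumed convention, and the helper names are hypothetical.

```python
from itertools import combinations

def interval_set(chord):
    """Sorted semitone distances between every pair of chord tones."""
    return sorted(abs(a - b) for a, b in combinations(chord, 2))

def stacked_intervals(chord):
    """Semitone distances between adjacent tones, from lowest to highest."""
    tones = sorted(chord)
    return [hi - lo for lo, hi in zip(tones, tones[1:])]

c_major = [60, 64, 67]   # C4, E4, G4
c_minor = [60, 63, 67]   # C4, E-flat4, G4

print(interval_set(c_major), interval_set(c_minor))            # [3, 4, 7] [3, 4, 7]
print(stacked_intervals(c_major), stacked_intervals(c_minor))  # [4, 3] [3, 4]
```

A description based on the unordered interval set treats the two chords as identical. Only the ordered description separates them.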
2.3.3. Categorical Perception of Musical Intervals. Although musical
intervals vary continuously in size, they are sometimes perceived categorically. Categorical perception is operationally defined according to
three criteria. The first is the presence of distinct labeling categories
separated by sharp boundaries. The second is the presence of peaks in
discrimination performance near
category boundaries, with poor discrimination performance within
categories. The third is a close correspondence between the discrimination
functions that are obtained experimentally and those that are predicted on the
hypothesis that stimuli will be discriminated to the extent that they can be
identified as different (Studdert-Kennedy, Liberman, Harris, & Cooper, 1970). Initially, categorical perception was
thought to occur only in the case of consonants in speech; however, more
recent experiments have demonstrated its occurrence with nonspeech sounds also
(Burns & Ward, 1974; Locke & Kellar, 1973; J. R. Miller, Wier, Pastore,
Kelly, & Dooling, 1976; Zatorre & Halpern, 1979).
Category
scaling identification functions have been obtained for melodic intervals over
ranges of two to five semitones (Burns & Ward, 1974, 1978; Rakowski, 1976; J. A. Siegel & W. Siegel, 1977a, 1977b). Figure 32.26(a) displays the
results from a typical subject with musical training. It can be seen that
identification functions show sharp category boundaries such as are characteristic
of category scaling data for speech materials. Figures 32.26(b), 32.26(c), and
32.26(d) show discrimination functions obtained from the same subject, together
with those predicted from the identification functions assuming categorical
perception. The agreement between the obtained and predicted discrimination
functions is here comparable to that found for speech materials (Burns &
Ward, 1978).
In
contrast to findings from musically trained subjects, untrained subjects may
show inconsistent identification functions. Further, a large effect of shifting
the stimulus range has been obtained for subjects with musical training, while
this effect was virtually nonexistent for those without musical training. These
findings are evidence that categorical perception depends on training (Burns
& Ward, 1978; J. A. Siegel & W. Siegel, 1977a).
Listeners appear unable to categorize stimuli
reliably in less than semitonal increments (Burns, 1977; Rakowski, 1976). This difficulty has been found not to be
confined to western listeners but to be true of Indian musicians also, despite
the fact that Indian scales theoretically include microtones (Burns, 1974a, 1977). Such findings may reflect a fundamental limit to the
number of discrete interval categories within an octave that listeners can
handle.
In
addition to melodic intervals, categorical perception has been found to occur
for harmonic intervals and triads (Locke & Kellar, 1973; Zatorre &
Halpern, 1979).
2.3.4. Global Cues. Global cues are employed in the recognition of pitch sequences. These
include overall pitch range, the distribution of the sizes of simultaneous and
successive intervals, the proportion of simultaneous compared with successive
intervals, the proportion of ascending compared with descending intervals, and
so on. Contour has been particularly well documented as a cue in the
recognition of melodies (Dowling, 1978; Dowling & Fujitani, 1971; Kallman
& Massaro, 1979; Werner, 1925; White, 1960). Such work is described in
Section 2.7. It should here be noted that birds are able to discriminate rising
from falling pitch patterns (Hulse, Humpal, & Cynx, 1984), showing that
sensitivity to contour is not confined to the human case.
2.3.5. Interval Class. The principles of octave
equivalence and interval equivalence have led certain music theorists to
Figure 32.25.

The C-major triad (a) and C-minor triad (b). The triads
contain the identical set of intervals: the major third, the minor third, and
the perfect fifth. However, in the major triad, the major third lies below the
minor third, while the reverse is true of the minor triad. Since these triads
are perceptually distinct, the ordering of their intervals is of perceptual
importance. (From D. Deutsch, Musical recognition, Psychological
Review, 76. Copyright 1969 by American Psychological Association. Reprinted with
permission.)
AUDITORY PATTERN RECOGNITION
Figure 32.26.

(a) Identification
functions obtained from a musically trained subject for category scaling of
isolated melodic intervals. Sharp category boundaries are apparent. (b), (c),
and (d) Discrimination functions (solid lines) obtained from the same subject
in a roving-level melodic interval discrimination experiment for interval
separations of 25, 37.5, and 50 cents. (Percentage of correct discrimination is
plotted at the mean value of the two intervals in a discrimination trial.) Also
shown are the discrimination functions (dashed lines) predicted from the
identification functions assuming categorical perception. There is good
agreement between the obtained discrimination functions and those predicted
from the identification functions assuming categorical perception. (From E. M.
Burns & W. D. Ward, Categorical perception Phenomenon or epiphenomenon:
Evidence from experiments on the perception of melodic musical intervals, Journal
of the Acoustical Society of
In
traditional western music theory, harmonic intervals whose components have
reversed position by being placed in different octaves are termed inversions
(Piston, 1948). Thus a harmonic interval of n semitones is considered
perceptually equivalent in certain respects to a harmonic interval of 12 - n
semitones. Laboratory evidence for the perceptual similarity of inverted
intervals has been obtained by Plomp, Wagenaar, and Mimpen (1973). Subjects were
asked to identify intervals formed by simultaneous pairs of tones. Confusions
were found to occur between intervals that were related by inversion. Further
evidence for such equivalence has been provided by D. Deutsch and Roll (1974).
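The inversion relation amounts to simple arithmetic modulo the octave, as the following sketch shows. The definition of interval class used here (the smaller of an interval and its inversion, with octaves ignored) follows common music-theoretic usage and is an assumption of the sketch, as are the function names.

```python
def inversion(n):
    """Size in semitones of the inverted harmonic interval, for 0 < n < 12."""
    return 12 - n

def interval_class(n):
    """Smaller of an interval and its inversion, ignoring octaves (0 to 6)."""
    n = n % 12
    return min(n, 12 - n)

print(inversion(7))   # 5: a perfect fifth inverts to a perfect fourth
print(inversion(4))   # 8: a major third inverts to a minor sixth
print([interval_class(n) for n in (5, 7, 17, 19)])   # [5, 5, 5, 5]
```

Fourths, fifths, and their octave-expanded forms thus fall into a single class, which is the equivalence whose perceptual status is at issue in this section.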
For
the case of melodic intervals, the issue of perceptual equivalence based on
interval class is complicated. D. Deutsch (1969) has proposed a neural network
for the abstraction of octave information, and of interval and chord
information, which predicts that such equivalence should not be directly apprehended.
In this network, information travels first to a unidimensional array of
"tone height" and is then processed along two separate and parallel
channels. Along the first channel there is convergence of input from neural
units that underlie tones that are separated by octaves (see Note 11). The
output of this channel results in octave equivalence effects for single tones
and also in the harmonic equivalence of chords that are related by inversion. The
patterns of input along the second channel are such as to mediate transposition
of intervals and chords (see Note 12); however, there is no convergence of
input based on the octave relation along this channel.
The
two-channel model predicts that octave equivalence effects should occur for
single tones, and also for simultaneously presented tones. Supporting
behavioral evidence for this prediction has been described in Section 2.3.1,
and earlier in the present section. However, the model also predicts that,
where melodic intervals are concerned, octave equivalence effects should not
directly operate. More specifically, it leads to the prediction that listeners
should experience difficulty in recognizing well-known melodies in which
interval class is preserved
but in which the pitches of the tones are placed randomly
in different octaves. (This prediction does not hold for listeners who know the
identity of the presented melody, or who are given cues on which to base
hypotheses. Such listeners should be able to perform the recognition judgment
by confirming the individual pitch classes, so utilizing the first channel of
the model.)
As a
test of this prediction, D. Deutsch (1972c) presented subjects with the first
half of the melody "Yankee Doodle," with the tones distributed
randomly across three adjacent octaves, while preserving pitch class. The
subjects were asked to identify the melody but were given no clues on which to
base a hypothesis. Recognition was found no better than in a control condition
in which the rhythm was retained but the pitch information removed entirely. However,
when the subjects were later told the identity of the melody and heard it
again, recognition was greatly facilitated. This result is in accordance with
the two-channel model and shows that interval class cannot be considered a
perceptual invariant.
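The stimulus manipulation can be sketched as follows. This is not the procedure of the original study: the note numbers (C4 = 60), the octave labels, the example melody, and the function name are all hypothetical choices made for illustration.

```python
import random

def scramble_octaves(melody, octaves=(3, 4, 5), seed=None):
    """Reassign each note to a randomly chosen octave, keeping its pitch class."""
    rng = random.Random(seed)
    # Octave k starts at note number 12 * (k + 1), so C4 = 60.
    return [12 * (rng.choice(octaves) + 1) + note % 12 for note in melody]

melody = [60, 60, 62, 64, 60, 64, 62]   # hypothetical example data
scrambled = scramble_octaves(melody, seed=1)

# Pitch classes survive the transformation; exact intervals generally do not.
print([n % 12 for n in melody] == [n % 12 for n in scrambled])   # True
print(scrambled)
```

Every tone keeps its position within the octave, so the sequence of pitch classes (and hence of interval classes) is unchanged, while the melodic intervals actually heard may span more than an octave.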
Further
supporting evidence for this view comes from an experiment by D. Deutsch
(1979b) on consolidation of memory for melodic patterns. Subjects were
presented with a standard melody that was followed by a comparison melody, and
they judged whether the two were the same or different. The comparison was
always transposed up from the standard. On half the trials this transposition
was exact, and on the other half two of the tones were permuted. The experiment
consisted of four conditions. In the first, the standard melody was presented
once, followed by the comparison melody. In the second, the standard melody was
repeated six times and then followed by the comparison melody. In the third,
the standard melody was again repeated six times, but on half the repetitions
the melody was transposed intact to the next-higher octave, and on the other
half it was transposed intact to the next-lower octave. In the fourth
condition, the standard melody was again repeated six times, but on each
repetition the individual tones were displaced alternately to the next-higher
and the next-lower octaves. So in this last condition, interval classes were
preserved, though exact intervals were not preserved.
Exact
repetition resulted in substantial improvement in recognition performance, and
an improvement also occurred when the standard melody was repeated intact in
the next-higher and the next-lower octaves. However, when the melody was repeated
in such a way that its tones were placed alternately in the next-higher and the
next-lower octaves, performance was significantly worse than when the melody
was not repeated at all. This experiment again demonstrates that interval class
cannot be treated as a perceptual invariant. Repetition of a set of intervals
resulted in consolidation of memory for these intervals; however, repetition
of a set of interval classes did not produce memory consolidation.
Idson
and Massaro (1978) have proposed an alternative explanation for the
"Yankee Doodle effect." They pointed out that the octave randomizing
procedure results in an alteration in melodic contour and argued that this
altered contour provides the listener with misleading information and so
actively interferes with the recognition process. The authors found experimentally
that, when the individual tones of a melody were displaced to different octaves
but contour was preserved, recognition performance was enhanced relative to
conditions in which contour was not preserved. A similar result was obtained by
Dowling and Hollombe (1977).
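The distinction between octave displacement that alters contour and displacement that preserves it can be shown in a small sketch. Contour is coded here as the sequence of directions between successive tones; the coding scheme and the example note numbers (C4 = 60) are illustrative assumptions.

```python
def contour(melody):
    """Directions between successive notes: +1 up, -1 down, 0 repeated."""
    return [(b > a) - (b < a) for a, b in zip(melody, melody[1:])]

original = [60, 62, 64, 60]             # hypothetical four-note pattern
contour_altered = [72, 50, 76, 48]      # same pitch classes, contour changed
contour_preserved = [48, 62, 76, 60]    # same pitch classes, contour kept

print(contour(original))            # [1, 1, -1]
print(contour(contour_altered))     # [-1, 1, -1]
print(contour(contour_preserved))   # [1, 1, -1]
```

All three sequences contain the same ordered pitch classes, so any difference in recognition between the two displaced versions must rest on contour or on the exact intervals.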
The
above line of reasoning is problematical, however, because contour alone can
serve as a salient cue for melody recognition (Section 2.7). If, then, subjects
are able to hypothesize the identity of a melody on the basis of contour alone,
they can then confirm the hypothesized melody by reference to the individual
pitch classes, and so without direct processing of interval class. So the
finding that preservation of contour results in an improvement in recognition
performance is in accordance with the two-channel model also (see, in addition,
D. Deutsch, 1978d, 1982b).
Idson
and Massaro (1978) proposed alternatively that melody recognition depends on
two factors: first, recognition of individual pitch classes, and second,
recognition of contour. If this hypothesis were correct, then there should be
no difference in recognition performance for melodies that are presented
without transformation, compared with those in which octave placement is varied
but pitch class and contour are retained. However, Kallman and Massaro (1979)
found that recognition performance was significantly better in the former case
than the latter. This finding is in accordance with the two-channel model but
cannot be explained on the hypothesis advanced by Idson and Massaro.
Additional
evidence comes from comparing the findings of Idson and Massaro (1978) with
those of Kallman and Massaro (1979). In the former study, subjects were
furnished with the names of a small set of melodies and were tested for
recognition of these melodies under various transformations for hundreds of
trials. In contrast, subjects in the latter study were presented with each test
melody only once and were not informed of their names. Recognition performance
under octave displacement was considerably poorer in the latter study than in
the former. This finding is in accordance with the two-channel model, but it
cannot be accommodated on the hypothesis that recognition of an ordered set of
pitch classes and contours is sufficient to identify a melody.
The
possibility still remains, however, that alterations in contour could actively
interfere with melody recognition, and thus could play some role in the
"Yankee Doodle effect." The extent of such interference cannot be
determined with the use of a recognition paradigm, because when contour is
altered, this could lead to active interference, yet when contour is preserved,
melodies could be recognized on this basis alone.
To
circumvent this difficulty, an experiment was performed in which musically
literate subjects listened to novel melodic patterns and recalled in musical
notation what they had heard. Since no comparison was made with other melodic
patterns, the issue of contour as a cue could not arise (D. Deutsch &
Boulanger, 1984).
Examples
of patterns employed in the different conditions of this experiment are shown
in Figure 32.27. Each pattern consisted of a random ordering of the first six
notes of the C major scale (see Note 5). In the first, "higher
octave," condition all tones were in the octave beginning on C5. In the
second, "lower octave," condition all tones were in the octave
beginning on C4. In the third, "across octaves," condition, the
individual tones in the melody alternated between these two adjacent octaves. In
this last condition, roughly two-thirds of the melodic intervals were larger
than an octave; the remaining one-third spanned less than an octave.
Also shown in Figure 32.27 are the percentages of tones correctly notated in the correct serial positions in the different conditions of the experiment.
Figure 32.27.

Examples
of sequences employed in the different conditions of the experiment on the effect
of octave jumps on recall of melodic patterns. Also shown are the percentages
of tones correctly recalled in the correct serial positions in these different conditions. Recall accuracy was substantially lower for
melodic patterns that contained octave jumps than for those that did not. (From
D. Deutsch & R. C. Boulanger, Octave equivalence and the immediate recall
of pitch sequences, Music Perception. Copyright 1984 by The Regents of the
Performance
in the "across octaves" condition was substantially poorer than that
in the other two. Further analyses showed that errors in which the correct
pitch class was notated but octave placement was incorrect were virtually
absent in all conditions, so that the poorer performance in the "across
octaves" condition could not be attributed to errors in octave placement. The
decrement in recall for melodic patterns involving octave jumps is as predicted
on the two-channel model but cannot be explained on the alternative hypothesis
that melodic processing occurs through identification of an ordered set of
pitch classes together with contour.
2.4. Higher-Order Abstractions
This
section is concerned with the ways in which low-order features based on pitch
are combined so as to give rise to perceptual equivalences and similarities. In
the visual mode, shapes are recognized as equivalent when these differ in size
or position in the visual field and (at least under some conditions) in orientation.
This leads us to enquire whether auditory analogues of such visual shape
abstractions also exist.
Early
theorists have speculated concerning possible analogies between relationships
in the pitch domain and relationships in visual space (Helmholtz, 1859/1954;
Koffka, 1935; Mach, 1906/1959). Recent theoretical discussions have focused particularly
on the mapping of one dimension of visual space into frequency and
the other into time (Julesz & Hirsh, 1972; Kubovy, 1981). Various grouping
phenomena in the perception of sound sequences have clear visuospatial
analogues (Bregman, 1978; D. Deutsch, 1975c; Divenyi & Hirsh, 1978; Van
Noorden, 1975), as exemplified by the musical passage
in Figure 32.19.
In his
seminal paper on shape perception, the Gestalt psychologist Von Ehrenfels
(1890) pointed out that melodies retain their perceptual identities when they
are transposed to different pitch ranges, provided that the relationships
between the successive tones in the melodies are unchanged. He argued that in
this respect melodies are akin to visual shapes, which preserve their
identities when they are translated to different regions of the visual field. This
leads us to enquire whether further equivalences may be demonstrated for
auditory shapes which have counterparts in vision.
Most
relevant to this question is the system of 12-tone composition, developed by
Schoenberg, which is based on a theory of shape analysis for pitch structures. This
theory is in turn based on an intermodal analogy in which one dimension of
visual space is mapped into pitch and another into time. Schoenberg argued that
transformations that are analogous to rotation and reflection in vision give
rise to perceptual equivalences for structures in pitch-time also. As he put
it:
THE TWO-OR-MORE DIMENSIONAL SPACE IN WHICH MUSICAL IDEAS ARE PRESENTED IS A UNIT ... The elements of a musical idea are
partly incorporated in the horizontal plane as successive sounds, and partly in
the vertical plane as simultaneous sounds.... The unity of musical space demands an absolute and unitary perception. In this space ... there is no absolute
down, no right or left, forward or backward.... To the imaginative and creative faculty, relations in the
material sphere are as independent from directions or planes as material
objects are, in their sphere, to our perceptive faculties. Just as our mind always recognizes, for instance,
a knife, a bottle, or a watch, regardless of its position, and can reproduce it
in the imagination in every possible position, even so a musical creator's mind
can operate subconsciously with a row of tones, regardless of their direction, regardless of the way in which a
mirror might show the mutual relations, which remain a given quantity. (Schoenberg,
1951, p. 229)
Schoenberg
thus argued that a tone row, defined as a given ordering of the 12 tones of the
chromatic scale, retains its perceptual identity under the following
transformations: When it is transposed in pitch (transposition), when the
directions of the successive intervals are reversed (inversion), when the tones
are presented in reverse order (retrogression), and when the two latter
transformations are both applied (retrograde-inversion). He further assumed
that a perceptual equivalence exists for tones in the same pitch class and for
intervals in the same interval class. These assumptions are illustrated in
Figure 32.28.
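The four transformations can be made concrete with a short sketch. The Python below is illustrative only: it assumes that tones are coded as pitch-class integers 0 to 11 (with C = 0), and the function names and the example row are hypothetical choices rather than part of Schoenberg's or Babbitt's notation.

```python
def transpose(row, k):
    """Shift every pitch class by k semitones (transposition)."""
    return [(p + k) % 12 for p in row]

def invert(row):
    """Reverse the direction of every interval, keeping the first tone fixed."""
    first = row[0]
    return [(2 * first - p) % 12 for p in row]

def retrograde(row):
    """Present the tones in reverse order (retrogression)."""
    return row[::-1]

def retrograde_inversion(row):
    """Apply inversion and then retrogression."""
    return retrograde(invert(row))

def interval_class(a, b):
    """Smallest distance between two pitch classes, ignoring direction and octave."""
    d = (b - a) % 12
    return min(d, 12 - d)

# An arbitrary ordering of the 12 pitch classes, used only for illustration.
row = [4, 5, 7, 1, 6, 3, 8, 2, 11, 0, 9, 10]

print(invert(row))             # [4, 3, 1, 7, 2, 5, 0, 6, 9, 8, 11, 10]
print(interval_class(0, 7))    # 5: a fifth up and a fourth down share a class
```

Each transformed row is again an ordering of all 12 pitch classes, which is what allows the transformations to be treated as permutations.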
Schoenberg's
theory has served as the basis for much musical system building. The
group-theoretic approach of Milton Babbitt and his followers has been
particularly influential here. The elements of the group are 12-tone sets,
which are represented as permutations of pitch or order numbers, and the
operation is the multiplication of permutations (Babbitt, 1960, 1965; see also
Perle, 1972, 1977).
Figure 32.28.

Schoenberg's concept of "musical space." Sequences of
pitches are considered equivalent at an abstract level when they are transposed
to a different pitch range (transposition), when all ascending intervals become
descending intervals, and vice versa (inversion), when the tones are presented
in reverse order (retrogression), when they are transformed by both these
operations (retrogression-inversion), and when the component tones of the
sequence are placed in different octaves. (From A. Schoenberg,
Style and idea, Williams & Norgate, 1951. Reprinted
with permission.)
This
leads us to enquire whether the equivalence relations defined in 12-tone theory
are perceived by the listener. The issue of interval class has been discussed
at length, and it has been shown that this cannot be taken to be a perceptual
invariant. Concerning inversion and retrogression, we may note that there is a
clear evolutionary advantage to mechanisms that enable us to recognize a visual
object when it is presented in a different orientation. However, there is no
analogous advantage to recognizing a sequence of sounds presented in reverse
order, or whose pitch relationships are inverted.
The
experimental evidence on the issue of equivalence under retrogression and
inversion is equivocal. The ability of listeners to identify well-known
melodies presented in retrogression was studied by White (1960). He found that
identification performance was here no better than when the pitch information
was removed entirely, with rhythm serving as the only cue. Dowling (1972)
employed a short-term paradigm to study recognition of melodies under
retrogression, inversion, and retrograde-inversion. Subjects were presented
with a standard melody, followed by a comparison melody. In one set of
conditions, the comparison was either unrelated to the standard or transformed
by transposition, retrogression, inversion, or
retrograde-inversion. In a second set of conditions, the comparison melody was
further transformed so that its contour was preserved but the interval sizes
were altered. Subjects were found to perform no better on exact transformations
than on those that preserved contour alone. In a later study, Dowling (1978)
provided evidence for an interference effect on recognition of exact intervals,
resulting from the listener's projecting the pitch information onto overlearned
musical scales (see also D. Deutsch, 1977).
From
analysis of tonal music, it would appear that retrogression and inversion are
indeed perceived in short-term situations, provided that the memory load is not
too heavy (e.g., see L. B. Meyer, 1973). However, inversion here takes place
along highly overlearned pitch alphabets such as diatonic scales or triads. Rather
than assuming a perceptual equivalence based simply on frequency ratios, it
would appear that such operations are performed at a level of abstract encoding
equivalent to the level that enables us to invert an overlearned alphanumeric
sequence (D. Deutsch & Feroe, 1981; Simon & Sumner, 1968).
A
considerable body of contemporary music theory is concerned with defining
equivalence and similarity relations between sets of
pitches. These theories assume equivalence under retrogression and inversion,
as well as interval class identity (Chrisman, 1971; Forte, 1973; Howe, 1965;
Lewin, 1960, 1962;
Perle, 1972, 1977). A detailed examination of such theories is, however,
beyond the scope of the present chapter.
A
different approach to the structuring of pitch relationships stems from the
classical theory of tonality (Helmholtz, 1859/1954; Rameau, 1722/1971) and
treats as fundamental the intervals of the octave, the perfect fifth, and the
major third. Debates concerning tuning systems have utilized two-dimensional
arrays in which tones lying adjacent along one dimension are separated by major
thirds, and tones lying adjacent along the other dimension are separated by
perfect fifths. An evaluation of different schemes for tuning and temperament
based on such arrays is provided by Hall (1974, 1980).
Longuet-Higgins
(1962a, 1962b, 1978) has hypothesized that such two-dimensional arrays may form
the basis of key attribution. As shown in Figure 32.29, the notes in a diatonic
scale (see Note 5) form a compact group in this two-dimensional space, so that
a key can be defined as a neighborhood in the space. Longuet-Higgins suggested
that when a musical passage is presented listeners select a given region of
this space, so attributing a key. If, however, their choices force them to make
large jumps in this space, they select instead a different region where the
tones are more compactly represented. A different key is thus attributed.
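The compactness of a diatonic scale in such an array can be checked with a small sketch. The Python below makes simplifying assumptions: it uses equal-tempered pitch classes (so tones that the original arrays distinguish by tuning are merged), and the function name and the choice of F as the origin are purely illustrative.

```python
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def tonal_array(origin_pc, n_rows, n_cols):
    """Build an array in which each step along a row is a perfect fifth
    (7 semitones) and each step between rows is a major third (4 semitones)."""
    return [
        [NOTE_NAMES[(origin_pc + 7 * col + 4 * row) % 12] for col in range(n_cols)]
        for row in range(n_rows)
    ]

array = tonal_array(origin_pc=5, n_rows=2, n_cols=4)   # origin F
for row in array:
    print(row)
# ['F', 'C', 'G', 'D']
# ['A', 'E', 'B', 'F#']

# All seven notes of the C-major scale fall inside this 2 x 4 neighborhood.
C_MAJOR = {'C', 'D', 'E', 'F', 'G', 'A', 'B'}
inside = [name for row in array for name in row if name in C_MAJOR]
print(len(inside))   # 7
```

Under these assumptions, a change of key corresponds to shifting the neighborhood by one or more columns or rows.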
Considerations
of octave equivalence have formed the basis of yet another approach to the
description of pitch structures. As described by Drobisch (1846/1929), the
perceptual similarity of tones standing in octave relation can be accommodated
by deforming the unidimensional scale of pitch into a helix, with tones
separated by octaves lying most proximal within each turn of the helix. Shepard
(1964) has provided experimental evidence for such a helical representation in
a harmonic setting. He generated a set of tones, each of which consisted of
many sinusoidal components separated by octaves. The amplitudes of these
components differed according to a fixed bell-shaped envelope. When such tones
were presented in monotonically ascending steps, listeners perceived a sound
that constantly ascended in pitch and never descended.
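The construction of such tones can be sketched as follows. This is a minimal illustration, not Shepard's procedure: the Gaussian envelope over log frequency, the lowest component frequency, the number of components, and all names are assumptions chosen for the example.

```python
import math

def shepard_tone(step, steps_per_octave=12, f_min=20.0,
                 n_components=10, sigma_octaves=2.0):
    """Return (frequency_hz, amplitude) pairs for one tone.

    Components are spaced exactly one octave apart; amplitudes follow a
    fixed bell-shaped envelope over log frequency (assumed Gaussian here).
    """
    center = math.log2(f_min) + n_components / 2       # envelope peak (log2 Hz)
    offset = (step % steps_per_octave) / steps_per_octave
    components = []
    for k in range(n_components):
        log_f = math.log2(f_min) + k + offset
        amp = math.exp(-0.5 * ((log_f - center) / sigma_octaves) ** 2)
        components.append((2 ** log_f, amp))
    return components

# After a full octave of ascending steps the tone is physically identical to
# the starting tone, so the series can ascend indefinitely.
print(shepard_tone(12) == shepard_tone(0))   # True
```

Because the envelope is fixed while the components slide beneath it, no single component marks the "top" of the tone, which is what leaves pitch height ambiguous while chroma is well defined.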
Shepard
(1982) later proposed an elaboration of the helical model in which pitch is
represented as a five-dimensional helical structure. Along one dimension, tones
are ordered in accordance with pitch height. Two further dimensions accommodate
the circular representation of tone chroma, and two more dimensions accommodate
the circle of fifths. This model is illustrated in Figure 32.30. Shepard
further demonstrated a simple affinity (in the mathematical sense) between this
space and the space based on perfect fifths and major thirds.
Figure 32.29.

Two-dimensional
array proposed for the representation of "tonal space." Tones lying
adjacent along one dimension are separated by fifths; tones lying adjacent
along the other dimension are separated by major thirds. (From
H. C. Longuet-Higgins, The perception of music, Interdisciplinary Science
Reviews, 1978, 3. Reprinted with permission.)
Figure 32.30.

Representation of pitch as a double helix wrapped
around a helical cylinder in five dimensions. (From R. N. Shepard,
Structural representations of musical pitch, in D. Deutsch (Ed.), The psychology of music, Academic Press, Inc., 1982. Reprinted with permission.)
One
problem with specifying invariant geometric structures for pitch comes from
evidence showing that a set of notes played in a particular musical key will
induce an internal framework that is specific to that key. The internal
representation of pitch relationships will thus be expected to differ depending
on the key that is attributed. Risset (1978) has pointed out that the same
melodic interval may be quite differently perceived when presented in different
contexts. There is evidence that performing musicians will accordingly produce
a given interval as larger or smaller in size depending on its tonal function
(Shackford, 1961, 1962; Small, 1936). Krumhansl (1979) performed an
experiment to determine the effects of tonal context on the perception of pitch
relationships. Subjects were presented with a set of context tones, followed by
two tones in succession. The context tones consisted of either the C-major
triad or the C-major scale. On each trial, subjects judged the similarity of
the first to the second tone in this context. Multidimensional scaling of the
subjects' judgments yielded a three-dimensional conical structure around which
tones were arranged according to pitch height. The tones of the C-major triad
formed a closely related cluster near the vertex of the cone, and the remaining
tones of the C-major scale formed a less closely related subset further from
the vertex. Tones not in the C-major scale were still
further dispersed (Figure 32.31). In addition, tones less central to the key
were judged more similar to tones more central to the key than the reverse. These
findings may be related to the tendency described by music theorists for less
"stable" tones in a key to "resolve" to more
"stable" tones (see also Krumhansl, 1983; Krumhansl, Bharucha, &
Kessler, 1982; Krumhansl & Kessler, 1982).
2.5. Hierarchical Encoding of Pitch Sequences
Figure 32.31.

Three-dimensional representation of the interrelations
between the tones of the chromatic scale spanning an octave, when presented in a
C-major context. (From C. L. Krumhansl, The psychological representation of musical pitch in a tonal context,
Cognitive Psychology, 1979, 11. Reprinted with
permission.)
In
general, when observers are presented with artificial serial patterns that may
be hierarchically encoded, they form encodings that reflect the ways these
patterns are structured (Bjork, 1968; Kotovsky & Simon, 1973; Restle, 1970;
Restle & Brown, 1970; Simon & Kotovsky, 1963;
Vitz & Todd, 1967, 1969). Such
findings have given rise to models of serial pattern representation in terms of
hierarchies of operators (D. Deutsch & Feroe, 1981; Greeno & Simon,
1974; Leeuwenberg, 1971; Restle, 1970; Simon, 1972;
Simon & Kotovsky, 1963; Simon & Sumner, 1968; Vitz & Todd, 1967,
1969).
A good
example of experimental evidence for such encoding has been provided by Restle
(1970), and Restle and Brown (1970). Subjects were presented with arrays of
lights that flashed on and off in repetitive sequence, and their task was to
predict which light would flash on next. To illustrate the type of pattern
employed, take the basic subsequence X = (1 2 3). The operation T ("transposition
+1 of X") produces 1 2 3 2 3 4, the operation R ("repeat of
X") produces the sequence 1 2 3 1 2 3, and the operation M ("mirror
image of X") produces the sequence 1 2 3 6 5 4. Recursive
application of such operations can generate long sequences that have compact
structural descriptions. Thus the sequence 1 2 3 1 2 3 6 5 4 6 5 4 can be described
as M(R(X)). This example corresponds to the structural tree shown in
Figure 32.32.
Restle
and Brown found with such sequences that the probability of error in prediction
increased monotonically with the level of transformation along the structural
tree. Thus the highest probability of error in the present example would occur
at locations 1 and 7, and the next highest at locations 4 and 10. It was
concluded from these and other findings that observers organize information in
accordance with such structures.
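Restle's operators and the associated error prediction can be sketched in a few lines. The Python below is an illustration under stated assumptions: the function names are hypothetical, the mirror image is taken with respect to an array of six lights, and the "level of transformation" at a position is computed on the assumption that every operator doubles the length of the sequence it is applied to.

```python
def T(x, k=1):
    """Transposition: x followed by x shifted by k."""
    return x + [e + k for e in x]

def R(x):
    """Repeat: x followed by x."""
    return x + x

def M(x, n_lights=6):
    """Mirror image: x followed by its reflection in an array of n_lights."""
    return x + [n_lights + 1 - e for e in x]

def transformation_level(position, base_len=3, n_ops=2):
    """Highest tree level at which a new unit begins at this 1-indexed position."""
    idx = position - 1
    return sum(1 for level in range(n_ops) if idx % (base_len * 2 ** level) == 0)

X = [1, 2, 3]
print(T(X))       # [1, 2, 3, 2, 3, 4]
print(M(R(X)))    # [1, 2, 3, 1, 2, 3, 6, 5, 4, 6, 5, 4]

levels = {p: transformation_level(p) for p in range(1, 13)}
print([p for p, lv in levels.items() if lv == 2])   # [1, 7]
print([p for p, lv in levels.items() if lv == 1])   # [4, 10]
```

Positions 1 and 7 lie at the boundary created by M, and positions 4 and 10 at the boundaries created by R, which matches the ordering of error probabilities reported by Restle and Brown.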
The
sequences in this study, however, were structured in such a way as to allow for
only one parsimonious encoding. It is difficult, therefore, to generalize from
findings on artificial serial patterns to the encoding of sequences that do not
have such special characteristics. The same argument applies to the other
experimental work cited above.
In considering how serial
patterns may in general be hierarchically encoded, it is instructive to
consider the organization of tonal music. Such music is strongly hierarchical
in nature (Keiler, 1983; Lerdahl & Jackendoff, 1983a, 1983b; L. B. Meyer,
1973; Narmour, 1977, 1983; Salzer, 1962; Schenker,
1956, 1973), and it is reasonable to assume that its structure has evolved to
make optimal use of our processing mechanisms. This is particularly the case
where the structure of pitch sequences is concerned.
D.
Deutsch and Feroe (1981) have proposed a model that takes the structure of
tonal music into account and also shows how its characteristics can be
exploited so as to arrive at representations that are parsimonious and also
capitalize on general tendencies of our processing systems. The model assumes
that pitch sequences are represented as hierarchies, at each level of which
elements are organized as structural units in accordance with laws of figural
goodness (e.g., proximity; good continuation), and which tend to be of optimal
chunk size. Elements at any given level are elaborated by further elements so
as to form structural units at the next-lower level until the lowest level is
attained.
A simplified set of rules for the system is as follows:
1. A structure A of length n is notated as
$(A_0, A_1, \ldots, A_{l-1}, *, A_{l+1}, \ldots, A_{n-1})$,
where $A_j$ is one of the operators $n$, $p$, $s$, $n^i$, or $p^i$. (A string of
length k of an operator A is abbreviated kA.)
2. Each structure has associated with it an alphabet, $\alpha$. The combination of
a structure and an alphabet defines a sequence. This, in combination with the
reference element r, produces a sequence of notes.
3. The effect of an operator is determined by that of the adjacent operator on
the same side as *. Thus the operator $n$ refers to moving one step
up the alphabet associated with the structure. The operator $p$ refers to
moving one step down the alphabet. The operator $s$ refers to remaining in the
same position. The operators $n^i$ and $p^i$ refer to traversing up or down i steps
in the alphabet, respectively.
Figure 32.32.

Structural tree corresponding to a particular
sequence, according to Restle (1970). The sequence illustrated
is of six events, which occur in the order 1 2 3 1 2 3 6 5 4 6 5 4. According
to the theory, the basic subsequence is X = (1 2 3). The operation R
("repeat of X") produces 1 2 3 1 2 3. The operation M ("mirror
image of X") produces 1 2 3 6 5 4. The sequence in the figure can be described
as M(R(X)). When subjects are asked to predict the next event, they make the
largest number of errors at locations 1 and 7, and the next largest at
locations 4 and 10, as expected from the theory. (From D. Deutsch & P. L.
Roll, Separate 'what' and 'where' decision mechanisms in processing a dichotic
tonal sequence, Journal of Experimental Psychology: Human Perception
and Performance, 2. Copyright 1976 by American Psychological Association. Reprinted with permission.)
The values of the sequence of notes $(A_0, A_1, \ldots, *, \ldots, A_{n-1}), \alpha, r$
are obtained by taking the value of * to be that of r. Given two sequences
$\mathcal{A} = (A_0, A_1, \ldots, *, \ldots, A_{n-1}), \alpha$ and
$\mathcal{B} = (B_0, B_1, \ldots, *, \ldots, B_{m-1}), \beta$,
define the compound operator pr (prime). $\mathcal{A}[\mathrm{pr}]\mathcal{B}; r$ refers to assigning
values to the notes produced from $(B_0, B_1, \ldots, *, \ldots, B_{m-1})$
such that the value of * is identical to the value of $A_0$ when the sequence $\mathcal{A}$
is applied to r. Values are then assigned to the notes produced from
$(B_0, B_1, \ldots, *, \ldots, B_{m-1})$ so that the value of * is identical to the
value of $A_1$, and so on. This produces a sequence of length $n \times m$. Other
compound operators are analogously defined, such as inv (inversion) and ret
(retrograde).
The
sequence shown in Figure 32.33(a) provides an example to illustrate the model. One
could theoretically describe this sequence in terms of steps ascending the
chromatic scale (see Note 13). One may state that a basic subsequence
consisting of a step up this scale is presented four times in succession, the
second presentation being four steps up from the first, the third being three
steps up from the second, and the fourth being five steps up from the third. In
terms of the present formalism, such a representation would take the cumbersome
form of $\{\{(*, n, n^3, n, n^2, n, n^4, n); Cr\}; 7\}C$.
However,
this description does not relate the key elements of the four subsequences in
any meaningful fashion. Musical analysis would instead describe this sequence
as represented on two hierarchical levels. On the higher level, shown in Figure
32.33(b), there is an arpeggiation of the C-major triad (the notes C-E-G-C). On
the lower level, each note of the triad is preceded by a neighbor
embellishment, so forming a two-note pattern. This representation is
illustrated in the tree diagram in Figure 32.33(c).
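The two-level description can be sketched in code. The Python below is a minimal illustration under several assumptions that are not taken from the source: pitches are coded as MIDI note numbers (60 = middle C), the higher-level structure is read as (*, n, n, n) on the C-major triad alphabet, the lower-level structure is read as (p, *) on the chromatic alphabet, and all function and variable names are hypothetical.

```python
def step(op):
    """Signed number of alphabet steps for an operator: 'n', 'p', 's',
    or a pair such as ('n', 3) standing for n^3."""
    if op == 's':
        return 0
    if op == 'n':
        return 1
    if op == 'p':
        return -1
    name, i = op
    return i if name == 'n' else -i

def realize(structure, alphabet, reference):
    """Produce notes from a structure, an alphabet, and a reference element.
    The value of '*' is the reference; every other element is reached from
    its neighbor on the side nearer to '*'."""
    star = structure.index('*')
    idx = [0] * len(structure)
    idx[star] = alphabet.index(reference)
    for j in range(star + 1, len(structure)):      # elements to the right of *
        idx[j] = idx[j - 1] + step(structure[j])
    for j in range(star - 1, -1, -1):              # elements to the left of *
        idx[j] = idx[j + 1] + step(structure[j])
    if min(idx) < 0 or max(idx) >= len(alphabet):
        raise ValueError("structure runs off the end of the alphabet")
    return [alphabet[i] for i in idx]

def prime(a_struct, a_alpha, b_struct, b_alpha, reference):
    """Compound operator pr: realize B once for each note of A, taking that
    note as the value of B's '*'. The result has length n x m."""
    notes = []
    for value in realize(a_struct, a_alpha, reference):
        notes.extend(realize(b_struct, b_alpha, value))
    return notes

CHROMATIC = list(range(48, 85))                           # assumed pitch range
C_MAJOR_TRIAD = [m for m in CHROMATIC if m % 12 in (0, 4, 7)]

A = ('*', 'n', 'n', 'n')      # arpeggiation up the triad
B = ('p', '*')                # each note preceded by its lower neighbor

higher = realize(A, C_MAJOR_TRIAD, 60)
sequence = prime(A, C_MAJOR_TRIAD, B, CHROMATIC, 60)
print(higher)     # [60, 64, 67, 72]                  i.e., C-E-G-C
print(sequence)   # [59, 60, 63, 64, 66, 67, 71, 72]

# The single-level chromatic description needs steps of several sizes.
flat = ('*', 'n', ('n', 3), 'n', ('n', 2), 'n', ('n', 4), 'n')
print(realize(flat, CHROMATIC, 59) == sequence)   # True
```

Both descriptions generate the same notes, but the hierarchical one uses only single steps within each alphabet, whereas the flat one requires steps of one, two, three, and four chromatic units.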
Figure 32.33.

Pitch sequence (a), represented on two hierarchical levels. The higher
level (b) consists of an arpeggiation of the C-major triad. At the lower level,
each note of the triad is preceded by a neighbor note, so that a two-note
pattern is formed. This structure corresponds to the tree diagram in (c). (From
D. Deutsch & J. Feroe, The
internal representation of pitch sequences in tonal music, Psychological Review,
88. Copyright 1981 by American Psychological Association. Reprinted with permission.)
Such a
representation has multiple cognitive advantages. First, since two alphabets
are utilized, only single steps are employed at each level, in accordance with
the principle of proximity. Second, this representation involves two
structures, and taken as chunks of information, the first is a four-element
chunk, and the second is a three-element chunk. The representation is
therefore in terms of chunks of optimal size (Estes, 1972; Wickelgren, 1967). Third,
since notes that are present at the higher level are also present at the lower
level, the higher-level notes are given prominence, through redundancy of
representation, and so serve to cement the lower-level notes together. (This
contrasts with Restle's formalism, in which the reverse is the case.) These and
other processing advantages are discussed at length in Deutsch and Feroe
(1981).
Many
useful insights into cognitive processing of pitch patterns may be gained from
detailed examination of various music-theoretic accounts of hierarchical
structure. (Since these accounts rely on familiarity with music theory, they
will not be described here, but the interested reader is referred to the
following sources: Keiler, 1983; Lerdahl & Jackendoff, 1983a, 1983b; L. B.
Meyer, 1956, 1960, 1973; Narmour, 1977, 1983; Schenker, 1956, 1973).
2.6. The Influence of Short-Term Memory on Perception of Pitch Patterns
The
accuracy with which a pattern of pitches is perceived depends on the accuracy
with which the individual pitches in the pattern can be related to each other. The
most simple relational judgment that can be made here is whether two tones
occurring in the pattern are the same or different. As will be shown, such
judgments are heavily dependent on a number of variables.
2.6.1. Interference Effects in Short-Term Memory for Pitch. When two tones that
occur in succession are to be judged as the same or
different in pitch, recognition accuracy declines very slowly over a silent
retention interval (Bachem, 1954; Harris, 1952; Koester, 1945; Wickelgren,
1966, 1969a). In contrast, the interpolation of a sequence of extra tones
during the retention interval results in a substantial decrement in
performance. This is true even when the subjects are instructed to ignore the
interpolated tones. The disruptive effect due to the interpolated tones is
specific in nature and is not based on an overloading of some general
limited-capacity storage system. The interpolation of a sequence of spoken
digits instead does not cause a performance decrement, even when recall of these
digits is required (D. Deutsch, 1970).
The
interference effect of a tone that forms part of an interpolated sequence
depends on the pitch relationship between this tone and the first test tone. D.
Deutsch (1972b) demonstrated this phenomenon using the paradigm illustrated in
Figure 32.34. Subjects were presented with a first test tone, which was
followed by a sequence of interpolated tones, and then by a second test tone. Either
the test tones were identical in pitch or they differed by a semitone. The subjects
were asked to ignore the interpolated tones and to judge whether the test tones
were the same or different. The relationship between the first test tone and
the tone in the second serial position of the interpolated sequence (the
"critical tone") varied in increments of one-sixth of a tone between identity
and a whole-tone separation.
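The step size used for the critical tone can be expressed in frequency terms. The sketch below assumes equal temperament (a whole tone is 200 cents, so one-sixth of a tone is about 33.3 cents) and, for simplicity, places the critical tone above the test tone; the test-tone frequency and the function name are illustrative assumptions, not values from the study.

```python
def critical_tone_frequencies(test_hz, n_steps=6):
    """Frequencies from identity to a whole-tone separation, in steps of
    one-sixth of a tone above the test tone."""
    step_cents = 200.0 / 6          # one-sixth of an equal-tempered whole tone
    return [test_hz * 2 ** (k * step_cents / 1200.0) for k in range(n_steps + 1)]

freqs = critical_tone_frequencies(440.0)   # 440 Hz is an arbitrary example
for k, f in enumerate(freqs):
    print(f"{k}/6 tone: {f:.2f} Hz")
```

The fifth value in the list (index 4) corresponds to a separation of two-thirds of a tone.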
Figure 32.34.

Representation of paradigm to
examine the effect on pitch recognition accuracy of a critical tone that formed
part of a sequence that was interpolated between two test tones. Either
the test tones were identical in pitch or they differed by a semitone, and
subjects judged whether they were the same or different. The relationship
between the critical tone and the first test tone varied in increments of
one-sixth of a tone between identity and a whole tone separation. (From D.
Deutsch, The organization of short term memory for a single acoustic attribute,
in D. Deutsch & J. A. Deutsch (Eds.), Short term memory, Academic
Press, Inc., 1975. Reprinted with permission.)
The
results of the experiment are shown in Figure 32.35. When the first test tone
and the critical tone were identical in pitch, memory was facilitated. As the
pitch separation between these tones increased, the error rate also increased,
then peaked at a separation of two-thirds of a tone and returned to baseline at
a separation of roughly a whole tone.
This
pattern of results may be explained by assuming that pitch memory is the
function of an array whose elements are activated by tones of specific pitch. These
elements are tonotopically organized on a log frequency continuum. Inhibitory
interactions take place between elements along this array that are a function
of the distance separating them. These interactions are assumed to be analogous
to recurrent lateral inhibitory interactions in systems that handle sensory
information at the incoming level (Ratliff, 1965). It is assumed that when
these memory elements are inhibited they emit weaker signals, so that an
increase in recognition errors results.
Figure 32.35.

Percentage of errors in pitch comparisons, plotted as
a function of the separation in pitch between a critical interpolated tone and
the first test tone. The critical tone was always in the second serial
position of a sequence of six interpolated tones. The error rate was maximal at
a separation of two-thirds of a tone. (From D. Deutsch, Mapping of interactions
in the pitch memory store, Science, 1972, 175. Reprinted
with permission.)
This
hypothesis is strengthened by two further lines of evidence. First, the
relative frequency range over which the disruptive effect operates corresponds
well with the range over which centrally acting lateral inhibition has been
found in physiological studies of the auditory system (Klinke, Boerger, &
Gruber, 1969, 1970). Second, the error rate in this pitch recognition task
cumulates when two critical tones bearing an inhibitory relationship to the
first test tone are interpolated, one higher than the first test tone and the
other lower (D. Deutsch, 1973a). Analogously, in lateral inhibitory networks
there is also accumulation of inhibition from stimuli that are placed on either
side of the test stimulus (Ratliff, 1965).
If a
recurrent lateral inhibitory network were indeed involved here, we should also
expect to find the phenomenon of disinhibition (see Note 14). More precisely,
we should expect that, if a tone that was inhibiting memory for another tone
was itself inhibited by a third tone, memory for the first tone should return.
Accordingly,
D. Deutsch and Feroe (1975) performed the following experiment. Subjects
compared two test tones for pitch with the tones separated by a sequence of six
interpolated tones. In the second serial position of the interpolated sequence
there was always placed a tone that was two-thirds of a tone removed from the
first tone (i.e., in a maximally inhibiting relationship to the first test
tone). Errors were plotted as a function of the pitch of a further tone. This
tone was placed in the fourth serial position, and its relationship to the tone
in the second serial position varied between identity and a whole tone
separation. The results are shown in Figure 32.36. It can be seen that the
predicted effect was indeed obtained. The error rate in sequences in which the
second critical tone was identical to the first was significantly higher than
baseline. Further, the error rate in sequences in which the two critical tones
were separated by two-thirds of a tone was significantly lower than baseline. In
a companion experiment using subjects selected on the same criterion as for the
disinhibition study, a first-order inhibitory function was obtained. The
theoretical disinhibition function was then calculated from this first-order function.
The two are plotted in Figure 32.36, and it can be seen that there is a very
good correspondence between the disinhibition function derived experimentally
and the function derived theoretically. Strong evidence is therefore provided
for the hypothesis that pitch memory elements are arranged as recurrent lateral
inhibitory networks, similar to those observed in systems handling sensory
information at the incoming level.
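The logic of the disinhibition prediction can be illustrated with a toy network. The sketch below is not the model of D. Deutsch and Feroe (1975). It assumes a generic steady-state form of recurrent lateral inhibition (each unit's response is its excitation minus a weighted sum of the other units' responses), a triangular weight function that peaks at two-thirds of a tone and vanishes at identity and at a whole tone, and arbitrary parameter values; it also ignores the temporal order of the tones. Pitch is expressed in sixths of a tone so that the separations are exact integers.

```python
def inhibition_weight(separation, peak=4, limit=6, strength=0.5):
    """Assumed triangular weight; separation is in sixths of a tone."""
    d = abs(separation)
    if d == 0 or d >= limit:
        return 0.0
    if d <= peak:
        return strength * d / peak
    return strength * (limit - d) / (limit - peak)

def steady_state(pitches, excitation=1.0, n_iter=200):
    """Iterate r_i = max(0, e - sum_j w_ij * r_j) toward a fixed point."""
    n = len(pitches)
    r = [excitation] * n
    for _ in range(n_iter):
        r = [
            max(0.0, excitation - sum(
                inhibition_weight(pitches[i] - pitches[j]) * r[j]
                for j in range(n) if j != i))
            for i in range(n)
        ]
    return r

# Test tone at 0; first critical tone two-thirds of a tone away (4 sixths).
inhibited = steady_state([0, 4])[0]
# Add a second critical tone a further two-thirds of a tone away (8 sixths):
# it inhibits the first critical tone but is out of range of the test tone.
disinhibited = steady_state([0, 4, 8])[0]

print(round(inhibited, 3))      # 0.667
print(round(disinhibited, 3))   # 1.0
```

In this toy setting the response of the unit coding the test tone recovers when the inhibiting tone is itself inhibited, which is the qualitative pattern the experiment was designed to detect.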
In
summary, this set of studies demonstrates that a sequential pattern of pitches
will be perceived to a greater or lesser degree of accuracy, depending in a
precise and systematic fashion on the relationships between the individual
pitches in the pattern. Further, such accuracy can be predicted in part from a
mathematical model based on that describing lateral
inhibitory interactions in sensory systems (see D. Deutsch & Feroe,
1975, for a detailed description of the present model).
There
is a further effect that operates to cause disruption of pitch comparison
judgments. When two test tones are separated by a sequence of interpolated
tones, and a critical tone is interpolated that differs in pitch from the first
test tone but that is identical in pitch to the second test tone, there is an
increased tendency to misrecognize the pitch of the second test tone as
identical to the first.
Figure 32.36.

Percentage
of errors in pitch recognition as a function of the pitch relationships between
the first test tone and two critical interpolated tones. The dotted line plots
percentage of errors in an experiment that varied the relationship between the
first test tone and the critical tone. (The horizontal dotted
line at right shows percentage of errors where no tones were interpolated
within the critical range.) The solid line displays percentage of errors
in an experiment in which a tone that was two-thirds of a tone removed from the
first test tone was always interpolated. Errors are plotted as a function of
the relationship between this tone and a second critical tone that was further
removed along the pitch continuum. The dashed line displays percentage of
errors for the same experimental conditions predicted theoretically from the
lateral inhibition model. (The horizontal solid and dashed lines at right show
percentage of errors obtained experimentally and assumed theoretically where no
further critical tone was interpolated.) When the second critical tone was identical
in pitch to the first, errors were significantly enhanced compared with the
baseline condition where no further critical tone was interpolated. When the
second critical tone was two-thirds of a tone removed from the first, errors
were significantly reduced compared with this baseline condition. (From D. Deutsch & J. Feroe, Disinhibition in pitch memory,
Perception and Psychophysics, 1975, 17. Reprinted with permission.)
This
tendency is substantially greater when the critical tone is placed early in the
interpolated sequence
rather than late (D. Deutsch, 1972a). To explain this phenomenon, it was
hypothesized that memory for the pitch of a tone is laid down both on a pitch continuum and on a temporal or
order continuum. The distribution of this memory trace spreads in both
directions as time proceeds, but particularly along the temporal or order
continuum. Because of this spread, when a tone of the same pitch as the second
test tone is included in the
interpolated sequence, the subject sometimes concludes that this had been the
first test tone. In other words, errors of misrecognition result from the
subject's recognizing that a tone of identical
pitch to the second test tone had occurred, but not being certain when it had
occurred (D. Deutsch, 1972a). Further experiments have provided supporting
evidence for this view (D. Deutsch, 1975d).
2.6.2. Facilitation Through
Repetition in Short-Term Memory for Pitch. The effect of a critical
interpolated tone on pitch comparison judgment need not be disruptive, but may
instead be facilitatory. For example, if a tone whose pitch is identical to
that of the first test tone is included in the interpolated sequence,
comparison performance is enhanced. Subjects judge more accurately both that
the two test tones are identical in pitch and also that they differ. This was
demonstrated in an experiment by D. Deutsch (1975b) in which there were three conditions. In the first, two test tones were
compared for pitch when these were separated by a sequence of six interpolated
tones. In the second, a sequence of four tones was interpolated instead. In the
third, six tones were again interpolated, and a tone of identical pitch to the
first test tone was placed in the second serial position of the interpolated
sequence. The error rate was lowest in this third condition; indeed, it was
significantly lower than in the condition where only four tones were interpolated.
A
companion experiment showed that this facilitation effect was sensitive to the
serial position of the repeated tone, being substantially greater when the
repeated tone was presented early in the interpolated sequence than when it was
presented late. For this and other reasons, it was concluded that the facilitation
effect results from the same process as causes the errors of misrecognition
described above, that is, a spread of memory distribution along a temporal or
order continuum. It was hypothesized that when two such distributions overlap
their overlapping portions sum, so that a stronger memory trace results. In any
event, we can see that there is a strong perceptual advantage to repeating a
tone in a pattern, particularly if the two occurrences of the tone are closely
spaced.
2.6.3. The Influence of Relational Context on Pitch
Comparison Judgments. A substrate for short-term memory for intervals was
hypothesized by D. Deutsch (1975a). Such memory was assumed to be based on an
array whose elements are activated by the presentation of simultaneous or
successive pairs of tones. Tone pairs whose fundamental frequencies
stand in the same ratio project out the same elements in the array, and tone
pairs whose fundamental frequencies stand in closely similar ratios project
out neighboring elements. Interactive effects take place along this
array that are analogous to those occurring in the system that retains absolute
pitch values. Such effects include facilitation through repetition and
similarity-based interference.
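The idea that tone pairs with the same frequency ratio address the same element can be made concrete by measuring interval size on a log scale. The sketch below is illustrative only; the function name and the example frequencies are assumptions.

```python
import math

def interval_semitones(f1_hz, f2_hz):
    """Interval size in equal-tempered semitones; it depends only on the
    ratio f2/f1, not on the absolute frequencies."""
    return 12.0 * math.log2(f2_hz / f1_hz)

# Two pairs in the ratio 3:2 give the same interval size at different pitches.
print(round(interval_semitones(220.0, 330.0), 2))   # 7.02
print(round(interval_semitones(440.0, 660.0), 2))   # 7.02

# A pair with a closely similar ratio lies nearby on the same continuum.
print(round(interval_semitones(440.0, 622.25), 2))  # 6.0
```

On such a continuum, intervals differing by a semitone occupy neighboring positions, which is the condition under which similarity-based interference would be expected.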
This
hypothesis received support from an experiment by D. Deutsch (1978b), which
also showed that interval information can affect comparison judgments
concerning the absolute pitches of tones with which the intervals are
associated. In this experiment, subjects compared the pitches of two test tones
that were both accompanied by tones of lower pitch. The test tones either were
identical in pitch or differed by a semitone. The tone accompanying the first
test tone was always identical in pitch to the tone accompanying the second
test tone. The test tones were separated by a sequence of six interpolated tones.
The tones in the second and fourth serial positions of the interpolated
sequence were also accompanied by tones of lower pitch.
It was
found that, when the intervals formed by the interpolated combinations were
identical in size to the interval formed by the first test combination, the
error rate was lower than when the sizes of the interpolated intervals were
chosen at random. This shows that the system retaining interval information
exhibits facilitation through repetition in the same way as the system
retaining absolute pitch information. Further, when the intervals formed by the
interpolated combinations differed in size by one semitone from the interval
formed by the first test tone combination, the error rate was higher than when
the intervals formed by the interpolated combinations were chosen at random. This
shows that the system retaining interval information is also subject to
similarity-based interference, analogous to that underlying memory for absolute
pitch values, and is consistent with the presence of the interval size array
whose characteristics were hypothesized above.
The
systems retaining absolute pitch information and interval information interact
in determining pitch comparison judgments. For example, judgments of sameness
or difference in the pitches of two test tones are biased by a sameness or
difference in the harmonic intervals with which the test tones are associated. If
the test tones differ, but are associated with the identical harmonic
intervals, there is an increased tendency to misjudge them as identical. Similarly,
if the test tones are identical, but are associated with different harmonic
intervals, there is an increased tendency to misjudge them as different (D.
Deutsch & Roll, 1974). An analogous effect holds for melodic intervals also
(D. Deutsch, 1982a).
Pitch
comparison judgments are influenced by relational context in yet another way. When
two test tones are presented and separated by a sequence of interpolated tones,
the entire configuration forms a framework of pitch relationships to which the
test tones are anchored. Thus the firmer the processing of the melodic
intervals within the configuration, the more accurate should be comparison
judgments involving the test tones.
One
would expect from findings described in Section 1.3 that melodic intervals
would be more securely processed when these are of small size rather than
large. It was therefore hypothesized that pitch comparison judgments would
become more accurate as the average size of the melodic intervals formed by
successive tones in the interpolated sequence was reduced. In an experiment to
examine this prediction, two test tones were presented, separated by a sequence
of six interpolated tones. In the first condition, the interpolated tones were
chosen at random from a range of one octave, and they were also ordered at
random. In the second condition, the interpolated tones were again chosen at
random from a range of one octave, but they were arranged in monotonically
ascending or descending order, so that the average size of the melodic
intervals was reduced. In the third condition, the interpolated tones were
chosen at random from a range of two octaves and were also ordered at random. In
the fourth condition, the interpolated tones were chosen at random from a range
of two octaves but were arranged in monotonically ascending or descending
order. It was found, in accordance with the hypothesis, that
as the average interval size formed by successive tones decreased the error
rate in pitch comparison judgment also decreased (D. Deutsch, 1978a).
2.7. Contour as a Cue in Recognition of Pitch Patterns
Various
studies have shown that melodies may be recognized on the basis of contour
alone. Werner (1925) presented subjects with familiar melodies that were
transformed onto very small scales, and he called these micromelodies. The
subjects were able to recognize the micromelodies despite the fact that the
interval sizes were drastically altered. White (1960) presented listeners with
familiar melodies that were transformed by setting all the intervals to one
semitone, so that recognition was mediated solely by the sequence of directions
of pitch change. He found that above-chance performance was obtained under
these conditions, showing that even preservation of relative interval size was
not essential for melody recognition.
Dowling
and Fujitani (1971) confirmed the role of contour in melody recognition using
the following paradigm. Subjects were presented with a standard melody followed
by a comparison melody. The comparison melody either was identical to the
standard, or had the same contour but was composed of
different intervals, or was entirely different. The comparison melody either
began on the same pitch as the standard or was transposed to a different pitch
level. The authors found that, when the comparison melody was not transposed
from the standard, recognition of a difference between the standard and
comparison melodies was at a high level, both when contour was preserved and
when it was not preserved. However, when the comparison melody was transposed
from the standard, the subjects' performance levels did not differ depending on
whether the transposition was exact or whether
contour alone was preserved. The authors concluded that the subjects were
basing their recognition judgments on contour rather than on a sameness or
difference in interval size (see also Idson & Massaro, 1978; Kallman &
Massaro, 1979).
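The notion of contour used in these studies can be made concrete with a short sketch. Pitches are represented here as MIDI note numbers, and the melody is a hypothetical example rather than one of the stimuli employed; the function compress_to_semitones is an assumed reading of the transformation in which every interval is set to one semitone.

```python
def contour(pitches):
    """Sequence of directions of pitch change: +1 (up), 0 (same), -1 (down)."""
    return [(b > a) - (b < a) for a, b in zip(pitches, pitches[1:])]

def intervals(pitches):
    """Signed sizes, in semitones, of the successive melodic intervals."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def compress_to_semitones(pitches):
    """Rebuild a melody with every interval set to one semitone,
    keeping only the direction of each pitch change."""
    out = [pitches[0]]
    for step in contour(pitches):
        out.append(out[-1] + step)
    return out

melody = [60, 64, 62, 67, 65]         # hypothetical melody (MIDI numbers)
transposed = [p + 5 for p in melody]  # exact transposition
squeezed = compress_to_semitones(melody)

print(contour(melody), intervals(melody), intervals(squeezed))
```

An exact transposition preserves both contour and interval sizes, whereas the compressed version preserves contour alone.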
2.8. Scale and Key Structure in Recognition of Pitch
Patterns
A
number of studies have shown that subjects will utilize their knowledge of
scale structure in making recognition judgments concerning melodies.
A
related study was performed by Dowling (1978). He presented subjects with a standard
melody, followed by a comparison melody, and required them to judge whether or
not the comparison was a correct transposition of the standard. The comparison
melody was related to the standard in any of four ways. The first type of
comparison melody was an exact transposition of the standard. The second
employed notes in the same diatonic scale as the standard, and for each
successive relationship the number of steps up or down this scale was
preserved, but the sizes of the intervals were sometimes altered in
consequence. In the third type, the contour of the melody was preserved but the
intervals between successive pitches were randomly selected without regard to
scale. In the fourth type, the intervals between successive pitches were also
randomly selected, and in addition the contour differed from that of the standard.
Dowling found that discrimination of a difference between the melodies
was at a high level when the comparison melody
consisted of intervals selected without regard to scale, particularly when
contour was altered. However, subjects showed a strong tendency to judge as a
correct transposition one in which the number of steps up or down the scale was
preserved but the interval sizes were altered in consequence. This finding again
shows that subjects were basing their judgments in part on a sameness or
difference in scale membership.
In
classical western music, when a melodic line is repeated at a different pitch
level but remains in the same key, this repetition typically preserves the
number of steps up or down the scale, with the result that exact interval sizes
are often altered. This type of transposition generally appears correct to the
listener, whereas an exact transposition that results in a departure from the
scale appears incorrect (D. Deutsch, 1977).
This
musical convention is likely to have evolved to exploit the tendency of the
perceptual system to process pitches in terms of memberships of restricted,
highly overlearned sets.
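The difference between the two kinds of transposition can be sketched as follows. The example assumes the C major scale and a hypothetical four-note motif; the helper names are illustrative only.

```python
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of the C major scale

def degree_to_pitch(degree, scale=C_MAJOR):
    """Convert a scale-step index (0 = middle C) to a MIDI note number."""
    octave, step = divmod(degree, len(scale))
    return 60 + 12 * octave + scale[step]

def intervals(pitches):
    """Signed sizes, in semitones, of the successive melodic intervals."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

motif_degrees = [0, 1, 2, 0]  # hypothetical motif: C D E C
original = [degree_to_pitch(d) for d in motif_degrees]

# Transposition within the key: shift every note by one scale step
tonal = [degree_to_pitch(d + 1) for d in motif_degrees]

# Exact transposition: shift every note by the same number of semitones
exact = [p + 2 for p in original]

print(original, tonal, exact)
```

In this sketch the transposition within the key preserves the number of scale steps but alters the interval sizes, while the exact transposition preserves the interval sizes but introduces a note (F sharp) that lies outside the scale.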
The
influence of overlearned pitch structures on memory was demonstrated in another
way by Krumhansl (1979). Subjects compared the pitches of two test tones that were
separated by a sequence of eight interpolated tones. On half the trials the
interpolated sequences were constructed so as to suggest a key. The results
showed that, for such sequences, when the first test tone was in the same
diatonic scale as the interpolated tones, recognition performance was better
than when the first test tone was not in this scale. Krumhansl also noted that,
in a tonal context, when the second test tone was not in the same diatonic
scale as the interpolated tones, it was more frequently confused with a first
test tone that was in this scale than when the reverse was true. She concluded
that there is an instability of memory representation
for tones outside a scale that has been established for the listener, so that
these tones tend to become assimilated to tones that are in the scale.
Bharucha
and Krumhansl (1983) studied memory for chords that were presented in sequence.
They found that memory for sequences of chords that were chosen at random was
poorer than that for sequences of chords that were drawn from a single key and
that formed conventional harmonic progressions (see Note 15). Further, when the
sequences were all in the same key, the substitution of one chord in the key
for another was difficult to detect. However, a change from a chord in the key
to a chord outside the key was easy to detect. In addition, more errors of
confusion occurred when a chord outside the key was changed to a chord inside
the key than when the opposite change occurred. This is in accordance with the
idea that elements outside an established key are represented in memory in a
less stable fashion and so tend to become assimilated to elements inside the
key.
Differences
between short-term and long-term memory in the processing of melodic
information have been noted. Attneave and Olson (1971) showed that, when
subjects were asked to transpose unfamiliar melodies to different pitch levels,
performance was very poor, at least for those who were musically untrained. However,
when a familiar sequence was employed instead, excellent performance was
obtained. The finding that exact interval information is well retained in
long-term memory is in accordance with general observation. Bartlett and
Dowling (1980) also found experimentally that recognition of exact intervals
was at a very high level for familiar melodies. In a further experiment,
Dowling and Bartlett (1981) compared immediate with delayed recognition tests
for a set of short, novel melodies containing tones that were not all in the
same key. Although both interval and contour information were more difficult to
retrieve following a delay, discrimination of exact transpositions from inexact
transpositions that preserved contour did not decline with delay.
2.9. Memory for Hierarchically Organized Pitch
Patterns
This
section is concerned with memory for pitch information that is projected onto
overlearned pitch alphabets or scales and is organized in the form of
hierarchies (Section 2.5). It has been found using verbal materials that, when
information was hierarchically structured and the observer was able to
capitalize on this structure to produce a more efficient encoding, memory was
enhanced. However, if the hierarchically structured information was presented
so as to prevent encoding in accordance with structure, an enhancement of
memory did not result (Bower, 1972).
An analogous phenomenon was demonstrated in an experiment on memory for hierarchically structured tonal sequences (D. Deutsch, 1980b). Musically trained subjects listened to sequences of tones and recalled what they heard in musical notation. Examples of the presented sequences are shown in Figure 32.37.
The sequence in Figure 32.37(a) consists of a higher-level
subsequence of four elements that acts on a lower-level subsequence of three
elements. The sequence in Figure 32.37(b) was constructed from the same set of
tones as in Figure 32.37(a) but arranged in haphazard fashion. The sequences
were each presented in three temporal configurations. In the first the tones
were spaced at equal intervals, in the second they occurred in four groups of
three (so that segmentation was in accordance with tonal structure), and in the
third they occurred in three groups of four (so that segmentation was in
conflict with tonal structure).
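The construction of such sequences, and the two kinds of temporal segmentation, can be sketched as below. The particular reference points and the three-element pattern are hypothetical values in scale steps, chosen for illustration; they are not the tones shown in Figure 32.37.

```python
def build_structured(reference_points, pattern):
    """Apply a lower-level pattern of offsets at each higher-level element."""
    return [ref + off for ref in reference_points for off in pattern]

def segment(sequence, size):
    """Split a sequence into consecutive temporal groups of a given size."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]

def groups_share_pattern(groups):
    """True if every group has the same shape relative to its first tone."""
    shapes = {tuple(x - g[0] for x in g) for g in groups}
    return len(shapes) == 1

higher = [0, 2, 4, 5]   # four higher-level elements (hypothetical)
lower = [0, -1, 0]      # three-element lower-level pattern (hypothetical)

structured = build_structured(higher, lower)
in_accord = segment(structured, 3)     # four groups of three
in_conflict = segment(structured, 4)   # three groups of four

print(in_accord, groups_share_pattern(in_accord))
print(in_conflict, groups_share_pattern(in_conflict))
```

When the twelve tones are grouped in threes, every temporal group repeats the same lower-level pattern; when they are grouped in fours, the groups cut across the pattern and no longer share a common shape.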
Figure
32.38 displays the percentage of tones that were correctly recalled in each
serial position in the different experimental conditions. It can be seen that
large effects of tonal structure and temporal segmentation were produced. For
structured sequences that were segmented in accordance with structure, a very
high level of recall was obtained. For structured sequences that were
unsegmented, the level of recall was again very high, though slightly lower. But
for structured sequences that were segmented in conflict with structure, the
level of recall was considerably lower. For unstructured sequences, recall
levels were lower still, but in the same range as for structured sequences that
were segmented in conflict with structure. From the shape of the serial
position functions, and also from analysis of transitional shift probabilities,
it was demonstrated that the subjects were grouping the tones on the basis of
temporal segmentation, even when this conflicted with tonal structure.
The
experiment therefore demonstrates that listeners are well able to perceive
hierarchical structures that are present in tonal sequences and to use such
structures to produce a more efficient
memory representation. However, temporal segmentation in conflict with
tonal structure may destroy the capacity to exploit this information.
Figure 32.37.

Examples
of sequences employed in experiment on effect of tonal structure and temporal
segmentation on recall of pitch sequences. Sequence (a) consists of a
higher-level subsequence of four elements that acts on a lower-level subsequence
of three elements. Sequence (b) consists of the same
set of tones as in sequence (a), but arranged in haphazard fashion. (From D.
Deutsch, The processing of structured and unstructured tonal sequences,
Perception and Psychophysics, 1980, 28. Reprinted with
permission.)
Figure 32.38.

Results
of experiment on the effects of structure and temporal segmentation on recall of
pitch sequences. The percentages of tones correctly
recalled at each serial position in the different conditions of the experiment
are plotted. 3S: Structured in groups of three; segmented in groups of three. 4S:
Structured in groups of three; segmented in groups of four. 0S: Structured in
groups of three; unsegmented. 3U: Unstructured; segmented in groups of three.
4U: Unstructured; segmented in groups of four. 0U: Unstructured;
unsegmented. Recall levels were very high for structured sequences, and for
sequences that were segmented in accordance with structure. Recall levels were
substantially lower for unstructured sequences, and for structured sequences
that were segmented in conflict with structure. (From D.
Deutsch, The processing of structured and unstructured tonal sequences,
Perception and Psychophysics, 1980, 28. Reprinted with permission.)
3. ANALYSIS OF TIMBRE
Timbre
may be described as that perceptual quality of a sound that distinguishes it
from other sounds, when simple attributes such as pitch and loudness are held
constant. The imprecision of this definition reflects the fact that timbre
perception is a complex and little-understood phenomenon. This section focuses
on the timbre of musical instrument tones, since it is with these that studies
of timbre perception have been mostly concerned. However, the methods developed
in such studies and the results so far obtained should ultimately prove of
importance to understanding the perception of sound quality in general.
3.1. Timbre and Fourier Analysis
The
classical view of timbre perception is that sound quality may be attributed
entirely to the spectrum of the sound in steady state. Fourier's theorem states
that a periodic waveform is defined by the amplitudes and phases of a harmonic
series of spectral components. It was assumed that the ear is capable of
performing such an analysis, except that it is insensitive to phase. However,
others have argued that, given a periodic tone, a change in the phase
relationships between the harmonics of the tone can alter perceived timbre
(Mathes & R. L. Miller, 1947; Plomp & Steeneken, 1969), though this
effect is generally weak (Cabot, Mino, Dorans, Tackel, & Breed, 1976;
Schroeder, 1975). Other effects of phase on the perception of tone complexes
are described elsewhere in this chapter (Sections 1.1.1, 1.2.6).
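The distinction between the amplitudes and the phases of the harmonic components can be illustrated numerically. The sketch below builds two periodic waveforms from the same three harmonic amplitudes but different phases, and recovers their amplitude spectra with a direct discrete Fourier transform. The amplitude and phase values are arbitrary choices for illustration.

```python
import cmath
import math

def synthesize(amplitudes, phases, n_samples=64):
    """One period of a waveform built from harmonics 1, 2, 3, ..."""
    return [
        sum(a * math.cos(2 * math.pi * (k + 1) * n / n_samples + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases)))
        for n in range(n_samples)
    ]

def amplitude_spectrum(wave, n_harmonics):
    """Amplitudes of harmonics 1..n_harmonics from a direct DFT."""
    n = len(wave)
    return [
        2 * abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(wave))) / n
        for k in range(1, n_harmonics + 1)
    ]

amps = [1.0, 0.5, 0.25]  # arbitrary harmonic amplitudes
wave_a = synthesize(amps, [0.0, 0.0, 0.0])
wave_b = synthesize(amps, [0.0, math.pi / 2, math.pi])

spec_a = amplitude_spectrum(wave_a, 3)
spec_b = amplitude_spectrum(wave_b, 3)
largest_difference = max(abs(x - y) for x, y in zip(wave_a, wave_b))

print(spec_a, spec_b, largest_difference)
```

The two waveforms differ in shape but have the same amplitude spectrum, so an analyzer that was insensitive to phase would treat them as equivalent.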
In considering steady-state tones, one issue of importance is whether
timbre is associated with the relationship between the frequency region of a
formant and the fundamental (see Notes 2 and 16) or whether it depends on the
absolute level of the formant, regardless of the frequency of the fundamental. Slawson (1968)
had subjects make similarity judgments between pairs of tones in which were
varied the fundamental frequencies, the two lower formant frequencies, and the
higher formant frequencies. He found that, when the fundamental frequency of
the second tone of each pair was an octave above the first, timbral quality was
best preserved when the two lower formants were transposed by approximately 10%
of the transposition of the fundamental. Plomp and Steeneken (Note 17)
presented subjects with pulse trains through filters that had different center
frequencies. Tones filtered at fixed frequencies were judged as more similar to
each other than were tones filtered at frequencies relative to their pulse
rates. Thus timbre appears to be related more to
spectral envelope than to the amplitude relationships between the harmonics.
Other
studies have been concerned with the role of the critical band (see Note 18) in
timbre perception. Plomp and Mimpen (1968) and Plomp (1970) concluded that
partials falling within the same critical band could not be distinguished from
each other. For such reasons, spectra are sometimes displayed so as to take
account of critical bands (Grey & Gordon, 1978; Zwicker, 1961; Zwicker
& Scharf, 1965). Further, when many partials lie within the same critical
band, the resultant sound is harsh (Risset & Wessel, 1982).
The
classical approach assumes that timbre perception depends essentially on the
spectra of tones in the steady state. A strong argument against this notion is
that such spectra may be radically altered in various ways without much
affecting perceived timbre. This happens, for example, when sounds are
presented through a poor recording. Also, the frequency response of a normally
reverberant room differs at different points in the room, with the result that
sound spectra may be drastically changed. However, perceived timbre does not
change dramatically as the listener shifts position in a room (Risset &
Wessel, 1982).
For
such reasons, recent studies of timbre perception have been concerned with the
time-variant properties of tonal stimuli. Details of the initial portion of a
tone, known as the attack, have been shown to exert a considerable influence on
perceived timbre (Berger, 1964; Grey, 1975; Risset, 1966; Saldanha & Corso,
1964; Schaeffer, 1966; Wessel, 1973). Fluctuations in the steady-state portion
and characteristics of the decay have also been found to exert an influence
(Risset, 1966; Risset & Wessel, 1982; Schaeffer, 1966).
3.2. Investigation of Timbre by Analysis and
Synthesis
Risset
and Mathews (1969) pioneered an important technique in the study of timbre
perception. Here, samples of natural instrument tones are digitized and
analyzed by computer, and a set of physical parameters is thus extracted. Tones
are then resynthesized by computer in accordance with these physical
parameters. With this technique, the experimenter can vary systematically any
parameters that he wishes and so examine the perceptual effects of these
variations.
Figure 32.39.

(a)
Time-varying amplitude functions derived from heterodyne analysis for a bass
clarinet tone, shown as an amplitude × frequency × time perspective plot, with
the fundamental harmonic plotted in the background. (b) Line-segment
approximation to the functions plotted in (a). Both functions have been
employed to resynthesize the tone, but form (b) provides considerable
information reduction. Data from Grey and Moorer. (From J. C. Risset & D. L. Wessel,
Exploration of timbre by analysis and synthesis, in D. Deutsch (Ed.), Psychology
of music, Academic Press, Inc., 1982. Reprinted with permission.)
For
example, when tones
are resynthesized with a line-segment approximation to the time-varying
amplitude and frequency functions for the partials, very little loss of
characteristic perceptual quality results, though there may be considerable
information reduction (Grey & Moorer, 1977; Risset & Mathews, 1969). An
example of a line-segment approximation is given in Figure 32.39.
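The information reduction achieved by a line-segment approximation can be illustrated with a minimal sketch. The amplitude envelope below is a hypothetical function for a single partial (a rapid attack followed by a slow decay), not data from the analyses cited, and the three breakpoints are chosen by hand.

```python
def interpolate(breakpoints, t):
    """Linear interpolation between (time, value) breakpoints at time t."""
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the range of the breakpoints")

# Hypothetical amplitude envelope of one partial, sampled at 101 points
times = [i / 100 for i in range(101)]
envelope = [min(t / 0.1, 1.0) * (1.0 - 0.5 * t) for t in times]

# Line-segment approximation: start, end of attack, end of tone
breakpoints = [(0.0, envelope[0]), (0.1, envelope[10]), (1.0, envelope[100])]
approx = [interpolate(breakpoints, t) for t in times]

max_error = max(abs(a - e) for a, e in zip(approx, envelope))
print(len(envelope), len(breakpoints), max_error)
```

In this example, three breakpoints stand in for 101 samples while the approximation stays close to the sampled envelope. Whether such a reduction is perceptually acceptable is an empirical question of the kind addressed by the resynthesis studies described above.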
3.3. Multidimensional Models of Timbre
Geometric
models of subjective timbral space have been provided by multidimensional
scaling techniques and have proved very effective. Subjects are asked to rate
many pairs of tones for similarity, and their data are submitted to
multidimensional scaling programs. J. R. Miller and Carterette (1975) have demonstrated
that musical training affects timbral spaces in a complex fashion. In one
experiment, the fundamental frequency was one of the dimensions varied. Due to
the overwhelming salience of this dimension no differences were found that
depended on training. However, when in a second experiment the fundamental
frequency was held constant, differences that depended on training emerged.
Wessel
(1973) employed tones of identical fundamental frequency and duration, which
were taken from nine orchestral instruments. He concluded that instrumental
timbre could be ordered along two perceptual dimensions. The first related to
the distribution of energy in the steady state. Tones
with more energy at high frequencies appeared at one end of this dimension and
tones with more energy at low frequencies appeared at the other end. The second
dimension related to tonal onset patterns. Tones whose low-order harmonics
emerged more rapidly appeared at one end of this dimension, and tones whose
high-order harmonics entered more rapidly appeared at the other end.
Grey
(1975) performed an experiment that employed 16 instrument tones that were
resynthesized by computer and equated for pitch, loudness, and duration. His
data were most consistent with a three-dimensional solution. The first
dimension related to the tones' spectral energy distribution. Tones with narrow
bandwidths and a concentration of low-frequency energy appeared at one end, and
tones with wide bandwidths and less concentration of low-frequency energy appeared
at the other end. A second dimension was related to the distribution of energy
in the attack segment. At one end, tones displayed high-frequency,
low-amplitude energy in the attack, and at the other end there was no
high-frequency precedent energy in the attack. For the third dimension, two
alternative interpretations were proposed. The first was that this dimension
related to the form of onset-offset patterns. The second hypothesis was that
this was a cognitive dimension, along which the tonal stimuli were arranged
according to instrument family (e.g., brass, strings, woodwinds).
This three-dimensional space is displayed in Figure 32.40.
3.4. Role of Context in Timbre Perception
The
importance of a cue to timbre perception has been found to depend on the
context in which this cue is embedded. In the perception of trumpet tones,
details of the attack are more important for long tones than for short ones
(Risset, 1966). In the perception of piano tones, the shape of the initiation
of the decay is important to how the attack portion is perceived. Further,
when tones are presented in close succession, their timbres are perceived
differently than when these tones are presented in isolation. This was
demonstrated by Grey (1978), who presented computer-synthesized tones either in
isolation or in single or multivoiced musical contexts. He found that timbre
discrimination was more difficult in multivoiced contexts, and that a
single-voiced context caused a perceptual enhancement of spectral differences
relative to isolated tones, while the presentation of isolated tones allowed
listeners to compare temporal details more clearly.
Figure 32.40.

Three-dimensional display of similarities between
different instrument timbres generated by multidimensional scaling. O1, O2 = oboes; C1, C2 = clarinets;
X1, X2, X3 = saxophones; EH = English horn; FH = French horn; S1, S2, S3 =
strings; TP = trumpet; TM = trombone; FL = flute; BN = bassoon. The proximities
of the instruments to each other in this three-dimensional space indicate the
extents of their perceived similarity. (From J. M. Grey,
Timbre discrimination in musical patterns, Journal of the Acoustical
Society of
4. PERCEPTION OF TEMPORAL RELATIONSHIPS
This
section is concerned with the ways in which the listener abstracts temporal
relationships from patterns of sound. First, we shall examine the perception of
temporal order of two or more events. Second, we shall consider the evidence
concerning grouping mechanisms for temporally patterned stimuli. Third, we
shall examine the encoding of rhythmic patterns.
4.1. Perception of Temporal Order
4.1.1. Modes of Order Perception. Following Hirsh (1974) and R. M. Warren (1974b), we
may distinguish three basic modes of order perception in hearing, each loosely
associated with a different range of temporal values. First, with very small
time intervals separating the onsets of two events (under 10 msec), there
results a single fused sound. Differences in the quality of this sound then
serve as bases for temporal order judgment. For example, small interaural time
differences between onsets of dichotically presented sounds give rise to
lateralization cues (Babkoff, 1975). Also, with sounds presented monaurally or
dichotically, spectral differences resulting from asynchrony of onset give rise
to changes in sound quality (Patterson & Green, 1970). At somewhat longer
time differences, order judgments may be based on the figural or Gestalt
properties of a sound sequence, while the listener may still be unable to name
the order of individual events within the sequence (R. M. Warren, 1974a). Finally,
when sufficiently long time intervals separate the onsets of successive events,
the listener can make order judgments by an item-by-item analysis of the
pattern components.
Estimates
of the ranges of temporal values associated with these three stages have been
found to vary considerably depending on the training of the subject, the
experimental paradigm, and the stimulus parameters employed (R. M. Warren,
1982). A major problem in providing such estimates is that one type of judgment
can easily be disguised as another. For example, the subject can learn to
associate one sound quality with the judgment "A followed by B," and a
different sound quality with the judgment "B followed by A." Similarly,
the subject can learn to attach the label "single sound event" to one
sound quality, and the label "two sound events" to a different sound
quality.
4.1.2. Perception of the Order of Two Events. Hirsh (1959) employed pairs of sounds
drawn from a variety of tones, hisses, and clicks to establish the minimum time
between the onsets of these sounds required for their order to be correctly
reported 75% of the time. He concluded that this minimum was around 20 msec,
with little variation due to the different types of sound presented, or to
their levels. The subjects in this experiment were highly trained, and they
were allowed to listen to the sound pairs as often as they wished before
reaching a decision.
Later
Hirsh and Sherrick (1961) investigated the ability to order two auditory
events, two visual events, and two tactile events. Pairs of stimuli drawn from
two sensory modes were also employed. As in the previous study, trained
observers were used, and they were allowed to inspect the stimulus pairs as
often as they wished. Thresholds of around 20 msec were again obtained,
regardless of stimulus modality. The authors concluded that the value of
approximately 20 msec represents a fundamental limit for the perception of
temporal order when special modality-specific conditions are excluded.
Other
studies have investigated the effect of repeated presentations and training on
judgment of the order of tone pairs.
Hirsh
and Fraisse (1964) employed untrained subjects and presented them with a
single stimulus pair on each trial. For accurate identification of order, a
difference of around 60 msec was required when a sound followed a light, and of
around 100 msec when a light followed a sound. Later, Gengel and Hirsh (1970)
found effects of both number of presentations and training. With untrained
subjects, single trials yielded thresholds of about 45 msec, which decreased to
below 30 msec following roughly 10 sessions. Repeated presentations yielded
thresholds of about 25 msec which decreased with training to roughly 18 msec.
Broadbent
and Ladefoged (1959) also reported differences depending on continued
listening. Pairs of sounds presented were buzzes, hisses, or tones. During the
first few presentations, correct ordering was not
achieved even at 150-msec durations. However, with repeated listening, accurate
perception became possible at 30-msec durations. These authors remark that
discriminations were made here on the basis of "quality" rather than
"perceived order." R. M. Warren (1974b) has found that, with special training
in which subjects were presented with
sequences that speeded up gradually, correct naming of the temporal order of
spectrally different sounds was possible with separations as low as 5 msec.
4.1.3. Perception of the Order of Three or More Events. Hirsh (1976) studied
the perception of the order of three heterogeneous
stimuli: a sound, a light, and a vibrotactile stimulus. In one experiment,
subjects were asked to identify which stimulus occurred at the beginning of a
pattern. Performance on three-element patterns was found to be poorer than on
two-element patterns. Hirsh concluded that the third element had the effect of
impairing judgment as to which of the two prior elements had come first. Analogously,
Divenyi and Hirsh (1975) have found that identification of the temporal order
of three 20-msec tones was depressed when a fourth tone, which was irrelevant
to the task, was added to the sequence.
In a
second experiment, Hirsh asked subjects to identify which of six possible
permutations of a set of sounds had been presented. With 15 msec between onsets
of successive sounds, performance was barely above chance; performance rose
with 45 msec between onsets, and again with 150 msec between onsets.
In a
third experiment, Hirsh studied the effects on order identification of
repeating a stimulus pattern when there were clear breaks between repetitions. This
procedure resulted in an improvement in performance. In a final experiment, the
stimuli were presented in continuously cycling fashion, and performance levels
here were substantially poorer. Phenomenologically, the pattern was perceived
as three distinct trains or streams, corresponding to the three presentation
modes. This finding is comparable to those on continuously cycling sound
patterns to be described in Section 4.1.4.
4.1.4. Order Perception in Continuously Cycling Sound Patterns. When several
disparate sound elements are cyclically presented,
judgments of the orders of these elements become surprisingly difficult. This
was shown in one situation by Bregman and Campbell (1971). They presented
subjects with repeating sequences of six 100-msec tones, such that tones from
a low frequency range alternated with tones from a high frequency range, with
about 1½ octaves separating the ranges. Following each sequence, a three-tone
pattern was presented in isolation, and the subjects judged whether these three
tones had occurred in the same order and spacing within the six-tone sequence.
Judgments were above chance only if the three test tones were all in the same
frequency range.
A substantial difficulty in ordering appears when sounds of differing quality are
presented in continuously cycling fashion. R. M. Warren, Obusek, Farmer, and R.
P. Warren (1969) constructed such a sequence from a high tone, a hiss, a low
tone, and a buzz. At presentation rates of 200 msec, subjects were unable to
name the orders of these sounds. For correct ordering to be achieved, it was
necessary to increase the duration of each sound beyond 500 msec.
R. M. Warren (1974a) showed, however, that, when subjects were not required to name
the orders of the component sounds but rather to make
"same-different" comparisons between two repeating sequences,
performance levels were considerably superior. This was true even for
unpracticed subjects. Such sequences can therefore be identified on a
wholistic basis at presentation rates at which element-by-element order
judgments cannot be made.
R. M. Warren (1974b) examined the ability of trained subjects to attach learned
descriptive labels to cycling sequences of four auditory elements. The
sequences were first presented at long durations so that correct naming was
possible and were then gradually speeded up. By a series of such transfers,
subjects were enabled to make correct judgments with durations as short as 10
msec per item. Correct naming was achieved here through a
disguised wholistic pattern recognition. An effect of this sort could
account for the finding that cycling sequences of verbal items are correctly
ordered at considerably faster rates than cycling sequences of unrelated sounds
(Dorman, Cutting, & Raphael, 1975; Thomas, Cetti, & Chase, 1971;
Thomas, Hill, Carroll, & Garcia, 1970; R. M. Warren & R. P. Warren,
1970).
4.1.5. Theories of Order Perception. We may enquire
into the nature of the process that enables the rapid reconstruction of the
order of components of familiar sound sequences. R. M. Warren (1974b) proposed
that such recognition is mediated by a two-stage process. At the first stage,
the sequence is recognized in a wholistic fashion, that is, as a "temporal
compound" that can be distinguished from other compounds without being analyzed
in terms of its components. At the second stage, there occurs an item-by-item
analysis of the components of this compound together with their orderings.
Another proposal was made by Wickelgren (1969b, 1976) for the case of speech sounds;
however, it may be applied to nonspeech sounds also. He suggested that the
correct ordering of the components of speech sounds is based on an encoding of
a set of context-sensitive elements that, however, need not themselves
be ordered. Thus he proposed, for example, that the word struck is encoded
not as the ordered set of phonemes /s/, /t/, /r/, /u/, /k/, but rather as the unordered set of context-sensitive
allophones /#st/, /str/, /tru/, /ruk/, /uk#/. In this way, each of
these context-sensitive elements contains some local information as to how this
element is ordered in relation to the other elements. Information concerning
the order of these elements can thus be derived from such an unordered set.
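The derivation of order from an unordered set can be made concrete with a minimal sketch. It assumes that each context-sensitive element is represented as a (left, center, right) triple, that "#" marks a word boundary, and that no phoneme recurs in the same context; the function name and this representation are illustrative assumptions, not part of Wickelgren's proposal.

```python
def order_from_context_units(units):
    """Recover a sequence from an unordered set of (left, center, right) triples.

    Assumption: every (left, center) pair occurs only once in the set.
    """
    # Index each unit by its (left, center) pair so successors can be looked up.
    by_left_center = {(left, center): right for left, center, right in units}
    # The initial unit is the one whose left context is the boundary marker.
    starts = [u for u in units if u[0] == "#"]
    if len(starts) != 1:
        raise ValueError("expected exactly one word-initial unit")
    left, center, right = starts[0]
    ordered = [center]
    # Each unit names its successor: the next unit has (center, right)
    # as its own (left, center) pair.
    while right != "#":
        left, center = center, right
        right = by_left_center[(left, center)]
        ordered.append(center)
    return ordered


# The set is unordered; the local context alone fixes the sequence.
struck = {("u", "k", "#"), ("t", "r", "u"), ("#", "s", "t"),
          ("r", "u", "k"), ("s", "t", "r")}
print(order_from_context_units(struck))  # ['s', 't', 'r', 'u', 'k']
```

The sketch shows only that local context suffices for reconstruction in this simple case; it says nothing about how such elements might be encoded physiologically.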
Sternberg and Knoll (1973) have proposed a model of temporal order judgment. This model
deals only with the case in which information arrives via totally independent
channels, and in which no relational information concerning successive stimuli
is involved. According to this model, a "decision function" converts
a difference between central arrival times of two sensory signals into a
temporal order judgment. The psychometric function for order is regarded as a
distribution function and is represented additively in terms of the central
arrival latencies and
the decision function. As the authors point out, the
assumption of channel independence is critical to the model, so that when there
is any interaction between input channels the model
cannot be applied.
4.2. Perception of Rhythm
4.2.1. Subjective Rhythmic Grouping. When asked to execute a repetitive response
sequence, such as tapping, most individuals will perform the task at a
characteristic rate, termed by Fraisse (1982) the "spontaneous
tempo." This rate is most commonly around 600 msec, though large
individual differences have been found, with values roughly in the range of
200-1400 msec (Fraisse, Pichot, & Clairouin, 1949). Individuals are
generally very consistent in their rates of responding, both within and across
trials.
The rate at which a sequence of auditory or visual events appears most natural to
the observer is termed by Fraisse (1982) the "preferred tempo." Most
frequently this rate has been determined to be around 600 msec, and
individuals appear very consistent in their preferences (Fraisse, 1982). In a related study, Handel and Oshinsky (1981)
presented subjects with polyrhythms consisting of two conflicting pulse trains
and asked them to tap along with each pattern to indicate the beat (i.e., to
pick one of the two pulse trains as the more salient). An interelement timing
"window" of roughly 200-800 msec for the choice of pulse trains was
observed; only rarely did subjects pick a pulse train that lay outside this
temporal range (see also Handel, 1984).
A sequence of identical sounds that occur at regular intervals will appear to the
observer as grouped into subsequences, each consisting of an accented element
followed by one or more unaccented elements (Bolton,
1894; Woodrow, 1909). Subjective grouping of this nature occurs at presentation
rates ranging from about ten per second to one every 2 sec (Fraisse, 1956,
1982; Vos, 1973). This range of temporal values
correlates well with the distribution of melodic tempos in western traditional
music. It is also interesting to note that tempo, measured by the number of
consecutive notes per unit time, appears to be distributed within this temporal
range in the music of widely divergent cultures (Figure 32.41).
4.2.2. Grouping by Temporal Proximity. The division
of a sequence of elements into subsequences is readily achieved by increasing
the size of temporal gap following the last element of each subsequence. Such
grouping has a number of consequences. Povel and Okkerman (1981) presented
subjects with sequences of pure tones of identical frequency, amplitude, and
duration but separated by two alternating time intervals. Such sequences were
perceived as repeating groups of two tones each. When the alternating intervals
differed by roughly 5-10%, the first tone of each group was heard as accented. When
this difference was increased, the accent was heard instead as on the second
tone of each group, and the latter accent appeared stronger. Under such
conditions, when subjects were asked to adjust the amplitude of the first tone
so that the two tones in the group sounded equally loud, they increased this
amplitude by roughly 4 dB. This effect was, however, produced only when the
within-group interval was no longer than roughly 250 msec.
Grouping by temporal proximity has been shown to exert a strong influence on the
perception of pitch patterns. Handel (1973) investigated the identification of
repeating auditory patterns that consisted of dichotomous elements differing
in pitch.
Figure 32.41.

Relative frequencies of occurrence of tempos in the songs of
two cultures that diverge extremely in their average tempos. Note that
the shapes of the two distributions are remarkably similar, and that the total
range covered is not much larger than the range over which spontaneous rhythmic
groupings are formed. Data from Kolinski (1959). (From
D. Deutsch, The psychology of music, in E. C. Carterette & M. P. Friedman
(Eds.), Handbook of perception (Vol. 10), Academic Press, Inc., 1978. Reprinted with permission.)
Compatible segmentation (e.g., an eight-element pattern
segmented into groups of two) produced excellent performance, whereas
incompatible segmentation (e.g., an eight-element pattern segmented into groups
of three) produced poor performance (see also Handel & Yoder, 1975). Dowling
(1973c) presented five-tone sequences that were separated by pauses, followed
by a single five-tone sequence for recognition. Performance was
superior when the sequence to be recognized had been presented in a single
temporal segment than when it had not. Further, D. Deutsch (1980b) investigated
recall of hierarchically structured tonal sequences. These were segmented by
pauses, and it was found that, when the pauses were in accordance with tonal
structure, performance levels were high. However, when the pauses conflicted
with tonal structure, performance levels dropped considerably. Subjects were
therefore shown to be grouping the sequences on the basis of temporal proximity
rather than tonal structure when the two were placed in conflict. Such results
parallel those obtained by others on recall of strings of verbal materials. When
such strings are temporally segmented, recall tends to be in accordance with
their temporal grouping, and this effect can be so strong as to mask grouping on
the basis of meaning (Bower & Winzenz, 1969; McLean
& Gregg, 1967; Mueller & Schumann, 1894).
When elements are grouped by temporal proximity, ease of processing may
differ depending on the location of elements within the group. Divenyi and Hirsh
(1978) presented subjects with rapid sequential patterns consisting of three
tones. The tones within each pattern could occur in any of
six permutations, and subjects were asked to identify on each trial which
permutation had been presented. These three-tone patterns were embedded as
subsequences in longer sequences consisting of seven or eight tones. It was found
that identification performance was enhanced when the background and test
patterns were in different frequency ranges. In addition, performance levels varied
considerably depending on where in the sequence the test pattern was placed. Highest
performance levels occurred when the test pattern was at the end of the
sequence. Performance levels were relatively high when the test pattern was at
the beginning of the sequence; however, they were close to chance when the test
pattern occurred in the middle of the sequence. Thus both frequency separation
and temporal separation were found to reduce interference from the background
tones. This may be related to an early finding that a single tone embedded in a
sequence was particularly salient when it was the highest or the lowest in
frequency, or when it was in the first or last temporal position (Ortmann,
1926).
4.2.3. Grouping by Accent. A second way in which a sequence of elements may be subdivided is by the imposition of accents. An element is perceived as accented when it is marked for attention in some fashion. For example, it might differ from other elements in loudness, in pitch, or in timbre. In general, accented elements combine with adjacent elements to form groupings, and they also combine with each other to form groupings at higher structural levels (see Section 4.2.6).
4.2.4. Grouping by Other Principles. As described
in detail in Section 1.3, the division of sequences into subsequences has also
been demonstrated along several other lines. For example, there is a strong tendency to group together
sequentially presented elements that are proximal in pitch (Bregman, 1978; Bregman
& Campbell, 1971; Dowling, 1973a; Van Noorden, 1975). Further, when
adjacent elements in a sequence combine to form unidirectional pitch patterns,
they are likely to be perceived as a group. This follows the principle of good
continuation (Bregman & Dannenbring, 1973; Divenyi
& Hirsh, 1974; Nickerson & Freeman, 1974; Van Noorden, 1975; R. M.
Warren & Byrnes, 1975). Elements are also perceptually grouped by
similarity of sound quality (R. M. Warren, 1974a; R. M. Warren, Obusek, Farmer,
& R. P. Warren, 1969) and by amplitude
(Dowling, 1973a; Van Noorden, 1975). Repetition of a subsequence
within a sequence induces the listener to group the elements of the subsequence
together. This is true even if the repetition is at an abstract level; for
example, if a sequence of pitches is repeated in transposed form (D. Deutsch
& Feroe, 1981; Simon & Sumner, 1968).
4.2.5. The Run Principle and the Gap Principle. Garner and his associates have examined perceptual organization of temporal patterns using the following paradigm. A basic pattern consisting of dichotomous elements (such as a high tone and a low tone) was repeated continuously without pause. This basic pattern thus gave rise to as many specific patterns with different starting points as there were events within the pattern. Thus, for example, the pattern XXXOXOXO could alternatively be described as XXOXOXOX, or as XOXOXOXX, and so on (Garner, 1974). The issue of interest was which of these specific patterns the listener would tend to perceive.
Royer and Garner (1966) employed patterns consisting of eight events and
found that the number of specific patterns perceived differed from one basic
pattern to another, and further
that the difficulty in pattern perception increased with the
number of perceived alternatives. Another finding was that patterns
beginning or ending with the longest run were always preferred. Later, Royer
and Garner (1970) provided further clarification of these organizing
principles, using patterns of nine events. When
these patterns were described in terms of run lengths (e.g., the pattern 2115
was either XXOXOOOOO or OOXOXXXXX) it was found that the
most preferred organizations were those that provided the best balance, with
runs at both ends of the pattern (such as 2115 or 4113). A further organizing
principle was temporal progression of run length (such as 5211 or 1134). Furthermore,
when a specific pattern was preferred, its temporal reversal was also preferred
(e.g., 5211 and 1125). This was true both when the specific patterns and their
reversals came from the same basic pattern and when they came from different
basic patterns.
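The relation between a basic pattern, its specific patterns, and their run-length descriptions can be illustrated with a short sketch. The function names are hypothetical, and the letters X and O simply stand for the two dichotomous elements; nothing here models which organization a listener actually prefers.

```python
from itertools import groupby


def specific_patterns(basic):
    """All distinct starting points (rotations) of a cyclically repeated pattern."""
    return sorted({basic[i:] + basic[:i] for i in range(len(basic))})


def run_lengths(pattern):
    """Describe a pattern by the lengths of its successive runs."""
    return "".join(str(len(list(group))) for _, group in groupby(pattern))


basic = "XXOXOOOOO"  # a nine-event pattern with run-length description 2115
for pattern in specific_patterns(basic):
    print(pattern, run_lengths(pattern))
```

Among the nine specific patterns listed for this basic pattern are XXOXOOOOO (2115) and OOOOOXXOX (5211), two of the run-length descriptions mentioned in the text.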
To analyze these principles further, Preusser, Garner, and Gottwald (1970)
constructed patterns of events of a single type interleaved with gaps. A
two-element pattern could thus be described as a composite of two such
one-element patterns. For example, the pattern XXXOOXOOO can be
described as the composite of XXX--X--- and ---OO-OOO (a dash marking a gap). Two principles
were found to operate for such patterns: Preferred organizations either began
with the longest run or ended with the longest gap. Further, when the run and
the gap principles were placed in conflict, the gap principle dominated in
determining which pattern was perceived. It was further noted that, with
two-element patterns, subjects exhibited strong preferences for one element to
serve as "figure" and the other element as "ground"; thus
preferences for two-element patterns could be interpreted in terms of their
associated one-element patterns (Garner, 1974).
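The decomposition of a two-element pattern into its associated one-element patterns is mechanical, as the following sketch shows. The dash used for a gap and the function name are notational assumptions made for this illustration.

```python
def one_element_patterns(pattern, gap="-"):
    """Split a pattern into one pattern per element type, with gaps elsewhere."""
    elements = sorted(set(pattern))
    return {
        element: "".join(c if c == element else gap for c in pattern)
        for element in elements
    }


parts = one_element_patterns("XXXOOXOOO")
print(parts["X"])  # XXX--X---
print(parts["O"])  # ---OO-OOO
```

Viewed this way, the run principle and the gap principle can each be evaluated on whichever one-element pattern the listener treats as "figure."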
An effect of presentation rate was also noted in this series of experiments
(Garner & Gottwald, 1968). At slow rates, patterns starting at
nonpreferred points were considerably more difficult to process than patterns
starting at preferred points. This difference disappeared, however, at high
presentation rates (Figure 32.42). The difficulty in processing patterns at
slow rates was hypothesized to be due to an interference effect produced by
verbal encoding of the patterns.
Figure 32.42.

Use of the preferred description of two-element
temporal patterns, as a function of presentation rate. Patterns
started at the beginning of either the preferred description or a nonpreferred
description. For patterns starting at the preferred point, use of the preferred
description was highest at the slowest presentation rate and gradually declined
as the presentation rate increased. For patterns starting at the nonpreferred
point, use of the preferred description was lowest at the slowest presentation
rate and rose as the presentation rate increased. (From W. R.
Garner, The processing of information and structure, Lawrence Erlbaum
Associates, 1974. Reprinted with permission.)
4.2.6. Rhythmic Hierarchies. When a listener
spontaneously groups a sequence of regularly recurring events into subsequences,
such organization may occur simultaneously at more than one structural level
(Vos, 1973; Woodrow, 1951). For example, the listener may perceive groups of
four elements with the major accent on the first and a minor accent on the
third. Such spontaneous organization indicates that the system underlying
perception of rhythm is hierarchical in nature.
An experiment by Perkins (1974) provides further evidence for this view. Subjects
were required to estimate the number of taps occurring in sequences in which
the first of every four taps was stressed, and the first of every 16 taps was
doubly stressed. Errors differed from correct responses more often by multiples
of four and 16 than by adjacent numbers. Perkins concluded that the subjects
were structuring the sequences hierarchically with reference to the imposed
accents. A related study was performed by Sturges and Martin (1974). Here, subjects
were presented with continuously repeating sequences of 14 or 16 dichotomous
elements. These were patterns of seven or eight events that either repeated
exactly or were altered slightly on repetition. The subjects were asked to
recognize the sequences that contained exact repetitions. Patterns that exhibited
a simple hierarchical structure were better recognized than those that did not.
Further, eight-event patterns that were hierarchically structured were better
recognized than seven-event patterns, even though the former patterns contained
more events.
Other evidence derives from experiments in which subjects were required to generate
temporally patterned sequences. Performance levels on such tasks vary
substantially depending on the type of pattern to be produced. Isochronous
sequences are generally produced at a very high degree of accuracy (N. R.
Bartlett & S. C. Bartlett, 1959; Michon, 1967; M. Treisman, 1963; Wagner,
1971; Wing & Kristofferson, 1973a, 1973b). However, irregular sequences are
typically produced only poorly, with subjects tending to generate interresponse
intervals that are either approximately identical or stand roughly in a ratio
of 2:1 or 3:1 (Fraisse, 1982; Montpellier, 1935). This again is evidence for
hierarchical structuring of temporal relationships. Related evidence comes from
Sternberg, Knoll, and Zukofsky (1982). When highly trained musicians were asked
to subdivide a time interval repeatedly so as to produce a given fraction, high
performance accuracy was achieved for a division of one half, with poor
performance accuracy for other divisions, such as one-eighth, one-seventh, and
one-sixth. This is evidence for the subdivision of time intervals into simple
fractions involving small integers.
Further supporting evidence for hierarchical structuring of temporal patterns comes
from Summers (1975). Subjects were presented with
continuously cycling sequences of nine lights, and they learned to respond to
these by pressing nine keys. Later, they learned to produce this pattern with
one of two temporal structures. For one group of subjects, this structure consisted
of the repetitive presentation of two short intervals followed by a long
interval. For another group of subjects, this consisted instead of two long
intervals followed by a short interval. Later still, the subjects were told to
respond as rapidly as possible. Under these conditions, the
"short-short-long group" maintained the acquired temporal pattern in
their responses. However, for the "long-long-short group" the
original temporal pattern gradually disappeared. Since the short-short-long
pattern had a simple hierarchical description but the long-long-short pattern had
not, this result may be explained by assuming that the subjects were invoking
hierarchical structure in performing the task.
Figure 32.43.

Distortion of duration ratios as found in reproduction of simple temporal
patterns. Duration ratios of the stimuli are indicated by arrows pointing to
the abscissa displaying the ratio continuum. The distortion, averaged over
subjects, is indicated by the endpoints of the horizontal arrows. (From D. J.
Povel, Internal
representation of simple temporal patterns, Journal of Experimental
Psychology: Human Perception and Performance, 7. Copyright 1981 by
American Psychological Association. Reprinted with
permission.)
Povel (1981) has studied the reproduction of
repeating temporal patterns consisting of two or more intervals whose durations
stood in various ratios to each other. He obtained substantial differences in
reproduction accuracy depending on these ratios. When simple patterns were
presented, those with durations standing in a ratio of 2:1 were accurately
reproduced; those with durations standing in other ratios were reproduced less
well. Systematic deviations in performance were found, which tended toward a
ratio of 2:1 (Figure 32.43). When more complex patterns were presented, those
that could be simply described in terms of a hierarchical model (to be
described later this section) were well reproduced, and those that could not be
so described were poorly reproduced. Povel concluded that the accurate
production of temporal patterns requires that these be internally represented
in accordance with his model.
On Povel's model, the first step in processing a
temporal sequence consists of segmenting it into equal intervals (which he
termed beat intervals) bordered by events. As a second step,
the beat
intervals may themselves be divided into equal intervals of two, three, or more
units. These smaller intervals may in turn be subdivided into equal intervals,
and so on. An illustration of this model is shown in Figure 32.44.
Povel's model is based on the formal description of rhythm in tonal music. As stated by
Westergaard (1975), time in tonal music is best conceived of in terms of a set
of equally spaced reference points, or beats. The time span between two primary
beats is called a measure. The measure is itself divided into two or more equal
time spans, bordered by secondary beats. The number of secondary beats dividing
the measure defines the meter. For example, one secondary beat dividing the
measure into equal parts produces duple meter, and two secondary beats produce
triple meter. These smaller time spans may themselves be divided into equal
parts; and so on. The symbols for tones and rests in tonal music were developed
to reflect this hierarchical scheme. Thus 𝅝 denotes a whole note, 𝅗𝅥 a half note,
♩ a quarter note, ♪ an eighth note, 𝅘𝅥𝅯 a sixteenth note, and so on. It appears
reasonable to assume that the hierarchical structure of rhythm in tonal music
has evolved to capitalize on the characteristics of our processing mechanisms
(see also Cooper & L. B. Meyer, 1960; Lerdahl & Jackendoff, 1983a;
Yeston, 1976).
An important aspect of the perception of rhythmic
patterns is generalization across tempo. Just as the equivalence of a
transposed melody will be recognized provided that the frequency ratios formed
by successive tones are preserved, so will the equivalence of a rhythm be
recognized in the face of altered presentation
rates provided that the ratios between successive temporal elements are
preserved. Thus for example the pattern (in msec) 400-300-100-800 will
be perceived as equivalent to the
pattern 300-225-75-600. (Such generalization only holds true within a range of
presentation rates; at very fast rates the elements fuse to produce
timbral patterns; and at very slow rates
temporal relationships are not securely perceived, as described in Section
4.2.1.) This invariance may be readily explained on a model of rhythmic
processing that assumes that we encode relationships between temporal values
across different hierarchical levels (Note 19).
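The equivalence of the two example patterns can be checked directly: only the ratios between successive durations need to match. The sketch below uses exact fractions; the function name is an illustrative assumption, and the check covers only the ratio criterion, not the limits on presentation rate noted above.

```python
from fractions import Fraction


def interval_ratios(durations_msec):
    """Exact ratios between each duration and the one preceding it."""
    return [Fraction(b, a) for a, b in zip(durations_msec, durations_msec[1:])]


original = [400, 300, 100, 800]
faster = [300, 225, 75, 600]

print(interval_ratios(original))                             # ratios 3/4, 1/3, 8
print(interval_ratios(original) == interval_ratios(faster))  # True
```

Both patterns yield the ratios 3/4, 1/3, and 8, so the second is the first presented at three-quarters of its durations.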
Figure 32.44.

Proposed encoding of a temporal sequence. The sequence is segmented into equal intervals (called beat intervals) bordered by events. These intervals may themselves be divided into equal intervals, and so on. The model is based on the formal description of intervals in tonal music. (From D. J. Povel, Internal representation of simple temporal patterns, Journal of Experimental Psychology: Human Perception and Performance, 7. Copyright 1981 by American Psychological Association. Reprinted with permission.)
Given that hierarchical structures are invoked in processing single rhythmic
patterns, we may ask what happens when two patterns are processed
simultaneously. To take the simplest case, we may ask whether two parallel
isochronous sequences may be processed independently, or whether the observer
synthesizes such sequences into a single hierarchical configuration. From one
standpoint, since isochronous sequences are descriptively very simple, we might
expect that the observer should find no difficulty in processing two in
parallel. On the other hand, the experimental results showing the importance of
hierarchical relationships in the processing of single sequences indicate that
ease of processing of parallel sequences might also depend on the relationships
between the temporal elements involved.
Let us take the simplest case, that of a 3-against-2 polyrhythm. If this pattern is
repeated every 1200 msec, it may be described as consisting of a sequence of
three 400-msec intervals together with a sequence of two 600-msec intervals. An
alternative description may be advanced in terms of a hierarchy such as shown
in Figure 32.45(a). Here the 1200-msec time span is first divided into three
400-msec segments, and these are each divided into two 200-msec segments. When
the pattern (R/L, -) (R, L) (R, -) is associated with the lowest-level
structure, a 3-against-2 polyrhythm is produced.
If this hierarchical model is correct, then ease of production of polyrhythms
should depend on the complexities of their
hierarchical representations. A 5-against-4 polyrhythm would thus be represented
as in Figure 32.45(b). This representation is considerably more complex than
the representation for the 3-against-2 polyrhythm, both in terms of the numbers
of structures involved and in terms of the sizes of these structures.
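One way to see why the 5-against-4 representation is larger is to lay each polyrhythm out on the finest grid that both pulse trains share. The sketch below is an illustration under that assumption only: it uses the least common multiple of the two tap counts as the number of lowest-level slots, and it does not reproduce the tree structures of Figure 32.45. The function name and the labels are hypothetical.

```python
from math import lcm  # available in Python 3.9 and later


def polyrhythm_grid(right_taps, left_taps):
    """Label each slot of the finest common grid as 'R/L', 'R', 'L', or '-'."""
    slots = lcm(right_taps, left_taps)
    right_step = slots // right_taps  # slots between successive right-hand taps
    left_step = slots // left_taps    # slots between successive left-hand taps
    grid = []
    for slot in range(slots):
        right = slot % right_step == 0
        left = slot % left_step == 0
        if right and left:
            grid.append("R/L")
        elif right:
            grid.append("R")
        elif left:
            grid.append("L")
        else:
            grid.append("-")
    return grid


print(polyrhythm_grid(3, 2))       # ['R/L', '-', 'R', 'L', 'R', '-']
print(len(polyrhythm_grid(5, 4)))  # 20
```

With a 1200-msec cycle, the 3-against-2 case needs six 200-msec slots, whereas the 5-against-4 case needs twenty 60-msec slots.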
An experiment was performed to test this hypothesis (D. Deutsch, 1983a). Two pulse
trains were presented in parallel through earphones, one to each ear. The
subjects were asked to tap with the right forefinger in synchrony with the
pulse train delivered to the right ear, and with the
left forefinger in synchrony with the pulse train delivered to the left ear. All
pulse trains were isochronous, and the relative durations of the intervals
associated with the right and left pulse trains were systematically varied.
Figure 32.45.

Representation of polyrhythms in terms of hierarchies. (a) A 3-against-2 polyrhythm; (b) a 5-against-4 polyrhythm.
The representation in (b) is considerably more complex than that in (a). This accords with the finding that accuracy in generating
polyrhythms declines with an increase in the complexity of their
representation. (From D. Deutsch, The generation of two
isochronous sequences in parallel, Perception and Psychophysics, 1983,
34. Reprinted with permission.)
A 1200-msec interval between pulse onsets served as the base interval. This interval
was divided by 1 (resulting in a 1200-msec onset-to-onset interval), by 2
(resulting in a 600-msec onset-to-onset interval), by 3 (resulting in a 400-msec
onset-to-onset interval), by 4 (resulting in a 300-msec onset-to-onset
interval), and by 5 (resulting in a 240-msec onset-to-onset interval). Thus
when these pulse trains were simultaneously presented, both simple rhythms (1
against 2, 1 against 3, and so on) and polyrhythms (2
against 3, 2 against 5, 3 against 4, 3 against 5, and 4 against 5) were
produced.
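The stimulus set can be enumerated in a few lines. The classification rule used here (a pairing is "simple" when one rate is an integer multiple of the other) is an assumption adopted for illustration; it reproduces the five polyrhythms listed in the text.

```python
from itertools import combinations

BASE_MSEC = 1200
divisors = [1, 2, 3, 4, 5]

# Onset-to-onset interval for each division of the base interval.
intervals = {d: BASE_MSEC // d for d in divisors}


def classify(a, b):
    """'simple' if one rate is an integer multiple of the other, else 'polyrhythm'."""
    return "simple" if max(a, b) % min(a, b) == 0 else "polyrhythm"


pairs = {(a, b): classify(a, b) for a, b in combinations(divisors, 2)}

print(intervals)
print(sorted(pair for pair, kind in pairs.items() if kind == "polyrhythm"))
```

Under this rule, 2 against 4 counts as a simple rhythm alongside the pairings that involve the undivided base interval.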
It was found that, for each rate of tapping with one hand, accuracy was substantially
dependent on the rate at which the other hand was tapping. When simple rhythms
were produced, standard deviations were very low. They were significantly
higher for the 3-against-2 polyrhythm, higher still for the 5-against-2
polyrhythm, and yet still higher for the more complex polyrhythms. It was
concluded that, in processing two isochronous sequences in parallel, subjects
combined these sequences to produce a single hierarchical configuration, and
that accuracy in performance was a function of the complexity of this
hierarchy.
5. SUMMARY
In this chapter, auditory pattern recognition has been examined from several
points of view. First explored were the ways in which listeners group together
components of a complex sound spectrum so that they perceive either a single
sound or multiple simultaneous sounds. Next explored were the principles by
which listeners form groupings from a series of single sounds that occur in
rapid succession. The role of selective attention in the formation of auditory
groupings was discussed in this context.
The second part of the chapter was concerned with the principles whereby auditory
shapes are analyzed by listeners so as to give rise to perceptual equivalences
and similarities. The evidence for low-level perceptual features based on
octave equivalence and interval equivalence was discussed, as was the evidence
for contour as a perceptual feature. The ways in which such low-level features
are combined at higher levels were then explored
in detail. The encoding of pitch patterns in the form of hierarchies was
examined, as was the involvement of short-term and long-term memory in the
recognition of pitch patterns.
The third part of the chapter concerned the
principles underlying recognition of timbre or sound quality. Emphasis was placed on
the time-variant properties of the signal, in contrast with earlier approaches
that attempted to define timbre in terms of the sound spectrum in steady state.
Multidimensional models of timbre perception that emphasize time-variant
characteristics were described.
The fourth part of the chapter examined the perception of temporal relationships in
patterns of sound. Evidence concerning perception of the order of two or more
sound events was reviewed, as were theories of how such order perception might
be accomplished. Next explored were the ways in which listeners
group patterns of sound in time. Finally, the evidence for hierarchical
encoding of rhythmic patterns was examined.
It is clear from the numerous examples presented in this chapter that the auditory
system is capable of a very high level of information abstraction, provided
that it is given appropriate input. Knowledge of the properties of this system
should therefore prove of considerable value to those wishing to exploit its capabilities
in future technological developments.
1. The view taken here is that the concept of "unconscious inference" serves
as a useful heuristic to guide the experimenter in the study of grouping
mechanisms, and that this is also true of principles of figural goodness. However,
it is not assumed that these concepts serve
as "explanations" of grouping phenomena in any fundamental sense.
Rather, it is expected that the behavioral phenomena described here will
ultimately be explained in terms of underlying physiological mechanisms. Where
possible, relevant neurophysiological findings are presented here; however,
the neurophysiological bases of most of the phenomena described in this chapter
are presently unknown.
2. A periodic sound can be described in terms of its sinusoidal frequency components, or partials. When the
frequencies of the upper partials are integral multiples of the
frequency of the lowest partial (or fundamental), the sound is termed harmonic.
When this is not the case, the sound is termed nonharmonic. Sounds
are termed fused when they appear as single sounds and are termed unfused
when they appear as several simultaneous sounds.
3. Kubovy, M. The ear is not phase deaf. Paper presented at the
nineteenth annual meeting of the Psychonomic Society, San Antonio,
November 9-11, 1978.
4. Kubovy, M. The sound of silence: A new pitch-segregation phenomenon.
Paper presented at the seventeenth annual meeting of the Psychonomic Society,
5. A major scale consists of seven tones per octave. When a scale is presented in
ascending order, the intervals formed by successive tones are two major
seconds, a minor second, three major seconds, and a minor second. (See Figure 32.24.) When the scale
begins on the note C it is called the C-major scale. When it begins on the note
D it is called the D-major scale, and so on. Any scale consisting of tones
related by this pattern of intervals is called a diatonic scale.
6. A contrapuntal pattern results when two or more melodic lines are presented in
parallel.
7. First-order localization cues are provided by differences in amplitude and in
time of arrival of the sounds at the two ears.
8. Following the nondominant ear required a larger amplitude difference
between the ears in the second experiment than in the first. This simply reflects
the fact that different subjects were employed in the two experiments, as
large differences exist between subjects on this measure.
9. Reverberation times in enclosed spaces are of the
order of seconds, which is consistent with the time courses observed here.
10. Erickson, R. LOOPS, an informal timbre
experiment. Unpublished manuscript, Center for Music
Experiment,
11. Neurones in the auditory system have been found that exhibit peaks of
sensitivity at octave multiples (Evans, 1974; Suga & Jen, 1976). Such
neurones would mediate the hypothesized convergence of input based on the octave relation.
12. Neurones in the auditory system have been found
whose firing is facilitated when certain harmonically related tones are presented
together (Suga, O'Neill, & Manabe, 1979). The characteristics of these
units are those hypothesized to exist at one stage along the channel mediating transposition.
13. The chromatic scale consists of tones that are spaced in semitone increments. There
are 12 such tones within the octave.
14. For a description of lateral inhibitory networks see Ratliff (1965).
15. A harmonic progression may be broadly defined as a series of chords that
bear certain relationships to each other. For a detailed description see Forte
(1974).
16. A formant is a fixed frequency region in which the partials of a
tone are prominent, regardless of the frequency of the fundamental.
17. Plomp, R., & Steeneken, H. J. M. Pitch versus timbre. Paper presented at
the Seventh International Congress on Acoustics, Budapest, 1971.
18. The critical band is that band of frequencies within which the loudness of a
band of noise of constant sound pressure is independent of bandwidth.
19. Deutsch, D., & Feroe, J. The internal representation
of rhythmic patterns. In preparation.
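The definitions in Notes 2, 5, and 13 can be made concrete with a short sketch. The Python below is an illustration only and is not part of the original chapter. The function names, the tolerance value, and the sharps-only note spelling are assumptions introduced here, as is the reading of a major second as two semitones and a minor second as one.

```python
# Illustrative sketch only; names and the tolerance are assumptions.

# The 12 tones of the chromatic scale within one octave (Note 13),
# spelled with sharps only for simplicity.
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Interval pattern of the ascending major scale (Note 5), in semitones:
# two major seconds, a minor second, three major seconds, a minor second.
MAJOR_SCALE_STEPS = [2, 2, 1, 2, 2, 2, 1]


def is_harmonic(partials_hz, tolerance=0.01):
    """Return True if every partial is close to an integral multiple of the
    lowest partial (the fundamental), as in the definition in Note 2."""
    if not partials_hz:
        raise ValueError("at least one partial is required")
    fundamental = min(partials_hz)
    if fundamental <= 0:
        raise ValueError("frequencies must be positive")
    for frequency in partials_hz:
        ratio = frequency / fundamental
        # Distance of the ratio from the nearest whole number.
        if abs(ratio - round(ratio)) > tolerance:
            return False
    return True


def major_scale(tonic):
    """Return the seven tones of the major scale that begins on `tonic`."""
    index = CHROMATIC.index(tonic)
    scale = [tonic]
    # The final step only returns to the tonic an octave higher, so skip it.
    for step in MAJOR_SCALE_STEPS[:-1]:
        index = (index + step) % len(CHROMATIC)
        scale.append(CHROMATIC[index])
    return scale
```

Under these assumptions, `major_scale("C")` yields the C-major scale and `major_scale("D")` the D-major scale, while `is_harmonic` separates a tone with partials at 200, 400, and 600 Hz from one whose second partial is mistuned to 410 Hz. Because only sharps are used, scales conventionally written with flats are spelled enharmonically.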
REFERENCES
Allen, D. Octave discriminability of musical and non-musical subjects. Psychonomic
Science, 1967, 7, 421-422.
Attneave, F., & Olson, R. K. Pitch as a medium: A new approach to psychophysical
scaling. Journal of Psychology, 1971, 84, 147-165.
Babbitt, M. Twelve-tone invariants as compositional determinants. The Musical
Quarterly, 1960, 46, 246-259.
Babbitt, M. The structure and function of musical theory. College
Music Symposium, 1965, 5, 10-21.
Babkoff, H. Diotic temporal interactions: Fusion and temporal order. Perception and
Psychophysics, 1975, 18, 267-272.
Bachem, A. Note on Neu's review of the literature on absolute pitch. Psychological
Bulletin, 1948, 45, 161-162.
Bachem, A. Time factors in relative and absolute pitch; Studies in
psychology. Journal of the Acoustical Society of
Baird, J. W. Memory for absolute pitch: Studies in psychology. In Titchener
commemorative volume. Worcester, 1917.
Bartlett, J. C., & Dowling, W. J. Recognition of transposed melodies: A key-distance
effect in developmental perspective. Journal of Experimental Psychology:
Human Perception and Performance, 1980, 6, 501-515.
Bartlett, N. R., & Bartlett, S. C. Synchronization of a motor
response with an anticipated sensory event. Psychological Review, 1959, 66, 203-218.
Berger, K. W. Some factors in the recognition of timbre. Journal of the Acoustical
Society of
Bharucha, J., & Krumhansl, C. L. The representation of
harmonic structure in music: Hierarchies of stability as a function of context.
Cognition, 1983, 13, 63-102.
Bjork, R. A. All-or-none subprocesses in the learning of complex
sequences. Journal of Mathematical Psychology, 1968, 5, 182-195.
Blackwell, H. R., & Schlosberg, H. Octave generalization, pitch discrimination, and
loudness thresholds in the white rat. Journal of Experimental Psychology, 1942,
33, 407-419.
Bower, G. H. A selective review of organizational factors in memory.
In E. Tulving & W. Donaldson (Eds.), Organization of
memory.
Bower, G. H., & Winzenz, D. Group structure, coding and memory for
digit series. Journal of Experimental Psychology Monographs, 1969,
80 (2, Pt. 2), 1-17.
Bregman, A. S. The formation of auditory streams.
In J. Requin (Ed.), Attention and performance (Vol.
7). Hillsdale, N.J.: Erlbaum, 1978.
Bregman, A. S. Asking the "what for" question in auditory perception.
In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization.
Bregman, A. S., &
Bregman, A. S., & Dannenbring, G. L. The
effect of continuity on auditory stream segregation. Perception
and Psychophysics, 1973, 13, 308-312.
Bregman, A. S., & Pinker, S. Auditory streaming
and the building of timbre. Canadian Journal of Psychology, 1978,
32, 20-31.
Bregman, A. S., & Rudnicky, A. I. Auditory segregation: Stream
or streams? Journal of Experimental Psychology: Human Perception and
Performance, 1975, 104, 263-267.
Broadbent, D. E. The role of auditory localization in attention and
memory span. Journal of the Acoustical Society of
Broadbent, D. E. Perception and communication.
Broadbent, D. E., & Ladefoged, P. Auditory perception of temporal order. Journal of the Acoustical Society of
Browne, R. Review of The structure of atonal music by A. Forte. Journal of
Music Theory, 1974, 18, 390-415.
Burns, E. M. In search of the shruti. Journal
of the Acoustical Society of
Burns, E. M. Octave adjustment by non-western
musicians. Journal of the Acoustical Society
of
Unpublished doctoral
thesis,
Burns, E. M., & Ward, W. D. Categorical perception-Phenomenon or epiphenomenon:
Evidence from experiments on the perception of melodic musical intervals. Journal
of the Acoustical Society of
Burns, E. M., & Ward, W. D. Intervals, scales, and
tuning. In D. Deutsch (Ed.), The psychology
of music.
Cabot, R. C.,
Cherry, E. C. Some experiments on the recognition of speech, with one
and two ears. Journal of the Acoustical Society of
Cherry, E. C., & Taylor, W. K. Some further experiments
upon the recognition of speech, with one and with two ears. Journal of the
Acoustical Society of
Chrisman, R. Identification and correlation of pitch-sets. Journal of Music
Theory, 1971, 15, 58-83.
Cooper, G. W., & Meyer, L. B. The rhythmic structure of music.
words in an unattended channel. Journal
of Experimental Psychology, 1972, 94, 308-313.
Craig, J. D. The effect of musical training and
cerebral asymmetries in perception of an auditory illusion. Cortex, 1979,
15, 671-677.
Dannenbring, G. L. Perceived auditory
continuity with alternately rising and falling frequency transitions. Canadian
Journal of Psychology, 1976, 30, 99-114.
Dannenbring, G. L., & Bregman, A. S. Stream
segregation and the illusion of overlap. Journal of
Experimental Psychology: Human Perception and Performance, 1976, 2,
544-555.
Dannenbring, G. L., & Bregman, A. S. Streaming versus fusion of sinusoidal components of
complex tones. Perception and Psychophysics, 1978, 24, 369-376.
de Boer, E. On the "residue" and auditory
pitch perception. In W. D. Keidel & W. D.
Neff (Eds.), Handbook of sensory physiology (Vol. V/3).
Deutsch, D. Music recognition. Psychological
Review, 1969, 76, 300-307.
Deutsch, D. Tones and numbers: Specificity of interference in short term
memory. Science, 1970, 68, 1604-1605.
Deutsch, D. Effect of repetition of standard and comparison tones on
recognition memory for pitch. Journal of Experimental
Psychology, 1972a, 93, 156-162.
Deutsch, D. Mapping of interactions in the pitch memory store. Science, 1972b, 175, 1020-1022.
Deutsch, D. Octave generalization and tune recognition. Perception
and Psychophysics, 1972c, 11, 411-412.
Deutsch, D. Interference in memory between tones adjacent in the musical
scale. Journal of Experimental Psychology, 1973a, 100, 228-231.
Deutsch, D. Octave generalization of specific interference effects in memory for tonal
pitch. Perception and Psychophysics, 1973b, 13, 271-275.
Deutsch, D. An auditory illusion. Nature (
Deutsch, D. Auditory memory. Canadian Journal of
Psychology, 1975a, 29, 87-105.
Deutsch, D. Facilitation by repetition in recognition memory for tonal
pitch. Memory and Cognition, 1975b, 3, 263-266.
Deutsch, D. Musical illusions. Scientific
American, 1975c, 233, 92-104.
Deutsch, D. The organization of short-term memory for a single
acoustic attribute. In D. Deutsch & J. A. Deutsch (Eds.), Short term memory.
Deutsch, D. Two-channel listening to musical scales. Journal
of the Acoustical Society of
Deutsch, D. Lateralization by frequency in dichotic tonal sequences as a
function of interaural amplitude and time differences. Journal of the Acoustical Society of
Deutsch, D. Memory and attention in music. In M. Critchley & R. A. Henson (Eds.), Music and the brain.
Deutsch, D. Interactive effects in memory for harmonic intervals. Perception and Psychophysics, 1978b, 24, 7-10.
Deutsch, D. Lateralization by frequency for repeating sequences of
dichotic 400-Hz and 800-Hz tones. Journal
of the Acoustical Society of
Deutsch, D. Octave generalization and melody identification. Perception
and Psychophysics, 1978e, 23, 91-92.
Deutsch, D. The psychology of music. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of
perception (Vol. 10).
Deutsch, D. Binaural integration of melodic patterns. Perception
and Psychophysics, 1979a, 25, 399-405.
Deutsch, D. Octave generalization and the consolidation of melodic
information. Canadian Journal of Psychology, 1979b,
33, 201-204.
Deutsch, D. Ear dominance and sequential interactions. Journal of
the Acoustical Society of
Deutsch, D. The processing of structured and
unstructured tonal sequences. Perception and Psychophysics, 1980b,
28, 381-389.
Deutsch, D. The octave illusion and auditory perceptual integration.
In J. V. Tobias & E. D. Schubert (Eds.), Hearing research and
theory (Vol. 1).
Deutsch, D. The
influence of melodic context on pitch recognition judgment. Perception
and Psychophysics, 1982a, 31, 407-410.
Deutsch, D. The processing of pitch combinations. In D. Deutsch
(Ed.), The psychology of music.
Deutsch, D. The generation of two isochronous
sequences in parallel. Perception and Psychophysics, 1983a, 34,
331-337.
Deutsch, D. The
octave illusion in relation to handedness and familial handedness background.
Neuropsychologia, 1983b, 21, 289-293.
Deutsch, D., & Boulanger, R. C. Octave equivalence and the immediate recall
of pitch sequences. Music Perception, 1984, 2, 40-51.
Deutsch, D., & Feroe, J. Disinhibition in pitch memory. Perception and
Psychophysics, 1975, 17, 320-324.
Deutsch, D., & Feroe, J. The internal representation of pitch
sequences in tonal music. Psychological Review, 1981, 88, 503-522.
Deutsch, D., & Roll, P. L. Error patterns in delayed pitch comparison as a
function of relational context. Journal of Experimental
Psychology, 1974, 103, 1027-1034.
Deutsch, D., & Roll, P. L. Separate "what" and "where" decision
mechanisms in processing a dichotic tonal sequence. Journal of Experimental
Psychology: Human Perception and Performance, 1976, 2, 23-29.
Deutsch, J. A., & Deutsch, D. Attention: Some theoretical considerations. Psychological
Review, 1963, 70, 80-90.
Dewar, K. M. Context effects in recognition memory for tones. Unpublished doctoral dissertation,
Divenyi, P. L., & Hirsh, I. J. Discrimination of the silent gap in
two tone sequences of different frequencies. Journal of the
Acoustical Society of
Divenyi, P. L., & Hirsh, I. J. Identification of temporal order in
three tone sequences. Journal of the Acoustical Society of
Divenyi,
P L., & Hirsh,
Divenyi, P. L., & Hirsh,
Dowling, W. J. Recognition of melodic transformations: Inversion, retrograde, and
retrograde-inversion. Perception and Psychophysics, 1972, 12, 417-421.
Dowling, W. J. The perception of interleaved melodies. Cognitive Psychology, 1973a, 5, 322-337.
Dowling, W. J. The 1215-cent octave: Convergence of western and non-western data on
pitch scaling. Journal of the Acoustical Society of
Dowling, W. J. Rhythmic groups and subjective chunks
in memory for melodies. Perception and Psychophysics, 1973c, 4, 37-40.
Dowling, W. J. Scaling and contour: Two components of a theory of memory for
melodies. Psychological Review, 1978, 85, 342-354.
Dowling, W. J., & Bartlett, J. C. The importance of
interval information in long-term memory for melodies. Psychomusicology, 1981, 1, 30-49.
Dowling, W. J., & Fujitani, D. S. Contour, interval, and pitch recognition in memory
for melodies. Journal of the Acoustical Society of
Dowling, W. J., & Hollombe, A. W. The perception of
melodies distorted by splitting into several octaves:
Effects of increasing proximity and melodic contour. Perception
and Psychophysics, 1977, 21, 60-64.
Drobisch, M. W. Uber die mathematische bestimmung der musikalischen intervalle, 1846. (Cited by C. A. Ruckmick, A new classification of tonal qualities.
Psychological Review, 1929, 36, 172-180.)
Erickson, R. Sound structure in music.
Estes, W. K. An associative basis for coding and organization in
memory. In A. W. Melton & E. Martin (Eds.), Coding processes in human
memory. Washington, D. C.: Winston, 1972.
Evans, E. F. Neural processes for the detection of acoustic patterns and for sound
localization. In F. O. Schmitt & F. T. Worden
(Eds.), The neurosciences, third study program.
Fitzgibbon, P. J., Pollatsek, A., & Thomas, I. B. Detection of
temporal gaps within and between perceptual tonal groups. Perception
and Psychophysics, 1974, 16, 522-528.
Forte, A. The structure of atonal music.
Forte, A. Tonal harmony in concept and practice (2nd ed.).
New York: Holt, Rinehart, and Winston, 1974.
Fraisse, P. Les structures rhythmiques. Louvain: Editions Universitaires, 1956.
Fraisse, P. Rhythm and tempo. In D. Deutsch (Ed.), The
psychology of music. New York: Academic, 1982.
Fraisse, P., Pichot, P., & Clairouin, G. Les aptitudes rhythmiques. Etude comparee des
oligophrenes et des enfants normaux. Journal de Psychologie Normale et
Pathologique, 1949, 42, 309-330.
Frances, R. La perception de la musique.
Garner, W. R. The processing of information and structure.
Garner, W. R., & Gottwald, R. L. The perception and learning of
temporal patterns. Quarterly Journal of Experimental Psychology, 1968, 21, 97-109.
Geffen, G., & Reynolds, N. Pure-tone perception and ear advantages in dichotic
listening. Perception and Psychophysics, 1982, 31, 68-75.
Gengel, R. W., & Hirsh, I. J. Temporal order: The effect of single versus repeated
presentations, practice, and verbal feedback. Perception
and Psychophysics, 1970, 7, 209-211.
Goodglass, H., & Quadfasel, F. A. Language laterality in left-handed
aphasics. Brain, 1954, 77, 521-543.
Gray, J. A., & Wedderburn, A. A. I. Grouping strategies with
simultaneous stimuli. Quarterly Journal of Experimental Psychology,
1960, 12, 180-184.
Greeno, J. G., & Simon, H. A. Processes for sequence production. Psychological
Review, 1974, 81, 187-196.
Gregory, R. L. The intelligent eye.
Grey, J. M. Timbre discrimination in musical patterns. Journal of
the Acoustical Society of
Grey, J. M., & Gordon, J. W. Perceptual effects of spectral modifications in
musical timbres. Journal of the Acoustical Society of
Grey, J. M., & Moorer, J. A. Perceptual evaluation of synthesized musical
instrument tones. Journal of the Acoustical Society of
Haas, H. Uber den einfluss eines Einfachechos auf die Horsamkeit von Sprache. Acustica,
1951, 1, 49-52.
Hall, D. E. Quantitative evaluation of musical scale tunings. American
Journal of Physics, 1974, 42, 543-552.
Hall, D. E. Musical acoustics: An introduction.
Handel, S. Temporal segmentation of repeating auditory patterns. Journal of
Experimental Psychology, 1973, 101, 46-54.
Handel, S. Using polyrhythms to study rhythm. Music Perception, 1984, 1, 465-484.
Handel, S., & Oshinsky, J. S. The
meter of syncopated auditory polyrhythms. Perception and
Psychophysics, 1981, 30, 1-9.
Handel, S., & Yoder, D. The
effects of intensity and rhythm intervals on the perception of auditory and
visual temporal patterns. Quarterly Journal of Experimental
Psychology, 1975, 27, 111-122.
Hanson, A. R., & Riseman, E. M. (Eds.). Computer vision systems.
Harris, J. D. The decline of pitch discrimination with time. Journal
of Experimental Psychology, 1952, 43, 96-99.
Hecaen, H., & de Ajuriaguerra, J. Left handedness.
Hecaen, H., & Piercy, M. Paroxysmal dysphasia and the problem of
cerebral dominance. Journal of Neurology and Neurological Psychiatry,
1956, 19, 194-201.
Heise, G. A., & Miller, G. A. An
experimental study of auditory patterns. American Journal of
Psychology, 1951, 64, 68-77.
Helmholtz, V. H. On the sensations of tones as a physiological basis for the
theory of music (2nd English ed.).
Hirsh, I. J. Auditory perception of temporal order. Journal of the Acoustical Society of America, 1959, 31, 759-767.
Hirsh, I. J. Temporal order and auditory perception. In H. R. Moskowitz, B. Scharf, & J. C. Stevens (Eds.),
Sensation and measurement.
Hirsh, I. J. Order of events in three sensory modalities. In S. K.
Hirsh, D. H. Eldridge,
Hirsh, I. J., & Fraisse, P. Simultaneite et succession de stimuli heterogenes. Annee
Psychologique, 1964, 64, 1-19.
Hirsh, I. J., & Sherrick, C. E. Perceived order in
different sense modalities. Journal of Experimental Psychology, 1961,
62, 423-432.
Hochberg, J. Organization and the Gestalt
tradition. In E. C. Carterette & M. P. Friedman (Eds.),
Handbook of perception (Vol. 1).
Howe, H. S. Some combinatorial properties of pitch structures.
Journal of Music Theory, 1965, 4, 45-61.
Huggins, A. W. F. Distortion of the temporal pattern of speech: Interruption and
alternation. Journal of the Acoustical Society of America, 1964, 36, 1055-1064.
Hulse, S. H., Humpal, J., & Cynx, J. Discrimination and
generalization of rhythmic and arrhythmic sound patterns by European starlings (Sturnus
vulgaris). Music Perception, 1984, 1, 442-464.
Humphreys, L. F. Generalization as a function of method of reinforcement. Journal of Experimental Psychology, 1939, 25, 361-372.
Idson, W. L., & Massaro, D. W. A
bidimensional model of pitch in the
recognition of melodies. Perception and Psychophysics, 1978,
24, 551-565.
Judd, T. Comments on Deutsch's musical scale illusion. Perception
and Psychophysics, 1979, 26, 85-92.
Julesz, B. Foundations of cyclopean perception.
Julesz, B., & Hirsh, I. J. Visual and auditory perception-An essay
of comparison. In E. E. Davis & P. B. Denes (Eds.), Human
communication: A unified view.
Kahneman, D. Attention and effort.
Kallman, H. J. Octave equivalence as measured by similarity ratings. Perception
and Psychophysics, 1982, 32, 37-49.
Kallman, H. J., & Massaro, D. W. Tone chroma is functional in melody recognition. Perception
and Psychophysics, 1979, 26, 32-36.
Keele, S. W., & Neill, W. T. Mechanisms of attention. In E. C. Carterette
& M. P. Friedman (Eds.), Handbook of perception (Vol. 9).
Keiler, A. On some properties of Schenker's pitch derivations. Music
Perception, 1983, 1, 200-228.
Klinke, R., Boerger, G., & Gruber, J. Alteration of afferent, tone-evoked
activity of neurons of the cochlear nucleus following acoustic stimulation of
the contralateral ear. Journal of the Acoustical Society of
Klinke, R., Boerger, G., & Gruber, J. The influence of the
frequency relation in dichotic stimulation upon the cochlear nucleus activity.
In R. Plomp & G. F. Smoorenburg (Eds.), Frequency
analysis and periodicity detection in hearing.
Koester, T. The time error in pitch and loudness discrimination as a
function of time interval and stimulus level. Archives
of Psychology, 1945, 297, entire issue.
Koffka, K. Principles of Gestalt psychology.
Kotovsky, K., & Simon, H. A. Empirical tests of a theory of human acquisition of
concepts of sequential events. Cognitive Psychology, 1973, 4, 399-424.
Krumhansl, C. L. The psychological representation of
musical pitch in a tonal context. Cognitive Psychology, 1979, 11,
346-374.
Krumhansl, C. L. Perceptual structures
for tonal music. Music Perception, 1983, 1, 28-62.
Krumhansl, C. L., Bharucha, J. J., & Kessler, E. J. Perceived
harmonic structure of chords in three related musical keys. Journal of
Experimental Psychology: Human Perception and Performance, 1982, 8, 24-36.
Krumhansl, C. L., & Kessler, E. J. Tracing the dynamic changes in perceived tonal
organization in a spatial representation of musical keys. Psychological
Review, 1982, 89, 334-368.
Krumhansl, C. L., & Shepard, R. N. Quantification of the hierarchy of tonal functions
within a diatonic context. Journal of Experimental Psychology: Human
Perception and Performance, 1979, 5, 579-594.
Kubovy, M. Concurrent pitch-segregation and the theory of indispensable attributes. In M. Kubovy & J. Pomerantz (Eds.), Perceptual
organization.
Kubovy, M., Cutting, J. E., & McGuire, R. M. Hearing with the third ear: Dichotic
perception of a melody without monaural familiarity cues. Science, 1974, 186,
272-274.
Kubovy, M., & Howard, F. P. Persistence of a pitch-segregating echoic
memory. Journal of Experimental Psychology: Human Perception and
Performance, 1976, 2, 531-537.
Kubovy,
M., &
Leeuwenberg, E. L. A
perceptual coding language for visual and auditory patterns. American
Journal of Psychology, 1971, 84, 307-349.
Lerdahl, F., & Jackendoff, R. A generative theory of
tonal music.
Lerdahl, F., & Jackendoff, R. An
overview of hierarchical structure in music. Music Perception, 1983b,
1, 229-252.
Lewin, D. The intervallic content of a
collection of notes. Journal of Music Theory, 1960, 4, 98-101.
Lewin, D. A theory of segmental association
in twelve-tone music. Perspectives of New Music, 1962, 1,
89-116.
Lewis, J. L. Semantic processing of unattended messages using dichotic
listening. Journal of Experimental Psychology, 1970, 85, 225-228.
Locke, S., & Kellar, L. Categorical perception in a non-linguistic mode. Cortex, 1973,
9, 355-368.
Longuet-Higgins, H. C. Letter to a musical friend. Music Review, 1962a, 23, 244-248.
Longuet-Higgins, H. C. Second letter to a musical friend. Music Review,
1962b, 23, 271-280.
Longuet-Higgins, H. C. The perception of music. Interdisciplinary
Science Reviews, 1978, 3, 148-156.
Mach, E. The analysis of sensations and
the relation of the physical to the psychical (C. M. Williams, trans.; W.
Waterlow, review and supplement).
Mathes, R. C., & Miller, R. L. Phase effects in monaural perception.
Journal of the Acoustical Society of
McClurkin, R. H., & Hall, J. Pitch and timbre in a two-tone dichotic
auditory illusion. Journal of the Acoustical Society of
McLean, R. S., & Gregg, L. W. Effects of induced chunking on temporal aspects of
serial retention. Journal of Experimental Psychology, 1967, 74, 455-459.
McNally, K. A., & Handel, S. Effects of element composition on
streaming and the ordering of repeating sequences. Journal of
Experimental Psychology: Human Perception and Performance, 1977, 3, 451-460.
Meyer, L. B. Emotion and meaning in music.
Meyer, L. B. Music, the arts and ideas.
Meyer, L. B. Explaining music: Essays and explorations.
Meyer, M. On the attributes of the sensations. Psychological
Review, 1904, 11, 83-103.
Meyer, M. Review of G. Revesz, "Zur Grundlegung der Tonpsychologie." Psychological
Bulletin, 1914, 11, 349-352.
Michon, J. A. Magnitude scaling of short durations with closely spaced stimuli. Psychonomic
Science, 1967, 9, 359-360.
Miller, G. A., & Heise, G. A. The
trill threshold. Journal of the Acoustical Society of
Miller, G. A., & Licklider, J. C. R. The
intelligibility of interrupted speech. Journal of the Acoustical
Society of
Miller, J. R., & Carterette, E. C. Perceptual
space for musical structures. Journal of the Acoustical Society of
Milner, B., Branch, C., & Rasmussen, T. Evidence for bilateral
speech representation in some nonrighthanders. Transactions
of the American Neurological Association, 1966, 91, 306-308.
Montpellier, G. de. Les alterations morphologiques des mouvements rapides.
Louvain, France: Institut Superieur de Philosophie, 1935.
Mueller, G. E., & Schumann, F. Experimentelle Beitrage zur Untersuchung des Gedachtnisses.
Zeitschrift fur Psychologie und Physiologie der Sinnesorgane, 1894,
6, 81-190; 257-339.
Nabelek, I. V., Nabelek, A. K., & Hirsh, I. J. Pitch of sound bursts
with continuous or discontinuous change of frequency. Journal of
the Acoustical Society of
Narmour, E. Beyond Schenkerism.
Narmour, E. Some major theoretical problems
concerning the concept of hierarchy in the analysis of tonal music. Music Perception, 1983,
1, 129-199.
Neisser, U. Cognitive psychology.
Ortmann, O. On the melodic relativity of tones. Psychological
Monographs, 1926, 35 (Whole No. 162).
Patterson, J. H., & Green, D. M. Discrimination of transient signals
having identical energy spectra. Journal of the
Acoustical Society of
Perkins, D. N. Coding position in a sequence
by rhythmic grouping. Memory and Cognition, 1974,
2, 219-223.
Perle, G. Serial composition and atonality (3rd ed.).
Perle, G. Twelve-tone tonality.
Piston, W. Harmony (2nd ed.).
Plomp, R. Timbre as a multidimensional attribute of complex tones. In
R. Plomp & G. F. Smoorenburg (Eds.), Frequency analysis and periodicity
detection in hearing.
Plomp, R. Perception of sound signals
at low signal-to-noise ratios. In D. J. Getty & J. H.
Howard, Jr. (Eds.), Auditory and visual pattern recognition.
Plomp, R., & Mimpen, A. M. The ear as frequency analyzer II.
Journal of the Acoustical Society of
Plomp, R., & Steeneken, H. J. M. Effect of phase on the timbre of complex tones. Journal
of the Acoustical Society of
Plomp, R., Wagenaar, W. A., & Mimpen, A. M. Musical interval
recognition with simultaneous tones. Acustica, 1973, 29,
101-109.
Povel, D. J. Internal representation of simple temporal patterns. Journal of
Experimental Psychology: Human Perception and Performance, 1981, 7, 3-18.
Povel, D. J., & Okkerman, H. Accents in equitone sequences. Perception
and Psychophysics, 1981, 30, 565-572.
Preusser, D., Garner, W. R., & Gottwald, R. L. Perceptual organization of two-element temporal patterns as a
function of their component one-element patterns. American Journal of
Psychology, 1970, 83, 151-170.
Rakowski, A. Tuning of isolated musical intervals. Journal of the
Acoustical Society of
Rameau, J. P. Treatise on harmony (P. Gosset, trans.).
Rasch, R. A. The perception of simultaneous notes such as in
polyphonic music. Acustica, 1978, 40, 1-72.
Ratliff, F. Mach bands: Quantitative studies of neural networks in the retina.
Restle, F. Theory of serial pattern learning: Structural trees. Psychological
Review, 1970, 77, 481-495.
Restle, F., & Brown, E. Organization of serial pattern learning. In G. H. Bower
(Ed.), The psychology of learning and
motivation (Vol. 4).
Revesz, G. Zur Grundlegung der Tonpsychologie.
Risset, J. C. Musical acoustics. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of
perception (Vol. 4).
Risset, J. C., & Wessel, D. L. Exploration of timbre by analysis and
synthesis. In D. Deutsch (Ed.), Psychology
of music.
Royer, F. L., & Garner, W. R. Response uncertainty and perceptual
difficulty of auditory temporal patterns. Perception
and Psychophysics, 1966, 1, 41-47.
Royer, F. L., & Garner, W. R. Perceptual organization of
nine-element auditory temporal patterns. Perception
and Psychophysics, 1970, 7, 115-120.
Ruckmick, C. A. A new classification of tonal qualities. Psychological
Review, 1929, 36, 172-180.
Saldanha, E. L., & Corso, J. F. Timbre cues for the recognition of musical
instruments. Journal of the Acoustical Society of
Salzer, F. Structural hearing.
Schackford, C. Some aspects of perception. I. Journal of Music
Theory, 1961, 5, 162-202.
Schackford, C. Some aspects of perception. II. Journal of Music
Theory, 1962, 6, 66-90.
Schaeffer, P. Traite des objets musicaux.
Schenker, H. Neue musikalische theorien und phantasien: Der freie satz.
Schenker, H. Harmony. (O. Jonas, Ed.; E. M. Borgese, trans.)
Schoenberg, A. Style and idea.
Schroeder, M. R. Models of hearing. Proceedings of the Institute of
Electrical and Electronics Engineers, 1975, 63, 1332-1350.
Schubert, E. D., & Parker, C. D. Addition to Cherry's findings on switching speech
between two ears. Journal of the Acoustical Society of
Shepard, R. N. Structural representations of musical pitch. In D. Deutsch
(Ed.), The psychology of music.