The Perception of Musical
R. A. Rasch and R. Plomp
A. Introduction
The aim of research in music
perception is to explain how we respond subjectively to musical sound signals.
In this respect it is a part of psychophysics, the general denomination for
scientific fields concerned with the relationship between the objective,
physical properties of sensory stimuli in our environment and the subjective,
psychological responses evoked by them. If the stimuli are of an acoustic
nature, we speak of psychoacoustics.
Psychoacoustics can be of a general, theoretical nature; it can
The most important topics of musical psychoacoustics are the subjective
properties of musical tones (pitch, loudness, timbre) and the phenomena that
occur when several tones are presented simultaneously, which is what usually
happens in music (beats and roughness, combination tones, consonance and
dissonance). We will focus our discussion on these topics. However, before we
deal more extensively with them, some attention must be given to the
methodology of psychoacoustics and to the frequency-analyzing power of the ear,
a capacity that is fundamental to its perceptual functioning.
B. Methodology
Psychoacoustics is an empirical or,
rather, experimental science. Observations from daily life and informal tryouts
may be starting points for psychoacoustical
knowledge, but the core of the scientific content is the result of laboratory
investigations. In this respect it is an interdisciplinary field of research.
Contributions have been made both by experimental psychologists and by
physicists and acousticians.
A psychoacoustical experiment can be described most simply in a
stimulus-response scheme. The stimulus is the sound presented to the subject. The experimenter requires
the subject to give a response. The
experimenter tries to discover the relationship between stimulus and response
characteristics. Both stimulus and response are observable events. The subject
is considered a "black box" that cannot be entered by the
experimenter. Psychoacoustical research is often
carried out without an attempt to explain the experimental results functionally
in terms of sensory processes. Such attempts are made in research that is labeled physiological
acoustics, a part of sensory and neurophysiology.
Our ears are very sensitive organs. Because of this, very accurate
control of the stimulus variables is required in psychoacoustical
experiments. Sound pressure level differences of less than 1 dB, time
differences of a few msec, and frequency differences of less than
1 Hz can have a profound effect on the subjective response to
a stimulus. It is impossible to obtain well-controlled psychoacoustic stimuli
by manual means, like playing tones or chords on a musical instrument. The
precision of the ear in distinguishing fine nuances is much greater than our
ability to produce these nuances. As a rule, psychoacoustics makes use of
electronic audio equipment that can produce sound stimuli according to any specification. In recent years it has become feasible
to run the experiments under computer control. The computer can also be used
for storage and analysis of stimuli and response data. Most problems concerning
the production of the stimuli in psychoacoustical
experiments may be considered solved. After the sound stimulus has been
produced, it must reach the subject's eardrum with the least possible
distortion. Usually high-quality headphones are used unless the spatial effect
of the listening environment is involved. Background noises should be reduced,
if not eliminated.
It is possible to have the subject
describe his perception verbally. However, this response is often insufficient
because our sensations allow much finer distinctions than our vocabulary does.
Moreover, the use of words may differ from subject to subject. Because of this,
in psychoacoustics most results are derived from responses made on the basis of
a certain perception without direct reference to the perception itself. For
example, if we have to indicate in which of two time intervals a sound has
occurred, the response is a time indication based on an auditory sensation. A
great deal of inventiveness is often required of the experimenter in designing
his experimental paradigms.
The procedures used most often in psychoacoustical experiments are choice methods and
adjustment methods. A single presentation of a sound event (one or more
stimuli) to which a response must be made is called a trial. Using choice methods, the
subject has to make, for each trial, a choice from a limited set of
well-defined alternatives. The simplest case is the one with two alternatives,
the two-alternative forced-choice (2AFC) procedure. The insertion of the word
"forced" is essential: The
subject is obliged to choose. He must guess when he is incapable of making a
meaningful choice.
For example, let us assume that the
investigator is studying under what conditions a probe tone can be heard
simultaneously with another, or masking sound. Each trial contains two
successive time periods marked by visual signals. The masking sound is
continuously present; the probe tone occurs in one of two time periods,
randomly determined. If the probe tone is clearly detectable, the subject
indicates whether it was presented in the first or in the second period. If the
tone is not perceived at all, the subject must guess, resulting in an
expectation of 50% correct responses. The transition from clearly detectable
to not detectable tones is gradual. It is reflected by a gradual slope of the
so-called psychometric curve that
represents the percentage of correct responses plotted as a function of the
sound pressure level of the target tone. The sound pressure level that
corresponds to a score of 75% correct responses is usually adopted as the
threshold for detection.
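
As a rough illustration of the psychometric curve and the 75% criterion, the following sketch models the percentage of correct 2AFC responses as a curve rising from 50% (guessing) to 100% (clearly detectable). The logistic shape, the function names, and the midpoint and slope values are assumptions chosen for illustration only; the text does not prescribe a particular curve.

```python
import math

def percent_correct(level_db, midpoint_db=30.0, slope_db=2.0):
    """Hypothetical 2AFC psychometric function (assumed logistic shape).

    Returns 50% when the tone is inaudible (pure guessing) and approaches
    100% when the tone is clearly detectable.
    """
    detect = 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))
    return 100.0 * (0.5 + 0.5 * detect)

def threshold_75(levels_db, scores):
    """Linearly interpolate the level at which the score crosses 75% correct."""
    for i in range(len(levels_db) - 1):
        s0, s1 = scores[i], scores[i + 1]
        if s0 <= 75.0 <= s1 and s1 > s0:
            l0, l1 = levels_db[i], levels_db[i + 1]
            return l0 + (75.0 - s0) * (l1 - l0) / (s1 - s0)
    raise ValueError("75% point is not bracketed by the measured levels")

# Simulated scores at a few probe-tone levels (illustrative values only).
levels = [20.0, 24.0, 28.0, 32.0, 36.0, 40.0]
scores = [percent_correct(level) for level in levels]
threshold = threshold_75(levels, scores)
print(f"estimated threshold: {threshold:.1f} dB")
```

In a real experiment the scores would be measured percentages per level rather than values drawn from an assumed curve.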
In order to arrive at an accurate
estimate of the threshold, the experimenter varies the sound pressure level of the
tone for the successive trials. In the constant
stimuli method the experimenter presents the tones according to a fixed
procedure. The method of constant stimuli is time consuming because a number of
trials are definitely supra- or infra-threshold and, therefore, do not give
much information. Another class of choice methods, called adaptive methods, makes more efficient
use of trials. The experimental series is started with a certain initial value
of the stimulus variable. One or more correct responses, depending upon the
experimental strategy adopted, result in a change in the stimulus variable that
makes it harder for the subject to make a correct choice. If the subject makes
one or more false responses, the experimental task is facilitated. In this way,
the value of the stimulus variable fluctuates around a certain value, which can
be defined to be the threshold for perception.
Besides choice methods there is the adjustment method. The subject controls
the stimulus variable himself, and he uses this control to find an optimal
value. This method is not always feasible. The
adjustment method is suitable for stimulus
variables that allow an optimal quality in perception: the best pitch for a tone
in a musical interval, the most comfortable loudness, the greatest similarity
or dissimilarity, etc. The optimal adjustment behaves like a stable equilibrium
between lower and higher, both suboptimal, adjustments. Adjustment methods have
the advantage that the results can be derived directly from the adjusted value,
and do not have to be derived indirectly from the psychometric curve.
C. The Ear as a Frequency Analyzer
Only by the ear's capacity to
analyze complex sounds are we able to discriminate
simultaneous tones in music. Frequency
analysis may be considered the most characteristic
property of the peripheral ear. The cochlea is divided over
its entire length into two parts by the basilar membrane. In 1942 Von Bekesy was the
first to observe, with ingenious experimentation, that at every point along its
length this membrane vibrates with maximum amplitude for a
specific frequency. This finding confirmed the hypothesis, launched 80 years
earlier by Helmholtz, that the cochlea performs a
frequency analysis. Sound components with high frequencies are represented
close to the base; components with low frequencies are represented near the
apex of the cochlea. The frequency scale of the sound is converted into a
spatial scale along the basilar membrane.
This capacity of the ear means that any periodic sound wave or complex tone is resolved into its frequency
components, also called partials or harmonics (see Fig. 1). In mathematics the
analogous procedure of determining the sinusoidal components of a periodic
function is called Fourier analysis. In
contrast with the theoretically perfect

There are many ways of studying the
extent to which the ear can separate simultaneous tones. Only two approaches
will be considered here. The first method investigates how many harmonics
(with frequencies nf, n = 1,
2, 3, 4, etc.) can be distinguished in a complex tone. This can be done by
using the 2AFC procedure: The listener has to decide which of two simple (sinusoidal) tones, one with frequency nf, the other with frequency (n ±
1/2)f, is also present in the complex tone. The percentage of
correct responses varies from 100 for low values of n to about 50 for high
values of n. Experiments along these lines have shown (Plomp,
1964) that, on the average, listeners are able to distinguish the
first
A quite different approach involves
measuring the minimum sound pressure level necessary for a probe tone to be
audible when presented with a complex tone.
This is the so-called masked threshold; by varying the probe-tone frequency, we
obtain the "masking pattern" of the complex tone. In Fig. 2 such a
pattern is reproduced. The masking pattern of a complex tone of 500 Hz reveals
individual peaks corresponding to the first five harmonics, nicely demonstrating
the limited frequency-analyzing power of the ear.
The usual measure indicating how well a system is able to analyze complex signals is its bandwidth. The finding that the fifth harmonic can be distinguished from the fourth and the sixth means that the mutual distance should be a minor third or more. This distance constitutes a rough, general estimate of the bandwidth of the hearing mechanism, known in the psychophysical literature as the critical bandwidth (Fig. 3). A detailed review (Plomp, 1976) revealed that the bandwidth found experimentally is dependent on the experimental conditions. The values may differ by a factor of two.
In the lower frequency region (below 500 Hz) critical bandwidth is more or less constant if expressed in Hz. That means that musical intervals (frequency ratios) larger than the critical bandwidth at high frequencies may fall within the critical bandwidth at lower frequencies.
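
The link between harmonic number and the minor-third criterion can be checked with a few lines of arithmetic. The sketch below computes the interval between adjacent harmonics in semitones and lists the harmonics whose nearest neighbor is at least a minor third away. Taking the equal-tempered minor third (3 semitones) as the criterion and treating it as frequency-independent are simplifying assumptions; the text itself notes that the bandwidth depends on conditions and on frequency region.

```python
import math

MINOR_THIRD_SEMITONES = 3.0  # assumed criterion: equal-tempered minor third

def interval_semitones(f_low, f_high):
    """Size of the interval between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(f_high / f_low)

def separable_harmonics(max_n=12, criterion=MINOR_THIRD_SEMITONES):
    """Harmonic numbers whose distance to both neighbors meets the criterion."""
    result = []
    for n in range(1, max_n + 1):
        below = interval_semitones(n - 1, n) if n > 1 else float("inf")
        above = interval_semitones(n, n + 1)
        if min(below, above) >= criterion:
            result.append(n)
    return result

for n in range(1, 9):
    print(f"harmonics {n} and {n + 1}: {interval_semitones(n, n + 1):.2f} semitones")
print("separable:", separable_harmonics())
```

The interval between harmonics 5 and 6 is slightly more than 3 semitones, whereas that between harmonics 6 and 7 is less, which is consistent with the first five harmonics standing out in the masking pattern.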


II. PERCEPTUAL ATTRIBUTES OF SINGLE TONES
A. Pitch
Pitch is the most characteristic property of tones, both
simple (sinusoidal) and complex. Pitch systems (like the diatonic-chromatic and
the 12-tone system) are among the most elaborate and intricate ever developed
in Western and non-Western music. Pitch is related to the frequency of a simple
tone and to the fundamental frequency of a complex tone. The frequency of a
tone is a property that can usually be controlled in production and is well
preserved during its propagation to the listener's ears.
For our purposes, pitch may be characterized as a one-dimensional attribute, i.e., all tones can be ordered along a
single scale with respect to pitch. The extremes of this scale are low
(tones with low frequencies) and
high (tones with
high frequencies). Sometimes tones with different spectral compositions
(timbres) are not easily comparable as to pitch. It is possible that the
clearness of pitch varies, for example, as a result of important noise
components or inharmonic partials, or that the
subjective character of the pitch varies, for example, when
comparing the pitch of simple and complex tones.
There are a number of subjective pitch scales:
1. The mel scale (see Stevens, Volkmann, & Newman, 1937). A simple tone of 1000
Hz has a defined pitch of 1000 mel. The pitch in mels of tones with another frequency must be
determined by comparative scaling experiments. A sound with a pitch subjectively twice that of a 1000 Hz tone is 2000 mel;
"half pitch" is 500 mel.
2. The musical pitch scale (i.e., the ordinary indications C1, D1,
. . . , C4, . . . , A4, etc.). These indications are only usable in
musical situations.
3. The physical
frequency scale in Hz. In psychoacoustical literature the pitch of a tone is often
indicated by its frequency or, in the case of complex tones, by its fundamental
frequency. Since the correspondence between frequency and pitch is monotonic,
frequency is a rough indication of our pitch sensation. It must be realized
however, that our perception operates more or less on the basis of a
logarithmic frequency scale.
Pitch in its musical sense has a range of about 20 to 5000 Hz, roughly the range of the fundamental frequencies of piano strings and organ pipes. Tones with higher frequencies are audible but without definite pitch sensation. Low tones in the range of 10 to 50 Hz can have the character of a rattling sound. The transition from the perception of single pulses to a real pitch sensation is gradual. Pitch can be perceived after very few periods of the sound wave have been presented to the ear.
Simple tones have unambiguous
pitches that can be indicated by means of their
frequencies. These frequencies may serve as reference frequencies for the pitches
of complex tones. The pitch sensation of complex tones is much more difficult
to understand than the pitch of simple tones. As was discussed, the first
However, a complex tone, as heard in practice, is characterized by a single pitch, the pitch of the fundamental component. This pitch will be referred to as low pitch here.
In psychoacoustical literature
this pitch is also known under a variety of other terms, such as periodicity
pitch, repetition pitch, residue pitch, and virtual pitch. Experiments
(Terhardt, 1971) have shown that the pitch of a complex tone
with fundamental frequency f is somewhat lower than that of a sinusoidal tone
with frequency f. The existence of low pitch of a
complex tone raises two questions. First, why are all components of the complex
tones perceived as a perceptual unit; that is,
why do all partials fuse into
one percept? Second, why is the pitch of this perceptual tone the pitch of the
fundamental component?
The first question can be answered
with reference to the Gestalt theory of perception. The "Gestalt
explanation" may be formulated as follows. The various components of a
complex tone are always present simultaneously. We become familiar with the
complex tones of speech signals (both of our own speech and of other speakers)
from an early age. It would not be efficient to perceive them all separately.
All components point to a single
source and meaning
so that perception of them as a unit gives a simpler view of the environment
than separate perception. This mode of perception must be seen as a perceptual
learning process. Gestalt psychology has formulated a number of laws that
describe the perception of complex sensory stimuli. The perception of low pitch
of complex tones can be classed under the heading of the "law of common
fate." The harmonics of a complex tone exhibit "common fate."
The second question can also be answered with the help of a learning process
directed toward perceptual efficiency. The periodicity of a complex tone is the
most constant feature in its composition. The amplitudes of the partials are
subjected to much variation, caused by selective reflection, absorption, passing of
objects, etc. Masking can also obscure certain partials. The periodicity, however,
is a very stable and constant factor in a complex tone. This is reflected in the
wave form built up from harmonics. The periodicity of a complex tone is at the
same time the periodicity of the fundamental component of the tone.
The perception of complex tones can be seen as a pattern recognition process.
The presence of a complete series of harmonics is not a necessary condition for
the pitch recognition process to succeed. It is sufficient that at least a few pairs of
adjacent harmonics are present so that the periodicity can be determined. It is
conceivable that there is a perceptual learning process that makes possible the
recognition of fundamental periodicity from a limited number of harmonic
partials. This learning process is based on the same experiences as those that led
to singular pitch perception. Pattern recognition theories of the perception of
low pitch are of relatively recent origin. Several times they have been worked out
in detailed mathematical models that simulate the perception of complex tones
(Goldstein, 1973; Wightman, 1973; Terhardt, 1974a; see also de Boer, 1976, 1977;
Patterson & Wightman, 1976; Gerson & Goldstein, 1978; Houtsma, 1979;
Piszczalski & Galler, 1979). It will probably take some time before the questions
about the low singular
pitch of complex tones are completely solved.
The classical literature on tone
perception abounds with theories based on von Helmholtz's
(1863) idea that the low pitch of a complex tone is based on the relative
strength of the fundamental component. The higher harmonics are thought only to
influence the timbre of the tones but not to be strong enough to affect pitch.
However, low pitch perception also occurs when the fundamental component is
not present in the sound stimulus. This was already observed by Seebeck (1841) and brought to the attention of the modern psychoacousticians by Schouten
(1938). These observations led Schouten to the
formulation of a periodicity pitch theory.
In this theory pitch is derived from the
waveform periodicity of the unresolved higher harmonics of the stimulus, the residue. This periodicity does
not change if a component (e.g., the fundamental one) is removed. With this
theory the observations of Seebeck and Schouten concerning tones without fundamental components
could be explained. An attempt has also been made to explain the low pitch of a
tone without fundamental ("the missing fundamental") as the result of
the occurrence of combination tones, which provide a fundamental component in
the inner ear. However, when these combination tones are
effectively masked by low-pass noise, the sensation of low pitch remains (Licklider, 1954).
In musical practice complex tones with weak or absent fundamentals are
very common. Moreover, musical tones are often partially masked by other tones.
These tones can, however, possess very clear low pitches. Effective musical
sound stimuli are often incomplete when compared to the sound produced by the
source (instrument, voice)
dominance region are most influential with regard to pitch. One
way of showing this is to work with tones with inharmonic
partials. Assume a tone with partials of 204, 408, 612, 800, 1000, and 1200 Hz.
The first three partials in isolation would give a pitch of "204 Hz."
All six together give a pitch of "200 Hz" because of the relative
weight of the higher partials, which lie in the dominance
region. The low pitch of complex tones with low fundamental frequencies (under
500 Hz) depends on the higher partials. The low pitch of tones with high fundamental frequencies is determined
by the fundamental because it lies in the dominance region.
Tones with inharmonic components have been
used quite frequently in tone perception research. An approximation of the
pitch evoked by them is the fundamental of the least-deviating harmonic series.
Assume a tone with components of 850, 1050, 1250, 1450, 1650 Hz. The
least-deviating harmonic series is 833, 1042, 1250, 1458, and 1667 Hz, which
contains the fourth, fifth, sixth, seventh, and eighth harmonics of a complex
tone with a fundamental of 208.3 Hz. This fundamental can be used as an
approximation of the pitch sensation of the inharmonic
complex (Fig. 4). Let us consider an inharmonic tone
with frequency components of 900, 1100, 1300, 1500, 1700 Hz. This tone has an
ambiguous pitch, since two approximations by harmonic series are possible,
namely one with a fundamental of 216.7 Hz (the component of 1300 Hz being the
sixth harmonic in this case) and one with a fundamental of 185.7 Hz (1300 Hz
being the seventh harmonic).
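
The two examples can be reproduced numerically. The sketch below treats the middle component as a given harmonic number, derives the fundamental from it, and sums the absolute deviations between the actual components and that harmonic series. Anchoring the fit on the middle component is an assumption made here because it reproduces the series quoted in the text (833, 1042, 1250, 1458, 1667 Hz); other definitions of "least-deviating", such as a least-squares fit, give slightly different fundamentals. The function names and the candidate range are hypothetical.

```python
def harmonic_fit(components_hz, centre_harmonic):
    """Fit a harmonic series with the middle component as harmonic `centre_harmonic`.

    Assumes the components are adjacent partials. Returns the implied
    fundamental and the summed absolute deviation in Hz.
    """
    mid = len(components_hz) // 2
    f0 = components_hz[mid] / centre_harmonic
    numbers = [centre_harmonic + (i - mid) for i in range(len(components_hz))]
    deviation = sum(abs(f - n * f0) for f, n in zip(components_hz, numbers))
    return f0, deviation

def rank_fits(components_hz, candidates=range(4, 10)):
    """Candidate fits as (harmonic number, fundamental, deviation), best first."""
    fits = [(n,) + harmonic_fit(components_hz, n) for n in candidates]
    return sorted(fits, key=lambda fit: fit[2])

for tone in ([850, 1050, 1250, 1450, 1650], [900, 1100, 1300, 1500, 1700]):
    print(tone)
    for n, f0, deviation in rank_fits(tone)[:2]:
        print(f"  centre = harmonic {n}: f0 = {f0:.1f} Hz, deviation = {deviation:.1f} Hz")
```

For the first tone one candidate is clearly better than the runner-up; for the second tone the two best candidates have nearly equal deviations, which mirrors the ambiguity of its pitch.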
If not all partials of a complex tone are necessary for low pitch perception, how few of them are sufficient? The following series of experimental investigations show a progressively decreasing number (see Fig. 5). De Boer (1956) worked with five harmonics in the dominant region; Schouten, Ritsma, and Cardozo (1962), with three; Smoorenburg (1970), with two; Houtsma and Goldstein (1972), with one plus one, that is, one partial presented to each ear. In the latter case it is also possible to elicit low pitch perception. The authors concluded that low pitch was a central neural process not brought about by the peripheral sense organ (the ears). The last step in the series should be a low pitch perception evoked by one partial. That this is also possible has been shown by Houtgast (1976). The following conditions have to be fulfilled: The frequency region of the low pitch has to be filled with noise, the single partial must have a low signal-to-noise ratio, and attention has to be directed to the fundamental frequency region by prior stimuli. These conditions create a perceptual situation in which it is not certain that the fundamental is not there so that we are brought to the idea that it should be there by inference from earlier stimuli.


B. Loudness
The physical correlate that underlies the loudness of a tone is intensity, usually
expressed as sound pressure level (SPL) in dB. Sound pressure level is a relative
measure, expressed either relative to a zero level defined in
the experimental situation or relative to a general reference sound pressure of
2 x 10^-5 N/m^2. Sound pressure levels of performed music vary roughly from 40 dB
for a pianissimo to about 90 dB for a full orchestral forte-tutti
(Winckel, 1962). By means of electronic amplification
higher levels are reached in pop concerts. These levels, sometimes beyond 100
dB, are potentially damaging to the ear in case of prolonged presentation (Flugrath, 1969; Rintelman,
Lindberg, & Smitley, 1972; Wood & Lipscomb,
1972; Fearn, 1975a,b).
The subjective assessment of loudness is more complicated than the
physical measurement of the sound pressure level. Several loudness scales have been proposed during the last decades. None of
them, however, can be applied fully satisfactorily in all conditions. We give
the following summary review:
1. The sone scale, a purely psychophysical
loudness scale (Stevens, 1936). The loudness of a
simple (sinusoidal) tone of 1000 Hz with a sound pressure level of 40 dB is
defined to be 1 sone; a tone with double loudness is
assigned the loudness of 2 sones, etc. In general, a
sound of X sones is n times louder than a sound of X/n
sones. The experimental determination of
the relationship between the physical sound level and the psychophysical
loudness is not very reliable because of the uncertainty of what is actually
meant by "X times louder."
2. The phone scale, a mixed physical-psychophysical loudness scale with
scale values expressed in dB and, therefore, termed loudness level (LL). The loudness level of a sound in phones
is equal to the sound pressure level of a 1000 Hz tone with the same loudness.
For tones of 1000 Hz the identity relation SPL = LL holds. The loudness level of simple tones
with other frequencies and of complex tones or other sounds (noises, etc.) is
found by comparison experiments, which can be done with acceptable reliability.
These comparisons may be used to draw contours of equal loudness as a function
of, for example, frequency.
3. The sensation-level scale,
also a mixed scale. Sensation level is defined as the sound pressure level
relative to threshold level and, as such, is also expressed in dB. It may
differ as a function of frequency or other characteristics of a sound but also from subject to subject.
4. In many papers on psychoacoustics
no loudness indications are given. Instead, physical levels are mentioned. For
the investigator this is the most precise reference and at
the same time a rough indication of subjective loudness.
In the description of the relation between sound pressure level and loudness, a clear distinction must be made between sounds with all spectral energy within one critical band and sounds with spectral energy spread over more than one critical band. If all sound energy is limited to one critical band, the loudness L in sones increases monotonically with intensity I. The relation is often approximated by the equation
$$L = k I^{n} \qquad [n \text{ may be } 0.33]$$
in which k and n are empirically chosen constants. A consequence of this relation is the rule that equal intensity ratios result in equal loudness ratios. Now, an intensity ratio is a fixed level difference (dB) ...
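
Under the power law L = k I^n, the constant k cancels in any loudness ratio, so a fixed level difference in dB always gives the same loudness ratio. The sketch below turns that into arithmetic: it computes the loudness ratio implied by a level difference and, conversely, the exponent n for which a given level difference doubles loudness. The function names are hypothetical, and the single-critical-band condition stated in the text is assumed throughout.

```python
import math

def loudness_ratio(level_difference_db, exponent):
    """Loudness ratio implied by L = k * I**n for a level difference in dB.

    A level difference of d dB corresponds to an intensity ratio of 10**(d/10).
    """
    intensity_ratio = 10.0 ** (level_difference_db / 10.0)
    return intensity_ratio ** exponent

def exponent_for_doubling(level_difference_db):
    """Exponent n for which the given level difference doubles loudness."""
    return math.log10(2.0) / (level_difference_db / 10.0)

for db in (10.0, 9.0):
    n = exponent_for_doubling(db)
    print(f"{db:.0f} dB for doubling -> n = {n:.3f}, check ratio = {loudness_ratio(db, n):.3f}")
```

A doubling of loudness per 10 dB corresponds to an exponent of about 0.30, and a doubling per 9 dB to about 0.33.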
This
have been much interested in the level difference that
results in doubling or halving loudness, and many experiments have been carried
out to establish this. The outcomes of these experiments are disappointingly
dissimilar. Stevens (1955) summarized all experiments known to him with the median value of
10 dB for doubling loudness, later
(1972) modified to 9 dB. These values correspond to values of n = 0.3
and n = 0.33 for the exponent in the formula. It is also
possible to interpret the subjective loudness judgment
as an imaginary judgment
of distance to the sound source. In this theory (
The assessment of loudness is a complicated matter if sound energy is
present in more than one critical band. This situation is the common one for
musical tones, especially for chords, and music played by ensembles, choirs,
and orchestras. Total loudness is greater than when the same amount of sound
energy is concentrated within one critical band. A number of models have been
proposed that intend to be simulations of the perceptual processes involved and
the parameters of which have been assigned values in accordance with
psychophysical experiments. Well known are the models by Stevens (1955), Zwicker, Flottorp, & Stevens
(1957), Zwicker and Scharf
(1965), and Stevens (1972). These models have also been applied to musical
sounds, especially to organ tones (Churcher, 1962;
Pollard, 1978a,b).
Although loudness variations play an important role in music, they are
less important than pitch variations. The number of assignable loudness
degrees in music is limited to about five, coded musically from soft to loud as
pianissimo, piano, mezzoforte, forte, and
fortissimo. The definition of these loudness degrees is rather imprecise
(Clark & Milner, 1964; Clark & Luce, 1965; Patterson, 1974). Judgment
of musical loudness cannot have the degree of reliability and preciseness that
is possible with the judgment of (relative) pitch, duration, tempo, etc. This
is a consequence of the fact that the underlying physical dimension, intensity,
is hard to control precisely. Sources of variation are encountered in sound
production, in the fixed acoustic conditions of a room (absorption and thus
attenuation by walls, floor, ceiling, etc.), in variable acoustic conditions
(like the presence or the absence of an audience, the relative positions of
sound source and listener, disturbing external noises), and in the audiograms of the listeners. In all
the stages on the road from sound production to sound perception, sound
pressure level is liable to be altered whereas frequency is not.
C. Timbre
Timbre is, after pitch and
loudness, the third attribute of the subjective experience of musical tones.
Subjectively, timbre is often coded as the function of the sound source or of
the meaning of the sound. We talk about the timbre of certain musical
instruments, of vowels, and of
sounds that signify certain events in our environment.
What are the physical parameters that contribute to the perception of a certain timbre? In a restricted sense timbre may be considered the subjective counterpart of the spectral composition of tones. Especially important is the relative amplitude of the harmonics. This view was first stated by Helmholtz over a century ago and is reflected by the definition of timbre according to the American Standards Association (Acoust. Terminology S1.1., 1960):
"Timbre is that attribute of auditory sensation in terms of which a listener can judge that two steady-state complex tones having the same loudness and pitch are dissimilar."
Recent research
has shown that temporal characteristics of the tones may have a profound
influence on timbre as well, which has led to a broadening of the concept of
timbre (Schouten, 1968). Both onset effects

Sounds cannot be ordered on a single
scale with respect to timbre. Timbre is a multidimensional
attribute of the perception of sounds. Dimensional research is highly
time-consuming and is therefore always done with a restricted set of sound
stimuli. The dimensions found in such an investigation are of course determined
by the stimulus set.
Dimensional research of timbre leads to the ordering of sound stimuli on the dimensions of a timbre space. An example of such research is that by Von Bismarck (1974a,b). His stimulus set contained a large number (35) of tone and noise stimuli. The most important factors found by him can be characterized as follows:
(a) sharpness, determined by a distribution of spectral energy that has its gravity point in the higher frequency region and
(b) compactness, a factor that distinguishes between tonal (compact) and
noise (not compact) aspects of sound.
In some investigations sound stimuli
have been submitted to multidimensional scaling, both perceptual and physical.
The physical scaling can be based on the spectral composition of the sounds, as
was done in Plomp's (1979) experiments with tones
from a number of organ stops. Figure 6 gives the two-dimensional representation
of 10 sounds, both perceptual and physical. The representations correspond
rather well, leading to the conclusion that in this set of stimuli the sound
spectrum is the most important factor in the perception of timbre.
Other examples of dimensional research on timbre are the investigations by Plomp (1970), Wedin and Goude (1972), Plomp and Steeneken (1973), Miller and Carterette (1975), Grey (1977), and de Bruijn (1978).
III. PERCEPTUAL ATTRIBUTES OF SIMULTANEOUS TONES
A. Beats and Roughness
In this and the following sections
we will discuss perceptual phenomena that occur as the
result of two simultaneous tones. We
will call the simultaneously sounding tones the
primary tones.
We consider first the case of two simultaneous simple tones. Several conditions can be distinguished, depending on frequency difference (Fig. 7). If the two primary tones have equal frequencies, they fuse into one tone, in which the intensity depends on the phase relation between the two primary tones. If the tones differ somewhat in frequency, the result is a signal with periodic amplitude and frequency variation with a frequency equal to the frequency difference.

The
amplitude variations, however, can be
considerable and result in a fluctuating intensity and perceived
loudness. These loudness fluctuations are called beats, if they can be discerned individually by the ear, which
occurs if their frequency is less than about 20 Hz. A stimulus equal to the sum
of two simple tones with equal amplitudes and frequencies f and g
$$p(t) = \sin 2\pi f t + \sin 2\pi g t$$
can be described as
$$p(t) = 2 \cos\left[2\pi \tfrac{1}{2}(g - f)\,t\right] \sin\left[2\pi \tfrac{1}{2}(f + g)\,t\right]$$
This is a signal with a frequency that is the average
of the original primary frequencies, and an amplitude
that fluctuates slowly with a beat
frequency of g - f Hz (Fig. 8). Amplitude variation is less strong if the two
primary tones have different amplitudes.
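
The equivalence of the two descriptions of the stimulus can be verified numerically. The sketch below evaluates the sum of two equal-amplitude sinusoids and the product of a slowly varying cosine envelope with a sinusoid at the average frequency, and compares them over one second. The example frequencies (440 and 444 Hz) and the sampling grid are arbitrary choices for illustration.

```python
import math

def two_tone_sum(t, f, g):
    """p(t) as the sum of two equal-amplitude simple tones."""
    return math.sin(2 * math.pi * f * t) + math.sin(2 * math.pi * g * t)

def envelope_times_carrier(t, f, g):
    """p(t) as a slow cosine envelope times a tone at the average frequency."""
    envelope = 2 * math.cos(2 * math.pi * 0.5 * (g - f) * t)
    carrier = math.sin(2 * math.pi * 0.5 * (f + g) * t)
    return envelope * carrier

f, g = 440.0, 444.0  # illustrative primary frequencies in Hz
times = [i / 8000.0 for i in range(8000)]  # one second on an 8 kHz grid
max_error = max(abs(two_tone_sum(t, f, g) - envelope_times_carrier(t, f, g))
                for t in times)
beat_frequency = g - f  # the magnitude of the envelope repeats g - f times per second

print(f"largest difference between the two forms: {max_error:.2e}")
print(f"beat frequency: {beat_frequency:.1f} Hz")
```

The cosine envelope itself has a frequency of (g - f)/2, but loudness follows its magnitude, which peaks twice per envelope cycle; this is why the beat frequency equals g - f.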
When the frequency difference is larger than about 20 Hz, the ear is no longer able to follow the rapid amplitude fluctuations individually. Instead of the sensation of fluctuating loudness, there is a rattle-like sensation called roughness. Beats and roughness can only occur if the two primary tones are not resolved by the ear (that means, not processed separately but combined). If the frequency difference is larger than the critical band, the tones are perceived individually with no interference phenomena.
In musical sounds beats can occur with just noncoinciding harmonics of mistuned consonant intervals of complex tones. If the fundamental frequencies of the tones of
No psychophysical research has been done on mistuned intervals of complex tones, but to a certain extent psychophysical results found with two beating simple tones and with amplitude-modulated simple tones (see Fig. 9) can be applied to the perception of beating mistuned intervals of complex tones (Zwicker, 1952; Terhardt, 1968a,b, 1974b). The following relations can be stated. Thresholds vary with beat frequency. There appears to be a minimum at about 5 to 10 Hz. The threshold decreases when the sound pressure level increases. It is possible to define perceptual quantities called beating strength and roughness strength and to determine their values as a function of stimulus characteristics. Research following this line has shown that such a quantity increases with modulation depth and with sound pressure level. Moreover, there seems to be a modulation frequency giving maximal roughness (about 50 to 70 Hz).

B. Combination Tones
Two simple tones at a relatively high sound pressure level and with a frequency difference that is not too large can give rise to the perception of so-called combination tones. These combination tones arise in the ear as a product of nonlinear transmission characteristics. The combination tones are not present in the acoustic signal. However, they are perceived as if they were present. The ear cannot distinguish between perceived components that are "real" (in the stimulus) and those that are not (combination tones). The combination tones are simple tones that may be cancelled effectively by adding a real simple tone with the same frequency and amplitude but opposite phase. This cancellation tone can be used to investigate combination tones.
The possible frequencies of combination tones can be derived from a general transmission function. Assume a stimulus with two simple tones:

$$p(t) = \cos 2\pi f t + \cos 2\pi g t,$$

f and g being the two frequencies. Linear transmission is described by

$$d = ap + c$$

(a and c being constants). If transmission is not linear, higher order components are introduced:

$$d = a_1 p + a_2 p^2 + a_3 p^3 + \cdots$$

The quadratic term can be developed as follows:

$$p^2 = (\cos 2\pi f t + \cos 2\pi g t)^2 = 1 + \tfrac{1}{2}\cos 2\pi 2ft + \tfrac{1}{2}\cos 2\pi 2gt + \cos 2\pi(f + g)t + \cos 2\pi(f - g)t$$
It can be seen that components with frequencies 2f, 2g, f + g, and f - g are introduced in this way. Similarly, the cubic term can be developed:

$$\begin{aligned}
p^3 &= (\cos 2\pi f t + \cos 2\pi g t)^3 \\
&= \tfrac{9}{4}\cos 2\pi f t + \tfrac{9}{4}\cos 2\pi g t + \tfrac{1}{4}\cos 2\pi 3ft + \tfrac{1}{4}\cos 2\pi 3gt \\
&\quad + \tfrac{3}{4}\cos 2\pi(2f + g)t + \tfrac{3}{4}\cos 2\pi(2g + f)t + \tfrac{3}{4}\cos 2\pi(2f - g)t + \tfrac{3}{4}\cos 2\pi(2g - f)t
\end{aligned}$$
This term is responsible for components with frequencies 3f, 3g, 2f + g, 2g + f, 2f - g, and 2g - f. The higher terms of the nonlinear transmission formula can be worked out analogously. The factors just preceding the cosine terms indicate the relative amplitudes of the components in their groups. Psychoacoustical research on combination tones has shown that the pitches of the combination tones agree with the frequencies predicted by nonlinear transmission (Plomp, 1965; Smoorenburg, 1972a,b; Hall, 1975; Weber & Mellert, 1975; Schroeder, 1975b; Zurek & Leshowitz, 1976). However, the correspondence between the relative amplitude predicted and the subjective loudness measured is far from perfect. Clearly, the phenomenon of combination tones is more complicated than can be described in a simple formula. Moreover, there are individual differences, which should be expected since this is a distortion process. Experiments have shown (see Fig. 10) that the following combination tone frequencies are the most important: the so-called difference tone with frequency g - f Hz, the second-order difference tone with frequency 2f - g Hz, and the third-order difference tone with frequency 3f - 2g Hz. The diagram illustrates that the combination tones are stronger for small frequency differences of the primary tones than for large differences; this indicates that the origin of combination tones is tightly connected with the frequency-analyzing process in the inner ear. It should be noted that the importance of summation tones (with frequency f + g) and the so-called aural harmonics (with frequencies 2f, 3f, etc., and 2g, 3g, etc.) is questionable. Although combination tones were discovered by musicians in musical contexts (Tartini and Sorge in the eighteenth century), their significance for music is not very high. They can be easily evoked by playing loud tones in the high register on two flutes or recorders or double stops on the violin.
In a normal listening situation, however, their levels are usually too low to attract attention. Moreover, they will be masked by the tones of other (lower) instruments. Some violin teachers (following Tartini) advise the use of combination tones as a tool for controlling the intonation of double-stop intervals. Because audible combination tones behave more as simple tones in lower frequency regions than the complex tones to be intonated, a pitch comparison of combination tones and played tones should not be given too much weight.
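The expansions of the quadratic and cubic terms can be verified numerically. The Python sketch below squares and cubes a two-tone signal and measures the amplitude of each predicted component by projecting onto a cosine; it also lists the three difference tones named in the text. The primary frequencies (30 and 36 Hz), the window of 1200 samples, and all names are hypothetical choices that merely keep every component on a distinct integer frequency. The sketch models only the polynomial transmission function, not the ear, where measured loudness departs from these amplitudes.

```python
import math

N = 1200        # samples in a one-second analysis window (assumed)
F, G = 30, 36   # primary frequencies in Hz (hypothetical, with F < G)

def amplitude(signal, freq):
    """Amplitude of the zero-phase cosine component at an integer frequency (Hz)."""
    s = sum(x * math.cos(2 * math.pi * freq * n / N) for n, x in enumerate(signal))
    return s / N if freq == 0 else 2 * s / N

def difference_tones(f, g):
    """The three combination tones the text calls most important (f < g)."""
    return {"g - f": g - f, "2f - g": 2 * f - g, "3f - 2g": 3 * f - 2 * g}

# Two-tone stimulus p(t) = cos 2*pi*f*t + cos 2*pi*g*t and its powers.
p = [math.cos(2 * math.pi * F * n / N) + math.cos(2 * math.pi * G * n / N)
     for n in range(N)]
p2 = [x ** 2 for x in p]
p3 = [x ** 3 for x in p]

# Components predicted for the quadratic term: constant, 2f, 2g, f + g, g - f.
quadratic = {k: amplitude(p2, k) for k in (0, 2 * F, 2 * G, F + G, G - F)}

# Components predicted for the cubic term.
cubic = {k: amplitude(p3, k)
         for k in (F, G, 3 * F, 3 * G, 2 * F + G, 2 * G + F, 2 * F - G, 2 * G - F)}

for name, table in (("quadratic", quadratic), ("cubic", cubic)):
    for freq, amp in sorted(table.items()):
        print(f"{name:9s} {freq:4d} Hz  amplitude {amp:.3f}")

print(difference_tones(1000, 1200))
```

Note that 3f - 2g appears in neither table: in this polynomial picture it first arises from the fifth-order term.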
C. Consonance and Dissonance
The simultaneous sounding of several tones may be pleasant or "euphonious" to varying degrees. The pleasant sound is called consonant; the unpleasant or rough one, dissonant. The terms consonance and dissonance have been used here in a perceptual or sensory sense. This aspect has been labeled tonal consonance (Plomp & Levelt, 1965) or sensory consonance (Terhardt, 1976), to be distinguished from consonance in a musical situation. Musical consonance has its roots in perceptual consonance, of course, but is dependent on the rules of music theory, which, to a certain extent, can operate independently from perception.
The perceptual consonance of an interval consisting of two simple tones depends directly on the frequency difference between the tones, not upon the frequency ratio (or musical interval). If the frequency separation is very small or large (more than the critical bandwidth, the tones not interfering with each other), the two tones together sound consonant. Dissonance occurs if the frequency separation is less than a critical bandwidth (see Fig. 11). The most dissonant interval arises with a frequency separation of about a quarter of the critical bandwidth: about 20 Hz in low-frequency regions, about 4% (a little less than a semitone) in the higher regions (Fig. 12). The frequency separation of the minor third (20%), major third (25%), fourth (33%), fifth (50%), and so on is usually enough to give a consonant combination of simple tones. However, if the frequencies are low, the frequency separation of thirds (and eventually also fifths) is less than the critical bandwidth so that even these intervals cause a


The consonance of intervals of complex tones can be derived from the consonances of the simple-tone combinations comprised in them. In this case the dissonance is the additive element. The dissonance of all combinations of neighboring partials can be determined and added to give the total dissonance and, inversely, the total consonance of the sound. Sounds with widely spaced partials, such as clarinet tones (with only the odd harmonics), are more consonant than sounds with narrowly spaced partials. The composition of the plenum of an organ is such that the partials are widely spaced throughout the spectrum. Some mathematical models have been worked out that describe the dissonance of a pair of simple tones and the way in which the dissonances of partial pairs in tone complexes have to be added (Plomp & Levelt, 1965; Kameoka & Kuriyagawa, 1969a,b; Hutchinson, 1978). As far as can be decided, these models give a good picture of consonance perception.
The consonance of a musical interval, defined as the sum of two complex tones with a certain ratio in fundamental frequency, is highly dependent on the simplicity of the frequency ratio. Intervals with frequency ratios that can be expressed in small integer numbers (say, less than 6) are relatively consonant because the lower, most important components of the two tones are either wide apart or coincide. If the frequency ratio is less simple, there will be a number of partials from the two tones that differ only a little in frequency, and these partial pairs give rise to dissonance. It seems that intervals with the number 7 in their frequency proportions (7/4, 7/5, ...) are about on the borderline between consonance and dissonance.
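The additive idea can be illustrated with a minimal Python sketch: compute a dissonance value for every pair of partials and sum them. This is not one of the published models cited above. The pair curve is a hypothetical shape chosen only to satisfy the constraints stated in the text (zero at unison, maximum at a quarter of the critical bandwidth, zero beyond a full critical bandwidth), and the critical bandwidth is a crude assumption obtained by scaling the quoted quarter-bandwidth figures (about 20 Hz at low frequencies, about 4% higher up) by four. Equal-amplitude partials and six harmonics per tone are further assumptions.

```python
def critical_bandwidth(freq_hz):
    """Assumed critical bandwidth: four times the quarter-bandwidth figures in the text."""
    return max(80.0, 0.16 * freq_hz)

def pair_dissonance(f1, f2):
    """Hypothetical dissonance of two simple tones, between 0 and 1.

    x is the separation in units of the critical bandwidth at the mean
    frequency. The curve (256/27) * x * (1 - x)**3 is 0 at x = 0, peaks at
    exactly x = 1/4 with value 1, and returns to 0 at x = 1.
    """
    x = abs(f2 - f1) / critical_bandwidth(0.5 * (f1 + f2))
    if x >= 1.0:
        return 0.0
    return (256.0 / 27.0) * x * (1.0 - x) ** 3

def harmonic_partials(f0, n_harmonics):
    """Frequencies of the first n harmonics of a complex tone."""
    return [f0 * k for k in range(1, n_harmonics + 1)]

def total_dissonance(partials):
    """Sum of pair dissonances over all pairs; distant pairs contribute zero."""
    total = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            total += pair_dissonance(partials[i], partials[j])
    return total

def interval_dissonance(f0, ratio, n_harmonics=6):
    """Total dissonance of two complex tones with fundamentals f0 and f0 * ratio."""
    lower = harmonic_partials(f0, n_harmonics)
    upper = harmonic_partials(f0 * ratio, n_harmonics)
    return total_dissonance(lower + upper)

# Hypothetical comparison on a 250 Hz lower tone.
for name, ratio in (("octave 2:1", 2 / 1), ("fifth 3:2", 3 / 2),
                    ("major third 5:4", 5 / 4), ("minor second 16:15", 16 / 15)):
    print(f"{name:20s} {interval_dissonance(250.0, ratio):.3f}")
```

Under these assumptions the octave and fifth come out far less dissonant than the minor second, in line with the qualitative ordering described in the text; the absolute numbers carry no meaning.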
Experiments with inharmonic partials (Slaymaker, 1970; Pierce, 1966) have shown that consonance depends on the coincidence of partials and not necessarily on the simple frequency ratio between the fundamental frequencies (which is usually the cause of the coincidence).
If the number of partials in a
complex tone increases or if the strengths of the higher harmonics (with narrow
spacing) increase, the tone is perceived as more dissonant (compare the trumpet
with the flute, for instance). However, the nth partial is required in order
to make an interval with frequency ratio n : m or m : n relatively consonant. For example,
if the fifth harmonic is absent, the usual beating (dissonance) of a mistuned
major third (4:5) will be absent (see also Fig. 12).
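The role of particular partials can be made concrete with a short Python sketch that lists which harmonics of two complex tones coincide for a given frequency ratio, and the beat rate that results when the interval is slightly mistuned. The function names and the example frequencies (400 and 502 Hz) are hypothetical, and the ratio n:m is assumed to be given in lowest terms with n belonging to the lower tone.

```python
def coinciding_harmonics(n, m, max_harmonic=12):
    """Pairs (i, j) such that harmonic i of the lower tone coincides with
    harmonic j of the upper tone, for fundamentals in the ratio n:m."""
    return [(i, j)
            for i in range(1, max_harmonic + 1)
            for j in range(1, max_harmonic + 1)
            if i * n == j * m]

def mistuning_beat_rate(f_lower, f_upper, n, m):
    """Beat rate (Hz) between the lowest pair of nearly coinciding harmonics:
    harmonic m of the lower tone against harmonic n of the upper tone."""
    return abs(m * f_lower - n * f_upper)

# Major third 4:5. The fifth harmonic of the lower tone meets the fourth
# harmonic of the upper tone, so removing that fifth harmonic removes the beats.
print(coinciding_harmonics(4, 5))

# Hypothetical mistuned third: a pure 4:5 above 400 Hz would be 500 Hz.
print(mistuning_beat_rate(400.0, 502.0, 4, 5), "Hz")
```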
Musical consonance in Western polyphonic and harmonic music is clearly
based on perceptual consonance of complex (harmonic) tones. Intervals with
simple frequency ratios are consonant. Intervals with nonsimple
frequency ratios are dissonant. The way in which consonance and dissonance are
used in music theory and composition varies considerably from one historical
period to another.
IV. CONCLUSION
More than a century ago von Helmholtz published his classic volume On the Sensations of Tone (1863). The subtitle specifically indicates the intention of this study: "As a Physiological Basis for the Theory of Music." For Helmholtz the theory of music (as a compendium of rules that control composition and as such the musical sound stimulus) could only be understood fully if it could be shown that its elements had their origin in the perceptual characteristics of our hearing organ. Helmholtz's working hypothesis has been put aside by later investigators, both those who worked in music and those who worked in psychoacoustics. Several reasons for this can be given. First, before the introduction of electroacoustic means of tone production and control in the 1920s, it was not possible to carry out the necessary psychoacoustical experiments, while Helmholtz's observations proved to be insufficient in many ways. Second, it turned out that music theory has its own rules apart from the perceptual relevance of the characteristics of the sounds that it creates. Therefore, it is clear neither to the music theorist nor to the psychoacoustician which aspects of music theory should be subjected to psychoacoustical research and which should not. Fortunately, in recent years much research has been initiated that is aimed at the investigation of the relationship between musical-theoretical and perceptual entities. For the time being, no complete view can be given, but there may come a time in which Helmholtz's ideas on the relation between the properties of our perceptual processes and the elements of musical composition can receive new, more complete and exact formulations than was possible a century ago.
REFERENCES
Berger, K. W. Some factors in the recognition of timbre. Journal of the Acoustical Society of
Boer, E. de. On the 'residue' and auditory pitch perception. In W. D. Keidel & W. D. Neff (Eds.), Handbook of sensory physiology. (Volume V, Auditory system, Part 3, Clinical and special topics)
Boer, E. de. Pitch theories unified. In E. F. Evans & J. P. Wilson (Eds.), Psychophysics and physiology of hearing.
Bruijn, A. de. Timbre-classification of complex tones. Acustica, 1978, 40, 108-114.
Churcher, B. G. Calculation of loudness levels for musical sounds. Journal of the Acoustical Society of
Evans, E. F., & Wilson, J. P. (Eds.), Psychophysics and physiology of hearing.
Fearn, R. W. Level limits on pop music. Journal of Sound and Vibration, 1975, 38, 591-591. (a)
Fearn, R. W. Level measurements of music. Journal of Sound and Vibration, 1975, 43, 588-591. (b)
Flugrath, J. M. Modern-day rock-and-roll music and damage-risk criteria. Journal of the Acoustical Society of
Gerson, A., & Goldstein, J. L. Evidence for a general template in central optimal processing for pitch of complex tones. Journal of the Acoustical Society of
Goldstein, J. L. An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of
Green, D. M. An introduction to hearing. Hillsdale,
Grey, J. M. Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of
Hall, J. L. Nonmonotonic behavior of distortion product 2f1 - f2: Psychophysical observations. Journal of the Acoustical Society of
Helmholtz, H. von. Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik (Sechste Ausg.). Braunschweig: Vieweg, 1913 (1st ed., 1863). Translated by A. J. Ellis as: On the sensations of tone as a physiological basis for the theory of music.
Houtgast, T. Subharmonic pitches of a pure tone at low S/N ratio. Journal of the Acoustical Society of
Houtsma, A.J.M. Musical pitch of two tone complexes and predictions by modern pitch theories. Journal of the Acoustical Society of
Houtsma, A.J.M., & Goldstein, J. L. The central origin of the pitch of complex tones: Evidence from musical interval recognition. Journal of the Acoustical Society of
Kameoka, A., & Kuriyagawa, M. Consonance theory, Part II: Consonance of complex tones and its calculation method. Journal of the Acoustical Society of
Licklider, J.C.R. 'Periodicity' pitch and 'place' pitch. Journal of the Acoustical Society of
Miller, J. R., & Carterette, E. C. Perceptual space for musical structures. Journal of the Acoustical Society of
Patterson, R. D., & Wightman, F. L. Residue pitch as a function of component spacing. Journal of the Acoustical Society of
Pierce, J. R. Attaining consonance in arbitrary scales. Journal of the Acoustical Society of 1966, 40, 249.
Piszczalski, M., & Galler, B. A. Predicting musical pitch from component frequency ratios. Journal of the Acoustical Society of
Plomp, R. The ear as a frequency analyzer. Journal of the Acoustical Society of
Plomp, R. Pitch of complex tones. Journal of the Acoustical Society of
Plomp, R. Auditory psychophysics. Annual Review of Psychology, 1975, 16, 207-232.
Plomp, R. Aspects of tone sensation.
Plomp, R. Fysikaliska motsvarigheter till klangfärg hos stationära ljud. In Vår hörsel och musiken.
Plomp, R., & Levelt, W.J.M. Tonal consonance and critical bandwidth. Journal of the Acoustical Society of
Plomp, R., & Smoorenburg, G. F. (Eds.), Frequency analysis and periodicity detection in hearing.
Plomp, R., & Steeneken, fields. Acustica, 1973, 28, 49-59.
Pollard, H. F. Loudness of pipe organ sounds. I. Plenum combinations. Acustica, 1978, 41, 65-74. (a)
Pollard, H. F. Loudness of pipe organ sounds. II. Single notes. Acustica, 1978, 41, 75-85. (b)
Rintelmann, W. F., Lindberg, R. F., & Smitley, E. K. Temporary threshold shift and recovery patterns and-roll, music presentation. Journal of the Acoustical Society of 1249-1255.
Ritsma, R. J. Frequencies dominant in the perception of the pitch of complex sounds. Journal of the Acoustical Society of
Roederer, J. G. Introduction to the physics and psychophysics of music.
Saldanha, E. L., & Corso, J. F. Timbre cues and the identification of musical instruments. Journal of the Acoustical Society of
Schouten, J. F. The perception of subjective tones. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, 1938, 41, 1083-1093.
Schouten, J. F., Ritsma, R. J., & Cardozo, B. L. Pitch of the residue. Journal of the Acoustical Society of
Schouten, J. F. The perception of timbre. In Report of the Sixth International Congress on Acoustics,
Schroeder, M. R. Models of hearing. Proceedings of the IEEE, 1975, 63, 1332-1350.
Schroeder, M. R. Amplitude behavior of the cubic difference tone. Journal of the Acoustical Society of
Schubert, E. D. (Ed.) Psychological acoustics. 1979 (Benchmark Papers in Acoustics 13).
Seashore, C. E. Psychology of music.
Slaymaker, F. H. Chords from tones having stretched partials. Journal of the Acoustical Society of
Smoorenburg, G. F. Pitch perception of two-frequency stimuli. Journal of the Acoustical Society of
Smoorenburg, G. F. Audibility region of combination tones. Journal of the Acoustical Society of