The Perception of Musical Tones

R. A. Rasch and R. Plomp


A. Introduction


The aim of research in music perception is to explain how we respond subjectively to musical sound signals. In this respect it is a part of psychophysics, the general denomination for scientific fields concerned with the relationship between the objective, physical properties of sensory stimuli in our environment and the subjective, psychological responses evoked by them. If the stimuli are of an acoustic nature, we speak of psychoacoustics. Psychoacoustics can be of a general, theoretical nature; it can also be applied to a certain class of auditory stimuli, such as music and speech. This chapter is devoted to musical psychoacoustics.

The most important topics of musical psychoacoustics are the subjective properties of musical tones (pitch, loudness, timbre) and the phenomena that occur when several tones are presented simultaneously, which is what usually happens in music (beats and roughness, combination tones, consonance and dissonance). We will focus our discussion on these topics. However, before we deal more extensively with them, some attention must be given to the methodology of psychoacoustics and to the frequency-analyzing power of the ear, a capacity that is fundamental to its perceptual functioning.


B. Methodology


Psychoacoustics is an empirical or, rather, experimental science. Observations from daily life and informal tryouts may be starting points for psychoacoustical knowledge, but the core of the scientific content is the result of laboratory investigations. In this respect it is an interdisciplinary field of research. Contributions have been made both by experimental psychologists and by physicists and acousticians.

A psychoacoustical experiment can be described most simply in a stimulus-response scheme. The stimulus is the sound presented to the subject. The experimenter requires the subject to give a response. The experimenter tries to discover the relationship between stimulus and response characteristics. Both stimulus and response are observable events. The subject is considered a "black box" that cannot be entered by the experimenter. Psychoacoustical research is often carried out without an attempt to explain the experimental results functionally in terms of sensory processes. Such attempts are made in research that is labeled physiological acoustics, a part of sensory and neurophysiology.

Our ears are very sensitive organs. Because of this, very accurate control of the stimulus variables is required in psychoacoustical experiments. Sound pressure level differences of less than 1 dB, time differences of a few msec, and frequency differences of less than 1 Hz can have a profound effect on the subjective response to a stimulus. It is impossible to obtain well-controlled psychoacoustic stimuli by manual means, like playing tones or chords on a musical instrument. The precision of the ear in distinguishing fine nuances is much greater than our ability to produce these nuances. As a rule, psychoacoustics makes use of electronic audio equipment that can produce sound stimuli according to any specification. In recent years it has become feasible to run the experiments under computer control. The computer can also be used for storage and analysis of stimuli and response data. Most problems concerning the production of the stimuli in psychoacoustical experiments may be considered solved. After the sound stimulus has been produced, it must reach the subject's eardrum with the least possible distortion. Usually high-quality headphones are used unless the spatial effect of the listening environment is involved. Background noises should be reduced, if not eliminated.


It is possible to have the subject describe his perception verbally. However, this response is often insufficient because our sensations allow much finer distinctions than our vocabulary does. Moreover, the use of words may differ from subject to subject. Because of this, in psychoacoustics most results are derived from responses made on the basis of a certain perception without direct reference to the perception itself. For example, if we have to indicate in which of two time intervals a sound has occurred, the response is a time indication based on an auditory sensation. A great deal of inventiveness is often required of the experimenter in designing his experimental paradigms.

The procedures used most often in psychoacoustical experiments are choice methods and adjustment methods. A single presentation of a sound event (one or more stimuli) to which a response must be made is called a trial. Using choice methods, the subject has to make, for each trial, a choice from a limited set of well-defined alternatives. The simplest case is the one with two alternatives, the two-alternative forced-choice (2AFC) method. The insertion of the word "forced" is essential: The subject is obliged to choose. He must guess when he is incapable of making a meaningful choice.

For example, let us assume that the investigator is studying under what conditions a probe tone can be heard simultaneously with another, masking sound. Each trial contains two successive time periods marked by visual signals. The masking sound is continuously present; the probe tone occurs in one of the two time periods, randomly determined. If the probe tone is clearly detectable, the subject indicates whether it was presented in the first or in the second period. If the tone is not perceived at all, the subject must guess, resulting in an expectation of 50% correct responses. The transition from clearly detectable to not detectable tones is gradual. It is reflected by a gradual slope of the so-called psychometric curve that represents the percentage of correct responses plotted as a function of the sound pressure level of the probe tone. The sound pressure level that corresponds to a score of 75% correct responses is usually adopted as the threshold for detection.
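The 75% criterion can be illustrated with a small sketch. It assumes, purely for illustration, a logistic psychometric curve that rises from 50% (guessing) to 100% correct; the midpoint and slope values are hypothetical and are not taken from any experiment discussed here.

```python
import math

def percent_correct(level_db, midpoint_db=30.0, slope_db=2.0):
    """Hypothetical 2AFC psychometric curve: 50% at chance, 100% when clearly audible."""
    p_detect = 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))
    return 50.0 + 50.0 * p_detect

def threshold_75(lo=0.0, hi=60.0, tol=1e-6):
    """Find the level that gives 75% correct by bisection on the monotonic curve."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if percent_correct(mid) < 75.0:
            lo = mid   # still below criterion: threshold lies higher
        else:
            hi = mid   # at or above criterion: threshold lies lower
    return 0.5 * (lo + hi)

print(round(threshold_75(), 2))
```

With this assumed curve the 75% point coincides with the midpoint of the logistic function, because 75% lies halfway between chance and perfect performance.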

In order to arrive at an accurate estimate of the threshold, the experimenter varies the sound pressure level of the tone for the successive trials. In the constant stimuli method the experimenter presents the tones according to a fixed procedure. The method of constant stimuli is time consuming because a number of trials are definitely supra- or infra-threshold and, therefore, do not give much information. Another class of choice methods, called adaptive methods, makes more efficient use of trials. The experimental series is started with a certain initial value of the stimulus variable. One or more correct responses, depending upon the experimental strategy adopted, result in a change in the stimulus variable that makes it harder for the subject to make a correct choice. If the subject makes one or more false responses, the experimental task is facilitated. In this way, the value of the stimulus variable fluctuates around a certain value, which can be defined to be the threshold for perception.
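An adaptive procedure of this kind can be sketched with a simulated listener. The example below assumes one particular strategy, a two-down/one-up rule (two consecutive correct responses make the task harder, one error makes it easier), and a hypothetical logistic listener; the step size, starting level, and listener parameters are illustrative assumptions. A rule of this type is known to settle near the 70.7% point of the psychometric curve rather than exactly at 75%.

```python
import math
import random

def p_correct(level_db, midpoint_db=30.0, slope_db=2.0):
    """Hypothetical listener: probability of a correct 2AFC response."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))

def staircase(n_trials=400, start_db=50.0, step_db=2.0, seed=1):
    """Two-down/one-up track; returns the mean level over the second half of the run."""
    rng = random.Random(seed)      # seeded so that the simulation is repeatable
    level, run, track = start_db, 0, []
    for _ in range(n_trials):
        track.append(level)
        if rng.random() < p_correct(level):
            run += 1
            if run == 2:           # two consecutive correct responses
                level -= step_db   # lower the level: the task becomes harder
                run = 0
        else:
            level += step_db       # raise the level: the task becomes easier
            run = 0
    tail = track[n_trials // 2:]   # discard the initial descent
    return sum(tail) / len(tail)

print(round(staircase(), 1))
```

Averaging the levels visited after the initial descent is only one of several possible ways to read a threshold from the track; averaging the reversal points is a common alternative.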

Besides choice methods there is the adjustment method. The subject controls the stimulus variable himself, and he uses this control to find an optimal value. This method is not always feasible. The adjustment method is suitable for stimulus variables that allow an optimal quality in perception: the best pitch for a tone in a musical interval, the most comfortable loudness, the greatest similarity or dissimilarity, etc. The optimal adjustment behaves like a stable equilibrium between lower and higher, both suboptimal, adjustments. Adjustment methods have the advantage that the results can be derived directly from the adjusted value, and do not have to be derived indirectly from the psychometric curve.


C. The Ear as a Frequency Analyzer


Only by the ear's capacity to analyze complex sounds are we able to discriminate simultaneous tones in music. Frequency analysis may be considered the most characteristic property of the peripheral ear. The cochlea is divided over its entire length into two parts by the basilar membrane. In 1942 Von Bekesy was the first to observe, with ingenious experimentation, that at every point along its length this membrane vibrates with maximum amplitude for a specific frequency. This finding confirmed the hypothesis, launched 80 years earlier by Helmholtz, that the cochlea performs a frequency analysis. Sound components with high frequencies are represented close to the base; components with low frequencies are represented near the apex of the cochlea. The frequency scale of the sound is converted into a spatial scale along the basilar membrane.

This capacity of the ear means that any periodic sound wave or complex tone is resolved into its frequency components, also called partials or harmonics (see Fig. 1). In mathematics the analogous procedure of determining the sinusoidal components of a periodic function is called Fourier analysis. In contrast with the theoretically perfect Fourier analysis, the frequency-analyzing power of the ear is limited: Only the lower harmonics can be analyzed individually.
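The mathematical side of this comparison can be made concrete with a minimal sketch of a discrete Fourier analysis. The test signal (three harmonics with amplitudes 1, 0.5, and 0.25) and the number of samples per period are arbitrary assumptions chosen for illustration; the code says nothing about how the ear itself performs the analysis.

```python
import math

def harmonic_amplitudes(samples, n_harmonics):
    """Amplitudes of harmonics 1..n of a signal that spans exactly one period."""
    n = len(samples)
    amps = []
    for k in range(1, n_harmonics + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        amps.append(2.0 * math.hypot(re, im) / n)
    return amps

# One period of a hypothetical complex tone: (harmonic number, amplitude) pairs.
PARTIALS = [(1, 1.0), (2, 0.5), (3, 0.25)]
N = 64
tone = [sum(a * math.sin(2 * math.pi * h * i / N) for h, a in PARTIALS)
        for i in range(N)]

amps = harmonic_amplitudes(tone, 4)
print([round(a, 3) for a in amps])
```

The analysis recovers the amplitude of each harmonic that was put into the signal and finds essentially nothing at the fourth harmonic, which was absent.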

There are many ways of studying the extent to which the ear can separate simultaneous tones. Only two approaches will be considered here. The first method investigates how many harmonics (with frequencies nf, n = 1, 2, 3, 4, etc.) can be distinguished in a complex tone. This can be done by using the 2AFC procedure: The listener has to decide which of two simple (sinusoidal) tones, one with frequency nf, the other with frequency (n ± 1/2)f, is also present in the complex tone. The percentage of correct responses varies from 100 for low values of n to about 50 for high values of n. Experiments along these lines have shown (Plomp, 1964) that, on the average, listeners are able to distinguish the first five to seven harmonics.

A quite different approach involves measuring the minimum sound pressure level necessary for a probe tone to be audible when presented with a complex tone. This is the so-called masked threshold; by varying the probe-tone frequency, we obtain the "masking pattern" of the complex tone. In Fig. 2 such a pattern is reproduced. The masking pattern of a complex tone of 500 Hz reveals individual peaks corresponding to the first five harmonics, nicely demonstrating the limited frequency-analyzing power of the ear.

The usual measure indicating how well a system is able to analyze complex signals is its bandwidth. The finding that the fifth harmonic can be distinguished from the fourth and the sixth means that the mutual distance should be a minor third or more. This distance constitutes a rough, general estimate of the bandwidth of the hearing mechanism, known in the psychophysical literature as the critical bandwidth (Fig. 3). A detailed review (Plomp, 1976) revealed that the bandwidth found experimentally is dependent on the experimental conditions. The values may differ by a factor of two.
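The link between harmonic number and musical interval can be checked with a short calculation. The sketch below converts the frequency ratio of adjacent harmonics, (n + 1)/n, into equal-tempered semitones; the use of the semitone as a unit is an assumption made here for convenience.

```python
import math

def interval_semitones(ratio):
    """Size of a frequency ratio in equal-tempered semitones (12 per octave)."""
    return 12.0 * math.log2(ratio)

# Interval between harmonic n and harmonic n + 1 for the lower harmonics.
for n in range(1, 9):
    print(n, n + 1, round(interval_semitones((n + 1) / n), 2))
```

The spacing shrinks steadily with harmonic number. Between the fifth and sixth harmonics it is slightly more than three semitones, that is, about a minor third, and above that the adjacent harmonics lie closer together than this interval.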


In the lower frequency region (below 500 Hz) critical bandwidth is more or less constant if expressed in Hz. That means that musical intervals (frequency ratios) larger than the critical bandwidth at high frequencies may fall within the critical bandwidth at lower frequencies.




A. Pitch


Pitch is the most characteristic property of tones, both simple (sinusoidal) and complex. Pitch systems (like the diatonic-chromatic and the 12-tone system) are among the most elaborate and intricate ever developed in Western and non-Western music. Pitch is related to the frequency of a simple tone and to the fundamental frequency of a complex tone. The frequency of a tone is a property that can usually be controlled in production and is well preserved during its propagation to the listener's ears.

For our purposes, pitch may be characterized as a one-dimensional attribute, i.e., all tones can be ordered along a single scale with respect to pitch. The extremes of this scale are low (tones with low frequencies) and high (tones with high frequencies). Sometimes tones with different spectral compositions (timbres) are not easily comparable as to pitch. It is possible that the clearness of pitch varies, for example, as a result of important noise components or inharmonic partials, or that the subjective character of the pitch varies, for example, when comparing the pitch of simple and complex tones. There are a number of subjective pitch scales:

1. The mel scale (see Stevens, Volkmann, & Newman, 1937). A simple tone of 1000 Hz has a defined pitch of 1000 mel. The pitch in mels of tones with other frequencies must be determined by comparative scaling experiments. A sound with a pitch subjectively twice that of a 1000 Hz tone is 2000 mel; "half pitch" is 500 mel, etc. Since there is no unambiguous subjective meaning of "a pitch half as high" or "double as high," the mel scale is a rather unreliable scale. It is not used very often.

2. The musical pitch scale (i.e., the ordinary indications C1, D1, ..., C4, ..., A4, etc.). These indications are only usable in musical situations.

3. The physical frequency scale in Hz. In psychoacoustical literature the pitch of a tone is often indicated by its frequency or, in the case of complex tones, by its fundamental frequency. Since the correspondence between frequency and pitch is monotonic, frequency is a rough indication of our pitch sensation. It must be realized, however, that our perception operates more or less on the basis of a logarithmic frequency scale.

Pitch in its musical sense has a range of about 20 to 5000 Hz, roughly the range of the fundamental frequencies of piano strings and organ pipes. Tones with higher frequencies are audible but without definite pitch sensation. Low tones in the range of 10 to 50 Hz can have the character of a rattling sound. The transition from the perception of single pulses to a real pitch sensation is gradual. Pitch can be perceived after very few periods of the sound wave have been presented to the ear.


Simple tones have unambiguous pitches that can be indicated by means of their frequencies. These frequencies may serve as reference frequencies for the pitches of complex tones. The pitch sensation of complex tones is much more difficult to understand than the pitch of simple tones. As was discussed, the first five to seven harmonics of a complex tone can be distinguished individually if the listener's attention is drawn to their possible presence.

However, a complex tone, as heard in practice, is characterized by a single pitch, the pitch of the fundamental component. This pitch will be referred to as low pitch here.

In psychoacoustical literature this pitch is also known under a variety of other terms, such as periodicity pitch, repetition pitch, residue pitch, and virtual pitch. Experiments (Terhardt, 1971) have shown that the pitch of a complex tone with fundamental frequency f is somewhat lower than that of a sinusoidal tone with frequency f. The existence of low pitch of a complex tone raises two questions. First, why are all components of the complex tones perceived as a perceptual unit; that is, why do all partials fuse into one percept? Second, why is the pitch of this perceptual tone the pitch of the fundamental component?

The first question can be answered with reference to the Gestalt theory of perception. The "Gestalt explanation" may be formulated as follows. The various components of a complex tone are always present simultaneously. We become familiar with the complex tones of speech signals (both of our own speech and of other speakers) from an early age. It would not be efficient to perceive them all separately. All components point to a single source and meaning so that perception of them as a unit gives a simpler view of the environment than separate perception. This mode of perception must be seen as a perceptual learning process. Gestalt psychology has formulated a number of laws that describe the perception of complex sensory stimuli. The perception of low pitch of complex tones can be classed under the heading of the "law of common fate." The harmonics of a complex tone exhibit "common fate."

The second question can also be answered with the help of a learning process directed toward perceptual efficiency. The periodicity of a complex tone is the most constant feature in its composition. The amplitudes of the partials are subjected to much variation, caused by selective reflection, absorption, passing of objects, etc. Masking can also obscure certain partials. The periodicity, however, is a very stable and constant factor in a complex tone. This is reflected in the wave form built up from harmonics. The periodicity of a complex tone is at the same time the periodicity of the fundamental component of the tone.

The perception of complex tones can be seen as a pattern recognition process. The presence of a complete series of harmonics is not a necessary condition for the pitch recognition process to succeed. It is sufficient that at least a few pairs of adjacent harmonics are present so that the periodicity can be determined. It is conceivable that there is a perceptual learning process that makes possible the recognition of fundamental periodicity from a limited number of harmonic partials. This learning process is based on the same experiences as those that led to singular pitch perception. Pattern recognition theories of the perception of low pitch are of relatively recent origin. Several times they have been worked out in detailed mathematical models that simulate the perception of complex tones (Goldstein, 1973; Wightman, 1973; Terhardt, 1974a; see also de Boer, 1976, 1977; Patterson & Wightman, 1976; Gerson & Goldstein, 1978; Houtsma, 1979; Piszczalski & Galler, 1979). It will probably take some time before the questions about the low singular pitch of complex tones are completely solved.

The classical literature on tone perception abounds with theories based on von Helmholtz's (1863) idea that the low pitch of a complex tone is based on the relative strength of the fundamental component. The higher harmonics are thought only to influence the timbre of the tones but not to be strong enough to affect pitch. However, low pitch perception also occurs when the fundamental component is not present in the sound stimulus. This was already observed by Seebeck (1841) and brought to the attention of the modern psychoacousticians by Schouten (1938). These observations led Schouten to the formulation of a periodicity pitch theory. In this theory pitch is derived from the waveform periodicity of the unresolved higher harmonics of the stimulus, the residue. This periodicity does not change if a component (e.g., the fundamental one) is removed. With this theory the observations of Seebeck and Schouten concerning tones without fundamental components could be explained. An attempt has also been made to explain the low pitch of a tone without fundamental ("the missing fundamental") as the result of the occurrence of combination tones, which provide a fundamental component in the inner ear. However, when these combination tones are effectively masked by low-pass noise, the sensation of low pitch remains (Licklider, 1954).
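That the waveform periodicity survives removal of the fundamental can be verified numerically. The sketch below uses a hypothetical test signal (equal-amplitude sine harmonics of 200 Hz, sampled at an assumed 48 kHz) and checks whether the waveform repeats after a given delay; it illustrates the physical periodicity only, not the perceptual process.

```python
import math

def complex_tone(t, f0, harmonics):
    """Sum of equal-amplitude sine harmonics of f0 (hypothetical test signal)."""
    return sum(math.sin(2 * math.pi * h * f0 * t) for h in harmonics)

def repeats_after(shift, f0, harmonics, n_points=480, rate=48000.0):
    """True if the waveform is unchanged when delayed by `shift` seconds."""
    return all(
        abs(complex_tone(i / rate, f0, harmonics)
            - complex_tone(i / rate + shift, f0, harmonics)) < 1e-9
        for i in range(n_points)
    )

F0 = 200.0
with_fundamental = repeats_after(1 / F0, F0, [1, 2, 3, 4, 5])
without_fundamental = repeats_after(1 / F0, F0, [3, 4, 5])
shorter_period = repeats_after(1 / (2 * F0), F0, [3, 4, 5])
print(with_fundamental, without_fundamental, shorter_period)
```

Both signals repeat every 1/200 s, whereas the tone built from harmonics 3, 4, and 5 alone does not repeat after half that time, so its period is still that of the absent 200 Hz fundamental.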

In musical practice complex tones with weak or absent fundamentals are very common. Moreover, musical tones are often partially masked by other tones. These tones can, however, possess very clear low pitches. Effective musical sound stimuli are often incomplete when compared to the sound produced by the source (instrument, voice). Partials that fall in the so-called dominance region are most influential with regard to pitch. One way of showing this is to work with tones with inharmonic partials. Assume a tone with partials of 204, 408, 612, 800, 1000, and 1200 Hz. The first three partials in isolation would give a pitch of "204 Hz." All six together give a pitch of "200 Hz" because of the relative weight of the higher partials, which lie in the dominance region. The low pitch of complex tones with low fundamental frequencies (under 500 Hz) depends on the higher partials. The low pitch of tones with high fundamental frequencies is determined by the fundamental because it lies in the dominance region.

Tones with inharmonic components have been used quite frequently in tone perception research. An approximation of the pitch evoked by them is the fundamental of the least-deviating harmonic series. Assume a tone with components of 850, 1050, 1250, 1450, 1650 Hz. The least-deviating harmonic series is 833, 1042, 1250, 1458, and 1667 Hz, which contains the fourth, fifth, sixth, seventh, and eighth harmonics of a complex tone with a fundamental of 208.3 Hz. This fundamental can be used as an approximation of the pitch sensation of the inharmonic complex (Fig. 4). Let us consider an inharmonic tone with frequency components of 900, 1100, 1300, 1500, 1700 Hz. This tone has an ambiguous pitch, since two approximations by harmonic series are possible, namely one with a fundamental of 216.7 Hz (the component of 1300 Hz being the sixth harmonic in this case) and one with a fundamental of 185.7 Hz (1300 Hz being the seventh harmonic).
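The arithmetic behind these approximations can be sketched as follows. The function is a deliberate simplification that mirrors the worked numbers in the text: it takes the central component as an exact harmonic of the candidate fundamental and reports how far the other components deviate from the resulting harmonic series. The function name and the root-mean-square deviation measure are illustrative assumptions, not a model of pitch perception.

```python
def harmonic_fit(components, center_harmonic):
    """Fit a harmonic series by treating the middle component as `center_harmonic`.

    Returns (fundamental, fitted series, RMS deviation in Hz).
    """
    mid = len(components) // 2
    f0 = components[mid] / center_harmonic
    series = [f0 * (center_harmonic + i - mid) for i in range(len(components))]
    rms = (sum((c - s) ** 2 for c, s in zip(components, series)) / len(components)) ** 0.5
    return f0, series, rms

tone_a = [850.0, 1050.0, 1250.0, 1450.0, 1650.0]
tone_b = [900.0, 1100.0, 1300.0, 1500.0, 1700.0]

for name, tone in (("a", tone_a), ("b", tone_b)):
    for n in (5, 6, 7):
        f0, _, rms = harmonic_fit(tone, n)
        print(name, n, round(f0, 1), round(rms, 1))
```

For the first tone, taking 1250 Hz as the sixth harmonic gives a clearly smaller deviation than the neighboring choices. For the second tone, the sixth-harmonic and seventh-harmonic readings of 1300 Hz deviate by comparable amounts, which is consistent with the ambiguity described above.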

If not all partials of a complex tone are necessary for low pitch perception, how few of them are sufficient? The following series of experimental investigations show a progressively decreasing number (see Fig. 5). De Boer (1956) worked with five harmonics in the dominant region; Schouten, Ritsma, and Cardozo (1962), with three; Smoorenburg (1970), with two; Houtsma and Goldstein (1972), with one plus one, that is, one partial presented to each ear. In the latter case it is also possible to elicit low pitch perception. The authors concluded that low pitch was a central neural process not brought about by the peripheral sense organ (the ears). The last step in the series should be a low pitch perception evoked by one partial. That this is also possible has been shown by Houtgast (1976). The following conditions have to be fulfilled: The frequency region of the low pitch has to be filled with noise, the single partial must have a low signal-to-noise ratio, and attention has to be directed to the fundamental frequency region by prior stimuli. These conditions create a perceptual situation in which it is not certain that the fundamental is not there so that we are brought to the idea that it should be there by inference from earlier stimuli.


B. Loudness

The physical correlate that underlies the loudness of a tone is intensity, usually expressed as sound pressure level (SPL) in dB. Sound pressure level is a relative measure, expressed either relative to a zero level defined in the experimental situation or relative to a general reference sound pressure of 2 × 10⁻⁵ N/m². Sound pressure levels of performed music vary roughly from 40 dB for a pianissimo to about 90 dB for a full orchestral forte-tutti (Winckel, 1962). By means of electronic amplification higher levels are reached in pop concerts. These levels, sometimes beyond 100 dB, are potentially damaging to the ear in case of prolonged presentation (Flugrath, 1969; Rintelman, Lindberg, & Smitley, 1972; Wood & Lipscomb, 1972; Fearn, 1975a,b).
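The decibel values quoted here can be related to sound pressure with the standard definition of sound pressure level, 20 log10(p/p0), using the reference pressure given in the text. The function names in this sketch are illustrative.

```python
import math

P_REF = 2e-5  # reference sound pressure in N/m^2, as given in the text

def spl_db(pressure_rms):
    """Sound pressure level in dB relative to P_REF."""
    return 20.0 * math.log10(pressure_rms / P_REF)

def pressure_from_spl(level_db):
    """RMS sound pressure in N/m^2 for a given sound pressure level in dB."""
    return P_REF * 10.0 ** (level_db / 20.0)

# Pressure ratio between a 90 dB forte-tutti and a 40 dB pianissimo.
ratio = pressure_from_spl(90.0) / pressure_from_spl(40.0)
print(round(ratio, 1))
```

A 50 dB range thus corresponds to a ratio of roughly 316 in sound pressure, or 100,000 in intensity, which shows how strongly the logarithmic dB scale compresses the physical range of musical dynamics.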

The subjective assessment of loudness is more complicated than the physical measurement of the sound pressure level. Several loudness scales have been proposed during the last decades. None of them, however, can be applied fully satisfactorily in all conditions. We give the following summary review:

1. The sone scale, a purely psychophysical loudness scale (Stevens, 1936). The loudness of a simple (sinusoidal) tone of 1000 Hz with a sound pressure level of 40 dB is defined to be 1 sone; a tone with double loudness is assigned the loudness of 2 sones, etc. In general, a sound of X sones is n times louder than a sound of X/n sones. The experimental determination of the relationship between the physical sound level and the psychophysical loudness is not very reliable because of the uncertainty of what is actually meant by "X times louder."

2. The phon scale, a mixed physical-psychophysical loudness scale with scale values expressed in dB and, therefore, termed loudness level (LL). The loudness level of a sound in phons is equal to the sound pressure level of a 1000 Hz tone with the same loudness. For tones of 1000 Hz the identity relation SPL = LL holds. The loudness level of simple tones with other frequencies and of complex tones or other sounds (noises, etc.) is found by comparison experiments, which can be done with acceptable reliability. These comparisons may be used to draw contours of equal loudness as a function of, for example, frequency.

3. The sensation-level scale, also a mixed scale. Sensation level is defined as the sound pressure level relative to threshold level and, as such, is also expressed in dB. It may differ as a function of frequency or other characteristics of a sound but also from subject to subject.

4. In many papers on psychoacoustics no loudness indications are given. Instead, physical levels are mentioned. For the investigator this is the most precise reference and at the same time a rough indication of subjective loudness.

In the description of the relation between sound pressure level and loudness, a clear distinction must be made between sounds with all spectral energy within one critical band and sounds with spectral energy spread over more than one critical band. If all sound energy is limited to one critical band, the loudness L in sones increases monotonically with intensity I. The relation is often approximated by the equation


$$L = k I^{n}, \qquad [n \text{ may be } 0.33]$$

in which k and n are empirically chosen constants. A consequence of this relation is the rule that equal intensity ratios result in equal loudness ratios. Now, an intensity ratio is a fixed level difference (dB) ...

Psychophysicists have been much interested in the level difference that results in doubling or halving loudness, and many experiments have been carried out to establish this. The outcomes of these experiments are disappointingly dissimilar. Stevens (1955) summarized all experiments known to him with the median value of 10 dB for doubling loudness, later (1972) modified to 9 dB. These values correspond to values of n = 0.3 and n = 0.33 for the exponent in the formula. It is also possible to interpret the subjective loudness judgment as an imaginary judgment of distance to the sound source. In this theory (Warren, 1977) half loudness must correspond to double distance, which gives, in free field conditions, a decrease of 6 dB sound pressure level. Warren conducted experiments in which this value is indeed found.
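The correspondence between the exponent n and the level difference for doubling loudness follows directly from the power law: if loudness is proportional to the nth power of intensity, doubling loudness requires an intensity ratio of 2^(1/n), that is, a level difference of 10 log10(2)/n dB. The sketch below works this out, together with the free-field level drop for a doubling of distance (sound pressure inversely proportional to distance); the function names are illustrative.

```python
import math

def db_for_doubling(n):
    """Level difference in dB that doubles loudness when L = k * I**n."""
    return 10.0 * math.log10(2.0) / n

def exponent_for_doubling(delta_db):
    """Exponent n implied by a given level difference for doubling loudness."""
    return 10.0 * math.log10(2.0) / delta_db

def spl_drop_for_distance_ratio(ratio):
    """Free-field decrease in sound pressure level when distance grows by `ratio`."""
    return 20.0 * math.log10(ratio)

print(round(exponent_for_doubling(10.0), 3), round(exponent_for_doubling(9.0), 3))
print(round(spl_drop_for_distance_ratio(2.0), 2))
```

Read through the same power law, a 6 dB step for halving or doubling loudness would imply an exponent of about 0.5, noticeably larger than the values of 0.3 and 0.33 derived from Stevens's summaries.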

The assessment of loudness is a complicated matter if sound energy is present in more than one critical band. This situation is the common one for musical tones, especially for chords, and music played by ensembles, choirs, and orchestras. Total loudness is greater than when the same amount of sound energy is concentrated within one critical band. A number of models have been proposed that intend to be simulations of the perceptual processes involved and the parameters of which have been assigned values in accordance with psychophysical experiments. Well known are the models by Stevens (1955), Zwicker, Flottorp, & Stevens (1957), Zwicker and Scharf (1965), and Stevens (1972). These models have also been applied to musical sounds, especially to organ tones (Churcher, 1962; Pollard, 1978a,b).

Although loudness variations play an important role in music, they are less important than pitch variations. The number of assignable loudness degrees in music is limited to about five, coded musically from soft to loud as pianissimo, piano, mezzoforte, forte, and fortissimo. The definition of these loudness degrees is rather imprecise (Clark & Milner, 1964; Clark & Luce, 1965; Patterson, 1974). Judgment of musical loudness cannot have the degree of reliability and preciseness that is possible with the judgment of (relative) pitch, duration, tempo, etc. This is a consequence of the fact that the underlying physical dimension, intensity, is hard to control precisely. Sources of variation are encountered in sound production, in the fixed acoustic conditions of a room (absorption and thus attenuation by walls, floor, ceiling, etc.), in variable acoustic conditions (like the presence or the absence of an audience, the relative positions of sound source and listener, disturbing external noises), and in the audiograms of the listeners. In all the stages on the road from sound production to sound perception, sound pressure level is liable to be altered whereas frequency is not.


C. Timbre

Timbre is, after pitch and loudness, the third attribute of the subjective experience of musical tones. Subjectively, timbre is often coded as the function of the sound source or of the meaning of the sound. We talk about the timbre of certain musical instruments, of vowels, and of sounds that signify certain events in our environment (apparatus, sounds from nature, footsteps, etc.).

What are the physical parameters that contribute to the perception of a certain timbre? In a restricted sense timbre may be considered the subjective counterpart of the spectral composition of tones. Especially important is the relative amplitude of the harmonics. This view was first stated by Helmholtz over a century ago and is reflected by the definition of timbre according to the American Standards Association (Acoust. Terminology S1.1., 1960):

"Timbre is that attribute of auditory sensation in terms of which a listener can judge that two steady-state complex tones having the same loudness and pitch are dissimilar."

Recent research has shown that temporal characteristics of the tones may have a profound influence on timbre as well, which has led to a broadening of the concept of timbre (Schouten, 1968). Both onset effects (rise time, presence of noise or inharmonic partials during onset, unequal rise of partials, characteristic shape of rise curve, etc.) and steady state effects (vibrato, amplitude modulation, gradual swelling, pitch instability, etc.) are important factors in the recognition and, therefore, in the timbre of tones. Experiments (Clark, Robertson, & Luce, 1964; Berger, 1964; Saldanha & Corso, 1964) have shown that the identification of instrumental sounds is impaired when temporally characteristic parts of tones (especially the onsets) are removed.

Sounds cannot be ordered on a single scale with respect to timbre. Timbre is a multidimensional attribute of the perception of sounds. Dimensional research is highly time-consuming and is therefore always done with a restricted set of sound stimuli. The dimensions found in such an investigation are of course determined by the stimulus set.

Dimensional research of timbre leads to the ordering of sound stimuli on the dimensions of a timbre space. An example of such research is that by Von Bismarck (1974a,b). His stimulus set contained a large number (35) of tone and noise stimuli. The most important factors found by him can be characterized as follows:

(a) sharpness, determined by a distribution of spectral energy that has its gravity point in the higher frequency region and

(b) compactness, a factor that distinguishes between tonal (compact) and noise (not compact) aspects of sound.

In some investigations sound stimuli have been submitted to multidimensional scaling, both perceptual and physical. The physical scaling can be based on the spectral composition of the sounds, as was done in Plomp's (1979) experiments with tones from a number of organ stops. Figure 6 gives the two-dimensional representation of 10 sounds, both perceptual and physical. The representations correspond rather well, leading to the conclusion that in this set of stimuli the sound spectrum is the most important factor in the perception of timbre.

Other examples of dimensional research on timbre are the investigations by Plomp (1970), Wedin and Goude (1972), Plomp and Steeneken (1973), Miller and Carterette (1975), Grey (1977), and de Bruijn (1978).




A. Beats and Roughness


In this and the following sections we will discuss perceptual phenomena that occur as the result of two simultaneous tones. We will call the simultaneously sounding tones the primary tones.

We consider first the case of two simultaneous simple tones. Several conditions can be distinguished, depending on frequency difference (Fig. 7). If the two primary tones have equal frequencies, they fuse into one tone, in which the intensity depends on the phase relation between the two primary tones. If the tones differ somewhat in frequency, the result is a signal with periodic amplitude and frequency variations with a frequency equal to the frequency difference.


The amplitude variations, however, can be considerable and result in a fluctuating intensity and perceived loudness. These loudness fluctuations are called beats, if they can be discerned individually by the ear, which occurs if their frequency is less than about 20 Hz. A stimulus equal to the sum of two simple tones with equal amplitudes and frequencies f and g


$$p(t) = \sin 2\pi f t + \sin 2\pi g t$$


can be described as


$$p(t) = 2 \cos\left[2\pi \tfrac{1}{2}(g - f)t\right] \sin\left[2\pi \tfrac{1}{2}(f + g)t\right]$$


This is a signal with a frequency that is the average of the original primary frequencies, and an amplitude that fluctuates slowly with a beat frequency of g - f Hz (Fig. 8). Amplitude variation is less strong if the two primary tones have different amplitudes.
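The equivalence of the sum form and the product form can be checked numerically. The following is a minimal sketch, not part of the original chapter; the function names and the example frequencies (440 and 444 Hz) are illustrative assumptions.

```python
import math

def two_tone_sum(f, g, t):
    """Sum of two equal-amplitude simple tones with frequencies f and g (Hz)."""
    return math.sin(2 * math.pi * f * t) + math.sin(2 * math.pi * g * t)

def two_tone_product(f, g, t):
    """The same signal written as a slowly varying envelope times a carrier."""
    envelope = 2 * math.cos(2 * math.pi * 0.5 * (g - f) * t)
    carrier = math.sin(2 * math.pi * 0.5 * (f + g) * t)
    return envelope * carrier

def beat_frequency(f, g):
    """Number of loudness maxima per second, |g - f| Hz."""
    return abs(g - f)

# Illustrative values (assumed, not taken from the text).
f, g = 440.0, 444.0
max_error = max(
    abs(two_tone_sum(f, g, n / 8000.0) - two_tone_product(f, g, n / 8000.0))
    for n in range(8000)
)
print(max_error < 1e-9, beat_frequency(f, g))
```

The envelope cosine has a frequency of (g - f)/2, but its magnitude reaches a maximum twice per period, which is why the loudness fluctuates g - f times per second.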

When the frequency difference is larger than about 20 Hz, the ear is no longer able to follow the rapid amplitude fluctuations individually. Instead of the sensation of fluctuating loudness, there is a rattle-like sensation called roughness. Beats and roughness can only occur if the two primary tones are not resolved by the ear (that means, not processed separately but combined). If the frequency difference is larger than the critical band, the tones are perceived individually with no interference phenomena.
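The regions described so far can be summarized as a simple decision rule. This is only a sketch: the 20-Hz limit is the approximate figure quoted above, the transitions are gradual in reality, and the critical bandwidth is left as a parameter because its value depends on the frequency region. The function name and the 160-Hz bandwidth in the example are assumptions.

```python
def two_tone_sensation(f, g, critical_bandwidth):
    """Classify the interaction of two simple tones by their frequency difference.

    critical_bandwidth (Hz) must be supplied by the caller; it is not
    computed here.
    """
    difference = abs(g - f)
    if difference == 0:
        return "one fused tone"
    if difference >= critical_bandwidth:
        return "two separate tones"
    if difference < 20.0:  # approximate limit quoted in the text
        return "beats"
    return "roughness"

# Assumed critical bandwidth of 160 Hz around 1000 Hz, for illustration only.
for g in (1000.0, 1004.0, 1050.0, 1300.0):
    print(g, two_tone_sensation(1000.0, g, 160.0))
```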

In musical sounds beats can occur with just noncoinciding harmonics of mistuned consonant intervals of complex tones. If the fundamental frequencies of the tones of an octave (theoretically 1:2) or fifth (2:3) differ a little from the theoretical ratio, there will be harmonics that differ slightly in frequency and will cause beats. These beats play an important role when tuning musical instruments.
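As a worked example of beating harmonics, consider a fifth mistuned from 200:300 Hz to 200:301 Hz. The sketch below lists the harmonic pairs that nearly coincide and their beat rates. The function name, the limit of six harmonics, and the example frequencies are illustrative assumptions.

```python
def beating_harmonic_pairs(f_low, f_high, n_harmonics=6, max_beat_hz=20.0):
    """Return (m, n, beat rate in Hz) for harmonics m*f_low and n*f_high
    that differ by less than max_beat_hz without coinciding exactly."""
    pairs = []
    for m in range(1, n_harmonics + 1):
        for n in range(1, n_harmonics + 1):
            beat = abs(m * f_low - n * f_high)
            if 0 < beat < max_beat_hz:
                pairs.append((m, n, beat))
    return pairs

print(beating_harmonic_pairs(200.0, 300.0))  # pure fifth: []
print(beating_harmonic_pairs(200.0, 301.0))  # [(3, 2, 2.0), (6, 4, 4.0)]
```

With the pure 2:3 ratio the third harmonic of the lower tone and the second harmonic of the upper tone coincide at 600 Hz and no beats occur; with the upper tone 1 Hz sharp they beat at 2 Hz, which is the cue a tuner listens for.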


No psychophysical research has been done on mistuned intervals of complex tones, but to a certain extent psychophysical results found with two beating simple tones and with amplitude-modulated simple tones (see Fig. 9) can be applied to the perception of beating mistuned intervals of complex tones (Zwicker, 1952; Terhardt, 1968a,b, 1974b). The following relations can be stated. Thresholds vary with beat frequency. There appears to be a minimum at about 5 to 10 Hz. The threshold decreases when the sound pressure level increases. It is possible to define perceptual quantities called beating strength and roughness strength and to determine their values as a function of stimulus characteristics. Research following this line has shown that such a quantity increases with modulation depth and with sound pressure level. Moreover, there seems to be a modulation frequency giving maximal roughness (about 50 to 70 Hz).



B. Combination Tones


Two simple tones at a relatively high sound pressure level and with a frequency difference that is not too large can give rise to the perception of so-called combination tones. These combination tones arise in the ear as a product of nonlinear transmission characteristics. The combination tones are not present in the acoustic signal. However, they are perceived as if they were present. The ear cannot distinguish between perceived components that are "real" (in the stimulus) and those that are not (combination tones). The combination tones are simple tones that may be cancelled effectively by adding a real simple tone with the same frequency and amplitude but opposite phase. This cancellation tone can be used to investigate combination tones.

The possible frequencies of combination tones can be derived from a general transmission function. Assume a stimulus with two simple tones:

$$p(t) = \cos 2\pi f t + \cos 2\pi g t$$


f and g being the two frequencies. Linear transmission is described by

$$d = ap + c$$

(a and c being constants). If transmission is not linear, higher order components are introduced:

$$d = a_1 p + a_2 p^2 + a_3 p^3 + \cdots$$

The quadratic term can be developed as follows:

$$p^2 = (\cos 2\pi f t + \cos 2\pi g t)^2 = 1 + \tfrac{1}{2}\cos 2\pi 2f t + \tfrac{1}{2}\cos 2\pi 2g t + \cos 2\pi (f + g)t + \cos 2\pi (f - g)t$$


It can be seen that components with frequencies 2f, 2g, f + g, and f - g are introduced in this way. Similarly, the cubic term can be developed:


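$$\begin{aligned}
p^3 &= (\cos 2\pi f t + \cos 2\pi g t)^3 \\
&= \tfrac{9}{4}\cos 2\pi f t + \tfrac{9}{4}\cos 2\pi g t + \tfrac{1}{4}\cos 2\pi 3f t + \tfrac{1}{4}\cos 2\pi 3g t \\
&\quad + \tfrac{3}{4}\cos 2\pi (2f + g)t + \tfrac{3}{4}\cos 2\pi (2g + f)t \\
&\quad + \tfrac{3}{4}\cos 2\pi (2f - g)t + \tfrac{3}{4}\cos 2\pi (2g - f)t
\end{aligned}$$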


This term is responsible for components with frequencies 3f, 3g, 2f + g, 2g + f, 2f - g, 2g - f. The higher terms of the nonlinear transmission formula can be worked out analogously. The factors just preceding the cosine terms indicate the relative amplitudes of the components in their groups. Psychoacoustical research on combination tones has shown that the pitches of the combination tones agree with the frequencies predicted by nonlinear transmission (Plomp, 1965; Smoorenburg, 1972a,b; Hall, 1975; Weber & Mellert, 1975; Schroeder, 1975b; Zurek & Leshowitz, 1976). However, the correspondence between the relative amplitude predicted and the subjective loudness measured is far from perfect. Clearly, the phenomenon of combination tones is more complicated than can be described in a simple formula. Moreover, there are individual differences, which should be expected since this is a distortion process.

Experiments have shown (see Fig. 10) that the following combination tone frequencies are the most important: the so-called difference tone with frequency g - f Hz, the second-order difference tone with frequency 2f - g Hz, and the third-order difference tone with frequency 3f - 2g Hz. The diagram illustrates that the combination tones are stronger for small frequency differences of the primary tones than for large differences; this indicates that the origin of combination tones is tightly connected with the frequency-analyzing process in the inner ear. It should be noted that the importance of summation tones (with frequency f + g) and the so-called aural harmonics (with frequencies 2f, 3f, etc., and 2g, 3g, etc.) is questionable. Although combination tones were discovered by musicians in musical contexts (Tartini and Sorge in the eighteenth century), their significance for music is not very high. They can be easily evoked by playing loud tones in the high register on two flutes or recorders or double stops on the violin.

In a normal listening situation, however, their levels are usually too low to attract attention. Moreover, they will be masked by the tones of other (lower) instruments. Some violin teachers (following Tartini) advise the use of combination tones as a tool for controlling the intonation of double-stop intervals. Because audible combination tones behave more as simple tones in lower frequency regions than the complex tones to be intonated, a pitch comparison of combination tones and played tones should not be given too much weight.
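The component frequencies predicted by the quadratic and cubic terms of the transmission function, and the coefficients of the cubic expansion, can be checked numerically. The sketch below is illustrative only: the function names and the primary frequencies (1000 and 1200 Hz) are assumptions, and the code says nothing about the perceived loudness of the components, which, as noted above, the simple formula does not predict well.

```python
import math

def combination_tone_frequencies(f, g):
    """Frequencies (Hz) introduced by the quadratic and cubic terms, for f < g."""
    quadratic = {"2f": 2 * f, "2g": 2 * g, "f+g": f + g, "g-f": g - f}
    cubic = {
        "3f": 3 * f, "3g": 3 * g,
        "2f+g": 2 * f + g, "2g+f": 2 * g + f,
        "2f-g": 2 * f - g, "2g-f": 2 * g - f,
    }
    return quadratic, cubic

def cubic_term_direct(f, g, t):
    """p(t) cubed, computed directly from the two-tone stimulus."""
    p = math.cos(2 * math.pi * f * t) + math.cos(2 * math.pi * g * t)
    return p ** 3

def cubic_term_expanded(f, g, t):
    """p(t) cubed, computed from the sum of cosine components."""
    def c(freq):
        return math.cos(2 * math.pi * freq * t)
    return (9 / 4 * (c(f) + c(g))
            + 1 / 4 * (c(3 * f) + c(3 * g))
            + 3 / 4 * (c(2 * f + g) + c(2 * g + f) + c(2 * f - g) + c(2 * g - f)))

# Illustrative primaries (assumed, not taken from the text).
f, g = 1000.0, 1200.0
quadratic, cubic = combination_tone_frequencies(f, g)
print(quadratic["g-f"], cubic["2f-g"])  # 200.0 800.0

worst = max(
    abs(cubic_term_direct(f, g, n / 48000.0) - cubic_term_expanded(f, g, n / 48000.0))
    for n in range(4800)
)
print(worst < 1e-9)
```

The third-order difference tone 3f - 2g (600 Hz in this example) does not appear in either dictionary; it arises only from a higher-order term of the transmission function (the fifth power), which is not expanded here.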


C. Consonance and Dissonance


The simultaneous sounding of several tones may be pleasant or "euphonious" to varying degrees. The pleasant sound is called consonant; the unpleasant or rough one, dissonant. The terms consonance and dissonance have been used here in a perceptual or sensory sense. This aspect has been labeled tonal consonance (Plomp & Levelt, 1965) or sensory consonance (Terhardt, 1976), to be distinguished from consonance in a musical situation. Musical consonance has its roots in perceptual consonance, of course, but is dependent on the rules of music theory, which, to a certain extent, can operate independently from perception.


The perceptual consonance of an interval consisting of two simple tones depends directly on the frequency difference between the tones, not upon the frequency ratio (or musical interval). If the frequency separation is very small or large (more than a critical bandwidth, the tones not interfering with each other), the two tones together sound consonant. Dissonance occurs if the frequency separation is less than a critical bandwidth (see Fig. 11). The most dissonant interval arises with a frequency separation of about a quarter of the critical bandwidth: about 20 Hz in low-frequency regions, about 4% (a little less than a semitone) in the higher regions (Fig. 12). The frequency separation of the minor third (20%), major third (25%), fourth (33%), fifth (50%), and so on is usually enough to give a consonant combination of simple tones. However, if the frequencies are low, the frequency separation of thirds (and eventually also fifths) is less than a critical bandwidth, so that even these intervals cause a dissonant beating. For this reason, these consonant intervals are not used in the bass register in musical compositions.
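The percentages quoted above follow directly from the frequency ratios of the intervals. The sketch below reproduces them and checks whether a pair of simple tones falls within one critical bandwidth. The function names are hypothetical, and the critical bandwidths in the example calls (80 Hz in the low region, 160 Hz around 1000 Hz) are assumed round values, inferred from the quarter-bandwidth figures of about 20 Hz and about 4% given in the text; they are not measured data.

```python
# Just-intonation frequency ratios (upper tone : lower tone).
INTERVALS = {
    "minor third": 6 / 5,
    "major third": 5 / 4,
    "fourth": 4 / 3,
    "fifth": 3 / 2,
}

def separation_percent(ratio):
    """Frequency separation as a percentage of the lower tone's frequency."""
    return (ratio - 1.0) * 100.0

def interferes(f_low, ratio, critical_bandwidth):
    """True if the two simple tones are closer than one critical bandwidth (Hz)."""
    separation = f_low * (ratio - 1.0)
    return 0.0 < separation < critical_bandwidth

for name, ratio in INTERVALS.items():
    print(name, round(separation_percent(ratio)))  # 20, 25, 33, 50

# Assumed critical bandwidths: 80 Hz in the bass, 160 Hz near 1000 Hz.
print(interferes(100.0, 6 / 5, 80.0))    # minor third on 100 Hz: True
print(interferes(1000.0, 5 / 4, 160.0))  # major third on 1000 Hz: False
```

On a lower tone of 100 Hz the minor third is only 20 Hz wide, close to the most dissonant spacing, whereas on 1000 Hz the major third spans 250 Hz and the tones no longer interfere.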


The consonance of intervals of complex tones can be derived from the consonances of the simple-tone combinations comprised in them. In this case the dissonance is the additive element. The dissonance of all combinations of neighboring partials can be determined and added to give the total dissonance and, inversely, the total consonance of the sound. Sounds with widely spaced partials, such as clarinet tones (with only the odd harmonics), are more consonant than sounds with narrowly spaced partials. The composition of the plenum of an organ is such that the partials are widely spaced throughout the spectrum. Some mathematical models have been worked out that describe the dissonance of a pair of simple tones and the way in which the dissonances of partial pairs in tone complexes have to be added (Plomp & Levelt, 1965; Kameoka & Kuriyagawa, 1969a,b; Hutchinson & Knopoff, 1978). As far as can be decided, these models give a good picture of consonance perception.
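The additive idea can be illustrated with a toy calculation. The sketch below is not one of the published models cited above. It assumes a triangular pair-dissonance curve that is zero at unison, maximal at a quarter of the critical bandwidth, and zero beyond one critical bandwidth; a rough critical-bandwidth rule (80 Hz or 16% of the frequency, whichever is larger) inferred from the quarter-bandwidth figures given earlier; and partials of equal amplitude. All names and parameter values are hypothetical.

```python
def critical_bandwidth_guess(frequency):
    """Rough assumed rule: 80 Hz at low frequencies, 16% of frequency above."""
    return max(80.0, 0.16 * frequency)

def pair_dissonance(f1, f2):
    """Illustrative triangular curve: 0 at unison, 1 at a quarter of the
    critical bandwidth, 0 at one critical bandwidth and beyond."""
    x = abs(f2 - f1) / critical_bandwidth_guess(0.5 * (f1 + f2))
    if x >= 1.0:
        return 0.0
    if x <= 0.25:
        return x / 0.25
    return (1.0 - x) / 0.75

def total_dissonance(f_low, ratio, n_partials=6):
    """Sum the pair dissonances over all partials of two harmonic complex tones
    (equal-amplitude partials assumed)."""
    partials = [k * f_low for k in range(1, n_partials + 1)]
    partials += [k * f_low * ratio for k in range(1, n_partials + 1)]
    total = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            total += pair_dissonance(partials[i], partials[j])
    return total

# A fifth (2:3) compared with a minor second (15:16) on a 250-Hz lower tone.
print(total_dissonance(250.0, 3 / 2) < total_dissonance(250.0, 16 / 15))
```

Pairs of partials separated by more than a critical bandwidth contribute nothing, so in practice only neighboring partials enter the sum, as described in the text.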

The consonance of a musical interval, defined as the sum of two complex tones with a certain ratio in fundamental frequency, is highly dependent on the simplicity of the frequency ratio. Intervals with frequency ratios that can be expressed in small integer numbers (say, less than 6) are relatively consonant because the lower, most important components of the two tones are either widely apart or coincide. If the frequency ratio is less simple, there will be a number of partials from the two tones that differ only a little in frequency, and these partial pairs give rise to dissonance. It seems that intervals with the number 7 in their frequency proportions (7/4, 7/5, ...) are about on the borderline between consonance and dissonance.


Experiments with inharmonic partials (Slaymaker, 1970; Pierce, 1966) have shown that consonance or dissonance depends on the coincidence of partials and not necessarily on the simple frequency ratio between the fundamental frequencies (which is usually the cause of the coincidence).

If the number of partials in a complex tone increases or if the strengths of the higher harmonics (with narrow spacing) increase, the tone is perceived as more dissonant (compare the trumpet with the flute, for instance). However, the nth partial is required in order to make an interval with frequency ratio n : m or m : n relatively consonant. For example, if the fifth harmonic is absent, the usual beating (dissonance) of a mistuned major third (4:5) will be absent (see also Fig. 12).

Musical consonance in Western polyphonic and harmonic music is clearly based on perceptual consonance of complex (harmonic) tones. Intervals with simple frequency ratios are consonant. Intervals with nonsimple frequency ratios are dissonant. The way in which consonance and dissonance are used in music theory and composition varies considerably from one historical period to another.




More than a century ago von Helmholtz published his classic volume On the Sensations of Tone (1863). The subtitle specifically indicates the intention of this study: "As a Physiological Basis for the Theory of Music." For Helmholtz the theory of music (as a compendium of rules that control composition and as such the musical sound stimulus) could only be understood fully if it could be shown that its elements had their origin in the perceptual characteristics of our hearing organ. Helmholtz's working hypothesis has been put aside by later investigators, both those who worked in music and those who worked in psychoacoustics. Several reasons for this can be given. First, before the introduction of electroacoustic means of tone production and control in the 1920s, it was not possible to carry out the necessary psychoacoustical experiments, while Helmholtz's observations proved to be insufficient in many ways. Second, it turned out that music theory has its own rules apart from the perceptual relevance of the characteristics of the sounds that it creates. Therefore, it is not clear, either for the music theorist or for the psychoacoustician, which aspects of music theory should be subjected to psychoacoustical research and which should not. Fortunately, in recent years much research has been initiated that is aimed at the investigation of the relationship between musical-theoretical and perceptual entities. For the time being, no complete view can be given, but there may come a time in which Helmholtz's ideas on the relation between the properties of our perceptual processes and the elements of musical composition can receive new, more complete and exact formulations than was possible a century ago.




Berger, K. W. Some factors in the recognition of timbre. Journal of the Acoustical Society of America, 1964, 36,

Bismarck, G. von. Timbre of steady sounds: A factorial investigation of its verbal attributes. Acustica, 1974, 30, 146-159. (a)

Bismarck, G. von. Sharpness as an attribute of the timbre of steady sounds. Acustica, 1974, 30, 159-172. (b)

Boer, E. de. On the 'residue' in hearing. Dissertation, Amsterdam, 1956.

Boer, E. de. On the 'residue' and auditory pitch perception. In W. D. Keidel & W. D. Neff (Eds.), Handbook of sensory physiology. (Volume V, Auditory system, Part 3, Clinical and special topics) Berlin: Springer-Verlag, 1976. Pp. 479-583.

Boer, E. de. Pitch theories unified. In E. F. Evans & J. P. Wilson (Eds.), Psychophysics and physiology of hearing. New York: Academic Press, 1977. Pp. 323-335.

Bruijn, A. de. Timbre-classification of complex tones. Acustica, 1978, 40, 108-114.

Churcher, B. G. Calculation of loudness levels for musical sounds. Journal of the Acoustical Society of America, 1962, 34, 1634-1642.

Clark, M., & Luce, D. Intensities of orchestral instrument scales played at prescribed dynamic markings. Journal of the Audio Engineering Society, 1965, 13, 151-157.

Clark, M., Jr., & Milner, P. Dependence of timbre on the tonal loudness produced by musical instruments. Journal of the Audio Engineering Society, 1964, 11, 28-31.

Clark, M., Jr., Robertson, P., & Luce, D. A preliminary experiment on the perceptual basis for musical instrument families. Journal of the Audio Engineering Society, 1964, 12, 199-203.

Evans, E. F., & Wilson, J. P. (Eds.), Psychophysics and physiology of hearing. New York: Academic Press, 1977.

Fearn, R. W. Level limits on pop music. Journal of Sound and Vibration, 1975, 38, 591-591. (a)

Fearn, R. W. Level measurements of music. Journal of Sound and Vibration, 1975, 43, 588-591. (b)

Flugrath, J. M. Modern-day rock-and-roll music and damage-risk criteria. Journal of the Acoustical Society of America, 1969, 45, 704-711.

Gerson, A., & Goldstein, J. L. Evidence for a general template in central optimal processing for pitch of complex tones. Journal of the Acoustical Society of America, 1978, 63, 498-510.

Goldstein, J. L. An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of America, 1973, 54, 1496-1516.

Green, D. M. An introduction to hearing. Hillsdale, New Jersey: Lawrence Erlbaum, 1976.

Grey, J. M. Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 1977, 61, 1270-1277.

Hall, J. L. Nonmonotonic behavior of distortion product 2f1-f2: Psychophysical observations. Journal of the Acoustical Society of America, 1975, 58, 1046-1050.

Helmholtz, H. von. Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik (Sechste Ausg.). Braunschweig: Vieweg, 1913 (1st ed., 1863). Translated by A. J. Ellis as: On the sensations of tone as a physiological basis for the theory of music. London: Longmans, Green, 1885. (1st ed., 1875; reprint of the 1885 ed., New York: Dover, 1954).

Houtgast, T. Subharmonic pitches of a pure tone at low S/N ratio. Journal of the Acoustical Society of America, 1976, 60, 405-409.

Houtsma, A.J.M. Musical pitch of two tone complexes and predictions by modern pitch theories. Journal of the Acoustical Society of America, 1979, 66, 87-99.

Houtsma, A.J.M., & Goldstein, J. L. The central origin of the pitch of complex tones: Evidence from musical interval recognition. Journal of the Acoustical Society of America, 1972, 51, 520-529.

Hutchinson, W., & Knopoff, L. The acoustic component of Western consonance. Interface, 1978, 7, 1-29.

Kameoka, A., & Kuriyagawa, M. Consonance theory, Part I: Consonance of dyads. Journal of the Acoustical Society of America, 1969, 45, 1451-1459. (a)

Kameoka, A., & Kuriyagawa, M. Consonance theory, Part II: Consonance of complex tones and its calculation method. Journal of the Acoustical Society of America, 1969, 45, 1460-1469. (b)

Licklider, J.C.R. 'Periodicity' pitch and 'place' pitch. Journal of the Acoustical Society of America, 1954, 26, 945.

Miller, J. R., & Carterette, E. C. Perceptual space for musical structures. Journal of the Acoustical Society of America, 1975, 58, 711-720.

Patterson, R. D., & Wightman, F. L. Residue pitch as a function of component spacing. Journal of the Acoustical Society of America, 1976, 59, 1450-1459.

Pierce, J. R. Attaining consonance in arbitrary scales. Journal of the Acoustical Society of America, 1966, 40, 249.

Piszczalski, M., & Galler, B. A. Predicting musical pitch from component frequency ratios. Journal of the Acoustical Society of America, 1979, 66, 710-720.

Plomp, R. The ear as a frequency analyzer. Journal of the Acoustical Society of America, 1964, 36, 1628-1636.

Plomp, R. Detectability threshold for combination tones. Journal of the Acoustical Society of America, 1965, 37, 1110-1123.

Plomp, R. Pitch of complex tones. Journal of the Acoustical Society of America, 1967, 41, 1526-1533.

Plomp, R. Timbre as a multidimensional attribute of complex tones. In R. Plomp & G. F. Smoorenburg (Eds.), Frequency analysis and periodicity detection in hearing. Leiden: Sijthoff, 1970. Pp. 397-414.

Plomp, R. Auditory psychophysics. Annual Review of Psychology, 1975, 26, 207-232.

Plomp, R. Aspects of tone sensation. New York: Academic Press, 1976.

Plomp, R. Fysikaliska motsvarigheter till klangfärg hos stationära ljud. In Vår hörsel och musiken. Stockholm: Kungl. Musikaliska Akademien, 1979.

Plomp, R., & Levelt, W.J.M. Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America, 1965, 38, 548-560.

Plomp, R., & Smoorenburg, G. F. (Eds.), Frequency analysis and periodicity detection in hearing. Leiden: Sijthoff, 1970.

Plomp, R., & Steeneken, H.J.M. Place dependence of timbre in reverberant sound fields. Acustica, 1973, 28, 49-59.

Pollard, H. F. Loudness of pipe organ sounds. I. Plenum combinations. Acustica, 1978, 41, 65-74. (a)

Pollard, H. F. Loudness of pipe organ sounds. II. Single notes. Acustica, 1978, 41, 75-85. (b)

Rintelmann, W. F., Lindberg, R. F., & Smitley, E. K. Temporary threshold shift and recovery patterns from two types of rock-and-roll music presentation. Journal of the Acoustical Society of America, 1972, 51,


Ritsma, R. J. Frequencies dominant in the perception of the pitch of complex sounds. Journal of the Acoustical Society of America, 1967, 42, 191-198.

Roederer, J. G. Introduction to the physics and psychophysics of music. New York and Berlin: Springer, 1974 (2nd ed., 1975).

Saldanha, E. L., & Corso, J. F. Timbre cues and the identification of musical instruments. Journal of the Acoustical Society of America, 1964, 36, 2021-2026.

Schouten, J. F. The perception of subjective tones. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, 1938, 41, 1083-1093.

Schouten, J. F., Ritsma, R. J., & Cardozo, B. L. Pitch of the residue. Journal of the Acoustical Society of America, 1962, 34, 1418-1424.

Schouten, J. F. The perception of timbre. In Report of the Sixth International Congress on Acoustics, Tokyo, Paper GP-6-2, 1968.

Schroeder, M. R. Models of hearing. Proceedings of the IEEE, 1975, 63, 1332-1350. (a)

Schroeder, M. R. Amplitude behavior of the cubic difference tone. Journal of the Acoustical Society of America, 1975, 58, 728-732. (b)

Schubert, E. D. (Ed.) Psychological acoustics. Stroudsburg, Pennsylvania: Dowden, 1979 (Benchmark Papers in Acoustics 13).

Seashore, C. E. Psychology of music. New York: McGraw-Hill, 1938 (Reprint New York: Dover, 1967).

Seebeck, A. Beobachtungen über einige Bedingungen der Entstehung von Tönen. Annalen der Physik und Chemie, 1841, 53, 417-436.

Slaymaker, F. H. Chords from tones having stretched partials. Journal of the Acoustical Society of America, 1970, 47, 1569-1571.

Smoorenburg, G. F. Pitch perception of two-frequency stimuli. Journal of the Acoustical Society of America, 1970, 48, 924-942.

Smoorenburg, G. F. Audibility region of combination tones. Journal of the Acoustical Society of America,