Pitch perception models - a historical review
Alain de Cheveigné
CNRS - Ircam, cheveign@ircam.fr
Abstract
This paper analyzes theories and models of pitch from a historical
perspective. Pythagoras is credited with the first “psychophysical”
model, the monochord, which he used to formulate a law that links a
physical quantity (ratio of string lengths) to a psychological quantity
(musical interval). The relation between pitch itself and frequency
emerged progressively with Aristoxenos, Boethius, Mersenne and Galileo.
The anatomist Du Verney first proposed the idea of resonance within the
ear, and of a “tonotopic” projection from the ear to the brain. The
notion of frequency analysis, formalized mathematically by Joseph
Fourier, was developed by Helmholtz into a beautiful theory of hearing
that bridged mathematics, physiology and music. Helmholtz unfortunately
followed Ohm in postulating that pitch is determined by one particular
frequency component, the fundamental, thus sparking a controversy that
has drained the energy of hearing scientists for decades, opposing
proponents of “spectral pitch” to proponents of “temporal pitch”. Today
the terms of the disagreement have shifted, and the disagreement is now
between models based on “pattern matching” (originated by de Boer but
already hinted at by Helmholtz) and those based on “autocorrelation”
(originated by Licklider, but already implicit in earlier work). Despite
the disagreements, there are deep connections between these various
theories of pitch, and between them and the many methods that have been
proposed for the artificial equivalent of pitch perception: fundamental
frequency estimation. Using a historical perspective I will try to make
apparent these relations between models and methods. The aim is to help
us go beyond the controversies and develop a better understanding of how
we perceive pitch.
1. Introduction
The
history of yesterday’s ideas suggests that today’s
might
not last, and that better ones await us in the future.
By
looking carefully at theories that did not survive, we
may
learn to identify the weak points of our own theories
and
fix them. The historical perspective has other virtues.
Among
factors that slow down progress in Science, Boring
cites
the need to conform to the “Zeitgeist”, the spirit
of
the times [1]. Another factor is controversy that may
lock
progress into sterile argument. History serves as
an
antidote to these factors. Models are often reincarnations
of
older ideas, themselves with roots deeper in
time.
By digging up the roots we can see the commonalities
and
differences between successive or competing
models.
Anyone who likes ideas will find many good
ones
in the history of science.
Some
early theories focused on explaining
consonance
and
musical scales
[2],
others on the
physiology
of
the ear
[3],
and others again on the
physics of sound
[4,
5, 6]. Certain thinkers, such as Helmholtz [7], have
tried
to address all these aspects, others were less ambitious.
Music
once constituted a major part of Science,
and
theories of music were theories of the world. Today,
music
and science go each their own way, and the goal
of
hearing science is more modestly to explain how we
perceive
sound. However, music is still an important part
of
our auditory experience, and, historically, theories of
hearing
have often been theories of musical pitch.
Today,
two competing explanations of pitch prevail:
autocorrelation
and pattern-matching,
that inherit from
the
rival theories of place and time, themselves rooted in
early
concepts of resonance and time interval. Autocorrelation
and
pattern-matching each have variants. The historical
perspective
reveals both their unity and the originalities
of
each, and suggests directions in which future
models
might evolve. This paper is a short version of an
upcoming
chapter on pitch perception models [8].
2. Resonance
2.1. Interval and ratio, pitch and frequency
Pythagoras (6th century BC) is credited with relating musical intervals
to ratios of string length on a monochord [6].
The monochord consists of a board with two bridges
between
which a string is stretched. A third bridge divides
the
string in two parts. Intervals of unison, octave,
fifth
and fourth arise for length ratios of 1:1, 1:2, 2:3,
3:4,
respectively. The monochord can be seen as an early
example
of a psychophysical model, in that it relates the
perceptual
property of musical interval to a ratio of physical
quantities.
The physics of the model were quickly eclipsed by the mathematics, or
mysticism, of the numbers involved
in the ratios [2]. Ratios of numbers between 1
and
4 were taken to govern both musical consonance and
the
relations between heavenly bodies. Aristoxenos (4th
century
BC) disagreed with the Pythagoreans that numbers
are
relevant to music, and instead argued that musical
scales
should be defined based on what one hears [9].
(Two millennia later, Descartes made the same objection to
Mersenne [2].) In 1581 the role of number was also challenged,
from
a different perspective, by Vincenzo Galilei
(father
of Galileo). Using weights to vary the tension of
a
string, he found that the abovementioned intervals arise
for
ratios of 1:1, 1:4, 4:9, and 9:16, respectively.
These
ratios are different from those found for length:
they
are more complex, and don’t agree with the importance
that
the Pythagoreans gave to numbers from 1 to
4.
Deciding the respective roles of mathematics, physics,
and
perception in the “laws” of music is still a problem
today.
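
The two sets of ratios can be reconciled with a short calculation. The sketch below assumes an ideal string, whose frequency is proportional to the square root of its tension divided by its length; that law and the names `LENGTH_RATIOS` and `tension_ratio` are assumptions introduced here for illustration, not part of the historical sources. Under that assumption, the tension ratio that produces a given interval is the square of the corresponding length ratio.

```python
from fractions import Fraction

# Length ratios for the monochord intervals quoted in the text.
LENGTH_RATIOS = {
    "unison": Fraction(1, 1),
    "octave": Fraction(1, 2),
    "fifth": Fraction(2, 3),
    "fourth": Fraction(3, 4),
}


def tension_ratio(length_ratio):
    """Tension ratio giving the same interval as a given length ratio.

    Assumes an ideal string: frequency ~ sqrt(tension) / length, so a
    frequency change obtained with length ratio r needs tension ratio r**2.
    """
    return length_ratio ** 2


TENSION_RATIOS = {name: tension_ratio(r) for name, r in LENGTH_RATIOS.items()}

for name, r in TENSION_RATIOS.items():
    print(f"{name}: {r.numerator}:{r.denominator}")
```

Squaring is what takes the ratios outside the range of numbers from 1 to 4 that the Pythagoreans favored.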
In addition to interval, the Greeks had the concept of pitch, a
quantity by which sounds can be ordered from grave to acute [9]. They
probably associated it with rate, but semantic overlaps between rate
(of vibration), speed (of propagation) and force (of excitation) make
this uncertain.
The
relation between ratios of string length and
ratios
of vibration frequency was established by Galileo
Galilei
[11], whereas Mersenne [12], using strings long
enough
to count vibrations, determined the actual frequencies
of
each note of the scale. This provided a relation
of
pitch with number that was firmly grounded in
the
physics of sound.
2.2. Sympathetic resonance in the ear
A
string produces musical sounds, but it can also vibrate
in
sympathetic resonance as noted by Aristotle [10]. The
perception
of like by like was a common notion, and so
the
concept of resonance has been used in theories of
hearing
from antiquity onwards [1, 3, 6].
In
1683, Du Verney proposed that the bony spiral
lamina
within the cochlea serves as a resonator:
“.
. . being wider at the start of the first turn than the end
of
the last . . . the wider parts can be caused to vibrate
while
the others do not . . . they are capable of slower vibrations
and
consequently respond to deeper tones . . . in
the
same way as the wider parts of a steel spring vibrate
slowly
and respond to low tones, and the narrower parts
make
more frequent and faster vibrations and respond to
sharp
tones . . . according to the various motions of the
spiral
lamina, the spirits of the nerve which impregnate
its
substance [that of the lamina] receive different impressions
that
represent within the brain the various aspects
of
tones”
[13].
This paragraph concentrates several key
concepts
of place theory: frequency-selective response,
tonotopy, and tonotopic
projection to the brain. Subsequent
progress
of the resonance theory is recounted in
[3].
In 1758, Le Cat [14] proposed that the basilar membrane
is
constituted of strings like those of a harpsichord,
and
Helmholtz later used a similar metaphor.
2.3. Superposition and Ohm’s law
Mersenne
reported that he could hear within the sound of
a
string, or a voice, up to five pitches corresponding to the
fundamental,
the octave, the octave plus fifth, etc. [12].
He
knew also that a string can respond sympathetically
to
higher harmonics, and yet he found it hard to accept
that
it could vibrate simultaneously at all those frequencies.
This
was easier for a younger mind such as that
of
Sauveur, who in 1701 coined the terms “fundamental”
and
“harmonic” [15]. The physics of string vibration
were
worked out in the 18th century by a succession of
physicists:
Taylor, Daniel Bernoulli, Lagrange, d’Alembert,
and
Euler [4]. Euler in particular, by introducing
the
concept of linear superposition, made it easy to understand
the
multiple vibrations of a string that had so
troubled
Mersenne.
Mersenne
and Galileo usually conceived of vibrations
as
merely being periodic, without regard to their shape,
but
18th century physicists found that solutions were often
easy
to derive if they assumed “pendular” (sinusoidal)
vibrations.
For linear systems, they could then extend the
solutions
to any sum of sinusoids thanks to Euler’s principle.
A
wide variety of shapes can be obtained in this
way,
meaning that the method was quite general. That
any shape can be obtained in this way was proved in 1820 by Fourier
[16]. In particular, any periodic wave can be expressed as the
superposition of sinusoids with periods that are integer fractions of
the fundamental period. Fourier’s theorem had a tremendous impact on
mathematics and physics.
Up
to that point, pitch had been closely associated
with
progress in the physics of periodic vibration, and
it
seemed obvious that this new tool must somehow be
relevant
to pitch. In 1843 Ohm formulated a law, later
rephrased
and clarified by Helmholtz, according to which
every
pitch corresponds to a sinusoidal partial within the
stimulus
waveform. For Ohm, the presence of a partial
was
ascertained by applying Fourier’s theorem, and
Helmholtz
proposed that the same operation is approximated
by
the cochlea [17, 7].
Ohm’s
law extended the principle of linear superposition
to
the sensory domain. Just as a complex waveform
is
the sum of sinusoids, so for Helmholtz the sensation
produced
by a complex sound such as a musical note was
“composed”
of simple sensations, each evoked by a partial.
In
particular, he associated the main pitch of a musical
tone
to its fundamental partial.
2.4. The missing fundamental
Ohm’s
law is the result of a choice. Mersenne had given
little
attention to the shape of periodic vibrations which
he
had no means to observe. His law relating frequency
to
pitch did not mention shape. However, Fourier’s theorem
now
implied that, depending on its shape, a vibration
might
contain several sinusoidal partials, each with a different
frequency.
This raised an obvious question: does
pitch
relate (a) to the period of the vibration as a whole,
or
(b) to the period of one of the partials? If (b) is true,
then
Fourier analysis is required to determine pitch, if (a)
it
is unnecessary. Ohm chose (b).
Seebeck
had already addressed the question experimentally,
using
a siren to produce periodic stimuli with
several
pulses irregularly spaced within a period [1]. Regardless
of
the number of pulses, pitch followed the fundamental
period,
consistent with (a). Furthermore, by
applying
Fourier’s theorem to the waveform, Seebeck
showed
that pitch salience did not depend on the relative
amplitude
of the fundamental partial, which for some
pulse
configurations was very small. Since the same pitch
was
also heard when the stimulus contained only that partial,
he
could conclude that
pitch does not depend on a
particular
partial.
This contradicted (b). Low pitch in
the
absence of a fundamental partial was already known
from
earlier work on beats [18].
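
The distinction between options (a) and (b) can be made concrete with a synthetic stimulus. The sketch below is an illustration under assumed parameters (sampling rate `FS`, fundamental `F0`, and harmonics 2 to 4 of equal amplitude), not a reconstruction of Seebeck's siren. It builds a tone with no partial at the fundamental and checks that the waveform nevertheless repeats at the fundamental period.

```python
import math

FS = 8000   # sampling rate in Hz (assumed for illustration)
F0 = 200    # fundamental frequency in Hz (assumed); period = 40 samples
N = 400     # 10 full periods


def complex_tone(harmonics):
    """Sum of equal-amplitude cosine partials at the given harmonic ranks."""
    return [sum(math.cos(2 * math.pi * h * F0 * n / FS) for h in harmonics)
            for n in range(N)]


def partial_amplitude(x, freq):
    """Amplitude of the sinusoidal component of x at freq (one DFT bin)."""
    re = sum(v * math.cos(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)


def waveform_period(x, window=320, max_lag=60):
    """Lag (in samples) at which x best matches a delayed copy of itself."""
    def match(lag):
        return sum(x[n] * x[n + lag] for n in range(window))
    return max(range(1, max_lag + 1), key=match)


tone = complex_tone([2, 3, 4])   # harmonics 2-4 only: no partial at F0
period = waveform_period(tone)

print("amplitude at F0:  ", partial_amplitude(tone, F0))
print("amplitude at 2*F0:", partial_amplitude(tone, 2 * F0))
print("period in samples:", period, "->", FS / period, "Hz")
```

A rule of type (b) applied to the lowest partial present would point to 400 Hz, whereas the period of the vibration as a whole corresponds to 200 Hz.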
Nevertheless,
Ohm chose (b) and Helmholtz endorsed
this
choice. Many authors have puzzled over
the
Seebeck-Ohm-Helmholtz controversy and the reasons
why
Helmholtz did not take seriously Seebeck’s
arguments [1, 19, 20, 21]. One reason was no doubt that, by extending
Ohm’s law to upper harmonics, Helmholtz could explain the higher pitches
that some people (among them Mersenne and himself) occasionally heard. One
can
speculate that additional reasons were the conviction
that
a theorem as powerful as Fourier’s must be relevant,
and
the desire to ensure that the parts of his monumental
theory
would fit together.
Helmholtz
had three options to address the missing
fundamental
problem without renouncing his theory, two
of
which he used. The first was to invoke
nonlinear distortion
in
sound-producing apparatus or in the ear. As an
explanation
of periodicity pitch, that hypothesis was quite
weak
already at the time, as argued by Helmholtz’s translator,
Ellis
[7]. However it took over sixty years before
Schouten
and Licklider laid the explanation to rest. With
an
optical siren, Schouten produced a complex tone that
lacked
a fundamental. He managed, not only to prove
that
the distortion product at the fundamental had a very
low
amplitude, but also to cancel it. The absence of a fundamental
component
was verified by adding a sinusoidal
tone
with a nearby frequency and checking for absence of
a
beat. The low pitch was unaffected by removal of the
fundamental
partial, as it was unaffected when Licklider
masked
it with noise [22, 23]. This rules out the distortion
product
explanation of low pitch. However, distortion
products
do exist, and they sometimes do affect pitch, so
that
explanation tends to resurface from time to time.
A
second option was Helmholtz’s concept of “unconscious
inference” that prefigured pattern matching (next
section) [24].
A third option, that Helmholtz apparently
did
not use, was to treat cochlear resonators as strings.
As
Mersenne and others had noticed, a string vibrates
sympathetically
with sounds tuned to its fundamental
mode
and with their harmonics. Thus it responds
to
a periodic sound regardless of whether or not it contains
a
fundamental partial. It is, in essence, a filter tuned
to
periodicity. Helmholtz had used the bank-of-strings
metaphor
to describe the cochlea. Nevertheless, he chose
to
characterize each filter as if it were a
Helmholtz resonator
tuned
to a single sinusoidal partial. Had he chosen
to
treat them as strings, the missing fundamental problem
would not have existed. Of course, a bank of
strings
does not fit Fourier’s theorem, and this is perhaps
why
he did not choose this option. If he had chosen it,
the
model would have eventually been proven wrong as
cochlear
filters are not tuned to periodicity.
3. Pattern matching
We are confronted with incomplete patterns every day,
and
our brain is good at “reconstructing” perceptually the
parts
that are missing. Pattern matching models assume
that
this is how pitch is perceived when the fundamental
partial
is missing. The idea is thus that the fundamental
partial
is the necessary correlate of pitch, as Ohm
claimed,
but that it may nevertheless be absent if other
parts
of the pattern (harmonics often associated with it)
are
present. This idea was prefigured byHelmholtz’s “unconscious
inference”
and John Stuart Mill’s concept of
“possibilities”
[24, 1]. As a possible mechanism, Thurlow
suggested
that listeners use their own voice as a
“template”
to match with incoming patterns of harmonics
[25].
In
1956, de Boer described the concept of pattern
matching
in his thesis [26], but the best-known models
are
those of Goldstein [27], Wightman [28] and Terhardt
[29].
These models are closely related, but each has its
characteristic
flavor. Goldstein’s is probabilistic and performs
optimum
processing of a set of estimates of partial
frequencies
(obtained by a process that is not defined, but
that
could be Helmholtz’s cochlear analysis). Wightman
takes
the limited-resolution profile of activity across the
cochlea,
and feeds it to a hypothetical internal “Fourier
transformer”
to obtain a pattern akin to the autocorrelation
function.
Terhardt follows Ohm in positing for each
partial
its own sensation of spectral pitch, from which an
internal
template derives a virtual pitch that matches that
of
the (possibly missing) fundamental. That template is
learned.
Pattern-matching
models are well known and will not
be
described in greater detail here. There is a close relationship
between
pattern-matching models and spectrumbased
signal-processing
methods for
fundamental frequency
estimation, such as subharmonic
summation, harmonic
sieve,
autocorrelation or cepstrum [30, 8]. For the
last
two, this reflects the fact that the Fourier transform,
applied
to a spectrum (power spectrum for autocorrelation,
log
spectrum for cepstrum) is sensitive to the regular
pattern
of harmonics.
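
As a rough illustration of what matching a harmonic pattern involves, the sketch below scores candidate fundamentals by how far each partial lies from the nearest harmonic of the candidate. It is written in the spirit of a harmonic sieve, but it is a deliberately simplified toy and not the procedure of any of the models cited above. The function names, the candidate grid, and the rule of preferring the highest well-fitting candidate are all assumptions made for this example.

```python
def mistuning(partials, f0):
    """Total relative distance of each partial from the nearest harmonic of f0."""
    total = 0.0
    for p in partials:
        rank = max(1, round(p / f0))      # nearest harmonic rank, at least 1
        total += abs(p - rank * f0) / p
    return total


def estimate_f0(partials, candidates):
    """Candidate with the least mistuning; ties go to the highest candidate.

    The tie-break matters because every subharmonic of the true fundamental
    (half, a quarter, ...) fits the partials just as well.
    """
    return min(candidates,
               key=lambda f0: (round(mistuning(partials, f0), 9), -f0))


# Partials at harmonics 3, 4 and 5 of 200 Hz: the fundamental is missing.
partials = [600, 800, 1000]
f0 = estimate_f0(partials, range(50, 501))
print(f0)
```

The estimate falls on a frequency at which the stimulus has no partial, which is the behavior that pattern-matching models are designed to capture.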
Terhardt’s model
is distinct in that it requires that templates
be
learned by exposure to harmonically rich stimuli,
an
idea that is attractive but constraining. It can be
argued
that learning could just as well occur from exposure
to
patterns of subharmonics (superperiods) of periodic
sounds,
and that harmonically rich stimuli are thus
unnecessary
[8]. Shamma and Klein went further and
showed
that templates may be learned by exposure to
noise
[31]. What
this suggests is that the harmonic relations
within
the template are a mathematical property
that
needs merely to be discovered, not learned. Indeed,
other
devices embody the pattern-matching properties of
a
harmonic template without having “learned” them. Examples
are
the autocorrelation function and the string.
4. Temporal models
Democritus
(5th century BC) and Epicurus (4th century
BC)
are credited with the idea that a sound-producing
body
emits atoms that propagate to the listener’s ear, an
idea
later adopted by Beeckman and Gassendi [2]. Related
is
the idea that a string “hits” the air repeatedly, and
that
pitch reflects the rate at which sound pulses hit the ear
[32,
2]. If so, it should be a simple matter to measure the
interval
between two consecutive atoms or pulses, rather
than
wait for a series of pulses to build up sympathetic
vibration
in a resonator.
The
influence of this temporal view of pitch can be
observed
indirectly in the “coincidence” theories of consonance
that
developed in the 16th and 17th centuries.
Two
notes were judged consonant if their vibrations coincided
often
[2].
Early
temporal models assumed that patterns of
pulses
are handled by the “brain”, and thus they tend to
be
less elaborate than resonance models. Compare for
example
Anaxagoras (5th century BC) for whom hearing
involved
penetration of sound to the brain, and Alcmaeon of [...] is
by means of the ears, because within them is an empty
space,
and this empty space resounds
[6]. The latter obviously
“explains”
more. A similar contrast is seen between
the
monumental resonance theory of Helmholtz [7], and
the
two-page “telephone theory” of Rutherford, according to which the ear
is merely a telephone receiver that transmits pulses to the brain [33].
[...] Helmholtz
theory [34]. One can speculate that they disapproved
of
Ohm’s choice (Sect. 2.3), objected to the
obligatory
Fourier analysis, and in general resented the
weight
of Helmholtz’s authority. Some of these theories
qualify
as “temporal” (e.g. those of [...])
others
were essentially variations on the theme of a resonant
cochlea.
[...] in
frog or rabbit nerves (352 per second) were insufficient
to
carry pitch over its full range (up to 4-5 kHz
for
musical pitch). The need for high firing rates was relaxed
in
1930 by Wever and Bray’s “volley theory” [35].
Subsequent
measurements from the auditory nerve confirmed
that the volley principle is essentially valid (in a
stochastic
form), in that synchrony to temporal features
is
measurable up to about 4-5 kHz in the auditory nerve
[36].
Synchrony is also observed at more central neural
relays,
but the upper frequency limit decreases as one
proceeds.
Temporal
and resonance models differ essentially in
the
time required to make a frequency measurement.
Resonance
involves the build-up of energy by accumulation
of
successive waves, and this requires time that
varies
inversely with frequency resolution. The relation
$\Delta f \, \Delta t \geq \mathrm{const}$
that constrains temporal and spectral resolution
was formalized by Gábor [37], but it was known already
to
Helmholtz. Helmholtz reasoned that notes occur
at
a rate of up to 8 per second in music, and from this he
calculated
the narrowest possible bandwidth for cochlear
filters.
Frequency resolution was thus dictated by necessary
temporal
resolution, rather than by constraints related
to
the implementation of cochlear filters.
In
contrast, a time-domain mechanism needs just
enough
time to measure the interval between two events
(plus
time to make sure that each event is an event, plus
time
to make sure that they are not both part of a larger
pattern).
The time required is on the order of
two periods
of
the lowest expected frequency; accuracy is limited
only
by noise or imperfection of the implementation [8].
An
explanation of the puzzling fact that a time-domain
mechanism
can escape Gábor’s relation was given by
Nordmark
[38].
A
weakness of temporal models, as described so far,
is
their reliance on events. Events need of course to be
extracted
from the waveform (or from the neural pattern
that
it evokes). For simple waveforms this is trivial: one
may
use peaks or zero-crossings. For complex waveforms
the
problem is more delicate, as evident from the
difficulties
encountered by time-domain methods of fundamental
frequency
estimation [30]. It is hard (perhaps
impossible)
to find a definition of “event” that allows stable
period
measurement in every case.
This
weakness is evident in the phase sensitivity of
early
temporal models [28]. For example, a mechanism
that
measures intervals between peaks is confused by
waveforms
that have several peaks per period. A mechanism
that
measures intervals between envelope peaks is
confused
by phase manipulations that produce two envelope
peaks
per period, etc. As pitch is often invariant for
such
phase manipulations, such phase-sensitive mechanisms
cannot
hold. The autocorrelation model provided
a
solution to this problem.
5. Autocorrelation
In
the autocorrelation (AC) model, each sample of the
waveform
is used, as it were, as an “event”. Each is compared
to
every other sample, and the inter-event interval
that
gives the best match (on average) indicates the period.
Concretely,
comparison is performed by
multiplying
samples
and summing the products over a time window.
If
samples are equal their products tend to be large, and
so
the autocorrelation function (ACF) has a peak at the
period
(and its multiples). The peak is the cue to pitch. A
slightly
more straightforward idea is to subtract samples
and
sum the squared differences, as proposed in the
cancellation
model
of [39]. The cue to pitch is then a dip in
the
difference function. Cancellation and AC models are
formally
equivalent [39].
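
The two operations described above can be written side by side. The sketch below is a minimal illustration on an assumed test signal with a period of 50 samples; the function names and the window length are choices made here, not taken from [39] or [40]. It also checks the elementary algebraic identity obtained by expanding the squared difference, which is one way to read the statement that the two models are formally equivalent.

```python
import math

PERIOD = 50    # period of the test signal in samples (assumed)
WINDOW = 100   # integration window in samples (assumed)
MAX_LAG = 80

# Periodic test signal with three harmonics.
x = [math.sin(2 * math.pi * n / PERIOD)
     + 0.5 * math.sin(2 * math.pi * 2 * n / PERIOD)
     + 0.3 * math.sin(2 * math.pi * 3 * n / PERIOD)
     for n in range(200)]


def acf(x, lag, start=0, window=WINDOW):
    """Autocorrelation: sum of products of samples separated by lag."""
    return sum(x[n] * x[n + lag] for n in range(start, start + window))


def difference(x, lag, start=0, window=WINDOW):
    """Cancellation: sum of squared differences of samples separated by lag."""
    return sum((x[n] - x[n + lag]) ** 2 for n in range(start, start + window))


lags = range(1, MAX_LAG + 1)
period_from_peak = max(lags, key=lambda lag: acf(x, lag))
period_from_dip = min(lags, key=lambda lag: difference(x, lag))

# Expanding (a - b)**2 = a**2 + b**2 - 2ab term by term gives:
# difference(lag) = energy(window) + energy(window shifted by lag) - 2 * acf(lag)
identity_error = max(
    abs(difference(x, lag)
        - (acf(x, 0) + acf(x, 0, start=lag) - 2 * acf(x, lag)))
    for lag in lags
)

print(period_from_peak, period_from_dip, identity_error)
```

The peak of the ACF and the dip of the difference function fall at the same lag, so either can serve as the cue to the period.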
The
original formulation of the AC model is due to
Licklider
[40], although an interesting precursor was proposed
by [...] up
the basilar membrane, is reflected at the apex, travels
down,
and meets the next pulse at a position that indicates
the
period [41]. In Licklider’s model, the ACF was
calculated
within the auditory nervous system, for each
channel
of the auditory filter bank. The model was reformulated
and
implemented computationally by Meddis
and
Hewitt [42], and compared with autocorrelation statistics
of
actual nerve recordings by Cariani and Delgutte
[43].
A similar model based on first order interspike interval
statistics
was proposed by [...] was
cited earlier. Another variant is the
strobed temporal
integration
model of
Patterson and colleagues, in
which
patterns are cross-correlated with a strobe function
consisting
of one pulse per period [45]. Yost proposed a
simpler
predictive model based on waveform autocorrelation
[46].
One may cite also a number of “autocorrelation”
models
in which the ACF was produced by an internal
“Fourier
transformer” operating on a spectral profile
coming
from the cochlea [28].
An important theorem, the Wiener-Khintchine theorem, states that the ACF
and the power spectrum are Fourier transforms of one another. In this
sense the AC model can be
seen
as an incarnation of the two steps of
spectral analysis
and
pattern matching. This implies a relation between
these
rival approaches, as stressed early on by de Boer
[26].
They differ of course in how they might be implemented
in
the auditory nervous system, in properties such
as
frequency versus temporal resolution (Sect. 4), and in
the
way they can be extended to handle mixtures of tones
[47,
8].
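
The Wiener-Khintchine relation can be checked numerically on a short signal. The sketch below uses a naive discrete Fourier transform and the circular (wrap-around) form of the autocorrelation, for which the relation holds exactly; the test signal and all names are assumptions made for this illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2), fine for short signals)."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size))
            for k in range(size)]


def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform."""
    size = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size)
                for k in range(size)) / size
            for n in range(size)]


def circular_acf(x):
    """Autocorrelation computed directly, with wrap-around indexing."""
    size = len(x)
    return [sum(x[n] * x[(n + lag) % size] for n in range(size))
            for lag in range(size)]


N = 64
# Two partials, at 4 and 8 cycles per frame: the period is 16 samples.
signal = [math.sin(2 * math.pi * 4 * n / N)
          + 0.5 * math.cos(2 * math.pi * 8 * n / N)
          for n in range(N)]

power_spectrum = [abs(c) ** 2 for c in dft(signal)]
acf_from_spectrum = [c.real for c in inverse_dft(power_spectrum)]
acf_direct = circular_acf(signal)

max_error = max(abs(a - b) for a, b in zip(acf_direct, acf_from_spectrum))
period = max(range(1, N // 2 + 1), key=lambda lag: acf_from_spectrum[lag])
print(max_error, period)
```

Going through the spectrum and back yields the same function as multiplying delayed samples directly, which is why a model built on a spectral profile and an internal Fourier transformer can behave like an autocorrelation model.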
It
is interesting to compare autocorrelation to the
string
which we
encountered several times in this review.
Implementation
of autocorrelation requires a delay, associated
with
a multiplier (e.g. a coincidence-detector neuron).
Delayed
patterns are multiplied with undelayed patterns.
The
string too consists of a delay that, as it were,
feeds
upon itself. Delayed patterns are added to undelayed
patterns,
and their sum delayed again. This shows
a
basic similarity between string and AC. It also shows
their
difference. In the AC model a pattern is delayed at
most
once. In the string it is delayed many times, and
these
multiple delays are necessary for the build-up of
resonance
that allows the string to be selective.
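
The comparison can be sketched with two toy mechanisms that share the same delay. In the sketch below the string is idealized as a recirculating delay line (the output is the input plus an attenuated copy of the output one delay earlier), and autocorrelation as the product of the input with a single delayed copy. The loop gain of 0.9, the signal, and the function names are assumptions made for this illustration, not a physical model of a string.

```python
import math

PERIOD = 40      # period of the input in samples (assumed)
N = 4000
DELAYS = range(20, 61)

# Harmonics 2, 3 and 4 only: a periodic input with no fundamental partial.
x = [sum(math.cos(2 * math.pi * h * n / PERIOD) for h in (2, 3, 4))
     for n in range(N)]


def string_output_power(x, delay, gain=0.9):
    """Idealized string: the delayed output is fed back and added to the input."""
    y = [0.0] * len(x)
    for n, v in enumerate(x):
        feedback = gain * y[n - delay] if n >= delay else 0.0
        y[n] = v + feedback               # pattern is delayed again and again
    tail = y[len(y) // 2:]                # skip the build-up of resonance
    return sum(s * s for s in tail) / len(tail)


def single_delay_product(x, delay, window=2000):
    """Autocorrelation: the input is delayed once and multiplied with itself."""
    return sum(x[n] * x[n + delay] for n in range(window)) / window


best_string_delay = max(DELAYS, key=lambda d: string_output_power(x, d))
best_acf_delay = max(DELAYS, key=lambda d: single_delay_product(x, d))
print(best_string_delay, best_acf_delay)
```

Both mechanisms single out the delay equal to the period even though the input has no fundamental partial. The string needs many round trips before its response builds up, whereas the product is available after a single delay.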
6. Discussion
Autocorrelation
and pattern matching are the two major
options
for explaining pitch today. Pitch is evoked mainly
by
stimuli that are periodic, and its value depends on their
period. The two approaches can be
seen as two different
ways
of extracting the period from the stimulus. Autocorrelation
does
so directly, and pattern-matching indirectly
via
a first stage of Fourier transformation. The choice between
them
corresponds to that made by Ohm, a century
and
a half ago (Sect. 2.3).
Cochlear
frequency resolution, as Helmholtz pointed
out,
must be limited. Filters are of roughly constant
“Q”,
and thus have difficulty resolving upper harmonics,
closely
spaced on a logarithmic scale. Pattern-matching
depends
on frequency resolution, and cannot work for
stimuli
that contain only partials that are unresolved. Indeed,
such
stimuli tend to have a weak pitch, and this can
be
interpreted in favor of pattern-matching [8]. On the
other
hand, the pitch does exist and thus needs explaining,
which
pattern matching cannot do. This argues in
favor
of the AC model. The AC model could, in principle,
cover
both resolved and unresolved stimuli, but the
marked
behavioral differences between them suggest that
there
might instead be two mechanisms [48, 49, 50, 51].
The
superior performance in pitch tasks for conditions
in
which partials are resolved is strongly suggestive of a
pattern-matching
mechanism, that breaks down for unresolved
conditions.
It might nevertheless be due to other
factors
that covary with “resolution” [8]. The issue of resolved
vs.
unresolved is currently a central issue in pitch
theory.
Pattern
matching and AC models both have many
variants.
At times discussions may tend to focus on
relatively
minor differences between rival formulations
(e.g.
between Terhardt’s vs. Goldstein’s formulation
of
pattern-matching, or between first-order and all-order
spike
statistics for the AC model). The historical approach
is
useful to widen the perspective, to emphasize
the
similarities between variants, and possibly even to
suggest
new, perhaps radically different, directions in
which
to seek explanations of pitch and hearing.
This
author is mainly interested (Licklider would
have
said “ego-involved” [52]) in a particular variant of
the
AC model, cancellation. The reason is that it brings
together
mechanisms of pitch and of
sound segregation
that
may be of use in particular to explain perception of
multiple
pitch [53, 47]. An algorithm based on cancellation
has
recently proved to be effective for fundamental
frequency estimation [54]. The concept of cancellation
fits
well with the ideas on redundancy and neural
metabolism
reduction of Barlow [55].
Interest
in pitch is fueled by interest in music, a very
old
activity. Ideas for this review were searched for in
sources
as ancient and diverse as possible. There are
big
gaps. Many important sources are known only indirectly
from
citations of later authors, suggesting that
much
material of interest has been lost. Indeed, there
is
evidence that some of the knowledge that developed
over
the last 25 centuries was known long before that, in [...]
Sources
consulted were exclusively in English or French.
Those
written in Latin, German or other languages were
inaccessible
for lack of linguistic competence. A more
complete
review is due to appear shortly [8].
7. Conclusions
The
history of models of pitch perception has been reviewed.
Modern
ideas reincarnate older ideas, and their
roots
extend as far back as records are available. Models
that
are in competition today may have common roots.
The
historical approach allows commonalities and differences
to
be put in perspective. Hopefully this should help
to
defuse sterile controversy that is sometimes harmful to
the
progress of ideas [1]. It also may be of use to newcomers
to
the field to understand, say, why psychoacousticians
insist
on studying musical pitch with unresolved
stimuli
(that sound rather unmusical), why they add low-pass
noise (which makes tasks even more difficult), etc.
The
good reasons for these customs are easier to understand
with
a vision of the debates from which present-day
pitch
theory evolved.
8. References
[1]
Boring EG (1942) Sensation and perception in the
history
of experimental psychology.
Appleton-Century.
[2]
Cohen, H. F. (1984). ”Quantifying music,”
D.
Reidel (Kluwer).
[3]
von Békésy, G., and Rosenblith, W. A. (1948). ”The
early
history of hearing - observations and theories,”
J.
Acoust. Soc. Am. 20, 727-748.
[4]
Lindsay, R. B. (1966). ”The story of acoustics,” J.
Acoust.
Soc. Am. 39, 629-644.
[5]
Lindsay, R. B. (1973). ”Acoustics: historical and
philosophical
development,” Stroudsburg, Dowden,
Hutchinson
and Ross.
[6]
Hunt, F. V. (1992 (original: 1978)). ”Origins in
acoustics,”
of
[7]
von Helmholtz, H. (1877). ”On the sensations of
tone
(English translation A.J. Ellis, 1885, reprinted
1954),”
[8]
de Cheveigné, A. (2004). ”Pitch perception models,”
in
”Pitch,” Edited by C. Plack and A. Oxenham,
New
York, Springer Verlag, in press.
[9]
Macran, H. S. (1902). ”The harmonics of Aristoxenus
(reprinted
1990, Georg Olms Verlag,
[10]
Wiener, P. P. (1973-1974). ”The Dictionary of the
History
of Ideas: Studies of Selected Pivotal Ideas,”
[11]
Galilei, G. (1638). ”Mathematical discourses concerning
two
new sciences relating to mechanicks
and
local motion, in four dialogues (translated by
THO.WESTON,
reprinted in Lindsay, 1973, pp 40-
61),”
[12]
Mersenne, M. (1636). ”Harmonie Universelle (reproduit
1975,
Cramoisy.
[13]
Du Verney, J. G. (1683). ”Traité de l’organe de
l’ouie,
contenant la structure, les usages et les maladies
de
toutes les parties de l’oreille,”
[14]
Le Cat, C.-N. (1758). ”La Théorie de l’ouie:
supplément à cet article du traité des sens,”
Vallat-la-Chapelle.
[15]
Sauveur, J. (1701). ”Système général des intervalles
du
son (translated by R.B. Lindsay as ”General
system
of sound intervals and its application to
sounds
of all systems and all musical instruments”,
reprinted
in Lindsay, 1973, pp 88-94),” Mémoires de
l’Académie
Royale des Sciences 279-300, 347-354.
[16]
Fourier, J. B. J. (1820). ”Traité analytique de la
chaleur,”
[17]
Ohm, G. S. (1843). ”On the definition of a tone with
the
associated theory of the siren and similar sound
producing
devices (translated by Lindsay, reprinted
in
Lindsay, 1973, pp 242-247,” Poggendorf’s Annalen
der
Physik und Chemie 59, 497ff.
[18]
Young, T. (1800). ”Outlines of experiments and inquiries
respecting
sound and light,” Phil. Trans. of
the
Royal Society of
plates).
[19]
Schouten, J. F. (1970). ”The residue revisited,” in
”Frequency
analysis and periodicity detection in
hearing,”
Edited by R. Plomp and G. F. Smoorenburg,
[20]
de Boer, E. (1976). ”On the ”residue” and auditory
pitch
perception,” in ”Handbook of sensory physiology,
vol
V-3,” Edited by W. D. Keidel and W. D.
[21]
Turner, R. S. (1977). ”The Ohm-Seebeck dispute,
Hermann
von Helmholtz, and the origins of physiological
acoustics,”
The British Journal for the History
of
Science 10, 1-24.
[22]
Schouten, J. F. (1938). "The perception of subjective tones," Proc. Kon. Acad. Wetensch (Neth.) 41, 1086-1094 [reprinted in Schubert, E. D. (1979). "Psychological acoustics (Benchmark papers in Acoustics, v 13)," Dowden, Hutchinson & Ross, Inc].
[23]
Licklider, J. C. R. (1954). "Periodicity pitch and place pitch," J. Acoust. Soc. Am. 26, 945.
[24]
Warren, R. M., and Warren, R. P. (1968). "Helmholtz on perception: its physiology and development,"
[25]
Thurlow, W. R. (1963). "Perception of low auditory pitch: a multicue mediation theory," Psychol. Rev. 70, 461-470.
[26]
de Boer, E. (1956). "On the 'residue' in hearing," unpublished doctoral dissertation.
[27]
Goldstein, J. L. (1973). "An optimum processor theory for the central formation of the pitch of complex tones," J. Acoust. Soc. Am. 54, 1496-1516.
[28]
Wightman, F. L. (1973). "The pattern-transformation model of pitch," J. Acoust. Soc. Am. 54, 407-416.
[29]
Terhardt, E. (1974). "Pitch, consonance and harmony," J. Acoust. Soc. Am. 55, 1061-1069.
[30]
Hess, W. (1983). "Pitch determination of speech signals,"
[31]
Shamma, S., and Klein, D. (2000). "The case of the missing pitch templates: how harmonic templates emerge in the early auditory system," J. Acoust. Soc. Am. 107, 2631-2644.
[32]
Bower, C. M. (1989). "Fundamentals of Music (translation of De Institutione Musica, Anicius Manlius Severinus Boethius, d524),"
[33]
Rutherford, E. (1886). "A new theory of hearing," J. Anat. Physiol. 21, 166-168.
[34]
Wever, E. G. (1949). "Theory of hearing," New York, Dover.
[35]
Wever, E. G., and Bray, C. W. (1930). "The nature of acoustic response: the relation between sound frequency and frequency of impulses in the auditory nerve," Journal of Experimental Psychology 13, 373-387.
[36]
Johnson, D. H. (1980). "The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones," J. Acoust. Soc. Am. 68, 1115-1122.
[37]
Gábor, D. (1947). "Acoustical quanta and the theory of hearing," Nature 159, 591-594.
[38]
Nordmark, J. O. (1968). "Mechanisms of frequency discrimination," J. Acoust. Soc. Am. 44, 1533-1540.
[39]
de Cheveigné, A. (1998). "Cancellation model of pitch perception," J. Acoust. Soc. Am. 103, 1261-1271.
[40]
Licklider, J. C. R. (1951). "A duplex theory of pitch perception," Experientia 7, 128-134.
[41]
Hurst, C. H. (1895). "A new theory of hearing," Proc. Trans. Liverpool Biol. Soc. 9, 321-353 (and plate XX).
[42]
Meddis, R., and Hewitt, M. J. (1991). "Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification," J. Acoust. Soc. Am. 89, 2866-2882.
[43]
Cariani, P. A., and Delgutte, B. (1996). "Neural correlates of the pitch of complex tones. I. Pitch and pitch salience," J. Neurophysiol. 76, 1698-1716.
[44]
Moore, B. C. J. (1977, 2003). "An introduction to the psychology of hearing," London, Academic Press.
[45]
Patterson, R. D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C., and Allerhand, M. (1992). "Complex sounds and auditory images," in "Auditory physiology and perception," Edited by Y. Cazals, K. Horner and L. Demany, Oxford, Pergamon Press, 429-446.
[46]
Yost, W. A. (1996). "Pitch strength of iterated rippled noise," J. Acoust. Soc. Am. 100, 3329-3335.
[47]
de Cheveigné, A., and Kawahara, H. (1999). "Multiple period estimation and pitch perception model," Speech Communication 27, 175-185.
[48]
Houtsma, A. J. M., and Smurzynski, J. (1990). "Pitch identification and discrimination for complex tones with many harmonics," J. Acoust. Soc. Am. 87, 304-310.
[49]
Carlyon, R. P., and Shackleton, T. M. (1994). "Comparing the fundamental frequencies of resolved and unresolved harmonics: evidence for two pitch mechanisms?," J. Acoust. Soc. Am. 95, 3541-3554.
[50]
Carlyon, R. P. (1996). "Masker asynchrony impairs the fundamental-frequency discrimination of unresolved harmonics," J. Acoust. Soc. Am. 99, 525-533.
[51]
Carlyon, R. P. (1998). "Comments on 'A unitary model of pitch perception' [J. Acoust. Soc. Am. 102, 1811-1820 (1997)]," J. Acoust. Soc. Am. 104, 1118-1121.
[52]
Licklider, J. C. R. (1959). "Three auditory theories," in "Psychology, a study of a science," Edited by S. Koch, New York, McGraw-Hill, I, 41-144.
[53]
de Cheveigné, A. (1993). "Separation of concurrent harmonic sounds: Fundamental frequency estimation and a time-domain cancellation model of auditory processing," J. Acoust. Soc. Am. 93, 3271-3290.
[54]
de Cheveigné, A., and Kawahara, H. (2002). "YIN, a fundamental frequency estimator for speech and music," J. Acoust. Soc. Am. 111, 1917-1930.
[55]
Barlow, H. B. (1961). "Possible principles underlying the transformations of sensory messages," in "Sensory Communication," Edited by W. A. Rosenblith, Cambridge Mass, MIT Press, 217-234.