Sound is something most of us know and love. But although we hear sounds
every single day of our lives, there are many aspects of the experience that
one usually doesn’t pay attention to. These are the basic questions of the
nature of sound, the reasons for our hearing anything at all and the mechanics
underlying the sensation. It is not immediately obvious how sound is actually
generated in physical objects, why different objects do not all sound alike or
how all this relates to our sensory apparatus. When we take a closer look at
sound as a physical phenomenon, many interesting characteristics of sound come
into proper focus. The relevant topics include the wave characteristics of sound:
diffraction, reflection, interference and so on. Taking into account the
peculiarities of the human sensory organs and the psychology of perception in
general, one is taken into the field of psychoacoustics—the study of human
auditory perception and its underlying mechanisms. This is a field rich in
applications and interesting discoveries, not all of which seem intuitive at
first sight.

As one application, perceptual audio codecs could be mentioned—how can an MP3
codec throw away 90% of an audio signal and still reproduce a perceptually
near-perfect replica of the original?

Sound is a wave phenomenon. That is something we’re all told in high school
physics courses, but usually no time is left for a nice intuitive picture of
the thing to build up. That’s what we will try to construct next.

To create waves we need a medium in which to put them. Obviously the medium
needs to be elastic in order to carry any waves at all. Additionally there
must be a kind of stiffness which holds the adjacent parts of the medium
together. This is what makes wave propagation, a form of energy transfer,
possible. It also determines the kinds of vibration which can propagate in the
medium: unlike solids, gases and liquids lack transverse binding forces, so
only longitudinal energy transfer is possible. This is why sound cannot be
polarised. The other important intrinsic property of the medium is its
density—together, density and the strength of the binding forces determine the
speed of wave motion in the medium.
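
As a small numerical sketch of this dependence (assuming an ideal gas, for
which the adiabatic bulk modulus—the "stiffness"—is the adiabatic index times
the pressure; the numbers are for air at roughly 20 °C):

```python
import math

# Speed of sound in an ideal gas: c = sqrt(gamma * P / rho),
# where gamma * P is the adiabatic bulk modulus (the "stiffness")
# and rho is the density.
gamma = 1.4        # adiabatic index of air (a diatomic gas)
P = 101325.0       # atmospheric pressure, Pa
rho = 1.204        # density of air at ~20 C, kg/m^3

c = math.sqrt(gamma * P / rho)
print(f"speed of sound: {c:.0f} m/s")       # about 343 m/s

# The wavelength follows from the speed and the frequency: lambda = c / f.
for f in (100.0, 1000.0, 10000.0):
    print(f"{f:7.0f} Hz -> wavelength {c / f:.3f} m")
```

Stiffer media push the speed up, denser media pull it down, which is why the
same formula with the bulk modulus and density of water gives a speed several
times that in air.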

Note that both the density and the stiffness can vary within the substance: the
medium can be inhomogeneous. In addition to this, the binding forces can differ
depending on direction. In the latter case we talk about anisotropy, which
fortunately does not occur in gases.

In the case of sound, the density is just the usual mass per unit volume of
the gas and the stiffness is born of a balance between the repulsive forces
between molecules and their mean kinetic energy (temperature). In a closed
system of moving particles the total kinetic energy stays constant and
statistical physics predicts that the mean distance between the mutually
repellent particles tends to even out. Similarly the expected velocity (with
direction) of particles over an arbitrary volume will be zero—at large scales
the gas will tend to stay put even if the molecules themselves move quite a
bit. Only after we upset the balance do the mean properties ripple. These mean
properties are used as a stepping stone to the analytic model of a sound field,
which simply forgets that the molecular level ever existed, assigns a real
velocity vector and a real pressure measure to each point in space and lets
these vary over time.

Often we make the further assumption that the velocity field is irrotational,
i.e. expressible at each point as the gradient of a scalar potential, though
this way we lose part of the generality of the model.

To get hold of the process of sound propagation, we first look at a simple
example: a point source. This is simply one point at which we
explicitly control the sound pressure or the velocity vector of the field. In
practice, neither of these can be controlled independently of the other. When
we excite the medium by creating a disturbance in its structure, coupling
between adjacent particles makes the disturbance try to even out, globally. The
forces arising from the new inhomogeneity accelerate
the particles toward lower pressure. But this, assuming the pressurised
region is point‐like, can only mean that the disturbance moves outward.
The crowded particles get pushed away from the center of the region. This makes
them move and, consequently, push fresh molecules out of the way. Voilà:
motion. Once this has happened, the pressure is evened out. It is worth
pointing out that the pressure wave does not get ironed out in the inward
direction. This is due to the inertia of the molecules—once the pressure sets
them in motion, the disturbance moves in the direction the first molecules go.
What happens is similar to slamming a pool ball against another. Since all this
has happened through what are almost 100% elastic collisions of particles,
little energy is lost. (Some is, to the mean kinetic energy of particles,
raising the gas temperature. This accounts for some of the attenuation that sound experiences while travelling.)
As long as no new particles are brought into play the net effect is that of
making the pressurised region move outwards from the
point source. Note that individual particles do *not* move appreciable
distances in the action but stop after transferring their kinetic energy to the
next one. This is a characteristic of wave phenomena: seen at a larger scale,
energy moves, not the medium. Put another way, turning up the volume knob
does not produce a tropical storm.

It is very important to separate such concepts as the pressure field (scalar
in 3D), the density field (scalar in 3D), the velocity (3D vector in 3D), the
gradient fields of the first two scalar ones (3D vector in 3D), the time
derivative fields of the scalar ones (scalar in 3D) and all the derived fields
on top of these. When bypassing the mathematical notation, it is exceptionally
easy to confuse the first and second time derivatives of the pressure field with
velocities and accelerations (which are taken relative to spatial coordinates
and are, hence, vector fields).

Above, we compressed the air at the source. But the same principle of wave
transmission applies if the opposite is done—namely, if we create a
depressurisation zone. In this case, the surrounding particles move in and the
zone moves outward again. Also, the amount of depressurisation is
significant—the more violent the original disturbance, the bigger the
propagating bump in the medium. Indeed, aside from the energy of the original
defect getting spread over a larger and larger area and thus growing fainter
per unit volume, one can accurately reconstruct a series of more or less
violent disturbances at the source by measuring the local air pressure at a
point some distance removed from it. This is, roughly, how sound propagates
through air in free space and is experienced from afar.

A few things have to be noted about sound radiation. The first thing is that
we speak about pressures. The importance of this is seen when thinking about a
speaker cone that moves very slowly. In this case, the air has time to escape
from in front of the cone instead of forming a high pressure zone. Evidently,
efficient radiation is then not possible. We see that in order to emit considerable
radiation, rapid variations in pressure or large radiators must be used. This
is typical of wave emission—it is why microwave radio transmission requires
only small antennas while low frequency AM radio often employs dipole antennas
that are tens of meters long. From this we come to the second point: in order
to continuously emit sound, we cannot just move the speaker cone further and
further ahead. Instead, the cone has to come back before the air has time to
escape around the edges. In physics, the situation is discussed in terms of
coupled systems and impedance matching. The principle is,
sound emitters work best when the inertia of the medium keeps the medium from
moving appreciably and on the other hand the emitter’s own inertia isn’t large
enough to make it hard to move. Back and forth motion is the normal mode of
wave transmission, not the impulses we have discussed so far. A special case of
such movement occurs when the motion repeats at a constant rate and each cycle
involves the same precise pattern of movement. In this case we speak of periodic
motion and periodic sound/signals. The rate of repetition of a periodic motion
is dubbed the *frequency*, with Hertz (Hz) as its unit; Hertz is the SI unit
meaning "times per second". As each part
of a wave traverses at a constant velocity and, at each fixed point in space,
the vibratory motion repeats at a constant rate, we see that one period of the
motion is always exactly duplicated in a certain interval of space that depends
only on the speed of the wave motion and the frequency of the wave we drive
through the medium. As we work in a single medium, the speed stays constant, so
the length of our interval depends only on the frequency of our vibration. This
is the *wavelength* corresponding to the frequency, with an inverse
dependency on it. Because of the properties of what we will come to know as
linear systems, a certain type of periodic wave has a very special position in
our treatment. This wave is the sinusoid. It is the smooth, endless,
periodic function which we bump into in trigonometry. The sine wave has the
property that when put in a linear system (in our case, transmitted through
air), it comes through as a sine wave with the same frequency. The only
variation comes about in the form of a time lag and a change in strength. When
a combination of sine waves of different frequencies is introduced, they go
through as if the other waves weren’t even present.
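
This can be checked numerically; in the sketch below, a simple moving-average
filter stands in for any linear, time-invariant system (transmission through
air being one example). Filtering a sum of sines gives the sum of the filtered
sines, and a sine comes out as a sine of the same frequency, merely scaled and
delayed:

```python
import numpy as np

fs = 8000                 # sampling rate, Hz
t = np.arange(fs) / fs    # one second of sample times

def lti_filter(x):
    # A simple linear, time-invariant system: a 5-point moving average.
    return np.convolve(x, np.ones(5) / 5, mode="same")

a = np.sin(2 * np.pi * 200 * t)   # 200 Hz sine
b = np.sin(2 * np.pi * 900 * t)   # 900 Hz sine

# Linearity: filtering the sum equals summing the filtered parts.
assert np.allclose(lti_filter(a + b), lti_filter(a) + lti_filter(b))

# A sine stays a sine of the same frequency: the spectrum of the
# filtered 200 Hz tone still peaks in the 200 Hz bin.
spectrum = np.abs(np.fft.rfft(lti_filter(a)))
print(np.argmax(spectrum))        # frequency bin 200
```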

In later parts of the text when we talk about sound, we usually mean
pressure variations measured at a point. This is because we have ears which are
relatively small compared to the wavelength of audible sound—we can with good
accuracy say that ears are pointlike with regard to
sound fields. Thus few humans ever fully comprehend the real, complex
vibrational patterns which occur in three-dimensional spaces—evolution has not
equipped our brains to do such analysis. This fact is a double-edged sword,
really—it would be nice to actively understand all the phenomena involved in
sound transmission, since all such things affect what we hear, but, on the
other hand, mathematical description and manipulation of 2+
dimensional wave phenomena quickly becomes quite unwieldy. It is quite a relief
to scientists, engineers, technicians and artists that such considerations are
not strictly necessary to fool our hearing.

When nonsinusoidal sources and/or a number of
radiators and/or closed spaces are considered, things get interesting. At once
we note something called *interference*. It is what happens when more
than one source is placed in the same space. At each point in space, the
individual contributions of our moving pressure zones (one for each emitter)
just add up. We get what is called linear wave transmission. The
name comes from mathematics and means, roughly, that given a bunch of signals,
we can first add and then feed through a system or first feed through the
system and then add, with equal results. To a considerable degree, this is what
happens with sound. In spite of its rather technical connotations, linearity is
a true friend. Without it, there would be little hope of understanding anything
about sound at an undergraduate level.

Said in another way, at small to moderate amplitudes, sound transmission in
large scale obeys a second order linear partial differential equation, called
the wave equation, which is seen in all branches of physics and is covered
early on in physics education. As is well known, once we know some solutions to
a linear differential equation, we get more by scaling and summing.

Now, as periodic waves interfere, it is interesting to see what happens in a
single, fixed point in space as time evolves. Let’s
suppose we have a one‐dimensional string where a single sinusoidal sound
source is present. We know that the pressure at a single point reflects that of
the source anywhere along the string, save for the time lag it takes the
vibratory motion to reach our point and the attenuation resulting from friction
and other damping
forces. If we now add a second source with an identical frequency but a
different placing on the string, we get *standing waves*. How does this
happen? Think about the peak of one period of the motion. As it leaves the two
sources, it travels at a constant velocity away from them. Precisely at the middle,
the two waves meet and we let them interfere; they add together. The same
applies for the valley parts of the wave. So in the middle, we get twice the
amplitude. We say the two sounds are *in phase* with each other. Let’s
take another point, this time choosing it so that the time to get from source 1
to the point is precisely half a cycle time greater than the time to get to our
point from source 2, that is, the difference between the distances to the
sources is a whole number of wavelengths plus one half. This time, the
sinusoids always arrive at our point precisely when they cancel each other out.
So in this point, we never observe vibratory motion. Points of these two kinds
occur repeatedly over the entire length of our string, with the amplitude of the
sinusoid motion varying between them from zero to double the source amplitude.
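
A quick numerical sketch of this (two equal sinusoidal sources on a line; the
positions and the frequency are arbitrary illustrative choices): summing the
two delayed waves at each point and measuring the resulting amplitude exposes
the antinode at the midpoint and the node a quarter wavelength away.

```python
import numpy as np

c = 343.0          # wave speed, m/s
f = 343.0          # frequency chosen so the wavelength is exactly 1 m
wavelength = c / f

t = np.linspace(0.0, 0.1, 20000)      # time points to observe
src1, src2 = 0.0, 10.0                # source positions on the line, m

def amplitude_at(x):
    # Each source contributes a unit sine delayed by its travel time.
    w1 = np.sin(2 * np.pi * f * (t - abs(x - src1) / c))
    w2 = np.sin(2 * np.pi * f * (t - abs(x - src2) / c))
    return (w1 + w2).max() - (w1 + w2).min()   # peak-to-peak amplitude

mid = (src1 + src2) / 2
print(amplitude_at(mid))                   # ~4: equal paths, doubled amplitude
print(amplitude_at(mid + wavelength / 4))  # ~0: paths differ by half a wavelength
```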

The last example was very simple, as only one‐dimensional
effects were considered. In two dimensions we get a nice
interference pattern, where our special points recur wherever the difference
of the distances to the sources is, again, a whole multiple of half the
wavelength.

We remember from high school geometry that, given
two fixed points, the curve of those points where the difference of the
distances from the fixed points is constant is a hyperbola. So the knots
and humps of our interference pattern on a plane occur on hyperbolas with the
point sound sources as foci and the spacing of the points determined by the
wavelength of the sound. The same deduction goes for the 3D case, only the
sound field is quite a lot more difficult to visualise.
We get, logically enough, hyperboloids. (To see this, put a line through the
two point sources, rotate a plane set through this line and repeat
the two-dimensional reasoning on this plane.)

One should note that when different frequencies are combined, the result is
more complex, since now we cannot combine the resultant vibration pointwise
into a single sinusoid. But keeping to two close
frequencies, we get an interesting phenomenon called *beating*.
When two frequencies that are close to each other are combined, we get, not an
audible combination of the two, but the frequency in the middle of the two,
varying sinusoidally in amplitude at the rate of the
difference between the two original frequencies. This is seen as follows.
Suppose we have two sine waves with frequencies f1 and f2. By the
sum-to-product identity, sin(2 pi f1 t) + sin(2 pi f2 t) =
2 cos(pi (f1 - f2) t) sin(pi (f1 + f2) t): a sinusoid at the mean frequency
(f1 + f2)/2 whose amplitude envelope varies at the difference frequency
f1 - f2.
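
The identity behind beating can be verified numerically; a minimal sketch with
two illustrative frequencies four Hertz apart:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 48000)
f1, f2 = 440.0, 444.0      # two close frequencies -> 4 beats per second

summed = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# The same signal written as a mid-frequency carrier multiplied by a
# slowly varying amplitude envelope (the sum-to-product identity).
carrier = np.sin(np.pi * (f1 + f2) * t)
envelope = 2 * np.cos(np.pi * (f1 - f2) * t)

assert np.allclose(summed, envelope * carrier)
```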

So now we have multiple sound sources, but still nothing but an empty medium
where our waves travel. How about obstacles? Starting from a single dimension
once more, we send a single pulse wave towards the end of a string which is
tied to a rigid wall. What happens? Well, the pulse comes back: it gets
reflected. This is easy to understand—when a pressurized zone meets the wall,
it cannot move it, and the pressure pushes back instead, making for a reflected
copy. If the wall gives in a little and takes a bit of energy from the wave
(turning it into heat through friction, usually), the wave still bounces back
but gets attenuated. We say *absorption* has occurred. Absorption is the reason
rooms do not have indefinitely long echoes. In a sense, absorption is the
precise opposite of radiation, so it is quite logical that, here too, the
size of the object and the frequency of the wave matter. Usually, though, what
matters isn’t so much the size of the absorber as the scale of detail and the
materials used in the object. For example, a paper wall can only absorb the
highest frequencies, whereas a soft, heavy curtain can absorb significant mid
and low frequency sounds. In higher dimensions (2+), reflections become much
more difficult to handle. Here approaches similar to ray optics work much
better.

When we combine reflection and interference, interesting things happen.
Taking our 1D standing waves, we can now generate them
by a single source and a wall that reflects the waves back.

One can think, as in ray optics, that the mirror image of the source now
provides the other source. A similar view works in higher dimensions, but gets
intractable quite fast when the number of reflections and reflecting objects
increases. Even more troublesome is the situation in which the reflecting
objects are not infinite, straight planes. At a very basic level the problem
with higher dimensional differential equations is precisely the one of curved
boundaries, which naturally make no sense in dimension
one.

If we put two obstacles and send a pulse between them, a periodic motion
arises. If we put a source there, instead, we observe a complex interference
pattern as the waves get reflected again and again and interfere with other
reflections and the source signal. Again, the same thing happens in higher
dimensions, only with more hard to follow patterns. If regular echoes, which
reinforce each other, can be produced at some frequency (in the case of
periodic sources, this happens when the distance between our two obstacles is a
multiple of the wavelength), *resonance* results. If such resonant
frequencies exist, they reinforce sounds of the same frequency. The opposite
(and all that is in between) can also happen—destructive interference can
greatly damp some frequencies. Resonance gives rise to different *modes*
of vibration—if resonance can happen on different frequencies, complex patterns
of vibration can arise. These patterns are taken advantage of in the design of
traditional instruments. For instance, only a slight variation in the design of
a violin can cause significant variations in its perceived timbre. Since
acoustically significant vibrational modes always appear
as (composite) standing wave formations in physical media (such as air columns,
solids and water), the different modes can often be independently
controlled—they all have their own characteristic vibrational
shape with humps and knots which gives us the possibility of exciting or
damping the modes differently relative to one another. Further, since air
columns can vibrate, so can spaces filled with air. This leads, in particular, to the
issue of room acoustics: if one puts a point source (a very rough estimation of
a loudspeaker) in a room, the more the walls reflect sound, the more the room
colors the sound (longer echo means more chances for interference). As sound
circulating around a room gets reflected many times, it is necessary to ensure
that no prominent resonances occur (these are called *room modes* or
just modes and usually result from echoes between opposite walls). The same
general principles apply here as in the case of 1D resonance, with the
exception of many unusual and inharmonic modes—as such, the placement of
speakers, room geometry and decoration crucially affect the sound field in the
room. In addition, psychoacoustical phenomena further
complicate matters. Thus, for instance, the more random the directions
prominent echoes can be made to come from, the better (as this lessens the
effect of room modes, and obvious echo directions get reduced). This is why
audiophiles use highly damped and irregularly shaped rooms to achieve a hi-fi
listening environment. (Basic measures include thick carpets to absorb stray
sound, book shelves to absorb and scatter, absorbers in the ceiling and the
placement of heavy furniture around the rim of the room.)

Until now, we have assumed that the medium in which our waves travel is
homogeneous—the speed of travel of wave motion is constant throughout the
space. Often this is not the case, though. In the case of sound, the speed
depends on what material the waves travel in and its temperature. Often one can
ignore the inhomogeneity, but sometimes it produces
important effects. The main one is refraction. This means velocity
dependent bending of wavefronts. Refraction is most
pronounced if sharp boundaries between media of different properties are
present—an excellent example is the boundary between water and air. If a
wavefront hits such a boundary at anything other than a right angle, the
direction of the waves is bent. If the speed decreases across the boundary, the
motion bends towards the normal of the boundary. If it increases, the bending
is away from the normal. If the incident angle is great enough and the waves
would speed up across the boundary, total reflection occurs. All this is
precisely analogous to what happens in ray optics. The only
difference is that in acoustics, one needs to worry about nonsharp
boundaries more often. This is because we are mostly dealing with sound
transmission in air in normal atmospheric pressures and in this case, the speed
differences usually arise from temperature differences—always a continuous
phenomenon. As you can already guess, refraction and total reflection happen
with graded boundaries as well. Here they take the form of smooth bending, not
abrupt changes of direction. One must also observe the fact that refraction,
just like diffraction, is frequency dependent—different frequencies refract
differently. What is the significance of all this, then? Most
often, at least indoors, none. Outdoors where temperature gradients can
be much greater, refraction effects can become significant, though. A prime
example is the way sound can propagate over lakes—if the water is colder than
the air above it, a cold‐warm graded boundary can form in
the air above the water. This can, under some circumstances, bend sound waves
back down toward the surface and prevent them from escaping upward, which can
lead to the sound propagating unusually long distances over the lake. (The
phenomenon is similar to the one exploited in graded‐index optic fibres.)
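
For the sharp-boundary case, the bending follows Snell’s law carried over to
acoustics. A sketch (the two sound speeds are merely illustrative values for
cold and warm air):

```python
import math

c1 = 331.0   # sound speed in the incident medium (cold air), m/s
c2 = 349.0   # sound speed in the other medium (warm air), m/s

def refract(theta1_deg):
    """Return the refracted angle in degrees, or None on total reflection.

    Snell's law for sound: sin(theta2) / sin(theta1) = c2 / c1,
    with angles measured from the boundary normal.
    """
    s = math.sin(math.radians(theta1_deg)) * c2 / c1
    if s > 1.0:
        return None          # beyond the critical angle: total reflection
    return math.degrees(math.asin(s))

print(refract(30.0))          # bent away from the normal (speed increases)
critical = math.degrees(math.asin(c1 / c2))
print(critical)               # incidence beyond this angle reflects totally
print(refract(critical + 5))  # None
```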

One final phenomenon is yet to be uncovered, namely, *diffraction*.
This is something that is often, sadly enough, left to little notice. All waves
behave rather weirdly when they pass around objects. If very thin (compared to
the wavelength) objects are passed, no substantial effects are produced—such
little defects in the medium drown into the large scale wave motion. Very large
objects exhibit reflection, at least locally. But in between (e.g. around
object edges and suitably sized obstacles overall), the wave motion bends,
creating some *pretty* complex interference patterns. Even in the case
of exceedingly simple geometric objects (e.g. balls, cylinders…), the resulting
interference is difficult to master mathematically. This is a phenomenon that
is specific to 2+ dimensional cases and is something that greatly affects the
behavior of sound in natural environments. Thus, the behavior of sound near
objects and object edges is really quite poorly understood, leading to the term
*near field effect* being used in situations where such behavior is
significant. Noteworthy examples are the sound field of a loudspeaker and the
field formed around a human head while standing in a larger sound field. The
latter to a considerable degree dominates how we hear sound and mostly
determines how the direction of a sound source affects our perception of it.

Diffraction is something which is not often taken into account when
simulating sound behaviour. Reasons for this are
multiple. Firstly, diffraction is rather difficult to simulate efficiently. As
it is a 2+ dimensional phenomenon, it does not naturally lend itself to the one
dimensional abstractions of today’s simulation methods and 2+ dimensional
simulations cost dearly in terms of processing power and memory. Secondly,
diffraction is heavily frequency dependent—it disperses waves of differing
frequencies. This is one of the reasons why accurate prediction of room
acoustics is so difficult. Thirdly, there is little need to think about 2+
dimensional effects when analysing static, linear
point‐to‐point
transmission. Though all this may sound purely academic, when
one tries to create convincing simulations of sound behaviour
for reverberation and binaural processing, this is where we usually hit the
wall.

Now we know diffraction does not fit in and is difficult to handle. Under
what assumptions, then, can we ignore the problem? Let’s start at the bottom of
things… To get a hold on wave phenomena, one needs to simplify quite a bit. The
most common way is to try to linearize and then
reduce the dimensionality of the problem. The latter part often consists of
building meshes of one dimensional simulations or neglecting the size of
phenomena in certain directions. The latter is the way we arrive at ray optics
and its audio counterpart—if we neglect the fact that our waves have a finite
wavelength, i.e. we pass to the limit of zero wavelength, many ugly things go
away and we get nice, unidimensional, cleanly behaved rays
instead of multidimensional wavefronts. We can do
this if the waves are very short compared to the feature size of the
surrounding space. In the case of light and natural objects, we can quite
safely assume this to be the case. (The speed of light is high but its
frequency is even higher. This leads to the wavelength being very small. Also,
the relative frequency range of visible electromagnetic radiation is much
narrower than the range for audible sound.) With sound we bump into a
relatively wide frequency range and feature sizes in our environment which sit
right in the middle of audible wavelengths. This means that sound diffraction
in our surroundings is often considerable and can only be neglected if few
obstacles are present, sound sources can be considered point‐like,
enough damping is present and reflective surfaces are simple enough.

Before any mathematical treatment of sound is possible, we must represent it
somehow in the language of mathematics. To do this, we note what sound is: it
is just time‐dependent pressure variation. Furthermore, by taking a
point in space, we can represent sound at this point with a single number, the
pressure. When there is no sound, the pressure is just the normal atmospheric
pressure (around 100000 Pascals on average), so
it would be a good idea to assign numbers with respect to this level. So we
represent the pressure at our point by telling how much the pressure differs
from long term average air pressure—rarefaction results in negative values,
compression in positive. What scale we use does not much matter—since most DSP
is linear, the same basic concepts apply regardless of scale. Now that we have
chosen a pressure scale, we just present the pressure as a function of time. If
we want a more complete description of the sound field, we take more points and
form a vector (a list of numbers, basically) of the pressures in those points
and represent this vector as a function of time. Usually we do not use more
than two to four points since the resulting description mostly suffices for
audio systems. Most people have never had a chance to hear anything exceeding
two channels (i.e. stereo).

So we now have functions of time. These we call signals. They can be
represented by voltages or currents on electric circuits and wires (this is the
way microphone cables, amplifiers and most consumer audio equipment works), as
grooves of varying depth on an LP, as numbers of some given precision on a
computer or as numbers encoded in the tiny pits and ridges of a CD.
Mathematically we treat these functions as mappings from real numbers to real
numbers (i.e. for each possible instance of time, we assign an infinitely
accurate measure of pressure). In digital systems, we present a string of
numbers which give a sufficiently accurate measure of the pressure at points
sufficiently close to each other in time (these numbers are called samples
and under proper conditions, they represent the original signal with near
perfect quality). (See the first section of the chapter on DSP for a closer
look at sampling.) Having got used to thinking about sound in terms of signals,
we often equate these. This makes it possible to use mathematical terminology
(which is suitable for signals) to describe what happens or is to be done to
sound. It may sound a bit strange, for instance, to talk about squaring a
sound. Thought of as a sequence of numbers, though, it makes perfect sense,
especially since we aim at understanding DSP as well.
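
In that spirit, a signal is just an array of samples, and "squaring a sound" is
an ordinary pointwise operation on it. A minimal sketch (the frequency and
amplitude are arbitrary):

```python
import numpy as np

fs = 44100                          # samples per second (the CD rate)
t = np.arange(fs) / fs              # one second of sample times

# A 440 Hz sine represented as a sequence of pressure samples.
signal = 0.5 * np.sin(2 * np.pi * 440 * t)

squared = signal ** 2               # "squaring a sound", sample by sample
print(signal.min(), signal.max())   # roughly -0.5 and 0.5
print(squared.max())                # squaring is nonnegative: at most 0.25
```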

Not every sound has a frequency—no repetition, no frequency. However,
measured at a point, every sound has an amplitude.
This means roughly the same as the strength of the sound and could be defined
in a variety of ways. We pick one and speak of (peak) amplitude, defined as the
difference between maximum compression and maximum rarefaction that our sound
wave causes during a given period of time. The term can also be used without
exact, mathematically defined meaning to mean the (relative) strength of the
sound (with respect to another).

When we present some sound to people, we soon realize that amplitude (peak‐to‐peak
pressure variation) is not very significant perceptually. Instead, average
power seems to be. This is why most volume monitors use an RMS (Root
Mean Square) scale.

This is a time localized estimate of the average signal power, and is
calculated by squaring the signal, taking a weighted average over a period of
time and then taking a square root. Why should this work? One reason is that
power is preserved in Fourier decompositions whereas amplitude is not. Since we
process signals mainly in a frequency decomposed form, it is to be expected
that time‐domain characterizations which can be directly
translated to frequency domain should work the best. As the ear seems to do
time‐localized
filterbank analysis (as opposed to real Fourier analysis, which really has
infinite memory), time‐localized averaging should not come as a surprise,
either.
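
A sketch of the RMS computation just described—square, take a weighted average
over a period of time, take the root. The window length and the Hann weighting
are arbitrary illustrative choices:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 100 * t)     # a full-scale 100 Hz sine

def rms(x, window=512):
    # Square the signal, take a weighted (Hann) average over a window
    # of samples, then take the square root.
    w = np.hanning(window)
    return np.sqrt(np.sum(w * x[:window] ** 2) / np.sum(w))

# For a sine of peak amplitude A, the RMS level is A / sqrt(2), ~0.707 A.
print(rms(signal))
```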

Now, the dynamic range of human hearing is exceptionally wide—the amplitude
ratio of the softest sound heard to the loudest noise tolerated is in the
vicinity of 100 000 000 to 1 (hundred million to one)
with most resolution in the quiet end. Around 1kHz
people tend to classify a ten‐fold increase or decrease in sound
energy as a doubling or halving, respectively, of perceived loudness. This
means that a suitable scale for sound amplitude is not linear, but logarithmic.
Values from this scale are called sound pressure levels (SPL) and their unit is the decibel (*dB*). It is defined as twenty times
the base ten logarithm of the ratio of the sound pressure variation (effective,
i.e. RMS, level) to that of the softest sound heard by an average human (the
threshold of human hearing, defined as 20 micropascals
for a 1kHz sine wave). This
means that 0dB equals the threshold and a
twenty decibel increase means a *ten-fold*
increase in pressure variation. To illustrate, going from 0dB to 140dB means
multiplying the pressure variation by 10^7, that is, by ten million.

Yet another amusing calculation reveals that with a sinusoid of about 194dB SPL, the rarefying
part of the fluctuation reaches vacuum. This is the theoretical limit on
sinusoidal pressure fluctuations in normal atmospheric pressure, then.
(Compressive impulses can, of course, reach much higher SPLs; cf. the hydrogen bomb.) Doubling the
pressure variation gives an increase of about 6dB SPL (doubling the power gives about 3dB). When we think a bit, we see that if
two sounds with a significant SPL difference
(say, over 15dB) are added together, their
relative difference is much greater than we would think. In effect, adding a 30dB SPL sound to one of
60dB does not increase the SPL significantly beyond 60dB.
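
This can be made concrete with a few lines of arithmetic. Incoherent sounds
add in power (pressure squared), not in pressure, which is why the quieter
sound contributes so little:

```python
import math

P_REF = 20e-6   # reference pressure: 20 micropascals

def spl_to_pressure(db):
    return P_REF * 10 ** (db / 20)

def pressure_to_spl(p):
    return 20 * math.log10(p / P_REF)

def add_spl(db1, db2):
    # Incoherent sounds add in power, i.e. in pressure squared.
    p = math.sqrt(spl_to_pressure(db1) ** 2 + spl_to_pressure(db2) ** 2)
    return pressure_to_spl(p)

print(add_spl(60.0, 60.0))   # two equal sounds: +3 dB, about 63 dB
print(add_spl(60.0, 30.0))   # a 30 dB difference: barely above 60 dB
```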

Similarly we define the intensity level (ten times the base ten logarithm
of the ratio of sound intensity to a reference intensity of 10^-12 watts per
square meter, which roughly corresponds to the threshold of hearing).

Now, although it was established a while ago that not all sounds need to
have a properly defined frequency, the concept of frequency still has its uses.
This is because, as we shall see later on, it is quite possible to uniquely
construct signals from sine waves with definite frequencies. This makes it
possible to talk about frequency ranges of *any* signal—we break the
signal into sine waves and discard everything but the frequencies of interest.
This can also be accomplished directly. Such ranges (called bands or
subbands) can then be processed and analysed separately, which, of course, is precisely what
goes on when we watch the spectrum analyzer on a hip soundsystem,
crank up the bass on a car stereo or speak through a telephone (which
constitutes a severely bandlimited channel).
Simultaneously measuring the relative contributions of all the different
frequency ranges in a signal gives rise to the spectrum of a sound.
Depending on the way in which we extract the subbands,
we arrive at different kinds of spectra. Nevertheless, they all give some sort
of budget of how much bass, middle and treble our
signal has. Since our ear performs an analysis somewhat reminiscent of the kind
described above, spectra are invaluable in discussing and analysing
sound and related technology, even when we work with only the simple,
intuitive definition given above.
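The break‐into‐sines‐and‐discard idea can be sketched in a few lines of
Python (assuming NumPy; this is a toy illustration of subband extraction,
not how a real analyzer is built):

```python
import numpy as np

# A toy signal: 50 Hz "bass" plus 1000 Hz "treble", sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2*np.pi*50*t) + 0.5*np.sin(2*np.pi*1000*t)

# Break the signal into sine components...
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1/fs)

# ...and discard everything but the frequencies of interest (below 200 Hz).
spectrum[freqs >= 200] = 0
bass_only = np.fft.irfft(spectrum)

# What remains is essentially the 50 Hz component alone.
print(np.max(np.abs(bass_only - np.sin(2*np.pi*50*t))))
```

After zeroing the bins above 200Hz, the reconstructed subband differs from
the pure 50Hz component only by rounding error.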

We want to defer the introduction of math, so any rigorous treatment of
spectra (amongst other things) is necessarily postponed as well. This will
leave some holes and vaguely defined concepts here. Be forewarned when we use
such terms as periodic, continuous, discrete, spectrum and so on.

Most traditional acoustic research has centered around
highly reductionistic approaches, such as using
anechoic chambers, sinusoid test tones and so on. In the real world, however,
we never encounter strictly periodic sounds, let alone pure sinusoids—musical
sounds are never pure enough and in addition are strictly time limited. In
fact, most musical sounds do not even approximate periodic behavior. To get a
handle on the following topics, we need to classify sounds a bit further, and to
establish an intuition as to how the different types of tones behave and what
they sound like.

Periodic sounds we have already seen. The simplest example is the sine wave.
All periodic sounds repeat over and over, reaching
over all of time. It is clear that such sounds do not really exist, but they
are a neat conceptual tool when analyzing sounds which are locally stable. This
can be done after a system in a sense no longer remembers that its input
started a finite time ago, that is, after any transient phenomena have
diminished sufficiently. As to why we would go with periodic analysis,
periodic signals
have extremely nice properties. For instance, frequency is a concept which is
only defined for signals which are periodic. If we look at the spectrum of a
periodic signal, we quickly learn that only whole multiples of some fundamental
frequency (harmonics) are present. Later,
when stated formally, this notion leads to the classic theorem on Fourier
series.
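The claim can also be read backwards: summing harmonics of a common
fundamental always yields a periodic signal. For instance, the odd harmonics
of 100Hz with amplitudes 4/(πk) form the Fourier series of a square wave. A
small Python sketch (NumPy assumed):

```python
import numpy as np

fs, f0 = 8000, 100                       # sample rate and fundamental, Hz
t = np.arange(fs) / fs

# Partial sum of the Fourier series of a square wave:
# only odd harmonics k*f0, with amplitudes 4/(pi*k).
wave = np.zeros_like(t)
for k in range(1, 40, 2):
    wave += (4 / (np.pi * k)) * np.sin(2*np.pi*k*f0*t)

# The result is periodic with period 1/f0: shifting by one
# period (here 80 samples) changes nothing.
period = fs // f0
print(np.allclose(wave[period:], wave[:-period]))   # True
```

Every term completes a whole number of cycles in one fundamental period, so
the sum must repeat at the fundamental rate.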

This does not imply that the fundamental, or all of the harmonics, need to be
present. When some are missing, the actual frequency (rate of repetition) of
the signal can be higher than the fundamental frequency. In fact, one can
always reinterpret a series of harmonic partials as containing only some of
the even harmonics of a new fundamental at half the original frequency,
instead of successive harmonics of the original one. This leaves the
fundamental somewhat lower than before the shift in the point of view.
Consequently the concept of
fundamental frequency is not very well‐defined and certainly does not
relate uniquely to the actual frequency of the signal. This permits some
interesting acoustical illusions and even serious musical applications.
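As a toy check (Python with NumPy), partials at 200, 300 and 400Hz can be
read as harmonics 2, 3 and 4 of a 100Hz fundamental which is itself absent,
and the composite indeed repeats a hundred times per second:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
x = sum(np.sin(2*np.pi*f*t) for f in (200, 300, 400))

# The composite repeats every 1/100 s (80 samples), even though
# no 100 Hz partial is present: the "fundamental" lies below
# every partial actually in the signal.
period = fs // 100
print(np.allclose(x[period:], x[:-period]))   # True
```

This is the classic missing‐fundamental situation: the repetition rate of
the whole is set by the common divisor of the partials, not by any
component actually present.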

Investigating a bit further we find that the relative amplitudes and phases
of constituent harmonics uniquely determine a periodic signal. Later we shall
see that the absolute phases of the harmonics in a periodic sound actually
matter little to us, and even the amplitudes are perceived a bit vaguely. There
is no time information, either. This means that there are actually not so many
perceptually separate periodic tones. Further, all of them sound extremely
dull and sterile.

The importance of periodic signals and their spectra lies in the fact that
they are exceedingly simple mathematically—periodic sounds avoid the
topological complications of Fourier analysis. They lead to the Fourier series
which is discrete and as such quite simple to understand and derive. The
Fourier series serves as a starting point for the construction of the discrete
Fourier transform which is of pivotal importance in DSP. More
about all this in the math section.

Previously we established that every periodic signal can be constructed
from harmonics of some fundamental. Now, nobody says we cannot add together partials
which are not in harmonic relationship with each other. When we do this, we
obtain quasi‐periodic signals. These
sounds still have discrete spectra, but they need not be periodic. Quasi‐periodic
sounds are more relevant to musical acoustics than periodic ones—locally the
steady‐state
part of an instrumental sound is usually best described as being quasi‐periodic.
Again we assume that all the partials are in the audible range. Unlike periodic
signals, quasi‐periodic ones can have some time content—closely spaced
partials beat against each other, possibly contributing to harshness and time
evolution in the composite tone. Inharmonic partials often lead to bell‐like
or metallic timbres, or even chord‐like or noise‐like textures if enough
partials are present. No strict time features emerge, however, because any
transient content would necessarily imply a continuous spectrum. For the same
reason, any sound with a discrete spectrum will reach indefinitely back and
forth in time.
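The beating of closely spaced partials is easy to verify numerically
(Python with NumPy): the identity sin a + sin b = 2 cos((a−b)/2) sin((a+b)/2)
turns a 440Hz plus 442Hz pair into a 441Hz tone under a slow envelope:

```python
import numpy as np

fs = 8000
t = np.arange(2 * fs) / fs                 # two seconds of samples
x = np.sin(2*np.pi*440*t) + np.sin(2*np.pi*442*t)

# The sum equals a 441 Hz carrier under a 2*cos(2*pi*1*t) envelope.
# The envelope peaks twice per second, heard as two beats per second.
carrier = np.sin(2*np.pi*441*t)
envelope = 2*np.cos(2*np.pi*1*t)
print(np.allclose(x, envelope * carrier))   # True
```

The beat rate is simply the difference of the two partial frequencies, here
2Hz.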

Finally we have signals with continuous (in the strict sense) spectra, i.e. aperiodic signals. Sounds like these can be practically
anything, but they never display truly periodic time‐domain
behavior. Usually white noise is given as an example, but actually all
time‐localized and discontinuous signals belong to this class. All transients
(because they are time‐localized) and physical signals (because they have
finite energy) also have continuous spectra.

Strictly speaking, *noise* is mathematically defined in terms of its
generating process and some statistical properties of that process. The actual
signals we process are just *examples*, or realizations (a collection of which
is, in proper mathematical terms, an ensemble), of what such a
process can produce, and should be strictly separated from the process itself.
This means that mathematically derived spectra for stochastic processes are
expectations—they relate to real spectra like the expected result of
half heads and half tails relates to an actual experimental record of coin
tosses. In statistical analysis, a property called ergodicity
then guarantees that time averages taken over a single output equal the
averages taken over all the signals in the ensemble, so the time domain
faithfully represents the properties of the stochastic process and we can
often handwave the distinction between the properties of the
process and the properties of its example output. One should keep in mind
that they are not the same
thing, however. Otherwise one runs into some deep math. To get rid of the
process description and to work solely on time series, one must first consider
such fun subjects as information theory, Kolmogorov
complexity, Bayesian statistics and estimation theory, to mention a few. Those
are topics *well* outside both the scope of this presentation and the
capability of the author.
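The process‐versus‐example distinction can still be made concrete with a toy
white noise process (Python with NumPy; the variable names are ours). For an
ergodic process, averaging one long output over time and averaging many
outputs at a single instant estimate the same quantity:

```python
import numpy as np

rng = np.random.default_rng(0)

# An ensemble: 1000 example outputs of a white Gaussian noise
# process, each 10000 samples long.
ensemble = rng.normal(0.0, 1.0, size=(1000, 10000))

# Ensemble average: across all outputs at one fixed time instant.
ensemble_mean = ensemble[:, 0].mean()
# Time average: along a single output.
time_mean = ensemble[0, :].mean()

# For this (ergodic) process both estimates approach the true
# process mean of zero.
print(ensemble_mean, time_mean)
```

Neither estimate is the process mean itself; both are finite‐sample
estimates that merely converge to it, which is exactly the
expectation‐versus‐experiment distinction above.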

Copyright © 1996–2002 Sampo Syreeni. Date: 2002-11-21.