Sound is something most of us know and love. But although we hear sounds every single day of our lives, there are many aspects of the experience that one usually doesn’t pay attention to. These are the basic questions about the nature of sound, the reasons we hear anything at all and the mechanics underlying the sensation. It is not immediately obvious how sound is actually generated in physical objects, why different objects do not all sound alike or how all this relates to our sensory apparatus. When taking a closer look at sound as a physical phenomenon, many interesting characteristics of sound come into proper focus. The relevant topics include the wave characteristics of sound: diffraction, reflection, interference and so on. Taking into account the peculiarities of the human sensory organ and the psychology of perception in general, one is taken into the field of psychoacoustics—the study of human auditory perception and its underlying mechanisms. This is a field rich in applications and interesting discoveries, not all of which seem intuitive at first sight.
Perceptual audio codecs are one such application—how can an MP3 codec throw away 90% of an audio signal and still reproduce a perceptually near perfect replica of the original?
Sound is a wave phenomenon. That is something we’re all told in high school physics courses. But usually no time is left for a nice intuitive picture of the thing to build up. That’s what we will try to construct next.
To create waves we need a medium in which to put them. Obviously the medium
needs to be elastic in order to generate any waves at all. Additionally there
must be a kind of
stiffness which holds the adjacent parts of the medium
together. This is what makes wave propagation, a form of energy transfer,
possible. It also determines the kinds of vibration which can propagate in the
medium: unlike solids, gases and liquids lack transverse binding forces and so
only longitudinal energy transfer is possible. This is why sound cannot be polarised. The other important intrinsic property of the
medium is its density—together density and the strength of the binding forces
determine the speed of wave motion in the medium.
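The dependence on stiffness and density can be made concrete. For a gas, the effective stiffness is the adiabatic bulk modulus γP, which gives the speed of sound as c = √(γP/ρ). A minimal sketch, using textbook values for dry air at 20 °C:

```python
import math

# Speed of sound in an ideal gas: c = sqrt(gamma * P / rho), where gamma is
# the adiabatic index, P the ambient pressure (the "stiffness") and rho the
# density. The values below are nominal figures for dry air at 20 C.
gamma = 1.4          # adiabatic index of (mostly diatomic) air
P = 101325.0         # ambient pressure, Pa
rho = 1.204          # density of air at 20 C, kg/m^3

c = math.sqrt(gamma * P / rho)
print(round(c, 1))   # roughly 343 m/s
```

Heavier gases (larger ρ) slow the wave down; a stiffer medium (larger γP) speeds it up, just as the text describes.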
Note that both density and the stiffness can vary within the substance: the medium can be inhomogeneous. In addition to this, the binding forces can be different depending on direction. In the latter case we talk about anisotropy, which fortunately does not occur in gases.
In the case of sound, the density is just the usual weight per unit of
volume of gas and the stiffness is borne of a balance of repulsive forces
between molecules and their mean kinetic energy (temperature). In a closed
system of moving particles the sum kinetic energy will stay constant and
statistical physics predicts that the mean distance between the equally
repellent particles will tend to even out. Similarly the expected velocity (with
direction) of particles over an arbitrary volume will be zero—at large scales
the gas will tend to stay put even if the molecules themselves move quite a
bit. Only after we upset the balance do the mean properties ripple. These mean
properties are used as a stepping stone to the analytic model of a sound field,
which simply forgets that the molecule level ever existed, assigns a real
velocity vector and a real pressure measure to each point in space and lets
these vary over time. From
Often we make the further assumption that at each point, the velocity vector equals the gradient of the pressure field, though this way we lose part of the generality of the model.
To get a hold of the process of sound propagation, we first look at a simple example: a point source. This is simply one point in which we explicitly control the sound pressure or the velocity vector of the field. In practice, neither of these can be controlled independently of the other. When we excite the medium by creating a disturbance in its structure, coupling between adjacent particles makes the disturbance try to even out, globally. The forces arising from the new inhomogeneity accelerate the particles toward lower pressure. But this, assuming the pressurised region is point‐like, can only mean that the disturbance moves outward. The packed particles get pushed away from the center of the region. This makes them move and, consequently, pushes fresh molecules out of the way. Voilà: motion. Once this has happened, the pressure is evened out. It is worth pointing out that the pressure wave does not get ironed out in the inward direction. This is due to the inertia of the molecules—once the pressure sets them in motion, the pressure moves in the direction the first molecules go in. What happens is similar to slamming a pool ball against another. Since all this has happened through what are almost 100% elastic collisions of particles, little energy is lost. (Some is, to the mean kinetic energy of particles, raising the gas temperature. This accounts for some of the attenuation that sound experiences while travelling.) As long as no new particles are brought into play the net effect is that of making the pressurised region move outwards from the point source. Note that individual particles do not move appreciable distances in the action but stop after transferring their kinetic energy to the next one. This is a characteristic of wave phenomena: seen at a larger scale, energy moves, not the medium. Put another way, turning up the volume does not produce a tropical storm.
It is very important to separate such concepts as the pressure field (scalar in 3D), the density field (scalar in 3D), the velocity (3D vector in 3D), the gradient fields of the first two scalar ones (3D vector in 3D), the time derivative fields of the scalar ones (scalar in 3D) and all the derived fields on top of these. When bypassing the mathematical notation, it is exceptionally easy to confuse the first and second time derivatives of the pressure field with velocities and accelerations (which are taken relative to spatial coordinates and are, hence, vector fields).
Above, we compressed the air at the source. But the same principle of wave
transmission applies if the opposite is done—namely, if we create a depressurisation zone. In this case, the surrounding
particles move in and the zone moves outward again. Also, the degree of depressurisation matters—the more violent the
original disturbance, the bigger the propagating
bump in the medium.
Aside from the energy of the original disturbance being spread over an
ever larger area, and thus growing fainter and fainter per unit volume, one can
accurately reconstruct the series of more or less violent disturbances made
at the source by measuring the local air pressure at a point some distance removed
from it. This is, roughly, how sound propagates through air in
free space and is experienced from afar.
A few things have to be noted about sound radiation. The first thing is that
we speak about pressures. The importance of this is seen when thinking about a
speaker cone that moves very slowly. In this case, the air has time to escape
from in front of the cone instead of forming a high pressure zone. Evidently,
efficient radiation is not possible this way. We see that in order to emit considerable
radiation, rapid variations in pressure or large radiators must be used. This
is typical of wave emission—it is why microwave radio transmission requires
only small antennas while low frequency AM radio often employs dipole antennas
that are tens of meters long. From this we come to the second point: in order
to continuously emit sound, we cannot just move the speaker cone further and
further ahead. Instead, the cone has to come back before the air has time to
escape around the edges. In physics, the situation is discussed in terms of
coupled systems and impedance matching. The principle is,
sound emitters work best when the inertia of the medium keeps the medium from
moving appreciably and on the other hand the emitter’s own inertia isn’t large
enough to make it hard to move. Back and forth motion is the normal mode of
wave transmission, not the impulses we have discussed so far. A special case of
such movement occurs when the motion repeats at a constant rate and each cycle
involves the same precise pattern of movement. In this case we speak of periodic
motion and periodic sound/signals. The rate of repetition of a periodic motion
is dubbed the frequency, with the hertz (Hz)
as its unit. The hertz is the SI unit meaning
cycles per second. As each part
of a wave traverses at a constant velocity and, at each fixed point in space,
the vibratory motion repeats at a constant rate, we see that one period of the
motion is always exactly duplicated in a certain interval of space that depends
only on the speed of the wave motion and the frequency of the wave we drive
through the medium. As we work in a single medium, the speed stays constant, so
the length of our interval depends only on the frequency of our vibration. This
is the wavelength corresponding to the frequency, with an inverse
dependency on it. Because of the properties of what we will come to know as
linear systems, a certain type of periodic wave has a very special position in
our treatment. This wave is the sinusoid. It is the smooth, endless,
periodic function which we bump into in trigonometry. The sine wave has the
property that when put in a linear system (in our case, transmitted through
air), it comes through as a sine wave with the same frequency. The only
variation comes about in the form of a time lag and a change in strength. When
a combination of sine waves of different frequencies is introduced, they go
through as if the other waves weren’t even present.
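The sine wave’s special status can be checked numerically. In the sketch below, an arbitrary five‐point moving average stands in for any linear system (transmission through air included); projecting the output onto a sine and cosine of the input frequency shows that nothing but amplitude and phase has changed:

```python
import math

N = 1000
f = 0.01  # input frequency in cycles per sample (example value)
x = [math.sin(2 * math.pi * f * n) for n in range(N)]

# A simple linear system: a 5-point moving average (an FIR filter).
def system(sig):
    return [sum(sig[max(0, n - 4):n + 1]) / 5 for n in range(len(sig))]

y = system(x)

# Project the steady-state output onto sin/cos of the SAME frequency;
# if the output is again a sinusoid at f, the residual is ~zero.
body = range(100, N)  # skip the filter's start-up transient
s = [math.sin(2 * math.pi * f * n) for n in range(N)]
c = [math.cos(2 * math.pi * f * n) for n in range(N)]
a = 2 * sum(y[n] * s[n] for n in body) / len(body)
b = 2 * sum(y[n] * c[n] for n in body) / len(body)
resid = max(abs(y[n] - (a * s[n] + b * c[n])) for n in body)
print(resid < 1e-6)  # True: same frequency, only amplitude and phase changed
```

Feeding any non-sinusoidal wave through the same filter changes its shape, which is precisely why sinusoids are the natural building blocks for linear systems.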
In later parts of the text when we talk about sound, we usually mean pressure variations measured at a point. This is because we have ears which are relatively small compared to the wavelength of audible sound—we can with good accuracy say that ears are pointlike with regard to sound fields. Thus few humans even fully comprehend the real, complex vibrational patterns which occur in three dimensional spaces—evolution has not equipped our brain to do such analysis. This fact is a double edged sword, really—it would be nice to actively understand all the phenomena involved in sound transmission since all such things affect what we hear but, on the other hand, mathematical description and manipulation of 2+ dimensional wave phenomena quickly becomes quite unwieldy. It is quite a relief to scientists, engineers, technicians and artists that such considerations are not strictly necessary to fool our hearing.
When nonsinusoidal sources and/or a number of radiators and/or closed spaces are considered, things get interesting. At once we note something called interference. It is what happens when more than one source is placed in the same space. At each point in space, the individual contributions of our moving pressure zones (one for each emitter) just add up. We get what is called linear wave transmission. The name comes from mathematics and means, roughly, that given a bunch of signals, we can first add and then feed through a system or first feed through the system and then add, with equal results. To a considerable degree, this is what happens with sound. In spite of its rather technical connotations, linearity is a true friend. Without it, there would be little hope of understanding anything about sound at an undergraduate level.
Said in another way, at small to moderate amplitudes, sound transmission in large scale obeys a second order linear partial differential equation, called the wave equation, which is seen in all branches of physics and is covered early on in physics education. As is well known, once we know some solutions to a linear differential equation, we get more by scaling and summing.
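The add‐then‐process versus process‐then‐add equivalence is easy to demonstrate. Below, a toy two‐tap filter (an assumed example, not any particular acoustic model) stands in for a linear system:

```python
import math

# A toy linear system: y[n] = 0.5*x[n] + 0.3*x[n-1] (a two-tap FIR filter).
def system(sig):
    return [0.5 * sig[n] + 0.3 * (sig[n - 1] if n > 0 else 0.0)
            for n in range(len(sig))]

x1 = [math.sin(0.1 * n) for n in range(200)]
x2 = [math.cos(0.37 * n) for n in range(200)]

# Add first then process, or process first then add: same result.
sum_then_process = system([a + b for a, b in zip(x1, x2)])
process_then_sum = [a + b for a, b in zip(system(x1), system(x2))]

max_diff = max(abs(a - b) for a, b in zip(sum_then_process, process_then_sum))
print(max_diff < 1e-12)  # True: linearity holds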
Now, as periodic waves interfere, it is interesting to see what happens at a single, fixed point in space as time evolves. Let’s suppose we have a one‐dimensional string where a single sinusoidal sound source is present. We know that the pressure at any single point reflects that of the source, save for the time lag it takes the vibratory motion to reach our point and the attenuation resulting from friction and other damping forces. If we now add a second source with an identical frequency but a different position on the string, we get standing waves. How does this happen? Think about the peak of one period of the motion. As it leaves the two sources, it travels at a constant velocity away from them. Precisely at the middle, the two waves meet and we let them interfere; they add together. The same applies for the valley parts of the wave. So in the middle, we get twice the amplitude. We say the two sounds are in phase with each other. Let’s take another point, this time choosing it so that the time to get from source 1 to the point is precisely half a cycle time greater than the time to get to our point from source 2, that is, the difference between the distances to the sources is a whole number of wavelengths plus half a wavelength. This time, the sinusoids always arrive at our point precisely when they cancel each other out. So at this point, we never observe vibratory motion. Points of these two kinds occur repeatedly over the entire length of our string, with the amplitude of the sinusoid motion varying between them from zero to double the source amplitude.
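The node/antinode pattern can be sketched by summing the two arrivals as phasors. The spacing and frequency below are arbitrary example values, chosen so the wavelength comes out as exactly one metre:

```python
import cmath, math

c = 343.0        # speed of sound, m/s
f = 343.0        # frequency, Hz -> wavelength of exactly 1 m
lam = c / f
k = 2 * math.pi / lam   # wavenumber

src1, src2 = 0.0, 4.0   # two in-phase sources on a 1D string, 4 m apart

def amplitude(x):
    """Magnitude of the summed waves at point x (phasor sum of the
    two arrivals, each delayed by its travel distance)."""
    d1, d2 = abs(x - src1), abs(x - src2)
    return abs(cmath.exp(-1j * k * d1) + cmath.exp(-1j * k * d2))

print(round(amplitude(2.0), 3))   # midpoint: equal paths -> 2.0 (antinode)
print(round(amplitude(2.25), 3))  # path difference lam/2 -> 0.0 (node)
```

Sweeping `x` along the string traces the full standing wave pattern, with nodes every half wavelength.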
The last example was very simple, as only one‐dimensional effects were considered. In two dimensions, we get a nice interference pattern, where our special points recur wherever the difference of the distances to the sources is, again, a whole multiple of half the wavelength.
We remember from high school geometry that, given two fixed points, if we draw a curve through those points where the difference of distances from the fixed points is constant, we get a hyperbola. So the knots and humps of our interference pattern on a plane occur on hyperbolas with the point sound sources as foci and the spacing of the points determined by the wavelength of the sound. The same deduction goes for the 3D case, only the sound field is quite a lot more difficult to visualise. We get, logically enough, hyperboloids. (To see this, put a line through the two point sources, rotate a plane set through this line and repeat the two dimensional reasoning on this plane.)
One should note that when different frequencies are combined, the result is more complex, since now we cannot combine the resultant vibration pointwise into a single sinusoid. But keeping to two close frequencies, we get an interesting phenomenon called beating. When two frequencies that are close to each other are combined, we get, not an audible combination of the two, but the frequency in the middle of the two, varying sinusoidally in amplitude at the rate of the difference between the two original frequencies. This is seen as follows. Suppose we have two sine waves with frequencies f1 and f2 and we form their product, sin(2πf1t)·sin(2πf2t). Through a basic trigonometric identity, the result is ½[cos(2π(f1−f2)t) − cos(2π(f1+f2)t)], which shows the symmetrical placement of the sidebands. (Don’t worry about the cosines, since they have the same form as sines. They are only a bit ahead in time…) The equation works backwards, of course, so adding two sinusoids at nearby frequencies, sin(2πf1t) + sin(2πf2t) = 2cos(π(f1−f2)t)·sin(π(f1+f2)t), produces a tone at the mean frequency whose amplitude envelope, |2cos(π(f1−f2)t)|, peaks f1−f2 times per second even though the cosine itself completes only half that many cycles.
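The identity behind beating is easy to verify numerically. The 440 Hz and 444 Hz tones below are arbitrary example values, four beats per second apart:

```python
import math

f1, f2 = 440.0, 444.0            # two close frequencies, Hz
fs = 44100                       # sample rate, samples per second
t = [n / fs for n in range(fs)]  # one second of time points

# Direct sum of the two sinusoids.
summed = [math.sin(2 * math.pi * f1 * ti) + math.sin(2 * math.pi * f2 * ti)
          for ti in t]

# The trigonometric identity predicts a carrier at the mean frequency
# modulated by a cosine envelope at half the difference frequency.
predicted = [2 * math.cos(math.pi * (f1 - f2) * ti)
             * math.sin(math.pi * (f1 + f2) * ti) for ti in t]

err = max(abs(a - b) for a, b in zip(summed, predicted))
print(err < 1e-9)  # True: the two forms are identical
```

Plotting `summed` would show a 442 Hz tone swelling and fading four times per second, which is exactly what the ear reports.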
So now we have multiple sound sources, but still nothing but an empty medium
where our waves travel. How about obstacles? Starting from a single dimension
once more, we send a single pulse wave towards the end of a string which is tied
to a rigid wall. What happens? Well, the pulse comes back: it gets reflected.
This is easy to understand—when a pressurized zone meets the wall, it cannot
move it, and the pressure pushes back instead, making for a reflected copy. If
the wall gives in a little and takes a bit of energy from the wave (turning
it into heat through friction, usually), the wave still bounces back but gets
attenuated. We say absorption has occurred. Absorption is the reason
rooms do not have indefinitely long echoes. In a sense, absorption is the
precise opposite of radiation. This way it is quite logical that, here too, the
size of the object and the frequency of the wave matter. Usually, though, it
isn’t so much the size of the absorber that matters here as the scale of detail
and the material used in the object. For example, a paper wall can only absorb the
highest frequencies, whereas a soft, heavy curtain can absorb significant mid
and low frequency sounds. In higher dimensions (2+), reflections become much
more difficult to handle. Here approaches similar to ray optics work much better.
When we combine reflection and interference, interesting things happen. Taking our 1D standing waves, we can now generate them by a single source and a wall that reflects the waves back.
One can think, as in ray optics, that the mirror image of the source now provides the other source. A similar view works in higher dimensions, but gets intractable quite fast when the number of reflections and reflecting objects increases. Even more troublesome is the situation in which the reflecting objects are not infinite, straight planes. At a very basic level the problem with higher dimensional differential equations is precisely the one of curved boundaries, which naturally make no sense in dimension one.
If we put two obstacles and send a pulse between them, a periodic motion arises. If we put a source there, instead, we observe a complex interference pattern as the waves get reflected again and again and interfere with other reflections and the source signal. Again, the same thing happens in higher dimensions, only with more hard to follow patterns. If regular echoes, which reinforce each other, can be produced at some frequency (in the case of periodic sources, this happens when the distance between our two obstacles is a multiple of half the wavelength), resonance results. If such resonant frequencies exist, they reinforce sounds of the same frequency. The opposite (and all that is in between) can also happen—destructive interference can greatly damp some frequencies. Resonance gives rise to different modes of vibration—if resonance can happen on different frequencies, complex patterns of vibration can arise. These patterns are taken advantage of in the design of traditional instruments. For instance, only a slight variation in the design of a violin can cause significant variations in its perceived timbre. Since acoustically significant vibrational modes always appear as (composite) standing wave formations in physical media (such as air columns, solids and water), the different modes can often be independently controlled—they all have their own characteristic vibrational shape with humps and knots which gives us the possibility of exciting or damping the modes differently relative to one another. Further, since air columns can vibrate, so can spaces filled with air. This leads, in turn, to the issue of room acoustics: if one puts a point source (a very rough approximation of a loudspeaker) in a room, the more the walls reflect sound, the more the room colors the sound (a longer echo means more chances for interference).
As sound circulating around a room gets reflected many times, it is necessary to ensure that no prominent resonances occur (these are called room modes or just modes and usually result from echoes between opposite walls). The same general principles apply here as in the case of 1D resonance, with the exception of many unusual and inharmonic modes—as such, the placement of speakers, room geometry and decoration crucially affect the sound field in the room. In addition, psychoacoustical phenomena further complicate matters. Thus, for instance, the more random the directions prominent echoes can be made to come from, the better (as this lessens the effect of room modes, and obvious echo directions get reduced). This is why audiophiles use highly damped and irregularly shaped rooms to achieve a hi‐fi listening environment. (Basic measures include thick carpets to absorb stray sound, book shelves to absorb and scatter, absorbers in the ceiling and the placing of heavy furniture around the rim of the room.)
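For the simplest axial modes, the resonant frequencies between two parallel walls follow directly from the half‐wavelength condition. A quick sketch; the 5 m wall spacing is only an assumed example:

```python
# Axial room modes between two parallel walls: resonance occurs when the
# wall spacing is a whole number of half wavelengths, i.e. f_n = n*c/(2*L).
c = 343.0   # speed of sound in air, m/s
L = 5.0     # wall spacing in metres (assumed example dimension)

modes = [n * c / (2 * L) for n in range(1, 6)]
print([round(f, 1) for f in modes])  # [34.3, 68.6, 102.9, 137.2, 171.5]
```

Note how the lowest modes land squarely in the bass range, which is why small rooms tend to colour low frequencies most audibly.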
Until now, we have assumed that the medium in which our waves travel is homogeneous—the speed of travel of wave motion is constant throughout the space. Often this is not the case, though. In the case of sound, the speed depends on what material the waves travel in and its temperature. Often one can ignore the inhomogeneity, but sometimes it produces important effects. The main one is refraction. This means velocity dependent bending of wavefronts. Refraction is most pronounced if sharp boundaries between media of different properties are present—an excellent example is the boundary between water and air. If a wavefront hits such a boundary at anything other than a right angle, the direction of the waves is bent. If the speed decreases at the boundary, the motion bends towards the normal of the boundary. If it increases, bending is away from the normal. If the incident angle is great enough and the waves speed up across the boundary, total reflection occurs. All this is precisely analogous to what happens in ray optics. The only difference is that in acoustics, one needs to worry about nonsharp boundaries more often. This is because we are mostly dealing with sound transmission in air at normal atmospheric pressures and in this case, the speed differences usually arise from temperature differences—always a continuous phenomenon. As you can already guess, refraction and total reflection happen with graded boundaries as well. Here they take the form of smooth bending, not abrupt changes of direction. One must also observe that refraction, just like diffraction, is frequency dependent—different frequencies refract differently. What is the significance of all this, then? Most often, at least indoors, none. Outdoors, where temperature gradients can be much greater, refraction effects can become significant, though.
A prime example is the way sound can propagate over lakes—if the water is warmer than the air above it, a warm‐cold graded boundary can form in the air above the water. This can, under some circumstances, bend sound waves from the other side of the lake and prevent them from escaping. This can lead to the sound propagating unusually long distances over the lake. (The phenomenon is similar to the one employed in graded index optic fibres.)
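Refraction at a sharp boundary follows Snell’s law, sin θ₁ / c₁ = sin θ₂ / c₂. A sketch using nominal speeds for air and water (angles measured from the normal to the boundary):

```python
import math

# Snell's law for sound crossing a boundary between media with different
# wave speeds: sin(theta1)/c1 = sin(theta2)/c2.
c1 = 343.0    # nominal speed of sound in air, m/s
c2 = 1480.0   # nominal speed of sound in water, m/s

def refract(theta1_deg):
    """Refraction angle in medium 2, or None on total reflection."""
    s = math.sin(math.radians(theta1_deg)) * c2 / c1
    if abs(s) > 1.0:
        return None          # beyond the critical angle: no transmitted wave
    return math.degrees(math.asin(s))

print(round(refract(10.0), 1))   # about 48.5 deg: bends away from the normal
print(refract(20.0))             # None: total reflection
print(round(math.degrees(math.asin(c1 / c2)), 1))  # critical angle, ~13.4 deg
```

Going into the faster medium, the critical angle is small here, which is why sound arriving at a water surface at even a modest slant is almost entirely reflected.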
One final phenomenon is yet to be uncovered, namely, diffraction.
This is something that is, sadly enough, often given too little notice. All waves
behave rather weirdly when they pass around objects. If very thin (compared to
the wavelength) objects are passed, no substantial effects are produced—such
little defects in the medium drown in the large scale wave motion. Very large
objects exhibit reflection, at least locally. But in between (e.g. around
object edges and suitably sized obstacles overall), the wave motion bends around the obstacle,
creating some pretty complex interference patterns. Even in the case
of exceedingly simple geometric objects (e.g. balls, cylinders…), the resulting
interference is difficult to master mathematically. This is a phenomenon that
is specific to 2+ dimensional cases and is something that greatly affects the
behavior of sound in natural environments. Thus, the behavior of sound near
objects and object edges is really quite poorly understood, leading to the term
near field effect being used in situations where such behavior is
significant. Noteworthy examples are the sound field of a loudspeaker and the
field formed around a human head while standing in a larger sound field. The
latter to a considerable degree dominates how we hear sound and mostly
determines how the direction of a sound source affects our perception of it.
Diffraction is something which is not often taken into account when simulating sound behaviour. Reasons for this are multiple. Firstly, diffraction is rather difficult to simulate efficiently. As it is a 2+ dimensional phenomenon, it does not naturally lend itself to the one dimensional abstractions of today’s simulation methods and 2+ dimensional simulations cost dearly in terms of processing power and memory. Secondly, diffraction is heavily frequency dependent—it disperses waves of differing frequencies. This is one of the reasons why accurate prediction of room acoustics is so difficult. Thirdly, there is little need to think about 2+ dimensional effects when analysing static, linear point‐to‐point transmission. Though it may sound like all this is just plain academics, when one tries to create convincing simulations of sound behaviour for reverberation and binaural processing, this is where we usually hit the wall.
Now we know diffraction does not fit in and is difficult to handle. Under
what assumptions, then, can we ignore the problem? Let’s start at the bottom of
things… To get a hold on wave phenomena, one needs to simplify quite a bit. The
most common way is to try to linearize and then
reduce the dimensionality of the problem. The latter part often consists of
building meshes of one dimensional simulations or neglecting the size of
phenomena in certain directions. The latter is the way we arrive at ray optics
and its audio counterpart—if we neglect the fact that our waves have a finite
wavelength, i.e. we
pass to the limit of zero wavelength, many ugly things go away and
we get nice, unidimensional, cleanly behaved rays
instead of multidimensional wavefronts. We can do
this if the waves are very short compared to the feature size of the
surrounding space. In the case of light and natural objects, we can quite
safely assume this to be the case. (The speed of light is high but its
frequency is even higher. This leads to the wavelength being very small. Also,
the relative frequency range of visible electromagnetic radiation is much
narrower than the range for audible sound.) With sound we bump into a
relatively wide frequency range and feature sizes in our environment which sit
right in the middle of audible wavelengths. This means that sound diffraction
in our surroundings is often considerable and can only be neglected if few
obstacles are present, sound sources can be considered point‐like,
enough damping is present and reflective surfaces are simple enough.
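A quick calculation shows why: audible wavelengths in air run from building‐sized down to finger‐sized, straddling everyday object dimensions.

```python
# Wavelength = speed / frequency. Audible sound spans roughly 20 Hz to
# 20 kHz, so its wavelengths straddle the sizes of everyday objects.
c = 343.0  # nominal speed of sound in air, m/s

wavelengths = {f: c / f for f in (20, 100, 1000, 10000, 20000)}
for f, lam in wavelengths.items():
    print(f"{f} Hz -> {lam:.4f} m")
# 20 Hz gives about 17 m, 20 kHz about 17 mm
```

Visible light, by contrast, has wavelengths of well under a micrometre, far below any everyday feature size, so the ray approximation rarely breaks down there.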
Before any mathematical treatment of sound is possible, we must represent it somehow in the language of mathematics. To do this, we note what sound is: it is just time‐dependent pressure variation. Furthermore, by taking a point in space, we can represent sound at this point with a single number, the pressure. When there is no sound, the pressure is just the normal atmospheric pressure (around 100,000 pascals on average), so it is a good idea to assign numbers with respect to this level. So we represent the pressure at our point by telling how much the pressure differs from the long term average air pressure—rarefaction results in negative values, compression in positive. What scale we use does not much matter—since most DSP is linear, the same basic concepts apply regardless of scale. Now that we have chosen a pressure scale, we just present the pressure as a function of time. If we want a more complete description of the sound field, we take more points and form a vector (a list of numbers, basically) of the pressures at those points and represent this vector as a function of time. Usually we do not use more than two to four points since the resulting description mostly suffices for audio systems. Most people have never had a chance to hear anything exceeding two channels (i.e. stereo).
So we now have functions of time. These we call signals. They can be
represented by voltages or currents on electric circuits and wires (this is the
way microphone cables, amplifiers and most consumer audio equipment works), as
grooves of varying depth on an LP, as numbers of some given precision on a
computer or as numbers encoded in the tiny pits and ridges of a CD.
Mathematically we treat these functions as mappings from real numbers to real
numbers (i.e. for each possible instant of time, we assign an infinitely
accurate measure of pressure). In digital systems, we present a string of
numbers which give a sufficiently accurate measure of the pressure at points
sufficiently close to each other in time (these numbers are called samples
and under proper conditions, they represent the original signal with near
perfect quality). (See the first section of the chapter on DSP for a closer
look at sampling.) Having got used to thinking about sound in terms of signals,
we often equate these. This makes it possible to use mathematical terminology
(which is suitable for signals) to describe what happens or is to be done to
sound. It may sound a bit strange, for instance, to talk about
sound. Thought of as a sequence of numbers, it makes perfect sense. Especially since we aim at understanding DSP as well.
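As a concrete illustration, here is a tiny "signal": a sinusoid held as a plain list of samples at the CD rate. The frequency and duration are arbitrary example choices.

```python
import math

# Sound as a signal: a 440 Hz sine sampled at 44.1 kHz, stored as a plain
# list of numbers, much as a computer or a CD would hold it.
fs = 44100            # sample rate, samples per second
f = 440.0             # tone frequency, Hz
duration = 0.01       # length in seconds

samples = [math.sin(2 * math.pi * f * n / fs)
           for n in range(int(fs * duration))]
print(len(samples))   # 441 samples for 10 ms of sound
```

Once the sound is a list of numbers, "mathematical" operations on sound, such as scaling, adding or filtering, become literal arithmetic on the list.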
Not every sound has a frequency—no repetition, no frequency. However, measured at a point, every sound has an amplitude. This means roughly the same as the strength of the sound and could be defined in a variety of ways. We pick one and speak of (peak) amplitude, defined as the difference between maximum compression and maximum rarefaction that our sound wave causes during a given period of time. The term can also be used without exact, mathematically defined meaning to mean the (relative) strength of the sound (with respect to another).
When we present some sound to people, we soon realize that amplitude (peak‐to‐peak pressure variation) is not very significant perceptually. Instead, average power seems to be. This is why most volume monitors use an RMS (Root Mean Square) scale.
This is a time localized estimate of the average signal power, and is
calculated by squaring the signal, taking a weighted average over a period of
time and then taking a square root. Why should this work? One reason is that
power is preserved in Fourier decompositions whereas amplitude is not. Since we
process signals mainly in a frequency decomposed form, it is to be expected
that time‐domain characterizations which can be directly
translated to frequency domain should work the best. As the ear seems to do
filterbank analysis (as opposed to real Fourier
analysis which really has
infinite memory), time‐localized
averaging should not come as a surprise, either.
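The square, average, root recipe is short to write down. A minimal sketch; the window here is a plain unweighted average rather than any particular metering standard:

```python
import math

def rms(window):
    """Root-mean-square of a block of samples: square, average, root."""
    return math.sqrt(sum(x * x for x in window) / len(window))

# For a sinusoid, the RMS value is peak / sqrt(2), regardless of frequency.
sine = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
print(round(rms(sine), 4))   # ~0.7071, i.e. 1/sqrt(2)
```

A real volume meter would apply this over short sliding windows, possibly with a weighting curve, to track power as it evolves in time.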
Now, the dynamic range of human hearing is exceptionally wide—the amplitude ratio of the softest sound heard to the loudest noise tolerated is in the vicinity of 100 000 000 to 1 (a hundred million to one), with most resolution in the quiet end. Around 1kHz, people tend to classify a ten‐fold increase or decrease in sound energy as a doubling or halving, respectively, of perceived loudness. This means that a suitable scale for sound amplitude is not linear, but logarithmic. Values from this scale are called sound pressure levels (SPL) and their unit is the decibel (dB). It is defined as twenty times the base ten logarithm of the ratio of sound pressure variation (effective level) to that of the softest sound heard by an average human (the threshold of human hearing, defined as 20 micropascals peak variation for a 1kHz sine wave). This means that 0dB equals the threshold and a twenty decibel increase means a ten‐fold increase in pressure variation. To illustrate, going from 0dB to 140dB means multiplying the pressure variation by 10^7, so 140dB SPL equals a variation of 200 pascals (effective level)—plenty. Ever wonder why going from 80dB to 100dB is considered harmful while 60dB to 80dB isn’t?
Yet another amusing calculation reveals that with a sinusoid of roughly 194dB SPL, the rarefying part of the fluctuation reaches vacuum. This is the theoretical limit on sinusoidal pressure fluctuations at normal atmospheric pressure, then. (Compressive impulses can, of course, reach much higher SPLs; cf. the hydrogen bomb.) Doubling the pressure variation yields an increase of 6dB SPL (doubling the power, 3dB). When we think a bit, we see that if two sounds with a significant SPL difference (say, over 15dB) are added together, the weaker contributes far less than the raw numbers suggest. In effect, adding a 30dB SPL sound to one of 60dB does not increase the SPL significantly beyond 60dB.
Similarly we define the intensity level: ten times the base ten logarithm of the ratio of sound intensity to a reference intensity of one picowatt per square metre, the intensity corresponding to the threshold of hearing.
Now, although it was established a while ago that not all sounds need to have a properly defined frequency, the concept of frequency still has its uses. This is because, as we shall see later on, it is quite possible to uniquely construct signals from sine waves with definite frequencies. This makes it possible to talk about frequency ranges of any signal—we break the signal into sine waves and discard everything but the frequencies of interest. This can also be accomplished directly. Such ranges (called bands or subbands) can then be processed and analysed separately, which, of course, is precisely what goes on when we watch the spectrum analyzer on a hip soundsystem, crank up the bass on a car stereo or speak through a telephone (which constitutes a severely bandlimited channel). Simultaneously measuring the relative contributions of all the different frequency ranges in a signal gives rise to the spectrum of a sound. Depending on the way in which we extract the subbands, we arrive at different kinds of spectra. Nevertheless, they all give some sort of budget of how much bass, middle and treble our signal has. Since our ear performs an analysis somewhat reminiscent of the kind described above, spectra are invaluable in discussing and analysing sound and related technology, even when working with the simple, intuitive definition given above.
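As a rough numerical sketch of this idea, the following Python fragment breaks a toy signal into its constituent sine waves with a naive discrete Fourier transform; the resulting spectrum budgets how much each frequency contributes. (The DFT is treated properly in the math section; the signal and sizes here are made up for illustration.)

```python
import cmath, math

def dft(x):
    """Naive discrete Fourier transform: correlate x against complex
    sinusoids at each whole-number multiple of the analysis frequency."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Toy signal: a strong 5-cycle sine plus a weak 12-cycle sine over 64 samples.
n = 64
x = [math.sin(2 * math.pi * 5 * t / n) + 0.25 * math.sin(2 * math.pi * 12 * t / n)
     for t in range(n)]

# Amplitude spectrum (first half of the bins; the rest mirror these):
spectrum = [abs(c) / n * 2 for c in dft(x)[:n // 2]]
# Bin 5 reads about 1.0 and bin 12 about 0.25; all other bins stay near zero.
```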
We want to defer the introduction of math, so any rigorous treatment of spectra (amongst other things) is necessarily postponed as well. This will leave some holes and vaguely defined concepts here. Be forewarned when we use such terms as periodic, continuous, discrete, spectrum and so on.
Most traditional acoustic research has centered around highly reductionistic approaches, such as using anechoic chambers, sinusoid test tones and so on. In the real world, however, we never encounter strictly periodic sounds, let alone pure sinusoids—musical sounds are never pure enough and in addition are strictly time limited. In fact, most musical sounds do not even approximate periodic behavior. To get a hold on the following topics, we need to classify sounds a bit further, and to establish an intuition as to how the different types of tones behave and what they sound like.
Periodic sounds we have already seen. The simplest example is the sine wave.
All periodic sounds repeat over and over, reaching
over all of time. It is clear that such sounds do not really exist, but they
are a neat conceptual tool when analyzing sounds which are locally stable. This can be done after a system, in a sense, no longer remembers that some input occurred a finite time ago, that is, after any transient phenomena have diminished sufficiently. As to why we would go with periodic analysis, periodic signals
have extremely nice properties. For instance, frequency is a concept which is
only defined for signals which are periodic. If we look at the spectrum of a
periodic signal, we quickly learn that only whole multiples of some fundamental
frequency (harmonics) are present. Later,
when stated formally, this notion leads to the classic theorem on Fourier series.
This does not imply that the fundamental or all of the harmonics need to be present. When they are not, the actual frequency (rate of repetition) of the signal can be higher than the fundamental frequency. In fact, one can always reinterpret a series of harmonic partials as consisting of only the even harmonics of a fundamental an octave lower than before the shift in point of view. Consequently the concept of fundamental frequency is not very well‐defined and certainly does not relate uniquely to the actual frequency of the signal. This permits some interesting acoustical illusions and even serious musical applications.
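The missing fundamental is easy to demonstrate numerically. In this sketch (the frequencies are arbitrary choices of mine) a signal built from partials at 200, 300 and 400 Hz, i.e. harmonics 2, 3 and 4 of 100 Hz, still repeats at the 100 Hz rate even though no energy is present at 100 Hz:

```python
import math

def s(t, partials=(200.0, 300.0, 400.0)):
    """Sum of equal partials at 200, 300 and 400 Hz: harmonics 2, 3 and 4
    of a 100 Hz fundamental which is itself absent from the signal."""
    return sum(math.sin(2 * math.pi * f * t) for f in partials)

# The waveform repeats every 1/100 s, the period of the missing fundamental,
# but not every 1/200 s: the repetition rate is set by the common divisor
# of the partials, not by any partial actually present.
```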
Investigating a bit further we find that the relative amplitudes and phases of constituent harmonics uniquely determine a periodic signal. Later we shall see that the absolute phases of the harmonics in a periodic sound actually matter little to us, and even the amplitudes are perceived a bit vaguely. There is no time information, either. This means that there are actually not so many perceptually separate periodic tones. Further, all of them sound extremely dull and sterile.
The importance of periodic signals and their spectra lies in the fact that they are exceedingly simple mathematically—periodic sounds avoid the topological complications of Fourier analysis. They lead to the Fourier series which is discrete and as such quite simple to understand and derive. The Fourier series serves as a starting point for the construction of the discrete Fourier transform which is of pivotal importance in DSP. More about all this in the math section.
Previously we established that every periodic signal can be constructed from harmonics of some fundamental. Now, nobody says we cannot add together partials which are not in a harmonic relationship with each other. When we do this, we obtain quasi‐periodic signals. These sounds still have discrete spectra, but they need not be periodic. Quasi‐periodic sounds are more relevant to musical acoustics than periodic ones—locally, the steady‐state part of an instrumental sound is usually best described as quasi‐periodic. Again we assume that all the partials are in the audible range. Unlike periodic signals, quasi‐periodic ones can have some time content—closely spaced partials beat against each other, possibly contributing to harshness and time evolution in the composite tone. Inharmonic partials often lead to bell‐like or metallic timbres, or even to chord‐like or noise‐like textures if enough partials are present. No strict time features emerge, however, because any transient content would necessarily imply a continuous spectrum. For the same reason, any sound with a discrete spectrum will reach indefinitely back and forth in time.
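The beating of closely spaced partials follows from a trigonometric identity: sin a + sin b equals 2 sin((a+b)/2) cos((a−b)/2), so two partials are exactly a carrier at their mean frequency under a slow cosine envelope. A small sketch with made-up frequencies:

```python
import math

def two_partials(t, f1=440.0, f2=444.0):
    """Two closely spaced partials, 4 Hz apart."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def beat_form(t, f1=440.0, f2=444.0):
    """The same signal rewritten: a 442 Hz carrier whose amplitude envelope
    is a slow cosine; its magnitude peaks |f1 - f2| = 4 times per second,
    which is heard as beating."""
    return 2.0 * math.cos(math.pi * (f1 - f2) * t) * math.sin(math.pi * (f1 + f2) * t)

# The two forms agree at every instant, so the beating is not a new
# component but a time-domain reading of the same two-partial spectrum.
```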
Finally we have signals with continuous (in the strict sense) spectra, i.e. aperiodic signals. Sounds like these can be practically anything, but they never display truly periodic time‐domain behavior. Usually white noise is given as an example, but actually all time‐localized and discontinuous signals belong to this class. All transients (because they are time‐localized) and physical signals (because they have finite energy) also have continuous spectra.
Strictly speaking, noise is mathematically defined in terms of its
generating process and some statistical properties of that process. The actual
signals we process are just examples (a collection of which
is, in proper mathematical terms, called an ensemble) of what such a
process can produce, and should be strictly separated from the process itself.
This means that mathematically derived spectra for stochastic processes are
expectations—they relate to
real spectra like the expected result of
half heads and half tails relates to an actual experimental record of coin
tosses. In statistical analysis, a property called ergodicity then guarantees that averages taken in the time domain faithfully represent the properties of the stochastic process across its ensemble (time averages taken over one output equal averages taken over all the signals in the ensemble), so we can often handwave the distinction between the properties of the process and the properties of its example output. One should keep in mind that they are not the same thing, however; otherwise one runs into some deep math. To get rid of the
process description and to work solely on time series, one must first consider
such fun subjects as information theory, Kolmogorov
complexity, Bayesian statistics and estimation theory, to mention a few. Those
are topics well outside both the scope of this presentation and the
capability of the author.
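To make the ergodicity handwave concrete, here is a toy sketch (the process, a stream of fair coin tosses, is of course a stand-in for any stationary ergodic process): the ensemble expectation of a single toss is 0.5, and a time average over one example output of the process estimates that ensemble property.

```python
import random

random.seed(1)  # fix one particular realization of the process

# One example signal drawn from the process: independent fair coin
# tosses, heads = 1, tails = 0. The ensemble expectation at any single
# instant is 0.5.
realization = [random.randint(0, 1) for _ in range(100000)]

# Ergodicity lets a time average over this one output stand in for an
# average over the whole ensemble of possible outputs:
time_average = sum(realization) / len(realization)
# time_average comes out close to 0.5
```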
Copyright © 1996–2002 Sampo Syreeni; Date: 2002-11-21;