Voices and strings: Close cousins or not?
A. Askenfelt
Introduction
It is obvious that the singing voice and the strings are not close cousins as
long as we restrict our comparison to external properties like the sound
generating mechanisms or the appearance of the bodies. However, although there
are large differences with regard to how the sound is generated and controlled,
there are indeed some striking similarities in the output from the two
instruments which justify a formal cousin-ship. In particular, they both seem
to possess a remarkable freedom in the shaping of the sounds.
BASICS STRINGS AND VOICES
The source-filter view
We start our comparison using the familiar source-filter concept (Fant,
1960), see Fig. 1. In the violin the source is the vibrating string(s),
normally excited by the bow, and the filter is the sound box together with the
bridge. For the voice the source is a pulsating airflow produced by the
vibrating vocal folds and the filter is the vocal tract.
The waveform of the string under the bow is in principle triangular,
with the proportions between the short and long parts determined by the
position of the bow relative to the bridge (Helmholtz, 1877), see Fig. 2.
During the flatter part of the curve the string sticks to the bow and moves
with the bow velocity; during the steep part it slips back with a much higher
velocity, which is higher the shorter the distance between bow and bridge. So,
when playing with the bow at a sixth of the string length from the bridge (a
normal position), the string will be slipping during a sixth of the cycle, five
times faster than the bow velocity.
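The arithmetic behind the "sixth of the cycle, five times faster" statement can be sketched as follows. This is a minimal illustration that assumes ideal Helmholtz motion, in which the slip phase occupies the same fraction of the period as the relative bow-bridge distance; the function name and the bow velocity of 0.2 m/s are hypothetical and chosen only for the example.

```python
def helmholtz_stick_slip(beta, bow_velocity):
    """Stick/slip timing and slip velocity under the bow for ideal Helmholtz motion.

    beta: bow-bridge distance as a fraction of the string length (0 < beta < 1).
    bow_velocity: bow velocity in m/s (illustrative value, not from the text).
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    slip_fraction = beta           # share of the period spent slipping
    stick_fraction = 1.0 - beta    # share of the period spent sticking
    # The string returns to its starting point after one period, so
    # stick_fraction * bow_velocity + slip_fraction * slip_velocity = 0.
    slip_velocity = -bow_velocity * stick_fraction / slip_fraction
    return stick_fraction, slip_fraction, slip_velocity


stick, slip, v_slip = helmholtz_stick_slip(beta=1 / 6, bow_velocity=0.2)
print(f"slipping during {slip:.3f} of the cycle")
print(f"slip speed / bow speed = {abs(v_slip) / 0.2:.2f}")
```

Halving the bow-bridge distance to a twelfth of the string length would, under the same assumption, give a slip phase of a twelfth of the cycle and a slip speed eleven times the bow velocity.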
            SOURCE        FILTER
STRINGS     Bow-string    Sound box
VOICE       Larynx        Vocal tract
WINDS       Lips-reed     Pipe & bell

Fig. 1. The source - filter model.
In order to get a louder note the string player must increase the
vibration amplitude either by increasing the bow velocity, or alternatively, by
playing closer to the bridge while keeping the same bow velocity. Playing
closer to the bridge requires a higher "bow pressure" (more correctly
"bow force"), a factor which also influences the sharpness of the corners
in the waveform, and consequently the strength of the high frequency partials.
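The two ways of playing louder can be compared with a short calculation. The sketch below assumes the textbook relation for ideal Helmholtz motion, in which the peak displacement at the middle of the string equals the bow velocity divided by eight times the fundamental frequency times the relative bow-bridge distance. The relation is not stated in the text, and the numerical values (0.2 m/s, a sixth of the string length, G3 = 196 Hz) are illustrative.

```python
import math


def midstring_amplitude(bow_velocity, beta, frequency):
    """Peak displacement (m) at mid-string for ideal Helmholtz motion.

    Assumed relation: amplitude = bow_velocity / (8 * frequency * beta).
    """
    return bow_velocity / (8.0 * frequency * beta)


def level_change_db(new, old):
    """Level difference in dB between two amplitudes."""
    return 20.0 * math.log10(new / old)


reference = midstring_amplitude(0.2, 1 / 6, 196.0)
faster_bow = midstring_amplitude(0.4, 1 / 6, 196.0)         # double the bow velocity
closer_to_bridge = midstring_amplitude(0.2, 1 / 12, 196.0)  # halve the bow-bridge distance

print(f"reference amplitude: {reference * 1000:.2f} mm")
print(f"faster bow:       {level_change_db(faster_bow, reference):+.1f} dB")
print(f"closer to bridge: {level_change_db(closer_to_bridge, reference):+.1f} dB")
```

In this idealized picture both strategies give the same gain in amplitude; the difference lies in the bow force required and in the resulting strength of the high partials, which the model above does not capture.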
The source waveform of the voice is the glottal flow, in principle a
more or less steep, asymmetric hump, during which the vocal folds are open,
alternating with portions of zero flow during which the folds are closed (see
Fig. 3). In order to get a louder note, the singer must increase the flow by
supplying a higher subglottal pressure from the lungs. A higher flow normally
gives a more abrupt closure of the folds, which also boosts the high
frequency partials. A similar effect can also be achieved without increasing
the flow by bringing the folds closer together (adduction).
Fig. 2. Waveforms (displacement and velocity) of the string under the bow.

Fig. 3. Principal sketch of the glottal waveform. (Adapted from Sundberg, 1987).
It is customary to attribute the frequency dependence of the radiation
process to the source, although it actually occurs at the lips or at the
surface of the violin body. Including the correction for the radiation, it is a
reasonable approximation to say that the source spectrum of both voice and
strings falls off at 6 dB/octave.
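As a rough illustration of what a slope of 6 dB per octave means for the individual partials, the sketch below computes the level of partial n relative to the fundamental. It assumes an idealized source whose spectrum follows the slope exactly; the function names are hypothetical.

```python
import math


def partial_level_db(n, slope_db_per_octave=-6.0):
    """Level of partial n relative to the fundamental for a constant dB/octave slope."""
    # Partial n lies log2(n) octaves above the fundamental.
    return slope_db_per_octave * math.log2(n)


def partial_amplitude(n, slope_db_per_octave=-6.0):
    """Linear amplitude of partial n relative to the fundamental."""
    return 10.0 ** (partial_level_db(n, slope_db_per_octave) / 20.0)


for n in (1, 2, 4, 8, 16):
    print(f"partial {n:2d}: {partial_level_db(n):6.1f} dB, amplitude {partial_amplitude(n):.3f}")
```

A slope of 6 dB per octave corresponds closely to partial amplitudes proportional to 1/n, which is the spectrum of an ideal sawtooth wave.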
We now leave the source and turn to the filter. The vocal tract is
conveniently described by some four or five resonances, the formants, with
moderate sharpness (see Fig. 4). In contrast, the transmission properties of
the violin body are characterized by a set of sharp resonances, so numerous that
only the lowest dozen can be identified, and of this handful only two or three
can be individually controlled by the skilled violin maker (see Fig. 5).
Above 1200 Hz the resonances merge into a statistical continuum. The large
difference in the number of resonances in the filter means that for the strings
there are always several "unused" resonances between two partials, while for
the voice there are several partials within each formant, except at high
pitches.
Strings versus winds
Why are we discussing a possible cousin-ship between strings and voices
only? Why are the winds, like the flute, clarinet, and trumpet, not close
cousins to the voice? After all, they need air in order to sound. The
engineering answer is that the coupling between the sound source and the filter
is too strong, or in other terms, the impedance matching between the two parts
is "too good."

Fig. 4. Transmission characteristics of the vocal tract for a male singer's
vowel [a].
This means that for the winds we cannot draw a distinct borderline
between the source (which must have something to do with the vibrating lips or
reed) and the filter (which in some way is defined by the pipe and bell) and
the model collapses. If we still want to think about the winds in terms of the
source-filter model we must add a strong feedback path, but often the winds are
better described in other terms (Benade, 1976; Dudley & Strong, 1990). In fact,
the coupling between source and filter in the winds is so strong that the
resonances of the "filter" will control the oscillating frequency. The wind
player must thus change the resonance frequencies of the instrument during
playing, and several ingenious methods for shortening or lengthening a pipe
within a hundredth of a second have been developed (cf. clarinet - trombone).
In passing we must admit that a small amount of source-filter coupling also
exists in the voice and strings, but in principle the string and vocal cords
vibrate independently of the sound box and vocal tract.
The weak coupling between source and filter in the strings and voice
means that the partials will enter and leave the resonances (formants)
according to the selected fundamental frequency, while for the winds the
partials will lock to the resonance frequencies of the pipe. As a consequence,
the relative strengths of the partials in the strings and voice may vary
dramatically even for a small shift in fundamental frequency, in contrast to
the winds where the spectrum envelope would stay essentially constant.

Fig. 5.
The resonance frequencies in the winds have to be carefully aligned in a
harmonic series by the maker in order to have the instrument sound properly.
This requirement complicates wind instrument making considerably, and turns it
into craftsmanship similar to that of the violin maker. In the strings and
voices no particular relationships between the resonance frequencies are
required to make the instrument play. However, in their attempts to reach the
level of the old Italian violins, violin makers have long discussed the magic
of relationships between small integers.
Vibrato
A pronounced vibrato is a common feature of both voice and strings. While the
rate of the vibrato is in the same range, 5 - 7 Hz, the singer often uses much
more of this effect. A singer's vibrato of ± 50 cents is by no means excessive,
and even ± 100 cents (up and down a semitone) is perfectly acceptable in
operatic singing. The violinist seldom exceeds ± 40 cents (SOUND EXAMPLE 1). A
slightly slower vibrato rate the lower the pitch range seems to be a common
characteristic of both strings and singers.
The vibrato causes an amplitude modulation of the partials due to the
peak-and-valley structure of the transmission in the vocal tract and violin
corpus. During the vibrato cycle each partial will explore a certain range on
the filter curve (see Fig. 6). In the case of a generous vibrato, as in
singing, these ranges will overlap for the higher partials.
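The point at which the ranges of neighbouring partials begin to overlap follows directly from the vibrato extent. The sketch below uses the fundamental frequencies and extents quoted in the caption of Fig. 6 (± 6% at G3 = 196 Hz for the singer, ± 2.5% at D4 = 293 Hz for the violin); the function names are hypothetical, and the vibrato is assumed to scale all partials by the same factor.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents."""
    return 1200.0 * math.log2(ratio)


def partial_range(n, f0, extent):
    """Frequency range (Hz) swept by partial n for a vibrato of +/- extent (a fraction)."""
    return n * f0 * (1.0 - extent), n * f0 * (1.0 + extent)


def first_overlapping_partial(extent):
    """Lowest partial number n whose range reaches that of partial n + 1."""
    # Overlap requires n * (1 + e) >= (n + 1) * (1 - e), i.e. n >= (1 - e) / (2 * e).
    return math.ceil((1.0 - extent) / (2.0 * extent))


for label, f0, extent in (("singer", 196.0, 0.06), ("violin", 293.0, 0.025)):
    low, high = partial_range(1, f0, extent)
    n = first_overlapping_partial(extent)
    print(f"{label}: fundamental sweeps {low:.1f} - {high:.1f} Hz "
          f"(+{cents(1.0 + extent):.0f} cents), overlap from partial {n}")
```

Under these assumptions the overlap depends only on the extent, not on the fundamental frequency: it starts at the 8th partial for the singer's vibrato and only at the 20th for the more restrained violin vibrato.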
The amplitude modulation could be expected to be stronger for the
strings than for the voice because of the sharper resonances of the instrument
body. This was also found to be the case. Despite a more restrictive vibrato in
the violin example in Fig. 6, the variation in the partial amplitudes is much
larger for the violin than for the voice (see Fig. 7). It could be assumed that
this fine structure in the partial amplitudes, which may change drastically
even for small changes in fundamental frequency or vibrato amplitude, is a key
characteristic of strings and voices.
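Why sharper resonances give a stronger amplitude modulation can be illustrated with a single resonance peak. The sketch below is a simplified model, not the measured filter curves of Figs. 4 and 5: it assumes a second-order resonance, and the resonance frequency (1000 Hz), the partial frequency (980 Hz), the vibrato extent (± 2.5%) and the two Q values are hypothetical values chosen for illustration.

```python
import math


def resonance_gain_db(f, f_res, q):
    """Gain in dB of a single second-order resonance at frequency f."""
    r = f / f_res
    magnitude = 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)
    return 20.0 * math.log10(magnitude)


def modulation_depth_db(f_partial, extent, f_res, q, steps=200):
    """Spread (max - min, in dB) of a partial's level over one sinusoidal vibrato cycle."""
    levels = []
    for k in range(steps):
        phase = 2.0 * math.pi * k / steps
        f = f_partial * (1.0 + extent * math.sin(phase))
        levels.append(resonance_gain_db(f, f_res, q))
    return max(levels) - min(levels)


broad = modulation_depth_db(980.0, 0.025, f_res=1000.0, q=5.0)   # formant-like peak
sharp = modulation_depth_db(980.0, 0.025, f_res=1000.0, q=30.0)  # body-resonance-like peak
print(f"broad resonance (Q = 5):  {broad:.1f} dB")
print(f"sharp resonance (Q = 30): {sharp:.1f} dB")
```

With the same vibrato extent, the partial passing the sharp peak varies by several decibels while the one passing the broad peak changes by only a fraction of a decibel, which is the qualitative difference between violin and voice described above.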
The singer's possibilities of changing the rate of the vibrato are
limited; rather it is a personal characteristic which slowly changes over the
years. The violinist, on the other hand, can use the vibrato rate as a means of
expression, a faster vibrato emphasizing a note or passage. It is also
interesting to note that the character of the vibrato in string playing will
be slightly different for different fingerings, depending on the position of
the hand and arm. The most relaxed vibrato is usually achieved when stopping
the string with the middle finger. As for singers, the vibrato has a strong
personal quality, which may be enough to identify the player (SOUND EXAMPLE 2).

Fig. 6. Frequency ranges explored by individual partials during a vibrato cycle.
Synthesized male singer (G3 = 196 Hz), vibrato range ± 6% = ± 100 cents (top),
and real violin (D4 = 293 Hz), vibrato range ± 2.5% = ± 45 cents (bottom).
Aperiodicity
The waveform of the vocal folds or of the string does not repeat identically from period to period, a phenomenon usually referred to as waveform perturbations.

Fig. 7. Amplitude modulation of partials due to vibrato; singer (top) and violin (bottom). Same examples as in Fig. 6.
variations
in amplitude.
In the glottis the aperiodicity is due to an irregular distribution of
the tissue in the vocal folds, among other things. In the bowed instruments,
the variability is caused by local variations in the friction properties along
the bow hair, imperfections in the string, as well as rosin and dirt
accumulated on the string. Also the fingering influences the aperiodicity. A
string stopped by a finger shows a much larger amount of jitter than an open
string. The aperiodicity is much larger for the voice than for the strings;
typically the jitter is about ten times higher for a normal voice than for a
stopped string (Askenfelt & Hammarberg, 1986; McIntyre et al., 1981).
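Jitter is usually understood as the variation in period length from one cycle to the next. One common way to quantify it, sketched below, is the mean absolute difference between consecutive periods divided by the mean period. This particular measure is an assumption and not necessarily the one used in the cited studies, and the period values in the example are invented for illustration, not measured data.

```python
def relative_jitter(periods):
    """Mean absolute difference between consecutive periods, relative to the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


# Invented period lengths in milliseconds, roughly corresponding to G3 = 196 Hz.
example_periods_ms = [5.10, 5.08, 5.12, 5.09, 5.11]
print(f"relative jitter: {100.0 * relative_jitter(example_periods_ms):.2f} %")
```

A perfectly periodic signal gives a value of zero; the ratio between the values obtained for a voice and for a stopped string is what the comparison in the text refers to.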
The aperiodicity adds a "live" quality to the sound which
enhances the naturalness of string and voice synthesis. Too much, however,
gives "roughness", which is a characteristic of the pathological
voice as well as of the beginner in string playing. Interestingly, the
"unclean" and somewhat wiggly waveform of a real violin string
carries the essential information needed to give a stringy character to a
synthesis (SOUND EXAMPLE 3). After adding a reasonable approximation of the
resonances of the violin body it may be hard to distinguish such an electronic
violin from a normal acoustic violin (SOUND EXAMPLE 4; Mathews & Kohut, 1973).
Spectral balance
The spectral content of the voice and strings changes with the dynamic level. As mentioned, when increasing the dynamic level from pp to ff not only the amplitude grows but also the amount of high frequency partials (see Fig. 8). Almost all traditional instruments show the same spectral behaviour, although the mechanisms utilized for this purpose are different indeed. Few people would guess that the slight string players old-fashioned use of horse hair rubbed with rosin, but they all serve the same purpose; to obtain a nonlinear sound source in which the high-frequency components increase with dynamic level. It would be tempting to conclude that we appreciate instruments with this spectral behaviour because of the daily indoctrination with speech sounds since our earliest age.

Fig. 8. Evolution of the spectral content for a violin with increasing dynamic
level (pp - mf - ff), G3 = 196 Hz.
In contrast to other instruments, both singers and string players can
control the balance between the low and high-frequency partials rather
independently of the dynamic level (see Fig. 9). A note at a certain dynamic
level is thus not associated with a fixed harmonic content but can be modulated
according to the musical context. In other instruments this variation is much
more restricted. For example, in the piano the harmonic content is set
exclusively by the dynamic level (as far as we know today; Askenfelt & Jansson, 1990). In that case "the treble control is
directly connected to the volume-knob."
The singer controls the spectral balance by conditioning the larynx. A less tense conditioning with loosely approximated vocal folds gives a "flow" quality, characterised by a pronounced flow of air and relatively strong lower harmonics. A condition with close approximation of the folds on the other hand, gives a "pressed" quality with weaker fundamental and stronger high-frequency harmonics.

Fig. 9. Comparison of the spectral balance for "press" (circles) and
"flow" (thin line) conditions (G3 = 196 Hz); baritone [a] (top), violin (bottom).
The string player exerts the same control via the bow. Rapid bowing
far from the bridge with only a light bow force gives a "flow"
quality, while playing close to the bridge with a (necessarily) lower bow
velocity and higher bow force gives a "pressed" version, without
changing the overall level drastically (Fig. 9 and SOUND EXAMPLE 6).
Pitch control
The singer and the string player are completely free to control the
fundamental frequency continuously, the only restriction being that they should
perform "in tune." Apparently, our perception has very
different limits for this criterion depending on the circumstances. The singer
may sometimes seem to stray away, unconcerned with the accompaniment, and yet it
would be unfair to judge the performance as out of tune (SOUND EXAMPLE 7).
Measurements have indicated that deviations of 50 - 70 cents from the expected
values, i.e. with reference to the accompaniment, can be perceived as in tune,
in particular when the deviation is on the sharp side
and/or falls on an unaccented beat (Sundberg, 1979). Similarly, the soloist in
a violin concerto may occasionally deviate from what could be called decent
tuning compared to the orchestra, yet the impression is normally not "out
of tune", but rather that of an individual contrasting against a large group.
Although the freedom in pitch control is large, and sometimes
used, the precision can be high when needed. This occurs in ensembles which use
only a slight vibrato or even none at all, such as barbershop singing and string
quartets (SOUND EXAMPLE 8). In these cases beats between mistuned notes in
chords will be noticeable. The measured spread of notes in repeated barbershop
chords is of the order of ± 3 cents (Hagerman & Sundberg, 1980).
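To get a feeling for what a few cents of mistuning means in terms of beats, the sketch below computes the beat rate between a tone and a copy of it mistuned by a given number of cents. It is a simplification that considers only two nearly coinciding pure tones (or two nearly coinciding partials in a chord); the reference frequency of 440 Hz and the function name are assumptions for the example.

```python
def beat_rate(frequency, mistuning_cents):
    """Beats per second between a tone and a copy mistuned by the given number of cents."""
    mistuned = frequency * 2.0 ** (mistuning_cents / 1200.0)
    return abs(mistuned - frequency)


for cents_off in (3, 20, 50):
    print(f"{cents_off:2d} cents at 440 Hz: {beat_rate(440.0, cents_off):.2f} beats/s")
```

At 440 Hz a mistuning of 3 cents gives less than one beat per second, while the 50-cent deviations tolerated in solo singing would correspond to beats far too fast to pass unnoticed in a sustained chord without vibrato.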
In synthesis of string quartet music it has been found that contrasting
intonation of the melody line according to the melodic and harmonic tension
adds to the performance. These deviations are of the order of ± 20 cents
(Sundberg et al., 1989). However, during long notes
in all parts it is better to adjust to as beatless a tuning as possible, again on
the level of a few cents.
Radiation
The radiation of the sound is a point in which the voice and
strings differ
Why are the strings also four?
Nature has divided the human voices into four classes covering different
ranges in pitch: soprano, alto, tenor, and baritone/bass. The four sizes of
strings in use today (violin, viola, cello, and double bass) cover approximately
the same total pitch range, which seems reasonable, but why do they also come in
four sizes? The easy answer is that the violin cannot be made much smaller, and
the double bass not much bigger, without making them very inconvenient to play.
In between, two intermediate sizes have been found sufficient, although
some compromises have been made. For example, the viola is almost too big to be
held under the chin. The
cello, which happens to fit comfortably to the size of a normal player,
struggles with the balance between a full sound in the bass, requiring a
relatively large body, and a sonorous tenor and treble range, which suggests a
smaller instrument with light top and back plates.
Returning to the viola, the body of this super violin, playing a fifth
below the normal violin, ought to be even bigger to make a fair match to the
violin and cello in radiated power. But this is in a sense also true for the
alto voice, which often does not need a larger body but
instead longer vocal cords and a longer vocal tract in order to compete with
the soprano and tenor. So, in both voices and strings the alto part is
relatively weak, in particular in its lower range. This coincidence is probably
not due to pure chance. The number four is also probably not arbitrary,
because we have had the four-size string group around us for so long (about four
centuries) that new members would have entered if seriously needed.
Surprisingly, the traditional string instruments are in fact not very
close brothers or sisters because of small but important differences in the
design. Among other things, the proportions of the bodies are not exactly
similar, and consequently their sets of resonances will be differently grouped,
giving each size a characteristic tone quality (Askenfelt, 1982). It is, for
example, not particularly difficult to separate the violin from the viola in a
string quartet, even when they play in the same
range. The same is true for the viola versus the cello.
Rather recently, interesting attempts have been made to make a string ensemble
with more than four members, in which the resonances of the instruments
are scaled according to their playing range. This ought to give an ensemble in
which the individuals are more similar in character and thus blend better together
(SOUND EXAMPLE 9). This New Violin Octet is a unique example of a set of instruments
which were given an acoustical design by calculation before they were built,
rather than by the traditional trial-and-error procedure (Hutchins, 1967). Time will
tell whether the musical potential of this ensemble is high enough for it to survive.
THINKING OF MUSIC PERFORMANCE
Phrasing
By phrasing we usually mean a grouping of the events in a piece of music into a
meaningful structure. In order to constitute a phrase the notes must be kept
together in time, have a similar timbre and also have a smooth evolution in
amplitude. A break in any of these respects will give a signal to the listener
that the phrase has come to an end.
LUNGS AND BOWS
Phrasing in singing is closely connected with the lung volume, more
specifically with the vital capacity, which is about 5 l for a male and 4 l for
a female. When phonating, part of this volume is used to drive a more or less
steady flow of air through the larynx under a controlled pressure, the subglottal
pressure. The singer controls this pressure by balancing the restoring forces of the
expanded thorax against an applied force from the diaphragm. As the air in the
lungs is gradually consumed, the singer has to rely more and more
on the activity of the diaphragm, and eventually the pressure drops. This may be a
catastrophe if the singer has not reached a phrase boundary because, as mentioned, the
subglottal pressure is the main control of loudness. Even though singers are capable
of maintaining a phrase for more than 20 s by a restrictive use of air, most
singers will find it inconvenient to make a phrase longer than 8 - 10 s (Sundberg,
1987).
The string player does not need to care about the lung volume, but
instead has to worry about the length of the bow, about 60 cm. It is relatively
easy to form a phrase of notes which can be played in the same bow stroke, but it
takes a master to continue the phrase over the bow change. (The term "bow
change" denotes the moment at which the direction of the bow motion
is reversed.) At moderate dynamic levels a bow stroke may typically last for 6 s.
However, if needed the player can "save bow" and extend a stroke to about
15 s (Askenfelt, 1986).
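The stroke durations quoted above translate directly into mean bow velocities. The sketch below assumes that the whole bow length of about 60 cm is used at constant speed, which is a simplification; the function name is hypothetical.

```python
BOW_LENGTH_M = 0.60  # bow length quoted in the text


def mean_bow_velocity(stroke_duration_s, bow_length_m=BOW_LENGTH_M):
    """Mean bow velocity (m/s) when the full bow is used in one stroke."""
    return bow_length_m / stroke_duration_s


for duration in (6.0, 15.0):
    velocity = mean_bow_velocity(duration)
    print(f"{duration:4.0f} s stroke: {100.0 * velocity:.0f} cm/s")
```

A typical 6 s stroke thus corresponds to about 10 cm/s, and "saving bow" for a 15 s stroke means bringing the velocity down to about 4 cm/s.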
In contrast to the singer, who performs only during the exhalation of
air, the player can make use of the bow in either direction,
downbow and upbow respectively. Strangely enough, the character of a downbow is
completely different from that of an upbow. This is in
large part due to the way the bow is held: between the fingers in such a way that
it can pivot and act as a lever. The weight of the bow thus contributes much
more to the bow force when close to the handle, the frog, than at the tip. For
example, in order to obtain an even bow force, and hence an even tone quality,
during a full stroke the player has to balance the contribution from gravity
by pressing on top of the bow stick with the index or little finger
respectively. When close to the tip the player must help gravity by pressing
with the index finger in front of the fulcrum (defined roughly by the thumb
and middle finger), and when close to the frog gravity must be counterbalanced
by pressing with the little finger behind the fulcrum. Another consequence of
the bow grip is that loud passages requiring a high bow force are preferably
played on the lower half of the bow (towards the frog), and vice versa
(Askenfelt, 1989). The complex task of supplying an adequate bow force at all
times during the bow strokes gives strong indications on how to organize the
patterns of up- and down-bows in a piece of string music. In this way the music
itself suggests a "natural" phrasing via the bowing.
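The lever action of the bow can be illustrated with a simple static torque balance about the fulcrum. The sketch below is a strongly idealized model: the bow is treated as a rigid, horizontal lever pivoting at the fulcrum and resting on the string. The bow mass (60 g), the distance from the fulcrum to the centre of mass (19 cm), the contact distances and the target bow force of 1 N are all assumed values for illustration, not data from the text.

```python
G = 9.81  # gravitational acceleration, m/s^2


def gravity_bow_force(contact_distance_m, bow_mass_kg=0.060, cm_distance_m=0.19):
    """Bow force (N) supplied by gravity alone at a given distance from the fulcrum."""
    # Torque balance about the fulcrum: force * contact distance = weight * cm distance.
    return bow_mass_kg * G * cm_distance_m / contact_distance_m


def finger_torque(target_force_n, contact_distance_m, bow_mass_kg=0.060, cm_distance_m=0.19):
    """Extra torque (N m) the fingers must add to reach the target bow force.

    Positive: press with the index finger in front of the fulcrum.
    Negative: counterbalance with the little finger behind the fulcrum.
    """
    return target_force_n * contact_distance_m - bow_mass_kg * G * cm_distance_m


for x in (0.05, 0.30, 0.60):  # near the frog, middle of the bow, near the tip
    torque = finger_torque(1.0, x)
    finger = "index finger" if torque > 0 else "little finger"
    print(f"{x:.2f} m from fulcrum: gravity gives {gravity_bow_force(x):.2f} N, "
          f"torque needed for 1 N: {torque:+.3f} N m ({finger})")
```

Even with these rough numbers the model reproduces the pattern described above: near the frog gravity supplies more than the target force and must be counterbalanced, while near the tip the index finger has to add most of the force.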
SPECTRAL MICRO-CONTROL
As mentioned, the string player as well as the singer has a unique
control of the spectral properties, both with regard to the overall balance,
the extremes being the "flow - press" conditions, and with regard to the
temporal fine structure via the vibrato. It is probably this ability,
when used to form phrases during which the spectrum balance and vibrato
are continuously given slightly different shadings (without chopping the phrase
into parts), that is one very prominent similarity between the strings and voices.
Most one-voiced instruments are capable of generating a smooth amplitude
contour with notes of steady quality, which in itself is enough to signal the
phrase structure, but the micro variations within the phrase are a completely
different matter. This is probably one of the reasons why most concert halls
announce at least half a dozen violin concertos before a solo trumpet or
clarinet is allowed.
Conclusions
Despite large differences with regard to how the sound is
produced and controlled, the strings and voice seem to share some key
characteristics. Of these the following two could be assumed to be particularly important.
1. The sound source (the larynx or the bow-string) is clearly separated
from the filtering part (the vocal tract or the sound box). This allows the
partials to enter and leave the resonance peaks in the filter according to the
selected fundamental frequency. In particular, the vibrato causes a pronounced
modulation of the partial amplitudes due to the multiple resonance peaks in the
filter curves.
2. The spectral balance can be controlled rather independently of the
dynamic level. This opens a communication dimension with the "flow -
press" qualities as the extremes.
Other properties discussed which show more or less pronounced
similarities between strings and voice include: rate and extent of vibrato,
degree of aperiodicity, accuracy in pitch control, and the possibility of
forming continuous streams of notes (phrases). However, there are reasons to
believe that discrepancies in these respects are of secondary importance
compared to the two similarities first mentioned.
Coda
Admittedly, the presentation has been
somewhat vague in pointing out distinctive acoustic characteristics which could
justify a formal cousin-ship between the strings and the singing voice.
However, despite considerable gaps in the chain of acoustic evidence for
such a relationship, it is the author's personal opinion that they are indeed
close relatives. The last sound example may perhaps convince some readers
(SOUND EXAMPLE 10). An excerpt from a song without words is performed, first by
a soprano, somewhat affettato, followed
by a string baritone, a little rough and serious-minded. If you still do not agree
that the singing voice and strings are close cousins after listening to
these examples, acoustic data will never convince you!
References
Askenfelt, A. (1982). Eigenmodes and tone quality of the double bass. Speech Transmission Laboratory Quarterly Progress and Status Report, Royal Institute of Technology,
Askenfelt, A. (1986). Measurement of bow motion and bow force in violin playing. J. Acoust. Soc. Am., 80, 1007-1015.
Askenfelt, A. (1989). Measurement of the bowing parameters in violin playing II: Bow-bridge distance, dynamic range, and limits of bow force. J. Acoust. Soc. Am., 86, 503-516.
Askenfelt, A. & Jansson, E. (1990). In Askenfelt A. (ed.), Five Lectures on the Acoustics of the Piano,
Benade, A. H. (1976). Fundamentals of Musical Acoustics,
Dudley, D. & Strong, W. (1990). A computer study of the effects of harmonicity in a brass wind instrument: Impedance curve, impulse response, and mouthpiece pressure with a hypothetical periodic input. Applied Acoustics, 30, 117-132.
Fant, G. (1960). Acoustic theory of speech production, Mouton, The Hague.
Hagerman, B. & Sundberg, J. (1980). Fundamental frequency adjustment in barbershop singing. J. of Research in Singing, 4(1), 3-17.
Helmholtz von, H. (1877). On the Sensations of Tone (English ed.),
Hutchins, C.M. (1967). Founding a Family of Fiddles. Physics Today, 20, 23-37.
Marshall, A.H. & Meyer, J. (1985). The directivity and auditory impressions of singers. Acustica, 58, 130-140.
Mathews, M.V. & Kohut, J. (1973). Electronic simulation of violin resonances. J. Acoust. Soc. Am., 53, 1620-1626.