Voices and strings: Close cousins or not?

A. Askenfelt



It is obvious that the singing voice and the strings are not close cousins, as long as we restrict our comparison to external properties like the sound-generating mechanisms or the appearance of the bodies. However, although there are large differences in how the sound is generated and controlled, there are indeed some striking similarities in the output from the two instruments which justify a formal cousin-ship. In particular, they both seem to possess a remarkable freedom in the shaping of the sounds.


The source-filter view

We start our comparison using the familiar source-filter concept (Fant, 1960), see Fig. 1. In the violin the source is the vibrating string(s), normally excited by the bow, and the filter is the sound box together with the bridge. For the voice the source is a pulsating airflow produced by the vibrating vocal folds and the filter is the vocal tract.

The waveform of the string under the bow is in principle triangular, with the proportions between the short and long parts determined by the position of the bow relative to the bridge (Helmholtz, 1877), see Fig. 2. During the flatter part of the curve the string sticks to the bow and moves with the bow velocity; during the steep part it slips back with a much higher velocity, which is higher the shorter the distance between bow and bridge. So, when playing with the bow at a sixth of the string length from the bridge (a normal position), the string will be slipping during a sixth of the cycle, at five times the bow velocity.
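The arithmetic behind this example can be sketched in a few lines of Python. The sketch assumes ideal Helmholtz motion, in which the string returns to its starting point under the bow after each cycle, so that the distance gained while sticking equals the distance lost while slipping. The function name and the parameter beta (bow-bridge distance as a fraction of the string length) are introduced here for illustration only.

```python
from fractions import Fraction

def helmholtz_slip(beta, bow_velocity=1):
    """Ideal Helmholtz motion at the bowing point (an idealised sketch).

    beta: bow-bridge distance as a fraction of the string length (0 < beta < 1).
    Returns (slip fraction of the cycle, slip speed in the same unit as bow_velocity).
    """
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    slip_fraction = beta
    stick_fraction = 1 - beta
    # Net displacement over one period is zero:
    # bow_velocity * stick_fraction = slip_speed * slip_fraction
    slip_speed = bow_velocity * stick_fraction / slip_fraction
    return slip_fraction, slip_speed

fraction_slipping, slip_speed = helmholtz_slip(Fraction(1, 6), bow_velocity=1)
print(fraction_slipping, slip_speed)  # a sixth of the cycle, five times the bow velocity
```

Under the same assumption, moving the bow to a twelfth of the string length would give a slip phase lasting a twelfth of the cycle at eleven times the bow velocity.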


Fig. 1. The source-filter model (source, filter). STRINGS: bow-string, sound box. VOICE: larynx, vocal tract. WINDS: lips/reed, pipe and bell.

In order to get a louder note the string player must increase the vibration amplitude either by increasing the bow velocity, or alternatively, by playing closer to the bridge while keeping the same bow velocity. Playing closer to the bridge requires a higher "bow pressure" (more correctly "bow force"), a factor which also influences the sharpness of the corners in the waveform, and consequently the strength of the high frequency partials.

The source waveform of the voice is the glottal flow, in principle a more or less steep, asymmetric hump, during which the vocal folds are open, alternating with portions of zero flow during which the folds are closed (see Fig. 3). In order to get a louder note, the singer must increase the flow by supplying a higher subglottal pressure from the lungs. A higher flow normally gives a more abrupt closure of the folds, which also boosts the high-frequency partials. A similar effect can be achieved without increasing the flow by bringing the folds closer together (adduction).


Fig. 2. Waveforms (displacement and velocity) of the string under the bow. Also


Fig. 3. Schematic sketch of the glottal waveform. (Adapted from Sundberg, 1987).

It is customary to attribute the frequency dependence of the radiation process to the source, although it actually occurs at the lips or at the surface of the violin body. Including the correction for the radiation, it is a reasonable approximation to say that the source spectrum of both voice and strings falls off at 6 dB/octave.

We now leave the source and turn to the filter. The vocal tract is conveniently described by some four or five resonances, the formants, with moderate sharpness (see Fig. 4). In contrast, the transmission properties of the violin body are characterized by a set of sharp resonances, so numerous that only the lowest dozen can be identified, and of this handful only two or three can be individually controlled by the skilled violin maker (see Fig. 5). Above 1200 Hz the resonances merge into a statistical continuum. The large difference in the number of resonances in the filter means that for the strings there are always several "unused" resonances between two partials, while for the voice there are several partials within each formant, except at high pitches.

Strings versus winds

Why are we discussing a possible cousin-ship between strings and voices only? Why are the winds, like the flute, clarinet, and trumpet, not close cousins to the voice? After all, they all need air in order to sound. The engineering answer is that the coupling between the sound source and the filter is too strong or, in other terms, the impedance matching between the two parts is "too good."

Fig. 4. Transmission characteristics of the vocal tract for a male singer's vowel [a].

This means that for the winds we cannot draw a distinct borderline between the source (which must have something to do with the vibrating lips or reed) and the filter (which in some way is defined by the pipe and bell), and the model collapses. If we still want to think about the winds in terms of the source-filter model we must add a strong feedback path, but often the winds are better described in other terms (Benade, 1976; Dudley & Strong, 1990). In fact, the coupling between source and filter in the winds is so strong that the resonances of the "filter" will control the oscillating frequency. The wind player must thus change the resonance frequencies of the instrument during playing, and several ingenious methods for shortening or lengthening a pipe within a hundredth of a second have been developed (cf. clarinet and trombone). In passing we must admit that a small amount of source-filter coupling also exists in the voice and strings, but in principle the string and vocal cords vibrate independently of the sound box and vocal tract.

The weak coupling between source and filter in the strings and voice means that the partials will enter and leave the resonances (formants) according to the selected fundamental frequency, while for the winds the partials will lock to the resonance frequencies of the pipe. As a consequence, the relative strengths of the partials in the strings and voice may vary dramatically even for a small shift in fundamental frequency, in contrast to the winds where the spectrum envelope would stay essentially constant.
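The way partials move in and out of fixed resonances can be illustrated with a small numerical sketch. It assumes a source-filter model without feedback and a filter made of a few second-order resonances whose magnitudes are simply added (phase is ignored). The resonance frequencies and Q values are invented for illustration and are not measured on any instrument.

```python
import math

def resonance_gain(f, f_res, q):
    """Magnitude response of a single second-order resonance (gain 1 at low f)."""
    x = f / f_res
    return 1.0 / math.sqrt((1.0 - x * x) ** 2 + (x / q) ** 2)

def partial_levels_db(f0, resonances, n_partials=6):
    """Filter gain in dB at each harmonic of f0 for a bank of resonances."""
    levels = []
    for n in range(1, n_partials + 1):
        gain = sum(resonance_gain(n * f0, f_res, q) for f_res, q in resonances)
        levels.append(20.0 * math.log10(gain))
    return levels

# Hypothetical sharp body resonances as (frequency in Hz, Q value); not measured data.
body = [(280.0, 30.0), (460.0, 30.0), (560.0, 30.0), (900.0, 30.0)]

for f0 in (220.0, 233.1):  # two fundamentals roughly a semitone apart
    print(f0, [round(level, 1) for level in partial_levels_db(f0, body)])
```

Running the sketch for the two fundamentals shows that the level of an individual partial can shift by several dB when it moves onto or off a resonance peak, although the filter itself is unchanged.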


The resonance frequencies in the winds have to be carefully aligned in a harmonic series by the maker in order for the instrument to sound properly. This requirement complicates wind instrument making considerably, and turns it into a craft similar to that of the violin maker. In the strings and voices no particular relationships between the resonance frequencies are required to make the instrument play. However, in their attempts to reach the level of the old Italian violins, violin makers have long discussed the magic of relationships between small integers.

Vibrato

A pronounced vibrato is a common feature of both voice and strings. While the rate of the vibrato is in the same range, 5 - 7 Hz, the singer often uses much more of this effect. A singer's vibrato of ± 50 cents is by no means excessive, and even ± 100 cents (a semitone up and down) is perfectly acceptable in operatic singing. The violinist seldom exceeds ± 40 cents (SOUND EXAMPLE 1). A vibrato rate that is slightly slower the lower the pitch range seems to be a common characteristic of both strings and singers.

The vibrato causes an amplitude modulation of the partials due to the peak-and-valley structure of the transmission in the vocal tract and violin corpus. During the vibrato cycle each partial will explore a certain range on the filter curve (see Fig. 6). In the case of a generous vibrato, as in singing, these ranges will overlap for the higher partials.
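The point at which the ranges of neighbouring partials begin to overlap follows directly from the vibrato extent. The sketch below assumes a vibrato that is symmetric in cents around the nominal pitch and strictly harmonic partials; the function names are introduced here for illustration.

```python
import math

def cents_to_ratio(cents):
    """Frequency ratio corresponding to an interval in cents (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)

def ratio_to_cents(ratio):
    """Interval in cents corresponding to a frequency ratio."""
    return 1200.0 * math.log2(ratio)

def partial_range(f0, n, extent_cents):
    """Lowest and highest frequency reached by partial n during a vibrato cycle."""
    r = cents_to_ratio(extent_cents)
    return n * f0 / r, n * f0 * r

def first_overlapping_partial(extent_cents):
    """Lowest partial n whose range reaches that of partial n + 1.

    Overlap requires n * r >= (n + 1) / r, that is n >= 1 / (r**2 - 1).
    """
    r = cents_to_ratio(extent_cents)
    return math.ceil(1.0 / (r * r - 1.0))

print(first_overlapping_partial(100))  # generous singer's vibrato, +/- 100 cents
print(first_overlapping_partial(45))   # violin vibrato, +/- 45 cents
```

Under these assumptions the ranges start to touch at partials 9 and 10 for a vibrato of ± 100 cents, but only around partial 19 for ± 45 cents.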

The amplitude modulation could be expected to be stronger for the strings than for the voice because of the sharper resonances of the instrument body. This was also found to be the case. Despite a more restrictive vibrato in the violin example in Fig. 6, the variation in the partial amplitudes is much larger for the violin than for the voice (see Fig. 7). It could be assumed that this fine structure in the partial amplitudes, which may change drastically even for small changes in fundamental frequency or vibrato amplitude, is a key characteristic of strings and voices.
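The dependence of the modulation depth on the sharpness of the resonance can be sketched numerically. The example assumes a single second-order resonance and a partial sweeping symmetrically in cents around its nominal frequency. The resonance frequency, the position of the partial, and the two Q values are hypothetical and chosen only to contrast a broad, formant-like peak with a sharp, body-like one.

```python
import math

def resonance_gain_db(f, f_res, q):
    """Gain in dB of a single second-order resonance (0 dB at low frequencies)."""
    x = f / f_res
    return -10.0 * math.log10((1.0 - x * x) ** 2 + (x / q) ** 2)

def modulation_depth_db(f_partial, extent_cents, f_res, q, steps=200):
    """Peak-to-peak level variation of one partial over a vibrato cycle."""
    levels = []
    for i in range(steps + 1):
        cents = -extent_cents + 2.0 * extent_cents * i / steps
        f = f_partial * 2.0 ** (cents / 1200.0)
        levels.append(resonance_gain_db(f, f_res, q))
    return max(levels) - min(levels)

# Hypothetical case: a partial sitting 3 % below a resonance at 1000 Hz.
broad = modulation_depth_db(970.0, 45, 1000.0, q=5)   # formant-like sharpness
sharp = modulation_depth_db(970.0, 45, 1000.0, q=30)  # body-resonance-like sharpness
print(round(broad, 1), round(sharp, 1))
```

With these assumed values the sharp resonance gives a level swing many times larger than the broad one for the same vibrato extent, in line with the observation above.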

The singer's possibilities of changing the rate of the vibrato are limited; rather it is a personal characteristic which slowly changes over the years. The violinist, on the other hand, can use the vibrato rate as a means of expression, a faster vibrato emphasizing a note or passage. It is also interesting to note that the character of the vibrato in string playing will be slightly different for different fingerings, depending on the position of the hand and arm. The most relaxed vibrato is usually achieved when stopping the string with the middle finger. As for singers, the vibrato has a strong personal quality, which may be enough to identify the player (SOUND EXAMPLE 2).

Fig. 6. Frequency ranges explored by individual partials during a vibrato cycle. Synthesized male singer (G3 = 196 Hz), vibrato range ± 6% = ± 100 cents (top), and real violin (D4 = 293 Hz), vibrato range ± 2.5% = ± 45 cents (bottom).

Fig. 7. Amplitude modulation of partials due to vibrato; singer (top) and violin (bottom). Same examples as in Fig. 6.

Aperiodicity

The waveform of the vocal folds or of the string does not repeat exactly from period to period, a phenomenon usually referred to as waveform perturbations. Two kinds are commonly distinguished: jitter, which denotes variations in period, and shimmer, which denotes variations in amplitude.

In the glottis the aperiodicity is due to an irregular distribution of the tissue in the vocal folds, among other things. In the bowed instruments, the variability is caused by local variations in the friction properties along the bow hair, imperfections in the string, as well as rosin and dirt accumulated on the string. The fingering also influences the aperiodicity: a string stopped by a finger shows a much larger amount of jitter than an open string. The aperiodicity is much larger for the voice than for the strings; typically the jitter is about ten times higher for a normal voice than for a stopped string (Askenfelt & Hammarberg, 1986; McIntyre et al., 1981).
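To make the notion of jitter concrete, the sketch below computes one common measure: the mean absolute difference between consecutive periods, divided by the mean period. This is an assumption chosen for illustration and not necessarily the measure used in the cited studies, and the period values are made up.

```python
def relative_jitter(periods):
    """Mean absolute difference between consecutive periods over the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period

# Made-up period sequences in milliseconds, for illustration only.
steady = [5.00, 5.00, 5.00, 5.00]
wobbly = [5.00, 5.05, 4.95, 5.00]
print(relative_jitter(steady), relative_jitter(wobbly))
```

A perfectly periodic sequence gives zero, and the value grows with the size of the cycle-to-cycle fluctuations, so a tenfold difference between voice and stopped string would show up as a tenfold difference in this number.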

The aperiodicity adds a "live" quality to the sound which enhances the naturalness of string and voice synthesis. Too much, however, gives "roughness", which is a characteristic of the pathological voice as well as of the beginner in string playing. Interestingly, the "unclean" and somewhat wiggly waveform of a real violin string carries the essential information needed to give a stringy character to a synthesis (SOUND EXAMPLE 3). After adding a reasonable approximation of the resonances of the violin body it may be hard to distinguish such an electronic violin from a normal acoustic violin (SOUND EXAMPLE 4; Mathews & Kohut, 1973).

Spectral balance

The spectral content of the voice and strings changes with the dynamic level. As mentioned, when the dynamic level increases from pp to ff, not only the amplitude grows but also the amount of high-frequency partials (see Fig. 8). Almost all traditional instruments show the same spectral behaviour, although the mechanisms utilized for this purpose are very different. Few people would guess that the slight [...] string player's old-fashioned use of horse hair rubbed with rosin, but they all serve the same purpose: to obtain a nonlinear sound source in which the high-frequency components increase with dynamic level. It would be tempting to conclude that we appreciate instruments with this spectral behaviour because of the daily indoctrination with speech sounds since our earliest age.

Fig. 8. Evolution of the spectral content for a violin with increasing dynamic level (pp - mf - ff), G3 = 196 Hz.


In contrast to other instruments, both singers and string players can control the balance between the low and high-frequency partials rather independently of the dynamic level (see Fig. 9). A note at a certain dynamic level is thus not associated with a fixed harmonic content but can be modulated according to the musical context. In other instruments this variation is much more restricted. For example, in the piano the harmonic content is set exclusively by the dynamic level (as far as we know today; Askenfelt & Jansson, 1990). In that case "the treble control is directly connected to the volume-knob."

The singer controls the spectral balance by conditioning the larynx. A less tense conditioning with loosely approximated vocal folds gives a "flow" quality, characterised by a pronounced flow of air and relatively strong lower harmonics. A condition with close approximation of the folds on the other hand, gives a "pressed" quality with weaker fundamental and stronger high-frequency harmonics.

Fig. 9. Comparison of the spectral balance for "press" (circles) and "flow" (thin line) conditions (G3 = 196 Hz); baritone [a] (top), violin (bottom).

The string player exerts the same control via the bow. A rapid bowing far from the bridge with only a light bow force gives a "flow" quality, while playing close to the bridge with a (necessarily) lower bow velocity and higher bow force gives a "pressed" version, without changing the overall level drastically (Fig. 9 and SOUND EXAMPLE 6).

Pitch control

The singer and the string player are completely free to control the fundamental frequency continuously, the only restriction being that they should be performing "in tune." Apparently, our perception has very different limits for this criterion depending on the circumstances. The singer may sometimes seem to stray away unconcerned with the accompaniment, and yet it would be unfair to judge the performance as out of tune (SOUND EXAMPLE 7). Measurements have indicated that deviations of 50 - 70 cents from the expected values, i.e. with reference to the accompaniment, can be perceived as in tune, in particular when the deviation is on the sharp side and/or falls on an unaccented beat (Sundberg, 1979). Similarly, the soloist in a violin concerto may occasionally deviate from what could be called decent tuning compared to the orchestra, yet the impression is normally not "out of tune", but rather that of an individual contrasting against a large group.

At the same time as the freedom in pitch control is large, and sometimes used, the precision can be high when needed. This occurs in ensembles which use only a slight vibrato or none at all, such as barbershop singing and string quartets (SOUND EXAMPLE 8). In these cases beats between mistuned notes in chords will be [...] measured spread of notes in repeated barbershop chords is of the order ± 3 cents (Hagerman & Sundberg, 1980).
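The connection between mistuning in cents and the resulting beat rate can be sketched as follows. The example assumes two partials that would coincide exactly in a perfectly tuned chord, and a nominal frequency of 440 Hz that is chosen only for illustration.

```python
def beat_rate_hz(frequency_hz, mistuning_cents):
    """Beat rate between two tones that should coincide but differ by mistuning_cents."""
    detuned = frequency_hz * 2.0 ** (mistuning_cents / 1200.0)
    return abs(detuned - frequency_hz)

for cents in (3, 20, 50):
    print(cents, round(beat_rate_hz(440.0, cents), 2))
```

At this frequency a mistuning of 3 cents gives less than one beat per second, whereas 20 cents gives about five beats per second. Since the beat rate scales with frequency, coinciding higher partials beat correspondingly faster.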

In synthesis of string quartet music it has been found that contrasting intonation of the melody line according to the melodic and harmonic tension adds to the performance. These deviations are of the order ± 20 cents (Sundberg et al., 1989). However, during long notes in all parts it is better to adjust to as beatless a tuning as possible, again on the level of a few cents.

Radiation

The radiation of the sound is a point on which the voice and strings differ drastically. The radiation properties of the voice (which normally can be well approximated by a vibrating piston mounted in a rigid sphere) follow a rather straightforward pattern (Marshall & Meyer, 1985). The voice is omnidirectional at low frequencies, gradually focussing in the singer's forward direction for the higher partials. The strings, on the other hand, show a much more complex pattern with several major radiation lobes, which distribute the partials nonuniformly (Meyer, 1972).

Why are the strings also four?

Nature has divided the human voices into four classes covering different ranges in pitch: soprano, alto, tenor, and baritone/bass. The four sizes of strings in use today (violin, viola, cello, and double bass) cover approximately the same total pitch range, which seems reasonable, but why do they also come in four sizes? The easy answer is that the violin cannot be made much smaller, and the double bass not much bigger, without making them very inconvenient to play. In between it has been found sufficient with two intermediate sizes, although some compromises have been made. For example, the viola is almost too big to be held under the chin. The cello, which happens to fit comfortably to the size of a normal player, struggles with the balance between a full sound in the bass, requiring a relatively large body, and a sonorous tenor and treble range, which suggests a smaller instrument with light top and back plates.

Returning to the viola, the body of this super violin, playing a fifth below the normal violin, ought to be even bigger to make a fair match to the violin and cello in radiated power. But this is in a sense also true for the alto voice, which would often need not a larger body but longer vocal cords and a longer vocal tract in order to compete with the soprano and tenor. So, in both voices and strings the alto part is relatively weak, in particular in its lower range. This coincidence is probably not due to pure chance. The number four is also probably not arbitrary, because we have had the four-size string group around us so long (about four centuries) that new members would have entered if seriously needed.

Surprisingly, the traditional string instruments are in fact not very close brothers or sisters because of small but important differences in the design. Among other things, the proportions of the bodies are not exactly similar, and consequently their sets of resonances will be differently grouped, giving each size a characteristic tone character (Askenfelt, 1982). It is, for example, not particularly difficult to separate the violin from the viola in a string quartet, even when they play in the same range. The same is true for the viola versus the cello.

Rather recently interesting attempts have been made to make a string ensemble with more than four members, in which the resonances of the instruments are scaled according to their playing range. This ought to give an ensemble in which the individuals are more similar in character and thus blend better together (SOUND EXAMPLE 9). This New Violin Octet is a unique example of a set of instruments which has been given an acoustical design by calculation before they were built, rather than by the traditional trial-and-error procedure (Hutchins, 1967). Time will tell whether the musical potential of this ensemble is high enough to survive.


Phrasing

By phrasing we usually mean a grouping of the events in a piece of music into a meaningful structure. In order to constitute a phrase the notes must be kept together in time, have a similar timbre, and also have a smooth evolution in amplitude. A break in any of these respects will give a signal to the listener that the phrase has come to an end.


Phrasing in singing is closely connected with the lung volume, more specifically with the vital capacity, which is about 5 l for a male and 4 l for a female. When phonating, part of this volume is used to drive a more or less steady flow of air through the larynx under a controlled pressure, the subglottal pressure. The singer controls this pressure by balancing the restoring forces of the expanded thorax against an applied force from the diaphragm. As the air in the lungs is gradually consumed, the singer has to rely more and more on the activity of the diaphragm, and eventually the pressure drops. This may be a catastrophe if the singer has not reached a phrase boundary because, as mentioned, the subglottal pressure is the main control of loudness. Even though singers are capable of maintaining a phrase for more than 20 s by a restrictive use of air, most singers will find it inconvenient to make a phrase longer than 8 - 10 s (Sundberg, 1987).

The string player does not need to care about the lung volume, but instead has to worry about the length of the bow, about 60 cm. It is relatively easy to form a phrase of notes which can be played in the same bow stroke, but it takes a master to continue the phrase over the bow change. (The term "bow change" denotes the moment when the direction of the bow motion is reversed.) At moderate dynamic levels a bow stroke may typically last for 6 s. However, if needed the player can "save bow" and extend a stroke to about 15 s (Askenfelt, 1986).
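The stroke durations quoted above translate into average bow velocities by simple division. The sketch assumes that the whole bow length of about 60 cm is used in one stroke at constant velocity, which is an idealisation; the function name is introduced here for illustration.

```python
BOW_LENGTH_CM = 60.0  # approximate bow length quoted in the text

def mean_bow_velocity(stroke_duration_s, bow_length_cm=BOW_LENGTH_CM):
    """Average bow velocity in cm/s if the whole bow is used in one stroke."""
    return bow_length_cm / stroke_duration_s

print(mean_bow_velocity(6.0))   # typical stroke at a moderate dynamic level
print(mean_bow_velocity(15.0))  # stroke extended by "saving bow"
```

Under this assumption a 6 s stroke corresponds to 10 cm/s and a 15 s stroke to 4 cm/s, which indicates how much slower the bow has to move when the player saves bow.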

In contrast to the singer, who performs only during the exhalation of air, the player can make use of the bow in either direction, downbow and upbow respectively. Strangely enough, the character of a downbow is completely different from that of an upbow. This is in large part due to the way the bow is held: between the fingers in such a way that it can pivot and act as a lever. The weight of the bow thus contributes much more to the bow force when close to the handle, the frog, than at the tip. For example, in order to obtain an even bow force, and hence an even tone quality, during a full stroke the player has to balance the contribution from gravity by pressing on top of the bow stick with the index or small finger respectively. When close to the tip the player must help gravity by pressing with the index finger in front of the fulcrum (defined roughly by the thumb and middle finger), and when close to the frog gravity must be counterbalanced by pressing with the small finger behind the fulcrum. Another consequence of the bow grip is that loud passages requiring a high bow force are preferably played on the lower half of the bow (towards the frog), and vice versa (Askenfelt, 1989). The complex task of supplying an adequate bow force at all times during the bow strokes gives strong indications on how to organize the patterns of up- and downbows in a piece of string music. In this way the music itself suggests a "natural" phrasing via the bowing.


As mentioned, the string player as well as the singer has a unique control of the spectral properties, both with regard to the overall balance, the extremes being the "flow - press" conditions, and with regard to the temporal fine structure via the vibrato. It is probable that this ability, when used to form phrases during which the spectral balance and vibrato are continuously given slightly different shadings (without chopping the phrase into parts), is one very prominent similarity between the strings and voices. Most one-voiced instruments are capable of generating a smooth amplitude contour with notes of steady quality, which in itself is enough to signal the phrase structure, but the micro variations within the phrase are quite another matter. This is probably one of the reasons why most concert halls announce at least half a dozen violin concertos before a solo trumpet or clarinet is allowed.

Conclusions

Despite large differences with regard to how the sound is produced and controlled, the strings and voice seem to share some key characteristics. Of these the following two could be assumed to be particularly important.

1. The sound source (the larynx or the bow-string) is clearly separated from the filtering part (the vocal tract or the sound box). This allows the partials to enter and leave the resonance peaks in the filter according to the selected fundamental frequency. In particular, the vibrato causes a pronounced modulation of the partial amplitudes due to the multiple resonance peaks in the filter curves.

2. The spectral balance can be controlled rather independently of the dynamic level. This opens a communication dimension with the "flow - press" qualities as the extremes.

Other properties discussed which show more or less pronounced similarities between strings and voice include: rate and extent of vibrato, degree of aperiodicity, accuracy in pitch control, and the possibility of forming continuous streams of notes (phrases). However, there are reasons to believe that discrepancies in these respects are of secondary importance compared to the two similarities first mentioned.

Coda

Admittedly, the presentation has been somewhat vague in pointing out distinctive acoustic characteristics which could justify a formal cousin-ship between the strings and the singing voice. However, despite considerable gaps in the chain of acoustic evidence for such a relationship, it is the author's personal opinion that they are indeed close relatives. The last sound example may perhaps convince some readers (SOUND EXAMPLE 10). An excerpt from a song without words is performed, first by a soprano, somewhat affettato, followed by a string baritone, a little rough and serious-minded. If you still do not agree that the singing voice and strings are close cousins after listening to these examples, acoustic data will never convince you!

References

Askenfelt, A. (1982). Eigenmodes and tone quality of the double bass. Speech Transmission Laboratory Quarterly Progress and Status Report, Royal Institute of Technology, Stockholm, STL-QPSR 4/1982, 149-174.

Askenfelt, A. (1986). Measurement of bow motion and bow force in violin playing. J. Acoust. Soc. Am., 80, 1007-1015.

Askenfelt, A. (1989). Measurement of the bowing parameters in violin playing II: Bow-bridge distance, dynamic range, and limits of bow force. J. Acoust. Soc. Am., 86, 503-516.

Askenfelt, A. & Jansson, E. (1990). In Askenfelt, A. (ed.), Five Lectures on the Acoustics of the Piano, Royal Swedish Academy of Music, Stockholm, 39-57.

Askenfelt, A. & Hammarberg, B. (1986). Speech waveform perturbation analysis. J. of Speech and Hearing Research, 29, 50-64.

Benade, A. H. (1976). Fundamentals of Musical Acoustics, Oxford University Press, New York.

Dudley, D. & Strong, W. (1990). A computer study of the effects of harmonicity in a brass wind instrument: Impedance curve, impulse response, and mouthpiece pressure with a hypothetical periodic input. Applied Acoustics, 30, 117-132.

Fant, G. (1960). Acoustic theory of speech production, Mouton, The Hague.

Hagerman, B. & Sundberg, J. (1980). Fundamental frequency adjustment in barbershop singing. J. of Research in Singing, 4(1), 3-17.

Helmholtz, H. von (1877). On the Sensations of Tone (English ed.), Dover, New York, 1885, reprint 1954.

Hutchins, C.M. (1967). Founding a Family of Fiddles. Physics Today, 20, 23-37.

Marshall, A.H. & Meyer, J. (1985). The directivity and auditory impressions of singers. Acustica, 58, 130-140.

Mathews, M.V. & Kohut, J. (1973). Electronic simulation of violin resonances. J. Acoust. Soc. Am., 53, 1620-1626.