A Proposal for a More Perceptually Uniform Control Room for Stereophonic Music
Recording Studios
Philip R. Newell, Moana
Keith R. Holland, ISVR
Improvements in
domestic sound reproduction equipment have begun to highlight the inconsistencies
of studio monitoring conditions. This paper discusses the evidence that many of
the roots of monitoring inconsistency lie not only in some erroneous and
outdated beliefs relating to reverberation time, but also in attempts to
extract more from stereophony than it can simultaneously supply. Performance
details are presented for some rooms built using the concepts proposed herein.
Introduction
Judging by the wildly different tonal balances on CD recordings for sale
to the public, the state of room to room compatibility of the listening
conditions in recording studio control rooms still appears to fall far short of
what a professional industry should by now be achieving. Digital recording has
brought its share of problems, but one problem which we must credit it with
removing is the variability inherent within the recording media. We may still
have differences in the sonic performance of A to D and D to A converters, but
the consistency of the digits which leave the recording studios and arrive in
people's homes is now guaranteed. The vagaries of analogue transfer to, and
recovery from, vinyl disc or magnetic tape have been consigned to the past, so
the excuses for much of the previous variability can no longer be used in
defence of the current situation. Some variability or inconsistency in the
spectral balance of recordings is no doubt within a range of artistic
interpretations by recording staff; it can be dependent upon what they feel is
appropriate for any given track, but careful listening to much of the available
recorded material will soon reveal that most of the variability is not
intentional. The source of the problem certainly lies, to a large degree, in
the variability of monitoring conditions in the control rooms of the recording
studios in which they are mixed.
There is a general over-reliance on inexpensive, close-field monitoring
loudspeakers, and one of the reasons for the existence of this state of affairs
is the lack of faith in the widely disparate range of combinations of large
studio monitors and different control room philosophies. As the differing
points of view as to what is "right" have continued to exist amongst
many experienced designers of both the loudspeakers and the rooms, it is little
wonder that the recording staff have continued to show uncertainty about what they feel comfortable with. They have,
in many cases, opted for highly personalised solutions which work for them,
individually, on the most usual types of music that they record.
Studio designers are all aware of the problems
inherent in the use of small, close-field monitors, such as the inability to
produce the lower frequencies at appropriate levels, or the inconsistency in
room positioning, and hence the uncertainty about how the room modes will, or
will not, be driven. There are also the attendant problems of reflexions from the mixing console and other related
equipment. What is more, with a very few exceptions, the commonly used small
monitors are sadly lacking in their ability to resolve fine detail.
As a result of this, noises, bad edits, the operation of gates, the clashing of
phase-distorted effects artefacts (which often lead to undue harshness in the
sound) and a host of other problems all too frequently add themselves to the
irregularities of spectral balance which occur due to entire octaves of the
musical spectrum being left unmonitored.
Sources of Uncertainty
We cannot, and nor should we, be too dictatorial about all that occurs
in an artistic industry, for there needs to be a range of concepts of what is
right in order to accommodate the individualities of different producers,
musicians, and listeners alike. All should be free to make their decisions for
themselves, but those decisions can only be valid as long as they are aware of the
decisions that they are making, and are not being misled into them by
ill-conceived control room monitoring conditions. In many cases, the designs
have been based on a long held belief that a control room should possess a
reverberation time (RT) which approximates to an
average of domestic listening conditions. RT is, of
course, not an accurate concept for use in most small rooms, and certainly is
not applicable to highly absorbent rooms, but nonetheless, RT60, T60 or
whatever other decay description has been used has frequently had some domestic
point of reference.
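The standard reverberation formulas of Sabine and Eyring both rest on a diffuse-field assumption. That assumption is part of why RT is a poor descriptor in small, heavily damped rooms. The sketch below is a minimal illustration under stated assumptions. It uses a hypothetical 6 m × 5 m × 3 m room (30 m² of floor area) and treats the mean absorption coefficient ᾱ as uniform. The room dimensions and coefficients are illustrative values, not data from any room described here. The sketch shows how far the two formulas diverge as the room becomes more absorbent. When every surface absorbs completely, Sabine's formula still predicts a finite decay, whereas Eyring's falls to zero.

```python
import math

# Metric constant 24 * ln(10) / c, with c ≈ 343 m/s; gives seconds for V in m^3, S in m^2.
SABINE_CONSTANT = 0.161


def sabine_rt60(volume_m3: float, surface_m2: float, mean_alpha: float) -> float:
    """Sabine reverberation time: T = 0.161 V / (S * alpha)."""
    return SABINE_CONSTANT * volume_m3 / (surface_m2 * mean_alpha)


def eyring_rt60(volume_m3: float, surface_m2: float, mean_alpha: float) -> float:
    """Eyring reverberation time: T = 0.161 V / (-S * ln(1 - alpha)).

    Unlike Sabine's formula, this tends to zero as alpha approaches 1.
    """
    if mean_alpha >= 1.0:
        return 0.0  # a perfectly absorbing room has no reverberant decay
    return SABINE_CONSTANT * volume_m3 / (-surface_m2 * math.log(1.0 - mean_alpha))


# Hypothetical 6 m x 5 m x 3 m control room (assumed dimensions).
LX, LY, LZ = 6.0, 5.0, 3.0
VOLUME = LX * LY * LZ                        # 90 m^3
SURFACE = 2 * (LX * LY + LX * LZ + LY * LZ)  # 126 m^2

if __name__ == "__main__":
    for alpha in (0.2, 0.5, 0.9, 1.0):
        print(f"alpha={alpha:.1f}  "
              f"Sabine={sabine_rt60(VOLUME, SURFACE, alpha):.3f} s  "
              f"Eyring={eyring_rt60(VOLUME, SURFACE, alpha):.3f} s")
```

At ᾱ = 1 this reports about 0.115 s for Sabine and 0 s for Eyring. Neither figure means much in a room whose sound field is far from diffuse, which is the point made above.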
In turn, many other design principles have been based on aspects of
stereo perception in the rooms in which the commercially available recordings
will eventually be played. This shows admirable concern for the record buying
public, who keep the music industry alive, but it is also apparent that much of
the good intent has been misguided. In trying to
take into account the extraordinary number of factors that are involved in the
domestic listening process, and, especially in the light of the fact that some
requirements for the optimal reproduction of different types of music and
recording techniques are mutually exclusive, the efforts have frequently not
achieved their goals. By trying to take into account so many of the variables
in the reproduce end of the chain, the production end has itself suffered a
lack of certainty that has, unfortunately, served to introduce even more
uncertainty into the reproduce end of the chain.
At the 8th International Conference of the AES, in 1990 [1], Floyd Toole
presented a paper, "Loudspeakers and Rooms for Stereophonic
Reproduction". The abstract began thus, "Stereophonic reproduction
attempts to reconstruct, in the minds of listeners, replicas of the timbral and spacial effects of
acoustical events that have occurred at earlier times and other places. It
matters not whether the 'live' event consisted of musicians in a natural
acoustical environment, or a multi-track creation monitored in a control room.
In all cases, musicians and production personnel presumably heard a
stereophonic reproduction that met their artistic and technical expectations.
Assuming that the necessary information has been preserved in the recording, a
replication can be successful only to the extent that the loudspeakers are
capable of reproducing the appropriate sounds, and that the listening rooms are
capable of conveying those sounds to the ears of listeners. Variations in
loudspeakers and rooms create many difficulties in achieving this goal.
Although it has been traditional to consider the loudspeakers and room as
separate entities, this approach is no longer justified. The loudspeakers, room
and listener comprise a system within which the sounds and spacial
illusions of stereo are decoded, and they must be considered together."
The first part of Toole's abstract, quoted above, is a lucid, concise and
powerful summing-up of a complex, real-life situation. Its last sentence is
very significant: "The loudspeakers, room and listener [to the end product,
e.g. a commercial CD] comprise a system within which the sounds and spacial
illusions of stereo are decoded...". Decoded! If they are being decoded, then in the production process
they must, in some way, be considered to have been encoded. No encode /
decode process can be expected to work optimally unless the decoder can track
the encode process, and to do this the encode process must be known. In
reality, however, the monitoring (encode) conditions during mixing are rarely
known to the CD listener.
Recordings are not sold in the shops with instructions stating on which type
of loudspeaker, and in what size, shape or other type of room, they should be
auditioned. Nor is there anything in the foreseeable future that is likely to
reduce the range of domestic listening equipment and conditions. Furthermore,
music is not the most
important thing in the lives of the majority of people, so it will continue to
be normal for people to buy houses not for the acoustics of their rooms, but
for a multitude of other priorities, and then find appropriate rooms in which
to listen to the music of their choice. Different loudspeakers will suit
different rooms, different types of music, different recording techniques and
media, different budgets, and different personal tastes on precisely what
people like to hear in order to achieve the most enjoyment that they can from
the music of their choice during its reproduction. There is, therefore,
adequate justification for having available a good choice of reproduction
equipment.
Amongst listeners, there will be audiophiles who, taking many of these
variables into account, choose their systems and listening conditions with
great care, and who may well enjoy optimising them for their favourite types of
music, recording techniques and storage media. The most appropriate
loudspeakers and listening conditions for rock music recorded on an analogue
system will, in all probability, be different from those most suited to the
enjoyment of middle/side stereo microphone recordings of digitally recorded
classical music. However, even within the latter, quite highly defined set of recording
conditions, there will be a wide range of available and appropriate
reproduction systems and environments, and all will sound different.
There is an argument that recordings in each of the above two, very different,
music recording styles should be made in control rooms optimised for their own
specific characteristics. However, the variation within the sub-groups is so
great that to standardise on one arbitrarily chosen type of loudspeaker, at a
given distance and at a given level, will only transfer many of the vagaries of
the reproduction environment into the production end of the chain. This is
because the choice of equipment and conditions for the production process may
be based largely on the results obtained with similar equipment at the
reproduction end of the chain, and this could lead to very volatile standards
as fashions change.
What this document proposes is that control rooms abandon the attempt to
accommodate too many of the variables in the reproduction systems, and instead
concentrate on a stripped-down version of the more fundamental production and
quality control needs. This will allow recordings to be monitored in more
detail and with more consistency. Together with the knowledge and skill of the
recording staff, it will also make it easier to predict which types of
reproduction conditions would best decode what the production staff hoped that
the final listeners would hear.
Removing a Variable
In 1994, a paper entitled "Control Room Reverberation is Unwanted Noise" was
presented to the UK Institute of Acoustics [2]. The paper put forward the
concept of the "Non-Environment / Monitor Dead" rooms, which sought to provide monitoring conditions as
close as could be achieved to free-field conditions. These rooms can reduce the
decay time of reflexions and modal energy to such low
levels that the perception of many recording defects becomes much easier. The
paper also contained a discussion of the majority of the other widely used
control room acoustic control philosophies. It noted that, owing to the
sensitivity of the human hearing system, most attempts at producing optimised
decay conditions for music monitoring had yielded control rooms which sounded
subjectively very different from one another. These rooms also tended to lead
to different musical conclusions when the same piece of music was mixed in
different rooms.

Figure 1 Plan of Non-Environment Control Room (Shaded Areas are Wide Band Absorber Systems)
Figure 2 Side Elevation of Non-Environment Control Room, Showing (a) Horizontal Rear Absorbers and (b) Vertical Rear Absorbers
Figures 1 and 2 show the general concept of the
"Non-Environment" or "Monitor Dead" approach [3,4]. It can
be seen that the side walls, the rear wall, and the ceiling are made as
acoustically dead as possible to as low a frequency as possible. The front wall
is hard, dense and reflective, and the floor is also hard. These two surfaces,
together with the hard surfaces of any equipment which may be facing the
listener, provide all the acoustic life, for sounds produced within the room,
to alleviate any sense of being in an anechoic chamber. The loudspeakers are
mounted flush in the solid front wall, and so are not actually in the room, but
form a part of one of its perimeter surfaces. The front wall provides a large
baffle against which the loudspeakers can push, thus aiding the efficiency and
linearity of low frequency radiation. The flush mounting also removes any
irregularities caused by cabinet edge diffractions, or by path length anomalies
of waves which may seek to travel behind a cabinet mounted within a room, and
which return to
the room from a front wall with an irregular phase relationship.
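An idealised single-reflection model makes the path-length argument for flush mounting concrete. The sketch below is a minimal illustration built on three simplifications:

- the front wall reflects perfectly;
- the cabinet radiates omnidirectionally at low frequencies;
- the listener sits on the axis normal to the wall.

Under these assumptions, a radiating point a distance d in front of the wall produces a reflection that arrives 2d/c later than the direct sound. The two cancel wherever that delay equals an odd number of half periods. The distances used are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C


def reflection_notches_hz(path_difference_m: float, f_max_hz: float,
                          c: float = SPEED_OF_SOUND) -> list[float]:
    """Frequencies at which one delayed, equal-level reflection cancels the direct sound.

    Cancellation occurs where the delay equals an odd number of half periods:
    f_k = (2k + 1) / (2 * delay), k = 0, 1, 2, ...
    """
    if path_difference_m <= 0.0:
        return []  # flush-mounting limit: no path difference, no comb notches
    delay_s = path_difference_m / c
    notches = []
    k = 0
    while (f := (2 * k + 1) / (2 * delay_s)) <= f_max_hz:
        notches.append(f)
        k += 1
    return notches


def wall_distance_notches_hz(distance_to_wall_m: float, f_max_hz: float) -> list[float]:
    """Source a distance d in front of a reflective wall: the path difference is 2d."""
    return reflection_notches_hz(2.0 * distance_to_wall_m, f_max_hz)


if __name__ == "__main__":
    for d in (1.0, 0.5, 0.0):  # hypothetical cabinet-to-wall distances in metres
        print(d, [round(f, 1) for f in wall_distance_notches_hz(d, 500.0)])
```

With the cabinet 1 m from the wall, the first notch falls at about 86 Hz, squarely in the bass region. Moving the source into the wall removes the mechanism altogether. Real cabinets become directional as frequency rises, and real walls absorb some energy, so measured dips are partial rather than complete.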
Except for the floor, and any equipment placed within the room, the
monitors face something approximating to an anechoic chamber. The acoustic
conditions provided by the room are thus dependent upon whether a sound was
produced within the room, or from one of its boundaries. In the two cases, the
overall decay characteristics of the room would be very different. From the
monitoring direction, the reflexion problems from
recording equipment can be dealt with by angling the equipment such that reflexions pass away from the listener and into an
absorbent surface. If this cannot be done directly, then the offending surface
can be protected, either by an absorbent
shield, or by streamlining devices that will deflect the incident waves
around or away from the object, and prevent them from, in particular, returning
to the front, reflective wall and thence back to the listener. The aim is to
monitor the output from the loudspeakers, and nothing more.
By means of these techniques, rooms can be built which, by virtue of
their relative absence of monitoring acoustics, can achieve very high degrees of
room to room compatibility. The studio designer Tom Hidley has been pursuing
techniques of controlling the room modes down to frequencies as low as 10 Hz
for his Hidley Infrasound rooms [5], but some of the processes involved in
achieving this very low frequency absorption lend themselves to the control of
the more "audible" low frequencies in much smaller rooms. This would seem to be
important, because so many of the control rooms currently in use around the
world are in the 25-35 m² region, and it has typically been this range of room
size which has suffered so badly from inter-room incompatibility. Whilst the small
Non-Environment rooms, of different shapes and sizes, have different ambient
characteristics for general speech and noises produced within the rooms, (due
to the different natures of the reflective materials and the different reflexion times in different sizes of rooms) they all have
remarkably common monitoring characteristics, which are essentially those of
the loudspeakers, modified by whatever small ambient aberrations remain.
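The standard rectangular-room mode formula gives a sense of scale for the low-frequency problem:

f = (c/2)·√((nx/Lx)² + (ny/Ly)² + (nz/Lz)²)

The sketch below evaluates it for a room in the 25-35 m² range. It is a minimal illustration, assuming a hypothetical 6 m × 5 m × 3 m room with rigid, parallel walls, which is precisely the condition the Non-Environment treatment sets out to avoid. It lists the modes below a chosen frequency. It also shows that a 10 Hz wave is about 34 m long, far larger than any such room.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s


def room_modes(lx: float, ly: float, lz: float, f_max_hz: float,
               c: float = SPEED_OF_SOUND) -> list[tuple[float, tuple[int, int, int], str]]:
    """Eigenfrequencies of a rigid-walled rectangular room, up to f_max_hz.

    Returns (frequency, (nx, ny, nz), kind) tuples sorted by frequency, where kind
    is 'axial', 'tangential' or 'oblique' (1, 2 or 3 non-zero indices).
    """
    # Largest index per axis whose single-axis term alone does not exceed f_max_hz.
    n_max = [int(2.0 * length * f_max_hz / c) for length in (lx, ly, lz)]
    kinds = {1: "axial", 2: "tangential", 3: "oblique"}
    modes = []
    for nx, ny, nz in itertools.product(*(range(n + 1) for n in n_max)):
        if nx == ny == nz == 0:
            continue  # (0, 0, 0) is not a mode
        f = (c / 2.0) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= f_max_hz:
            nonzero = sum(1 for n in (nx, ny, nz) if n > 0)
            modes.append((f, (nx, ny, nz), kinds[nonzero]))
    return sorted(modes)


def wavelength_m(f_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Acoustic wavelength in air for a given frequency."""
    return c / f_hz


if __name__ == "__main__":
    for f, idx, kind in room_modes(6.0, 5.0, 3.0, 80.0):  # hypothetical room
        print(f"{f:6.1f} Hz  {idx}  {kind}")
    print(f"wavelength at 10 Hz: {wavelength_m(10.0):.1f} m")
```

In a hard-walled room of this size, the first mode lies below 30 Hz, and the sparse modes below 100 Hz are spaced unevenly. The absorption described in this paper aims to damp such modes heavily enough that their exact distribution matters far less.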

Figure 3 shows the frequency response function of one monitor loudspeaker at a
distance of 2 m in a relatively small Non-Environment room, and Figure 4 shows
a similar measurement of a similar loudspeaker in a large Non-Environment room.
The two plots are remarkably similar considering the different room sizes (the
dip at about 1 kHz in Figure 4 is due to a crossover discrepancy that was
subsequently resolved). Figures 5
and 6 show the step responses of the monitors in the small and large rooms
respectively.
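For a sampled system, the step response is simply the running sum of the impulse response. A step plot therefore carries the same information as the impulse response, presented in a form often used to judge low-frequency behaviour and the time alignment of drive units. The sketch below is a minimal illustration. It uses a first-order high-pass filter as a crude, hypothetical stand-in for a loudspeaker's low-frequency roll-off. Because no loudspeaker reproduces DC, its step response must eventually decay to zero. The running sum of the impulse response matches the step response obtained by filtering a unit step directly.

```python
from itertools import accumulate


def one_pole_highpass(x: list[float], a: float) -> list[float]:
    """First-order high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1]), with 0 < a < 1."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for xn in x:
        yn = a * (prev_y + xn - prev_x)
        y.append(yn)
        prev_x, prev_y = xn, yn
    return y


def step_from_impulse(h: list[float]) -> list[float]:
    """Discrete step response: cumulative sum of the impulse response."""
    return list(accumulate(h))


N = 200
A = 0.9  # hypothetical filter coefficient
impulse = [1.0] + [0.0] * (N - 1)
unit_step = [1.0] * N

h = one_pole_highpass(impulse, A)            # impulse response
s_direct = one_pole_highpass(unit_step, A)   # step response by direct filtering
s_from_h = step_from_impulse(h)              # step response from the impulse response

if __name__ == "__main__":
    print([round(v, 3) for v in s_from_h[:5]], "...", s_from_h[-1])
```

For this filter the step response is 0.9, 0.81, 0.729 and so on, decaying towards zero. That decay is the signature of the missing DC response that a real monitor's step plot also shows.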
In brief, the rooms are made highly absorbent at the mid and high
frequencies by the use of conventional fibrous absorbent materials. Low-mid
absorption is provided by acoustic labyrinths through which the waves are
guided, diffused and diffracted, before being finally forced to drag their way
through large, absorbent-lined ducts. The lowest frequencies are addressed by
means of large panel absorbers, air damped, constrained layer membrane
absorbers, and dead sheet membrane absorbers, which effectively line the room
with a heavy, acoustically-dead, semi-limp bag. The overall control is provided
by the whole system of absorption, but for the purposes of this discussion, the
concept can be likened to an anechoic chamber, with one wall replaced by a hard
wall in which the loudspeakers are mounted, and a hard floor. The floor may or
may not have openings at the front and rear of the room for the utilisation of
under floor absorption. This concept will suffice for the remainder of this
discussion.
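The lowest frequencies are handled by panel and membrane absorbers. Their behaviour is usually first estimated with a simple mass-spring model. A limp panel of surface mass m (kg/m²) over a sealed air gap of depth d (m) resonates near

f₀ = (1/2π)·√(ρc²/(m·d))

With ρ ≈ 1.21 kg/m³ and c ≈ 343 m/s, this reduces to the familiar rule of thumb f₀ ≈ 60/√(m·d). The sketch below is a minimal illustration of that idealised model only. It ignores panel stiffness, damping and the constrained-layer constructions mentioned above. The example panel values are hypothetical, not taken from the rooms described here.

```python
import math

AIR_DENSITY = 1.21      # kg/m^3, air at roughly 20 °C
SPEED_OF_SOUND = 343.0  # m/s


def panel_resonance_hz(surface_mass_kg_m2: float, air_gap_m: float) -> float:
    """Mass-air-spring resonance of a limp panel over a sealed air cavity."""
    # Stiffness per unit area of the trapped air layer, in Pa/m.
    air_stiffness = AIR_DENSITY * SPEED_OF_SOUND ** 2 / air_gap_m
    return math.sqrt(air_stiffness / surface_mass_kg_m2) / (2.0 * math.pi)


def panel_resonance_rule_of_thumb_hz(surface_mass_kg_m2: float, air_gap_m: float) -> float:
    """Common approximation f0 ≈ 60 / sqrt(m * d), m in kg/m^2 and d in m."""
    return 60.0 / math.sqrt(surface_mass_kg_m2 * air_gap_m)


if __name__ == "__main__":
    # Hypothetical panels: (surface mass in kg/m^2, air gap in m)
    for m, d in ((5.0, 0.2), (10.0, 0.1), (20.0, 0.3)):
        print(m, d, round(panel_resonance_hz(m, d), 1),
              round(panel_resonance_rule_of_thumb_hz(m, d), 1))
```

Heavier panels and deeper cavities move the resonance downwards, which is why absorbers for the lowest frequencies tend to be both massive and deep.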
Irrespective of size or shape, such a termination will be highly
uniform. The low frequencies may vary slightly with room size and dimensions,
but with suitably effective absorption, they are likely to do so to a
considerably smaller degree than with most other, current control room designs.
What is more, the response perturbations caused by the rooms are likely to be
at lower levels, relative to the direct sound, than is the case with most other
rooms. This was one of the main benefits proposed in the IOA paper [2]. By
reducing room artefacts, the lower level details in the recorded music can be
more readily perceived. Any unwanted aspects of the recordings, such as the
audible operation of gates, can then be dealt with before they become
embarrassingly evident to the record-buying public.
Limitations, Real and Imaginary
Over the years, a number of criticisms have circulated about the room
concept being discussed here. Some of these comments have had substantial grounds
to support them, but others have been based on misconceived theories. Examples
of the latter type are comments to the effect that the lack of modal support
will produce rooms which are subjectively lacking in bass, and that an over-dead
monitoring acoustic will lead to the excessive use of reverberation when
mixing. The lack of modal support would only produce bass-light mixes if the
decay time at the middle and high frequencies remained typical of more
conventional control rooms. This was the case with some of the control rooms of
the 1970s and early 80s, in which bass traps were used to excess in rooms
which still possessed significant decay times at higher frequencies. The
Non-Environment rooms, however, are all-trapped, not just bass-trapped, and,
low as it is, the LF decay is still predominant in the time / frequency
response. A person who is used to working in livelier rooms may initially be
unaccustomed to the low decay time, but usually adjusts to it rapidly, and the
clarity and impact of the low frequencies come as a revelation. If it is all considered to be
too dry, then that is what is on the tape, in which case either the mixes can
be given reverberation, according to taste, or the recording acoustics or
microphone location can perhaps be changed.
This brings us nicely to the second of the misguided criticisms, that a
low decay time room may lead to the excessive use of reverberation or effects
in the mixing process. The fact is that reverberation added, even in a totally
dead room, is unlikely to become excessive when played in a more reverberant
space. The differences in the decay times of any reasonable control rooms are
subtle by comparison with the lengths and quantities of reverberation effects
that are usually applied to recordings or mixes. One thing which is often
noticed, though, is just how clearly the reverberation tails can be heard in
very low decay time rooms. It is just as well to monitor these carefully, as
synthetic reverberation can produce some undesirable decay tail artefacts
which all too frequently go unnoticed until it is too late! In low decay time
rooms, the sounds of the rooms in which the microphones were placed, or of the
different effects processors which had been used on a recording, become
clearly recognisable, to a degree normally only experienced on headphones.
What is more, every different conventional control room will produce different
perceived ratios of transient and quasi-steady-state sounds, certainly beyond
any critical distance. The transients fall off at 6 dB per doubling of distance
from the near field of the source, but the quasi-steady-state signals may be
supported by the modal decay characteristics of the rooms. This is a fact of
life in all but the most acoustically dead monitoring conditions, which
suggests, once again, that a dead monitoring acoustic is the only one in which
any general consistency of quality control could be achieved.
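The 6 dB per doubling of distance quoted above is the inverse-square law for the direct sound, 20·log₁₀(r₂/r₁). In a diffuse-field approximation, the reverberant level is roughly independent of position. The direct and reverberant levels are therefore equal at the critical distance:

r_c ≈ 0.057·√(QV/T₆₀)

Here V is the room volume in m³, T₆₀ the decay time in s, and Q the source directivity factor. The sketch below is a minimal illustration that assumes this diffuse-field model, which, as noted earlier, is itself questionable in small rooms. It uses hypothetical room values and shows how shortening the decay time moves the critical distance outwards. It also shows the resulting rise in the direct-to-reverberant ratio at a fixed listening distance.

```python
import math


def inverse_square_drop_db(r1_m: float, r2_m: float) -> float:
    """Fall in direct-sound level between distances r1 and r2 (free-field spreading)."""
    return 20.0 * math.log10(r2_m / r1_m)


def critical_distance_m(volume_m3: float, rt60_s: float, directivity_q: float = 1.0) -> float:
    """Distance at which direct and diffuse reverberant levels are equal (Sabine-based)."""
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)


def direct_to_reverberant_db(distance_m: float, rc_m: float) -> float:
    """Direct-to-reverberant ratio in the diffuse-field model; 0 dB at r_c."""
    return 20.0 * math.log10(rc_m / distance_m)


if __name__ == "__main__":
    volume, q, listening_distance = 90.0, 2.0, 2.0  # hypothetical room and layout
    print(f"drop per doubling: {inverse_square_drop_db(1.0, 2.0):.2f} dB")
    for rt60 in (0.4, 0.2, 0.1):
        rc = critical_distance_m(volume, rt60, q)
        drr = direct_to_reverberant_db(listening_distance, rc)
        print(f"T60={rt60:.1f} s  r_c={rc:.2f} m  D/R at {listening_distance} m = {drr:+.1f} dB")
```

Halving the decay time raises the direct-to-reverberant ratio at a given seat by about 3 dB in this model. This is one way of expressing why the perceived balance of transients and sustained sounds differs from room to room.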
Spacial Anomalies
Three of the more substantial criticisms of the low decay time
monitoring conditions are that they lack a sense of spaciousness; that they are
not representative of normal listening conditions; and that, in the smaller rooms
of this type, they fail to support an adequately wide area of stereo imaging.
The last of the three points will be dealt with, in some detail, a little
later, so let us first consider the question of spaciousness. An accurate
rendering of spaciousness can only be achieved by multiple, lateral reflexions, arriving from the directions, and with time
delays, that are appropriate to the performance space, whether that space was
real, or imaginary. A less accurate sense of spaciousness, which is perhaps a
more realistic goal, can still only be achieved by reflexions
coming from a direction other than that of the stereo loudspeakers. It is not
inherent in a conventional stereo recording, and will be dependent in its
nature upon the reproduction acoustics. It can therefore never be monitored in
a truly representative way at the time of mixing. Surround sound allows this
problem to be tackled somewhat more satisfactorily.
Spaciousness, and the perception of detail, tend to be mutually
exclusive, whether it is in the performing space, the microphone technique or
the reproduction chain. Orchestral conductors hear more detail from their
rostra than the audiences hear from the seats in the auditoria. The conductors
need to hear the detail to be able to do their job, but most audiences like to
hear the all-enveloping sound from the auditoria, because it pleases them.
Distant stereo microphone arrangements, such as spaced omnis,
produce a greater richness of sound than close, multi-microphone
techniques, but the latter can produce more fine detail and, perhaps, more
dynamic impact. The choice of which technique to use will be a creative
decision by the people responsible for the recording. In a room for critical
monitoring, however, where the same compromises exist, it would seem that
experienced personnel could far more realistically achieve their aims in rooms
in which they could hear the fine detail, and then interpret how things would sound in a more reflective space,
rather than in rooms in which they could hear a spacious sound, but could only
guess at what problems may lurk in the low level detail, masked by the
spaciousness artefacts of the room. In any case, it would not be too difficult
a task to introduce suitable reflectors into a relatively acoustically dead
room for a final and more spacious auditioning of the end result; once, that
is, any problems in the finer details had been monitored and resolved.
The criticism about not being representative of domestic listening
conditions would appear to be irrelevant. To date, all too many rooms which do
attempt such domestic commonality have failed to produce the intended
compatibility in the end result. Averages in themselves need not be
representative. The average of the integers from 1 to 10 does not represent,
even within 20%, more than 2 of the 10 integers; the majority of the integers
are not closely represented by the average at all. The worldwide range of
domestic listening conditions is far too wide for any "average" control room to
represent. Nor are motor cars and headphones, which now form a large part of
the international listening environment, represented by any average room.
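The arithmetic behind the averaging argument is easily checked. The sketch below is a minimal illustration. The mean of the integers 1 to 10 is 5.5, and a ±20% band around it spans 4.4 to 6.6. Only 5 and 6 fall inside that band.

```python
from statistics import mean


def within_tolerance_of_mean(values: list[int], tolerance_fraction: float) -> list[int]:
    """Return the members of values lying within ±tolerance_fraction of their mean."""
    avg = mean(values)
    band = tolerance_fraction * avg
    return [v for v in values if abs(v - avg) <= band]


integers = list(range(1, 11))
close = within_tolerance_of_mean(integers, 0.20)

if __name__ == "__main__":
    print(mean(integers), close, f"{len(close)} of {len(integers)}")
```

Measuring the 20% relative to each integer instead of relative to the mean gives the same two members, so the conclusion does not depend on that choice.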
In fact, none of the normal arguments for control room specifications
have much relevance for cars, headphones, or a wide range of domestic
loudspeaker listening. What this seems to suggest is that we ought to know more
about what is actually on the storage medium. This needs to be known, and
recorded in a more predictable manner, so that the disparate reproduction
systems can decode the intentions of the recording personnel more reliably and
appropriately, according to the reproduction requirements and conditions. The
effect of possible reproduction
environments must be deduced from the audiological
and psycho-acoustic cues in the recording, and how they will relate to the
various listening conditions. In other words, the recordings should allow the
maximum to be reliably extracted from them, without bias to any particular set
of reproduction conditions, unless, that is, the recordings are being made for
some highly specific purpose, such as television commercials.
Solutions
By taking the control room acoustics out of the recording chain, the
burden of monitoring linearity shifts on to the loudspeakers.
As loudspeaker performance has been converging faster than room performance,
this simplifies the task of producing more compatible control room monitoring.
Furthermore, much of the effort in loudspeaker design research has been
involved with the amelioration of the problems caused by a typical loudspeaker
/ room interface. Loudspeakers designed for monitoring in Non-Environment rooms
can concentrate on the optimisation of axial impulse response performance, with
less emphasis needing to be placed on the directivity problems well off-axis.
It is often the constraints of producing smooth, wide-angle directivity
/ frequency performance which restrict the choice of drivers in a monitor
system. The fewer constraints that there are on driver choice, the easier it is
to choose drivers for their sonic neutrality, low non-linear distortions,
achievable SPL and many other parameters that the
usual need for off-axis directivity control frequently does much to compromise.
Simpler monitor systems, capable of revealing fine detail and of working at
high SPLs, could become more reasonably priced, and
therefore spread the availability of more neutral monitoring conditions to a
greater proportion of the industry. The current state of control room
monitoring is frequently so "hit and miss" as to barely warrant the
use of the terms "control" and "monitoring". Furthermore,
an affordable means of achieving a more linear and consistent performance from
the middle order of recording studios would be likely to have more effect on
the recording industry's overall output than would be
achieved by seeking to refine, ever further, the upper echelon of elite
studios; though that work should continue for its own valid reasons. One of the
great benefits of Monitor-Dead / Non-Environment rooms is that the techniques
are not unduly expensive, and apply, with only minimal changes, to control
rooms of all sizes.
In the rooms being described here, phase responses become very
important, as the absence of reflexions in the
overall sound allows the detection of phase characteristics which even a single
lateral reflexion can render inaudible. Many of these
phase products, which are at the root of the harshness of many modern
recordings, often go unnoticed, and hence also go uncorrected when using low
resolution monitors in conventional rooms. With the absence of room
characteristics in the monitor chain, the use of high resolution, linear
monitor systems makes it much easier than is currently usual to achieve not
only the desired timbral balance of individual
instruments, but also the desired balance between
the instruments. It also makes evident any non-linear distortions and the
effect of any poor acoustics in the original recording spaces. The degree of
openness and spaciousness contained within the recording, such as
characteristics of transparency and depth, can also be more easily assessed.
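The effect of a single reflexion can be illustrated with a minimal model. It assumes one reflexion arriving τ seconds after the direct sound with a relative gain g; both values in the example are hypothetical. The combined response h(t) = δ(t) + g·δ(t - τ) has the magnitude |1 + g·e^(-j2πfτ)|, a comb filter with notches at odd multiples of 1/(2τ). This is one way of seeing how even one reflexion can overlay, and so obscure, finer phase detail in the signal. The function names are illustrative only.

```python
import math


def comb_magnitude_db(f_hz: float, delay_s: float, gain: float) -> float:
    """Level (dB) of direct sound plus one reflexion: |1 + g*exp(-j*2*pi*f*tau)|."""
    phase = 2.0 * math.pi * f_hz * delay_s
    real = 1.0 + gain * math.cos(phase)
    imag = -gain * math.sin(phase)
    return 20.0 * math.log10(math.hypot(real, imag))


def notch_frequencies(delay_s: float, f_max_hz: float) -> list:
    """Comb-filter minima, at odd multiples of 1 / (2 * tau), up to f_max_hz."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= f_max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches


# Hypothetical reflexion: 2 ms late, about 6 dB down (gain 0.5).
tau, g = 0.002, 0.5
print("Notches (Hz):", [round(f, 1) for f in notch_frequencies(tau, 2000.0)])
print("At first notch: %.2f dB" % comb_magnitude_db(250.0, tau, g))   # dip, about -6 dB
print("Between notches: %.2f dB" % comb_magnitude_db(500.0, tau, g))  # peak, about +3.5 dB
```

In this model a shorter delay pushes the first notch higher in frequency and spaces the notches more widely. A larger reflexion gain deepens them.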
Stereo Imaging Constraints
Let us now turn to the other major point which has been raised in
relation to these control rooms: their stereo imaging. Figure 7 shows a typical
stereo perception area from a pair of loudspeakers situated in two different
sized rooms. The area is a function of geometry, so its actual size is
determined by the distances between and from the loudspeakers, certainly up to
a point where the inter-channel time delays become so great as to make stereo
perception impossible. In large rooms with, say, 4 metres between the
loudspeakers (C and D), and 5 metres to the mixing console, the area available
for stereo perception is sufficiently large to cover the persons likely to be
working behind the central 3 metres or so of mixing console (positions A, G and
B). As the above dimensions decrease, so does the area of good stereo
localisation. In very small rooms (loudspeakers at E and F), and at close listening
distances, the area of true stereo perception is perhaps only large enough for
one or two people to appreciate comfortably (positions A and possibly G). However, this should perhaps not be seen as a
limitation of the room, but a clearer than normal demonstration of how
two-speaker stereo should behave.
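Because the stereo area is a function of geometry, the way the inter-channel time delay grows as a listener moves off the centre line can be sketched directly. The sketch assumes point-source loudspeakers and a speed of sound of 343 m/s. It also assumes that the listening distance is measured perpendicular to the loudspeaker baseline. The large-room figures follow the example above (4 metres spacing, 5 metres distance); the small-room dimensions are hypothetical. The sketch computes only the geometric delay and makes no claim about the delay at which stereo perception breaks down.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def inter_channel_delay_ms(spacing_m: float, distance_m: float, offset_m: float) -> float:
    """Arrival-time difference (ms) between the two loudspeakers.

    The loudspeakers sit at x = +/- spacing_m / 2 on a common baseline; the
    listener is distance_m in front of that baseline and offset_m to one side
    of the centre line. A positive result means the loudspeaker on the side of
    positive offset is heard first.
    """
    half = spacing_m / 2.0
    far = math.hypot(offset_m + half, distance_m)   # loudspeaker at x = -half
    near = math.hypot(offset_m - half, distance_m)  # loudspeaker at x = +half
    return 1000.0 * (far - near) / SPEED_OF_SOUND


rooms = {
    "large (4 m spacing, 5 m distance)": (4.0, 5.0),
    "small, hypothetical (2 m spacing, 2 m distance)": (2.0, 2.0),
}
for name, (spacing, distance) in rooms.items():
    delays = [inter_channel_delay_ms(spacing, distance, x) for x in (0.0, 0.25, 0.5, 1.0, 1.5)]
    print(name, ["%.2f ms" % d for d in delays])
```

For the same lateral movement, the delay builds up faster in the smaller geometry. This is consistent with the smaller area of true stereo perception described above.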
The Concept of Stereo as Currently Used
If we look back at the early history of stereo, there were two
significant attempts at the reproduction of a "solid" sound
("stereos" being the Greek word for solid): a wall of sound, in other
words. The first experiments relevant to the development of current
stereophonic sound recording and reproduction took place in the 1930s, by Snow,
Fletcher and Steinberg at Bell Laboratories in the USA, and by Alan Blumlein at what was to become EMI in the UK. The Bell
scientists worked towards the reproduction of the originally recorded wavefront, on a macro scale, in the listening area, by
using multiple spaced microphones and multiple loudspeakers. Blumlein, realising that a two channel system was all that
would be commercially practicable in the then foreseeable future, considered
the Bell proposals to be too much to ask of a domestically realisable system.
He therefore opted for the implementation of a system relying on a set of
psycho-acoustic criteria that could reproduce, in the area of a stereo seat, a realistic frontal sound stage using
only a two channel record / reproduce process.
The work at Bell Laboratories envisaged the likely use of at
least three loudspeakers for reproduction, which they quite rightly considered
superior "by eliminating the recession of the centre-stage position, and
in reducing the differences in localisation for various observing
positions". In the 1970s, '80s and '90s Michael Gerzon
[6] put forward many new proposals for the three-speaker reproduction
of stereo, some of which were fully compatible with two channel recording
systems. Although Gerzon's more advanced proposals
would cover a considerable listening area, the loudspeaker layout in the early
Bell Laboratories proposals was still quite narrow, subtending an angle of only
35° at the listening position, so it was aimed at reproduction in larger spaces,
such as cinemas, where the listeners could be at some distance from the loudspeakers.
At EMI, Blumlein's aim was
only to produce acoustical signals in a limited space around the head of one listener in a "stereo
seat". This was intended to form an accurate virtual image of the source,
by means of reproduction via two loudspeakers subtending an angle of 60° at the
listening position. Blumlein’s system constitutes the
basis of what is now the well established procedure known as Intensity Stereo,
which held that simple level differences at the loudspeakers would create both
the necessary level and phase differences at the ears of the listener to
produce a stereo image. This only occurs if each ear hears both loudspeakers,
which is one reason why the stereo perception via headphones of a
loudspeaker-derived mix can be so different, as no such inter-aural cross-talk
exists with headphones. Shufflers can go some way towards resolving this headphone
problem, but they can also introduce problems of their own, such as position
dependent frequency responses.
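The conversion of loudspeaker level differences into phase differences at the ears can be sketched with the classic low-frequency model of two-loudspeaker stereo. The model rests on strong assumptions:

- the head is ignored entirely;
- the ears are treated as two points 0.18 m apart (an assumed figure);
- the loudspeakers are distant enough to produce plane waves from ±30 degrees, which is the 60 degree layout described above.

Because each ear hears both loudspeakers, a pure level difference produces equal magnitudes but different phases at the two ears. That interaural phase difference corresponds to a phantom-image direction. All function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0           # m/s, assumed
EAR_SPACING = 0.18               # m, assumed; the head itself is not modelled
HALF_ANGLE = math.radians(30.0)  # loudspeakers at +/-30 degrees (60 degrees subtended)


def ear_signals(g_left: float, g_right: float, freq_hz: float):
    """Complex pressure at each ear (point receivers, plane waves, no head shadow)."""
    omega = 2.0 * math.pi * freq_hz
    # Half the inter-ear path difference for a source at HALF_ANGLE, as a phase.
    phi = omega * EAR_SPACING * math.sin(HALF_ANGLE) / (2.0 * SPEED_OF_SOUND)
    # The left loudspeaker reaches the left ear slightly early (phase lead) and
    # the right ear slightly late (phase lag); the right loudspeaker is the mirror image.
    left_ear = g_left * cmath.exp(1j * phi) + g_right * cmath.exp(-1j * phi)
    right_ear = g_left * cmath.exp(-1j * phi) + g_right * cmath.exp(1j * phi)
    return left_ear, right_ear


def interaural(g_left: float, g_right: float, freq_hz: float):
    """(level difference in dB, phase difference in radians), left ear minus right ear."""
    left, right = ear_signals(g_left, g_right, freq_hz)
    return 20.0 * math.log10(abs(left) / abs(right)), cmath.phase(left / right)


def implied_angle_deg(g_left: float, g_right: float, freq_hz: float) -> float:
    """Direction (positive = towards the left) of one real source giving the same phase difference."""
    _, phase = interaural(g_left, g_right, freq_hz)
    s = phase * SPEED_OF_SOUND / (2.0 * math.pi * freq_hz * EAR_SPACING)
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))


for g_right in (1.0, 0.5, 0.25, 0.0):
    level_db, _ = interaural(1.0, g_right, 100.0)
    print("g_R = %.2f: level difference %.3f dB, image %.1f deg"
          % (g_right, level_db, implied_angle_deg(1.0, g_right, 100.0)))
```

In this head-free model the interaural level difference is always zero. The shadowing of the head, which is not modelled here, is what adds interaural level differences at higher frequencies. Equal gains place the image straight ahead, and one loudspeaker alone places it at 30 degrees. At low frequencies, intermediate level differences give angles that approach those of Bauer's Law of Sines.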
It was, indeed, possible for Blumlein's system
to produce stable images between the loudspeakers by choosing suitable level
differences between the left and right loudspeakers. [We use left and right,
here, because the effect is an aspect of human perception: the image supporting
ability is not an inherent property of a pair of loudspeakers. The failure to
fully appreciate this was one of the reasons for the failure of the many
quadraphonic systems of the 1970s, where the assumption was often made,
wrongly, that panning between a front / back, single sided pair of loudspeakers
would produce an analogous effect, which it does not.] The Intensity Stereo
system is the one which the pan-pots of most mixing consoles employ, and which
must surely be used in over 99% of all current recording processes. It is the
implementation of Bauer's Stereophonic Law of Sines.
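In its usual low-frequency form, the Law of Sines applies to a centrally seated listener with loudspeakers at ±θ0. It predicts the phantom-image angle θI from the loudspeaker gains gL and gR:

sin θI / sin θ0 = (gL - gR) / (gL + gR)

The sketch below applies this law to a pan-pot. The sine/cosine constant-power pan law is an assumption chosen for illustration, giving about -3 dB per channel at centre; actual console pan laws vary. The 30 degree half-angle corresponds to the 60 degree layout described earlier, and the function names are hypothetical.

```python
import math


def constant_power_pan(position: float):
    """Gains for a pan-pot position in [-1, 1] (-1 = hard right, +1 = hard left).

    Assumed sine/cosine law: g_left**2 + g_right**2 == 1, about -3 dB each at centre.
    """
    angle = (position + 1.0) * math.pi / 4.0  # maps -1..1 onto 0..pi/2
    return math.sin(angle), math.cos(angle)


def law_of_sines_angle_deg(g_left: float, g_right: float, half_angle_deg: float = 30.0) -> float:
    """Predicted phantom-image angle (positive = towards the left loudspeaker)."""
    ratio = (g_left - g_right) / (g_left + g_right)
    return math.degrees(math.asin(ratio * math.sin(math.radians(half_angle_deg))))


for pos in (-1.0, -0.5, 0.0, 0.5, 1.0):
    g_l, g_r = constant_power_pan(pos)
    print("pan %+.1f -> gains (%.3f, %.3f), image %+.1f deg"
          % (pos, g_l, g_r, law_of_sines_angle_deg(g_l, g_r)))
```

Like the model it comes from, the prediction assumes a listener on the centre line. It says nothing about image positions for listeners seated off-axis.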
There is nothing limiting in the way that Non-Environment rooms present the
stereo images, as the images perform exactly as one would expect them to
perform, according to the way that the Intensity Stereo system was envisaged
and implemented. (Incidentally, the Intensity Stereo referred to here has
nothing to do with the psycho-acoustic theories claiming intensity differences
to be the key factor in localisation: here it merely relates to the level
differences at the loudspeakers.)
Much work has been done in control room design to try to expand the area
in which stable stereo imaging can be achieved, and the provision of certain
lateral reflexions can serve to reinforce stereo
localisation. Davis referred to "Haas Kickers" [7], which are strong reflexions appearing after a suitably reflexion-free
period, and which help to maintain imaging. However, many such means of
supporting a wider stereo listening area are not developments
of the concept of Intensity Stereo, but psycho-acoustic "tricks"
to help to extract more than the system is inherently capable of supporting. If
a property is not inherent in the recording, then perhaps the enhancement
techniques are best left for the listening rooms, and not the control rooms.
The problem in this is that the techniques tend to come at the price of
compromises that must be made in other areas of monitoring. This latter point
can be disturbing, as in the term "control room monitoring" the words
"monitoring" and "control" both imply some sort of
reference to a standard, which can hardly be the case if varying techniques are
used to support the insupportable. What is more, if the control and monitoring
are not defined at the recording stage, then what standards do the domestic
equipment manufacturers have to design their own products to comply with? In
the "Studio Monitor System" and the "Control Room", surely we
must aim at some sort of tighter reference if the present, unacceptably large
range of end-product frequency balances is to be brought to a more repeatable
equilibrium.
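Whether a lateral reflexion arrives early, or only after a reflexion-free period of the kind that Davis's "Haas Kickers" rely on, is largely a matter of geometry. Below is a minimal sketch using the image-source method for one flat side wall. The wall is replaced by a mirror-image loudspeaker, and the extra path length gives the delay relative to the direct sound. The sketch assumes a plan view only, a point source, a perfectly reflecting wall and a speed of sound of 343 m/s. The 1/r level is therefore an upper bound; absorption or loudspeaker directivity would lower it. The room coordinates in the example are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def side_wall_reflexion(source, listener, wall_x):
    """Delay (ms) and 1/r level (dB) of a first-order reflexion from the plane x = wall_x.

    Positions are (x, y) in metres, in plan view. The values are relative to
    the direct sound; the wall is assumed to be perfectly reflecting.
    """
    sx, sy = source
    lx, ly = listener
    image_x, image_y = 2.0 * wall_x - sx, sy  # mirror the source in the wall
    direct = math.hypot(lx - sx, ly - sy)
    reflected = math.hypot(lx - image_x, ly - image_y)
    delay_ms = 1000.0 * (reflected - direct) / SPEED_OF_SOUND
    level_db = 20.0 * math.log10(direct / reflected)
    return delay_ms, level_db


# Hypothetical layout: left loudspeaker at (-2, 0), listener on the centre line
# 3.5 m away, left side wall 1 m outside the loudspeaker.
delay, level = side_wall_reflexion((-2.0, 0.0), (0.0, 3.5), -3.0)
print("reflexion arrives %.2f ms late, %.1f dB relative to the direct sound" % (delay, level))
```

Moving the wall further out lengthens the reflexion-free period and weakens the reflexion. This is the geometric lever that lateral-reflexion strategies work with.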
Conflicts and Definitions
There are a number of factors in studio monitoring which directly
contradict domestic hi-fi requirements. Studio Monitors are usually required to
reveal flaws and problems in the sound. They have an analytical requirement
that is not normally necessary when listening to music solely for pleasure.
Control rooms are for quality control, as well as for assessment of
compatibility with the outside world. They are also, of course, creative
environments, and that is a further aspect which makes its own demands.
However, in almost all cases, the quality control function is degraded when
attempts are made to imitate arbitrary domestic conditions, or to artificially
support the stereo image stability over a wider area than was ever envisaged
when the concept was formulated. It would thus seem that the only way to
control the "encode" side of the recording process is in rooms which
simplify, to the greatest extent, the monitoring of the signal which is being
captured by the recording medium.
Once there is a more reliable definition of the encode side of the
system, the manufacturers of, and the listeners to, domestic hi-fi
equipment have a better reference from which to make their own choices and
decisions, to get their desired "best" out of the recording. The
wider the tolerances are at the encode side of the system, however, the
less consistent will be the ability of the reproduction systems to faithfully
decode what the artistes and producers intended the listeners to hear.
Arbitrarily designed control rooms do not aid the search for better standards
of reproduction, because they are dependent upon far too many variables.
Toole highlighted the above point very forcibly in section 2.4 of reference [1], from which we will quote:
"Reflections and Absorption of Sound - Effects in Time and Space. This is not a simple subject, because:
1) The sounds radiated from loudspeakers in different directions are not the same,
2) the frequency-dependent absorption properties of reflecting surfaces are not the same,
3) listeners respond differently to sounds of different frequency,
4) listeners respond differently to sounds of different temporal structure, eg impulsive or sustained,
5) listeners respond differently to sound arriving at different times relative to the direct sound,
6) listeners respond differently to sounds arriving from different directions,
7) listeners respond differently to sounds in the presence of reverberation,
8) listeners have many different perceptual responses, and
9) all of the preceding interact with each other and, to some extent, with the recording that is being auditioned."
That these interrelationships exist in domestic situations is incontrovertible,
but surely all efforts should be made to remove as many of them as possible
from the control rooms. The Non-Environment approach goes a long way towards
achieving the lowest realistic number of room related variables.
Most domestic listeners want to hear music in a way that is pleasing,
which is an absolutely valid requirement as they are seeking enjoyment, and
they are at liberty to manipulate the above variables to suit their own
requirements. However, what is pleasing should not be confused with what is on
the recording medium. Stereo spaciousness can be very pleasing, but its
presence in a domestic environment, or if created in a control room of any
given design, is by no means necessarily an inherent property of what is on the
recording. The use of early reflexions and
reverberation can increase the stereo listening area, enhance the stereo
listening pleasure, and extend it beyond the normal "stereo seat"
position [8], but such techniques often compromise the detection of fine detail
in low level signals, which, in a monitoring situation, risks allowing problems
to pass by unnoticed.
In most truly professional studios, control rooms have already tended
towards being less reflective than domestic listening rooms, undoubtedly
because of a number of the above mentioned reasons. Many professional recording
personnel also tend to prefer a more direct sound, even when listening for
pleasure, as reported by Flindell et al [9].
In their paper "Subjective Evaluations of Preferred Loudspeaker Directivity", they noted that when
their listening test results were separated into groups of naive and
professional listeners, the preferences of the two groups were very different.
A few of the professional listeners even preferred frequency contoured
reflected energy, which mimicked the conditions frequently encountered with
more directional loudspeakers in many control rooms. Many of the naive
listeners strongly favoured the spaciousness, and extra high frequencies in the
reflected sound, which were more typical of omni-directional (or
multi-directional) loudspeakers in conventional rooms. No doubt there is a
considerable degree of conditioning influencing the results for the
professional listeners: spending much time working in the conditions in which
they do, perhaps makes them more accustomed to hearing direct sounds. On the
other hand, it is equally possible that as they are accustomed to listening for
detail, such habits travel home with them.
The record and reproduce (studio and home) ends of the recording process
have always been making their different demands, and it does not logically
follow that the listening environment should be the same in both situations.
Again quoting from Toole's paper [1], "Strong
reflected or diffused sounds from behind can seriously impair the clarity of
the virtual sound images between the loudspeakers. Even at what appear to be
safe distances the same can be true if reflecting or diffusing surfaces are
large. A simple test is to reproduce monophonic pink noise at equal levels
through both loudspeakers. For a listener on the axis of symmetry, the result
should be a compact auditory image midway between the loudspeakers. Moving the
head slightly to the left and right should reveal a symmetrical brightening, as
the acoustical cross-talk interference is changed, and the stereo axis should
"lock in" with great precision. Start close to the loudspeaker and
then move further away. It would seem to be a fundamental (minimum?)
requirement that one should be able to find a stereo axis, and hear a clear
centre image, in any position where critical judgements are to be made.
If the new generation of cross-talk cancelling binaural and 3-D
simulation systems are to be truly successful, a "clean" acoustical
path to the ears may be an absolute necessity. If a listening room garbles the
cross-talk itself, it will most certainly garble the cancellation."
In a paper in the JAES [10] in 1986, Jack Wrightson wrote "The problem in the context of studio
monitoring is that, regardless of the conditions, the room-monitor loudspeaker
combination places its indelible imprint on all that transpires. For this
reason a control room should be neutral, it should add as few sonic
colourations as possible to the sound generated by the monitor loudspeakers. In
this context, poorly designed loudspeakers should exhibit their flaws; well
designed loudspeakers should demonstrate their assets. The aural purpose of a
control room is to provide the best possible free-air representation of the
signals carried by the studio's audio system."
Surely, the above conditions are best met by rooms of the
type being proposed here. In the Non-Environment type of room, the potential
for neutrality and room to room compatibility would seem to be considerably
greater than for any other concept of control room currently on offer. The
number of variables in Toole's list in the previous paragraphs is significantly
reduced, and the following advantages result:
1) Off-axis anomalies play little part in the proceedings.
2) Loudspeaker design is simplified.
3) As most reputable monitors have reasonably linear on-axis responses, the perceived difference when mixing with different monitors should be less than is all too often currently the case.
4) Reduced room decay time prevents the masking of low level detail, an important factor in the "quality control" process.
5) Reduced room decay minimises timbral colouration caused by the room.
6) Reduced room reflexions enable precise stereo imaging, albeit over an area which is a function of room size and monitoring geometry.
7) Reduced room reflexions allow the detection of unwanted phase anomalies which can result from the over use, or inappropriate use, of effects processors.
8) Minimising room effects allows the various persons in the room to perceive the same musical balance between the instruments.
9) Reduced room effect allows the clearer perception of the ambience of the recording spaces or the use of effects, and hence their appropriateness, or otherwise, to the recording.
10) Reduced room effect gives a greater possibility of working in other rooms of similar nature on a single recording project, even if the rooms are physically quite different, with a minimum of acclimatisation to the new location.
If the greatest price that must be paid for these advantages is a more
restricted stable stereo imaging area in the smaller rooms, then it would seem
to be a small price. When a mix is being built up, the desired timbre of an
instrument can sometimes need to be changed in order to avoid masking by other
instruments as they are introduced. Similarly, the optimum balance between the
individual instruments can change. The instrumental balance of a rhythm section
may need to be adjusted as other instruments, perhaps with similar tonal
content, are introduced into a mix, and have their effect on the perception of
some of the rhythm instruments. Just about the only thing which is usually
static during the build-up of a mix is the localisation of instruments in the stereo panorama. Even in the smallest rooms, where the
stereo imaging will be true over perhaps only the space of one seat, then that
seat is always available for occasional reference. Nothing about the imaging
will suddenly change due to the dynamics of the mixing process.
In very small rooms, however, one should also consider the fact that the
monitoring loudspeakers are often forced into positions where they cannot
possibly subtend an angle of 60° or less at the
monitoring position. This, in itself, will degrade the stereo imaging stability,
irrespective of the type of room in which they are
being used. To subtend an angle of less than 60° in a small room
would be likely to put any mixing personnel, other than the person on the
centre line, outside of the loudspeaker pair, and this situation would be less
desirable, overall, than the unstable imaging produced by the greater subtended
angle created by the wider spacing of the loudspeakers. Any comparisons of the
stereo imaging in rooms of different design concepts should always take into account
any differences in subtended loudspeaker angles, otherwise the comparisons would be
meaningless.
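The trade-off between loudspeaker spacing and subtended angle is simple trigonometry. The sketch assumes a listener on the centre line, at a perpendicular distance from the midpoint of the loudspeaker baseline, and it ignores acoustic-centre offsets and toe-in. The small-room dimensions in the example are hypothetical, as are the function names.

```python
import math


def subtended_angle_deg(spacing_m: float, distance_m: float) -> float:
    """Angle subtended at a centre-line listener by two loudspeakers spacing_m apart,
    distance_m in front of the midpoint of their baseline."""
    return math.degrees(2.0 * math.atan((spacing_m / 2.0) / distance_m))


def spacing_for_angle_m(angle_deg: float, distance_m: float) -> float:
    """Loudspeaker spacing needed to subtend angle_deg at distance_m."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)


# Hypothetical small room: the loudspeakers cannot be closer than 2.4 m apart,
# and the mixing position is 1.2 m in front of them.
print("subtended angle: %.1f deg" % subtended_angle_deg(2.4, 1.2))
print("spacing for 60 deg at 1.2 m: %.2f m" % spacing_for_angle_m(60.0, 1.2))
```

In that hypothetical room the loudspeakers subtend 90 degrees. Holding 60 degrees at the same distance would need them only about 1.39 m apart, which would leave anyone beside the centre seat outside the pair.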
Obviously, for the studios involved in the production of radio dramas,
or the like, where much more movement of the sound images is likely, the order
of monitoring priorities may be somewhat different. In those cases, the greater
use of dynamic panning, plus the possibility of having more people involved in
the mixing process, may lead to a requirement for a large listening area over
which the stereo sound-stage would be more stable. Perhaps this would take priority
over the need for more absolute knowledge of the timbre of the sounds; however,
the title of this document did state "...for Stereophonic Music Recording Studios".
Surely, if there is one thing that is less easy to monitor constantly, it is
better that it should be the one thing which is least likely to
vary. It should also be remembered that in large
rooms, the problem does not exist, and in the low decay time rooms, the true
imaging is better than in many other rooms, with all their attendant individual
characteristics. Non-Environment rooms show stereo as it is recorded. If stereo
is not enough, at least over two loudspeakers, then it is the format which
should be criticised, not the rooms which show its failings. Surround sound
systems are addressing this limitation to good effect.
Also noted in Toole's paper [1] were studies
by Kuhl and Plantz [11] and Kishinaga et al [12]. Kuhl and Plantz, using only
professional sound engineers as listeners, found that for dance and popular
music, plus voice and radio drama, the preferences were for monitoring what was
essentially the direct sound from the loudspeakers. On the other hand, at home,
the majority of these same listeners, if listening to symphonic music,
preferred a more reflective environment. Kishinaga et
al concluded from their investigations "that in designing a listening
room, optimum arrangement of absorbing and reflecting materials differs
depending on the purpose of listening". Recording / quality control and listening
for enjoyment are very different purposes. Toole went
on to say "some recordings are clearly better matched to certain styles of
reproduction than others. The situation [standardised listening conditions]
would appear to be far from resolved".
Indeed so, at the "decode" or reproduction end at least, where
tastes and preferences lead to different conditions for maximum enjoyment of
the music. However, if these same variables are allowed to affect the encode
process in the studio control room, then it only leads to chaos in trying to
decode-to-taste any set of non-standard encodings. Again quoting from Toole, "In studio monitoring the general rule is to
provide listeners with a sound-field that is predominantly direct. In these
conditions, the principal impression of direction, image size and space are
those that can be provided by the stereo signal itself".
Surely, this is all that we can aim for in the studios. If we
concentrate on what is on the tape, then the provision of a more consistently
monitored product will allow the record buying public to optimise their own
listening conditions to suit their own pockets and preferences. Trying to guess
what these conditions may be does nothing but harm to the encode process, and leads
to absurd magnifications of the problems at the decode end. This being the
case, the monitoring of the stereo in the Non-Environment rooms, without any
enhancement or embellishments for greater enjoyment, would seem to be ideally
suited to the production of recordings to a more consistent standard of
reference, which should in turn make life easier for mastering facilities and
the manufacturers of domestic equipment. Whatever that equipment may seek to
achieve, its design and production would be made much easier without the often
unintentional variability of the recorded material, affected as it is by the
vagaries of current control room monitoring.
A Parallel Issue
In 1986, Stanley Lipshitz published a paper
[13] on the subject of the spaciousness and airiness of recordings made
using spaced microphone techniques. The following quotations are
taken from that paper, and many parallels can be drawn between the lack of
detail and false spaciousness of spaced microphone techniques, and the loss of
detail perception associated with the false spaciousness which results from
anything less than direct monitoring.
On perceived spaciousness:
"I believe that spaced-microphone techniques are fundamentally
flawed, although highly regarded in some quarters, and that
coincident-microphone recordings are the correct way to go. The air and depth
so valued in spaced-microphone recordings are shown to be largely the artefacts
of phasiness due to the microphone spacing, and not
acoustic ambience at all.
"I shall try to make a strong case for the use of
single-point (i.e. coincident) stereophonic microphone techniques in preference
to widely spaced microphone configurations.
"I am aware that I am treading on
dangerous ground here, in that an aesthetic judgement is called for when
attempting to rate stereophonic recording as good or bad.
"Often it is the case that the more ethereal the
sound images appear, then the better the system is appreciated. Such systems
can be regarded, however, only as attempts at pseudo-stereophony.
"I consider such blurring to be a defect,
although I will admit that some people like soft-focus lenses. [In
photography.]"
On stereo reproduction:
"The problem of freeing the listener from the
stereo seat by enlarging the region within which the image remains reasonably
free from distortion, is in my view a reproduction related question,
rather than one bearing directly upon the recording technique.
"If more than two transmission channels are available, one can do
much better.
"For such reproduction systems (for example Ambisonics) an acoustically dead listening room would be
preferable. It is my belief that as more sophisticated reproduction systems
become available, the correct trend will be toward more anechoic listening
environments."
On the psycho-acoustics of stereo:
"Of primary concern is the fact that the ear on the side of the
earlier loudspeaker need not receive the louder signal, and indeed at low
frequencies does not! So the interaural level
differences produced at low frequencies do not always reinforce the imaging
produced by impulsive sounds. Sometimes, the low frequency image pulls in the
opposite direction from image of the transient, broadening and smearing the
overall image.
"So we must consider stereo hearing as distinct
from natural hearing, and actually quite unnatural - it is in fact an
artificial creation."
And, on the impact of modern recording technology:
"The last few years have seen a dramatic improvement in our ability
to accurately record, distribute, and reproduce musical signals, and the
benefits of this digital technology are now available to consumers in their
homes.
"What is on the master tapes is now laid bare
without the masking effects of the earlier technology, and what the consumer
can now hear is frequently unpleasant.
"I feel that the source material [not referring to electronic music
here] is now the weakest link in the chain from the
artist to the listener, and that improvement here requires an enlightened
reassessment of what goes on in the process of capturing the original sound and
reproducing through two loudspeakers."
All of the above quotations from Stanley Lipshitz
would seem to point to the need for detailed and direct monitoring as the only
means of hearing into what is really on the recording medium,
and to suggest that spaciousness should, as discussed elsewhere in this document, be an
aspect of the reproduction environment. For detailed monitoring, it
would appear that spaciousness and the resolution of fine detail are largely
mutually exclusive. It should be recognised, however, that the authors may
possess a sensory bias towards the more detailed types of monitoring, as they
admit to having a general dislike for soft focus photography: but also, it
would seem, does Stanley Lipshitz.
References
[1] Floyd E Toole "Loudspeakers and Rooms for Stereophonic Sound Reproduction", AES 8th International Conference, Washington DC, 1990
[2] P R Newell, K R Holland, T Hidley "Control Room Reverberation is Unwanted Noise", Proc Institute of Acoustics, Vol 16, part 4, pp 365-373, Reproduced Sound 10 Conference, Windermere, UK, 1994
[3] Philip Newell "The Non-Environment Control Room", Studio Sound, pp 22-29, November 1991
[4] Philip Newell "Studio Monitoring Design", Focal Press, 1995
[5] Eric Stark "The Hidley Infrasound Era", Studio Sound, pp 52-56, December 1995
[6] Michael Gerzon "Three Channels, The Future of Stereo?", Studio Sound, pp 112-121, June 1990
[7] D Davis, C Davis "Sound System Engineering", 2nd edition, Howard Sams, Indianapolis IN, USA, 1987
[8] D Moulton, M Ferralli, S Hebrock, M Pezzo "The Localization of Phantom Images in an Omni-Directional Stereophonic Loudspeaker System", AES 81st Convention, pre-print No 2371, 1986
[9] I H Flindell, A R McKenzie, H Negishi, M Jewitt, P Ward "Subjective Evaluations of Preferred Loudspeaker Directivity", AES 90th Convention, Paris, pre-print No 3076, page 6, 1991
[10] Jack Wrightson "Psychoacoustic Considerations in the Design of Studio Control Rooms", JAES, Vol 34, No 10, pp 789-795, 1986
[11] W Kuhl, R Plantz "The Significance of the Diffuse Sound Radiated from Loudspeakers for the Subjective Hearing Event", Acustica, Vol 40, pp 182-190, July 1978
[12] S Kishinaga, Y Shimizu, S Ando, K Yomaguchi "On the Acoustic Design of Listening Rooms", 64th Convention of the Audio Engineering Society, pre-print No 1524, Nov 1979
[13] Stanley P Lipshitz "Stereo Microphone Techniques ... Are the Purists Wrong?", Journal of the Audio Engineering Society, Vol 34, No 9, pp 716-735, September 1986

Figure 5 Step Response of Monitor Loudspeaker at 2m in Small Non-Environment Room
Step Response of Monitor Loudspeaker at 2m in Large Non-Environment Room
