A Proposal for a More Perceptually Uniform Control Room for Stereophonic Music Recording Studios

 

Philip R Newell, Moaña, Spain

Keith R Holland

ISVR, University of Southampton, UK

 

Improvements in domestic sound reproduction equipment have begun to highlight the inconsistencies of studio monitoring conditions. This paper discusses the evidence that many of the roots of monitoring inconsistency lie not only in some erroneous and outdated beliefs relating to reverberation time, but also in attempts to extract more from stereophony than it can simultaneously supply. Some performance details are presented for some rooms built using the concepts proposed herein.

 

 Introduction

Judging by the wildly different tonal balances on CD recordings for sale to the public, the state of room to room compatibility of the listening conditions in recording studio control rooms still appears to fall far short of what a professional industry should by now be achieving. Digital recording has brought its share of problems, but one problem which we must credit it with removing is the variability inherent within the recording media. We may still have differences in the sonic performance of A to D and D to A converters, but the consistency of the digits which leave the recording studios and arrive in people's homes is now guaranteed. The vagaries of analogue transfer to, and recovery from, vinyl disc or magnetic tape have been consigned to the past, so the excuses for much of the previous variability can no longer be used in defence of the current situation. Some variability or inconsistency in the spectral balance of recordings is no doubt within a range of artistic interpretations by recording staff; it can be dependent upon what they feel is appropriate for any given track, but careful listening to much of the available recorded material will soon reveal that most of the variability is not intentional. The source of the problem certainly lies, to a large degree, in the variability of monitoring conditions in the control rooms of the recording studios in which the recordings are mixed.

 

There is a general over-reliance on inexpensive, close-field monitoring loudspeakers, and one of the reasons for the existence of this state of affairs is the lack of faith in the widely disparate range of combinations of large studio monitors and different control room philosophies. As the differing points of view as to what is "right" have continued to exist amongst many experienced designers of both the loudspeakers and the rooms, it is little wonder that the recording staff have continued to show uncertainty about what they feel comfortable with. They have, in many cases, opted for highly personalised solutions which work for them, individually, on the most usual types of music that they record.

 

Studio designers are all aware of the problems inherent in the use of small, close-field monitors, such as the inability to produce the lower frequencies at appropriate levels, or the inconsistency in room positioning, and hence the uncertainty about how the room modes will, or will not, be driven. There are also the attendant problems of reflexions from the mixing console and other related equipment. What is more, it is true that except for a very few types of commonly used small monitors, their ability to resolve fine detail is sadly lacking. As a result of this, noises, bad edits, the operation of gates, the clashing of phase distorted effects artefacts (which often lead to undue harshness in the sound) and a host of other problems, all too frequently add themselves to the irregularities of spectral balance which occur due to entire octaves of the musical spectrum being left unmonitored.

 

 Sources of Uncertainty

We cannot, and nor should we, be too dictatorial about all that occurs in an artistic industry, for there needs to be a range of concepts of what is right in order to accommodate the individualities of different producers, musicians, and listeners alike. All should be free to make their decisions for themselves, but those decisions can only be valid as long as they are aware of the decisions that they are making, and that they are not being misled into them by ill-conceived control room monitoring conditions. In many cases, the designs have been based on a long held belief that a control room should possess a reverberation time (RT) which approximates to an average of domestic listening conditions. RT is, of course, not an accurate concept for use in most small rooms, and certainly is not applicable to highly absorbent rooms, but nonetheless, RT60, T60 or whatever other decay description has been used has frequently had some domestic point of reference.

 

In turn, many other design principles have been based on aspects of stereo perception in the rooms in which the commercially available recordings will eventually be played. This shows admirable concern for the record buying public, who keep the music industry alive, but it is also apparent that much of the good intent has been misguided. In trying to take into account the extraordinary number of factors that are involved in the domestic listening process, and, especially in the light of the fact that some requirements for the optimal reproduction of different types of music and recording techniques are mutually exclusive, the efforts have frequently not achieved their goals. By trying to take into account so many of the variables in the reproduce end of the chain, the production end has itself suffered a lack of certainty that has, unfortunately, served to introduce even more uncertainty into the reproduce end of the chain.

 

At the 8th International Conference of the AES, in 1990 [1], Floyd Toole presented a paper, "Loudspeakers and Rooms for Stereophonic Reproduction". The abstract began thus, "Stereophonic reproduction attempts to reconstruct, in the minds of listeners, replicas of the timbral and spacial effects of acoustical events that have occurred at earlier times and other places. It matters not whether the 'live' event consisted of musicians in a natural acoustical environment, or a multi-track creation monitored in a control room. In all cases, musicians and production personnel presumably heard a stereophonic reproduction that met their artistic and technical expectations. Assuming that the necessary information has been preserved in the recording, a replication can be successful only to the extent that the loudspeakers are capable of reproducing the appropriate sounds, and that the listening rooms are capable of conveying those sounds to the ears of listeners. Variations in loudspeakers and rooms create many difficulties in achieving this goal. Although it has been traditional to consider the loudspeakers and room as separate entities, this approach is no longer justified. The loudspeakers, room and listener comprise a system within which the sounds and spacial illusions of stereo are decoded, and they must be considered together."

 

The above first part of Toole's abstract is a lucid, concise, and powerful summing-up of a complex situation in real life. The last sentence of the above quotation is very significant: "The loudspeakers, room and listener [to the end product, e.g. a commercial CD] comprise a system within which the sounds and spacial illusions of stereo are decoded...". Decoded! If they are being decoded, then in the production process they must have been, in some way, considered to have been encoded. No encode / decode process can be expected to work optimally unless the decoder can track the encode process, and in order to do this, the encode process must be known, but in reality, the monitoring (encode) conditions during mixing are rarely known to the CD listener.

 

Recordings are not sold in the shops with instructions about the type of loudspeaker on which, and the size, shape, or other properties of the room in which, they should be auditioned. There is also absolutely nothing that can be expected from the foreseeable future that will be likely to reduce the range of domestic listening equipment and conditions. Furthermore, music is not the most important thing in the lives of the majority of people, so it will continue to be normal for people to buy houses not for the acoustics of their rooms, but for a multitude of other priorities, and then find appropriate rooms in which to listen to the music of their choice. Different loudspeakers will suit different rooms, different types of music, different recording techniques and media, different budgets, and different personal tastes as to precisely what people like to hear in order to achieve the most enjoyment that they can from the music of their choice during its reproduction. There is, therefore, adequate justification in having available a good choice of reproduction equipment.

 

Taking many of these variables into account, there will be the audiophiles who choose their systems and listening conditions with great care, and who may well enjoy optimising them for their favourite types of music, recording techniques and storage media. The most appropriate loudspeakers and listening conditions for rock music recorded on an analogue system, will, in all probability, be different from those most suited to the enjoyment of middle / side, stereo microphone technique recordings of digitally recorded classical music. However, even within the latter, quite highly defined set of recording conditions, there will be a wide range of available and appropriate reproduction systems and environments, and all will sound different.

 

There is an argument for the case that recordings of each of the above two, very different, music recording styles should be made in control rooms optimised for their own specific characteristics. However, the variation within the sub-groups is so great that to standardise on one arbitrarily chosen type of loudspeaker, at a given distance and at a given level, will only transfer many of the vagaries of the reproduction environment into the production end of the chain. This is because the decision made as to which equipment and condition to use for the production process may be largely based on the results on similar equipment in the reproduction end of the chain, and this could lead to very volatile standards as fashions change.

What this document proposes is an abandonment of the attempts in the control rooms to try to accommodate too many of the variables in the reproduction systems, and to concentrate on a stripped down version of the more fundamental aspects of the production and quality control needs. This will allow recordings to be monitored in more detail, with more consistency; and, with the knowledge and skill of the recording staff, to make it easier to predict what types of reproduction conditions would be best suited to optimally decode what the production staff were hoping that the final listeners would hear.

 

 Removing a Variable

In 1994, a paper was presented to the UK Institute of Acoustics [2], entitled "Control Room Reverberation is Unwanted Noise". The paper put forward the concept of the "Non-Environment / Monitor Dead" rooms, which sought to provide monitoring conditions as close as could be achieved to free-field conditions. These rooms can reduce the decay time of reflexions and modal energy to such low levels that the perception of many recording defects becomes much easier. The paper also contained a discussion of the majority of the other widely used control room acoustic control philosophies. It noted the fact that, due to the sensitivity of the human hearing system, most attempts at producing optimised decay conditions for music monitoring had yielded control rooms which sounded subjectively very different, and tended to lead to different musical conclusions when mixing the same piece of music in different rooms.

 

Figure 1 Plan of Non-Environment Control Room. Shaded Areas are Wide Band Absorber Systems

 

Figure 2 Side Elevation of Non-Environment Control Room, Showing (a) Horizontal Rear Absorbers and (b) Vertical Rear Absorbers

 

Figures 1 and 2 show the general concept of the "Non-Environment" or "Monitor Dead" approach [3,4]. It can be seen that the side walls, the rear wall, and the ceiling are made as acoustically dead as possible to as low a frequency as possible. The front wall is hard, dense and reflective, and the floor is also hard. These two surfaces, together with the hard surfaces of any equipment which may be facing the listener, provide all the acoustic life, for sounds produced within the room, to alleviate any sense of being in an anechoic chamber. The loudspeakers are mounted flush in the solid front wall, so are not actually in the room, but form a part of one of its perimeter surfaces. The front wall provides a large baffle against which the loudspeakers can push, thus aiding the efficiency and linearity of low frequency radiation. The flush mounting also removes any irregularities caused by cabinet edge diffractions, or by path length anomalies of waves which may seek to travel behind a cabinet mounted within a room, and which return to the room from a front wall with an irregular phase relationship.

Except for the floor, and any equipment placed within the room, the monitors face something approximating to an anechoic chamber. The acoustic conditions provided by the room are thus dependent upon whether a sound was produced within the room, or from one of its boundaries. In the two cases, the overall decay characteristics of the room would be very different. From the monitoring direction, the reflexion problems from recording equipment can be dealt with by angling the equipment such that reflexions pass away from the listener and into an absorbent surface. If this cannot be done directly, then the offending surface can be protected, either by an absorbent shield, or by a streamlining device that will deflect the incident waves around or away from the object, and prevent them from, in particular, returning to the front, reflective wall and thence back to the listener. The aim is to monitor the output from the loudspeakers, and nothing more.

 

By means of these techniques, rooms can be built which, by virtue of their relative absence of monitoring acoustics, can achieve very high degrees of room to room compatibility. The studio designer Tom Hidley has been pursuing techniques of controlling the room modes down to frequencies as low as 10 Hz for his Hidley Infrasound rooms [5], but some of the processes involved in achieving this very low frequency absorption lend themselves to the control of the more "audible" low frequencies in much smaller rooms. This would seem to be important, because so many of the control rooms currently in use around the world are in the 25-35 m² region, and it has typically been this range of room size which has suffered so badly from inter-room incompatibility. Whilst the small Non-Environment rooms, of different shapes and sizes, have different ambient characteristics for general speech and noises produced within the rooms (due to the different natures of the reflective materials and the different reflexion times in different sizes of rooms), they all have remarkably common monitoring characteristics, which are essentially those of the loudspeakers, modified by whatever small ambient aberrations remain.
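As a rough illustration of why rooms of this size are so troubled by low frequency modes, the sketch below uses the standard rigid-walled rectangular room approximation, in which the axial mode frequencies along a dimension L are n·c / (2L). The room dimensions (6 m x 5 m x 3 m, giving a 30 m² floor) and the speed of sound are assumptions chosen for illustration only; they do not describe any of the rooms measured in this paper, and real absorbent rooms will not behave as ideal rigid boxes.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def axial_mode_hz(length_m: float, order: int = 1) -> float:
    """Frequency of the axial mode of the given order along one room dimension
    (rigid-walled rectangular room approximation)."""
    return order * SPEED_OF_SOUND / (2.0 * length_m)


def length_for_mode_hz(freq_hz: float) -> float:
    """Room dimension whose first axial mode falls at freq_hz."""
    return SPEED_OF_SOUND / (2.0 * freq_hz)


# Hypothetical small control room: 6 m x 5 m floor (30 m^2), 3 m high.
for name, dim in (("length", 6.0), ("width", 5.0), ("height", 3.0)):
    modes = [round(axial_mode_hz(dim, n), 1) for n in (1, 2, 3)]
    print(f"{name} {dim} m: first three axial modes {modes} Hz")

# Dimension needed for a first axial mode at 10 Hz.
print(f"10 Hz needs about {length_for_mode_hz(10.0):.2f} m")
```

Under these assumptions the lowest axial modes of such a room lie well inside the musical bass range, which is consistent with the need to control the more audible low frequencies in the smaller rooms.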

 

Figure 3 shows the frequency response function of one monitor loudspeaker at a distance of 2 m in a relatively small Non-Environment room, and figure 4 shows a similar measurement taken of a similar loudspeaker in a large Non-Environment room. The two plots are remarkably similar considering the different room sizes (the dip at about 1 kHz in figure 4 is due to a crossover discrepancy that was subsequently resolved). Figures 5 and 6 show the step responses of the monitors in the small and large rooms respectively.

In brief, the rooms are made highly absorbent at the mid and high frequencies by the use of conventional fibrous absorbent materials. Low-mid absorption is provided by acoustic labyrinths through which the waves are guided, diffused and diffracted, before being finally forced to drag their way through large, absorbent-lined ducts. The lowest frequencies are addressed by means of large panel absorbers, air damped, constrained layer membrane absorbers, and dead sheet membrane absorbers, which effectively line the room with a heavy, acoustically-dead, semi-limp bag. The overall control is provided by the whole system of absorption, but for the purposes of this discussion, the concept can be likened to an anechoic chamber, with one wall replaced by a hard wall in which the loudspeakers are mounted, and a hard floor. The floor may or may not have openings at the front and rear of the room for the utilisation of under floor absorption. This concept will suffice for the remainder of this discussion.

 

Irrespective of size or shape, such a termination will be highly uniform. The low frequencies may vary slightly with room size and dimensions, but with suitably effective absorption, they are likely to do so to a considerably smaller degree than with most other, current control room designs. What is more, the response perturbations caused by the rooms are likely to be at lower levels, relative to the direct sound, than is the case with most other rooms. This was one of the main benefits being proposed in the IOA paper [2]. By the reduction of room artefacts, the lower level details in the recorded music can be more readily perceived, and any unwanted aspects of the recordings, such as the audible operation of gates, can be dealt with before they become embarrassingly evident to the record-buying public.

 

Limitations, Real and Imaginary

Over the years, a number of criticisms have circulated about the room concept being discussed here. Some of these comments have had substantial grounds to support them, but others have been based on misconceived theories. Examples of the latter type are the claims that the lack of modal support will produce rooms which are subjectively lacking in bass, and that an over-dead monitoring acoustic will lead to the excessive use of reverberation when mixing. The lack of modal support would only produce bass-light mixes if the decay time at the middle and high frequencies remained typical of more conventional control rooms. This was the case with some of the control rooms of the 1970s and early 80s, where the excessive use of bass traps was incorporated into rooms which still possessed significant decay times at higher frequencies. The Non-Environment rooms, however, are all-trapped, not just bass-trapped, and, low as it is, the LF decay is still predominant in the time / frequency response. A person who is used to working in more lively rooms may initially be unaccustomed to the low decay time, but it is usually rapidly adjusted to, and the clarity and impact which the low frequencies possess is a revelation. If it is all considered to be too dry, then that is what is on the tape, in which case either the mixes can be given reverberation, according to taste, or the recording acoustics or microphone location can perhaps be changed.

 

This brings us nicely to the second of the misguided criticisms, that a low decay time room may lead to the excessive use of reverberation or effects in the mixing process. The fact is that reverberation added, even in a totally dead room, is unlikely to become excessive when played in a more reverberant space, as the differences in the decay times of any reasonable control rooms are subtle, by comparison to the lengths and quantities of reverberation effects that are usually applied to recordings or mixes. One thing which is often noticed, though, is just how clearly the reverberation tails can be heard in very low decay time rooms, and it is just as well to monitor these carefully, as synthetic reverberation can produce some undesirable decay tail artefacts, which all too frequently go unnoticed until it is too late! In low decay time rooms, the sound of the rooms in which the microphones were placed, or the different effects processors which had been used on a recording, become clearly recognisable, to a degree which is normally only detected on headphones. What is more, every different conventional control room will produce different perceived ratios of transient and quasi-steady-state sounds, certainly beyond any critical distance. The transients fall off at 6 dB per doubling of distance from the near field of the source, but the quasi-steady-state signals may be supported by the modal decay characteristics of the rooms. This is a fact of life in all but the most acoustically dead monitoring conditions, so it suggests, once again, that this is another area where a dead monitoring acoustic will be the only one where any general consistency of quality control could be achieved.
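The 6 dB per doubling of distance quoted above is the familiar free-field inverse square law for the direct sound. A minimal sketch, assuming an idealised point source with no room support (the function name and the distances are illustrative only), is:

```python
import math


def level_change_db(r_ref: float, r: float) -> float:
    """Change in direct-sound level, in dB, when moving from distance r_ref
    to distance r from an idealised point source in free-field conditions."""
    return -20.0 * math.log10(r / r_ref)


# Direct-sound level relative to its value at 1 m.
for r in (1.0, 2.0, 4.0, 8.0):
    print(f"{r:4.1f} m: {level_change_db(1.0, r):7.2f} dB")
```

Each doubling of distance lowers the direct sound by about 6 dB, whereas any quasi-steady-state level supported by the room does not follow this law, so the perceived ratio between the two depends on both the room and the listening distance.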

 

 Spacial Anomalies

 

Three of the more substantial criticisms of the low decay time monitoring conditions are that they lack a sense of spaciousness; they are not representative of normal listening conditions; and that, in the smaller rooms of this type, they fail to support an adequately wide area of stereo imaging. The last of the three points will be dealt with, in some detail, a little later, so let us first consider the question of spaciousness. An accurate rendering of spaciousness can only be achieved by multiple, lateral reflexions, arriving from the directions, and with time delays, that are appropriate to the performance space, whether that space was real, or imaginary. A less accurate sense of spaciousness, which is perhaps a more realistic goal, can still only be achieved by reflexions coming from a direction other than that of the stereo loudspeakers. It is not inherent in a conventional stereo recording, and will be dependent in its nature upon the reproduction acoustics. It can therefore never be truly representatively monitored at the time of mixing. Surround sound helps us to tackle this problem somewhat more reasonably.

 

Spaciousness, and the perception of detail, tend to be mutually exclusive, whether it is in the performing space, the microphone technique or the reproduction chain. Orchestral conductors hear more detail from their rostra than the audiences hear from the seats in the auditoria. The conductors need to hear the detail to be able to do their job, but most audiences like to hear the all-enveloping sound from the auditoria, because it pleases them. Distant, stereo microphone arrangements, such as spaced omnis, produce a greater richness of sound than close, multi-microphone techniques, but the latter can produce more fine detail, and perhaps, more dynamic impact. The choice of which technique to use will be a creative decision by the people responsible for the recording. In a room for critical monitoring, however, where the same compromises exist, it would seem that experienced personnel could far more realistically achieve their aims in rooms in which they could hear the fine detail, and then interpret how things would sound in a more reflective space, rather than in rooms in which they could hear a spacious sound, but could only guess at what problems may lurk in the low level detail, masked by the spaciousness artefacts of the room. In any case, it would not be too difficult a task to introduce suitable reflectors into a relatively acoustically dead room for a final and more spacious auditioning of the end result; once, that is, any problems in the finer details had been monitored, and resolved.

 

The criticism about not being representative of domestic listening conditions would appear to be irrelevant. To date, all too many rooms which do attempt such domestic commonality fail to produce the intended compatibility in the end result. Averages in themselves need not be representative. The average of the integers from 1 to 10 is 5.5, which does not represent, even within 20%, more than 2 of the 10 integers. The majority of the integers would not be closely represented by the average. The worldwide range of domestic listening conditions is far too wide for any "average" control room to represent. Motor cars and headphones, which now form a large part of the international listening environment, are also not represented by any average room.
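The arithmetic behind the integer example can be checked with a few lines of code. The sketch below takes "within 20%" to mean within 20% of the average itself; that interpretation is an assumption, although measuring the 20% against each integer instead gives the same two values.

```python
values = list(range(1, 11))        # the integers 1 to 10
mean = sum(values) / len(values)   # 5.5
tolerance = 0.20                   # "within 20%"

# Integers lying within 20% of the average (4.4 to 6.6).
close_to_mean = [v for v in values if abs(v - mean) <= tolerance * mean]

# Alternative reading: the average lies within 20% of the integer itself.
mean_close_to_value = [v for v in values if abs(v - mean) <= tolerance * v]

print(mean, close_to_mean, mean_close_to_value)
```

Either way, only 5 and 6 are represented by the average; the other eight integers are not.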

 

In fact, none of the normal arguments for control room specifications have much relevance for cars, headphones, or a wide range of domestic loudspeaker listening. What this seems to suggest is that we ought to know more about what is actually on the storage medium. This needs to be known, and recorded in a more predictable manner, in order for the disparate reproduction systems to be able to make more reliable attempts to decode the intentions of the recording personnel more consistently and appropriately, according to the reproduction requirements and conditions. The effect of possible reproduction environments must be deduced from the audiological and psycho-acoustic cues in the recording, and how they will relate to the various listening conditions. In other words, the recordings should allow the maximum to be reliably extracted from them, without bias to any particular set of reproduction conditions, unless, that is, the recordings are being made for some highly specific purpose, such as television commercials.

 

 Solutions

By taking the control room acoustics out of the recording chain, the emphasis of the burden of monitoring linearity shifts on to the loudspeakers. As loudspeaker performance has been converging faster than room performance, this simplifies the task of producing more compatible control room monitoring. Furthermore, much of the effort in loudspeaker design research has been involved with the amelioration of the problems caused by a typical loudspeaker / room interface. Loudspeakers designed for monitoring in Non-Environment rooms can concentrate on the optimisation of axial impulse response performance, with less emphasis needing to be placed on the directivity problems well off-axis.

 

It is often the constraints of producing smooth, wide-angle directivity / frequency performance which restrict the choice of drivers in a monitor system. The fewer restraints that there are on driver choice, the easier it is to choose drivers for their sonic neutrality, low non-linear distortions, achievable SPL and many other parameters that the usual need for off-axis directivity control frequently does much to compromise. Simpler monitor systems, of excellent ability to reveal fine detail and work at high SPLs, could become more reasonably priced, and therefore spread the availability of more neutral monitoring conditions to a greater proportion of the industry. The current state of control room monitoring is frequently so "hit and miss" as to barely warrant the use of the terms "control" and "monitoring". Furthermore, an affordable means of achieving a more linear and consistent performance from the middle order of recording studios would be likely to have more effect on the recording industry's overall output than would be achieved by seeking to refine, ever further, the upper echelon of elite studios; though that work should continue for its own valid reasons. One of the great benefits of Monitor-Dead / Non-Environment rooms is that the techniques are not unduly expensive, and apply, with only minimal changes, to control rooms of all sizes.

 

In the rooms being described here, phase responses become very important, as the absence of reflexions in the overall sound allows the detection of phase characteristics which even a single lateral reflexion can render inaudible. Many of these phase products, which are at the root of the harshness of many modern recordings, often go unnoticed, and hence also go uncorrected when using low resolution monitors in conventional rooms. With the absence of room characteristics in the monitor chain, the use of high resolution, linear monitor systems makes it much easier than is currently usual to achieve not only the desired timbral balance of individual instruments, but also the desired balance between the instruments. It also makes evident any non-linear distortions and the effect of any poor acoustics in the original recording spaces. The degree of openness and spaciousness contained within the recording, such as characteristics of transparency and depth, can also be more easily assessed.

 

 Stereo Imaging Constraints

Let us now turn to the other major point which has been raised in relation to these control rooms: their stereo imaging. Figure 7 shows a typical stereo perception area from a pair of loudspeakers situated in two different sized rooms. The area is a function of geometry, so its actual size is determined by the distances between and from the loudspeakers, certainly up to a point where the inter-channel time delays become so great as to make stereo perception impossible. In large rooms with, say, 4 metres between the loudspeakers (C and D), and 5 metres to the mixing console, the area available for stereo perception is sufficiently large to cover the persons likely to be working behind the central 3 metres or so of mixing console (positions A, G and B). As the above dimensions decrease, so does the area of good stereo localisation. In very small rooms (loudspeakers at E and F), and at close listening distances, the area of true stereo perception is perhaps only large enough for one or two people to appreciate comfortably (positions A and possibly G). However, this should perhaps not be seen as a limitation of the room, but a clearer than normal demonstration of how two-speaker stereo should behave.
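The dependence of the stereo area on geometry can be illustrated with a rough calculation. The following is a minimal sketch, not taken from the original work: it assumes free-field propagation (no room reflexions), inverse-square level loss, and a speed of sound of 343 m/s. The function name, the 0.5 m listener offset and the small-room dimensions are hypothetical values chosen only for illustration; the large-room dimensions are those quoted above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def arrival_differences(spacing_m, distance_m, offset_m):
    """Return (delay_ms, level_db) between the far and near loudspeaker.

    Loudspeakers sit at x = -spacing/2 and x = +spacing/2 on the line y = 0.
    The listener sits at (offset_m, distance_m). Free-field propagation with
    inverse-square level loss is assumed, i.e. no room reflexions.
    """
    half = spacing_m / 2.0
    d_left = math.hypot(offset_m + half, distance_m)
    d_right = math.hypot(offset_m - half, distance_m)
    near, far = min(d_left, d_right), max(d_left, d_right)
    delay_ms = (far - near) / SPEED_OF_SOUND * 1000.0
    level_db = 20.0 * math.log10(far / near)
    return delay_ms, level_db


if __name__ == "__main__":
    # Large room from the text: 4 m between loudspeakers, 5 m to the console.
    # Small room: 1.5 m spacing at 1.3 m is a hypothetical close-field layout.
    for label, spacing, distance in (("large", 4.0, 5.0), ("small", 1.5, 1.3)):
        delay, level = arrival_differences(spacing, distance, offset_m=0.5)
        print(f"{label}: {delay:.2f} ms, {level:.2f} dB")
```

For the same sideways movement of the listener, the smaller layout produces both a larger inter-channel delay and a larger level imbalance, which is consistent with the shrinking of the stereo area as the dimensions decrease.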

 

 The Concept of Stereo as Currently Used

If we look back at the early history of stereo, there were two significant attempts at the reproduction of a "solid" sound; "stereos" being the Greek word for solid: a wall of sound, in other words. The first experiments relevant to the development of current stereophonic sound recording and reproduction took place in the 1930s, by Snow, Fletcher and Steinberg at Bell Laboratories in the USA, and by Alan Blumlein at what was to become EMI in the UK. The Bell scientists worked towards the reproduction of the originally recorded wavefront, on a macro scale, in the listening area, by using multiple spaced microphones and multiple loudspeakers. Blumlein, realising that a two channel system was all that would be commercially practicable in the then foreseeable future, considered the Bell proposals to be too much to ask of a domestically realisable system. He therefore opted for the implementation of a system relying on a set of psycho-acoustic criteria that could reproduce, in the area of a stereo seat, a realistic frontal sound stage using only a two channel record / reproduce process.

 

The work at Bell Laboratories envisaged the likelihood of the use of at least three loudspeakers for reproduction, which they quite rightly considered superior "by eliminating the recession of the centre-stage position, and in reducing the differences in localisation for various observing positions". In the 1970s, '80s and '90s Michael Gerzon [6] put forward much work on new proposals for the three-speaker reproduction of stereo, some of which were totally compatible with two channel recording systems. Although these more advanced proposals of Gerzon's would cover a considerable listening area, the early proposals of Bell Laboratories were still quite narrow, subtending an angle of only 35° at the listening position, so they were aimed at reproduction in larger spaces, such as in cinemas, where the listeners could be at some distance from the loudspeakers.

 

At EMI, Blumlein's aim was only to produce acoustical signals in a limited space around the head of one listener in a "stereo seat". This was intended to form an accurate virtual image of the source, by means of reproduction via two loudspeakers subtending an angle of 60° at the listening position. Blumlein's system constitutes the basis of what is now the well established procedure known as Intensity Stereo, which held that simple level differences at the loudspeakers would create both the necessary level and phase differences at the ears of the listener to produce a stereo image. This only occurs if each ear hears both loudspeakers, which is one reason why the stereo perception via headphones of a loudspeaker-derived mix can be so different, as no such inter-aural cross-talk exists with headphones. Shufflers can go some way towards resolving this headphone problem, but they can also introduce problems of their own, such as position dependent frequency responses.

 

It was, indeed, possible for Blumlein's system to produce stable images between the loudspeakers by choosing suitable level differences between the left and right loudspeakers. [We use left and right, here, because the effect is an aspect of human perception: the image supporting ability is not an inherent property of a pair of loudspeakers. The failure to fully appreciate this was one of the reasons for the failure of the many quadraphonic systems of the 1970s, where the assumption was often made, wrongly, that panning between a front / back, single sided pair of loudspeakers would produce an analogous effect, which it does not.] The Intensity Stereo system is the one which the pan-pots of most mixing consoles employ, and which must surely be used in over 99% of all current recording processes. It is the implementation of Bauer's Stereophonic Law of Sines. There is nothing limiting in the way that Non-Environment rooms present the stereo images, as the images perform exactly as one would expect them to perform, according to the way that the Intensity Stereo system was envisaged and implemented. (Incidentally, the Intensity Stereo referred to here has nothing to do with the psycho-acoustic theories claiming intensity differences to be the key factor in localisation: here it merely relates to the level differences at the loudspeakers.)
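The law of sines is not written out in this paper. In the form in which it is commonly quoted, the sine of the image angle divided by the sine of the loudspeaker half-angle equals (L - R) / (L + R), where L and R are the linear amplitude gains of the two channels. The sketch below evaluates that relationship under stated assumptions: a listener on the centre line facing forwards, in-phase signals, and a 30° half-angle corresponding to the 60° layout mentioned above. The function and parameter names are hypothetical.

```python
import math


def image_angle_deg(gain_left, gain_right, half_base_angle_deg=30.0):
    """Predicted phantom image angle from the stereophonic law of sines.

    Positive angles lie towards the left loudspeaker. Gains are linear
    amplitude gains (not dB), must be non-negative, and must not both be zero.
    """
    if gain_left < 0 or gain_right < 0:
        raise ValueError("gains must be non-negative (in-phase signals)")
    total = gain_left + gain_right
    if total == 0:
        raise ValueError("at least one gain must be non-zero")
    ratio = (gain_left - gain_right) / total
    sin_image = ratio * math.sin(math.radians(half_base_angle_deg))
    return math.degrees(math.asin(sin_image))


if __name__ == "__main__":
    # Equal gains give a central image; a single channel gives an image
    # at that loudspeaker; intermediate ratios fall in between.
    for left, right in ((1.0, 1.0), (1.0, 0.5), (1.0, 0.0)):
        print(left, right, round(image_angle_deg(left, right), 2))
```

The calculation depends only on the level ratio at the loudspeakers, which is the sense in which the term Intensity Stereo is used here.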

 

Much work has been done in control room design to try to expand the area in which stable stereo imaging can be achieved, and the provision of certain lateral reflexions can serve to reinforce stereo localisation. Davis referred to "Haas Kickers" [7], which are strong reflexions appearing after a suitably reflexion free period, and which help to maintain imaging. However, in many such ways, the means of supporting a wider stereo listening area are not the development of the concept of Intensity Stereo, but are psycho-acoustic "tricks" to help to extract more than the system inherently is capable of supporting. If a property is not inherent in the recording, then perhaps the enhancement techniques are best left for the listening rooms, and not the control rooms. The problem is that the techniques tend to come at the price of compromises that must be made in other areas of monitoring. This latter point can be disturbing, as in the term "control room monitoring" the words "monitoring" and "control" both imply some sort of reference to a standard, which can hardly be the case if varying techniques are used to support the insupportable. What is more, if the control and monitoring are not defined at the recording stage, then what standards do the domestic equipment manufacturers have to design their own products to comply with? In the "Studio Monitor System" and "Control Room" surely we must aim at some sort of tighter reference if the present unacceptably large range of end-product frequency balances is to be brought to a more repeatable equilibrium.

 

 Conflicts and Definitions

There are a number of factors in studio monitoring which directly contradict domestic hi-fi requirements. Studio Monitors are usually desired to show up flaws and problems in the sound. They have an analytical requirement that is not normally necessary when listening to music solely for pleasure. Control rooms are for quality control, as well as for assessment of compatibility with the outside world. They are also, of course, creative environments, and that is a further aspect which makes its own demands. However, in almost all cases, the quality control function is degraded when attempts are made to imitate arbitrary domestic conditions, or to artificially support the stereo image stability over a wider area than was ever envisaged when the concept was formulated. It would thus seem that the only way to control the "encode" side of the recording process is in rooms which simplify, to the greatest extent, the monitoring of the signal which is being captured by the recording medium.

 

Once there is a more reliable definition of the encode side of the system, then it gives the manufacturer of, and the listeners to, domestic hi-fi equipment a better reference from which to make their own choices and decisions, to get their desired "best" out of the recording. The wider the tolerances are at the encode side of the system, however, then the less consistent will be the ability of the reproduction systems to faithfully decode what the artistes and producers intended the listeners to hear. Arbitrarily designed control rooms do not aid the search for better standards of reproduction, because they are dependent upon far too many variables.

Toole highlighted the above point very forcibly in section 2.4 of reference [1], from which we will quote:

"Reflections and Absorption of Sound - Effects in Time and Space. This is not a simple subject, because:

 

1) The sounds radiated from loudspeakers in different directions are not the same,

2) the frequency-dependent absorption properties of reflecting surfaces are not the same,

3) listeners respond differently to sounds of different frequency,

4) listeners respond differently to sounds of different temporal structure, eg impulsive or sustained,

5) listeners respond differently to sound arriving at different times relative to the direct sound,

6) listeners respond differently to sounds arriving from different directions,

7) listeners respond differently to sounds in the presence of reverberation,

8) listeners have many different perceptual responses, and

9) all of the preceding interact with each other and, to some extent, with the recording that is being auditioned."

That these interrelationships exist in domestic situations is incontrovertible, but surely, all efforts should be made to remove as many of them as possible from the control rooms. The Non-Environment approach goes a long way towards achieving the lowest realistic number of room related variables.

 

Most domestic listeners want to hear music in a way that is pleasing, which is an absolutely valid requirement as they are seeking enjoyment, and they are at liberty to manipulate the above variables to suit their own requirements. However, what is pleasing should not be confused with what is on the recording medium. Stereo spaciousness can be very pleasing, but its presence in a domestic environment, or if created in a control room of any given design, is by no means necessarily an inherent property of what is on the recording. The use of early reflexions and reverberation can increase the stereo listening area, enhance the stereo listening pleasure, and extend it beyond the normal "stereo seat" position [8], but such techniques often compromise the detection of fine detail in low level signals, which, in a monitoring situation, risks allowing problems to pass by unnoticed.

In most truly professional studios, control rooms have already tended towards being less reflective than domestic listening rooms, undoubtedly because of a number of the above mentioned reasons. Many professional recording personnel also tend to prefer a more direct sound, even when listening for pleasure, as reported by Flindell et al [9].

 

 In the paper "Subjective Evaluations of Preferred Loudspeaker Directivity" they noticed that when their listening test results were separated into groups of naive and professional listeners, the preferences of the two groups were very different. A few of the professional listeners even preferred frequency contoured reflected energy, which mimicked the conditions frequently encountered with more directional loudspeakers in many control rooms. Many of the naive listeners strongly favoured the spaciousness, and extra high frequencies in the reflected sound, which were more typical of omni-directional (or multi-directional) loudspeakers in conventional rooms. No doubt there is a considerable degree of conditioning influencing the results for the professional listeners: spending much time working in the conditions in which they do perhaps makes them more accustomed to hearing direct sounds. On the other hand, it is equally possible that as they are accustomed to listening for detail, such habits travel home with them.

 

The record and reproduce (studio and home) ends of the recording process have always been making their different demands, and it does not logically follow that the listening environment should be the same in both situations. Again quoting from Toole's paper [1], "Strong reflected or diffused sounds from behind can seriously impair the clarity of the virtual sound images between the loudspeakers. Even at what appear to be safe distances the same can be true if reflecting or diffusing surfaces are large. A simple test is to reproduce monophonic pink noise at equal levels through both loudspeakers. For a listener on the axis of symmetry, the result should be a compact auditory image midway between the loudspeakers. Moving the head slightly to the left and right should reveal a symmetrical brightening, as the acoustical cross-talk interference is changed, and the stereo axis should "lock in" with great precision. Start close to the loudspeaker and then move further away. It would seem to be a fundamental (minimum?) requirement that one should be able to find a stereo axis, and hear a clear centre image, in any position where critical judgements are to be made.

If the new generation of cross-talk cancelling binaural and 3-D simulation systems are to be truly successful, a "clean" acoustical path to the ears may be an absolute necessity. If a listening room garbles the cross-talk itself, it will most certainly garble the cancellation."

 

In a paper in the JAES [10] in 1986, Jack Wrightson wrote "The problem in the context of studio monitoring is that, regardless of the conditions, the room-monitor loudspeaker combination places its indelible imprint on all that transpires. For this reason a control room should be neutral, it should add as few sonic colourations as possible to the sound generated by the monitor loudspeakers. In this context, poorly designed loudspeakers should exhibit their flaws; well designed loudspeakers should demonstrate their assets. The aural purpose of a control room is to provide the best possible free-air representation of the signals carried by the studio's audio system."

Surely, the above conditions are most ideally met by the rooms of the type being proposed here. In the Non-Environment type of room, the potential for neutrality and room to room compatibility would seem to be considerably greater than in any other concept of control room currently on offer. The number of variables in Toole's list in the previous paragraphs is significantly reduced:

1) Off-axis anomalies play little part in the proceedings.

2) Loudspeaker design is simplified.

3) As most reputable monitors have reasonably linear on-axis responses, the perceived difference when mixing with different monitors should be less than is all too often currently the case.

4) Reduced room decay time prevents the masking of low level detail, an important factor in the "quality control" process.

5) Reduced room decay minimises timbral colouration caused by the room.

6) Reduced room reflexions enable precise stereo imaging, albeit over an area which is a function of room size and monitoring geometry.

7) Reduced room reflexions allow the detection of unwanted phase anomalies which can result from the over use, or inappropriate use, of effects processors.

8) Minimising room effects allows the various persons in the room to perceive the same musical balance between the instruments.

9) Reduced room effect allows the clearer perception of the ambience of the recording spaces or the use of effects, and hence their appropriateness, or otherwise, to the recording.

10) Reduced room effect gives a greater possibility of working in other rooms of similar nature on a single recording project, even if the rooms are physically quite different, with a minimum of acclimatisation to the new location.

 

If the greatest price that must be paid for these advantages is a more restricted stable stereo imaging area in the smaller rooms, then it would seem to be a small price. When a mix is being built up, the desired timbre of an instrument can sometimes need to be changed in order to avoid masking by other instruments as they are introduced. Similarly, the optimum balance between the individual instruments can change. The instrumental balance of a rhythm section may need to be adjusted as other instruments, perhaps with similar tonal content, are introduced into a mix, and have their effect on the perception of some of the rhythm instruments. Just about the only thing which is usually static during the build-up of a mix is the localisation of instruments in the stereo panorama. Even in the smallest rooms, where the stereo imaging will be true over perhaps only the space of one seat, that seat is always available for occasional reference. Nothing about the imaging will suddenly change due to the dynamics of the mixing process.

In very small rooms, however, one should also consider the fact that the monitoring loudspeakers are often forced into positions where they cannot possibly subtend an angle of 60°, or less, at the monitoring position. This, in itself, will degrade the stereo imaging stability, irrespective of the type of room in which they are being used. To subtend an angle of less than 60° in a small room would be likely to put any mixing personnel, other than the person on the centre line, outside of the loudspeaker pair, and this situation would be less desirable, overall, than the unstable imaging produced by the greater subtended angle created by the wider spacing of the loudspeakers. Any comparisons of the stereo imaging in rooms of different design concepts should always take into account any differences in subtended loudspeaker angles, or the comparisons would be irrelevant.
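When comparing rooms, the subtended angle can be checked from two measurements: the spacing between the loudspeakers and the perpendicular distance from the listener to the line joining them. The following sketch assumes a listener on the centre line; the function names are hypothetical and the relationship is simple plane trigonometry rather than anything specific to this paper.

```python
import math


def subtended_angle_deg(spacing_m, distance_m):
    """Angle subtended by a loudspeaker pair at a centre-line listener.

    distance_m is measured perpendicular to the line joining the loudspeakers.
    """
    return math.degrees(2.0 * math.atan((spacing_m / 2.0) / distance_m))


def distance_for_angle_m(spacing_m, angle_deg):
    """Perpendicular listening distance that gives the requested angle."""
    return (spacing_m / 2.0) / math.tan(math.radians(angle_deg) / 2.0)


if __name__ == "__main__":
    # 4 m spacing and 5 m distance are the large-room figures quoted
    # in the section on stereo imaging constraints.
    print(round(subtended_angle_deg(4.0, 5.0), 1))
    # Distance needed for 60 degrees with a hypothetical 1.5 m spacing.
    print(round(distance_for_angle_m(1.5, 60.0), 2))
```

For a fixed listening distance, any increase in loudspeaker spacing increases the subtended angle, so two rooms should only be compared for imaging once this angle has been calculated for each.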

Obviously, for the studios involved in the production of radio dramas, or the like, where much more movement of the sound images is likely, the order of monitoring priorities may be somewhat different. In those cases, the greater use of dynamic panning, plus the possibility of having more people involved in the mixing process, may lead to a requirement for a large listening area over which the stereo sound-stage is more stable. Perhaps this would take priority over the need for more absolute knowledge of the timbre of the sounds; however, the title of this document did state "... for Stereophonic Music Recording Studios".

Surely, it is better that if there is one thing likely to be less easy to constantly monitor, then it should be the one thing which is least likely to vary. It should also be remembered that in large rooms, the problem does not exist, and in the low decay time rooms, the true imaging is better than in many other rooms, with all their attendant individual characteristics. Non-Environment rooms show stereo as it is recorded. If stereo is not enough, at least over two loudspeakers, then it is the format which should be criticised, not the rooms which show its failings. Surround sound systems are addressing this limitation to good effect.

Also noted in Toole's paper [1] were studies by Kuhl and Plantz [11] and Kishinaga et al [12]. Kuhl and Plantz, using only professional sound engineers as listeners, found that for dance and popular music, plus voice and radio drama, the preferences were for monitoring what was essentially the direct sound from the loudspeakers. On the other hand, at home, the majority of these same listeners, if listening to symphonic music, preferred a more reflective environment. Kishinaga et al concluded from their investigations "that in designing a listening room, optimum arrangement of absorbing and reflecting materials differs depending on the purpose of listening". Recording / quality control and listening for enjoyment are very different purposes. Toole went on to say "some recordings are clearly better matched to certain styles of reproduction than others. The situation [standardised listening conditions] would appear to be far from resolved".

Indeed so, at the "decode" or reproduction end at least, where tastes and preferences lead to different conditions for maximum enjoyment of the music. However, if these same variables are allowed to affect the encode process in the studio control room, then it only leads to chaos in trying to decode-to-taste any set of non-standard encodings. Again quoting from Toole, "In studio monitoring the general rule is to provide listeners with a sound-field that is predominantly direct. In these conditions, the principal impression of direction, image size and space are those that can be provided by the stereo signal itself".

Surely, this is all that we can aim for in the studios. If we concentrate on what is on the tape, then the provision of a more consistently monitored product will allow the record buying public to optimise their own listening conditions to suit their own pockets and preferences. Trying to guess what these conditions may be does nothing but harm to the encode process, and leads to absurd magnifications of the problems at the decode end. This being the case, the monitoring of the stereo in the Non-Environment rooms, without any enhancement or embellishments for greater enjoyment, would seem to be ideally suited to the production of recordings to a more consistent standard of reference, which should in turn make life easier for mastering facilities and the manufacturers of domestic equipment. Whatever that equipment may seek to achieve, its design and production would be made much easier without the often unintentional variability of the recorded material, affected as it is by the vagaries of current control room monitoring.

 

 A Parallel Issue

In 1986, Stanley Lipshitz published a paper [13] on the subject of the spaciousness and airiness of different techniques of recording using spaced microphone techniques. The following quotations are taken from that paper, and many parallels can be drawn between the lack of detail and false spaciousness of spaced microphone techniques, and the loss of detail perception associated with the false spaciousness which results from anything less than direct monitoring.

On perceived spaciousness:

"I believe that spaced-microphone techniques are fundamentally flawed, although highly regarded in some quarters, and that coincident-microphone recordings are the correct way to go. The air and depth so valued in spaced-microphone recordings are shown to be largely the artefacts of phasiness due to the microphone spacing, and not acoustic ambience at all.

"I shall try to make a strong case for the use of single-point (i.e. coincident) stereophonic microphone techniques in preference to widely spaced microphone configurations.

"I am aware that I am treading on dangerous ground here, in that an aesthetic judgement is called for when attempting to rate stereophonic recording as good or bad.

"Often it is the case that the more ethereal the sound images appear, then the better the system is appreciated. Such systems can be regarded, however, only as attempts at pseudo-stereophony.

"I consider such blurring to be a defect, although I will admit that some people like soft-focus lenses. [In photography.]"

On stereo reproduction:

"The problem of freeing the listener from the stereo seat by enlarging the region within which the image remains reasonably free from distortion, is in my view a reproduction related question rather than one bearing directly upon the recording technique.

"If more than two transmission channels are available, one can do much better.

"For such reproduction systems (for example Ambisonics) an acoustically dead listening room would be preferable. It is my belief that as more sophisticated reproduction systems become available, the correct trend will be toward more anechoic listening environments."

On the psycho-acoustics of stereo:

"Of primary concern is the fact that the ear on the side of the earlier loudspeaker need not receive the louder signal, and indeed at low frequencies does not! So the interaural level differences produced at low frequencies do not always reinforce the imaging produced by impulsive sounds. Sometimes, the low frequency image pulls in the opposite direction from image of the transient, broadening and smearing the overall image.

"So we must consider stereo hearing as distinct from natural hearing, and actually quite unnatural - it is in fact an artificial creation."

And, on the impact of modern recording technology:

"The last few years have seen a dramatic improvement in our ability to accurately record, distribute, and reproduce musical signals, and the benefits of this digital technology are now available to consumers in their homes.

"What is on the master tapes is now laid bare without the masking effects of the earlier technology, and what the consumer can now hear is frequently unpleasant.

"I feel that the source material [not referring to electronic music here] is now the weakest link in the chain from the artist to the listener, and that improvement here requires an enlightened reassessment of what goes on in the process of capturing the original sound and reproducing through two loudspeakers."

All of the above quotations from Stanley Lipshitz would seem to point to the need for detailed and direct monitoring as the only means of hearing into what is really on the recording medium, and that spaciousness should, as discussed elsewhere in this document, be an aspect of the final reproduction environment. For detailed monitoring, however, it would appear that spaciousness and the resolution of fine detail are largely mutually exclusive. It should be recognised that the authors may possess a sensory bias towards the more detailed types of monitoring, as they admit to having a general dislike for soft focus photography; but so, it would seem, does Stanley Lipshitz.

 

 References

[1] Floyd E Toole "Loudspeakers and Rooms for Stereophonic Sound Reproduction", AES 8th International Conference, Washington DC, 1990

[2] P R Newell, K R Holland, T Hidley "Control Room Reverberation is Unwanted Noise", Proc Institute of Acoustics, Vol 16, part 4, pp 365-373, Reproduced Sound 10 Conference, Windermere, UK, 1994

[3] Philip Newell "The Non-Environment Control Room", Studio Sound, pp 22-29, November 1991

[4] Philip Newell "Studio Monitoring Design", Focal Press, 1995

[5] Eric Stark "The Hidley Infrasound Era", Studio Sound, pp 52-56, December 1995

[6] Michael Gerzon "Three Channels, The Future of Stereo?", Studio Sound, pp 112-121, June 1990

[7] D Davis, C Davis "Sound System Engineering", 2nd edition, Howard Sams, Indianapolis IN, USA, 1987

[8] D Moulton, M Ferralli, S Hebrock, M Pezzo "The Localization of Phantom Images in an Omni-Directional Stereophonic Loudspeaker System", AES 81st Convention, pre-print No 2371, 1986

[9] I H Flindell, A R McKenzie, H Negishi, M Jewitt, P Ward "Subjective Evaluations of Preferred Loudspeaker Directivity", AES 90th Convention, pre-print No 3076, page 6, Paris 1991

[10] Jack Wrightson "Psychoacoustic Consideration in the Design of Studio Control Rooms", JAES, Vol 34, No 10, pp 789-795, 1986

[11] W Kuhl, R Plantz "The Significance of the Diffuse Sound Radiated from Loudspeakers for the Subjective Hearing Event", Acustica, Vol 40, pp 182-190, July 1978

[12] S Kishinaga, Y Shimizu, S Ando, K Yomaguchi "On the Acoustic Design of Listening Rooms", presented at the 64th Convention of the Audio Engineering Society, pre-print No 1524, Nov 1979

[13] Stanley P Lipshitz "Stereo Microphone Techniques ... Are the Purists Wrong?", JAES, Vol 34, No 9, pp 716-735, September 1986

 

 

Figure 5 Step Response of Monitor Loudspeaker at 2m in Small Non-Environment Room

Figure 6 Step Response of Monitor Loudspeaker at 2m in Large Non-Environment Room

 

 

 

 
