Paul White goes beyond the final frontier with Roland's latest 3D sound processor.
If you read our QSound interview in SOS November 1995, you will know that there's currently a great deal of interest in methods of producing a 3‑dimensional soundspace from a conventional 2‑speaker stereo replay system. The idea is intriguing because, at first thought, it seems impossible to create a sound that you perceive as being behind you when it is actually coming from speakers positioned in front of you. Yet in real life, we hear everything through just one pair of ears, and we can still pinpoint the position of sounds all around us, both in front and behind.
The way that we perceive sound direction is central to understanding how to fake it electronically. Where a mixer's simple pan pot affects only the left/right level, our hearing relies on a number of different auditory cues, involving level, frequency content and phase, to deduce the direction from which a sound is arriving. If a sound comes from our left, it reaches the left ear first and arrives at the right ear a fraction of a millisecond later. And because the head sits between the sound source and the right ear, the sound reaching that ear is lower in level, and is also spectrally altered by the shadowing effect of the head, which acts as a high‑cut filter. The shape of the outer ear is also involved, but as far as the eardrum/brain system is concerned, everything is deduced from the relative phase, level and frequency spectrum of the signals arriving at the two eardrums.
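To put rough numbers on the timing and tonal cues just described, here is a minimal Python sketch. It uses Woodworth's spherical-head approximation for the interaural time difference, with an assumed head radius of 8.75cm and a speed of sound of 343m/s, and it stands in for head shadowing with a simple one-pole high-cut filter. The function names, the far-ear level drop and the cutoff curve are illustrative assumptions, not Roland's (or anyone's) measured head data.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C
HEAD_RADIUS = 0.0875    # m, a typical textbook value (assumed)


def itd_seconds(azimuth_deg):
    """Woodworth's spherical-head estimate of the interaural time difference.

    azimuth_deg runs from 0 (straight ahead) to 90 (fully to one side).
    """
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """A gentle high-cut filter, standing in for the shadowing effect of the head."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in signal:
        y += a * (x - y)
        out.append(y)
    return out


def render_source_on_left(mono, azimuth_deg, sample_rate=44100):
    """Return (left, right) ear signals for a mono source on the listener's left.

    The near (left) ear gets the dry signal; the far (right) ear gets a copy that
    is delayed by the ITD, quieter and high-cut. The level and cutoff curves are
    illustrative guesses, not measured head-related data.
    """
    s = math.sin(math.radians(azimuth_deg))
    delay = round(itd_seconds(azimuth_deg) * sample_rate)  # whole samples
    far_gain = 1.0 - 0.5 * s        # about -6dB fully to the side (assumed)
    cutoff = 20000.0 - 17000.0 * s  # 20kHz ahead, 3kHz fully to the side (assumed)
    shadowed = one_pole_lowpass(mono, cutoff, sample_rate)
    right = [0.0] * delay + [far_gain * v for v in shadowed]
    left = list(mono) + [0.0] * delay  # pad so both channels are the same length
    return left, right
```

With these assumptions, a source fully to one side reaches the far ear about 0.66ms late, which at 44.1kHz works out to a delay of 29 samples.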
Following on from this, it seems reasonable that you could stick two mics inside the ears of a dummy head, make a recording, and then play the sound back through headphones to recreate the original 3D sound experience. In fact, this form of binaural recording does work pretty well, but when played back over loudspeakers, the results are not so good. The reason is that with headphones, there is no crosstalk between the two ears, but with loudspeakers, both ears hear both speakers — in other words, there's a great deal of crosstalk.
Roland's approach to achieving 3D over loudspeakers, RSS (Roland Sound Space), is to use phase, delay and filtering to process a mono sound into a binaural format, then generate additional out‑of‑phase signals to cancel out the crosstalk that normally occurs. By manipulating the relative left/right phase, filtering and levels in real time, the sound source can be made to appear to move, and in good listening conditions, it is possible to create the effect of a sound moving around the listener. Vertical sound movement is also possible by modelling the tonal changes that occur when a sound is moved further away from an acoustically reflective floor. The processing works in a broadly similar way to QSound, the main difference being that QSound is based on empirical experiments involving panels of listeners, rather than on predicting the effects entirely by mathematics.
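The crosstalk-cancellation idea can be illustrated at a single frequency with a 2x2 matrix. This sketch assumes a grossly simplified, symmetrical model in which each ear hears its own speaker at unit gain and the opposite speaker at a lower gain and a short extra delay (both values are made up for illustration). Inverting that matrix gives the pre-processing to apply before the speakers, and its negated cross terms are the out-of-phase cancellation signals. It is not Roland's algorithm, which isn't documented here.

```python
import cmath


def speakers_to_ears(freq_hz, cross_gain=0.7, cross_delay_s=0.00025):
    """Acoustic paths at one frequency, in a deliberately crude symmetrical model:
    each ear hears its own speaker at unit gain, and the opposite speaker
    cross_gain times as loud and cross_delay_s later (both values assumed)."""
    c = cross_gain * cmath.exp(-2j * cmath.pi * freq_hz * cross_delay_s)
    return [[1, c], [c, 1]]


def crosstalk_canceller(freq_hz, cross_gain=0.7, cross_delay_s=0.00025):
    """Inverse of speakers_to_ears: the pre-processing applied before the speakers."""
    c = speakers_to_ears(freq_hz, cross_gain, cross_delay_s)[0][1]
    det = 1 - c * c  # never zero while cross_gain < 1
    # The negated off-diagonal terms are the out-of-phase cancellation signals.
    return [[1 / det, -c / det], [-c / det, 1 / det]]


def matmul(a, b):
    """2x2 complex matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

Passing a binaural pair through `crosstalk_canceller` and then through `speakers_to_ears` returns the original pair, because the product of the two matrices is the identity. Since the cancelling terms depend on the cross-path delay, which changes as the listener moves, the sketch also hints at why the effect is so dependent on seating position.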
The problem with all 3D sound systems is that they have poor mono compatibility, and as the Roland system includes a large amount of relative phase shifting, summing the sound to mono produces tonal changes as the sound moves. I can't see any way around this — basically, real life isn't mono‑compatible, and as long as we're tied to mono compatibility issues for the benefit of steam radio and mono TV listeners, 3D sound, and even conventional stereo, can never be properly explored. In practical terms though, the need for mono compatibility restricts the use of systems such as RSS to non‑vital parts of an audio mix, such as effects, additional percussion, glittery synth pads and so on. Drums, bass, lead instruments and vocals tend to be left unprocessed. Even so, by treating just a few select sounds, a mix can be widened to beyond the loudspeakers, and can be given greater depth.
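The tonal change on summing to mono is essentially comb filtering. As a minimal sketch, assume the left and right channels carry the same signal with a relative delay tau (real RSS output is more complex than a pure delay). The mono sum then has a gain of 2|cos(pi f tau)|, with complete nulls at odd multiples of 1/(2 tau). As a sound moves, tau changes and the nulls sweep up and down the spectrum, which is why the mono result can resemble flanging.

```python
import math


def mono_sum_gain(freq_hz, tau_s):
    """Gain of L+R when R is a copy of L delayed by tau_s:
    |1 + e^(-j*2*pi*f*tau)| = 2*|cos(pi*f*tau)|."""
    return abs(2.0 * math.cos(math.pi * freq_hz * tau_s))


def notch_frequencies(tau_s, max_hz):
    """Frequencies up to max_hz that cancel completely in the mono sum."""
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * tau_s)
        if f > max_hz:
            return notches
        notches.append(f)
        k += 1
```

For a 0.5ms offset, the first nulls fall at 1kHz, 3kHz and 5kHz; halve the delay and the whole pattern moves up an octave.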
The original RSS system was horrendously expensive, and the only spin‑offs, until recently, have been the RSS algorithms used in some Roland effects units, but now the RSS10 makes the full process available to studio users for a much more realistic price. In addition to processing sounds for loudspeaker reproduction, the RSS10 can also output signals in binaural format for headphone use, and I imagine this will be useful for developers of portable computer games designed to be used with headphones rather than loudspeakers.
Whereas the original RSS system provided four channels of dynamic processing, the RSS10 provides single‑channel operation where the sound source is required to move, or dual‑channel operation where the sound sources are to be fixed in one position. The unit may be controlled directly from MIDI using an optional hardware interface, such as Roland's own MCR8, and up to 16 RSS10s may be linked together to operate as a single system. Unfortunately, no in‑depth MIDI spec is provided, so if you want to use another interface, you'll have to hassle Roland for the SysEx details. Alternatively, the unit can be controlled from either a Mac or a PC, and software for both platforms is included in the basic package.
On the hardware front, the unit is a simple 1U box with the bare minimum of controls. Concentric knobs set the input gain for the two channels, and metering is provided for both inputs and outputs — a good move as the output pan and level values will obviously differ from those of the input when a sound is being moved. A handful of buttons select the various operational modes and can initiate a demo sequence to get some idea of what RSS is all about. There is also a button to set the device ID, which is necessary when two or more RSS10s are being used in the same system. A large perspex window includes the metering, a simple numeric display window, and illuminated Function Mode and Output Mode indicators.
The system has three basic function modes: Stationary, Flying and Transaural. Stationary mode, as its name suggests, is where you take an input signal and then process it to make it appear as though it's coming from a particular point in space. Each RSS10 can process two channels in this mode, so with the maximum 16 units, you can have 32‑channel operation for around the cost of a mid‑range Mercedes.
Flying mode is rather more fun, as it allows you to vary the 3D position of the sound during the course of a mix, but as this takes more processing power, a single RSS10 can only handle one channel at a time. What's more, you can only link up to four units to work together in this mode via the supplied software, which gives you a system roughly equivalent to the original RSS system for less than one third of the price.
Transaural mode isn't an effect as such, but rather a means of processing binaural recordings made using dummy head techniques so that they can be reproduced over loudspeakers. I would imagine that this mode simply generates the inter‑channel crosstalk cancellation signals needed to make binaural recordings work on speakers.
This brings us neatly onto the three output modes: Speaker, Headphone and Binaural. Speaker mode is used for normal loudspeaker playback, while Headphone mode appears to be an enhanced version of Binaural mode. Binaural mode outputs conventional binaural signals, which is useful for those wishing to produce versions for both loudspeaker and headphone use: if you record the output in Binaural format, it can be reprocessed later via the RSS10 to provide either Speaker or Headphone formats, but if you record it in Speaker format only, you have no way of going back to create a Headphone version.
On the back of the box are audio ins and outs in both balanced XLR and quarter‑inch jack formats, though the XLRs are wired to the pin 3 Hot standard, rather than the more usual pin 2 Hot. This is no problem if you're working balanced both in and out, but if you mix balanced and unbalanced wiring, you could end up with unexpected phase inversions, unless you make up special cables. A mini multipin connector is used to connect to a computer serial port for computer control, and a slide switch selects Mac or PC operation, though you have to find your own serial cable. MIDI In, Out and Thru sockets are also provided for external MIDI control and for outputting MIDI notes.
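The polarity trap is easy to reason about if each connection is treated as either preserving polarity (+1) or inverting it (-1), with the overall polarity of the signal path being the product. In this minimal sketch the labels are arbitrary strings naming which conductor each end treats as hot, and the assumption that the jack connections use tip as hot at both ends is mine rather than something stated in the RSS10's documentation.

```python
from math import prod


def link_polarity(sender_hot, receiver_hot):
    """+1 if both ends agree on which conductor is hot, -1 if they disagree."""
    return 1 if sender_hot == receiver_hot else -1


def chain_polarity(links):
    """Overall polarity of a signal path: the product of its per-link polarities."""
    return prod(link_polarity(s, r) for s, r in links)


# Desk (pin 2 hot) -> RSS10 XLR in (pin 3 hot), RSS10 XLR out -> desk
balanced_both_ways = [("pin 2", "pin 3"), ("pin 3", "pin 2")]
# Unbalanced jack in (assumed tip hot at both ends), XLR out to a pin-2-hot desk
mixed = [("tip", "tip"), ("pin 3", "pin 2")]
```

In the fully balanced case the two inversions cancel (`chain_polarity` returns 1), which is why working balanced in and out causes no problem; mixing a jack input with an XLR output leaves a single, uncancelled inversion (-1).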
Being a Mac owner, I opted to check out the system using the included Mac software, which comprises two separate programs — RSS FX for moving sounds, and RSS Stage for fixed‑position sounds. To get the Mac to talk to the hardware, you have to install Apple's rather unpopular MIDI Manager (included), which can cause problems with some MIDI interfaces, and precludes the running of most software sequencers while the RSS10 is operational.
RSS Stage allows you to move the sound source around as you're setting up, so that you can hear the effect in different positions, but unfortunately, the sound is muted while you're actually moving the source. Three head icons represent the three possible planes of sound movement, and you can select any one before moving the sound in that plane. Once you've positioned the sound, you can play with the virtual room parameters, which include room size and room reverb characteristics. A similar set of parameters is available in the RSS FX package, and as far as I can tell, the early reflections are recalculated according to the room size and the sound position, so no user intervention is required (or, indeed, permitted). In the RSS FX software, the early reflections change as the sound moves, simulating what would happen in a real environment.
In the RSS FX environment, the sound movement is 'drawn' in real time in a window containing a graphic representation of the virtual room, and then stored as a 'Phrase'. A MIDI note (or chord) may also be stored as part of a Phrase, allowing external sampler or synth sounds to be triggered directly from the RSS10. The maximum length of a Phrase seems to be determined by how fast and how far the sound is moved, but this is no real restriction, as several Phrases can be used in succession by putting them onto a time line in the Sequence window. However, there is no moving position indicator to show you where your virtual sound is supposed to be coming from, which seems a crazy omission.
The start time of each Phrase can be specified in a sequence to within 50ms, and the whole sequence can be run from the RSS10's internal clock, or be externally sync'd to a sequencer using any MTC format. Phrases are not allowed to overlap, but then that wouldn't make sense anyway. Roland include a library of Phrases ready for call‑up, or you can add your own Phrases to the library for future use. Various editing features are provided to allow you to modify the path you've drawn in for a sound source, but I could find no control to speed up or slow down Phrases, and no control to allow me to loop Phrases continuously — a feature which might have been useful, especially while setting up.
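The Sequence window's rules, with start times on a 50ms grid and no overlapping Phrases, can be modelled in a few lines. The `Phrase` class, the millisecond units and the function names below are hypothetical; they illustrate the constraints described above, not Roland's actual data format.

```python
from dataclasses import dataclass

GRID_MS = 50  # start-time resolution of the Sequence window


@dataclass
class Phrase:
    """Hypothetical stand-in for a stored RSS FX Phrase (not Roland's format)."""
    name: str
    start_ms: int
    length_ms: int


def quantise_start(start_ms):
    """Snap a start time to the 50ms grid (Python's round() sends exact ties to even)."""
    return round(start_ms / GRID_MS) * GRID_MS


def build_sequence(phrases):
    """Quantise start times, sort by start, and reject overlapping Phrases."""
    seq = sorted(
        (Phrase(p.name, quantise_start(p.start_ms), p.length_ms) for p in phrases),
        key=lambda p: p.start_ms,
    )
    for prev, nxt in zip(seq, seq[1:]):
        if prev.start_ms + prev.length_ms > nxt.start_ms:
            raise ValueError(f"Phrase {prev.name!r} overlaps {nxt.name!r}")
    return seq
```

A Phrase entered at 2010ms, for instance, lands on the 2000ms grid line, and any Phrase that would still be playing when the next one starts is rejected.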
The manuals lack a proper introduction — they dive straight in, telling you about windows and Phrases without actually introducing you to the product. A few lines to the effect that this is an XYZ and it's supposed to allow you to do A, B and C would have helped tremendously, especially as the manual is obviously translated from Japanese by the Japanese. Fortunately, once you've used the included Patchbay software to get your Mac talking to the RSS10, via either the modem or printer port, it isn't too hard to work out what's going on.
I've heard countless demos of RSS, and the results have varied from 'What effect?', to 'Wow, that nearly hit me!'. You need to be seated the correct distance from the speakers, as close to the centre line as possible, and it helps if the wall behind you is fairly dead. (Side wall reflections also tend to dilute the effect.) As a very general rule of thumb, any sound which is strongly affected when you pass it through a flanger works well with RSS, while single‑frequency sounds tend to yield less dramatic results. Harmonically rich pad sounds wander around the room quite nicely, though how the effect is perceived when you try to move it behind you varies from listener to listener and from sound to sound.
Surprisingly, I found the effect far more dramatic when the reverb level was mixed right down, or turned off altogether, and it also seemed more pronounced when the Doppler shift option was bypassed. If you draw a sound path that passes close to your head, the sound really looms out of the speakers at you. What doesn't work so well is trying to position a non‑moving sound behind you — in my experience, unless the sound keeps moving, the effect of being behind you collapses and the stereo image returns to the front of the room. Stationary sounds seem to work well up to about twice the spacing of the monitors, but after that no‑one hears the same thing.
Does RSS change the sound? The short answer has to be yes. In real life, sounds change in spectral content quite dramatically as they are perceived from different angles, but as this has happened to us every day since we were born, we don't even notice it. Put the effect on record, though, where there's a different set of expectations, and the underlying flanging effect can stick out like a sore thumb. In the context of effects and sonic 'icing sugar', this doesn't matter too much, but if you try to process a lead vocal, it can sound odd, and if you listen to the result in mono, it can sound as though somebody has gone wild with a flanger and an EQ unit!
The only problem I experienced using the RSS10 was that the Mac's disk drive interfered with the audio. You also have to take care to run the audio cables well away from the computer monitor, otherwise they may pick up a bit of buzz, especially if they're unbalanced.
Even at this new low cost, RSS is still a very expensive luxury, but in a business where you have to have an edge to succeed, RSS could be your answer, especially if you compose music for games or AV presentations. Although I find the RSS effect somewhat unpredictable, it is nearly always more interesting than listening to the same sound without processing, so perhaps it doesn't matter whether the sound goes around your head or through it. If you're working to picture, as you might be in a computer game environment, then the visual stimulus helps to keep the audio positioning stable — there's a lot of psycho in psychoacoustics! If anything, visual cues are even more important when you try to move the sound source up or down, because without the visuals, you can hear a change in timbre, but you can't really pinpoint the vertical angle of the source.
The inclusion of both Mac and PC software is good news, although I feel the software could have been thought‑out and documented a little better. The provision of MIDI control is obviously welcome, but why the full MIDI details aren't provided in the manual is beyond me.
Too many people seem happy to knock 3D sound systems because they don't provide the same effect as a true multi‑speaker surround system, but the reality is that most people don't want rear speakers in their homes. Maybe Dolby surround TV will change all that, and who knows, maybe all CDs in the future will be mixed in Dolby surround, or some similar format, but getting back to the present, it's a case of making do with what we have, or experimenting with the likes of RSS or QSound to push back the boundaries of conventional 2‑speaker stereo. I tend to look upon 3D sound as a bit of an adventure, and if its unpredictability makes it more of an art than a science, then I guess I don't mind that either. Too much recorded music is mechanistic these days, so a little untamed magic is to be welcomed.
What about creating RSS‑treated samples? In theory, there's nothing to stop you creating samples using RSS processing, and providing you have a stereo sampler, you'll be able to play them back through normal equipment and still retain the 3D effect. There is one drawback though — if you transpose the sound from its original pitch, the virtual room size, virtual head width, and everything else will change too, and in most cases, that means the effect will fall apart once you've moved more than a semitone or two from the starting point. RSS samples are therefore best limited to things such as sound effects and percussive noises that can be used at their originally sampled pitch.
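Assuming the sampler transposes by changing playback speed, as conventional samplers do, every time interval in the recording scales by 2^(-n/12) for a shift of n semitones. That includes the interaural delays and reflection timings that RSS builds into the sound, so the implied head width and room size shrink or grow along with them. A minimal sketch, with illustrative values:

```python
def scale_for_transposition(value, semitones):
    """Scale a time interval, or a size inferred from one, for a sample played back
    `semitones` higher (positive) or lower (negative) by changing playback speed."""
    return value * 2.0 ** (-semitones / 12.0)


itd = 0.0006        # 0.6ms interaural delay baked into the sample (illustrative)
reflection = 0.012  # 12ms early-reflection delay (illustrative)
up_two = (scale_for_transposition(itd, 2), scale_for_transposition(reflection, 2))
```

Two semitones up, the 0.6ms interaural delay shrinks to about 0.53ms, as if the listener's head had become roughly 11 per cent narrower, and the reflection delays shorten by the same proportion.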
The RSS10 offers rather more control over position than the original RSS system, as room reflections and reverb can now be modelled. Changes in early reflections provide clues as to sound movement, and this should, in theory, make the whole effect more believable.
The Distance parameter sets the distance from the listener to the virtual sound source, while Elevation determines how far below or above the listener the sound originates. Azimuth sets the angle of the sound source in the horizontal plane relative to the listener, and various reverb parameters are provided, including room size and the reflective characteristics of the walls and floor.
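To picture how the three position parameters define a point in space, here is a minimal conversion from Distance, Azimuth and Elevation to x/y/z coordinates and back. The axis conventions (azimuth 0 = straight ahead and increasing clockwise, elevation 0 = ear level, x to the listener's right, y forward, z up) are assumptions for illustration; the RSS10 software's own conventions may differ.

```python
import math


def to_cartesian(distance_m, azimuth_deg, elevation_deg):
    """(Distance, Azimuth, Elevation) -> (x right, y forward, z up), listener at the origin."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    horizontal = distance_m * math.cos(el)
    return (horizontal * math.sin(az),   # x: to the listener's right
            horizontal * math.cos(az),   # y: in front of the listener
            distance_m * math.sin(el))   # z: above ear level


def from_cartesian(x, y, z):
    """Inverse of to_cartesian; azimuth is returned in the range -180..180 degrees."""
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(x, y))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return distance, azimuth, elevation
```

An azimuth of 180 degrees, for example, places the source directly behind the listener (negative y).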
The closer you move toward a sound, the louder it grows, so if you were to move a sound right into your ear, it would, in theory, become infinitely loud. This is clearly not desirable for real‑world applications, so a Clipping Area parameter has been introduced to set a limit as to how loud a sound can get when it's brought near. Another nice addition is the inclusion of Absolute and Relative delay modes, so that you can choose whether a moving sound undergoes Doppler shift or not. If you want Doppler shift, then select Absolute mode.
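The level and Doppler behaviour described here can be sketched as follows. The 1/r level law is standard acoustics; the clipping distance value, and modelling Relative mode as a delay that simply doesn't follow distance, are assumptions for illustration. The pitch shift comes from the fact that a delay line whose delay is changing at a rate d(delay)/dt alters the pitch by a factor of 1 - d(delay)/dt.

```python
SPEED_OF_SOUND = 343.0  # m/s


def distance_gain(distance_m, clip_distance_m=0.25):
    """Inverse-distance law (gain 1.0 at 1m, halving per doubling of distance),
    held constant inside a hypothetical clipping distance so it can't run away."""
    return 1.0 / max(distance_m, clip_distance_m)


def delay_for(distance_m, absolute=True, fixed_delay_s=0.0):
    """Absolute-style: the delay tracks distance. Otherwise (an assumed model of
    Relative mode) the delay stays fixed however the source moves."""
    return distance_m / SPEED_OF_SOUND if absolute else fixed_delay_s


def pitch_ratio(r_start_m, r_end_m, dt_s, absolute=True):
    """Pitch change through a delay line whose delay moves from delay_for(r_start)
    to delay_for(r_end) over dt_s seconds: ratio = 1 - d(delay)/dt."""
    d_delay = delay_for(r_end_m, absolute) - delay_for(r_start_m, absolute)
    return 1.0 - d_delay / dt_s
```

In this model, a source approaching at 34.3m/s (a tenth of the speed of sound) is raised in pitch by about 10 per cent when the delay tracks distance, and not at all when the delay is held fixed, while its level stops rising once it comes within the clipping distance.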
Tucked away in the control software is also a section that relates to the listening setup and allows you to optimise the output for specific speaker angles. While this is obviously useful in some instances, the practicality of the situation is that you have no control over the type of listening system the end user will have, and the variability of listening environments is the real Achilles heel of any such system.
Pros
- A unique and often spectacular process.
- Far less costly than the original.
- Can be expanded by adding more RSS10s.
- Mac and PC software included.
Cons
- Only one channel of moving sound can be processed per unit, which, for most of us, means processing sounds to tape one at a time.
- Software manuals need to be more explicit.
- Poor mono compatibility, if that matters to you.
- Insufficient information provided for users to arrange their own MIDI control.