The Sonic Demands of Virtual Reality

With virtual reality technology now entering the mainstream, there is increased demand for a system that can accurately spatialise multiple sound sources in three dimensions. Such a system must allow for head movement, so that sound sources move realistically relative to the player. It must also be scalable, so that it does not limit the number of sounds that can be spatialised, and computationally efficient. This article examines a particular implementation of ambisonics that allows all of these criteria to be met.

Ambisonics – An Introduction

Ambisonics is a method of creating three-dimensional audio playback via an array of loudspeakers. It works by reproducing or synthesising a sound field in the way it would be experienced by a listener, with multiple sounds travelling in different directions. A sound field can be thought of as the superposition of an infinite number of plane waves, meaning that theoretically any sound field can be recreated via an infinite number of loudspeakers placed in a sphere around the listener. In practice a close approximation of the sound field can be reproduced using a finite number of loudspeakers. Ambisonics differs from conventional surround systems such as 5.1 in that the channels of the ambisonic B Format are not each routed directly to a discrete loudspeaker. Instead, B Format signals are an encoded representation of the sound field that can be decoded for playback on a speaker array of any size, provided that there are at least as many speakers as there are channels in the B Format audio stream. The larger the number of loudspeakers in the array, the more accurately the sound field is reproduced. B Format can also be easily decoded to mono, stereo, and 5.1 mixes for conventional reproduction systems (Noisternig et al 2003, p. 1).
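
As a rough illustration of what an encoded representation of the sound field means in practice, the sketch below encodes a mono sample into the four channels of first-order B Format. It assumes the traditional W, X, Y, Z channel convention, with W scaled by 1/√2 and azimuth measured anticlockwise from straight ahead. The function names are hypothetical and are not taken from any of the cited works.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B Format (W, X, Y, Z).

    Assumed convention: azimuth is anticlockwise from straight ahead,
    elevation is upwards from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                  # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)     # front-back component
    y = sample * math.sin(az) * math.cos(el)     # left-right component
    z = sample * math.sin(el)                    # up-down component
    return (w, x, y, z)

def mix_fields(*fields):
    """Sum any number of encoded sources into a single four-channel field."""
    return tuple(sum(channel) for channel in zip(*fields))

front = encode_first_order(1.0, 0.0)    # full-level source straight ahead
left = encode_first_order(0.5, 90.0)    # quieter source directly to the left
field = mix_fields(front, left)         # still only four channels
```

However many sources are encoded, they sum into the same four channels, so the size of the B Format stream does not grow with the number of sounds.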

Ambisonics In Interactive Games

The PlayStation 3 release of Colin McRae DiRT (Codemasters 2007) utilised ambisonics because of its ability to be easily decoded for any type of reproduction, from mono to multichannel surround. This allowed the developers to create a single ambisonic mix of the game that would be reliably reproduced on all systems, from televisions to 7.1 surround (Horsburgh et al 2011, p. 3). In spite of its flexibility in this regard, ambisonics has failed to find widespread adoption. This may be about to change, however, with the mainstream adoption of virtual reality technology.
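
One simple way to picture this kind of flexible decode is as a set of virtual microphones pointed into the sound field. The sketch below derives a mono and a stereo signal from one first-order B Format sample, using an omnidirectional pattern for mono and a pair of back-to-back virtual cardioids for stereo. This is an assumed, textbook-style decode for illustration only: the decoder actually used in the game is not described here, the function names are hypothetical, and W is assumed to be scaled by 1/√2.

```python
import math

def decode_mono(field):
    """Mono decode: undo the assumed 1/sqrt(2) scaling on the W channel."""
    w, _x, _y, _z = field
    return math.sqrt(2.0) * w

def decode_stereo(field):
    """Stereo decode using virtual cardioids facing hard left and hard right."""
    w, _x, y, _z = field
    left = 0.5 * (math.sqrt(2.0) * w + y)    # cardioid pointing at +90 degrees
    right = 0.5 * (math.sqrt(2.0) * w - y)   # cardioid pointing at -90 degrees
    return (left, right)

# A full-level source encoded directly to the listener's left (assumed values)
source_left = (1.0 / math.sqrt(2.0), 0.0, 1.0, 0.0)

mono = decode_mono(source_left)
stereo = decode_stereo(source_left)   # signal lands in the left channel only
```

The same four-channel mix feeds both decodes, which is the property that allows a single mix to serve every playback system.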

Ambisonics In Virtual Reality

Loudspeaker arrays are impractical for virtual reality systems. In practice the sound field is only accurately created in a small “sweet spot” in the centre of the array, with the sweet spot being limited to a few centimetres at high frequencies (Frank et al 2015, p. 1). Requiring the player to keep their head in such a specific area during VR play would compromise their experience to the point of rendering loudspeaker reproduction useless. Instead, Noisternig et al (2003) proposed a system that allows for ambisonic reproduction over headphones using head-related transfer functions.

Head-Related Transfer Functions

Blauert (2001, p. 373) writes: “The external ears impose linear distortions on the incoming signals, which, in each case, are specific for the direction of incidence of the sound wave and the source distance. In this way, spatial information is encoded into the signals that are received by the eardrums.” A head-related transfer function (HRTF) is essentially a filter that recreates these distortions for a sound emanating from a given location relative to the listener. By applying an HRTF to a monophonic sound, the sound can be made to appear to emanate from that location. This method works for static audio sources, but if head movement is to be incorporated for convincing three-dimensional audio then multiple HRTFs must be used and interpolated between (Hartung et al 1999, p. 1). This swiftly becomes computationally expensive, as each individual sound source must have the relevant HRTF applied dynamically. With interactive games often featuring large numbers of sounds playing simultaneously, this approach requires that the number of spatialised sounds be strictly limited to avoid placing too much demand on the CPU. This is not ideal, especially for virtual reality, where the player may expect every sound present in the game to be appropriately spatialised.
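
In the time domain, an HRTF is applied as a pair of impulse responses (one per ear) convolved with the mono signal. The sketch below shows this for a single static source. The impulse responses are made-up placeholder values rather than measured data, and the function names are hypothetical. A real implementation would use much longer measured responses and would typically rely on a faster, FFT-based convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binauralise(mono, hrir_left, hrir_right):
    """Filter one mono source with a left-ear and a right-ear impulse response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Placeholder responses for a source on the listener's left: the left ear
# hears it immediately at full level, the right ear later and quieter.
HRIR_LEFT = [1.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.5]

mono_source = [1.0, 0.5]
left_ear, right_ear = binauralise(mono_source, HRIR_LEFT, HRIR_RIGHT)
```

Every additional sound source needs its own pair of convolutions, and a moving source or a moving head also needs its responses updated continuously, which is where the cost described above comes from.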

Ambisonics with HRTFs

This is where an ambisonic approach to three-dimensional spatial audio presents significant advantages. The system proposed by Noisternig et al (2003) creates a virtual ambisonic loudspeaker array around the player via headphones. Each virtual loudspeaker requires one HRTF, and because the virtual array moves with the player’s headphones, no interpolation is required. The number of HRTFs therefore remains fixed regardless of the number of sounds that require spatialisation. Allowing for player head movement also becomes very simple, requiring only that the ambisonic field be rotated around the virtual speaker array in opposition to the player’s head movements (Noisternig et al 2003, p. 4). This has the added benefit that the player’s head is always positioned exactly within the sweet spot at the centre of the virtual array, regardless of head movement. This approach was implemented by Google for their VR SDK (Kammerl et al 2016) and later their Resonance Audio SDK (Google 2017).
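
The rotation step can be sketched as follows for the simplest case: a first-order B Format sample and a head turn about the vertical axis only. The channel convention (W scaled by 1/√2, azimuth anticlockwise from straight ahead), the four-speaker virtual layout, the cardioid-style decode, and all names are assumptions made for illustration, not the method of Noisternig et al or of any SDK.

```python
import math

def rotate_yaw(field, head_yaw_deg):
    """Counter-rotate a B Format sample (W, X, Y, Z) for a head that has
    turned head_yaw_deg anticlockwise (to the left)."""
    w, x, y, z = field
    psi = math.radians(head_yaw_deg)
    x_rot = x * math.cos(psi) + y * math.sin(psi)
    y_rot = -x * math.sin(psi) + y * math.cos(psi)
    return (w, x_rot, y_rot, z)   # W and Z are unaffected by a yaw rotation

# Hypothetical virtual array: four speakers fixed relative to the head
SPEAKER_AZIMUTHS_DEG = (45.0, 135.0, -135.0, -45.0)

def virtual_speaker_feeds(field):
    """Project the field onto each virtual speaker direction (horizontal only)."""
    w, x, y, _z = field
    feeds = []
    for az_deg in SPEAKER_AZIMUTHS_DEG:
        az = math.radians(az_deg)
        feeds.append(0.5 * (math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az)))
    return feeds

field = (1.0 / math.sqrt(2.0), 1.0, 0.0, 0.0)   # unit source straight ahead
rotated = rotate_yaw(field, 90.0)               # player turns 90 degrees left
feeds = virtual_speaker_feeds(rotated)
# The source now sits to the player's right, so the two right-hand virtual
# speakers (indices 2 and 3) receive most of the signal.
```

Each feed would then be filtered with that virtual speaker's fixed left-ear and right-ear responses. With four virtual speakers that is eight filters in total, however many sources were mixed into the field beforehand.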

Conclusion

Evidently ambisonics provides a viable solution for spatialising sound for VR, but there are many other factors that still warrant investigation. First and foremost, everybody has a differently shaped head and ears, meaning that one set of HRTFs will not suffice for all players. One common issue with mismatched HRTFs is confusion between signals coming from the front and rear (Thresh et al 2017, p. 6). This confusion would be highly disorienting for the player, and so a solution must be found. A calibration session before the game begins could ascertain the closest matching HRTFs for the player from a much larger data set, or some method of capturing the player’s own HRTFs would have to be implemented. Of the two, the former seems far more practical, although it would still be prone to error. The issues of realistic occlusion, reverberation, proximity effect and other acoustic phenomena will also need to be addressed in order to create a truly realistic three-dimensional audio experience. As a starting point for three-dimensional audio, ambisonics appears to be a robust and efficient solution.

BIBLIOGRAPHY

Blauert, Jens, 2001: Spatial Hearing: The Psychophysics of Human Sound Localization – The MIT Press, Cambridge MA

Frank, Matthias et al, 2015: Producing 3D Audio in Ambisonics – AES 57th International Conference, Hollywood.

Google Inc. 2017: Resonance Audio: Fundamental Concepts – Online Resource [Available at: https://developers.google.com/resonance-audio/discover/concepts] Accessed 28/02/18.

Hartung, Klaus et al, 1999: Comparisons of Different Methods For the Interpolation of Head-Related Transfer Functions – AES 16th International Conference, Finland.

Horsburgh et al, 2011: A Perspective on the Adoption of Ambisonics for Games – AES 41st International Conference, London.

Kammerl, Julius et al, 2016: Spatial Audio and Immersion: VR’s Second Sense – Online Resource [Available at: https://www.youtube.com/watch?v=Na4DYI-WjlI&feature=youtu.be] Accessed 28/02/18.

Thresh, Lewis et al, 2017: A Direct Comparison of Localisation Performance When Using First, Third and Fifth Order Ambisonics For Real Loudspeaker And Virtual Loudspeaker Rendering – AES 143rd Convention, New York.