Tridimensional Reproduction of Sound – Pure Stereo 3D audio™

Slightly off-topic here; nevertheless, it might get us thinking about how we want to listen. Are we seeking realistic sound experiences?

This article was written in 2011 for the Acoustics course in the final year of my degree. It is a theoretical explanation of the reproduction system recently developed by Edgar Choueiri. I went through the published scientific papers trying to understand the reasoning behind the development of this process, instead of just being overwhelmed by its results, of which everyone can happily have a tiny but valuable experience here:

Introduction

Pure Stereo 3D Audio™, developed under Prof. Edgar Choueiri at the 3D Audio and Applied Acoustics Lab at Princeton University, is based on applying BACCH filters (a development of crosstalk cancellation, XTC) to fairly common stereo reproduction systems, providing the listener with a three-dimensional sound experience.

Through ordinary loudspeakers, the resulting audio image lets the listener clearly localize individual elements, such as a single instrument in an orchestra or a moving object.

Prof. Choueiri distinguishes this format from surround sound in that the latter is restricted to the loudspeakers' positions and does not allow, for example, the perception of a sound source coming towards the listener.

The basis of this technology lies, of course, in the way human beings perceive sound.

Recording techniques

Theoretically, all stereo (2.0) recordings have what it takes to be reproduced by this system through the application of BACCH filters. Ideally, binaural recordings, whose physical recording configuration resembles the way we listen, will give a more faithful three-dimensional reproduction. We therefore conclude that no restrictive conditions are imposed on the recording process.

Reproduction Techniques and Guidelines

In common stereo reproduction systems, the right speaker carries the necessary cues for the right ear, and the left speaker does the same for the left ear. So why is the listening experience corrupted? Because the left ear also receives the signal from the right speaker, and vice versa: this is the crosstalk phenomenon.
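To put numbers on the crosstalk, here is a minimal sketch of the free-field geometry used later in the article: two point sources at a span ϴ and a distance l from the centre of the head, and two ears a distance Δr apart. The law of cosines gives the ipsilateral and contralateral path lengths l1 and l2; the crosstalk then arrives attenuated by g = l1/l2 (1/r spreading) and delayed by tc = (l2 - l1)/c. The speed of sound (343 m/s), the 15 cm ear spacing and the 1.6 m listening distance are assumptions chosen for illustration, not values taken from the Pure Stereo papers.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C

def path_lengths(span_deg, distance_m, ear_spacing_m=0.15):
    """Ipsilateral (l1) and contralateral (l2) path lengths from one
    loudspeaker to the two ears, for point sources in free field.

    span_deg      -- angle between the two loudspeakers seen from the listener
    distance_m    -- distance from each loudspeaker to the centre of the head
    ear_spacing_m -- distance between the ears (hypothetical default)
    """
    half_span = math.radians(span_deg) / 2.0
    base = distance_m**2 + (ear_spacing_m / 2.0) ** 2
    cross = distance_m * ear_spacing_m * math.sin(half_span)
    l1 = math.sqrt(base - cross)  # loudspeaker to the ear on its own side
    l2 = math.sqrt(base + cross)  # loudspeaker to the opposite ear
    return l1, l2

def crosstalk_summary(span_deg, distance_m, ear_spacing_m=0.15):
    l1, l2 = path_lengths(span_deg, distance_m, ear_spacing_m)
    g = l1 / l2                     # crosstalk amplitude relative to the direct sound
    delta_l = l2 - l1               # extra distance travelled by the crosstalk
    t_c = delta_l / SPEED_OF_SOUND  # crosstalk delay in seconds
    return {"l1": l1, "l2": l2, "g": g, "delta_l": delta_l, "t_c": t_c}

if __name__ == "__main__":
    s = crosstalk_summary(span_deg=60.0, distance_m=1.6)
    print(f"g = {s['g']:.3f}, delta_l = {s['delta_l'] * 100:.1f} cm, "
          f"t_c = {s['t_c'] * 1e6:.0f} us")
```

With a 60º span at 1.6 m, the crosstalk is less than half a decibel weaker than the direct sound (g ≈ 0.95) and arrives only about 0.22 ms later, which is why it cannot simply be ignored.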

The reproduction system developed by Prof. Edgar Choueiri relies on a filter (BACCH) [10] that acts as a wall between the two speakers, separating the audio signal of each one and routing it to the respective ear. This method does not add any information to the signal; rather, it removes the artifacts that arise in ordinary two-speaker reproduction.

The required processor operates at 96 kHz with 64-bit precision, followed by 24-bit D/A conversion, so that it does not degrade audio quality or dynamic range. It is presently capable of processing at 192 kHz.
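The quality claim can be checked against two textbook formulas (standard results, not figures from the Pure Stereo documentation): a sample rate can represent frequencies up to half its value (Nyquist), and an ideal N-bit quantizer gives a dynamic range of about 6.02·N + 1.76 dB for a full-scale sine. The sketch applies them to the rates and converter resolution mentioned above, with 16-bit (CD) audio added only for comparison.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2.0

def quantization_dynamic_range_db(bits):
    """Theoretical SNR of a full-scale sine through an ideal N-bit quantizer."""
    return 6.02 * bits + 1.76

if __name__ == "__main__":
    for rate in (96_000, 192_000):
        print(f"{rate} Hz sampling -> bandwidth up to {nyquist_hz(rate):.0f} Hz")
    for bits in (16, 24):
        print(f"{bits}-bit conversion -> about {quantization_dynamic_range_db(bits):.1f} dB")
```

A 24-bit converter therefore offers roughly 146 dB of theoretical dynamic range, far beyond the roughly 98 dB of 16-bit audio, and 96 kHz sampling already covers the audible band with a wide margin.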

Necessary Equipment

Processor: a digital audio unit containing a Linux computer with a dedicated, stable operating system, coupled with a converter system (D/D, D/A, A/D). For the user, operation is quite simple: the interface is just an on/off button plus a setting for the number of listeners and their relative positions (filter variations).

The functions of this processor are essentially corrective of the phenomena created by crosstalk: it corrects the comb filtering, the frequency response at the listener's ears, and the spectral and temporal responses; it also compensates for the individual physical features of the listener (head and torso) that might affect the fidelity of spatial reproduction [11]. Thus, we may conclude that no colouration is added to the original sound and that this filter can be applied to any stereo recording.

Implementation in consumer hi-fi equipment

Presently, this system is not implemented in consumer environments, because it requires very precise in-situ customization.

Acoustic measurements of the consumer's audio system, speakers, listening room and physical features (head, torso, ears) must be carried out by an expert, using specific hardware and software developed for the Pure Stereo™ implementation.

The acoustic engineer then inserts a series of instruments into the listener's audio chain. He asks the listener to sit in his favourite position in the room, and two miniature microphones are placed in the listener's ears. The new equipment chain emits a series of tones, and the microphones' response is recorded in high definition. This signal contains all the necessary acoustic information about the listening room, the speakers and the listener's physical features. The filter (a 32-bit file) is then loaded into the equipment chain. More than one filter may be installed, to add more simultaneous listeners or simply another listening spot. There are also universal filters, u-BACCH, created by measuring the responses on a dummy head. These filters may also be used, but they lack the individual characteristics of the listening room, of the audio system and of the listener.
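The measurement step can be illustrated with a stepped-sine sketch: play one tone at a time, record it, and project the recording onto a complex exponential at the tone's frequency to obtain the gain and phase of the path at that frequency. This is a generic technique, not the proprietary Pure Stereo procedure; the `toy_path` function (a pure delay and attenuation standing in for one loudspeaker-to-ear path) and all numeric values are hypothetical. Repeating the measurement for each loudspeaker and each in-ear microphone would fill in the four entries of the transfer matrix C at every tone frequency.

```python
import numpy as np

def toy_path(signal, fs, gain=0.5, delay_s=0.004):
    """Hypothetical loudspeaker-to-ear path: attenuation plus a pure delay."""
    d = int(round(delay_s * fs))
    out = np.zeros_like(signal)
    out[d:] = gain * signal[:len(signal) - d]
    return out

def measure_tone_response(system, freq_hz, fs=96_000, duration_s=0.5):
    """Complex response of `system` at one frequency (stepped-sine method)."""
    t = np.arange(int(fs * duration_s)) / fs
    recorded = system(np.sin(2 * np.pi * freq_hz * t), fs)
    start = len(t) // 4  # skip the onset so the delayed tone is fully present
    ref = np.exp(-2j * np.pi * freq_hz * t[start:])
    # A*sin(wt + phi) projected onto exp(-jwt) averages to A*exp(j*phi)/(2j).
    return 2j * np.mean(recorded[start:] * ref)

if __name__ == "__main__":
    for f in (250.0, 1000.0, 4000.0):  # "a series of tones"
        h = measure_tone_response(toy_path, f)
        print(f"{f:6.0f} Hz: |H| = {abs(h):.3f}, phase = {np.angle(h):+.2f} rad")
```

Averaging over many cycles makes the double-frequency term vanish, which is why the recovered gain approaches the path's true value.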

Ideal speakers and their configuration

Pure Stereo™ works with virtually any system, although speakers with higher directivity render the 3D image (the depth and spatiality of the original signals) more faithfully, since the sound from less directive speakers is more affected by rooms with poor or no acoustic treatment, whose reflections degrade the signals. The recommended configuration is the stereo dipole, with the speakers 50 cm apart: the 3D image is then less vulnerable to head and torso movements.
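A 50 cm spacing and the 10º span of Kirkeby's stereo dipole describe the same layout only at one listening distance. A small sketch with elementary trigonometry relates the three quantities; it assumes the listener sits on the median plane and measures distance to the line joining the speakers, and both functions are illustrative, not part of any Pure Stereo tool.

```python
import math

def span_from_spacing(spacing_m, distance_m):
    """Angle (degrees) subtended at the listener by two speakers spacing_m apart."""
    return math.degrees(2 * math.atan((spacing_m / 2) / distance_m))

def distance_for_span(spacing_m, span_deg):
    """Listening distance at which spacing_m subtends span_deg."""
    return (spacing_m / 2) / math.tan(math.radians(span_deg) / 2)

if __name__ == "__main__":
    d = distance_for_span(0.5, 10.0)
    print(f"50 cm spacing gives a 10 degree span at about {d:.2f} m")
    print(f"at 1.5 m the same speakers span {span_from_spacing(0.5, 1.5):.1f} degrees")
```

Under these assumptions the two descriptions meet at a listening distance of about 2.9 m; closer in, the same pair of speakers spans a wider angle (roughly 19º at 1.5 m).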

Pure Stereo’s Sweet Spot

As mentioned before, Pure Stereo requires a sweet spot, and more than one can exist at the same time, so listeners may sit next to each other. If a listener moves a few feet away, the 3D image collapses. Within the sweet spot, however, the listener no longer perceives the sound as coming from the loudspeakers.

Optimized crosstalk cancellation – BACCH filter modelling

The sound colouration caused by an XTC filter consists of peaks in the frequency spectrum, typically exceeding 30 dB, even at the sweet spot.
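Why the peaks are so large can be seen in the idealized free-field model. Normalizing every path by the ipsilateral one, the transfer matrix becomes C = [[1, a], [a, 1]] with a = g·exp(-iωtc), where g = l1/l2 and tc is the crosstalk delay. Its eigenvalues are 1 + a and 1 - a, so the perfect filter C^-1 amplifies by up to 1/(1 - g) wherever ωtc is a multiple of π, including 0 Hz. The sketch scans the spectrum numerically; the values of g and tc are hypothetical.

```python
import numpy as np

def free_field_C(freqs_hz, g, t_c):
    """Normalized free-field transfer matrix, one 2x2 matrix per frequency.
    Diagonal: ipsilateral paths (reference). Off-diagonal: crosstalk paths,
    attenuated by g and delayed by t_c seconds."""
    w = 2 * np.pi * np.asarray(freqs_hz, dtype=float)
    a = g * np.exp(-1j * w * t_c)
    C = np.empty((w.size, 2, 2), dtype=complex)
    C[:, 0, 0] = C[:, 1, 1] = 1.0
    C[:, 0, 1] = C[:, 1, 0] = a
    return C

def ideal_xtc_peak_db(g, t_c, f_max=20_000.0, n=20_001):
    """Largest gain (dB) of the perfect XTC filter H = C^-1 over 0..f_max Hz."""
    freqs = np.linspace(0.0, f_max, n)
    H = np.linalg.inv(free_field_C(freqs, g, t_c))
    gains = np.linalg.svd(H, compute_uv=False)[:, 0]  # largest singular value per frequency
    return 20 * np.log10(gains.max())

if __name__ == "__main__":
    for g in (0.95, 0.99):
        print(f"g = {g}: peak {ideal_xtc_peak_db(g, 2.2e-4):.1f} dB, "
              f"closed form {20 * np.log10(1 / (1 - g)):.1f} dB")
```

The closer the crosstalk level is to the direct sound (g approaching 1), the larger the peak: g = 0.99 already gives 40 dB.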

Ideal crosstalk cancellation would be infinite: it requires that the sound pressure at each ear comes only from its respective source.

To derive formulas for optimal crosstalk cancellation, an idealized model of sound propagation was assumed, with no reflections and no diffraction by the listener's head and ears.

Assuming two point sources, their configuration is illustrated below.

For a better understanding of the theoretical construction of the BACCH filters, the elements given above are:

DL and DR: input audio signals, left and right
PL and PR: pressure at the ears, left and right
l1 and l2: path lengths from each source to the ipsilateral and contralateral ears, respectively
VL and VR: source signals, forming the vector v = [VL(iω), VR(iω)]
H: filter matrix
C: system transfer matrix
R: performance matrix

First, it is necessary to determine the mathematical formulation of the transformation. The input signals d = [DL, DR] pass through the filter H, producing the source signals v, which travel from the sources to the ears, where they produce the pressures p = [PL, PR]:

$$\mathbf{p} = \mathbf{C}\,\mathbf{v} = \mathbf{C}\,\mathbf{H}\,\mathbf{d} = \mathbf{R}\,\mathbf{d}$$

in which R = CH is the performance matrix.
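The transformation p = C v = C H d = R d can be tried numerically. The sketch builds C from the free-field point-source model (one delay and one 1/r spreading loss per path), designs the perfect filter H = C^-1 for a centred head, and then evaluates R for the same head shifted sideways. The geometry (60º span, 1.6 m, 15 cm between the ears, 343 m/s), the 1 kHz test frequency and the 5 cm shift are hypothetical; the separation measure is the ratio of left-ear to right-ear pressure for a left-only input.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed)

def transfer_matrix(f_hz, span_deg=60.0, distance_m=1.6,
                    ear_spacing_m=0.15, head_offset_m=0.0):
    """Free-field point-source matrix C at one frequency.
    Rows: left/right ear; columns: left/right loudspeaker.
    head_offset_m moves the head sideways (positive = towards the right)."""
    half = np.radians(span_deg) / 2
    speakers = np.array([[-distance_m * np.sin(half), distance_m * np.cos(half)],
                         [distance_m * np.sin(half), distance_m * np.cos(half)]])
    ears = np.array([[head_offset_m - ear_spacing_m / 2, 0.0],
                     [head_offset_m + ear_spacing_m / 2, 0.0]])
    # dist[i, j]: distance from loudspeaker j to ear i
    dist = np.linalg.norm(ears[:, None, :] - speakers[None, :, :], axis=-1)
    k = 2 * np.pi * f_hz / C_SOUND        # wavenumber
    return np.exp(-1j * k * dist) / dist  # delay and 1/r loss on every path

def separation_db(R):
    """Left-ear over right-ear pressure for a left-only input, in dB."""
    crosstalk = abs(R[1, 0])
    return np.inf if crosstalk == 0 else 20 * np.log10(abs(R[0, 0]) / crosstalk)

f = 1000.0
H = np.linalg.inv(transfer_matrix(f))                   # perfect XTC for a centred head
R_centred = transfer_matrix(f) @ H                      # R = C H, identity up to rounding
R_shifted = transfer_matrix(f, head_offset_m=0.05) @ H  # same filter, head moved 5 cm
print(f"centred: {separation_db(R_centred):.0f} dB, shifted: {separation_db(R_shifted):.1f} dB")
```

With the head centred, R is the identity to within rounding error and the separation is effectively infinite, as ideal XTC requires; after a 5 cm sideways move it drops to only a few decibels at this frequency, which illustrates how narrow the sweet spot of an unregularized filter is.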

Next, a series of matrices is defined that allows the spectral colouration added by XTC filters to be evaluated: amplitude, frequency spectrum and the system's frequency response. Below is the matrix that compares different types of XTC filters, χ:

Esi: amplitude spectrum (up to a factor α) of a signal from a loudspeaker reaching the ipsilateral ear.

Si: lateral (side) image, relative to the incoming signal.

||: ipsilateral ear, relative to the incoming signal.

There are eight metrics that make up the matrix presented above:

Esi||, Esiχ, Eci, Ssi||, Ssiχ, Sci, S, χ

Each of them is a function of frequency; through them, spectral colouration and XTC performance are evaluated and compared.

Esiχ: lateral image frequency response at the contralateral ear.

Eci: frequency response of the system at each ear when the same audio signal is split equally between the two inputs. "ci" refers to the central image.

S: source’s frequency response
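To illustrate how such metrics can be computed, the sketch evaluates several of them for the normalized free-field model (C = [[1, a], [a, 1]], a = g·exp(-iωtc), with hypothetical g and tc) and the perfect filter. It follows the descriptions above: a side image is a signal fed to the left input only, the centre image is the same signal split equally between the inputs, E quantities are read at the ears (through R = CH) and S quantities at the loudspeakers (through H). Reading S as the largest gain the filter can apply to any input (its largest singular value), and χ as the ratio of the ipsilateral to the contralateral side-image response, are our assumptions; the exact definitions are in [10].

```python
import numpy as np

def free_field_C(freqs_hz, g, t_c):
    """Normalized free-field transfer matrix (ipsilateral paths = 1)."""
    w = 2 * np.pi * np.asarray(freqs_hz, dtype=float)
    a = g * np.exp(-1j * w * t_c)
    C = np.empty((w.size, 2, 2), dtype=complex)
    C[:, 0, 0] = C[:, 1, 1] = 1.0
    C[:, 0, 1] = C[:, 1, 0] = a
    return C

def xtc_metrics(C, H):
    """Per-frequency metrics under the interpretation stated in the lead-in."""
    R = C @ H                                   # performance matrix at each frequency
    p_side, v_side = R[:, :, 0], H[:, :, 0]     # left-only input d = [1, 0]
    p_centre = 0.5 * (R[:, :, 0] + R[:, :, 1])  # centre image d = [1/2, 1/2]
    v_centre = 0.5 * (H[:, :, 0] + H[:, :, 1])
    e_ipsi, e_contra = np.abs(p_side[:, 0]), np.abs(p_side[:, 1])
    return {
        "Esi||": e_ipsi, "Esi_chi": e_contra, "Eci": np.abs(p_centre[:, 0]),
        "Ssi||": np.abs(v_side[:, 0]), "Ssi_chi": np.abs(v_side[:, 1]),
        "Sci": np.abs(v_centre[:, 0]),
        "S": np.linalg.svd(H, compute_uv=False)[:, 0],
        # chi in dB; the floor avoids dividing by an exactly-zero crosstalk term
        "chi_dB": 20 * np.log10(e_ipsi / np.maximum(e_contra, 1e-15)),
    }

if __name__ == "__main__":
    freqs = np.linspace(0.0, 20_000.0, 2_001)
    C = free_field_C(freqs, g=0.95, t_c=2.2e-4)  # hypothetical parameters
    m = xtc_metrics(C, np.linalg.inv(C))         # perfect XTC filter
    for name, values in m.items():
        print(f"{name:8s} min {values.min():9.3f}  max {values.max():9.3f}")
```

For the perfect filter the ear-side metrics are flat (the colouration is not heard at the sweet spot), while the loudspeaker-side metrics carry the large peaks discussed above.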

Correction of the artifacts generated by XTC

Once crosstalk cancellation is in place, corrections are performed in the time domain and in the frequency domain (constant-parameter regularization).

It was shown that, for XTC systems based both on the two-source free-field model and on HRTFs, constant regularization reduces the peaks mentioned above but, on the other hand, produces a roll-off of the bass content and introduces high-frequency artifacts in the filter's response. It was also concluded that constant regularization acts at discrete, widely spaced frequencies in the spectrum.
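A common way to write constant-parameter regularization is the regularized inverse H = (C^H C + βI)^-1 C^H, where C^H is the conjugate transpose and β ≥ 0 trades cancellation for lower filter gain (β = 0 gives the perfect filter C^-1). The sketch applies it to the normalized free-field model with hypothetical g and tc and reports the two effects described above: the filter's largest gain (the colouration peaks) and the left-ear level of a left-only signal at 0 Hz (the bass roll-off). The parameter values are illustrative, not those used for BACCH filters.

```python
import numpy as np

def free_field_C(freqs_hz, g, t_c):
    """Normalized free-field transfer matrix (ipsilateral paths = 1)."""
    w = 2 * np.pi * np.asarray(freqs_hz, dtype=float)
    a = g * np.exp(-1j * w * t_c)
    C = np.empty((w.size, 2, 2), dtype=complex)
    C[:, 0, 0] = C[:, 1, 1] = 1.0
    C[:, 0, 1] = C[:, 1, 0] = a
    return C

def regularized_xtc(C, beta):
    """H = (C^H C + beta*I)^-1 C^H for every frequency (beta = 0: H = C^-1)."""
    Ch = np.conj(np.swapaxes(C, -1, -2))  # conjugate transpose of each 2x2 matrix
    return np.linalg.solve(Ch @ C + beta * np.eye(2), Ch)

def peak_gain_db(H):
    """Largest filter gain over all frequencies, in dB."""
    return 20 * np.log10(np.linalg.svd(H, compute_uv=False)[:, 0].max())

def side_image_level_db(C, H, index):
    """Left-ear level for a left-only input at one frequency index, in dB."""
    return 20 * np.log10(abs((C[index] @ H[index])[0, 0]))

if __name__ == "__main__":
    freqs = np.linspace(0.0, 20_000.0, 20_001)
    C = free_field_C(freqs, g=0.95, t_c=2.2e-4)  # hypothetical parameters
    for beta in (0.0, 0.005, 0.05):
        H = regularized_xtc(C, beta)
        print(f"beta = {beta:<6}: peak {peak_gain_db(H):5.1f} dB, "
              f"side image at 0 Hz {side_image_level_db(C, H, 0):5.1f} dB")
```

In this model the peak gain equals 1/(2√β) whenever √β lies between 1 - g and 1 + g, so increasing β lowers the peaks, while the 0 Hz side-image level falls below 0 dB: the bass roll-off.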

The ideal optimization is achieved through frequency-dependent regularization, which requires the audio spectrum to be divided into a hierarchy of frequency bands. The mathematical analysis yields three groups of frequency bands: two are regularized and one is not (the perfect, fully optimized filter). The analytical derivation of this solution relies on the most typical listening scenarios. Above 6 kHz, XTC filters are considered unnecessary, and that frequency is taken as the cut-off [10].
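The idea of leaving some bands unregularized while regularizing others can be sketched with a simple threshold rule. This is not the three-band hierarchy of the BACCH paper, whose band limits are not reproduced here: in the hypothetical rule below, β(f) = 0 wherever the perfect filter stays within a chosen maximum gain, a constant β (chosen so that the regularized peak 1/(2√β) equals that maximum) wherever it would exceed it, and the filter is bypassed above the 6 kHz cut-off quoted above. The free-field parameters are also hypothetical.

```python
import numpy as np

def free_field_C(freqs_hz, g, t_c):
    """Normalized free-field transfer matrix (ipsilateral paths = 1)."""
    w = 2 * np.pi * np.asarray(freqs_hz, dtype=float)
    a = g * np.exp(-1j * w * t_c)
    C = np.empty((w.size, 2, 2), dtype=complex)
    C[:, 0, 0] = C[:, 1, 1] = 1.0
    C[:, 0, 1] = C[:, 1, 0] = a
    return C

def banded_xtc(freqs_hz, g, t_c, max_gain_db=5.0, cutoff_hz=6_000.0):
    """Frequency-dependent regularization (hypothetical threshold rule):
    beta = 0 (perfect XTC) where C^-1 stays within max_gain_db,
    a constant beta elsewhere, and no XTC (H = I) above cutoff_hz."""
    freqs = np.asarray(freqs_hz, dtype=float)
    C = free_field_C(freqs, g, t_c)
    gamma = 10 ** (max_gain_db / 20)
    beta_reg = 1 / (4 * gamma**2)  # regularized peak = 1/(2*sqrt(beta)) = gamma
    ideal_peak = np.linalg.svd(np.linalg.inv(C), compute_uv=False)[:, 0]
    beta = np.where(ideal_peak > gamma, beta_reg, 0.0)
    Ch = np.conj(np.swapaxes(C, -1, -2))  # conjugate transpose per frequency
    H = np.linalg.solve(Ch @ C + beta[:, None, None] * np.eye(2), Ch)
    H[freqs > cutoff_hz] = np.eye(2)      # above the cut-off, pass the signals through
    return H, beta

if __name__ == "__main__":
    freqs = np.linspace(0.0, 20_000.0, 20_001)
    H, beta = banded_xtc(freqs, g=0.95, t_c=2.2e-4)  # hypothetical parameters
    regularized = freqs[(beta > 0) & (freqs <= 6_000.0)]
    peak = 20 * np.log10(np.linalg.svd(H, compute_uv=False)[:, 0].max())
    print(f"peak gain {peak:.2f} dB; {regularized.size} of 6001 bins below 6 kHz regularized")
```

The regularized bands form around 0 Hz and around each frequency where ωtc is a multiple of π; between them the filter remains perfect, and the overall peak never exceeds the chosen limit.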

Filter application

The strategy chosen depends on its purpose.

The first approach concerns the maximum level of colouration (in dB) to be tolerated. It takes into account the reproduction constraints: room reflections, loudspeaker characteristics and their distance to the listener. For audiophile use, it is estimated that this level should not exceed 5 dB. For a home theatre system, a higher level of colouration is tolerated, as there is more headroom when the aim is to reproduce surround special effects with two loudspeakers.
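In the normalized free-field model, a tolerated colouration translates directly into a constant regularization parameter: the regularized filter's largest gain is 1/(2√β), so a ceiling of Γ dB gives β = 1/(4·10^(Γ/10)). The sketch also computes the channel separation left at the worst, most ill-conditioned frequencies (where the eigenvalues of C are 1 + g and 1 - g) to show the trade-off. The 5 dB value is the audiophile figure quoted above; the 10 dB home-theatre value and g = 0.95 are hypothetical.

```python
import math

def beta_for_colouration(max_db):
    """Constant regularization giving a peak filter gain of max_db
    (normalized free-field model, peak = 1 / (2 * sqrt(beta)))."""
    return 1.0 / (4.0 * 10 ** (max_db / 10))

def worst_case_separation_db(g, beta):
    """Left/right ear ratio for a left-only input at a frequency where the
    crosstalk term is a = +g or -g (eigenvalues of C are 1 + g and 1 - g)."""
    def mode_gain(lam):
        # gain of R = C H along one eigenvector: lam^2 / (lam^2 + beta)
        return lam**2 / (lam**2 + beta)
    mu_big, mu_small = mode_gain(1 + g), mode_gain(1 - g)
    return 20 * math.log10((mu_big + mu_small) / (mu_big - mu_small))

if __name__ == "__main__":
    for label, max_db in (("audiophile", 5.0), ("home theatre (hypothetical)", 10.0)):
        beta = beta_for_colouration(max_db)
        print(f"{label}: beta = {beta:.4f}, worst-case separation "
              f"{worst_case_separation_db(0.95, beta):.1f} dB")
```

Allowing more colouration lowers β and restores more cancellation at the problem frequencies. Away from them, the two eigenvalues of C have similar magnitudes, and the separation of the regularized filter stays much higher.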

Angular distance between the loudspeakers

This span, although usually set at 60º, is not the only possibility and can be a variable in the filter's design. Since tc (the time a sound wave takes to travel the path difference Δl = l2 - l1) varies with the loudspeaker span, the limits of the frequency bands can be modified by changing the configuration. Thus, by fixing a value ϴ* for the span, the upper limit of the second frequency band (in the band hierarchy) can be made to coincide with a cut-off frequency above which the XTC filter is no longer needed.
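The dependence on span can be made concrete. In the free-field model, the perfect filter is ill-conditioned wherever ωtc is a multiple of π, so the first high-frequency problem sits at f1 = 1/(2tc). A minimal sketch, assuming a 1.6 m listening distance, 15 cm ear spacing and 343 m/s, computes f1 for a given span and, by bisection, the span at which f1 lands on a chosen frequency. Placing f1 exactly at the cut-off is only one hypothetical reading of making a band limit coincide with the cut-off; the band limits of the BACCH paper are not reproduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def delta_l(span_deg, distance_m=1.6, ear_spacing_m=0.15):
    """Path difference l2 - l1 for point sources in free field."""
    half = math.radians(span_deg) / 2
    base = distance_m**2 + (ear_spacing_m / 2) ** 2
    cross = distance_m * ear_spacing_m * math.sin(half)
    return math.sqrt(base + cross) - math.sqrt(base - cross)

def first_ill_conditioned_hz(span_deg, **geometry):
    """f1 = 1 / (2 * t_c), the first frequency above 0 Hz where C^-1 peaks."""
    t_c = delta_l(span_deg, **geometry) / SPEED_OF_SOUND
    return 1 / (2 * t_c)

def span_for_frequency(target_hz, lo=0.1, hi=179.9, **geometry):
    """Bisection for the span (degrees) whose f1 equals target_hz.
    Relies on delta_l growing monotonically with the span."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if first_ill_conditioned_hz(mid, **geometry) > target_hz:
            lo = mid  # f1 still too high: widen the span
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for span in (60.0, 10.0):
        print(f"{span:.0f} degrees: f1 = {first_ill_conditioned_hz(span):.0f} Hz")
    print(f"span for f1 = 6 kHz: {span_for_frequency(6_000.0):.1f} degrees")
```

Under these assumptions f1 is about 2.3 kHz at 60º and about 13 kHz at 10º, and a span of roughly 22º places it at 6 kHz.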

As already demonstrated by Kirkeby, the stereo dipole configuration, with a 10º loudspeaker span, proves ideal because it is more robust to head movements, thus enlarging the sweet spot. This is easily seen because, at a small span, the path difference Δl is small compared with head and torso movements.

——

Consulted sources:

[1] B. Merlier: Space and Music. Stockholm Music Acoustics Conference, 1993
[2] http://www.infopedia.pt/lingua-portuguesa/aura, 04-05-2011
[3] M. Vorländer: Auralization. Springer-Verlag, Berlin, 2008
[4] L. Henrique: Acústica Musical. Fundação Calouste Gulbenkian, Lisboa, 1st edition, 2002
[5] DIVA and EVE homepage: http://eve.hut.fi/, 04-05-2011
[6] http://orbi.ulg.ac.be/handle/2268/33822, 04-05-2011
[7] V. Pulkki: Virtual sound source positioning using vector base amplitude panning. Journal of the Audio Engineering Society 45 (1997), 456-466
[8] V. Pulkki, T. Lokki: Creating auditory displays with multiple loudspeakers using VBAP: A case study with DIVA project. International Conference on Auditory Displays, ICAD, Glasgow, 1998
[9] http://www.princeton.edu/3D3A/PureStereo/Pure_Stereose2.html#x4-20002, 09-05-2011
[10] BACCH Filters: Optimized Crosstalk Cancellation for 3D Audio over Two Loudspeakers: http://www.princeton.edu/3D3A/Publications/BACCHPaperV4d.pdf, 06-06-2011
[11] http://www.princeton.edu/3D3A/PureStereo/Pure_Stereose4.html#x10-40004, 06-06-2011
[12] http://kom.aau.dk/~pr/undervisning/inverse-filtering/mm3/stereo%20dipole-kirkeby.pdf, 27-06-2011
[13] http://www.isvr.soton.ac.uk/FDAG/VAP/html/sd.html, 27-06-2011
