<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Study on an Efficient Spatialisation Technique for Near-Field Sound in Video Games</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Department of Software Engineering</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Madrid (Spain) manuel.lopez.ibanez@ucm.es</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>email@federicopeinado.com http://nil.fdi.ucm.es</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence Research Center National Institute of Advanced Industrial Science and Technology (AIST) 2-3-26 Aomi</institution>
          ,
          <addr-line>Koto-ku, Tokyo 135-0064</addr-line>
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article presents a simple and efficient method for spatialising sound in virtual environments by adding low-pass filters (LPFs) to the already widespread panning and attenuation techniques. Through two different experiments, we evaluated variations in subject performance when locating sounds in a virtual environment, comparing regular 3D audio from popular game engines (Unreal Engine and Unity) with our proposed sound system. The first experiment consists of an audio-only test via an online survey, whereas the second employs a minimalistic 3D video game which allows for user interaction guided by sound. Results of both experiments suggest better performance and accuracy when using LPFs, with the second finding a significant difference between the two techniques. We conclude that the LPF technique, as a means of spatialising sounds coming from behind the subject, could be applied to complement current audio systems due to its performance-oriented nature and its good results with real users.</p>
      </abstract>
      <kwd-group>
        <kwd>Acoustics</kwd>
        <kwd>3D Audio</kwd>
        <kwd>Entertainment Technology</kwd>
        <kwd>Directional Sound</kwd>
        <kwd>Spatial Attenuation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        3D sound for video games has not been a particularly fertile field in recent
years due to the continued use of traditional spatialisation techniques [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
a relative lack of attention from both players and developers of virtual
environments. However, the popularization of Virtual Reality (VR) has brought an
increasing interest in improving sound systems for video games [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], so as to
achieve levels of realism and presence [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] that were previously out of reach. This
new wave of sound technologies for video games has generated interesting
initiatives, such as Steam Audio<sup>1</sup>, which try to go beyond Head-Related Transfer
Functions (HRTFs) and take into account in-game geometry and materials to
simulate auditory spaces. Yet, these systems focus mainly on realism, not
necessarily on usability. Our intention with this study is the opposite: to focus
on gameplay, improving player orientation and task performance, even if that
means sacrificing realism.
      </p>
    </sec>
    <sec id="sec-2">
      <title>1 https://valvesoftware.github.io/steam-audio/</title>
      <p>
        A good example of how the exclusive use of a complex and realistic sound
system can create gameplay problems is the recent addition of HRTFs to Valve's
first-person shooter (FPS) game Counter-Strike: Global Offensive<sup>2</sup>. In this game,
players are able to move their avatars' heads at a much higher speed than in the
real world, which, together with a more realistic sound system that adds a delay
to audio propagation, can generate confusion when trying to quickly locate sound
sources, as the perceived delay is not consistent with HRTF generation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. That
is, in this situation audio changes are slower than player movements. Our take
on this problem is to propose an audio system that is not completely faithful to
reality, but gives enough cues to allow players to quickly learn where sounds are
located in virtual space. Besides, it is computationally cheaper than an HRTF-based
system, as it only needs to track whether an
object is being rendered by the in-game camera in order to decide when to
apply an LPF to the sounds it emits, as will be explained later.
      </p>
      <p>
        In this paper, we propose a simple sound technique, based on
LPFs, which tries to balance realism and usability, while aiming to achieve
accurate sound source identification for all users. Our main intention is to allow
players to perform better when identifying sounds coming from
behind, which constitutes one of the most important current challenges in 3D
and surround sound generation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This is achieved through a very brief
learning process. Using our method, users compare sound sources they can see with
sound sources they cannot see, each being processed differently.
Subjects easily identify which sounds are meant to come from behind and
which come from the front, just by comparison or by a process of elimination,
after only a few seconds of training.
      </p>
      <p>The structure of the present article is as follows: First, we will review the
state of the art in 3D sound spatialisation and HRTFs; in the next two sections,
we will state our goals and the experiments we designed to reach them; after
that, we will compare our initial hypothesis to the results achieved, interpreting
and discussing the data; finally, we will end with some brief conclusions about
applicability of our system and future lines of work.</p>
      <p>Goals. The goals of the present research are the following:</p>
      <list list-type="bullet">
        <list-item><p>To explore how sound spatialisation techniques work by default in two of the
most commonly used game engines (Unreal Engine and Unity Engine), and
try to improve them.</p></list-item>
        <list-item><p>To identify an efficient and simple method for spatialising sound, which could
be used together with other, more complex approaches.</p></list-item>
        <list-item><p>To study possible differences in accuracy when users try to identify a sound
coming from the rear with and without an LPF applied to it.</p></list-item>
        <list-item><p>To test the performance of our proposal with real users locating sounds in a
virtual environment, using a pragmatic approach, instead of focusing on the
level of realism achieved.</p></list-item>
      </list>
    </sec>
    <sec id="sec-3">
      <title>2 www.counter-strike.net</title>
      <sec id="sec-3-1">
        <title>3D Sound Spatialisation</title>
        <p>
          It is well known that the key component in the process of spatialising sound in a
3D virtual environment is being able to simulate sound direction and sound
distance [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In this section we focus on two different techniques: Head-Related
Transfer Functions and LPFs.
        </p>
        <sec id="sec-3-1-1">
          <title>Head-Related Transfer Functions</title>
          <p>
            Currently, the most used technique to capture sounds that include information
about their direction is through Head-Related Transfer Functions (HRTFs) [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ].
HRTF capture consists, essentially, of recording sound as it would have been
heard by an individual. To achieve that, a set of microphones is placed in front
of both ears, in an attempt to capture sounds from all relevant directions. The
subject used for capturing can either be a human with microphones attached
to the head or a dummy specifically designed for that purpose. For example,
the KEMAR HRTF database [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] used a synthetic head to recreate the hearing
capacities of a human. Other commonly used databases include: LISTEN HRTF
[
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], CIPIC HRTF [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], FIU DSP Lab HRTF [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] and ARI HRTF [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ].
          </p>
          <p>The information captured during the recording process can later be used
to create impulse responses, which can be applied to game audio by using a
convolution reverberation plugin. The result is a processed sound that reflects
the physical properties of the environment in which the impulse was recorded,
and contains information about sound direction. The combination of HRTFs and
reverberation effects allows for accurate 3D sound placement in a multichannel
environment, and is currently the most commonly used technique in realistic
audio content creation.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Low Pass Filters</title>
          <p>Another common method for spatialising sound, and the one we chose for
developing our technique, is the use of LPFs. These audio filters are linear and
time-invariant; by cutting off frequencies above a chosen number
of hertz (Hz), they emulate a common phenomenon of real-life sound: the dissipation of high
frequencies in a way that depends on the medium the waves travel through and
the distance from the listener at which they originate. A larger portion of high
frequencies is cut if sound travels through a more dissipative medium or
from far away, and emulating this can give the sounds of a virtual environment depth
and credibility.</p>
          <p>
            To mimic this effect, sound designers need to attenuate the correct
range of frequencies of a sound according to its position and the properties of the
virtual environment. A very common way to do this in Digital Signal Processing
(DSP) is through a Chebyshev (type 1) filter, which is the one used in
our proposal. Its attenuation (A, in decibels), according to Williams [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], can be
represented as follows:
          </p>
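          <p><disp-formula><tex-math>A_{dB} = 10\log\left[1 + \varepsilon^{2} C_{n}^{2}(\omega)\right]</tex-math></disp-formula></p>
          <p>where <inline-formula><tex-math>C_{n}(\omega)</tex-math></inline-formula> is a Chebyshev polynomial of the nth order which oscillates between
<inline-formula><tex-math>\pm 1</tex-math></inline-formula> for <inline-formula><tex-math>\omega \leq 1</tex-math></inline-formula>, and</p>
          <p><disp-formula><tex-math>\varepsilon = \sqrt{10^{R_{dB}/10} - 1}</tex-math></disp-formula></p>
          <p>where <inline-formula><tex-math>R_{dB}</tex-math></inline-formula> is the ripple in decibels (dB).</p>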
          <p>
            As stated by Smith, type 1 Chebyshev filters grant a "faster roll-off by
allowing ripple in the passband" [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], which means undesired frequencies are quickly
and precisely cut off. This produces a clearer effect, which is the reason why
we chose this variant over the rest.
          </p>
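          <p>To make the attenuation formula more concrete, the following Python sketch evaluates it numerically. It is only an illustration under stated assumptions, not the code used in our experiments: the filter order (4) and the ripple (0.5 dB) are hypothetical values that the text does not report, and the function names are ours. The frequency is normalised by the cutoff, so that a value of 1 corresponds to the cutoff frequency.</p>
          <preformat><![CDATA[
```python
import math


def chebyshev_poly(n, omega):
    """Chebyshev polynomial of the first kind, C_n(omega), for omega >= 0."""
    if omega <= 1.0:
        # In the passband the polynomial oscillates between -1 and +1.
        return math.cos(n * math.acos(omega))
    # Above the cutoff it grows monotonically, which produces the roll-off.
    return math.cosh(n * math.acosh(omega))


def ripple_factor(ripple_db):
    """epsilon = sqrt(10^(R_dB / 10) - 1)."""
    return math.sqrt(10.0 ** (ripple_db / 10.0) - 1.0)


def attenuation_db(n, omega, ripple_db):
    """A_dB = 10 * log10(1 + epsilon^2 * C_n(omega)^2)."""
    eps = ripple_factor(ripple_db)
    return 10.0 * math.log10(1.0 + (eps * chebyshev_poly(n, omega)) ** 2)


if __name__ == "__main__":
    cutoff_hz = 2456.0  # cutoff frequency used in Experiment 2
    order = 4           # ASSUMPTION: not stated in the text
    ripple_db = 0.5     # ASSUMPTION: not stated in the text
    for freq_hz in (1000.0, 2456.0, 5000.0, 10000.0):
        omega = freq_hz / cutoff_hz  # frequency normalised by the cutoff
        att = attenuation_db(order, omega, ripple_db)
        print(f"{freq_hz:8.0f} Hz -> {att:6.2f} dB")
```
]]></preformat>
          <p>At the cutoff itself the polynomial equals 1, so the attenuation equals the ripple; below the cutoff it never exceeds the ripple, and above it the attenuation grows quickly with frequency.</p>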
          <p>In video games, high-frequency attenuation using an LPF is useful when
trying to communicate the distance at which an object is, the kind of materials
that surround the user or whether a sound comes from behind or not.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Experiment Design</title>
        <p>We designed two different experiments (referred to as Experiment 1 and
Experiment 2 throughout this text) to achieve the results that will be presented
later.</p>
        <p>Experiment 1. The first experiment was conceived as a mere pilot. It was
an online survey, distributed through social networks and completed remotely,
in which subjects had to express their opinion on the position of a sound in a
complex audio environment while using a pair of headphones. The sound was
a single, clear, high-pitched alarm tone, which came from four different places
in space (from a total of 8 possible directions), sequentially. Users did not have
any visual references: only sound. There were two audio tracks playing
simultaneously: on one hand, our alarm sound, normalized at -0.45 dB; on the other,
the complex audio of an action scene, extracted from a technical demo by Epic
Games called Showdown VR<sup>3</sup>, normalized at -3 dB. The intention pursued when
including these two tracks was to help users differentiate between our spatialised
alarm sounds and a game-like 3D sound environment, which was used as a
reference. The complex audio environment included sounds of guns firing, explosions,
cries, etc., coming from a variety of directions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 https://www.unrealengine.com/marketplace/showdown-demo</title>
      <p>There were two separate groups of 29 people each, which took two different
tests (1A and 1B). The first one (1A) included the original sound of the
aforementioned demo, along with alarm sounds coming from four positions. Everything
was recorded in Unreal Engine, using its default audio system and 3D
spatialisation for each sound. The second one (1B) included the same background track,
recorded in Unreal Engine, but the alarm sounds were processed separately, using
an LPF when they came from behind the player. We used Adobe Audition<sup>4</sup> as
a tool for designing audio with Chebyshev LPFs, achieving attenuation graphs
similar to the ones in Figure 1.</p>
      <p>Excluding the differently processed alarm sounds, both surveys had the same
structure, which was as follows:</p>
      <list list-type="bullet">
        <list-item><p>First, the users were asked to listen to an isolated, non-spatialised sample of
the alarm sound.</p></list-item>
        <list-item><p>Next, they were shown how to position sounds in a diagram like the one
in Figure 2.</p></list-item>
        <list-item><p>Then, subjects were presented with a sound track which included four
consecutive alarm sounds on top of the background noise described above. They
were asked to identify their positions and mark them in the diagram. For
example: first sound in position 4, second sound in position 1, etc.</p></list-item>
        <list-item><p>Lastly, a series of questions related to the demographic profile of each subject
were asked. They included: age, sex, level of education, diagnosed hearing
problems, perceived performance during the experiment, frequency
with which the subject plays video games and opinion on the importance of
video game audio.</p></list-item>
      </list>
      <p>
        All questions related to subjects' opinions were posed by using a Likert scale
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Experiment 2. Our second experiment (2) was an on-site test in which users
had to play a minimalistic 3D video game made in Unity<sup>5</sup>, using a
consumer-quality pair of headphones (JVC HA-X570). There were also two versions of this
test (2A and 2B), and each one was taken by 13 different subjects.</p>
      <p>This game consisted of an empty room, except for eight spheres that floated
around the player, like the ones shown in Figure 3. Subjects had to complete a
brief interactive sequence during which they had to point at and click (with the
help of mouse rotation, buttons and a crosshair at the centre of the screen) the
sphere that they thought was emitting the looping alarm sound. If the position
was correct, the sphere would stop playing its sound and start emitting a blue light,
and finally the next sphere of the sequence would start playing the same alarm
sound from a different position. When all eight spheres had been turned on, the
game ended.</p>
    </sec>
    <sec id="sec-5">
      <title>4 http://www.adobe.com/es/products/audition.html</title>
    </sec>
    <sec id="sec-6">
      <title>5 https://unity3d.com/es</title>
      <p>Experiment 2 was conceived as a way to increase feedback and allow trial and
error, so that every user would end up having information about their general
performance. Also, in this experiment time was a significant measure of how
well a subject did, as even people with many incorrect answers could finish the
experiment, and we could record the time they took.</p>
      <p>Before starting the experiment, every subject was given the following guidelines:</p>
      <list list-type="bullet">
        <list-item><p>You will play a game from a first-person perspective.</p></list-item>
        <list-item><p>You will be inside a small and dimly lit room.</p></list-item>
        <list-item><p>You will not be able to walk around. You will, however, be able to look
around using the mouse.</p></list-item>
        <list-item><p>A crosshair is shown at the centre of the screen. It indicates where you are
looking, and always follows the position of the mouse.</p></list-item>
        <list-item><p>Eight spheres will float around you. All will be at the same distance from
you, and static. They will also be at the same distance from each other,
forming a circle around you.</p></list-item>
        <list-item><p>At the beginning of the game, one of the spheres will produce a looping
alarm sound. Your task is to identify the sphere from which that sound is
coming, point at it with the crosshair, and click the left mouse button.</p></list-item>
        <list-item><p>If you identified the position correctly, the sphere will turn blue and the same
alarm will start playing from a different sphere. If you failed to identify the
position, the alarm will keep sounding until you do.</p></list-item>
        <list-item><p>The game will finish when all eight spheres are blue and you do not hear
any more sounds.</p></list-item>
        <list-item><p>You must complete the task as soon as possible.</p></list-item>
      </list>
      <p>A logger saved data from every user (e.g. total time to complete the
task and raycast hits from the crosshair), so as to be able to compare
performance between the two different audio techniques used. Besides, everyone
had to complete a small survey after finishing the experiment, in which they
gave demographic information, as in Experiment 1, and expressed their level of
agreement with the actual location of sounds. Relevant data will be detailed in
the "Results" section.</p>
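      <p>As an illustration of the kind of record such a logger could keep, the Python sketch below stores every click together with the sphere that was actually emitting sound, plus the total completion time. It is a hypothetical reconstruction: the class name, field names and the idea of passing explicit timestamps are our assumptions for this example, and they do not describe the logger used in the experiment.</p>
      <preformat><![CDATA[
```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TrialLog:
    """Hypothetical per-subject record of clicks and total completion time."""
    group: str  # "2A" or "2B"
    started_at: float = field(default_factory=time.monotonic)
    # Each entry: (seconds since start, clicked sphere, sphere emitting sound)
    clicks: List[Tuple[float, int, int]] = field(default_factory=list)
    finished_at: Optional[float] = None

    def record_click(self, clicked, target, now=None):
        """Store one raycast hit and report whether it was the right sphere."""
        now = time.monotonic() if now is None else now
        self.clicks.append((now - self.started_at, clicked, target))
        return clicked == target

    def finish(self, now=None):
        """Mark the end of the task and return the total time in seconds."""
        self.finished_at = time.monotonic() if now is None else now
        return self.finished_at - self.started_at

    @property
    def wrong_clicks(self):
        """Number of clicks on a sphere that was not emitting the alarm."""
        return sum(1 for _, clicked, target in self.clicks if clicked != target)


if __name__ == "__main__":
    log = TrialLog(group="2B", started_at=0.0)
    log.record_click(clicked=3, target=5, now=1.2)  # wrong sphere
    log.record_click(clicked=5, target=5, now=2.9)  # correct sphere
    print(log.wrong_clicks, log.finish(now=4.0))
```
]]></preformat>
      <p>Keeping the wrong clicks as well as the total time makes it possible to compare both speed and accuracy between the two versions of the test.</p>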
      <p>Audio system for Experiment 2. The only difference between 2A and 2B was
the audio processing method used by each. While 2A used the original 3D sound
system from Unity, 2B used a modified one which worked as follows. Each frame,
all audio coming from visible spheres (that is, from spheres being rendered by
the player's camera) was processed using simple stereo panning and distance
attenuation (the methods Unity uses by default), whereas an LPF was applied to sounds
coming from objects not currently being rendered. The
parameters used for the filter were a cutoff frequency of 2456 Hz and a low-pass
resonance of 1. This effect is not consistent with how attenuation works in
reality, but is clearly identifiable.</p>
      <p>
        The field of view (FOV) of the in-game camera tried to mimic that of
the frontal eye field (FEF) of a human eye (around 114 degrees [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]), so that
everything outside on-screen space could be considered to be in the rear.
      </p>
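      <p>The per-frame decision described above can be sketched in a few lines of Python. This is a simplified illustration rather than the Unity implementation: it approximates "being rendered by the camera" with a purely horizontal angle test against the 114-degree FOV, ignores the size of the spheres, and assumes that the LPF is added on top of the default panning and attenuation. All function names are hypothetical.</p>
      <preformat><![CDATA[
```python
FOV_DEGREES = 114.0     # field of view reported in the text
LPF_CUTOFF_HZ = 2456.0  # cutoff frequency reported in the text


def is_in_view(camera_yaw_deg, source_azimuth_deg, fov_deg=FOV_DEGREES):
    """Return True if the source direction falls inside the horizontal FOV."""
    # Signed angular difference wrapped into [-180, 180).
    delta = (source_azimuth_deg - camera_yaw_deg + 180.0) % 360.0 - 180.0
    return abs(delta) <= fov_deg / 2.0


def processing_for(camera_yaw_deg, source_azimuth_deg):
    """Choose the processing for one source in the current frame."""
    off_screen = not is_in_view(camera_yaw_deg, source_azimuth_deg)
    return {
        "panning_and_attenuation": True,  # ASSUMPTION: always kept
        "low_pass": off_screen,           # LPF only for off-screen sources
        "cutoff_hz": LPF_CUTOFF_HZ if off_screen else None,
    }


def sphere_azimuths(count=8):
    """Azimuths of sources evenly spaced on a circle around the listener."""
    return [i * 360.0 / count for i in range(count)]


if __name__ == "__main__":
    for azimuth in sphere_azimuths():
        chain = processing_for(camera_yaw_deg=0.0, source_azimuth_deg=azimuth)
        print(f"sphere at {azimuth:5.1f} deg -> low_pass={chain['low_pass']}")
```
]]></preformat>
      <p>Under these assumptions, a player looking straight at one of eight evenly spaced spheres would hear three of them unfiltered and the remaining five through the LPF. The cost per source is a single angle comparison, which is the reason the approach is cheap compared with HRTF convolution.</p>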
      <sec id="sec-6-1">
        <title>Hypothesis</title>
        <p>Our main hypothesis is that our proposal, a low-latency spatialisation technique
based on a position-dependent LPF, can allow for more accurate sound position
identification when compared to a 3D sound system based on panning, such as
the default audio system present in video game engines like Unreal Engine<sup>6</sup> and
Unity 3D.</p>
        <p>Therefore, the null hypothesis (H0) in this case is that performance of users
when identifying sound positions does not improve when using an LPF-based
system. The alternative hypothesis (H1) is that it does improve when using
our system.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Demography</title>
        <p>The first experiment (1) was administered to a sample of 58 people (41 men and 17
women), randomly distributed in groups of 29 for each version of the test (1A
and 1B), with average ages of 33.03 and 31.28, respectively. The ages ranged
between 22 and 51 in group 1A and between 20 and 43 in group 1B.</p>
        <p>48 of them had gone through college (17 degrees, 29 master's degrees and 2
PhDs), whereas 9 had finished their education at high school.</p>
        <p>The second experiment (2) had a smaller sample due to its face-to-face
nature, with a total of 26 subjects (19 men and 7 women), randomly distributed
in groups of 13 for each version of the test (2A and 2B). Group 2A had an
average age of 25 (18 to 38), whereas group 2B had an average age of 24.31 (18 to 37).
12 subjects were studying for a degree related to Computer Science, 9
had already finished one, 4 had a PhD and 1 had a master's degree in the
field.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6 https://www.unrealengine.com/</title>
      <sec id="sec-7-1">
        <title>Results and Discussion</title>
        <p>The goal of the first pilot (Experiment 1) was to study the possibility of a
difference in accuracy, when users try to identify a sound coming from behind,
with and without a spatialisation system based on an LPF. Only one sound came
from the rear in each version of the experiment, so we measured the number of
subjects who got the position of the sound right in each case.</p>
        <p>The first results were promising, though we could not consider them to be
statistically significant due to the high rates of failure most subjects obtained
in both cases. As can be seen in Table 1, the success rate (number of right
answers divided by the total number of subjects) of group 1A for sounds coming
from behind was a mere 31.03%, while group 1B achieved 41.38%. In spite of
reaching a difference of more than 10 percentage points, not having any set of answers with
a success rate of more than 50% led us to the conclusion that the experiment
was too difficult for a person with normal hearing.</p>
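        <p>The success rates quoted above can be reproduced with a short calculation. In the Python sketch below, the numbers of correct answers (9 and 12) are not stated in the text: they are inferred as the only whole numbers out of 29 subjects that match the reported percentages, and should be read as an assumption.</p>
        <preformat><![CDATA[
```python
GROUP_SIZE = 29  # subjects per group, as reported

# ASSUMPTION: counts inferred from the reported percentages, not stated.
CORRECT_REAR = {"1A": 9, "1B": 12}


def success_rate(correct, total):
    """Percentage of subjects who located the rear sound correctly."""
    return 100.0 * correct / total


if __name__ == "__main__":
    rates = {g: success_rate(n, GROUP_SIZE) for g, n in CORRECT_REAR.items()}
    for group, rate in rates.items():
        print(f"group {group}: {rate:.2f}%")
    diff = rates["1B"] - rates["1A"]
    print(f"difference: {diff:.2f} percentage points")
```
]]></preformat>
        <p>With groups this small, each additional correct answer changes the rate by roughly 3.4 percentage points, which helps explain why the difference could not be treated as significant.</p>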
        <p>The data collected during the second experiment (2) was more enlightening
than the previous one, as is shown in Table 2 and Figure 4. The average time
taken by subjects from group 2A to complete the task, 38.84 seconds, is far
from the average time of 23.66 seconds achieved by group 2B. This hints at an
improvement due to the utilisation of the new spatialisation system in 2B.</p>
        <p>As Figure 4 shows, results in Experiment 2 were irregular due to the
variation induced by the different levels of ability found in subjects. However,
the average time in 2A is about 1.6 times that in 2B, and the gap is even larger for the
maximum completion times of each group: while 2A reached a maximum of 80.36 seconds,
2B's highest time was 41.4 seconds.</p>
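        <p>The size of the gap between both groups can be checked directly from the reported times. The following Python sketch uses only the average and maximum completion times given in the text; the helper names are ours, and no other data from the experiment is assumed.</p>
        <preformat><![CDATA[
```python
MEAN_TIME_S = {"2A": 38.84, "2B": 23.66}  # average completion times, as reported
MAX_TIME_S = {"2A": 80.36, "2B": 41.4}    # maximum completion times, as reported


def ratio(baseline, improved):
    """How many times longer the baseline took than the improved system."""
    return baseline / improved


def relative_reduction(baseline, improved):
    """Fraction of the baseline time saved by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    mean_ratio = ratio(MEAN_TIME_S["2A"], MEAN_TIME_S["2B"])
    max_ratio = ratio(MAX_TIME_S["2A"], MAX_TIME_S["2B"])
    saved = relative_reduction(MEAN_TIME_S["2A"], MEAN_TIME_S["2B"])
    print(f"mean time ratio 2A/2B: {mean_ratio:.2f}")
    print(f"max time ratio 2A/2B:  {max_ratio:.2f}")
    print(f"mean time saved in 2B: {100.0 * saved:.1f}%")
```
]]></preformat>
        <p>These figures describe the reported summary values only; they say nothing about the spread within each group, which is what Figure 4 is meant to show.</p>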
        <p>Besides, when asked about their own performance ("Was it easy for you to
identify sound positions during the experiment?") on a 5-level Likert-type scale,
11 users in group 2B answered in a positive way: "Strongly agree" (3) or "Agree"
(8), and 2 gave neutral answers ("Neither agree nor disagree"). Users in group
2A, on the other hand, gave a total of 8 positive answers (3: "Strongly agree";
5: "Agree"), plus 5 neutral answers.</p>
        <p>The results attained by the new system lead us to reject the null hypothesis (H0) in favour of
the alternative (H1), as there is a significant difference in
performance between the two prototypes (2A and 2B). If we assume a normal
distribution of auditory skills in both groups (during the survey, none of the
subjects reported having any hearing problems), the only
difference between the two systems is the addition of an LPF to sounds coming
from behind the player, which makes this variable seemingly responsible for the
above-mentioned changes in performance.</p>
        <p>[Figure 4: completion time (in seconds) for each group, Original (2A) and LPF (2B).]</p>
        <p>Though the goals of this research were accomplished, our results would have
been more consistent with a larger set of subjects. As the second test (Experiment
2) was conducted in person, time and space constraints forced us to limit the
total number of subjects taking the experiment to 26, though having a
larger sample would have been highly beneficial.</p>
        <p>
          Additionally, it would have been useful to integrate our technique in real
video games to test its capabilities in real-world situations. This was not done
due to the lack of popular open source video games which depend heavily on
sound spatialisation, and the high development cost that changing
their audio systems would have entailed. Besides, there were far more men than women among
volunteers during our tests. Though women were distributed evenly between groups,
their small numbers could have influenced the results, as differences in hearing
between men and women have been previously reported [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Moreover, the
set of subjects does not represent a particular statistical population,
and therefore the results cannot be extrapolated to a more generic set of people.
        </p>
      </sec>
      <sec id="sec-7-2">
        <title>Conclusions and future work</title>
        <p>Given that the A and B tests differed only in their sound
processing, and considering that the results are better in the B groups for experiments 1 and
2, we can conclude that the addition of LPFs to rear sounds seems
to improve (notwithstanding the lack of realism of this technique) recognition
of those sounds in the game-like environments already tested. A probable reason
for the quicker identification of sound location when applying LPFs is the nature
of our implementation: as LPFs are only applied when sounds come from a place
that is off camera, the usual reaction for most players was to quickly turn around
every time the LPF effect was detected.</p>
        <p>Perception of self-performance was evaluated generously by both groups in
Experiment 2 (as stated in the "Results" section), as even users with the highest
times gave a neutral answer to the question on this matter, which leads us to
think there is no conscious advantage for subjects in group 2B. However, their
results were indeed better.</p>
        <p>As for future research, we think it would be interesting to test our system
when building near-field interfaces for first-person or VR games, as it can be
used in addition to a more complex and realistic audio engine, and can improve
performance when locating interactive objects in virtual environments. It would
be desirable to check if our system works the same way in a full-fledged video
game, in which the player can usually find many more auditory stimuli. It would
also be useful to build a different experiment in which all frontal sounds would
have an LPF applied, and all rear sounds would be left untouched, so as to be
able to judge if LPFs are automatically associated with places in the rear, or if they
simply induce the observed behaviour by producing contrast (and thus, pattern
learning) between two sound categories: frontal and rear.</p>
        <p>Another interesting addition to this work would be to compare the
performance of our technique to that of an engine which uses HRTFs, newer approaches
such as physics-based sound (e.g. Steam Audio), or a combination of these two.
A possibly useful experiment for this research goal would be to test user
performance when identifying sound location in three different audio environments:
one which uses only our method, a second one which uses Steam Audio
out-of-the-box, and a third one which uses a combination of both, so that LPFs are
only applied to near-field objects not currently being rendered, and the rest of
the sounds are spatialised normally.</p>
      </sec>
      <sec id="sec-7-3">
        <title>Acknowledgements</title>
        <p>This research was funded by the Complutense University of Madrid (grant
CT27/16-CT28/16 for predoctoral research), in collaboration with Santander
Bank and NIL research group.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Morimoto</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ando</surname>
          </string-name>
          , "
          <article-title>On the Simulation of Sound Localization,"</article-title>
          <source>Journal of the Acoustical Society of Japan</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>167</fpage>
          –
          <lpage>174</lpage>
          ,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Joo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Park</surname>
          </string-name>
          , "
          <article-title>Real-time Sound Propagation Hardware Accelerator for Immersive Virtual Reality 3D Audio,"</article-title>
          <source>Proceedings of the 21st ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lessiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Freeman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Keogh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Davidoff</surname>
          </string-name>
          , "
          <article-title>A cross-media presence questionnaire: The ITC-Sense of Presence Inventory,"</article-title>
          <source>Presence</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C. Phillip</given-names>
            <surname>Brown</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. O.</given-names>
            <surname>Duda</surname>
          </string-name>
          , "
          <article-title>A structural model for binaural sound synthesis,"</article-title>
          <source>IEEE Transactions on Speech and Audio Processing</source>
          , vol.
          <volume>6</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>476</fpage>
          –
          <lpage>488</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Makino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Rutkowski</surname>
          </string-name>
          , \
          <article-title>Brain Evoked Potential Latencies Optimization for Spatial Auditory Brain-Computer Interface,"</article-title>
          <source>Cognitive Computation</source>
          , vol.
          <volume>7</volume>
          , pp.
          <fpage>34</fpage>
          –
          <lpage>43</lpage>
          , Feb.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Begault</surname>
          </string-name>
          , "
          <article-title>3-D Sound for Virtual Reality and Multimedia,"</article-title>
          <source>Computer Music Journal</source>
          , vol.
          <volume>19</volume>
          , no.
          <issue>April</issue>
          , p.
          <fpage>99</fpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>W. G.</given-names>
            <surname>Gardner</surname>
          </string-name>
          and
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Martin</surname>
          </string-name>
          , "
          <article-title>HRTF measurements of a KEMAR,"</article-title>
          <source>The Journal of the Acoustical Society of America</source>
          , vol.
          <volume>97</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>3907</fpage>
          –
          <lpage>3908</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>O.</given-names>
            <surname>Warusfel</surname>
          </string-name>
          , "
          <article-title>LISTEN HRTF database,"</article-title>
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>V.</given-names>
            <surname>Algazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Duda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thompson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Avendano</surname>
          </string-name>
          , "
          <article-title>The CIPIC HRTF database,"</article-title>
          in
          <source>Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics</source>
          , pp.
          <fpage>99</fpage>
          –
          <lpage>102</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>N.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barreto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Agudelo</surname>
          </string-name>
          , "
          <article-title>HRTF Database at FIU DSP Lab,"</article-title>
          in
          <source>2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP)</source>
          ,
          <conf-loc>Dallas</conf-loc>
          , pp.
          <fpage>169</fpage>
          –
          <lpage>172</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>P.</given-names>
            <surname>Balazs</surname>
          </string-name>
          , "
          <article-title>ARI HRTF Database,"</article-title>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          and
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <source>Electronic Filter Design Handbook</source>
          .
          <publisher-name>McGraw-Hill</publisher-name>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <source>The Scientist and Engineer's Guide to Digital Signal Processing</source>
          .
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>R.</given-names>
            <surname>Likert</surname>
          </string-name>
          , "
          <article-title>A technique for the measurement of attitudes,"</article-title>
          <source>Archives of Psychology</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>140</issue>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>55</lpage>
          ,
          <year>1932</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>I. P.</given-names>
            <surname>Howard</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Rogers</surname>
          </string-name>
          ,
          <article-title>Binocular vision and stereopsis (extraits)</article-title>
          , vol.
          <volume>29</volume>
          . Oxford University Press,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M.</given-names>
            <surname>Don</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Ponton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Eggermont</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Masuda</surname>
          </string-name>
          , "
          <article-title>Gender differences in cochlear response time: an explanation for gender amplitude differences in the unmasked auditory brain-stem response,"</article-title>
          <source>The Journal of the Acoustical Society of America</source>
          , vol.
          <volume>94</volume>
          , pp.
          <fpage>2135</fpage>
          –
          <lpage>2148</lpage>
          , Oct.
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>