<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Human-in-the-loop testing of the explainability of robot navigation algorithms in extended reality</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jérôme Guzzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Giusti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dalle Molle Institute for Artificial Intelligence (IDSIA), USI-SUPSI</institution>
          ,
          <addr-line>Lugano</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Developing robots to be deployed in spaces shared with people requires testing with humans-in-the-loop. In fact, only humans co-located with the robots are in a suitable position to judge the robots' behaviors. This is especially important when people participate in the same task as the robots, like when navigating a shared environment: the actions of people and robots influence each other. To test the resulting dynamic system, we need to let real people experience the interaction, which, when done in simulation, translates to immersing real users using Virtual or Mixed Reality. We implement and demonstrate this solution where users experience variable legibility, predictability, and explainability of the robot navigation algorithms, depending on if and how the robots explicitly communicate their intentions.</p>
      </abstract>
      <kwd-group>
        <kwd>Social robotic navigation</kwd>
        <kwd>Human-in-the-loop</kwd>
        <kwd>Virtual Reality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Mobile robots are increasingly deployed in spaces shared with people: from small vacuum
cleaners to large autonomous vehicles for moving goods, their behavior impacts people surrounding
them [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In particular, when people and robots move in the same space, they take each other’s
states and actions into account to adjust their movements, with the shared goal of avoiding
collisions and approaching targets [2]. For people and animals, occupying space is an implicit
form of communication called proxemics [3]: humans are aware of personal spaces, respect
social rules when approaching others, and infer the intentions of neighbors from current and
past actions. During navigation, humans predict and account for the future trajectories of neighbors,
an ability that relies on the legibility (i.e., goals can be inferred from actions) and predictability (i.e.,
actions can be inferred from goals) of their neighbors’ actions [4].
      </p>
      <p>To promote acceptance by surrounding people, it is therefore important that the navigation
behavior of robots is legible and predictable [5], in addition to being safe, efficient, and respectful
of basic social rules. How much robots should rely on proxemics or use more explicit forms of
communication, e.g., through their gaze [6], is an open research question whose answer depends
on the context. Explicit communication is also common among humans when navigating or
driving, like nodding or waving at each other at a crossing, or signaling intentions by activating
the turn signals.</p>
      <p>Even when the individual robot behavior is based on transparent and explainable bio-inspired
rules that mimic pedestrians [7], trajectories result from a complex multi-agent dynamical
system and are less predictable, in particular in densely packed spaces. Nonetheless, co-located
humans need to understand the behavior of the robots, at least in terms of legibility and
predictability, which in this context are primary factors for explainability [8, 9].</p>
      <p>The development of navigation algorithms for social robots requires testing with
humans-in-the-loop. In fact, only people moving among the robots are in a suitable position to judge
the robots’ behaviors: they can experience the influence of the surrounding robots, and
whether and how the robots’ presence affects their psycho-physical effort. This raises the
question of how to gather human feedback during early development phases, when it is not yet
possible, or convenient, to test with real robots but simulations are available. The answer we
explore in this contribution is to use Virtual Reality to immerse subjects in simulations, where
they experience interacting with the robots with enough realism that their feedback is not
significantly affected. In fact, Virtual Reality is well suited to experiment with cognitive
Human-Robot Interaction, like when using pointing gestures to select objects or destinations [10] or
when building a wall together [11]. Simulation speeds up development, for instance, by isolating
control algorithms from perception; Virtual Reality extends it to include real humans in a
human-centred development cycle.</p>
      <p>Our research targets navigation algorithms for autonomous wheelchairs in the context of the
Horizon Europe project REXASI-PRO (REliable &amp; eXplAinable Swarm Intelligence for People
with Reduced mObility),<sup>1</sup> which is developing trustworthy-by-design tools to help people with
reduced mobility. We are designing socially compliant behaviors that are comfortable for the user
sitting in the wheelchair, as well as perceived as friendly by people sharing the same spaces. In
this work, we present a demonstration of the setup we are using to test navigation algorithms
with humans-in-the-loop: subjects wearing a wireless VR headset immerse themselves in a
Virtual Reality simulation populated with virtual wheelchairs and virtual pedestrians. They are free to
move around and experience how the virtual agents behave and react to their actions. Subjects
experience navigation scenarios in one of two modalities: in Virtual Reality, they perceive
virtual agents in a virtual environment; in Mixed Reality, they perceive virtual agents in the
real environment instead. In the scenarios, robots are equipped with different behaviors, some
using explicit communication (e.g., switching LEDs on and off) to signal their intentions. We believe
that this demonstration offers an interesting and unique opportunity for people to experience
interaction with AI agents in a simple but ubiquitous task that displays the interplay between
legibility, predictability, explainability, and communication.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Extended reality testbed for social robotics navigation</title>
      <sec id="sec-2-1">
        <title>2.1. Infrastructure</title>
        <p>To interact with virtual robots, users wear a VR headset. The headset runs an application that
connects as a client to a robotic simulator. The simulator regularly broadcasts the state of the
simulated world, which the client uses to render the scene from the point of view of its user.</p>
        <p><sup>1</sup> https://rexasi-pro.spindoxlabs.com</p>
        <p>[Figure 1: ZeroMQ on WLAN; message labels: advertise, announce, send init, sync, state.]</p>
        <p>In the following, we describe the components and their integration illustrated in Figure 1.</p>
        <p><bold>Navground</bold> Navground<sup>2</sup> is a collection of navigation algorithms and tools to
simulate, integrate, and test them. It is a cross-platform open-source project that provides benchmarks and
reference implementations for robotic navigation. Users can extend the navigation algorithms,
and other components such as kinematics models, from C++ or Python, and test them against
baselines in different navigation scenarios. It is integrated with ROS2 and with machine learning
libraries, like Gymnasium,<sup>3</sup> to support ML-based navigation policies.</p>
        <p><bold>Simulator</bold> CoppeliaSim [12] is one of the most used simulators for robotics research. It is
very flexible and supports different ways to program robots, from embedded Lua scripts to C++ plugins and ROS
interfaces. It offers an intuitive WYSIWYG interface to edit simulation
scenes. Navground supports CoppeliaSim by providing Lua bindings to configure simulations
where robots or other agents (e.g., pedestrians) use Navground’s algorithms to navigate.</p>
        <p><bold>Headset</bold> The Quest 3 from Meta is a relatively inexpensive VR headset that supports color
passthrough for Mixed Reality applications. We developed an application, using the official Unity
API,<sup>4</sup> that renders on the headset a scene simulated by CoppeliaSim on a separate computer (a
laptop, in our setup): through the application, the headset acts as a client to a server running
CoppeliaSim. The scene is rendered at up to 120 Hz on the headset (to avoid motion sickness),
interpolating and extrapolating the state, which is updated at a lower rate in CoppeliaSim (typically
20 Hz).</p>
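        <p>As a rough illustration of how a client could bridge the two rates, the sketch below estimates an object’s pose at render time from the two most recent states: linear interpolation when the render time falls between them, and constant-velocity extrapolation when it lies past the newest one. This is a minimal Python sketch under our own assumptions, not the code of the headset application (which is built with Unity): the <monospace>State</monospace> record, the planar (x, y, heading) pose, and the function <monospace>pose_at</monospace> are hypothetical.</p>
        <preformat>
```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """One received update for an object (hypothetical layout)."""
    t: float      # simulation time [s]
    x: float      # position [m]
    y: float
    theta: float  # heading [rad]
    vx: float     # linear velocity [m/s]
    vy: float
    omega: float  # angular velocity [rad/s]


def wrap(angle: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def pose_at(prev: State, last: State, t: float) -> tuple[float, float, float]:
    """Estimate (x, y, theta) at render time t from the two newest states."""
    if t > last.t or last.t == prev.t:
        # Extrapolate: assume constant velocity after the newest state.
        dt = t - last.t
        return (last.x + last.vx * dt,
                last.y + last.vy * dt,
                wrap(last.theta + last.omega * dt))
    # Interpolate: blend linearly, clamping to the buffered interval.
    a = max(0.0, (t - prev.t) / (last.t - prev.t))
    return (prev.x + a * (last.x - prev.x),
            prev.y + a * (last.y - prev.y),
            wrap(prev.theta + a * wrap(last.theta - prev.theta)))
```
        </preformat>
        <p>With updates at 20 Hz and rendering at 120 Hz, a function of this kind would be evaluated about six times per received state.</p>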
        <p><bold>Communication between headset and simulator</bold> We use ZeroMQ to synchronize the
state of the simulation between the server and the clients, as depicted in Figure 1. CoppeliaSim
instances are automatically discovered by (wireless) clients in the same local network, which
announce themselves to the server. The server sends each client a description of the current
scene containing the pose and shape of all visible objects. At each simulation step, the server
publishes the poses and velocities of objects that have moved, which clients use to update their
rendering. At the same time, clients stream the position of the headset (tracked by onboard
sensors) to the server. CoppeliaSim passes the location, shape, and velocity of each user to the
navigation algorithms controlling the simulated agents, which take them into account to avoid
collisions.</p>
        <p><sup>2</sup> https://github.com/idsia-robotics/navground
<sup>3</sup> https://gymnasium.farama.org
<sup>4</sup> https://developer.oculus.com/documentation/unity</p>
      </sec>
      <sec id="sec-2-usage">
        <title>2.2. Usage</title>
        <p>With the infrastructure described in Section 2.1, we can immerse users in any scene that we can
simulate in CoppeliaSim: “what you simulate, you can interact with”. This is very powerful
because it extends the domain of a robotic simulator like CoppeliaSim to any human-robot
interaction that does not require physical contact between robots and people.</p>
        <p><bold>Freedom of movement</bold> Users wear a wireless headset and are free to move in the space
to observe (and interact with) robots from different places and viewpoints: this is required to
experience proximity-dependent interactions.</p>
        <p><bold>Virtual and Mixed Reality</bold> Users can interact with the robots in two possible modes (see
Figure 2). In Virtual Reality, they can immerse themselves completely in a world that could be
different from, and possibly larger than, their physical environment, and focus on the interaction without
distractions. In Mixed Reality, users may feel safer as they are aware of the physical environment,
perceive other people or real robots more realistically, and perform interactions situated in the
real environment, like when interacting with a virtual robot placed on a real desk or observing
a virtual smart wheelchair passing through a real door. However, Mixed Reality is constrained
by the physical environment.</p>
        <p><bold>Multiple users</bold> The infrastructure supports multiple users immersed in the same simulation
at the same time. In Virtual Reality, users see the avatars of other users, while in Mixed Reality
they directly see other users. Moreover, in Mixed Reality, users have to physically share the
space, while in Virtual Reality they may actually be in separate spaces that only virtually overlap.
In both cases, the only requirements to support multiple users are that all headsets are connected
to the same wireless network and that users agree on a common reference frame, which is done
through a simple procedure that only needs to be performed once in each environment.</p>
        <p><bold>Displacing objects</bold> Using their hands or VR controllers, users can move virtual objects.
When an object is in the user’s hand, CoppeliaSim stops simulating its dynamics, and resumes
as soon as the user drops the object. The ability to move objects (or even robots or parts of
robots) increases the range of interactions that we can simulate. For instance, we could interact
with a robotic arm by moving a target point for its end-effector, move way-points
used by navigating robots, or modify obstacles, like when opening or closing doors.</p>
        <p><bold>Multiple scenes</bold> The same client/server applications described in Section 2.1 work with any
scene that the user selects from CoppeliaSim or from the headset: changes are propagated to all
connected clients. No software update is required when adding new scenes from CoppeliaSim:
one VR application can therefore support different scenes that developers are free to add. For
instance, researchers working with a particular robot only need to integrate it in CoppeliaSim
to be able to render it in VR.</p>
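        <p>A single client application can render arbitrary scenes because, as described in Section 2.1, it only needs a generic scene description followed by incremental updates. The sketch below mimics this logic in plain Python, without any networking. The JSON message layout, the field names (<monospace>objects</monospace>, <monospace>name</monospace>, <monospace>pose</monospace>, <monospace>velocity</monospace>), the <monospace>SceneMirror</monospace> class, and the <monospace>moved_objects</monospace> helper are all assumptions made for illustration; they do not reproduce the actual messages exchanged over ZeroMQ.</p>
        <preformat>
```python
import json


class SceneMirror:
    """Client-side copy of the simulated scene (hypothetical message layout)."""

    def __init__(self) -> None:
        self.objects: dict[str, dict] = {}

    def on_init(self, message: str) -> None:
        """Replace the whole scene with the description sent by the server."""
        self.objects = {o["name"]: o for o in json.loads(message)["objects"]}

    def on_state(self, message: str) -> None:
        """Apply an update that lists only the objects that have moved."""
        for o in json.loads(message)["objects"]:
            known = self.objects.get(o["name"])
            if known is None:
                continue  # not part of the current scene: ignore
            known["pose"] = o["pose"]
            known["velocity"] = o["velocity"]


def moved_objects(previous: dict, current: dict) -> list[str]:
    """Server side: names of objects whose pose changed since the last step."""
    return [name for name, pose in current.items() if previous.get(name) != pose]
```
        </preformat>
        <p>Because the client never refers to scene-specific object types, switching scenes in this sketch amounts to receiving a new initial description.</p>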
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Testing social navigation algorithms</title>
        <p>We use the infrastructure described in Section 2.1 to develop, test, and validate algorithms
for social navigation. After developing an algorithm with Navground, we design one or more
simulation scenes in CoppeliaSim where some of the agents are configured to use this algorithm.
Scenes reproduce many different situations, ranging from empty spaces (e.g., a plaza or a large
hall), to narrow corridors, to unstructured environments with obstacles at random positions.</p>
        <p>During user tests, subjects wearing the VR headset are immersed in the scene simulated
in CoppeliaSim, and are free to move in an area of the physical environment that is free of
obstacles. We instruct subjects to perform a navigation task (like “walk along the corridor”,
“follow the yellow arrow”, or “reach the yellow target”). During the tests, we record their poses
using the VR headset’s native tracking capabilities. At the end of the tests, we ask subjects to
provide feedback about the difficulty of the task and the behavior of the agents.</p>
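        <p>The recorded poses lend themselves to simple objective measures. The text above does not name specific metrics; purely as an illustration, the sketch below computes two common ones from time-aligned planar positions: the length of the path walked by a subject and the minimal distance kept from an agent. The function names and the data layout (lists of (x, y) tuples sampled at the same instants) are assumptions.</p>
        <preformat>
```python
import math

Point = tuple[float, float]


def path_length(path: list[Point]) -> float:
    """Total distance walked along a sequence of (x, y) positions [m]."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def minimal_distance(subject: list[Point], agent: list[Point]) -> float:
    """Smallest subject-agent distance over time-aligned samples [m]."""
    return min(math.dist(s, a) for s, a in zip(subject, agent))
```
        </preformat>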
        <p>We use objective and subjective feedback from subjects to assess and improve the agents’
navigation behavior, which we then test again. This development cycle is an effective way to
incorporate users’ feedback. Compared with real-world tests, it is easier and faster to deploy and test
algorithms in simulation, and we avoid any safety issues from potential collisions between robots
and people. In contrast with tests where subjects passively look at a simulation or video to judge
the robots’ behaviors, we can gather more realistic feedback because being part of the navigation
scenario significantly changes its perception. For instance, virtual robots
generally appear slower to passive observers watching from the outside than to observers who are
actively avoiding them in the VR simulation. Our goal is that robots are perceived as friendly by
nearby people; therefore, we should test their behavior in similar settings.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The demonstration</title>
      <sec id="sec-3-1">
        <title>3.1. Communication and explainability in social robotics navigation</title>
        <p>We propose a demonstration that focuses on the relationship between communication and
legibility for robotic navigation in spaces shared with people. As a form of implicit communication
(proxemics), people and robots partially expose their intentions by moving towards a target: if
we see that an autonomous wheelchair, traveling along a corridor, turns towards a door, we can
be quite sure that it intends to pass through that door and enter a room. In more formal terms, we
can read the intention from the trajectory, use this knowledge to predict the future trajectory,
and take actions, like moving out of its way. In a more complex scenario, like when
many robots navigate in the same space, their behaviors lose legibility because additional factors,
beyond the goal of entering the room, influence the trajectories. This in turn makes humans
less able to predict and adapt to the robots’ behavior. From the point of view of the robots,
very similar dynamics happen with respect to surrounding robots and people. This results in a
dynamical system where, although the algorithms controlling the individual (non-human) agents may
be deterministic and explainable, the collective behavior is chaotic and more opaque.</p>
        <p>To improve legibility and predictability, and to let co-located humans better understand the
behavior of robots, we can make robots expose their intentions more explicitly. In the proposed
demonstration, we test, in simulation, simple communication strategies for ground robots,
like: 1) switching on LEDs when about to enter/exit a room or pass through a narrow door;
2) projecting the goal trajectory or direction on the floor using a laser; 3) hinting at the target
goal on a front-facing screen (similar to how the Astro robot by Amazon directs its virtual eyes). How
much do these strategies impact legibility, predictability, and explainability? Which one
do surrounding people find more intuitive and comfortable? We demonstrate and investigate
these questions in a simulation with humans-in-the-loop using the Extended Reality test-bed
described in Section 2.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. User experience during the demonstration</title>
        <p>For each user, the demonstration starts with an explanation of its general functioning and goals. Then,
they move to the middle of an area free of obstacles and put on the VR headset, which immerses
them in a sequence of navigation scenes they are guided through, as illustrated in Figure 2.</p>
        <p>At first, in a simple scene where a virtual autonomous wheelchair and a virtual pedestrian
go back and forth between two points, users get comfortable with switching between Mixed
and Virtual Reality and experience how the virtual agents react to their movements. Then, they
progress through more complex scenes where, for instance, agents have to pass an area with
many crossings, or move in an indoor space where only one agent at a time can pass through
doors (see Figure 3). Users are encouraged to participate in the navigation task (e.g., enter/exit
the rooms) but may also just passively observe the scene and the behavior of the virtual agents.</p>
        <p>When users are immersed in the simulation, spectators can observe the current scene on a
computer screen, where the environment, virtual agents, and real users are rendered from a
fixed external point of view, to understand what is going on (see Figure 2).</p>
        <p>For each user, a demonstration lasts about 5 minutes. The demonstration is very similar to
the procedure we use in our lab to test and validate the navigation algorithm with
humans-in-the-loop, with the main difference that, as a demonstration, we do not follow a protocol or
record users’ actions and feedback.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>We presented an interactive demonstration where participants move among virtual autonomous
wheelchairs in Extended Reality. On one hand, participants experience how their behavior
influences the surrounding robots’ navigation behavior. On the other hand, participants also
observe how different navigation algorithms impact the legibility and predictability of the
robots, which in turn influences the participants’ actions. In particular, they experience how
explicitly exposing robots’ intentions impacts the intuitive explainability of the individual and
collective behaviors of the robots. The demonstration is part of a research effort to improve the
ability of robots to navigate spaces shared with people in a safe, efficient, and friendly way.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the Swiss State Secretariat for Education, Research and Innovation
(SERI) under contract no. 22.00291 (REXASI-PRO project). The project has been selected
within the European Union’s Horizon Europe research and innovation programme under
grant agreement ID: 101070028 (call HORIZON-CL4-2021-HUMAN-01-01). Views and opinions
expressed are however those of the authors only and do not necessarily reflect those of the
funding agencies, which cannot be held responsible for them.</p>
      <p>[2] C. Mavrogiannis, F. Baldini, A. Wang, D. Zhao, P. Trautman, A. Steinfeld, J. Oh, Core challenges of social robot navigation: A survey, ACM Transactions on Human-Robot Interaction 12 (2023) 1–39.</p>
      <p>[3] E. T. Hall, et al., Proxemics, Current Anthropology 9 (1968) 83–108.</p>
      <p>[4] A. D. Dragan, K. C. Lee, S. S. Srinivasa, Legibility and predictability of robot motion, in: ACM/IEEE Int. Conf. on Human-Robot Interaction (HRI), 2013, pp. 301–308.</p>
      <p>[5] C. Lichtenthäler, A. Kirsch, Towards legible robot navigation-how to increase the intend expressiveness of robot navigation behavior, in: Int. Conf. on Social Robotics, 2013.</p>
      <p>[6] M. M. Neggers, P. A. Ruijten, R. H. Cuijpers, W. A. IJsselsteijn, Effect of robot gazing behavior on human comfort and robot predictability in navigation, in: IEEE Int. Conf. on Advanced Robotics and Its Social Impacts (ARSO), 2022, pp. 1–6.</p>
      <p>[7] J. Guzzi, A. Giusti, L. M. Gambardella, G. Theraulaz, G. A. Di Caro, Human-friendly robot navigation in dynamic environments, in: IEEE Int. Conf. on Robotics and Automation (ICRA), 2013, pp. 423–430.</p>
      <p>[8] A. Abdul, J. Vermeulen, D. Wang, B. Y. Lim, M. Kankanhalli, Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda, in: Conf. on Human Factors in Computing Systems (CHI), 2018, pp. 1–18.</p>
      <p>[9] S. Wallkötter, S. Tulli, G. Castellano, A. Paiva, M. Chetouani, Explainable embodied agents through social cues: a review, ACM Transactions on Human-Robot Interaction (THRI) 10 (2021) 1–24.</p>
      <p>[10] J. Guzzi, G. Abbate, A. Paolillo, A. Giusti, Interacting with a conveyor belt in virtual reality using pointing gestures, in: ACM/IEEE Int. Conf. on Human-Robot Interaction (HRI), 2022, pp. 1194–1195.</p>
      <p>[11] S. Shayesteh, H. Jebelli, Toward human-in-the-loop construction robotics: Understanding workers’ response through trust measurement during human-robot collaboration, in: Construction Research Congress, 2022, pp. 631–639.</p>
      <p>[12] E. Rohmer, S. P. N. Singh, M. Freese, V-REP: A versatile and scalable robot simulation framework, in: IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2013, pp. 1321–1326.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Syrdal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dautenhahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Koay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Walters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <article-title>Sharing spaces, sharing lives-the impact of robot mobility on user perception of a home companion robot</article-title>
          ,
          <source>in: Int. Conf. on Social Robotics</source>
          , Springer,
          <year>2013</year>
          , pp.
          <fpage>321</fpage>
          -
          <lpage>330</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>