<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshops at the Third International Conference on Hybrid Human-Artificial Intelligence (HHAI), June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Intention Recognition and Communication for Human-Robot Collaboration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hadi Banaee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franziska Klügl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fjollë Novakazi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephanie Lowry</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Applied Autonomous Sensor Systems (AASS), Örebro University</institution>
          ,
          <addr-line>Fakultetsgatan 1, 701 82 Örebro</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>Human-robot collaboration follows rigid processes in order to ensure safe interactions. In case of deviations from predetermined tasks, processes typically come to a halt. This position paper proposes a conceptual framework for intention recognition and communication, enabling a higher granularity of understanding of intentions to facilitate more efficient and safe human-robot collaboration, especially in the event of deviations from expected behaviour.</p>
      </abstract>
      <kwd-group>
        <kwd>Intention recognition</kwd>
        <kwd>intention granularity</kwd>
        <kwd>human-robot collaboration</kwd>
        <kwd>human-robot communication</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The promise of Industry 4.0 is a paradigm shift towards interconnected manufacturing systems
that leverage advanced technologies such as artificial intelligence and automated technologies
to optimise production processes. As removing humans from manufacturing is not a viable
option [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the focus is instead on finding ways to integrate humans and machines to work
collaboratively and efficiently. The reason for the development of mixed human-robot teams
in various application areas, such as assembly and transportation, is to combine the flexibility,
adaptability, and problem-solving skills of humans with the precision and efficiency of robots. In
the past, this has been accomplished by organising activities in highly controlled environments
where robots and humans are kept apart by enforcing safeguards, such as maintaining spatial
or temporal distances between them and assigning tasks or objectives to each agent, whether
they are human or robotic.
      </p>
      <p>For a successful transition into an industry 4.0 and subsequently 5.0 setting, robots need to
have the ability to coexist and interact with humans in both physical and social settings. This
entails creating a safe environment for humans where they can perceive, interpret, and respond
to the actions and intentions of robots, and vice versa. This, though, becomes a challenge in the
event of deviations from the established processes or assigned tasks. While a simple intention
recognition (IR) approach places the responsibility exclusively on the robotic agent to detect
actions and adapt its behaviour accordingly, a more effective approach would integrate a
hierarchical IR with communication strategies to detect and clarify the reasons for deviations.
This would allow the sharing of responsibilities to ensure a seamless continuation of teamwork
and ultimately, productivity in the evolving industry landscape.</p>
      <p>To achieve this goal, we argue that there is a need to identify the appropriate level of
granularity, sequence, and abstraction of tasks as intentions for an IR and communication
framework in human-robot teams. Therefore, this position paper explores alternative solutions
driven by an intention recognition approach to managing a human-robot collaboration (HRC),
or mixed-agent teams, to enable more effective and seamless interactions between humans and
robots that do not require rigid safety constraints, but instead rely on communication strategies
to enable the handling of deviations and enhanced reactions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        For HRC to be successful, the robot must analyse and understand human intentions, as well
as effectively convey its own goals. Intention refers to the mental state or attitude of aiming
to do or achieve something [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It involves a conscious decision or plan to act, often driven
by a purpose or goal. Hence, understanding intent involves identifying the occurring activity,
inferring the objectives of the task, and predicting the next actions. In an HRC context, IR refers
to the process of identifying the intentions of agents, whether they are human or robotic, by
examining their sequences of actions and/or analysing the impact of their actions on the state of
the environment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In other words, IR can be defined as the process of inferring the intentions
of an agent by analysing their behaviour [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Hence, in this paper, we define IR as the capacity
to identify the particular goal being pursued by precisely discerning the exact course of action
that is taken [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>The current research body presents various methodologies to address this challenge. One of
the driving ideas is that robots need to be more proactive in their interactions with humans,
which puts IR at the centre of possible solutions. For example, Tong and colleagues [6] explored
a method for proactive human IR based on context changes and triggers, utilising vision-based
technologies. However, reliance on a single feature can make such a system unreliable. Other
research examined how two-way IR and communication affect HRC by predicting what people
are going to do, such as picking up an object, to help team members collaborate better [7]. In
yet another attempt to explore task sequences as a source for IR, the trajectory of human
actions was analysed to predict the subsequent actions [8], thus breaking down intentions into
smaller actions. A different study utilised an inverse planning approach to IR, putting forward
a logic-based approach for fully observable systems. This approach inferred the human’s goal
by observing a sequence of actions and still allowed for small deviations in the sequence where
the human performs their actions in a different order [9].</p>
      <p>These approaches do not address the granularity needed to recognise intermediate intentions
hierarchically. Moreover, deviations are studied only by mapping a limited set of actions to a single
intention as the overall goal. This creates a challenge for complex HRC, since inferring only the
final goal as an intention from atomic actions might not lead to enhanced decision-making.</p>
      <p>Typically, in HRC, communication (whether explicit or implicit) is often used to convey an
intention from the robot to the human [10]. However, the framework presented here aims to
facilitate purposeful communication in the event of a deviation. This allows the robot to adjust
its decision-making and subsequent reaction when interacting with a human agent.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Conceptualisation of IR Framework</title>
      <p>In the context of mixed human-robot teams, intention recognition is crucial to achieving two
main challenges: 1) safety enhancement by avoiding potential harm or hazards, and 2) efficiency
by providing proper support and assistance from the agents. Our conceptual framework for
intention recognition and communication addresses these challenges by focusing on aspects
of the appropriate level of granularity, sequentiality of actions, and deviation in performing
the tasks. To formalise the conceptual framework, first, we clarify the input of the framework
in such a context. Then, we specify in which situation the proposed intention recognition
framework will be triggered.</p>
      <p>In a human-robot environment, the agents (i.e., humans and/or robots) are asked to fulfill a
certain goal by performing a sequence of predetermined tasks. Each of the predetermined tasks
might be either at the level of primitive observable actions (e.g., pick a tool) or at a higher level
of abstraction (e.g., repair the machine). The agents should have a shared understanding of the
tasks and the overall team goal. However, situations may arise when an agent deviates from the
shared plan. In such a situation, the other agents need to recognise the intentions behind the
deviation and react accordingly, to avoid unnecessary interruptions to the workflow.</p>
      <sec id="sec-3-1">
        <title>3.1. Conceptual Model</title>
        <p>Our proposed conceptual model takes into account the following three aspects:
Temporal sequence of actions: An intermediate intention may not be directly observable
from a single action. Therefore, it is important to consider the sequence of actions to infer the
intermediate intention behind them. For example, in the kitting task below, a robot follows
the sequence of the move–pick–move–place actions to fulfill the intermediate intention of
“collecting one item”.</p>
        <p>Granularity of intentions in a hierarchical structure: One can consider primitive actions
as the first level of intentions. The combination of these intentions can then lead to inferring a
higher abstraction level of intermediate intentions. But these intermediate intentions can also
be combined to infer even higher levels of abstractions as intermediate intentions. The highest
level of abstraction will be then the shared intention of the team as the overall goal. Note that
the granularity of the intentions is not fixed and can be adjusted based on the context and the
requirements of the task.</p>
        <p>Deviation from the predetermined tasks: In the process of inferring the intermediate
intentions toward achieving the goal, the agents should be able to detect a deviation from the
shared plan. The deviation is either detected by observing an unexpected sequence of primitive
actions or is identified after inference of intentions, i.e., as a deviation in the intermediate
intentions.</p>
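        <p>To make the granularity aspect concrete, the hierarchy described above can be modelled as a simple tree, where primitive actions are leaves and intermediate intentions are internal nodes. The following is a minimal illustrative sketch; all class and variable names are our own hypothetical choices, not part of the framework itself:</p>

```python
# Illustrative sketch of a hierarchical intention structure (hypothetical
# names): primitive actions are leaves, intermediate intentions are internal
# nodes, and the root is the shared team goal.
from dataclasses import dataclass, field

@dataclass
class Intention:
    name: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the primitive observable actions underlying this intention."""
        if not self.children:
            return [self.name]
        actions = []
        for child in self.children:
            actions.extend(child.leaves())
        return actions

# Level 0: primitive observable actions, as in the kitting task
sequence = [Intention("move"), Intention("pick"),
            Intention("move"), Intention("place")]

# Level 1: an intermediate intention inferred from the temporal action sequence
collect_item = Intention("collect one item", sequence)

# Level 2: the highest abstraction is the shared team goal
complete_kit = Intention("complete the kit", [collect_item])

print(collect_item.leaves())  # ['move', 'pick', 'move', 'place']
```

        <p>Because the depth of this tree is not fixed, the same structure accommodates the adjustable granularity discussed above.</p>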
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Framework Processes</title>
        <p>Based on the concepts illustrated in Figure 1, the following processes are required for the overall
adaptive intention recognition and communication framework.
1) Observation and Context Analysis: The agents follow the shared tasks and observe the
primitive actions of the other agents, considering the context and the goal of the task. These
observable actions will be the ingredients of the IR process.
2) Intention recognition: The sequence of observed actions is analysed to infer the
intermediate intentions of the other agents, considering the temporality of the actions and the
granularity of the lower-level intentions. These inferences can be made at higher levels of
intention abstraction until they lead to meaningful decision-making.
3) Deviation Detection: The agents recognise the deviation from the shared tasks by either
observing the sequence of actions performed by the other agent, or at a higher level of abstraction,
by recognising the change in the intermediate intentions.
4) Adaptation &amp; Reaction: Based on the deviation detection, and the expected intermediate
intentions based on the shared tasks, the agents can make a decision based on the higher-level
abstract intention behind the deviation and react accordingly to the situation. The proper
reaction depends on the level of abstraction of the intention and the context of the task. This
component can either lead to a direct execution of the reaction or may infer further adaptation
based on the complexity of the situation, which leads to the next step (i.e., communication).</p>
        <p>[Figure 2: components of the framework: Observation &amp; Context Analysis, Intention Recognition, Deviation Detection, and Communication.]</p>
        <p>5) Communication: In most cases, before deciding whether a reaction is proper to be
executed, further adaptations are needed. To do so, the agents communicate the reasons behind
the deviation, to ensure a shared understanding of the task and the goal. This component,
along with the others, can be seen as a cycle, as the new adaptation driven by the performed
communication can lead to further iterations of intention recognition.</p>
        <p>These components as the main building blocks of the proposed framework create a loop
of intention recognition and communication, which ensures the shared understanding of the
task and the goal, and also the safety and efficiency of the agents in a mixed human-robot
environment. Figure 2 illustrates the components of the proposed framework and the loop of
intention recognition and communication.</p>
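        <p>As a rough illustration, the loop of processes described above can be sketched in a few lines of code. The rule-based matching below is a deliberately simplified stand-in for the actual inference mechanisms, and all function names and the expected-plan encoding are hypothetical:</p>

```python
# Minimal sketch of the recognition/communication loop (hypothetical names;
# a rule-based stand-in, not the authors' implementation).

EXPECTED = ["move", "pick", "move", "place"]  # shared-plan action sequence

def recognise_intention(actions):
    """Step 2: map an observed action sequence to an intermediate intention."""
    if actions == EXPECTED:
        return "collecting one item"
    return "unknown"

def detect_deviation(actions):
    """Step 3: a deviation is an observed sequence departing from the plan."""
    return actions != EXPECTED[:len(actions)]

def react(actions):
    """Steps 1-5: observe, recognise, detect deviation, adapt or communicate."""
    intention = recognise_intention(actions)
    if detect_deviation(actions):
        # Step 5: communicate to clarify the reason behind the deviation,
        # then iterate the loop with the updated shared understanding.
        return "communicate"
    return f"continue ({intention})"

print(react(["move", "pick", "move", "place"]))  # follows the plan -> continue
print(react(["move", "move"]))                   # unexpected sequence -> communicate
```

        <p>In a realistic system the matching would operate over the whole intention hierarchy rather than a single flat sequence, but the cyclic control flow is the same.</p>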
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Illustrative Scenario: Large-Scale Kitting Task</title>
      <p>To demonstrate the proposed framework, we consider a large-scale kitting task in an industrial
environment. The overall goal of the kitting task is to collect items from storage racks around a
kitting area and place them on a central table. Figure 3 illustrates a simple representation of the
kitting task scenario with a human and a robot in the environment.</p>
      <p>In this scenario, we assume there are two agents – a human and a robot – working
collaboratively to complete this task. Each agent has access to its designated half of the storage racks
and the kitting table, and the predetermined plan they are following ensures that – to minimize
collisions and disruptions – they should not enter each other’s designated zones.</p>
      <p>The agents have a shared understanding of the tasks and the goal and they can observe
the actions of each other within the environment. The robot is equipped with the proposed
intention recognition framework: it can observe human actions, infer human intentions, detect
deviations in various levels of abstraction, and react accordingly.</p>
      <p>When a deviation from the shared tasks is detected, the robot adapts its reactions based on the
inferred intentions and communicates with the human if necessary. The robot can then make a
decision based on the higher-level abstract intention behind the deviation and react accordingly
to the situation. We emphasise the differences between the possible reactions: 1) an intrinsic
reaction, where the robot directly reacts to the deviation by considering only the observed actions,
or 2) an enhanced reaction, where the robot infers the intentions and communicates with the human.</p>
      <p>[Figure 3 caption: The agents retrieve items from the storage boxes around the edges of the kitting area and place them on the kitting table; the human should collect items from the boxes on the left-hand side and the robot should collect items from the blue boxes on the right-hand side; deviations from the plan may occur, e.g., the human may cross the central dividing line to enter the robot’s work area.]</p>
      <p>We demonstrate two examples of deviations that may occur during the execution of the
kitting task and discuss how the proposed framework can be applied to handle these situations
with enhanced reactions based on inferred intentions and communication in comparison with
the intrinsic reaction.</p>
      <sec id="sec-4-1">
        <title>Scenario 1:</title>
        <p>The robot fails to pick up an item because the items are placed in the box in an
unfortunate way. The intrinsic reaction of the robot is to give an alarm signal and then wait for
the human to come and help pick up the difficult item. According to the intrinsic reaction, the
robot must remain stationary while the human is within the robot’s designated zone to avoid
any possible collisions.</p>
        <p>However, an enhanced reaction is for the robot, once it has informed the human of the picking
failure, to deviate from the predetermined plan and continue to collect other items from the
rack while the human retrieves the difficult item. The robot will re-plan to ensure that there
is no collision with the human – that is, it must only collect items that do not interfere with
the human’s ability to retrieve the difficult item. The IR framework will enable the robot to
recognise that the human has deviated from the predetermined plan to help the robot retrieve
the difficult item, and allow the robot to communicate its own new intentions while the human
is within the robot’s designated zone. This enhanced reaction can potentially avoid the collision
and improve the efficiency of the task.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Scenario 2:</title>
        <p>The robot has lost an item on its way to the kitting table, and the human crosses
into the robot’s designated zone to help retrieve the item. The intrinsic reaction of the robot
is to stop to avoid collisions with the human. An enhanced reaction could be to recognise that
the human has come to help, and to communicate about what to do with the lost item and/or who
brings it to the table. This enhanced reaction can potentially improve the efficiency of the task
by resolving the ambiguities in the plan execution.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. The Role of Communication for IR</title>
      <p>In our proposed framework, the inferences driven by IR enable humans and robots to adapt and
react safely and appropriately to deviations in a predetermined workflow. When deviations
occur, humans and robots need to communicate and negotiate to enable reasonable
decision-making, to react appropriately, and to allow the continuation of the workflow in a safe and
efficient manner. The proposed framework introduces a hierarchy for inferring intentions by
introducing semantically meaningful knowledge on a higher level, which enables more precise
communication on any of the intermediate intention levels. It is a tool to reduce ambiguity, and
consequently, uncertainty in the intention recognition framework.</p>
      <p>Communication is an essential component of the intention recognition – adaptation/reaction
– communication cycle. Through more purposeful understanding and recognition of each other’s
intentions, empowering more expressive communication strategies, the team becomes more
of a peer-to-peer system, assisting each other to clarify the inferred intentions, aligning the
team’s mental model, thereby enhancing the process instead of stopping operations in case of
deviations.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the Swedish Knowledge Foundation in the TeamRob Synergy
Project (contract number 20210016).</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[6] T. Tong, R. Setchi, Y. Hicks, Context change and triggers for human intention recognition,
Procedia Computer Science 207 (2022) 3826–3835.</p>
      <p>[7] M. L. Chang, R. A. Gutierrez, P. Khante, E. S. Short, A. L. Thomaz, Effects of integrated
intent recognition and communication on human-robot collaboration, in: 2018 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS), IEEE, 2018, pp. 3381–3386.</p>
      <p>[8] J. Zhang, H. Liu, Q. Chang, L. Wang, R. X. Gao, Recurrent neural network for motion
trajectory prediction in human-robot collaborative assembly, CIRP Annals 69 (2020) 9–12.</p>
      <p>[9] S. Buyukgoz, J. Grosinger, M. Chetouani, A. Saffiotti, Two ways to make your robot
proactive: Reasoning about human intentions or reasoning about possible futures, Frontiers
in Robotics and AI 9 (2022) 929267.</p>
      <p>[10] R. Salehzadeh, J. Gong, N. Jalili, Purposeful communication in human–robot collaboration:
A review of modern approaches in manufacturing, IEEE Access 10 (2022) 129344–129361.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kolbeinsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lagerstedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lindblom</surname>
          </string-name>
          ,
          <article-title>Foundation for a classification of collaboration levels for human-robot cooperation in manufacturing</article-title>
          ,
          <source>Production &amp; Manufacturing Research</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>448</fpage>
          -
          <lpage>471</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Merriam-Webster</surname>
          </string-name>
          , Intention, n.d. URL: https://www.merriam-webster.com/dictionary/intention, accessed 16 Apr.
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Sadri</surname>
          </string-name>
          ,
          <article-title>Logic-based approaches to intention recognition, in: Handbook of research on ambient intelligence and smart environments: Trends and perspectives</article-title>
          ,
          <source>IGI Global</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>346</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Belle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Petrick</surname>
          </string-name>
          ,
          <article-title>Intention recognition with problog</article-title>
          ,
          <source>Frontiers in Artificial Intelligence</source>
          <volume>5</volume>
          (
          <year>2022</year>
          )
          <fpage>806262</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Porteous</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lindsay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Charles</surname>
          </string-name>
          ,
          <article-title>Communicating agent intentions for human-agent decision making under uncertainty</article-title>
          ,
          <source>in: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, AAMAS '23</source>
          , International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC,
          <year>2023</year>
          , pp.
          <fpage>290</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>