=Paper=
{{Paper
|id=Vol-3762/543
|storemode=property
|title=The NEMO co-pilot
|pdfUrl=https://ceur-ws.org/Vol-3762/543.pdf
|volume=Vol-3762
|authors=Stefania Costantini,Pierangelo Dell'Acqua,Giovanni De Gasperis,Francesco Gullo,Andrea Rafanelli
|dblpUrl=https://dblp.org/rec/conf/ital-ia/CostantiniDGGR24
}}
==The NEMO co-pilot==
<pdf width="1500px">https://ceur-ws.org/Vol-3762/543.pdf</pdf>
<pre>
The NEMO co-pilot

Stefania Costantini (1,2,*,†), Pierangelo Dell’Acqua (3,†), Giovanni De Gasperis (1,†),
Francesco Gullo (1,†) and Andrea Rafanelli (4,1,†)

(1) Department of Information Engineering, Computer Science and Mathematics, University of L’Aquila, L’Aquila, Italy
(2) Gruppo Nazionale per il Calcolo Scientifico - INdAM, Roma, Italy
(3) Department of Science and Technology, Linköping University, Linköping, Sweden
(4) Department of Computer Science, University of Pisa, Italy


Abstract
In this work, we describe an agent to be employed in Human-AI Teaming in various, even critical,
domains, based upon affective computing, empathy, and Theory of Mind, and a description of the user
profile and of the operational, professional, and ethical requirements of the domain in which the
agent operates. The architecture of the proposed agent encompasses a Knowledge Graph, a Neural
component and a Behaviour Tree. We briefly discuss a case study.

Keywords
Human-AI Interaction, Human-AI Teaming, Trustworthy AI, Responsible AI


1. Introduction

One recent focus in Artificial Intelligence (AI) is building intelligent systems where humans and
AI systems form teams. The aim is to exploit the potentially synergistic relationships between
humans and automation, thus devising “hybrid” systems where the partners should cooperate to
perform complex tasks, possibly involving a high degree of risk. As a simple example, in an
AI-supported self-driving or assisted-driving vehicle, the AI component can be expected to evaluate
and co-manage situations and risks, while the driver can provide the AI component with useful
information on practical driving in all conditions and can self-manage the risks should the
circumstances require it. Human-automation interaction is, in fact, one of the main themes of
Human-centered AI. This issue also falls in the realm of Trustworthy AI, whose requirements include
respect for human autonomy, prevention of harm, fairness, and explainability, and of Responsible AI,
whose goal is to employ AI in a safe, trustworthy and ethical fashion.

AI and humans, if working together in Human-AI Teaming (HAIT), can produce results exceeding what
either can achieve alone, while they can control and improve each other. For instance, a human
driver might train to cope with previously unseen situations through co-driving automation via a
cooperative task shared between the human driver and the AI-based system installed on the vehicle.
At the same time, AI helps drivers in case of difficulties and immediate risks. In this synergistic
relationship, humans may improve automation efficacy and capabilities. At the same time, automation
may enhance human performance in a task and compensate for human inadequacies, catching and
correcting possible misbehaviors, possibly also due to physically or emotionally impaired states,
and providing valuable suggestions.

To adopt AI agents in crucial tasks such as, e.g., improving caregiving in medicine and teaching,
and constructing effective human-AI teams, agents should be endowed with an emotion recognition and
management module, capable of empathy, and modelling aspects of the Theory of Mind (ToM), in the
sense of being able to reconstruct what someone is thinking or feeling. Modelling a Theory of Mind
is often based on forms of “Affective Computing”, which is a set of techniques aimed at eliciting a
human’s emotional condition from physical signs, to enable the system to respond intelligently to
human emotional feedback.

In this work, we describe an agent to be employed in HAIT, based upon affective computing, empathy,
and Theory of Mind, and a description of the user profile and of the operational, professional and
ethical requirements of the domain in which the agent operates. The architecture of the proposed
agent encompasses a Knowledge Graph, a Neural component and a Behavior Tree.

Ital-IA 2023: 3rd National Conference on Artificial Intelligence, organized by CINI, May 29–31, 2023, Pisa, Italy
* Corresponding author.
† These authors contributed equally.
Email: stefania.costantini@univaq.it (S. Costantini); pierangelo.dellacqua@liu.se (P. Dell’Acqua);
giovanni.degasperis@univaq.it (G. D. Gasperis); francesco.gullo@univaq.it (F. Gullo);
andrea.rafanelli@phd.unipi.it (A. Rafanelli)
URL: http://www.di.univaq.it/stefcost (S. Costantini); https://dellacqua.se/ (P. Dell’Acqua);
https://fgullo.github.io/ (F. Gullo)
ORCID: 0000-0002-5686-6124 (S. Costantini); 0000-0003-3780-0389 (P. Dell’Acqua);
0000-0001-9521-471 (G. D. Gasperis); 0000-0002-7052-1114 (F. Gullo); 0000-0001-8626-2121 (A. Rafanelli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org


2. Background

2.1. Behavior Trees

Behavior Trees (BTs) were introduced as a tool to enable modular AI in computer games. A behavior
tree is essentially a mathematical model of plan execution, where each element (task and action) of
a plan is associated with a node in the tree. Their strength comes from their ability to create
complex tasks composed of simple tasks without worrying about how the simple functions are
implemented. For a comprehensive survey of BTs in Artificial Intelligence and Robotic applications,
see [1, 2]. A BT is a directed acyclic graph consisting of different types of nodes, each one
associated with executable code (where such code enacts an element composing a plan). In most
cases, a BT is tree-shaped, hence the name. However, unlike a traditional tree, a node in a BT can
have multiple parents, allowing the reuse of that part of the tree. The traversal of a behavior
tree starts at the top node. When a node is traversed, the associated code is executed, returning
one of the three states: success, failure, or running.

The critical nodes in a BT include leaf nodes and inner nodes. An action is a leaf node
representing a behavior that the character can perform. The action returns success or failure when
it completes its execution, depending on the outcome. An action is depicted as a white circle. A
condition is a leaf node that checks an internal or external state. It returns either success or
failure. A condition is represented as a grey rounded rectangle. A sequence selector is an inner
node that typically has several child nodes that are executed sequentially. Once a child node
completes its execution successfully, the sequence selector continues executing the next child
node. If every child node returns success, then the sequence selector returns success. If one of
the child nodes returns failure, the sequence selector immediately returns failure. A sequence
selector is depicted as a grey square with an arrow across the links to its child nodes. A priority
selector is an inner node. It has a list of child nodes that it tries to execute one at a time
until one of the child nodes returns success. If none of the child nodes executes successfully, the
priority selector returns failure. A priority selector is represented with a grey circle with a
question mark.

2.2. Neural Empathy-Aware Behavior Trees

To consider empathy and mimic human decision-making, in [3] we introduced neural empathy-aware
behavior trees (NEABTs) by introducing a selector node called emotional selector, an empathy node,
and a neural node.

The emotional selector is a node that orders its child nodes based on the agent’s current affective
state. The agent elaborates on the affective state during repeated interactions with the user and
then tunes its reaction accordingly. Once the ordering has been established, the emotional selector
behaves as a priority selector. A white circle with the character E represents an emotional
selector. In contrast, an empathy node provides an emotional evaluation of its single child node.
An empathy node can only be a child of an emotional selector. Its child can be a leaf or an inner
node. A dashed circle with the name of the empathy emotion represents an empathy node.

To enable the integration of deep learning models for emotion recognition and symbolic models for
planning and decision-making within emotional behavior trees, we introduced neural nodes. A neural
node takes the current state of the environment and agent as input and, using a deep learning
model, makes inferences about the emotional state. It contains a model, such as an emotion
recognition system, that estimates the emotional state. These estimates are then mapped into the
agent’s affective state variables that parameterize the emotional selector. The neural node
continually updates the agent’s internal emotional state, allowing the dynamic adaptation of
behavior trees to the emotional context.

2.3. Knowledge Graphs

A Knowledge Graph (KG) [4, 5] is a particular type of knowledge base [6] where knowledge is
organized in a graph-like structure, i.e., with triples that define relationships (edges) among
entities (nodes) of interest. KGs are also known as information graphs [7], or heterogeneous
information networks [8].

KGs have been extensively used in a plethora of application scenarios, including knowledge
completion [9], head/tail prediction [10], rule mining [11], query answering [12], and entity
alignment [13, 14, 15]. KGs have also recently emerged as supporting tools for Retrieval-Augmented
Generation (RAG) for Large Language Models (LLMs) [16, 17, 18, 19].

A well-established technique that is commonly exploited for tasks on KGs is Knowledge Graph
Embeddings (KGEs) [20, 10]. KGEs generate numerical vector representations for entities and
relationships of a KG, thus making them amenable to be processed in downstream tasks where a
numerical representation is required (e.g., neural network-based machine-learning tasks). Although
KGEs can differ (significantly) from one another in their definition, a shared key aspect of all
KGEs is that they are typically defined based on a so-called embedding scoring function or simply
embedding score. This function quantifies how likely a triple exists in the KG based on the
embeddings of the entities and the relationship of that triple. Several KGEs have appeared in the
last few years. The distinctive features among embeddings are the score
function and the optimization loss. Translational embeddings in the TransE [21] family and the
recent PairRE [22] assume that the relationship of a triple performs a translation between the
entities of that triple. Semantic embeddings, such as DistMult [11] or HolE [23], interpret the
relationship as a multiplicative operator. Complex embeddings, such as RotatE [24] and ComplEx
[25], use complex-valued vectors and operations in the complex plane. Neural-network embeddings,
such as ConvE [26], perform sequences of nonlinear operations.

3. Framework

The architecture of the proposed agent is illustrated in Figure 1. The main components of the
architecture are the User, the Environment, a Knowledge Graph (KG), and a Behavior Tree (BT). The
overall interaction between such components is described next.

[Figure 1: The NEMO co-pilot framework. Labelled elements: Environment and User, each with a Sensor;
Knowledge Graph (Domain Knowledge, User Profile); KG encoder and KG-to-NEABT decoder producing KG';
NEABT with neural node N, emotional selector E, empathy nodes e1, e2, ..., each over a subtree with
a condition on (KG', Env) and actions A1, A2, ..., An; outputs "Agent's suggested action to User",
"Agent Action" and "Detected User Emotion" feeding the Aggregator; User Feedback; NEABT-to-KG
encoder and KG decoder.]

The BT is fed with signals from Environment, User, and KG. Such signals are exploited by the BT to
perform its computation and to output (i) an action to be suggested to the User, (ii) an action
actually performed by the agent (e.g., an empathetic action), and (iii) the User’s emotion detected
by its neural node (‘N’, see below). The BT’s threefold output passes through an “Aggregator”,
responsible for suitably aggregating and presenting the three BT outputs to the User. The
Aggregator may perform something either very simple (e.g., just derive a textual representation of
the three outputs and concatenate them) or more sophisticated (e.g., exploit a large language model
(LLM)). The BT’s outputs and the User’s feedback are in turn used to update the KG. This way, we
have a loop-back mechanism in which the KG is exploited by the BT for its internals, and the BT is
exploited to update the KG properly.

Next, we describe the User, Environment, KG, and NEABT components in more detail.

User. The User performs reactions and actions based on the signals provided by the NEABT. The
user’s sensory data flow into the NEABT through a sensor, which represents them in some proper
numerical format. Also, the User’s feedback (e.g., whether, or to what extent, the User has adopted
the Agent’s suggested action) is sent back to the KG. The User’s reactions/actions are assumed to
be determined by all three types of BT output. In particular, the User’s emotion detected by the BT
at the previous iteration is important for establishing the emotional conditions that most
influence the user.

Environment. Signals from the surrounding environment are detected by a sensor, which represents
them in some numerical format, and are thus ready to be processed by the NEABT (along with the KG
representation).

KG. The KG contains information about domain knowledge and user profile. The KG’s information is
provided to the NEABT in a twofold form. It is first encoded in some proper numerical format and
passed to the BT’s neural node (see below). The encoding is performed by a KG encoder component,
which can be implemented, e.g., with a KGE (see Section 2.3). The KG’s encoded information is then
decoded into a format suitable for processing by the internal nodes of the BT. A KG-to-BT decoder
performs this decoding. It can be implemented, e.g., as a neural network component whose training
can be performed on a ground truth defined through either manual annotation or the agent’s
historical data. The KG is fed the BT’s output and user feedback. These inputs to the KG are
represented in a format suitable for updating the KG, e.g., a set of KG triples to be added and a
set of KG triples to be removed. Such a translation from the BT’s and User’s signals to a KG
updating signal is performed by a further encoder-decoder component. Again, such an encoder-decoder
can be implemented as a neural network and trained with a ground truth defined manually or through
historical data.

NEABT. The NEMO framework deploys a NEABT as a behavior tree. The BT’s neural node receives the
KG’s information and the user’s sensory data and makes inferences about the user’s emotional state.
These estimates are mapped into the user affective state variables that parametrize the neural
node’s child, the emotional selector. In turn, the emotional selector passes the values of the
affective state variables to its child nodes, the empathy nodes. Each child empathy node provides
an empathic evaluation of its subtree. In Figure 1, every subtree has a root node that is a
sequence selector with a condition node as a child and several action nodes. The condition child
node returns success/failure by performing a test condition upon the input pair (KG’, Env). The
corresponding action child nodes are executed if the condition node returns success. By doing so,
the NEABT can execute actions over the environment. Some of these action nodes define the BT’s
threefold output.

4. Case Study: Driver Co-Pilot

Here, we envision a case study that involves developing an intelligent agent that actively
functions as a "companion" (co-driver) and support system for drivers. The agent will assist
drivers by providing interventions in risky situations that may arise due to external circumstances
and/or the driver’s health condition and emotional state, taking into account emotional aspects
that could impact driving performance.

The intelligent agent will also be trained through interaction with the human user following the
recent "Human-AI teaming" paradigm. A human driver could cooperatively train the agent by
collaboratively performing various driving-related tasks, even under challenging scenarios. In this
synergistic relationship during the training phase, humans enhance the effectiveness of automation
(capabilities and performance). At the same time, the agent installed in each vehicle improves
human efficiency and compensates for human inadequacies by intercepting and correcting potential
erroneous behaviours, possibly resulting from compromised physical or emotional states.

Potential intervention modes for the agent to assist a struggling driver could include
automatically activating (semi-)autonomous driving mode (if available) so the driver can
momentarily divert their attention. Alternatively, the agent could more actively engage with the
driver to regain attentiveness, such as by recommending stimulating music on a dedicated radio
station. In case of health issues, the agent could recommend pulling over to rest or take
medication (e.g., for hypertension) or, in critical cases, seek emergency assistance by contacting
emergency services.

References

 [1] M. Colledanchise, P. Ögren, Behavior trees in robotics and AI: An introduction, CRC Press, 2018.
 [2] M. Iovino, E. Scukins, J. Styrud, P. Ögren, C. Smith, A survey of behavior trees in robotics
     and AI, Robotics and Autonomous Systems 154 (2022) 104096.
 [3] S. Costantini, P. Dell’Acqua, G. De Gasperis, A. Rafanelli, Empowering emotional behavior
     trees with neural computation for digital forensic, 15th European Symposium on Computational
     Intelligence and Mathematics (ESCIM 2024) (in press).
 [4] A. Hogan, E. Blomqvist, M. Cochez, C. d’Amato, G. de Melo, C. Gutierrez, S. Kirrane,
     J. E. L. Gayo, R. Navigli, S. Neumaier, A. N. Ngomo, A. Polleres, S. M. Rashid, A. Rula,
     L. Schmelzeisen, J. F. Sequeda, S. Staab, A. Zimmermann, Knowledge graphs, ACM CSUR 54 (2022)
     71:1–71:37.
 [5] G. Weikum, Knowledge graphs 2021: A data odyssey, PVLDB 14 (2021) 3233–3238.
 [6] O. Deshpande, D. S. Lamba, M. Tourn, S. Das, S. Subramaniam, A. Rajaraman, V. Harinarayan,
     A. Doan, Building, maintaining, and using knowledge bases: a report from the trenches, in:
     SIGMOD, 2013, pp. 1209–1220.
 [7] M. Lissandrini, D. Mottin, T. Palpanas, D. Papadimitriou, Y. Velegrakis, Unleashing the power
     of information graphs, ACM SIGMOD Record 43 (2015) 21–26.
 [8] C. Shi, Y. Li, J. Zhang, Y. Sun, S. Y. Philip, A survey of heterogeneous information network
     analysis, TKDE 29 (2016) 17–37.
 [9] X. Wang, L. Chen, T. Ban, M. Usman, Y. Guan, S. Liu, T. Wu, H. Chen, Knowledge graph quality
     control: A survey, Fundamental Research 1 (2021) 607–626.
[10] S. Ji, S. Pan, E. Cambria, P. Marttinen, S. Y. Philip, A survey on knowledge graphs:
     Representation, acquisition, and applications, Trans. Neural Netw. Learn. Syst. 33 (2021)
     494–514.
[11] B. Yang, S. W.-t. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning
     and inference in knowledge bases, in: ICLR, 2015.
[12] Y. Wu, Y. Xu, X. Lin, W. Zhang, A holistic approach for answering logical queries on knowledge
     graphs, in: ICDE, 2023, pp. 2345–2357.
[13] S. S. Bhowmick, E. C. Dragut, W. Meng, Globally aware contextual embeddings for named entity
     recognition in social media streams, in: ICDE, 2023, pp. 1544–1557.
[14] J. Huang, Z. Sun, Q. Chen, X. Xu, W. Ren, W. Hu, Deep active alignment of knowledge graph
     entities and schemata, PACMMOD 1 (2023) 159:1–159:26.
[15] A. Zeakis, G. Papadakis, D. Skoutas, M. Koubarakis, Pre-trained embeddings for entity
     resolution: An experimental analysis, PVLDB 16 (2023) 2225–2238.
[16] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, Q. Guo, M. Wang, H. Wang,
     Retrieval-augmented generation for large language models: A survey, CoRR abs/2312.10997 (2023).
[17] S. Hao, T. Liu, Z. Wang, Z. Hu, ToolkenGPT: Augmenting frozen language models with massive
     tools via tool embeddings, in: NeurIPS, 2023.
[18] X. Wang, Q. Yang, Y. Qiu, J. Liang, Q. He, Z. Gu, Y. Xiao, W. Wang, KnowledGPT: Enhancing large
     language models with retrieval and storage access on knowledge bases, CoRR abs/2308.11761
     (2023).
[19] J. Zhang, Graph-toolformer: To empower LLMs with graph reasoning ability via prompt augmented
     by ChatGPT, CoRR abs/2304.11116 (2023).
[20] Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge graph embedding: A survey of approaches and
     applications, TKDE 29 (2017) 2724–2743.
[21] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for
     modeling multi-relational data, NeurIPS 26 (2013).
[22] L. Chao, J. He, T. Wang, W. Chu, PairRE: Knowledge graph embeddings via paired relation
     vectors, in: ACL, 2021, pp. 4360–4369.
[23] M. Nickel, V. Tresp, H.-P. Kriegel, et al., A three-way model for collective learning on
     multi-relational data, in: ICML, 2011, pp. 3104482–3104584.
[24] Z. Sun, Z. Deng, J. Nie, J. Tang, RotatE: Knowledge graph embedding by relational rotation in
     complex space, in: ICLR, 2019.
[25] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, G. Bouchard, Complex embeddings for simple link
     prediction, in: ICML, 2016, pp. 2071–2080.
[26] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2d knowledge graph
     embeddings, in: AAAI, 2018.
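
Illustrative sketch for Sections 2.1 and 2.2. The node semantics described there (action,
condition, sequence selector, priority selector, emotional selector, empathy node) can be sketched
in a few lines of Python. This is a minimal sketch and not the authors' implementation: all class
names, the "affect" and "trace" keys of the state dictionary, the rule "highest affect score
first", the propagation of the running state, and the speed threshold are assumptions made only for
illustration.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Node:
    def tick(self, state: dict) -> Status:
        raise NotImplementedError


class Action(Node):
    """Leaf node: performs a behavior and reports its outcome."""

    def __init__(self, name: str, run: Callable[[dict], Status]):
        self.name, self.run = name, run

    def tick(self, state: dict) -> Status:
        state.setdefault("trace", []).append(self.name)  # record what was executed
        return self.run(state)


class Condition(Node):
    """Leaf node: checks an internal or external state."""

    def __init__(self, test: Callable[[dict], bool]):
        self.test = test

    def tick(self, state: dict) -> Status:
        return Status.SUCCESS if self.test(state) else Status.FAILURE


class Sequence(Node):
    """Inner node: runs children in order and stops at the first non-success."""

    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self, state: dict) -> Status:
        for child in self.children:
            status = child.tick(state)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS


class Priority(Node):
    """Inner node: tries children in order until one does not fail."""

    def __init__(self, children: List[Node]):
        self.children = children

    def tick(self, state: dict) -> Status:
        for child in self.children:
            status = child.tick(state)
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE


class Empathy(Node):
    """Tags its single child with an emotion name (hypothetical field)."""

    def __init__(self, emotion: str, child: Node):
        self.emotion, self.child = emotion, child

    def tick(self, state: dict) -> Status:
        return self.child.tick(state)


class EmotionalSelector(Priority):
    """Orders empathy children by affective state, then acts as a priority selector."""

    def tick(self, state: dict) -> Status:
        affect = state.get("affect", {})
        # Assumed ordering rule: highest affect score first.
        self.children = sorted(
            self.children, key=lambda c: affect.get(c.emotion, 0.0), reverse=True
        )
        return super().tick(state)


def succeed(_state: dict) -> Status:
    return Status.SUCCESS


tree = EmotionalSelector([
    Empathy("calm", Sequence([Condition(lambda s: s["speed"] <= 130),
                              Action("keep_going", succeed)])),
    Empathy("stress", Sequence([Condition(lambda s: s["speed"] > 130),
                                Action("suggest_slowing_down", succeed)])),
])

state = {"speed": 150, "affect": {"calm": 0.2, "stress": 0.8}}
result = tree.tick(state)
print(result, state["trace"])
```

In a NEABT the "affect" values would be written by the neural node rather than supplied by hand;
here they are fixed so that the reordering performed by the emotional selector is easy to follow.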

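
Illustrative sketch for Section 2.3. The embedding scoring functions mentioned there can be made
concrete with the standard TransE and DistMult definitions: TransE scores a triple (h, r, t) by the
negative distance between h + r and t, and DistMult by the sum of the element-wise products of h, r
and t. The code below is a minimal sketch in plain Python; the two-dimensional vectors and the
entity and relation names are toy values invented for illustration, not learned embeddings or data
from this work.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE: negative L2 distance between h + r and t (higher is more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h: Vector, r: Vector, t: Vector) -> float:
    """DistMult: trilinear product, i.e. the sum over i of h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Toy embeddings (hypothetical values chosen by hand).
entity = {"driver": [0.0, 1.0], "tired": [1.0, 1.0], "alert": [-1.0, 0.0]}
relation = {"feels": [1.0, 0.0]}

plausible = transe_score(entity["driver"], relation["feels"], entity["tired"])
implausible = transe_score(entity["driver"], relation["feels"], entity["alert"])
print(plausible, implausible)
```

With these toy vectors, "driver" translated by "feels" lands exactly on "tired", so that triple
scores higher under TransE than the one ending in "alert". In a trained KGE the vectors would come
from optimizing a loss over the triples of the KG.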
</pre>