NEMO - A Neural, Emotional Architecture for Human-AI Teaming

Stefania Costantini1,2,*,†, Pierangelo Dell'Acqua3,†, Giovanni De Gasperis1,†, Francesco Gullo1,† and Andrea Rafanelli4,1,†

1 Department of Information Engineering, Computer Science and Mathematics, University of L'Aquila, L'Aquila, Italy
2 Gruppo Nazionale per il Calcolo Scientifico - INdAM, Roma, Italy
3 Department of Science and Technology, Linköping University, Linköping, Sweden
4 Department of Computer Science, University of Pisa, Italy

CILC 2024 – 39th Italian Conference on Computational Logic, June 26–28, 2024, Rome, Italy.
* Corresponding author.
† These authors contributed equally.
stefania.costantini@univaq.it (S. Costantini); pierangelo.dellacqua@liu.se (P. Dell'Acqua); giovanni.degasperis@univaq.it (G. De Gasperis); francesco.gullo@univaq.it (F. Gullo); andrea.rafanelli@phd.unipi.it (A. Rafanelli)
https://www.disim.univaq.it/StefaniaCostantini (S. Costantini); https://dellacqua.se/ (P. Dell'Acqua); https://www.disim.univaq.it/GiovanniDeGasperis (G. De Gasperis); https://fgullo.github.io/ (F. Gullo)
0000-0002-5686-6124 (S. Costantini); 0000-0003-3780-0389 (P. Dell'Acqua); 0000-0001-9521-4711 (G. De Gasperis); 0000-0002-7052-1114 (F. Gullo); 0000-0001-8626-2121 (A. Rafanelli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
In this work, we propose a novel architecture for agents to be employed in Human-AI Teaming in various, even critical, domains, based upon affective computing, empathy, and Theory of Mind, and upon a description of the user profile and of the operational, professional, and ethical requirements of the domain in which the agent operates. In this paper, we outline the architecture's building blocks and their interconnections. The agent's architectural design encompasses a Knowledge Graph that encloses the above-mentioned kinds of knowledge, and a Behavior Tree enhanced via a Neural component, where the latter elaborates sensor input from devices that monitor the user and input from the knowledge graph. The enhanced behavior tree actually interacts with the user, performing actions or providing suggestions, and returns feedback to the knowledge graph; this two-way relationship is a novelty in the literature. We briefly present a case study on which to experiment once the implementation, which is presently at an initial stage, is completed. We discuss in some detail the Prolog program implementing the behavior tree, and motivate our choice of Prolog.

Keywords
Human-AI Interaction, Human-AI Teaming, Trustworthy AI, Responsible AI

1. Introduction

One recent focus in Artificial Intelligence (AI) is building intelligent systems where humans and AI systems form teams. The aim is to exploit the potentially synergistic relationship between humans and automation, thus devising "hybrid" systems where the partners cooperate to perform complex tasks, possibly involving a high degree of risk. As a simple example, in an AI-supported self-driving or assisted-driving vehicle, the AI component can be expected to evaluate and co-manage situations and risks, while the driver can provide the AI component with useful information on practical driving in all conditions and can self-manage the risks should the circumstances require it. Human-automation interaction is, in fact, one of the main themes of Human-centered AI. This issue also falls in the realm of Trustworthy AI, whose requirements include respect for human autonomy, prevention of harm, fairness, and explainability, and of Responsible AI, whose goal is to employ AI in a safe, trustworthy and ethical fashion.

AI and humans, if working together in Human-AI Teaming (HAIT), can produce results exceeding what either can achieve alone, while controlling and improving each other. For instance, a human driver might train to cope with previously unseen situations through co-driving automation via a cooperative task shared between the human driver and the AI-based system installed on the vehicle. At the same time, AI helps drivers in case of difficulties and immediate risks. In this synergistic relationship, humans may improve automation efficacy and capabilities. At the same time, automation may enhance human performance in a task and compensate for human inadequacies, catching and correcting possible misbehaviors, possibly also due to physically or emotionally impaired states, and providing valuable suggestions.

To adopt AI agents in crucial tasks such as, e.g., improving care-giving in medicine and teaching, and to construct effective human-AI teams, agents should be endowed with an emotion recognition and management module, capable of empathy and of modelling aspects of the Theory of Mind (ToM), in the sense of being able to reconstruct what someone is thinking or feeling. Modelling a Theory of Mind is often based on forms of "Affective Computing", which is a set of techniques aimed at eliciting a human's emotional condition from physical signs, to enable the system to respond intelligently to human emotional feedback.

In this work, we devise the architectural design of an agent to be employed in HAIT, based upon affective computing, empathy, and Theory of Mind, and upon a description of the user profile and of the operational, professional and ethical requirements of the domain in which the agent operates. Our main contribution is the design of the architecture, which includes a Knowledge Graph (KG), a Neural component, and a Behavior Tree (BT) as its main components. As such, we term the proposed framework NEMO ("Neural EMOtional"). The KG will encompass definitions concerning all the required kinds of knowledge, plus a history of the past interactions of the agent with the user. The BT enhanced via a Neural component, where the latter elaborates sensor input from devices that monitor the user and input from the KG, will actually interact with the user, performing actions or providing suggestions, and will return feedback to the KG. This two-way relationship between KG and BT constitutes a novelty in the literature. We provide a Prolog implementation of the BT and of its relationships with the other components of the proposed NEMO framework. We have chosen Prolog for our implementation for reasons of flexibility and readability that are apt for the task.

The paper is organized as follows. Section 2 provides some background on behavior trees and knowledge graphs. Section 3 provides an overview of the envisaged framework. The Prolog implementation of the BT component is illustrated in Section 4, and a relevant case study for the future implemented system is outlined in Section 4.1. After discussing related work and some open issues in Section 5, we conclude the paper in Section 6.

2. Background

2.1. Behavior Trees

Behavior trees were introduced as a tool to enable modular AI in computer games.
A BT is essentially a mathematical model of plan execution, where each element (task and action) of a plan is associated with a node in the tree. The strength of BTs comes from their ability to create complex tasks composed of simple tasks, without worrying about how the simple tasks are implemented. For a comprehensive survey of BTs in Artificial Intelligence and Robotic applications, see [1, 2].

A BT is a directed acyclic graph consisting of different types of nodes, each associated with executable code (where such code enacts an element composing a plan). In most cases, a BT is tree-shaped, hence the name. However, unlike a traditional tree, a node in a BT can have multiple parents, allowing the reuse of that part of the tree. The traversal of a behavior tree starts at the top node. When a node is traversed, the associated code is executed, returning one of three states: success, failure, or running.

The critical nodes in a BT include leaf nodes and inner nodes. An action is a leaf node representing a behavior that the character can perform. The action returns success or failure when it completes its execution, depending on the outcome. An action is depicted as a white circle. A condition is a leaf node that checks an internal or external state. It returns either success or failure. A condition is represented as a grey rounded rectangle. A sequence selector is an inner node with several child nodes executed sequentially. Once a child node completes its execution successfully, the sequence selector continues executing the next child node. If every child node returns success, then the sequence selector returns success. If one of the child nodes returns failure, the sequence selector immediately returns failure. A sequence selector is depicted as a grey square with an arrow across the links to its child nodes. A priority selector is an inner node. It has a list of child nodes that it tries to execute one at a time until one of the child nodes returns success. If none of the child nodes executes successfully, the priority selector returns failure. A priority selector is represented with a grey circle with a question mark.

2.2. Neural Empathy-Aware Behavior Trees

To consider empathy and mimic human decision-making, in [3] we introduced neural empathy-aware behavior trees (NEABTs) by adding a new selector node, called the emotional selector, an empathy node, and a neural node. The emotional selector is a node that orders its child nodes based on the agent's current affective state. The agent elaborates the affective state during repeated interactions with the user and then tunes its reaction accordingly. Once the ordering has been established, the emotional selector behaves as a priority selector. A white circle with the character E represents an emotional selector. In contrast, an empathy node provides an emotional evaluation of its single child node. An empathy node can only be a child of an emotional selector. Its child can be a leaf or an inner node. A dashed circle with the name of the empathy emotion represents an empathy node.

To enable the integration of deep learning models for emotion recognition and symbolic models for planning and decision-making within emotional behavior trees, we introduced neural nodes. A neural node takes the current state of the environment and agent as input and, using a deep learning model, makes inferences about the emotional state. It contains a model, such as an emotion recognition system, that estimates the emotional state. These estimates are then mapped into the agent's affective state variables that parameterize the emotional selector. The neural node continually updates the agent's internal emotional state, allowing the dynamic adaptation of behavior trees to the emotional context.

At present, machine learning algorithms can help classify individuals' emotions depending on the input data. Emotions can be recognized from a wide spectrum of input data, encompassing physiological data, speech [4], facial expression [5, 6], and behavioral change [7]. Physiological input data encompasses factors like heart rate, frequency of respiratory movements, sweating and skin-galvanic reaction, and EEG (electroencephalogram) signals [8, 9, 10, 11, 12].
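To make the node semantics above concrete, the following is a minimal Prolog sketch of a BT "tick" for the node types just described (action, condition, sequence selector, priority selector, and the emotional selector of NEABTs). Predicate names such as tick/3 and order_by_affect/3, as well as the stub clauses, are illustrative assumptions and not part of the implementation discussed in Section 4; the running status is omitted for brevity.

```prolog
% Minimal sketch of BT tick semantics (illustrative only).
% tick(+Node, +AffectiveState, -Status) with Status in {success, failure}.

tick(action(A), _, Status) :-
    ( execute_action(A) -> Status = success ; Status = failure ).

tick(condition(C), _, Status) :-
    ( check_condition(C) -> Status = success ; Status = failure ).

% Sequence selector: succeeds iff all children succeed,
% fails at the first failing child.
tick(sequence([]), _, success).
tick(sequence([N|Ns]), Aff, Status) :-
    tick(N, Aff, S),
    ( S = success -> tick(sequence(Ns), Aff, Status) ; Status = failure ).

% Priority selector: tries children in order until one succeeds.
tick(priority([]), _, failure).
tick(priority([N|Ns]), Aff, Status) :-
    tick(N, Aff, S),
    ( S = success -> Status = success ; tick(priority(Ns), Aff, Status) ).

% Emotional selector: reorders its children according to the current
% affective state, then behaves as a priority selector.
tick(emotional_selector(Ns), Aff, Status) :-
    order_by_affect(Aff, Ns, Ordered),
    tick(priority(Ordered), Aff, Status).

% Hypothetical stubs; in NEMO these would be provided by plugins and by
% the neural node, respectively.
execute_action(A) :- format("executing ~w~n", [A]).
check_condition(_) :- true.
order_by_affect(_, Ns, Ns).   % placeholder: keep the given order

% ?- tick(emotional_selector([sequence([condition(user_tired),
%                                        action(advise_break)])]),
%         affect(tired), S).
% executing advise_break
% S = success.
```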
2.3. Knowledge Graphs

Knowledge Graphs are a particular type of knowledge base [13] consisting of sets of facts (i.e., triples such as "Dante," "wrote," "Divine Comedy") that interconnect entities ("Dante," "Divine Comedy") via relationships ("wrote") [14, 15]. Entities and relationships correspond to nodes and (labelled) edges of the KG, respectively. KGs have been extensively used in a plethora of application scenarios, including knowledge completion [16], head/tail prediction [17], rule mining [18], query answering [19], and entity alignment [20, 21, 22]. KGs are also known as information graphs [23], or heterogeneous information networks [24]. A noteworthy technique that is commonly exploited for tasks on KGs is that of Knowledge graph embeddings (KGEs) [25], which aim at generating a vector representation for the entities and relationships of a KG.

In this work, we use KGs to represent (various kinds of) knowledge in the proposed NEMO framework. The main reasons why we employ KGs and prefer them over other types of knowledge bases are as follows.

• First, the graph structure underlying KGs allows for capturing well the interrelations between the various entities of interest, and for having such interrelations always and easily available, without performing possibly expensive operations (e.g., join operations in relational databases). This is particularly needed in our context, as the proposed NEMO framework needs to continuously select specific portions of knowledge to be passed to the NEABT (see Section 3). Thus, this operation needs to be done efficiently.
• Second, KGs allow for the representation of heterogeneous entities and relationships among them.
• Third, KGs are highly flexible and versatile, as the knowledge therein need not be organized according to any a-priori fixed schema. As such, KGs can easily integrate knowledge from different sources and perform updates on the acquired knowledge. Support for source heterogeneity and flexibility/versatility are particularly required for the proposed NEMO framework. In fact, as detailed in Section 3, the KG in NEMO encompasses different types of knowledge, possibly coming from different sources (i.e., domain knowledge of various forms and user profiles). Also, our NEMO framework has a mechanism for continuously updating the KG.
• Lastly, KGs can easily, yet effectively, be represented as numerical vectors. This is required in our context in order to make KGs amenable to be processed by components of the proposed NEMO framework which do require numerical representations (i.e., the NEABT component, see Section 3). KGE techniques (see above) can represent KGs numerically. These are well-established techniques for which one can exploit the results of the corresponding research area, which has been very active and fruitful in the last few years.
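As a concrete illustration of the triple-based representation discussed in this section, a KG fragment can be written directly as Prolog facts. The user-profile triples and the predicate names triple/3 and neighbours/2 below are hypothetical examples for illustration only; they are not the actual NEMO KG or its encoder.

```prolog
% A KG fragment as a set of Prolog facts: triple(Subject, Relation, Object).
triple(dante, wrote, divine_comedy).        % example from the text
triple(user42, has_profile, novice_driver). % hypothetical profile triple
triple(user42, prefers, calm_music).        % hypothetical profile triple

% All facts directly connected to an entity, in either direction,
% retrieved without any join-like operation.
neighbours(E, R-O) :- triple(E, R, O).
neighbours(E, R-S) :- triple(S, R, E).

% ?- findall(F, neighbours(user42, F), Fs).
% Fs = [has_profile-novice_driver, prefers-calm_music].
```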
3. Framework

The architecture of the proposed NEMO framework is illustrated in Figure 1. The main components of the architecture are the User, the Environment, a Knowledge Graph, and a Behavior Tree, specifically a NEABT. In this framework, the agent is identified by the NEABT. The overall interaction between the components of the framework is described next.

Figure 1: The proposed NEMO framework

The NEABT is fed with signals from Environment, User, and KG. Such signals are exploited by the NEABT to perform its computation and to output (i) an action to be suggested to the User, (ii) an action actually performed by the agent (e.g., an empathetic action), and (iii) the User's emotion detected by its neural node ('N', see below). The NEABT's threefold output passes through an "Aggregator", responsible for suitably aggregating and presenting the NEABT's outputs to the User. The Aggregator may perform something either very simple (e.g., derive a textual representation of the three outputs and concatenate them) or more sophisticated (e.g., exploit a large language model (LLM)). The NEABT's outputs and the User's feedback are used to update the KG. This way, we have a two-way, loop-back mechanism in which the NEABT exploits the KG for its internal computation, and is in turn exploited to properly update the KG. In the following, we describe the User, Environment, KG, and NEABT components in more detail.

User. The User performs reactions and actions based on the signals provided by the NEABT. Sensory data from users flow into the NEABT through a sensor, which represents them in a proper numerical format. Also, the User's feedback – e.g., whether (or to which extent) the User has adopted the Agent's suggested action – is sent back to the KG. The User's reactions/actions are assumed to be determined by all three NEABT output types. In particular, the User's emotion detected by the NEABT at the previous iteration is essential for establishing the emotional conditions that most influence the User. In fact, the User likely performs certain actions only under certain emotional circumstances. Thus, the emotional conditions that determine a certain User's behavior are an important signal to consider for both the presentation of the NEABT's output to the User and the subsequent KG update step.

Environment. A sensor detects signals from the surrounding environment and represents them in some numerical format, so that the NEABT (along with the KG representation) can process them.

KG. The Knowledge Graph (KG) contains information about domain knowledge and user profiles. The KG's information is provided to the NEABT in a twofold form. It is first encoded in some proper numerical format and passed to the NEABT's neural node (see below). The encoding is performed by a KG encoder component, which can be implemented, e.g., with a KGE (see Section 2.3). The KG's encoded information is then decoded into a format suitable for processing by the internal nodes of the NEABT. A KG-to-BT decoder performs the KG's information decoding. This can be implemented, e.g., as a neural network component whose training can be performed on a ground truth defined through manual annotation or the agent's historical data. The KG is fed with the NEABT's output and user feedback.
Such data, in input to the KG, need to be translated into a format suitable for updating the KG; for example, a set of KG triples to be added and a set of KG triples to be removed. A further encoder-decoder component performs this translation. Again, such an encoder-decoder can be implemented as a neural network and trained with a ground truth defined manually or through historical data.

Note that more sophisticated implementations and training strategies of the two encoder-decoder components may be possible. For instance, one could consider training the two encoder-decoder components simultaneously in an end-to-end fashion. This is expected to increase the effectiveness of those components. However, a downside of this solution is that it is technically challenging. A major difficulty in this regard consists in making the NEABT a "differentiable" component so that it can be safely involved in the envisaged end-to-end neural training setting. For this reason, this could be an interesting direction of research that deserves dedicated effort, and we defer it to future work. Another interesting technical challenge is how to self-supervise the training of the two encoder-decoder components to overcome the burden of building a ground truth. An idea in this regard could be to mask a subset of the triples of the KG and use them as a substitute for the ground truth. However, this leaves non-trivial technical questions open, such as how to select the triples of the KG to be masked and, more importantly, how to map the output of the NEABT into KG triples. For this reason, we leave this problem for future work as well. Finally, a further intriguing idea – still deferred to future work – consists in avoiding the encoder-decoder components altogether, and letting the interaction between the KG and the NEABT be carried out in an unsupervised way.

NEABT. The NEMO framework deploys a NEABT as a behavior tree. The NEABT's neural node receives the KG's information and the user's sensory data and makes inferences about the user's emotional state. These estimates are mapped into the user affective state variables that parametrize the neural node's child, the emotional selector. In turn, the emotional selector passes the values of the affective state variables to its child nodes, the empathy nodes. Each child empathy node provides an empathic evaluation of its subtree. In Figure 1, every subtree has as its root a sequence selector, with a condition node and several action nodes as children. The condition child node returns success/failure by performing a test condition upon the input pair (KG', Env). The corresponding action child nodes are executed if the condition node returns success. By doing so, the NEABT can execute actions over the environment. Some of these action nodes define the NEABT's threefold output.

Specifically, in the envisaged class of applications to Human-AI Teaming, the Knowledge Graph will include the user profile along with the history of past interactions, allowing the KG to interact with the NEABT in two ways: by passing data from the user profile to the neural node, indicating the user's "classification" among several possibilities (in Section 4 some basic examples will be provided), and then passing to the leaves the type of action most suitable for that user (here too, we will provide some minimal examples). Additionally, the KG should encompass an ontology on driving to assess, for instance, whether the user's action is allowed and/or ethical.
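To fix ideas on the KG-update translation discussed above, the following minimal Prolog sketch shows one hypothetical shape such an interface could take, mapping the agent's output and the user's feedback to triples to add and remove. The predicate kg_update/4 and the specific triples are illustrative assumptions, not the trained encoder-decoder component described in the text.

```prolog
% Hypothetical sketch of the KG-update translation: in NEMO this mapping
% is learned by an encoder-decoder; here it is hand-written purely for
% illustration.
% kg_update(+AgentOutput, +UserFeedback, -TriplesToAdd, -TriplesToRemove)
kg_update(suggested(Action), accepted,
          [triple(user, accepted, Action)],
          []).
kg_update(suggested(Action), rejected,
          [triple(user, rejected, Action)],
          [triple(user, accepted, Action)]).

% ?- kg_update(suggested(take_break), accepted, Add, Remove).
% Add = [triple(user, accepted, take_break)], Remove = [].
```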
4. Prolog implementation

The implementation of the envisaged system is in an embryonic phase. However, we have first chosen to complete the implementation of our NEABT, making its connections with the other architecture components explicit. To do so, we chose to employ Prolog for the following reasons:

• The Prolog representation of a BT is easily readable and modifiable;
• Prolog rules provide the modularity and flexibility to represent the various components of the BT;
• Due to Prolog's fast prototyping capabilities, the implementation is ready to use and to be connected with any other component when ready.

In Figure 2, we show the first implementation of our NEABT. This implementation adheres to the workflow described in Figure 1: the neural network component processes knowledge graph embeddings and sensor data from the user to estimate the user's emotional state and the related probability. The emotion selector component uses the predicted emotional state and the context from the knowledge graph to select relevant sub-trees or nodes representing potential empathetic responses or actions. The context and action evaluation component tests the conditions. It executes the appropriate actions from the selected empathetic nodes, considering the user's emotional state and the context from the knowledge graph. If an action fails or the context changes, the system can select and execute alternative empathetic responses or actions based on the updated information.

Figure 2: Prolog prototype implementation

Accordingly, the main predicates of our program are:

• neabt_structure/2: it extracts all the structural knowledge about the behavior tree, described by means of the child/3 facts.
• neural_node/4 (node N in Figure 1): this predicate encodes a neural network (indicated as nn, and accessed via a dedicated plugin). It takes as input the user sensor data and the KG embeddings, from which it estimates the recognized emotional state and the associated probability values.
• emo_selector/6 (node E in Figure 1): this predicate is responsible for selecting relevant sub-trees or nodes based on the user's emotional state, the associated probability values, the context provided by the knowledge graph decoder, and the environment state. It uses knowledge about the NEABT structure to discover nodes that succeed in the current situation, using the empathy_node_success/2 predicate.
• empathy_node_success/2 (nodes [e1, .., en] in Figure 1): this predicate attempts to execute the sequence of actions within a sub-tree while testing the context provided by the knowledge graph decoder (KGd) and the environment context (Env), recursively descending the tree; if a condition is met, it then launches the respective sequence of actions within the sub-tree or descends to sub-nodes. If an action fails, either because the user does not "comply" or impeding obstacles show up, it returns the failure to the upper selector node emo_selector/6, so that the next best empathy node can be selected according to the probability ranking.
• execute_action/1: it generates the commands for the agent to execute the action via an external plugin. Each atomic action can fail independently, even if the context is appropriate. If a failure occurs, it returns the failure signal to the main emo_selector/6 predicate, which then selects the next best emotional sub-tree.
• neabt/4: the main entry point, combining the neural network node, the emotion selector, the KG, the environment, and the execution of the appropriate empathy nodes.

The predicate neural_codec encodes the KG into a suitable format for the neural network (KGEmb) and generates a decoded representation (KGd) that provides the context; a schematic sketch of how these predicates fit together is given below.
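Since Figure 2 is not reproduced here in textual form, the following skeleton is a hedged reconstruction of how the predicates listed above could fit together, respecting the stated arities. Argument names, the child/3 facts, and the stub clauses (rank_by_emotion/4, condition_holds/3, node_action/2, and the ground neural_codec/3 and neural_node/4 facts) are illustrative assumptions, not the actual prototype.

```prolog
% Hedged reconstruction of the prototype's skeleton (the actual code is
% in Figure 2); all stub clauses below are assumptions.
:- use_module(library(lists)).   % member/2 (SWI-Prolog)

% child(Parent, Order, Child): structural facts describing the NEABT.
child(root, 1, empathy(reassurance)).
child(root, 2, empathy(advisory)).

% neabt_structure/2: structural knowledge about the tree from child/3.
neabt_structure(Root, Children) :-
    findall(Order-C, child(Root, Order, C), Pairs),
    msort(Pairs, Sorted),
    findall(C, member(_-C, Sorted), Children).

% neabt/4: main entry point (user sensor data, KG, environment in;
% agent action out).
neabt(UserSensorData, KG, Env, AgentAction) :-
    neural_codec(KG, KGEmb, KGd),
    neural_node(UserSensorData, KGEmb, Emotion, Prob),
    emo_selector(Emotion, Prob, KGd, Env, root, AgentAction).

% emo_selector/6: rank the empathy nodes according to the detected
% emotion, then try them in order until one succeeds
% (priority-selector behaviour).
emo_selector(Emotion, Prob, KGd, Env, Root, Action) :-
    neabt_structure(Root, Children),
    rank_by_emotion(Emotion, Prob, Children, Ranked),
    member(Node, Ranked),
    empathy_node_success(Node, ctx(KGd, Env)),
    node_action(Node, Action),
    !.

% empathy_node_success/2: test the sub-tree's condition against the
% KG-decoded context and the environment, then execute its actions.
empathy_node_success(empathy(Kind), ctx(KGd, Env)) :-
    condition_holds(Kind, KGd, Env),
    execute_action(Kind).

% Stubs standing in for the plugin-based components.
neural_codec(kg, kg_embedding, kg_context).
neural_node(_Sensors, _KGEmb, tired, 0.8).
rank_by_emotion(_Emotion, _Prob, Children, Children).
condition_holds(_Kind, _KGd, _Env).
node_action(empathy(Kind), Kind).
execute_action(Kind) :- format("executing empathy action: ~w~n", [Kind]).

% ?- neabt(sensor_data, kg, env, Action).
% executing empathy action: reassurance
% Action = reassurance.
```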
4.1. Motivating example: Driver Co-Pilot

Here, we envision a case study that involves developing an intelligent agent that actively functions as a "companion" (co-driver) and support system for drivers. The agent will assist drivers by providing interventions in risky situations that may arise due to external circumstances and/or the driver's health condition and emotional state, taking into account emotional aspects that could impact driving performance.

The intelligent agent will also be trained through interaction with the human user, following the recent "Human-AI teaming" paradigm. A human driver could train the agent by collaboratively performing various driving-related tasks, even in challenging scenarios. In this synergistic relationship during the training phase, humans enhance the effectiveness of automation (capabilities and performance). At the same time, the agent installed in each vehicle improves human efficiency and compensates for human inadequacies by intercepting and correcting potential erroneous behaviors, possibly resulting from compromised physical or emotional states.

Potential intervention modes for the agent to assist a struggling driver could include automatically activating (semi-)autonomous driving mode (if available) so the driver can momentarily divert their attention. Alternatively, the agent could more actively engage with the driver to regain attentiveness, for instance, by recommending stimulating music on a dedicated radio station. In case of health issues, the agent could recommend pulling over to rest or take medication (e.g., for hypertension) or, in critical cases, seek assistance by contacting emergency services.

Consider, for instance, the case of truck drivers. Trucks come equipped with tachographs, devices that capture various data, including the operational hours of the vehicle and the driver's behavior concerning, e.g., speed, stops, etc. Regulations govern drivers' activities for safety and compliance purposes. The data necessary for checking compliance are stored within the tachograph to identify any breaches and generate a report detailing any violations, which can lead to fines (also for the truck company) or even constitute a criminal offence. The KG can be interfaced with such devices to gather the data to feed the BT, so as to issue suitable actions to help the driver avoid or alleviate violations, possibly explaining their reasons (e.g., too many stops because of some driver's discomfort). Importantly, our system might detect severe physical or psychological pain and alert human operators and emergency services.

Accordingly, in our use case, the system considers the following values for its decision-making process:

• The neural network can detect and categorize the driver's emotional state into one of the following values: [sad, angry, happy, neutral, tired]
• Based on the assessed emotional state and environmental factors, the system can select from a set of predefined actions to assist the driver:
1. advisory: provide advisory messages or recommendations to the driver.
2. explicative: offer explanatory information or guidance to the driver.
3. stop: recommend or initiate the process of stopping the vehicle in critical situations.
4. reassurance: deliver reassuring messages to alleviate the driver's stress or frustration.
• The system continuously monitors the driving environment through various sensors, collecting data such as:
1. road_status: information about the current road conditions, traffic, and potential hazards.
2. driving_hours: the duration of the current driving session.
3. speed_data: the vehicle's speed data, including the current speed and historical speed patterns.
4. stop_duration: the frequency and duration of stops taken during the driving session.

To provide an example, consider the following scenario: a user is driving, and the system receives the environment sensor data [road_status(busy), driving_hours(5), speed_data(120, 150), stop_duration(1, 30)], indicating prolonged driving at high speeds with few stops. The detected emotional state is identified as tired. Based on this input, our framework's reasoning flow is as follows:
1. The neural network node processes the user sensor data and detects the emotional state tired.
2. The emotion selector identifies the relevant nodes based on the tired emotional state.
3. The empathy node execution evaluates the user's state and selects an appropriate action, such as an advisory: AgentAction = advisory_message("Slow down and take a break soon.").
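The scenario above can be rendered as a small, self-contained Prolog fragment. The facts, thresholds, and the select_action/1 rule below are illustrative assumptions standing in for the full NEABT pipeline; they merely reproduce the reasoning flow just described.

```prolog
% Illustrative encoding of the driver co-pilot scenario above.
:- use_module(library(lists)).   % member/2 (SWI-Prolog)

% Environment sensor data and detected emotion for the example run.
env_data([road_status(busy), driving_hours(5),
          speed_data(120, 150), stop_duration(1, 30)]).
detected_emotion(tired).

% A tired driver with long driving hours and few stops gets an advisory
% (the thresholds 4 hours and 2 stops are assumptions for the example).
select_action(advisory_message("Slow down and take a break soon.")) :-
    detected_emotion(tired),
    env_data(Env),
    member(driving_hours(H), Env), H >= 4,
    member(stop_duration(NStops, _), Env), NStops =< 2.

% ?- select_action(A).
% A = advisory_message("Slow down and take a break soon.").
```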
5. Related Work and Discussion

Neural architecture and the role of emotions. According to Damasio [26], emotions are unconscious reactions to internal or external stimuli that activate neural patterns in the brain. Emotions are innate reactions of the brain that are expressed through facial expressions, body language, and attitudes [27]. They affect the way people feel (consciously or unconsciously) since feelings are mental experiences of body states, which arise as the brain interprets emotions [28]. That, in turn, triggers changes in behavior and well-being. The NEMO architecture follows Damasio's definition of emotions. Emotions are elicited in the neural nodes of NEABTs from both the environment and the agent's input. The recognized emotions form the affective state of the agent. The agent is conscious of the emotions in its affective state. The emotional values are then passed down in the behavior tree to the emotional and empathy nodes. Doing so triggers changes in the agent's behavior.

Knowledge graphs. Knowledge Graphs [14, 15] are a particular type of knowledge base [13] where knowledge is organized in a graph-like structure, i.e., with triples that define relationships (edges) among entities (nodes) of interest. KGs are also known as information graphs [23], or heterogeneous information networks [24]. KGs have been extensively used in a plethora of application scenarios, including knowledge completion [16], head/tail prediction [17], rule mining [18], query answering [19], and entity alignment [20, 21, 22]. KGs have also recently emerged as supporting tools for Retrieval-Augmented Generation (RAG) for Large Language Models (LLMs) [29, 30, 31, 32]. A well-established technique that is commonly exploited for tasks on KGs is that of Knowledge graph embeddings (KGEs) [25, 17]. KGEs generate numerical vector representations for the entities and relationships of a KG, thus making them amenable to be processed in downstream tasks where a numerical representation is required (e.g., neural network-based machine-learning tasks). Although KGEs can differ (significantly) from one another in their definition, a shared key aspect of all KGEs is that they are typically defined based on a so-called embedding scoring function or simply embedding score. This function quantifies how likely a triple exists in the KG based on the embeddings of the entities and the relationship of that triple. Several KGEs have appeared in the last few years. The distinctive features among embeddings are the score function and the optimization loss. Translational embeddings in the TransE [33] family and the recent PairRE [34] assume that the relationship of a triple performs a translation between the entities of that triple. Semantic embeddings, such as DistMult [18] or HolE [35], interpret the relationship as a multiplicative operator. Complex embeddings, such as RotatE [36] and ComplEx [37], use complex-valued vectors and operations in the complex plane. Neural-network embeddings, such as ConvE [38], perform sequences of nonlinear operations. In this work, we use KGs as a building block of the proposed NEMO framework and KGEs as a tool to derive numerical representations of the KG. The latter are needed to make the knowledge in the KG processable by other components of our NEMO framework (which require knowledge represented in numerical form).

Knowledge graphs and behavior trees. A significant use of the KG component in our NEMO framework is to have it interact with the NEABT component so that (i) the KG can adequately guide the actions to be performed by the agent and to be suggested to the user, and (ii) the output of the NEABT can be employed to update the knowledge in the KG suitably. This gives a "two-way" interaction between KGs and BTs, where the KG influences the processing logic of the BT and, the other way around, the BT contributes to the knowledge stored in the KG. To the best of our knowledge, this is the first work where a two-way interaction of this kind between KGs and BTs has been employed.

A few works exist in the literature about the simultaneous use of KGs and BTs. For instance, Axelsson and Skantze [39] devise a BT-based model that exploits a KG to generate the presentation of information by an agent to an audience. However, Axelsson and Skantze's model utilizes the KG solely for presenting information to the user, and not for guiding the choices made by the BT, as in our NEMO framework. Zhou et al. [40] propose a methodology that exploits a KG to generate a BT for robot task planning. However, in Zhou et al.'s methodology, the KG is used to generate a BT (to be in turn used to help a robot perform its task planning), and not for interacting with a pre-existing BT and contributing together with it to performing the agent's actions and suggesting actions to the user, as in our NEMO framework. Venkata et al. [41] devise a framework for knowledge transfer through BTs in a multi-agent system. Specifically, Venkata et al.'s framework encompasses a mechanism where the knowledge acquired by single agents is shared, through BTs, with the other agents of the multi-agent system. Thus, the use of BTs and knowledge representation in Venkata et al.'s work is conceptually far from the use we make in our NEMO framework. Moreover, Venkata et al.'s framework does not use KGs to represent agents' knowledge. From the above discussion, it is apparent that the literature about the simultaneous use of KGs and BTs is only marginally related to what we propose in this work. This makes our proposal of a two-way interaction between KGs and BTs entirely novel.

6. Conclusions and Future Directions

We have proposed an architecture for a system to be adopted in human-computer interaction and human-AI teaming.
The architecture features relevant elements of novelty: it encompasses a Knowledge Graph, a Neural Network, and a Behavior Tree which, moreover, interact in a two-way fashion not explored before. We have presented a partial implementation of the architecture, in particular the behavior tree and the related neural nodes, developed in Prolog, and we have discussed the advantages of this implementation. We have outlined a significant case study for testing our future system. However, once implemented, we also intend to apply our system to other applications we are developing, first and foremost assistive robotics. Future work includes further development of the implementation – including the investigation of alternative paradigms, e.g., answer set programming (ASP) – and a suitable experimentation phase.

References

[1] M. Colledanchise, P. Ögren, Behavior trees in robotics and AI: An introduction, CRC Press, 2018. [2] M. Iovino, E. Scukins, J. Styrud, P. Ögren, C. Smith, A survey of behavior trees in robotics and ai, Robotics and Autonomous Systems 154 (2022) 104096. [3] S. Costantini, P. Dell'Acqua, G. De Gasperis, A. Rafanelli, Empowering emotional behavior trees with neural computation for digital forensic, 15th European Symposium on Computational Intelligence and Mathematics (ESCIM 2024) (in press). [4] X. Lu, Deep learning based emotion recognition and visualization of figural representation, Frontiers in psychology 12 (2022) 818833. [5] W. Wei, Q. Jia, F. Yongli, G. Chen, M. Chu, Multi-modal facial expression feature based on deep-neural networks, Journal on Multimodal User Interfaces 14 (2019). [6] S. Hossain, G. Muhammad, An audio-visual emotion recognition system using deep learning fusion for a cognitive wireless framework, IEEE Wireless Communications 26 (2019) 62–68. [7] M. N. Shiota, C. Vornlocher, L. Jia, Emotional mechanisms of behavior change: Existing techniques, best practices, and a new approach, Policy Insights from the Behavioral and Brain Sciences 10 (2023) 201–211. [8] S. Alsubai, Emotion detection using deep normalized attention-based neural network and modified-random forest, Sensors 23 (2022) 225. [9] S. An, Z. Yu, Mental and emotional recognition of college students based on brain signal features and data mining, Security & Communication Networks (2022). [10] A. Subasi, T. Tuncer, S. Dogan, D. Tanko, U. Sakoglu, Eeg-based emotion recognition using tunable q wavelet transform and rotation forest ensemble classifier, Biomedical Signal Processing and Control 68 (2021) 102648. [11] I. S. Ahmad, S. Zhang, S. Saminu, L. Wang, A. E. K. Isselmou, Z. Cai, I. Javaid, S. Kamhi, U. Kulsum, Deep learning based on cnn for emotion recognition using eeg signal, WSEAS (2021). [12] R. Sánchez-Reolid, A. S. García, M. A. Vicente-Querol, L. Fernández-Aguilar, M. T. López, A. Fernández-Caballero, P. González, Artificial neural networks to assess emotional states from brain-computer interface, Electronics 7 (2018) 384. [13] O. Deshpande, D. S. Lamba, M. Tourn, S. Das, S. Subramaniam, A. Rajaraman, V. Harinarayan, A. Doan, Building, maintaining, and using knowledge bases: a report from the trenches, in: SIGMOD, 2013, pp. 1209–1220. [14] A. Hogan, E. Blomqvist, M. Cochez, C. d'Amato, G. de Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier, A. N. Ngomo, A. Polleres, S. M. Rashid, A. Rula, L. Schmelzeisen, J. F. Sequeda, S. Staab, A. Zimmermann, Knowledge graphs, ACM CSUR 54 (2022) 71:1–71:37. [15] G. Weikum, Knowledge graphs 2021: A data odyssey, PVLDB 14 (2021) 3233–3238. [16] X.
Wang, L. Chen, T. Ban, M. Usman, Y. Guan, S. Liu, T. Wu, H. Chen, Knowledge graph quality control: A survey, Fundamental Research 1 (2021) 607–626. [17] S. Ji, S. Pan, E. Cambria, P. Marttinen, S. Y. Philip, A survey on knowledge graphs: Representation, acquisition, and applications, Trans. Neural Netw. Learn. Syst. 33 (2021) 494–514. [18] B. Yang, S. W.-t. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and inference in knowledge bases, in: ICLR, 2015. [19] Y. Wu, Y. Xu, X. Lin, W. Zhang, A holistic approach for answering logical queries on knowledge graphs, in: ICDE, 2023, pp. 2345–2357. [20] S. S. Bhowmick, E. C. Dragut, W. Meng, Globally aware contextual embeddings for named entity recognition in social media streams, in: ICDE, 2023, pp. 1544–1557. [21] J. Huang, Z. Sun, Q. Chen, X. Xu, W. Ren, W. Hu, Deep active alignment of knowledge graph entities and schemata, PACMMOD 1 (2023) 159:1–159:26. [22] A. Zeakis, G. Papadakis, D. Skoutas, M. Koubarakis, Pre-trained embeddings for entity resolution: An experimental analysis, PVLDB 16 (2023) 2225–2238. [23] M. Lissandrini, D. Mottin, T. Palpanas, D. Papadimitriou, Y. Velegrakis, Unleashing the power of information graphs, ACM SIGMOD Record 43 (2015) 21–26. [24] C. Shi, Y. Li, J. Zhang, Y. Sun, S. Y. Philip, A survey of heterogeneous information network analysis, TKDE 29 (2016) 17–37. [25] Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge graph embedding: A survey of approaches and applications, TKDE 29 (2017) 2724–2743. [26] A. Damasio, Descartes’ Error: Emotion, Reason and the Human Brain, Science and psychology, Papermac, 1996. [27] J. P. Eberhard, Brain Landscape: The Coexistence of Neuroscience and Architecture, Oxford University Press, 2009. [28] A. Damasio, Looking for Spinoza, A Harvest book, Harcourt, 2003. [29] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, Q. Guo, M. Wang, H. Wang, Retrieval- augmented generation for large language models: A survey, CoRR abs/2312.10997 (2023). [30] S. Hao, T. Liu, Z. Wang, Z. Hu, ToolkenGPT: Augmenting frozen language models with massive tools via tool embeddings, in: NeurIPS, 2023. [31] X. Wang, Q. Yang, Y. Qiu, J. Liang, Q. He, Z. Gu, Y. Xiao, W. Wang, KnowledGPT: Enhancing large language models with retrieval and storage access on knowledge bases, CoRR abs/2308.11761 (2023). [32] J. Zhang, Graph-toolformer: To empower LLMs with graph reasoning ability via prompt augmented by ChatGPT, CoRR abs/2304.11116 (2023). [33] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, NeurIPS 26 (2013). [34] L. Chao, J. He, T. Wang, W. Chu, PairRE: Knowledge graph embeddings via paired relation vectors, in: ACL, 2021, pp. 4360–4369. [35] M. Nickel, V. Tresp, H.-P. Kriegel, et al., A three-way model for collective learning on multi- relational data, in: ICML, 2011, pp. 3104482–3104584. [36] Z. Sun, Z. Deng, J. Nie, J. Tang, RotatE: Knowledge graph embedding by relational rotation in complex space, in: ICLR, 2019. [37] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in: ICML, 2016, pp. 2071–2080. [38] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2d knowledge graph embeddings, in: AAAI, 2018. [39] N. Axelsson, G. Skantze, Using knowledge graphs and behaviour trees for feedback-aware presentation agents, in: Proc. of Int. Conf. on Intelligent Virtual Agents (IVA), 2020, pp. 4:1–4:8. [40] Y. Zhou, S. Zhu, W. Song, J. Gu, J. 
Ren, X. Xi, T. Jin, Z. Mu, Robot planning based on behavior tree and knowledge graph, in: Proc. of Int. Conf. on Robotics and Biomimetics ROBIO, 2022, pp. 827–832. [41] S. S. O. Venkata, R. Parasuraman, R. M. Pidaparti, KT-BT: A framework for knowledge transfer through behavior trees in multirobot systems, IEEE Trans. Robotics 39 (2023) 4114–4130.