<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Algorithms for Cognitive and Autonomous BDI Agents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ömer Ibrahim Erduran</string-name>
          <email>erduran@cs.uni-frankfurt.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <kwd-group>
          <kwd>Multi-Agent System</kwd>
          <kwd>BDI Agent</kwd>
          <kwd>Machine Learning</kwd>
          <kwd>Agent-oriented Programming</kwd>
          <kwd>Cognitive Agents</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Goethe University</institution>
          ,
          <addr-line>Frankfurt am Main</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LWDA'22: Lernen, Wissen, Daten, Analysen</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The concept of cognitive agents has its roots in the early stages of multi-agent systems research. In early agent research, the term agent referred to a software agent's capabilities of perception and action in a suitable environment, potentially adding cognitive capabilities inside the agent architecture. A fundamental drawback of this concept is the barrier to learning capabilities, since the full properties of the agent are hard-coded. Over the years, research in Agent-oriented software engineering has provided interesting approaches with promising results on the interplay between Machine Learning methods and cognitive software agents. Such a combination is realized by integrating Machine Learning algorithms into the cognitive agent cycle of the specific architecture. This paper gives an overview of the combination of both paradigms, including the applied concepts and architectures for different scenarios. A three-dimensional cube, ML-COG, provides perspectives on both paradigms; the considered literature is arranged within this cube. After a concise literature review, a selection of relevant research questions and open issues is presented as worthwhile to be investigated.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Over the years, research in autonomous agents and Multi-agent systems (MAS) has evolved into a multi-disciplinary field with influences from a wide range of related scientific fields containing a plethora of paradigms. Due to the rapid advancement of Machine Learning (ML) algorithms, especially in Deep Learning and Reinforcement Learning, the understanding of agency reflected by the term agent has gained different meanings. This circumstance has been pointed out by Dignum et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], according to whom the different understandings of agent could fundamentally be seen on the one side as a concept or on the other side as a paradigm for autonomous software systems. Furthermore, Shoham pointed out the fundamental shift from logic-based AI and Knowledge Representation to ML and statistical algorithms [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the viewpoint paper of Bordini et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the authors proclaim a cognitive era and investigate the contributions of Agent-oriented Programming (AOP) to future intelligent systems. Specifically, they mention AOP as an approach for the rapid development of cognitive agents that are context-sensitive. This means that, for a given scenario or task to be processed, software agents can be deployed at large scale and extended or specialized with capabilities for that scenario, e.g. as autonomous vehicle agents for transportation in mobility or as warehouse agents for sorting and packing goods for deliveries. Since the goals and plans as well as the predefined set of possible actions are usually implemented into the agent architecture, the agent shows robust behavior in its corresponding environment. This stands in contrast to learned behavior in ML. A main disadvantage of ML is its black-box component: the underlying structure of the learning process cannot be inspected. That is why the behavior of a learning agent in the sense of Russell &amp; Norvig [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], for instance, cannot be explained thoroughly, especially in sub-symbolic ML approaches. In Deep Reinforcement Learning (DRL), the learning agent's behavior leads to actions that are also difficult for humans to understand 1.
      </p>
      <p>Since research has been done in the considered intersection over the years, the contribution of this survey is to bring together a significant amount of research in which BDI is extended with ML methods, categorizing the works by their technical realization as well as by the considered ML methods. Furthermore, the survey points out research areas in this intersection that are worthwhile for deeper investigation. To clarify the setting of the work at hand, we first explain the fundamentals considered in this survey. Specifically, we also set it apart from Multi-agent Learning (MAL), a prevalent research direction that combines agency as a concept with, usually, reinforcement learning algorithms. To structure the literature investigated in this survey, we clarify our approach to setting its research focus. As mentioned, the integration of ML and AOP is the core research intersection, and the works done there so far are represented in this survey. To the best of our knowledge, this is the first survey that explicitly considers ML and AOP for the cognitive BDI agent architecture. Furthermore, it comprises mainly the work done in the last two decades. The remainder of this survey is structured as follows: section 2 contains the preliminaries as well as a demarcation of the topic from other directions to prevent misconceptions. The categorization approach of this survey is described in section 3, where the ML-COG cube is presented in particular (Fig. 1). In the main part, section 4, we examine the existing literature, presenting different approaches to the challenge of integrating ML and AOP, and furthermore categorize the considered works into the ML-COG cube. After the categorization, we present in section 5 the elaborated open challenges and directions that are worthwhile for profound research. Finally, we conclude our survey in section 6.</p>
      <p>1 Here, one can look at the well-known "Move 37" of AlphaGo from DeepMind, mentioned in https://www.deepmind.com/research/highlighted-research/alphago, last access: 04/18/2022.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Fundamentals</title>
      <sec id="sec-2-1">
        <title>2.1. Machine Learning</title>
        <p>A compact exposition of both paradigms, ML and AOP, is presented in the following subsections, focusing on the main aspects.</p>
        <p>
          Learning algorithms are data-driven, which means that to learn a specific behavior, the algorithm is exposed to a large data set. The learning process can vary according to the learning objective and, furthermore, the setting. In principle, the relevant learning algorithms can be subdivided into three categories: supervised, unsupervised and reinforcement learning. All of them have been investigated with respect to integration into the cognitive agent architecture [
          <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
          ]. In supervised learning, the algorithm receives a proper training data set on which the learning process is applied, thereby learning a specific behavior 2. After the training process, a testing step examines the performance of the learned behavior on a smaller sample from the data set that was not considered during the training phase. In contrast, unsupervised learning considers learning algorithms that are given the objective of finding contextual structures in a given data set. Thus, the learning algorithm does not get information about the objective but has to find an underlying structure. In reinforcement learning, a learning agent is considered that interacts with an environment to learn and perform a specific behavior. Here, the agent gets rewarded or punished for its actions in this environment. Based on a reward function, the goal of the agent is to maximize the reward, which leads to a specific behavior in the given environment. These three learning approaches are also covered in the ML-COG cube and are examined in this survey.
        </p>
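        <p>To make the distinction between the three settings concrete, the following minimal Python sketch caricatures them; the toy data, the threshold and gap heuristics, and the value-update rule are illustrative assumptions, not the algorithms used in the surveyed works.</p>

```python
# Toy illustrations of the three learning settings described above.

def supervised_fit(examples):
    """Supervised: learn a 1-D threshold classifier from labeled (x, label) pairs."""
    xs0 = [x for x, y in examples if y == 0]
    xs1 = [x for x, y in examples if y == 1]
    return (max(xs0) + min(xs1)) / 2  # midpoint between the two classes

def supervised_predict(threshold, x):
    return 1 if x > threshold else 0

def unsupervised_split(points):
    """Unsupervised: find structure without labels by splitting at the largest gap."""
    pts = sorted(points)
    gaps = [(b - a, i) for i, (a, b) in enumerate(zip(pts, pts[1:]))]
    _, i = max(gaps)
    return pts[:i + 1], pts[i + 1:]

def rl_value_update(value, reward, alpha=0.5):
    """Reinforcement: move an action-value estimate toward the observed reward."""
    return value + alpha * (reward - value)
```

        <p>The reward-driven update in the last function is the seed of the reward-maximizing behavior described above: repeated updates pull the agent's value estimates, and hence its action choices, toward high-reward behavior.</p>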
      </sec>
      <sec id="sec-2-2">
        <title>2.2. BDI agent architecture</title>
        <p>
          Autonomous agents have been broadly investigated in distributed artificial intelligence. Applications where agents come into play range, among others, from negotiation mechanisms and game theory to distributed problem-solving. In Agent Programming, we assume an internal cognitive architecture based on the observe, think, act cycle that each considered cognitive agent applies while operating in its environment [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Starting from the fewer capabilities of a reactive agent, which only reacts to percepts from the environment, the more complex cognitive architecture is usually represented by the Belief, Desire, Intention - in short BDI - architecture. The BDI model is a goal-oriented practical reasoning system and has its roots in practical psychology. A pre-version of the BDI model is the Procedural Reasoning System (PRS). Bryson [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], for example, presents learning for PRS and cognitive agents based on the cognitive logical model of Marvin Minsky [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Learning has therefore been a main challenge since the beginning of cognitive reasoning systems development. In the agent literature, there exist multiple variations of the BDI architecture, one example of which is depicted in Fig. 2. The agent observes information from the environment, defining its beliefs. The desires are derived from the beliefs, indicating the intended behavior of the agent. For each desire, a combination of goals and plans, which are predefined, comes into play. A single plan can contain multiple actions. An action is then executed by the agent in its environment and the beliefs are updated at the same time. A more comprehensive survey covering the BDI agent architecture and its variations is given by Silva et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>2 Here, proper means the suitable choice of a data set for the learning objective.</p>
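        <p>The deliberation cycle described above can be sketched in a few lines of Python; the percept keys, the plan library and the desire-generation rules below are illustrative assumptions, not drawn from any specific surveyed system.</p>

```python
# A minimal sketch of the BDI deliberation cycle:
# observe -> update beliefs -> derive desires -> commit to an intention -> act.

class BDIAgent:
    def __init__(self, plan_library):
        self.beliefs = {}                 # what the agent holds true about the world
        self.plan_library = plan_library  # desire -> plan (a sequence of actions)

    def observe(self, percepts):
        self.beliefs.update(percepts)     # belief revision

    def options(self):
        # derive desires from beliefs (toy rules for a delivery robot)
        desires = []
        if self.beliefs.get("battery", 100) < 20:
            desires.append("recharge")
        if self.beliefs.get("package_waiting"):
            desires.append("deliver")
        return desires

    def deliberate(self, desires):
        # commit to the first desire for which a plan exists
        for d in desires:
            if d in self.plan_library:
                return d
        return None

    def step(self, percepts):
        self.observe(percepts)
        intention = self.deliberate(self.options())
        # execute the plan bound to the current intention
        return self.plan_library.get(intention, [])
```

        <p>Each call to <monospace>step</monospace> runs one pass of the cycle; in a real BDI system, executing an action would in turn feed new percepts back into the next pass.</p>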
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Integration type</title>
        <p>As a third dimension in the ML-COG cube (Fig. 1), the integration type denotes in which form the two paradigms ML and AOP are deployed during the architectural design and implementation phase of the intelligent agent system. It ranges from a fully designed and programmed agent behavior to the other end, where the agent behavior is fully trained with ML techniques. If the learning algorithm is implemented into the BDI architecture, we consider the integration as hard-coded. If the learning algorithm is modular, it is called soft, i.e. loosely coupled. Consequently, a combination of both is called hard &amp; soft 3.</p>
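        <p>As a rough sketch of the difference between a hard-coded and a loosely coupled (soft) integration, consider the following hypothetical plan-selection hook; all class and method names are assumptions for illustration. A hard &amp; soft integration would combine both styles, e.g. an embedded score refined by an external learner.</p>

```python
class HardCodedAgent:
    """Hard: the learned scoring is implemented inside the BDI cycle itself."""

    def select_plan(self, plans, beliefs):
        # learning logic is interwoven with the agent's own selection step
        return max(plans, key=lambda p: self._embedded_score(p, beliefs))

    def _embedded_score(self, plan, beliefs):
        # stand-in for a learned model baked into the architecture
        return beliefs.get(plan, 0)


class SoftAgent:
    """Soft (loosely coupled): learning lives in an exchangeable module."""

    def __init__(self, learner):
        self.learner = learner  # any object exposing score(plan, beliefs)

    def select_plan(self, plans, beliefs):
        return max(plans, key=lambda p: self.learner.score(p, beliefs))
```

        <p>In the soft variant, the learner can be swapped or retrained without touching the agent cycle, which is exactly what distinguishes the loosely coupled end of the scale.</p>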
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. ML-COG cube categorization scheme</title>
      <p>
        The integration of ML into the BDI architecture, two distinct paradigms, is the core area that we consider in this survey. To provide a clear view of this intersection and the corresponding published works, we set up a categorization scheme. The rationale for this scheme follows a problem-solution order, since we focus on an integration problem, which can equally be seen as an implementation problem in AOP. Displayed as a cube structure, we present the ML-COG cube to classify the considered research. In its basic features, ML-COG comprises three main dimensions. The first dimension, defined as the cognitive agent development, is reflected on the x-axis. In this dimension, we distinguish between different agent development approaches, leaning on the fundamental literature of Multi-agent research based on Wooldridge [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The development of cognitive agent architectures ranges from a single-agent approach in Agent Programming (AP) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], to multiple agents interacting with each other in Multi-Agent Programming (MAP) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], up to Agent-oriented Software Engineering (AOSE) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], where the whole software engineering process for MAS is covered, including organizational roles and the interacting environment. All three approaches on the x-axis are restricted to BDI agents. In the second dimension, we accordingly envisage the ML perspective, reflected on the y-axis. Here, we differentiate between the learning approaches, which can in general be divided into supervised learning, unsupervised learning and reinforcement learning. Since the core of this survey is the integration of both ML and AOP, we focus on bringing both dimensions together by investigating the different approaches. Therefore, we distinguish in the third dimension the type of integration, represented on the z-axis. Here, we distinguish a loosely coupled integration, a hard-coded one, and a combination of soft and hard-coded integration types. Throughout the survey process, we assign each considered work and approach to the cube. It is important to note that we restrict the surveyed literature mainly to approaches where ML is considered for BDI agents. Thus, we have to neglect prominent works where learning is investigated in other types of cognitive architectures like SOAR or ACT-R 4. From this starting point, we went through the publications cited in the considered works. Due to space constraints, we consider specific representative works for ML-COG. We also apologize to the authors whose work we had to omit due to space constraints.
      </p>
      <p>3 Another interesting scale representation distinguishing between learned and hard-coded behavior is introduced by Ricci, A. in his talk "Agent Programming in the Cognitive Era: A New Era for Agent Programming?", EMAS 2021.</p>
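      <p>The cube can be pictured as a coordinate system over the three dimensions; the following sketch is one possible encoding, with hypothetical example entries rather than the paper's exact placements.</p>

```python
from enum import Enum

class Dev(Enum):          # x-axis: cognitive agent development
    AP = 1
    MAP = 2
    AOSE = 3

class Learning(Enum):     # y-axis: learning approach
    SUPERVISED = 1
    UNSUPERVISED = 2
    REINFORCEMENT = 3

class Integration(Enum):  # z-axis: integration type
    LOOSE = 1
    HARD = 2
    HARD_AND_SOFT = 3

# each surveyed work becomes a point in the cube (entries are illustrative)
cube = {
    "decision-tree plan selection": (Dev.AP, Learning.SUPERVISED, Integration.HARD),
    "Q-learning plan library": (Dev.AP, Learning.REINFORCEMENT, Integration.HARD_AND_SOFT),
}

def works_with(learning, cube):
    """Filter the cube along the y-axis (learning approach)."""
    return [w for w, (_, l, _) in cube.items() if l is learning]
```

      <p>Classifying a work then amounts to assigning it one coordinate per axis, and slices of the cube (e.g. all reinforcement learning works) become simple filters.</p>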
    </sec>
    <sec id="sec-4">
      <title>4. Literature Review</title>
      <p>
        Based on the approach explained in section 3, this survey covers works that combine ML with AOP, especially considering the BDI architecture. The literature collection was compiled by selecting works where ML approaches are applied to BDI agents, i.e. the learning algorithm is integrated into the BDI cycle. We examined a plethora of works, neglecting approaches where ML is considered, but not for BDI agents. One work mentioned before is from Bordini et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], where the literature is examined with respect to Artificial Intelligence in general for BDI agents. That work considers ML approaches but is not limited to them, whereas in this survey the focus lies solely on ML for BDI agents. Because they cover a broader range of the literature spectrum, they do not go into detail on specific works that are also subject of this survey. Here, the ML paradigm and the BDI architecture are juxtaposed, and thus we explain the related literature in this more specific context, considering the introduced categorization scheme. In section 5, we distill challenges concerning ML and AOP.
      </p>
      <sec id="sec-4-1">
        <p>4 Interested readers are referred to Broekens et al. [16] and Nason &amp; Laird [17].</p>
        <sec id="sec-4-1-1">
          <title>4.1. BDI and Decision trees</title>
          <p>
            One of the first works explicitly mentioning ML approaches for BDI agents is from Guerra-Hernández et al. [
            <xref ref-type="bibr" rid="ref18 ref19 ref5">5, 18, 19</xref>
            ], where the plan selection process is investigated by applying logical decision trees. As a typical supervised learning approach, this method is integrated into the BDI cycle by adding the decision tree into the interpreter of the agent, which transforms the selected plans into intentions. Phung et al. [20] apply decision trees for BDI agents using a learning-based framework, where the learning component is added to the BDI cycle. The agent processes its past experience to adapt its current behavior with respect to background knowledge. The result of the learning algorithm is then added to the beliefs of the agent. In the work of Airiau et al. [21], the BDI agent learns from past experience by preventing failed plan executions. In an initial step, the relation of goals and plans is represented by means of goal-plan trees. A goal-plan tree contains the defined goals and their corresponding plans of a BDI agent, leading to a hierarchical tree structure with goals and possible sub-plans. In the thesis of Singh [22], the plan selection step in the BDI cycle is tackled with different approaches; multiple works related to the author are therefore considered. The works of Singh et al. [23, 24] build upon the previous paper [21] and add context conditions for the plan selection process in the form of decision trees. Commonly, a context condition is a Boolean function that needs to be predetermined during the implementation phase. It is attached to each plan and describes the conditions under which a plan is useful for a corresponding goal in a specific situation. Focusing on the learned behavior, a decision tree is built up for each considered plan in the agent's library. Each tree therefore leads to a decision of whether the plan will succeed or fail, with a probability score. A further extension of this work is from Singh et al. [25], where plan selection under changing dynamics is investigated. A confidence measure function for the degree of stability of plans is presented with respect to execution traces and the environment of the agents. The resulting weights are added to the plans, denoting the success of being applied for a corresponding goal.
          </p>
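          <p>The plan-selection idea common to these works can be caricatured as follows: per-plan success statistics over past executions, a crude stand-in for the learned decision trees of Singh et al., gate which applicable plan is tried for a goal. The class, its method names and the optimistic prior are illustrative assumptions.</p>

```python
from collections import defaultdict

class PlanSelector:
    """Pick plans by their estimated success probability in a given context."""

    def __init__(self):
        # (plan, context) -> [successes, attempts]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, plan, context, succeeded):
        s = self.stats[(plan, context)]
        s[1] += 1
        if succeeded:
            s[0] += 1

    def success_prob(self, plan, context):
        succ, n = self.stats[(plan, context)]
        return succ / n if n else 0.5  # optimistic prior for untried plans

    def select(self, plans, context):
        # choose the applicable plan with the highest estimated success
        return max(plans, key=lambda p: self.success_prob(p, context))
```

          <p>A decision tree over context features generalizes this table: instead of one counter per exact context, the tree groups similar contexts and outputs a success probability per leaf.</p>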
        </sec>
        <sec id="sec-4-1-2">
          <title>4.2. BDI and Reinforcement Learning</title>
          <p>
            The thesis of Feliu [26] considers the application of Reinforcement Learning (RL) for generating plans in BDI agents without relying on prior knowledge. The author covers some related works concerning BDI and learning that are also subjects of this survey. Related approaches, where RL is applied to BDI, are from Pereira &amp; Dimuro [27, 28] and Chang [29]. In these works, BDI and RL are combined by integrating learning into the BDI cycle. The work of Qi &amp; Bo-ying [30] investigates a combination of RL and BDI for robot soccer simulation. Here, RL is employed as a feedback process by using the Q-Learning algorithm for the simulation steps. The learning algorithm is not integrated into the BDI architecture but processes the outcome of the BDI agent's actions. Another approach in the same setting is presented by Wan et al. [31], where a BDI agent is extended with Q-Learning in AgentSpeak, the agent programming language in which the agents are implemented. More specifically, the plan library is improved by the Q-Learning decision algorithm in an uncertain environment. They found that during state space exploration, the obligatory step in RL, the AgentSpeak communication slowed down. For faster convergence, Deep Reinforcement Learning (DRL) seems to be a suitable approach; the latter is also mentioned in section 5. Action selection based on rules is a challenge inside this area, which is tackled by Broekens &amp; Hindriks [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]. In this work, the authors use RL for the rule selection step, which slightly differs from the action selection process. In the typical RL setting, the learned behavior is the corresponding action; in contrast, in this work an internal uninstantiated rule is selected during the learning process. They consider the GOAL agent programming language. The relevant components for learning are reflected in the states, which are built up from a set of rules of the agents and the number of active goals. The considered state representation seems to be an initial version for learning but is capable of delivering interesting results for rule selection. The learning process takes place inside the agent architecture.
          </p>
          </p>
          <p>Initial work on combining elements of the RL setting with partial observability has been carried out by Rens et al. [32]. Here, the authors combine the BDI architecture with the Partially Observable Markov Decision Process (POMDP) planning approach, providing initial results on small experimental settings. They argue in favor of a more complex simulation environment. Following this approach, Chen et al. [33] integrate the POMDP into the planning phase of the BDI architecture by considering AgentSpeak. Nair &amp; Tambe also investigate in [34] the concept of POMDP for the BDI paradigm; they consider multi-agent teaming with POMDP and team-oriented programming. Another work in this vein is from Rens &amp; Moodley [35], where the reward-maximizing approach of POMDP and the management of multiple goals in BDI systems are combined. These works open up opportunities for investigating RL and BDI in multi-agent settings. Bosello &amp; Ricci [36] extend the BDI architecture with RL; they consider the SARSA algorithm for the decision-making of the agent. A low-level learning approach is represented in the BDI-FALCON agent architecture, presented in Tan et al. [37, 38]. At its lowest level, BDI-FALCON contains a reactive learning module based on Temporal Difference Learning (TD), an RL algorithm that estimates a value function of state-action pairs (s, a), which indicates the learning step of the system. Two other modules contain the BDI-native components like goals and plans, which are sent to the low-level RL environment. Karim et al. [39] propose an approach where learning at a high level of abstraction by a BDI agent is connected to a low-level RL environment, based on BDI-FALCON [37]. Resulting in a hybrid architecture, the BDI agent generates plans that are derived from the RL environment. Norling [40] integrates the Q-Learning algorithm into the BDI cycle to learn rules for pathfinding in a grid world; it is evaluated in a simple grid environment. Subagdja &amp; Sonenberg [41] also integrate the Q-Learning algorithm into the BDI agent cycle. They introduce meta-level plans, which are considered for monitoring, the reasoning step, and the monitoring of executed plans. Badica et al. [42] apply TD for BDI agents: considering a grid scenario, they define the agent's actions as well as specific states representing the corresponding goals. Singh &amp; Hindriks investigate in [43] the Q-Learning algorithm for adaptive behaviors in autonomous BDI agents.</p>
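          <p>Tabular Q-Learning of the kind these works integrate into the BDI cycle, e.g. for grid-world pathfinding as in Norling [40], can be sketched as follows; the tiny corridor environment, its reward and all hyperparameters are illustrative assumptions.</p>

```python
import random

def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn to walk right along a corridor; reaching the last state pays reward 1."""
    rng = random.Random(seed)
    actions = (-1, 1)  # step left / step right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection over the Q-table
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # TD update toward reward plus discounted best next value
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in actions) - q[(s, a)])
            s = s2
    return q
```

          <p>After training, the greedy policy is read off per state as the action with the highest Q-value; integrated into a BDI agent, such Q-values would typically inform plan or action selection rather than replace the whole cycle.</p>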
        </sec>
        <sec id="sec-4-1-3">
          <title>4.3. Literature categorization overview</title>
          <p>In Table 1, we list a selection of the research works covered in this survey and classify them with respect to the ML-COG dimensions, neglecting the other surveys mentioned in the previous section. For future research in this area, the open research challenges in section 5 can be considered. For the sake of clarity, the columns reflect the dimensions of ML-COG: the first column indicates whether a Single-Agent (SA) or Multi-Agent (MA) setting is considered. In the cube, SA is placed into AP and MA is placed into MAP. In addition, the last column Objective contains the contribution goal of the corresponding work. Note that for the dimension Cognitive agent development, we distinguish between Single-agent (SA) and Multi-agent (MA) development; in the context of the survey, this is equal to AP and MAP, as explained in section 3. Note that we have left out from the table those works where learning approaches are not explicitly implemented or executed 5.</p>
          <table-wrap id="tbl-1">
            <label>Table 1</label>
            <caption>
              <p>Overview of selected surveyed literature based on ML-COG</p>
            </caption>
            <table>
              <thead>
                <tr><th>SA/MA</th><th>Learning</th><th>Integration</th><th>Objective</th></tr>
              </thead>
              <tbody>
                <tr><td>SA</td><td>Decision Tree</td><td>Hard</td><td>Plan selection</td></tr>
                <tr><td>SA</td><td>TD-Learning (RL)</td><td>Hard</td><td>RL-BDI agent</td></tr>
                <tr><td>SA</td><td>SARSA (RL)</td><td>Hard &amp; Soft</td><td>RL-BDI agent</td></tr>
                <tr><td>SA</td><td>Model-based RL</td><td>Hard</td><td>Rule selection</td></tr>
                <tr><td>SA</td><td>Q-Learning (RL)</td><td>Hard &amp; Soft</td><td>Plan generation</td></tr>
                <tr><td>SA</td><td>CLARET (SL)</td><td>Loosely coupled</td><td>Plan recognition</td></tr>
                <tr><td>SA/MA</td><td>Decision Tree</td><td>Hard</td><td>Plan execution</td></tr>
                <tr><td>SA</td><td>RL</td><td>Loosely coupled</td><td>Plan execution</td></tr>
                <tr><td>SA</td><td>RL</td><td>Loosely coupled</td><td>Plan execution</td></tr>
                <tr><td>SA</td><td>KAM</td><td>Hard &amp; Soft</td><td>Intention selection</td></tr>
                <tr><td>SA</td><td>RPDM</td><td>(?)</td><td>Decision Making</td></tr>
                <tr><td>SA</td><td>RPDM/Q-Learning</td><td>(?)</td><td>Decision Making</td></tr>
                <tr><td>SA</td><td>Decision Tree</td><td>Hard &amp; Soft</td><td>Experience learning</td></tr>
                <tr><td>SA/MA</td><td>Q-Learning</td><td>Hard &amp; Soft</td><td>Decision Making</td></tr>
                <tr><td>SA</td><td>Deep Neural Network</td><td>Hard &amp; Soft</td><td>Decision making</td></tr>
                <tr><td>SA</td><td>Decision Tree</td><td>Hard</td><td>Plan selection</td></tr>
                <tr><td>SA</td><td>Decision Tree</td><td>Hard</td><td>Plan selection</td></tr>
                <tr><td>SA</td><td>Decision Tree</td><td>Hard</td><td>Plan selection</td></tr>
                <tr><td>SA</td><td>Q-Learning</td><td>Hard &amp; Soft</td><td>BDI-RL agent</td></tr>
                <tr><td>SA</td><td>Q-Learning</td><td>Hard</td><td>Learning Plans/Actions</td></tr>
                <tr><td>SA</td><td>TD-Learning</td><td>Loosely coupled</td><td>Plan selection</td></tr>
                <tr><td>SA</td><td>Q-Learning (RL)</td><td>Hard</td><td>RL-BDI agent</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Open research challenges</title>
      <p>The research done so far at the intersection of ML and AOP provides many different applications, some of which have been elaborated on in the previous section. The ML-COG cube is in an ongoing development process; it will be extended with finer-grained categorization scales and is thus subject to future research. Since the categorization process follows the presented dimensions, we point out the following application areas, which are picked due to their technical proximity as well as based on the contributions and potential limitations in the investigated literature. Therefore, we list the following areas for future research:</p>
      <list list-type="order">
        <list-item><p>Communication protocols and Emergent communication</p></list-item>
        <list-item><p>Cognitive decision making and Learned behavior</p></list-item>
        <list-item><p>Goal-level learning</p></list-item>
      </list>
      <sec id="sec-5-2">
        <p>5 A "?" entry denotes that the implementation type is not clearly classifiable.</p>
        <p>
          The overall aim is to provide a high level of abstraction with the usage of learning-based components. Areas 1 and 2 are intentionally formulated each with two extremes, indicating the different approaches to agent development in the programming phase. The first area ranges from predefined communication protocols, as considered with AgentSpeak [48] in a MAS, over to emergent communication [49] in learning-based agents interacting with each other. Current research in emergent communication provides RL algorithms in multi-agent settings that encourage agents to communicate with each other based on single and collective rewards. This area is important especially in MAS, where reliable communication leads to efficient coordination and cooperation. In Table 1, one can see that nearly all works focus on the single-agent setting. The shift to MAS is therefore a crucial step in inspecting the behavior of BDI learning agents interacting with each other. A combination of learning-based communication with initial rules represents such a combination approach. The overall advantage is a better explainable learned behavior and thus better explainable actions of the agents [50]. In the second area, we distinguish rather different agent types, which are commonly considered in MAS as presented by Russell &amp; Norvig in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Decision-making is the essential step agents process to reach their goals successfully. Research in Multi-agent Learning based on RL algorithms has already covered a broad range of settings, starting from single-agent settings up to MAS settings with different applications [51]. Here, we see future work in MAS settings based on cognitive decision-making grounded in the BDI architecture. Works covered in this survey already provide solutions for the single-agent setting [36, 37]. Furthermore, the usage of neural networks is rather a rarity; one recent work combining BDI agents and neural networks is from Rodrigues et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. As a third area, we see learning at the goal level as a novel approach to connecting ML and BDI. In the surveyed literature, learning at the plan level is predominantly tackled, but no goal-level learning is presented. Sub-symbolic learning methods, like neural networks, could therefore be considered.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        Learning methods in MAS difer from the traditional ML process since the autonomous and
lfexible behavior of the agents is considered and furthermore, interacting in a complex and
dynamic environment. This survey aims to get in the lane at the intersection of ML and AOP
by comprising the relevant work done over the years in the field. In ML research, the term
agent is predominantly considered as a concept rather than an existing instance with explicit
developed cognitive capabilities as it is in software agents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Such a form of disambiguation also
influences the contextual understanding of our work. In spite of the fact that this intersection is
based on diferent approaches, cognitive software agents have not been considered suficiently
in ML research and therefore represent a relevant direction for current research. The analysis of
such an integration process will lead to better insight into the functioning of learned behaviors
in a cognitive framework.
[20] T. Phung, M. Winikof, L. Padgham, Learning Within the BDI Framework: An Empirical
Analysis, in: Knowledge-Based Intelligent Information and Engineering Systems, volume
3683, Springer Berlin Heidelberg, 2005, pp. 282–288.
[21] S. Airiau, L. Padgham, S. Sardina, S. Sen, Enhancing the Adaptation of BDI Agents Using
Learning Techniques, International Journal of Agent Technologies and Systems 1 (2009)
1–18.
[22] D. Singh, Learning plan selection for BDI agent systems, Ph.D. thesis, RMIT University,
2011.
[23] D. Singh, S. Sardina, L. Padgham, S. Airiau, Learning Context Conditions for BDI Plan
Selection, AAMAS (2010).
[24] D. Singh, S. Sardina, L. Padgham, Extending BDI plan selection to incorporate learning
from experience, Robotics and Autonomous Systems 58 (2010) 1067–1075.
[25] D. Singh, S. Sardina, L. Padgham, G. James, Integrating learning into a bdi agent for
environments with changing dynamics, in: Twenty-Second IJCAI, 2011.
[26] J. Feliu, Use of Reinforcement Learning for Plan Generation in Belief-Desire-Intention
(BDI) Agent Systems, Ph.D. thesis, University of Rhode Island, Kingston, RI, 2013.
doi:10.23860/thesis-feliu-jose-2013.
[27] D. R. Pereira, L. V. Goncalves, G. P. Dimuro, Constructing BDI Plans from Optimal POMDP
Policies, with an Application to AgentSpeak Programming, Conferencia Latinoamericana
de Informatica, CLEI (2008) 11.
[28] D. R. Pereira, G. P. Dimuro, Um algoritmo para extração de um plano BDI que obedece uma
política MDP ótima, in: Anais do Workshop-Escola de Sistemas de Agentes para Ambientes
Colaborativos, Pelotas, PPGINF/UCPel, 2007.
[29] S. Chang, Simulação de ambientes multiagente normativos, Workshop-Escola de Sistemas
de Agentes para Ambientes Colaborativos WESAAC (2007).
[30] G. Qi, W. Bo-ying, Study and Application of Reinforcement Learning in Cooperative
Strategy of the Robot Soccer Based on BDI Model, International Journal of Advanced
Robotic Systems 6 (2009) 15.
[31] Q. Wan, W. Liu, L. Xu, J. Guo, Extending the BDI Model with Q-learning in Uncertain
Environment, in: International Conference on Algorithms, Computing and Artificial
Intelligence, ACM, Sanya China, 2018, pp. 1–6.
[32] G. Rens, A. Ferrein, E. Van Der Poel, A BDI Agent Architecture for a POMDP
Planner, 9th International Symposium on Logical Formalization of Commonsense Reasoning:
Commonsense (2009).
[33] Y. Chen, K. Bauters, W. Liu, J. Hong, K. McAreavey, L. Godo, C. Sierra, Agentspeak+:
Agentspeak with probabilistic planning, Proc. of CIMA (2014) 15–20.
[34] R. Nair, M. Tambe, Hybrid BDI-POMDP Framework for Multiagent Teaming, Journal of
Artificial Intelligence Research 23 (2005) 367–420.
[35] G. Rens, D. Moodley, A hybrid POMDP-BDI agent architecture with online stochastic
planning and plan caching, Cognitive Systems Research 43 (2017) 1–20.
[36] M. Bosello, A. Ricci, From Programming Agents to Educating Agents – A Jason-Based
Framework for Integrating Learning in the Development of Cognitive Agents, in: L. A.
Dennis, R. H. Bordini, Y. Lespérance (Eds.), Engineering Multi-Agent Systems, volume
12058, Springer International Publishing, Cham, 2020, pp. 175–194.
[37] A.-H. Tan, Y.-S. Ong, A. Tapanuj, A hybrid agent architecture integrating desire, intention
and reinforcement learning, Expert Systems with Applications 38 (2011).
[38] A.-H. Tan, Falcon: a fusion architecture for learning, cognition, and navigation, in:
2004 IEEE International Joint Conference on Neural Networks (IEEE), volume 4, 2004, pp.
3297–3302.
[39] S. Karim, L. Sonenberg, A.-H. Tan, A Hybrid Architecture Combining Reactive Plan
Execution and Reactive Learning, in: PRICAI 2006: Trends in Artificial Intelligence,
volume 4099, Springer Berlin Heidelberg, 2006, pp. 200–211.
[40] E. Norling, Folk psychology for human modelling: Extending the bdi paradigm, in:
Proceedings of the Third International Joint Conference on Autonomous Agents and
Multiagent Systems-Volume 1, 2004, pp. 202–209.
[41] B. Subagdja, L. Sonenberg, Learning plans with patterns of actions in bounded-rational
agents, Springer-Verlag (2005) 30–36.
[42] A. Badica, C. Badica, M. Ivanovic, D. Mitrovic, An Approach of Temporal Difference
Learning Using Agent-Oriented Programming, in: 20th International Conference on
Control Systems and Computer Science, IEEE, 2015, pp. 735–742.
[43] D. Singh, K. V. Hindriks, Learning to Improve Agent Behaviours in GOAL, in: Programming
Multi-Agent Systems, volume 7837, Springer Berlin Heidelberg, 2013, pp. 158–173.
[44] C. Heinze, S. Goss, A. Pearce, Plan recognition in military simulation: Incorporating
machine learning with intelligent agents, in: Proceedings of IJCAI-99 Workshop on Team
Behaviour and Plan Recognition, Citeseer, 1999, pp. 53–64.
[45] S. Karim, B. Subagdja, L. Sonenberg, Plans as Products of Learning, in: 2006
IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE, Hong
Kong, China, 2006, pp. 139–145.
[46] P. Lokuge, D. Alahakoon, Improving the adaptability in automated vessel scheduling
in container ports using intelligent software agents, European Journal of Operational
Research 177 (2007) 1985–2015.
[47] E. Norling, Learning to notice: Adaptive models of human operators, in: Second
International Workshop on Learning Agents, Citeseer, 2001.
[48] R. H. Bordini, J. F. Hübner, M. Wooldridge, Programming multi-agent systems in
AgentSpeak using Jason, John Wiley &amp; Sons, 2007.
[49] M. Noukhovitch, T. LaCroix, A. Lazaridou, A. Courville, Emergent communication under
competition, in: Proceedings of the 20th International Conference on AAMAS,
International Foundation for Autonomous Agents and Multiagent Systems, UK, 2021, pp. 974–982.
[50] J. Broekens, M. Harbers, K. Hindriks, K. v. d. Bosch, C. Jonker, J.-J. Meyer, Do you get
it? user-evaluated explainable bdi agents, in: German Conference on Multiagent System
Technologies, Springer, 2010, pp. 28–39.
[51] S. Gronauer, K. Diepold, Multi-agent deep reinforcement learning: a survey, Artificial
Intelligence Review 55 (2022) 895–943.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Dignum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dignum</surname>
          </string-name>
          ,
          <article-title>Agents are dead. long live agents!</article-title>
          ,
          <source>in: Proceedings of the 19th AAMAS</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1701</fpage>
          -
          <lpage>1705</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shoham</surname>
          </string-name>
          ,
          <article-title>Why knowledge representation matters</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>59</volume>
          (
          <year>2015</year>
          )
          <fpage>47</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Bordini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El Fallah Seghrouchni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hindriks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Logan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <article-title>Agent programming in the cognitive era</article-title>
          ,
          <source>Autonomous Agents and Multi-Agent Systems</source>
          <volume>34</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Norvig</surname>
          </string-name>
          ,
          <source>Artificial Intelligence: a modern approach</source>
          , 3 ed.,
          <source>Pearson</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El Fallah-Seghrouchni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Soldano</surname>
          </string-name>
          ,
          <article-title>Learning in BDI Multi-agent Systems</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Dix</surname>
          </string-name>
          , J. Leite (Eds.),
          <source>Computational Logic in Multi-Agent Systems</source>
          , volume
          <volume>3259</volume>
          , Springer Berlin Heidelberg,
          <year>2004</year>
          , pp.
          <fpage>218</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Silveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Santiago</surname>
          </string-name>
          ,
          <article-title>A Mediator Agent based on Multi-Context System and Information Retrieval</article-title>
          ,
          <source>ICAART 2022</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O. I.</given-names>
            <surname>Erduran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Minor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tarraf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schroth</surname>
          </string-name>
          ,
          <article-title>Multi-agent Learning for Energy-Aware Placement of Autonomous Vehicles</article-title>
          ,
          <source>in: 18th IEEE International Conference On Machine Learning And Applications (ICMLA)</source>
          , IEEE,
          Boca Raton
          , FL, USA,
          <year>2019</year>
          , pp.
          <fpage>1671</fpage>
          -
          <lpage>1678</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          ,
          <article-title>An introduction to multiagent systems</article-title>
          , 2nd ed., John Wiley &amp; Sons, Chichester, U.K.,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bryson</surname>
          </string-name>
          ,
          <article-title>Cross-paradigm analysis of autonomous agent architecture</article-title>
          ,
          <source>Journal of Experimental &amp; Theoretical Artificial Intelligence</source>
          <volume>12</volume>
          (
          <year>2000</year>
          )
          <fpage>165</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Minsky</surname>
          </string-name>
          ,
          <article-title>Logical Versus Analogical or Symbolic Versus Connectionist or Neat Versus Scruffy</article-title>
          , in:
          <source>AI Magazine Volume 12 Number 2</source>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L. d.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Meneguzzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Logan</surname>
          </string-name>
          ,
          <article-title>BDI Agent Architectures: A Survey</article-title>
          ,
          <source>in: Proceedings of the Twenty-Ninth IJCAI</source>
          , Yokohama, Japan,
          <year>2020</year>
          , pp.
          <fpage>4914</fpage>
          -
          <lpage>4921</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Deljoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>van Engers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gommans</surname>
          </string-name>
          , C. T. de Laat, et al.,
          <article-title>What is going on: Utility-based plan selection in bdi agents</article-title>
          .,
          <source>in: AAAI Workshops</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shoham</surname>
          </string-name>
          ,
          <article-title>Agent-oriented programming</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>60</volume>
          (
          <year>1993</year>
          )
          <fpage>51</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Bordini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dastani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El Fallah Seghrouchni</surname>
          </string-name>
          (Eds.),
          <source>Multi-Agent Programming: Languages, Tools and Applications</source>
          , Springer US,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sturm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Shehory</surname>
          </string-name>
          ,
          <article-title>Agent-Oriented Software Engineering: Revisiting the State of the Art</article-title>
          , Springer Berlin Heidelberg,
          <year>2014</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Broekens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hindriks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wiggers</surname>
          </string-name>
          ,
          <article-title>Reinforcement Learning as Heuristic for Action-Rule Preferences</article-title>
          , in:
          <string-name>
            <given-names>R.</given-names>
            <surname>Collier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dix</surname>
          </string-name>
          , P. Novák (Eds.),
          <source>Programming Multi-Agent Systems</source>
          , volume
          <volume>6599</volume>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2012</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nason</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Laird</surname>
          </string-name>
          ,
          <article-title>Soar-RL: integrating reinforcement learning with Soar</article-title>
          ,
          <source>Cognitive Systems Research</source>
          <volume>6</volume>
          (
          <year>2005</year>
          )
          <fpage>51</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El Fallah-Seghrouchni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Soldano</surname>
          </string-name>
          ,
          <article-title>Distributed learning in intentional bdi multi-agent systems</article-title>
          ,
          <source>in: Proceedings of the Fifth Mexican International Conference in Computer Science (ENC 2004)</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.-F.</given-names>
            <surname>Seghrouchni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Soldano</surname>
          </string-name>
          ,
          <article-title>BDI multiagent learning based on first-order induction of logical decision trees</article-title>
          ,
          <source>in: Intelligent Agent Technology: Research and Development</source>
          , World Scientific,
          <year>2001</year>
          , pp.
          <fpage>160</fpage>
          -
          <lpage>169</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>