Machine Learning Algorithms for Cognitive and
Autonomous BDI Agents
Ömer Ibrahim Erduran
Department of Computer Science, Goethe University, Frankfurt am Main, Germany


Abstract
The concept of cognitive agents has its roots in the early stages of multi-agent systems research. In
early agent research, the term agent referred to software agents with basic capabilities of perception
and action in a given environment, with potential cognitive capabilities added inside the agent
architecture. A fundamental drawback of this concept is the barrier to learning capabilities, since the
full properties of the agent are hard-coded. Over the years, research in Agent-oriented software
engineering has provided interesting approaches with promising results on the interplay between
Machine Learning methods and cognitive software agents. Such a combination is realized by
integrating Machine Learning algorithms into the cognitive agent cycle of the specific architecture.
This paper gives an overview of the combination of both paradigms, including the applied concepts
and architectures for different scenarios. A three-dimensional cube, ML-COG, is introduced to
illustrate the integration perspectives for both paradigms, and the considered literature is arranged
within this cube. After a concise literature review, a selection of relevant research questions and open
issues that are worthwhile to investigate is presented.

Keywords
Multi-Agent System, BDI Agent, Machine Learning, Agent-oriented Programming, Cognitive Agents




1. Introduction
Over the years, research in autonomous agents and Multi-agent systems (MAS) has evolved
into a multi-disciplinary field with influences from a wide range of related scientific fields
containing a plethora of paradigms. Due to the rapid advancement of Machine Learning (ML)
algorithms, especially in Deep Learning and Reinforcement Learning, the understanding of
agency reflected by the term agent has gained different meanings. This circumstance has been
pointed out by Dignum et al. [1], according to whom the term agent can fundamentally be
understood either as a concept or as a paradigm for autonomous software systems. Furthermore,
Shoham pointed out the fundamental shift from logic-based AI and Knowledge Representation
to ML and statistical algorithms [2]. In their viewpoint paper, Bordini et al. [3] proclaim a
cognitive era and investigate the contributions of Agent-oriented Programming (AOP) to future
intelligent systems. Specifically, they mention AOP as an approach for the rapid development of
cognitive agents which are context sensitive. This means that, for a given scenario or task to be
processed, software agents can be applied at large scale, being extended or specialized with
capabilities for that scenario, e.g. as autonomous vehicle agents for transportation in mobility
or as warehouse agents for sorting and packing goods for delivery.

LWDA’22: Lernen, Wissen, Daten, Analysen. October 05–07, 2022, Hildesheim, Germany
erduran@cs.uni-frankfurt.de (Ö. I. Erduran)
ORCID: 0000-0002-1586-0228 (Ö. I. Erduran)
                                    © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                      Figure 1: The proposed multidimensional ML-COG Cube


Since the goals and plans as well as the predefined set of possible actions are usually implemented
into the agent architecture, the agent shows robust behavior in its corresponding environment.
This circumstance represents a contrast to the learned behavior in ML. A main disadvantage of
ML is its black-box character: the underlying structure of the learning process cannot be
inspected. That is why the behavior of a learning agent in the sense of Russell & Norvig [4], for
instance, cannot be explained thoroughly, especially for sub-symbolic ML approaches. In Deep
Reinforcement Learning (DRL), the learned agent behavior leads to actions which are also
difficult for humans to understand1 .
   Since research has been done in this intersection over the years, the contribution of this
survey is to bring together a significant amount of research in which BDI agents are extended
with ML methods, categorizing the works with respect to their technical realization as well as
the considered ML methods. Furthermore, the survey points out research areas in this intersection
that are worthwhile for deeper investigation. To clarify the setting of the work at hand, we first
explain the fundamentals which are considered in this survey. Specifically, we also distinguish
our scope from Multi-agent Learning (MAL), a prevalent research direction that treats agency as
a concept and usually considers reinforcement learning algorithms. To structure the literature
investigated in this survey, we clarify our approach for setting the research focus. As mentioned,
the integration of ML and AOP is the core research intersection, and the works done so far in it
are represented in this survey. To the best of our knowledge, this is the first survey which
explicitly considers ML and AOP for the cognitive BDI agent architecture. Furthermore, it
comprises mainly the work done in the last two decades.

    1
    Here, one can look at the well-known "Move 37" of AlphaGo from DeepMind, mentioned in
https://www.deepmind.com/research/highlighted-research/alphago, last access: 04/18/2022.
The remainder of this survey is structured as follows: section 2 contains the preliminaries as
well as a delineation of the topic from related directions to prevent misconceptions. The
categorization approach of this survey is described in section 3, where the ML-COG cube is
introduced in particular (Fig. 1). In the main part, section 4, we examine the existing literature,
presenting different approaches to tackle the challenge of integrating ML and AOP, and
furthermore categorize the considered works into the ML-COG cube. After the categorization,
we present in section 5 the elaborated open challenges and directions that are worthwhile for
profound research. Finally, we conclude our survey in section 6.


2. Fundamentals
A compact exposition of both paradigms ML and AOP is presented in the following subsections
focusing on the main aspects.

2.1. Machine Learning
Learning algorithms are data-driven, which means that for a specific learning behavior, the
algorithm is exposed to a large data set. Here, the learning process can vary according to the
learning objective and, furthermore, the setting. In principle, the relevant learning algorithms
can be subdivided into three categories: supervised, unsupervised and reinforcement learning. All
of them have been investigated with respect to their integration into the cognitive agent architecture
[5, 6, 7]. In supervised learning, the learning algorithm is given a proper training data set on which
the learning process is applied, thereby learning a specific behavior 2 . After the training process,
the testing step examines the performance of the learned behavior on a smaller sample from the
data set which is not considered during the training phase. In contrast, unsupervised learning
considers learning algorithms that are given the objective of finding contextual structures in a
given data set. Thus, the learning algorithm does not get labeled targets but has to find an
underlying structure. In reinforcement learning, a learning agent is considered that interacts
with an environment to learn and perform a specific behavior. Here, the agent gets rewarded
or punished for its actions in this environment. Based on a reward function, the goal of the
agent is to maximize the reward, which leads to a specific behavior in the given environment.
These three learning approaches are also covered by the ML-COG cube and are the subject of
this survey.
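   As a standard illustration of this reward-maximization objective (a generic formulation, not
taken from any of the surveyed works), an RL agent searches for a policy 𝜋 that maximizes the
expected discounted return, written in LaTeX notation as

    \pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big], \qquad 0 \le \gamma < 1,

where s_t and a_t denote the state and action at step t, r is the reward function, and the discount
factor \gamma trades off immediate against future rewards.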

2.2. BDI agent architecture
Autonomous agents have been broadly investigated in distributed artificial intelligence. Appli-
cations where agents come into play range, among others, from negotiation mechanisms and
game theory to distributed problem solving. In Agent Programming, we suppose an internal
cognitive architecture based on the observe, think, act cycle that each considered cognitive
agent applies while operating in its environment [8]. Starting from the limited capabilities of
a reactive agent that only reacts to stimuli from the environment, the more complex cognitive
architecture is usually represented by the Belief, Desire, Intention - in short BDI - architecture.

    2
        Here, proper means the suitable choice of a data set for the learning objective.

Figure 2: The typical cognitive BDI cycle based on Deljoo et al. [12]

The BDI model is a goal-oriented practical reasoning system and it has its
roots in practical psychology. A pre-version of the BDI model is the Practical Reasoning System
(PRS). Bryson [9], for example, presents learning for PRS and cognitive agents based on the
cognitive logical model of Marvin Minsky [10]. Learning has therefore been a main challenge
since the beginning of the development of cognitive reasoning systems. In the agent literature,
there exist multiple variations of the BDI architecture, one of which is depicted in Fig. 2. The
agent observes information from the environment, defining its beliefs. The desires are derived
from the beliefs, indicating the planned behavior of the agent. For each desire, predefined
combinations of goals and plans come into play. A single plan can contain multiple actions. An
action is then executed by the agent in its environment and the beliefs are updated at the same
time. A more comprehensive survey that covers the BDI agent architecture and its variations
is given by Silva et al. [11].
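   To make this cycle concrete, the following minimal sketch (Python, with hypothetical names;
not the implementation of any surveyed BDI platform) shows how beliefs, desires, plans and
intentions could interact in an observe-deliberate-act loop. The env object is an assumed
interface with perceive() and execute(); a real BDI interpreter is considerably richer.

    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Plan:
        goal: str                            # the goal this plan is relevant for
        context: Callable[[Dict], bool]      # context condition over the current beliefs
        actions: List[str]                   # sequence of actions the plan executes

    def bdi_cycle(beliefs: Dict, desires: List[str], plan_library: List[Plan], env) -> None:
        while desires:
            beliefs.update(env.perceive())                        # observe: update beliefs
            goal = desires[0]                                     # deliberate: adopt a desire as a goal
            options = [p for p in plan_library
                       if p.goal == goal and p.context(beliefs)]  # applicable plans
            if not options:
                desires.pop(0)                                    # no applicable plan: drop the goal
                continue
            intention = options[0]                                # commit to one plan (intention)
            for action in intention.actions:
                env.execute(action)                               # act in the environment
                beliefs.update(env.perceive())                    # beliefs are updated alongside
            desires.pop(0)                                        # goal handled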

2.3. Integration type
As a third dimension in the ML-COG cube (Fig. 1), the integration type denotes in which form
both paradigms ML and AOP are deployed during the architectural design and implementation
phase of the intelligent agent system. It ranges from fully designed and programmed agent
behavior to agent behavior that is fully trained with ML techniques. If the learning algorithm
is implemented directly into the BDI architecture, we consider the integration as Hard. If the
learning algorithm is a separate module, it is called loosely coupled. Consequently, a combination
of both is called Hard & soft 3 .

    3
     Another interesting scale representation distinguishing between learned and hard-coded behavior is introduced
by Ricci, A. in his talk "Agent Programming in the Cognitive Era: A New Era for Agent Programming?", EMAS 2021.
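   As a minimal illustration of these integration types (hypothetical Python interfaces, not taken
from the surveyed works), a loosely coupled integration keeps the learner as an external,
replaceable module that the agent only queries, whereas a hard integration places the learning
state and its use directly inside the agent's own cycle:

    class ExternalLearner:
        # Loosely coupled: the ML component lives outside the agent.
        def suggest(self, beliefs, options):
            return options[0]                    # placeholder for e.g. a decision tree or a Q-table

    class LooselyCoupledAgent:
        def __init__(self, learner: ExternalLearner):
            self.learner = learner               # injected module; swapping it leaves the agent untouched

        def select(self, beliefs, options):
            return self.learner.suggest(beliefs, options)

    class HardIntegratedAgent:
        # Hard: learned values are part of the agent and used inside its cycle.
        def __init__(self):
            self.values = {}

        def select(self, beliefs, options):
            key = tuple(sorted(beliefs.items()))
            return max(options, key=lambda o: self.values.get((key, o), 0.0))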


3. ML-COG cube categorization scheme
The integration of ML into the BDI architecture as two distinct paradigms is the core area that
we consider in this survey. To provide a clear view of this intersection with the corresponding
published works, we set up a categorization scheme. The rationale for this scheme is based
on a problem-solution order since we focus on an integration problem, which can be seen
equally as an implementation problem in AOP. Displayed as a cube structure, we present the
ML-COG cube to classify the considered research. In its basic features, ML-COG comprises three
main dimensions. The first dimension, which is defined as the cognitive agent development, is
reflected on the 𝑦-axis. In this dimension, we distinguish between different agent development
approaches leaning on the fundamental literature of Multi-agent research based on Wooldridge
[8]. The development of cognitive agent architecture ranges from a single agent approach in
Agent Programming (AP) [13], to multiple agents interacting with each other in Multi-Agent
Programming (MAP)[14] up to Agent-oriented Software Engineering (AOSE)[15], where the
whole Software engineering process for MAS is covered including organizational roles and the
interacting environment. All of the three approaches on the 𝑦-axis are restricted to BDI agents.
In the second dimension, we accordingly consider the ML perspective, which is reflected in
the 𝑥-axis. Here, we differentiate between the learning approaches, which can in general be
divided into supervised learning, unsupervised learning and reinforcement learning. Since the
core of this survey is the integration of both ML and AOP, we focus on bringing both dimensions
together by investigating the different approaches. Therefore, we distinguish in the third
dimension the type of integration, represented in the 𝑧-axis. Here, we distinguish a loosely
coupled integration, a hard-coded integration, and a combination of soft and hard-coded
integration types. Throughout the survey process, we assign each considered work and approach
to a position in the cube. It is important to note that we restrict the surveyed literature mainly
to approaches where ML is considered for BDI agents. Thus, we have to neglect prominent
works where learning is investigated in other cognitive architectures such as Soar or ACT-R 4 .
From this starting point, we went through the publications cited in the considered works. Due
to space constraints, we consider specific representative works for ML-COG and apologize to
the authors whose work we had to omit.
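   As a small sketch of how this scheme can be applied (illustrative Python only; the enumeration
values simply mirror the three dimensions described above), each surveyed work can be encoded
as a point in the ML-COG cube:

    from dataclasses import dataclass
    from enum import Enum

    class Development(Enum):                 # y-axis: cognitive agent development
        AP = "Agent Programming"
        MAP = "Multi-Agent Programming"
        AOSE = "Agent-oriented Software Engineering"

    class Learning(Enum):                    # x-axis: learning approach
        SUPERVISED = "supervised"
        UNSUPERVISED = "unsupervised"
        REINFORCEMENT = "reinforcement"

    class Integration(Enum):                 # z-axis: integration type
        LOOSE = "loosely coupled"
        HARD = "hard"
        HARD_AND_SOFT = "hard & soft"

    @dataclass
    class MLCOGEntry:
        work: str
        development: Development
        learning: Learning
        integration: Integration

    # e.g. the classification of Wan et al., 2018 [31] from table 1:
    entry = MLCOGEntry("Wan et al., 2018 [31]", Development.AP,
                       Learning.REINFORCEMENT, Integration.HARD)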


4. Literature Review
Based on the approach explained in section 3, this survey covers works that combine ML with
AOP, especially considering the BDI architecture. The literature collection is compiled by
selecting works where ML approaches are applied to BDI agents, i.e. the learning algorithm is
integrated into the BDI cycle. We examined a plethora of works, neglecting approaches where
ML is considered but not for BDI agents. One work mentioned before is from Bordini et al. [3],
where the literature is examined with respect to Artificial Intelligence in general for BDI agents.
That work considers ML approaches but is not limited to them, whereas in this survey the focus
lies solely on ML for BDI agents. Since they cover a broader range of the literature spectrum,
they do not go into detail on specific works that are also subject of this survey. Here, the ML
paradigm and the BDI architecture are juxtaposed, and we thus explain the related literature in
this more specific context considering the introduced categorization scheme. In section 5, we
then distill open challenges concerning ML and AOP.



    4
        Interested readers are referred to Broekens et al.[16] and Nason & Laird [17].
4.1. BDI and Decision trees
One of the first works explicitly mentioning ML approaches for BDI agents is from Guerra-
Hernández et al. [5, 18, 19], where the plan selection process is investigated by applying logical
decision trees. As a typical supervised learning approach, this method is integrated into the BDI
cycle by adding the decision tree into the interpreter of the agent, transforming the selected
plans into intentions. Phung et al. [20] apply decision trees for BDI agents using a learning-based
framework, where the learning component is added to the BDI cycle. The agent processes its
past experience to adapt its current behavior with respect to background knowledge. The
result of the learning algorithm is then added to the beliefs of the agent. In the work of Airiau
et al. [21], the BDI agent learns from past experience in order to prevent failed plan executions.
In an initial step, the relation of goals and plans is represented by means of goal-plan trees. A
goal-plan tree contains the defined goals and their corresponding plans of a BDI agent, leading
to a hierarchical tree structure with goals and possible sub-plans. In the thesis of Singh [22],
the plan selection step in the BDI cycle is tackled with different approaches. Multiple works
related to the author are therefore considered. The works of Singh et al. [23, 24] build upon the
previous paper [21] and add context conditions for the plan selection process in the form of
decision trees. Commonly, a context condition is a Boolean function that needs to be
predetermined during the implementation phase. It is attached to each plan and describes
whether the plan is useful for a corresponding goal in a specific situation. Focusing on the
learned behavior, a decision tree is built up for each considered plan in the agent's library.
Each tree therefore leads to a decision of whether the plan will succeed or fail, together with
a probability score. A further extension of this work is from Singh et al. [25], where plan
selection considering changing dynamics is investigated. A confidence measure function for the
degree of stability of plans is presented with respect to execution traces and the environment of
the agents. The resulting weights are added to the plans, denoting the likelihood of success when
applied for a corresponding goal.
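   The following sketch illustrates the general idea of decision-tree-based plan selection (a
hypothetical reconstruction in Python using scikit-learn, not the implementation used in
[21, 23, 24]): a tree is trained per plan on features of past execution contexts together with their
success labels, and its success estimate is then used as a learned context condition.

    from sklearn.tree import DecisionTreeClassifier

    # Each row encodes the world state (beliefs) observed when the plan was tried;
    # the label records whether that execution succeeded (1) or failed (0).
    # Feature names and values are made up for illustration.
    past_contexts = [
        [1, 0, 3],   # e.g. door_open=1, carrying_item=0, battery_level=3
        [0, 1, 1],
        [1, 1, 2],
        [0, 0, 1],
    ]
    outcomes = [1, 0, 1, 0]

    context_tree = DecisionTreeClassifier(max_depth=3)
    context_tree.fit(past_contexts, outcomes)

    # At plan-selection time the agent queries the tree and uses the estimated
    # success probability as a learned context condition for this plan.
    current_context = [[1, 0, 2]]
    p_success = context_tree.predict_proba(current_context)[0][1]
    plan_applicable = p_success > 0.5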

4.2. BDI and Reinforcement Learning
The thesis of Feliu [26] considers the application of Reinforcement Learning (RL) for generating
plans in BDI agents without relying on prior knowledge. The author covers some related
works concerning BDI and learning, which are also subject of this survey. Related works in this
setting, where RL is applied for BDI, are from Pereira & Dimuro [27, 28] and Chang [29]. In these
works, BDI and RL are combined by integrating learning into the BDI cycle. The work of Qi &
Bo-ying [30] investigates a combination of RL and BDI for robot soccer simulation. Here, RL is
considered as a feedback process by using the Q-Learning algorithm for the simulation steps.
The learning algorithm is not integrated into the BDI architecture but processes the outcome of
the BDI agent's actions. Another approach in the same setting is presented by Wan et al. [31],
where a BDI agent is extended with Q-Learning in AgentSpeak, the agent programming language
in which the agents are defined and communicate with each other. More specifically, the plan
library is improved by the Q-Learning decision algorithm in an uncertain environment. They
found that during state-space exploration, which is an obligatory step in RL, the AgentSpeak
communication slowed down. For faster convergence, Deep Reinforcement Learning (DRL)
seems to be a suitable approach. The latter is also mentioned in section 5. Action selection
based on rules is a challenge inside this area, which is tackled by Broekens & Hindriks [16]. In
this work, the authors use RL for the rule selection step, which slightly differs from the action
selection process. In the typical RL setting, the learned behavior is the corresponding action. In
contrast, in this work an internal, uninstantiated rule is selected during the learning process.
They consider the GOAL agent programming language. The relevant components for learning
are reflected in the states, which are built up from the set of rules of the agents and the number
of active goals. The considered state representation seems to be an initial version for learning
but is capable of delivering interesting results for rule selection. The learning process takes
place inside the agent architecture.
Initial work on combining elements of the RL setting with partial observability has been done
by Rens et al. [32]. Here, the authors combine the BDI architecture with a Partially Observable
Markov Decision Process (POMDP) planning approach, providing initial results on small
experimental settings. They argue in favor of a more complex simulation environment.
Following this approach, Chen et al. [33] integrate the POMDP into the planning phase of the
BDI architecture by considering AgentSpeak. Nair & Tambe also investigate in [34] the concept
of POMDP for the BDI paradigm. They consider multi-agent teaming by POMDP and
team-oriented programming. Another work in this direction is from Rens & Moodley [35],
where the reward-maximizing approach of POMDP and the management of multiple goals
in BDI systems are combined. These works open up opportunities for investigating RL and
BDI in multi-agent settings. Bosello & Ricci [36] extend the BDI architecture with RL. They
consider the SARSA algorithm for the decision-making of the agent. A low-level learning approach
is represented by the BDI-FALCON agent architecture presented in Tan et al. [37, 38]. At
its lowest level, BDI-FALCON contains a reactive learning module based on Temporal Difference
Learning (TD), an RL algorithm that estimates a value function of state-action pairs 𝑄(𝑠, 𝑎)
which indicates the learning step of the system. Two other modules contain the BDI-native
components like goals and plans, which are sent to the low-level RL environment. Karim et al.
[39] propose an approach where learning with a high level of abstraction by a BDI agent is
connected to a low-level RL environment, based on BDI-FALCON [37]. Resulting in a hybrid
architecture, the BDI agent generates plans that are derived from the RL environment. Norling
[40] integrates the Q-Learning algorithm into the BDI cycle to learn rules for pathfinding in
a grid world, evaluated in a simple grid environment. Subagdja & Sonenberg [41] also
integrate the Q-Learning algorithm into the BDI agent cycle. They introduce meta-level plans
that are used for monitoring the reasoning step and the executed plans. Badica et al. [42] apply
TD for BDI agents. Considering a grid scenario, they define the agent's actions as well as
specific states representing the corresponding goals. Singh & Hindriks investigate in [43] the
Q-Learning algorithm for adaptive behaviors in autonomous BDI agents.
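   As an illustration of how such a Q-Learning component can drive plan selection in a BDI cycle
(a generic tabular sketch in Python with hypothetical names, not the implementation of any of
the works above), the agent keeps Q-values over (belief-state, plan) pairs, chooses ε-greedily
among the applicable plans, and updates the values after execution:

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # learning rate, discount factor, exploration rate
    Q = defaultdict(float)                   # Q[(state, plan)] -> estimated value

    def select_plan(state, applicable_plans):
        # epsilon-greedy choice among the plans applicable to the current goal
        if random.random() < EPSILON:
            return random.choice(applicable_plans)
        return max(applicable_plans, key=lambda p: Q[(state, p)])

    def update(state, plan, reward, next_state, next_plans):
        # standard Q-learning update after the selected plan has been executed
        best_next = max((Q[(next_state, p)] for p in next_plans), default=0.0)
        Q[(state, plan)] += ALPHA * (reward + GAMMA * best_next - Q[(state, plan)])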

4.3. Literature categorization overview
In table 1, we have listed a selection of the research works handled in this survey and classified
them with respect to the ML-COG dimensions, neglecting the other surveys mentioned in the
previous section. For future research in this area, the open research challenges in section 5
can be considered. For the sake of clarity, we set up the columns reflecting the dimensions of
ML-COG.
Table 1
                         Overview of selected surveyed literature based on ML-COG
                Work                   SA/MA          Learning              Integration            Objective
       Airiau et al., 2009 [21]          SA         Decision Tree              Hard              Plan selection
      Badica et al., 2012 [42]           SA        TD-Learning (RL)            Hard              RL-BDI agent
    Bosello & Ricci, 2019 [36]           SA          SARSA (RL)             Hard & Soft          RL-BDI agent
     Broekens et al., 2012 [16]          SA        Model-based RL              Hard              Rule selection
           Feliu, 2013 [26]              SA        Q-Learning (RL)          Hard & Soft         Plan generation
      Heinze et al., 1999 [44]           SA          CLARET (SL)          Loosely coupled      Plan recognition
    Hernandez et al., 2004 [5]         SA/MA        Decision Tree              Hard             Plan execution
     Karim et al., 2006-a [39]           SA              RL               Loosely coupled       Plan execution
     Karim et al., 2006-b [45]           SA              RL               Loosely coupled       Plan execution
  Lokuge & Alahakoon, 2007 [46]          SA             KAM                 Hard & Soft       Intention selection
         Norling, 2001 [47]              SA            RPDM                     (?)            Decision Making
         Norling, 2004 [40]              SA       RPDM/Q-Learning               (?)            Decision Making
      Phung et al., 2005 [20]            SA         Decision Tree           Hard & Soft       Experience learning
         Qi et al., 2009 [30]          SA/MA         Q-Learning             Hard & Soft        Decision Making
     Rodrigues et al., 2022 [6]          SA      Deep Neural Network        Hard & Soft        Decision making
      Singh et al., 2010-a [24]          SA         Decision Tree              Hard              Plan selection
      Singh et al., 2010-b [23]          SA         Decision Tree              Hard              Plan selection
       Singh et al. 2011 [25]            SA         Decision Tree              Hard              Plan selection
   Singh & Hindriks, 2013 [43]           SA          Q-Learning             Hard & Soft          BDI-RL agent
 Subagdja & Sonenberg, 2005 [41]         SA          Q-Learning                Hard         Learning Plans /Actions
        Tan et al., 2011 [37]            SA          TD-Learning          Loosely coupled        Plan selection
        Wan et al., 2018 [31]            SA        Q-Learning (RL)             Hard              RL-BDI agent


The second column contains the information whether a Single-Agent (SA) or a Multi-Agent
(MA) setting is considered. In the cube, SA is placed into AP and MA is placed into MAP.
In addition, the last column Objective contains the contribution goal of the corresponding work.
Note that for the dimension Cognitive agent development, we distinguish between Single-agent
(SA) and Multi-agent (MA) development. In the context of the survey, this is equivalent to AP
and MAP, as explained in section 3. Note that we have left out works from the table where
learning approaches are not explicitly implemented or executed 5 .


5. Open research challenges
The research done so far in the intersection of ML and AOP provides many different applications,
some of which have been elaborated on in the previous section. The ML-COG cube is in an
ongoing development process. It will be extended with finer-grained categorization scales and
is thus subject to future research. Since the categorization process follows the presented
dimensions, we point out application areas that are picked due to their technical proximity as
well as the contributions and potential limitations in the investigated literature. We therefore
list the following areas for future research:

   1. Communication protocols and Emergent communication
   2. Cognitive decision making and Learned behavior
   3. Goal-level learning
    5
        A ? entry denotes that the integration type is not clearly classifiable.
   The overall aim is to provide a high level of abstraction with the usage of learning-based
components. Areas 1 and 2 are intentionally formulated as ranges between two extremes,
indicating the different approaches to agent development in the programming phase. The first
area ranges from predefined agent languages and communication protocols like AgentSpeak [48],
as considered in a MAS, over to emergent communication [49] between learning-based agents
interacting with each other. Current research in emergent communication provides RL algorithms
in multi-agent settings to encourage agents to communicate with each other based on single and
collective rewards. This area is important, especially in MAS, where reliable communication leads
to efficient coordination and cooperation. In table 1, one can see that nearly all works focus on
the single-agent setting. The shift to MAS is therefore a crucial step in inspecting the behavior
of BDI learning agents interacting with each other. Combining learning-based communication
with initial rules represents one such approach. The overall advantage is better explainable
learned behavior and thus more explainable actions of the agents [50]. In the second area, we
distinguish the rather different agent types that are commonly considered in MAS, as presented
by Russell & Norvig in [4]. Decision-making is the essential step agents process to reach their
goals successfully. Research in Multi-agent Learning based on RL algorithms has already covered
a broad range of settings, from single-agent settings to MAS settings with different applications
[51]. Here, we see future work in MAS settings based on cognitive decision-making grounded
in the BDI architecture. Works covered in this survey already provide solutions for the
single-agent setting [36, 37]. Furthermore, the usage of neural networks is rather a rarity; one
recent work combining BDI agents and neural networks is from Rodrigues et al. [6]. As a third
area, we see learning at the goal level as a novel approach to connecting ML and BDI. In the
surveyed literature, learning at the plan level is predominantly tackled, but no goal-level learning
is presented. Sub-symbolic learning methods, like neural networks, could therefore be considered.


6. Conclusion
Learning methods in MAS differ from the traditional ML process since the autonomous and
flexible behavior of agents interacting in a complex and dynamic environment is considered.
This survey aims to position itself at the intersection of ML and AOP by bringing together the
relevant work done over the years in the field. In ML research, the term agent is predominantly
considered as a concept rather than an existing instance with explicitly developed cognitive
capabilities, as is the case for software agents [1]. Such a form of disambiguation also influences
the contextual understanding of our work. Although this intersection is approached in different
ways, cognitive software agents have not been considered sufficiently in ML research and
therefore represent a relevant direction for current research. The analysis of such an integration
process will lead to better insight into the functioning of learned behaviors in a cognitive
framework.
References
 [1] V. Dignum, F. Dignum, Agents are dead. long live agents!, in: Proceedings of the 19th
     AAMAS, 2020, pp. 1701–1705.
 [2] Y. Shoham, Why knowledge representation matters, Communications of the ACM 59
     (2015) 47–49.
 [3] R. H. Bordini, A. El Fallah Seghrouchni, K. Hindriks, B. Logan, A. Ricci, Agent programming
     in the cognitive era, Autonomous Agents and Multi-Agent Systems 34 (2020).
 [4] S. J. Russell, P. Norvig, Artificial Intelligence: a modern approach, 3 ed., Pearson, 2009.
 [5] A. G. Hernández, A. El Fallah-Seghrouchni, H. Soldano, Learning in BDI Multi-agent
     Systems, in: J. Dix, J. Leite (Eds.), Computational Logic in Multi-Agent Systems, volume
     3259, Springer Berlin Heidelberg, 2004, pp. 218–233.
 [6] R. Rodrigues, R. A. Silveira, R. D. Santiago, A Mediator Agent based on Multi-Context
     System and Information Retrieval, ICAART 2022 (2022).
 [7] O. I. Erduran, M. Minor, L. Hedrich, A. Tarraf, F. Ruehl, H. Schroth, Multi-agent Learning for
     Energy-Aware Placement of Autonomous Vehicles, in: 18th IEEE International Conference
     On Machine Learning And Applications (ICMLA), IEEE, Boca Raton, FL, USA, 2019, pp.
     1671–1678.
 [8] M. J. Wooldridge, An introduction to multiagent systems, 2nd ed ed., John Wiley & Sons,
     Chichester, U.K, 2009.
 [9] J. Bryson, Cross-paradigm analysis of autonomous agent architecture, Journal of Experi-
     mental & Theoretical Artificial Intelligence 12 (2000) 165–189.
[10] M. Minsky, Logical Versus Analogical or Symbolic Versus Connectionist or Neat Versus
     Scruffy, in: AI Magazine Volume 12 Number 2, 1991.
[11] L. d. Silva, F. Meneguzzi, B. Logan, BDI Agent Architectures: A Survey, in: Proceedings of
     the Twenty-Ninth IJCAI, Yokohama, Japan, 2020, pp. 4914–4921.
[12] A. Deljoo, T. M. van Engers, L. Gommans, C. T. de Laat, et al., What is going on: Utility-
     based plan selection in bdi agents., in: AAAI Workshops, 2017.
[13] Y. Shoham, Agent-oriented programming, Artificial Intelligence 60 (1993) 51–92.
[14] R. H. Bordini, M. Dastani, J. Dix, A. El Fallah Seghrouchni (Eds.), Multi-Agent Programming:
     Languages, Tools and Applications, Springer US, 2009.
[15] A. Sturm, O. Shehory, Agent-Oriented Software Engineering: Revisiting the State of the
     Art, Springer Berlin Heidelberg, 2014, pp. 13–26.
[16] J. Broekens, K. Hindriks, P. Wiggers, Reinforcement Learning as Heuristic for Action-Rule
     Preferences, in: R. Collier, J. Dix, P. Novák (Eds.), Programming Multi-Agent Systems,
     volume 6599, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 25–40.
[17] S. Nason, J. E. Laird, Soar-RL: integrating reinforcement learning with Soar, Cognitive
     Systems Research 6 (2005) 51–59.
[18] A. Hernandez, A. El Fallah-Seghrouchni, H. Soldano, Distributed learning in intentional
     bdi multi-agent systems, in: Proceedings of the Fifth Mexican International Conference in
     Computer Science, 2004. ENC 2004., 2004, pp. 225–232.
[19] A. G. Hernandez, A. E.-F. Seghrouchni, H. Soldano, Bdi multiagent learning based on
     first-order induction of logical decision trees, in: Intelligent Agent Technology: Research
     and Development, World Scientific, 2001, pp. 160–169.
[20] T. Phung, M. Winikoff, L. Padgham, Learning Within the BDI Framework: An Empirical
     Analysis, in: Knowledge-Based Intelligent Information and Engineering Systems, volume
     3683, Springer Berlin Heidelberg, 2005, pp. 282–288.
[21] S. Airiau, L. Padgham, S. Sardina, S. Sen, Enhancing the Adaptation of BDI Agents Using
     Learning Techniques, International Journal of Agent Technologies and Systems 1 (2009)
     1–18.
[22] D. Singh, Learning plan selection for BDI agent systems, Ph.D. thesis, RMIT University,
     2011.
[23] D. Singh, S. Sardina, L. Padgham, S. Airiau, Learning Context Conditions for BDI Plan
     Selection, AAMAS (2010).
[24] D. Singh, S. Sardina, L. Padgham, Extending BDI plan selection to incorporate learning
     from experience, Robotics and Autonomous Systems 58 (2010) 1067–1075.
[25] D. Singh, S. Sardina, L. Padgham, G. James, Integrating learning into a bdi agent for
     environments with changing dynamics, in: Twenty-Second IJCAI, 2011.
[26] J. Feliu, Use of Reinforcement Learning for Plan Generation in Belief-Desire-Intention
     (BDI) Agent Systems, Ph.D. thesis, University of Rhode Island, Kingston, RI, 2013.
     doi:10.23860/thesis-feliu-jose-2013.
[27] D. R. Pereira, L. V. Goncalves, G. P. Dimuro, Constructing BDI Plans from Optimal POMDP
     Policies, with an Application to AgentSpeak Programming, Conferencia Latinoamericana
     de Informatica, CLEI (2008) 11.
[28] D. R. Pereira, G. P. Dimuro, Um algoritmo para extração de um plano bdi que obedece uma
     política mdp ótima, in: Anais do Workshop-Escola de Sistemas de Agentes para Ambientes
     Colaborativos, Pelotas, PPGINF/UCPel, 2007.
[29] S. Chang, Simulação de ambientes multiagente normativos, Workshop-Escola de Sistemas
     de Agentes para Ambientes Colaborativos WESAAC (2007).
[30] G. Qi, W. Bo-ying, Study and Application of Reinforcement Learning in Cooperative
     Strategy of the Robot Soccer Based on BDI Model, International Journal of Advanced
     Robotic Systems 6 (2009) 15.
[31] Q. Wan, W. Liu, L. Xu, J. Guo, Extending the BDI Model with Q-learning in Uncertain
     Environment, in: International Conference on Algorithms, Computing and Artificial
     Intelligence, ACM, Sanya China, 2018, pp. 1–6.
[32] G. Rens, A. Ferrein, E. Van Der Poel, A BDI Agent Architecture for a POMDP Plan-
     ner, 9th International Symposium on Logical Formalization of Commonsense Reasoning:
     Commonsense (2009).
[33] Y. Chen, K. Bauters, W. Liu, J. Hong, K. McAreavey, L. Godo, C. Sierra, Agentspeak+:
     Agentspeak with probabilistic planning, Proc. of CIMA (2014) 15–20.
[34] R. Nair, M. Tambe, Hybrid BDI-POMDP Framework for Multiagent Teaming, Journal of
     Artificial Intelligence Research 23 (2005) 367–420.
[35] G. Rens, D. Moodley, A hybrid POMDP-BDI agent architecture with online stochastic
     planning and plan caching, Cognitive Systems Research 43 (2017) 1–20.
[36] M. Bosello, A. Ricci, From Programming Agents to Educating Agents – A Jason-Based
     Framework for Integrating Learning in the Development of Cognitive Agents, in: L. A.
     Dennis, R. H. Bordini, Y. Lespérance (Eds.), Engineering Multi-Agent Systems, volume
     12058, Springer International Publishing, Cham, 2020, pp. 175–194.
[37] A.-H. Tan, Y.-S. Ong, A. Tapanuj, A hybrid agent architecture integrating desire, intention
     and reinforcement learning, Expert Systems with Applications 38 (2011).
[38] A.-H. Tan, Falcon: a fusion architecture for learning, cognition, and navigation, in:
     2004 IEEE International Joint Conference on Neural Networks (IEEE), volume 4, 2004, pp.
     3297–3302 vol.4.
[39] S. Karim, L. Sonenberg, A.-H. Tan, A Hybrid Architecture Combining Reactive Plan
     Execution and Reactive Learning, in: PRICAI 2006: Trends in Artificial Intelligence,
     volume 4099, Springer Berlin Heidelberg, 2006, pp. 200–211.
[40] E. Norling, Folk psychology for human modelling: Extending the bdi paradigm, in:
     Proceedings of the Third International Joint Conference on Autonomous Agents and
     Multiagent Systems-Volume 1, 2004, pp. 202–209.
[41] B. Subagdja, L. Sonenberg, Learning plans with patterns of actions in bounded-rational
     agents, Springer-Verlag (2005) 30–36.
[42] A. Badica, C. Badica, M. Ivanovic, D. Mitrovic, An Approach of Temporal Difference
     Learning Using Agent-Oriented Programming, in: 20th International Conference on
     Control Systems and Computer Science, IEEE, 2015, pp. 735–742.
[43] D. Singh, K. V. Hindriks, Learning to Improve Agent Behaviours in GOAL, in: Programming
     Multi-Agent Systems, volume 7837, Springer Berlin Heidelberg, 2013, pp. 158–173.
[44] C. Heinze, S. Goss, A. Pearce, Plan recognition in military simulation: Incorporating
     machine learning with intelligent agents, in: Proceedings of IJCAI-99 Workshop on Team
     Behaviour and Plan Recognition, Citeseer, 1999, pp. 53–64.
[45] S. Karim, B. Subagdja, L. Sonenberg, Plans as Products of Learning, in: 2006
     IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE, Hong
     Kong, China, 2006, pp. 139–145.
[46] P. Lokuge, D. Alahakoon, Improving the adaptability in automated vessel scheduling
     in container ports using intelligent software agents, European Journal of Operational
     Research 177 (2007) 1985–2015.
[47] E. Norling, Learning to notice: Adaptive models of human operators, in: Second Interna-
     tional Workshop on Learning Agents, Citeseer, 2001.
[48] R. H. Bordini, J. F. Hübner, M. Wooldridge, Programming multi-agent systems in AgentS-
     peak using Jason, John Wiley & Sons, 2007.
[49] M. Noukhovitch, T. LaCroix, A. Lazaridou, A. Courville, Emergent communication under
     competition, in: Proceedings of the 20th International Conference on AAMAS, Interna-
     tional Foundation for Autonomous Agents and Multiagent Systems, UK, 2021, p. 974–982.
[50] J. Broekens, M. Harbers, K. Hindriks, K. v. d. Bosch, C. Jonker, J.-J. Meyer, Do you get
     it? user-evaluated explainable bdi agents, in: German Conference on Multiagent System
     Technologies, Springer, 2010, pp. 28–39.
[51] S. Gronauer, K. Diepold, Multi-agent deep reinforcement learning: a survey, Artificial
     Intelligence Review 55 (2022) 895–943.