Epistemic Planning in a Fast and Slow Setting

Francesco Fabiano1,*, Marianna Bergamaschi Ganapini2, Lior Horesh3, Andrea Loreggia4, Keerthiram Murugesan3, Vishal Pallagani5, Francesca Rossi3 and Biplav Srivastava5
1 Department of Mathematical, Physical and Computer Sciences, University of Parma, Parma, Italy
2 Philosophy Department, Union College, New York, USA
3 Thomas J. Watson Research Center, Yorktown Heights, New York, USA
4 Department of Information Engineering, University of Brescia, Brescia, Italy
5 Artificial Intelligence Institute, University of South Carolina, Columbia, United States

Abstract
AI applications are by now pervading our everyday life. Nonetheless, most of these systems lack many capabilities that we humans naturally consider to be included in a notion of “intelligence”. In this paper we present a multi-agent system, inspired by the cognitive theory known as thinking fast and slow by D. Kahneman, to solve Multi-agent Epistemic Planning (MEP) problems. This is an instance of a general AI architecture, referred to as SOFAI (for Slow and Fast AI). This paradigm exploits multiple solving approaches (referred to as fast and slow solvers) and a metacognition module to arbitrate between them and enhance the reasoning process, which, in this specific case, is concerned with planning in epistemic settings. The behavior of this system is then compared to a state-of-the-art MEP solver, showing that the newly introduced system achieves better results in terms of generality, solving a much wider set of problems with an acceptable trade-off between solving times and solution accuracy.

Keywords
Fast and Slow AI, Epistemic Planning, Metacognitive Reasoning

1. Introduction
The AI community has formalized and developed several techniques that make it possible to model agents capable of solving intricate problems autonomously. These techniques range from the use of various formal logics to the creation of neural-based structures.
The former is an area of study that tries to define rational behavior for our systems, while the latter, imitating the physiology of our brain, aims to emulate (to some degree) human behavior. In this work we present an architecture that allows multiple techniques to be exploited, and we show its behavior in solving multi-agent epistemic planning problems (MEPs). This architecture is inspired by the well-known cognitive theory Thinking Fast and Slow by Kahneman [1], since it includes both “fast” and “slow” solvers and a metacognitive module to arbitrate between the two. Slow (or System-2) solvers solve problems by reasoning and (usually) exploiting symbolic techniques, while fast (or System-1) solvers simply employ past experience to identify the solution to a given problem. A metacognitive module provides centralized governance and chooses the best solver for the problem at hand [2].

AAAI 2022 Fall Symposium Series, Thinking Fast and Slow and Other Cognitive Theories in AI track, November 17-19, Westin Arlington Gateway, Arlington, Virginia, USA
* Corresponding author.
francesco.fabiano@unipr.it (F. Fabiano); bergamam@union.edu (M. B. Ganapini); lhoresh@us.ibm.com (L. Horesh); andrea.loreggia@unibs.it (A. Loreggia); keerthiram.murugesan@ibm.com (K. Murugesan); vishalp@mailbox.sc.edu (V. Pallagani); francesca.rossi2@ibm.com (F. Rossi); biplav.s@sc.edu (B. Srivastava)
ORCID: 0000-0002-1161-0336 (F. Fabiano); 0000-0001-6652-9853 (M. B. Ganapini); 0000-0001-6350-0238 (L. Horesh); 0000-0002-9846-0157 (A. Loreggia); 0000-0001-8898-219X (F. Rossi); 0000-0002-7292-3838 (B. Srivastava)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
In this work, we consider the planning domain and employ an instance of this architecture that includes an existing epistemic planner as the slow solver and three case-based plan selectors as fast solvers. Experimental results on a widely used planning problem domain show that our architecture behaves better than using the existing planner alone, both in terms of solved instances and average solving time. Summarizing, the main contributions of this paper are:
• The definition of an architecture, inspired by the thinking fast and slow theory, to tackle epistemic planning problems;
• The characterization of fast and slow planners/solvers for the architecture, as well as the metacognitive module;
• Experimental results on an epistemic planning domain, showing the behavior of the architecture when using each of three different fast planners.
The paper is structured as follows. After this brief introduction, we introduce the background knowledge on the thinking fast and slow theory and multi-agent epistemic planning. We then describe how to unify the main concepts of the thinking fast and slow theory in the epistemic planning environment and give a complete characterization of the architecture, with special emphasis on the metacognitive module and the two kinds of solvers. We follow with a description of the experimental setting, tables and graphs providing the experimental results showing the behavior of our system on planning problems, and a discussion of such results. We conclude the paper by summarizing the main contributions and hinting at ongoing work.

2. Background

2.1. Thinking Fast and Slow
Thanks to improved algorithms, techniques, computational power, and dedicated hardware [3], automated reasoning tools are becoming more and more efficient and reliable in dealing with their areas of interest.
However, all of these tools still lack capabilities that we humans naturally consider to be included in a notion of “intelligence”, such as generalizability, robustness, and abstraction. For these reasons, a growing segment of the AI community is attempting to address these limitations and to create systems that display more “human-like qualities”. One of the central strategies to tackle this problem, adopted by various research groups [4, 5, 6, 7, 8], envisions tools, usually referred to as cognitive architectures [9], that exploit a combination of both the aforementioned approaches. In particular, in this paper, we explore multi-agent epistemic planning in the context of one of these architectures, which stems from a modern cognitive theory, i.e., the well-known Thinking Fast and Slow paradigm proposed by D. Kahneman [1]. Kahneman’s theory states that human reasoning capabilities are categorized into two diverse “Systems”, called System-1 (S1) and System-2 (S2). In particular, S1 identifies the intuitive and imprecise decision-making processes (“thinking fast”), while S2 provides tools to tackle all those situations where complex decisions need to be taken following logical and rational thinking (“thinking slow”). Other than problem difficulty, S1 and S2 discern which problem they should tackle based on the experience accumulated on the problem itself. That is, when a new non-trivial problem has to be solved, it is handled by S2. However, certain problems that initially can be solved only by S2 can later be solved by S1 after a certain amount of experience has been accumulated. The reason is that the procedures used by S2 to find solutions to such problems also accumulate examples that S1 can later use readily with little effort. We note that S1 and S2 are not systems in the multi-agent sense; rather, they encapsulate two wide classes of information processing.

2.2.
Multi-agent Epistemic Planning
Automated planning is a branch of artificial intelligence where the objective is to find plans, that is, sequences of actions, that can lead the agent to achieve desired goals. Epistemic planning is a specific form of planning where the agent must additionally deal with the epistemic notions of knowledge and belief. In the Multi-agent Epistemic Planning (MEP) problem, there are at least two planning agents. In what follows, we will only give an overview of MEP, referring the reader to Fagin et al. [10], Baral et al. [11], Van Ditmarsch et al. [12], and Bolander [13] for details. The idea of representing knowledge and beliefs has always been central in many research areas such as logic, philosophy, and computer science, e.g., see Hintikka [14]. In this setting, we are concerned with finding the best sequence of actions that modifies the information flows to enable the agents to reach goals that (might) refer to physical objects or to agents’ knowledge/beliefs [15]. We will use the term “knowledge” to encapsulate both the notion of an agent’s knowledge and that of her beliefs. These concepts are distinct in Dynamic Epistemic Logic (DEL), which is the underlying basis of the MEP problem, but for simplicity, in our high-level introduction, we will treat them as one. More details on the actual differences between these concepts can be found in Fagin et al. [10]. While a complete presentation of DEL and the multi-agent epistemic setting is beyond the scope of this paper, let us quickly introduce some fundamental concepts that provide an intuitive understanding of this area of study. In particular, in a MEP problem we are concerned with the knowledge of the agents about the world or about others’ knowledge.
This information is expressed through belief formulae, i.e., formulae that can express: i) physical properties of the worlds; ii) knowledge of some agent (or group of agents) about these properties; and iii) nested knowledge about others’ knowledge. The semantics of DEL formulae is traditionally expressed using pointed Kripke structures [16]. The epistemic action language that we used in our work implements three types of action and three observability relations, following the standard proposed by Baral et al. [11]. In particular, each agent is assumed to be able to perform one of the following types of action: 1) World-altering action: used to modify certain properties (i.e., fluents) of the world; 2) Sensing action: used by an agent to refine her beliefs about the world; and 3) Announcement action: used by an agent to affect the beliefs of other agents. Moreover, each agent is associated with one of the following observability relations during an action execution: 1) Fully observant: the agent is aware of the action execution and also knows the effects of the action; 2) Partially observant: the agent is aware of the action execution without knowing the effects of the action; and 3) Oblivious: the agent is not even aware of the action execution. Each type of action defines a diverse transition function and alters an epistemic state in different ways. Let us note that the amount of information carried within a single epistemic state (i.e., a Kripke structure) and the high degree of freedom afforded by complex transition functions (e.g., Baral et al. [11], Fabiano et al. [17, 18]) make multi-agent epistemic planning a very resource-heavy process that often renders the planning task itself infeasible [19].

3. Thinking Fast and Slow in MEP
Two of the prominent lines of work in AI, i.e., data-driven approaches and symbolic reasoning, seem to embody (even if loosely) the two Systems presented above.
In particular, data-driven approaches share with S1 the ability to build (possibly imprecise and biased) models from past experience, often represented by sets of data. For example, perception activities, such as seeing, that in humans are handled by S1, are currently addressed with machine learning techniques in AI. Similarly, S2’s capability to solve complex problems using a knowledge-based approach is somewhat emulated by AI techniques based on logic, search, and planning, which make use of explicit and well-structured knowledge. While the parallelisms data-driven–S1 and symbolic–S2 represent a starting point in developing an automated fast and slow AI, we should not assume these two techniques to be the exclusive representatives of the respective Systems. In this paper, we transpose the concepts derived from the thinking fast and slow paradigm into the MEP setting. We will start by presenting general definitions for S1 and S2 solvers and then describe the actual implementations of S1 and S2 reasoners in the epistemic setting. We will make use of three models to represent key modules of our architecture, which from now on we will call Plan-SOFAI. In particular, the model of self is used to store the experience of the architecture, the model of the world contains the knowledge accumulated by the system over the external environment and the expected tasks, while the model of others contains the knowledge and beliefs about other agents who may act in the same environment. Finally, the model updater acts in the background to keep all models updated as new knowledge of the world, of other agents, or new decisions are generated and evaluated. The general characterization of a S1 solver, triggered immediately when the problem is presented to the Plan-SOFAI, does not require many factors.
• These solvers are assumed to rely on the past experience of the Plan-SOFAI itself.
• Moreover, we assume the running time of S1 approaches to be independent of the input and, instead, to depend on the experience accumulated by the overall architecture in the model of self.
• Finally, we consider a S1 solver to be an entity that relies on “intuition” (with a slight abuse of notation).
Considering these characteristics, the next question that naturally arises is: can MEP ever be considered a S1 task, considering that epistemic planners in the literature always rely on look-ahead strategies? We considered some ideas that could help us develop a S1 epistemic planner. Among those, only a few did not use search methods (intensively) but rather relied mostly on experience. Finally, we identified a feasible, yet functional, way to exploit experience in the epistemic planning setting. The idea is to make use of pre-computed plans; that is, S1 can be used to determine which of the plans already generated by past experience is the one that “fits best” the current problem. Of course, determining whether an already computed plan is a good choice for the current problem is a difficult research question on its own. Since the focus of this work is to devise a fast and slow architecture for epistemic planning rather than to optimize its internal components, we decided to use a simple, yet effective, criterion to select the best-fitting plan. In particular, S1 selects, among past solutions for the same domain, the pre-computed plan that is the closest in terms of Levenshtein Distance and Jaccard Similarity [20], as we will see in more detail later. Our Plan-SOFAI is a S1-by-default architecture: whenever a new problem is presented, a S1 solver with the necessary skills to solve the problem starts working on it, generating a solution and a confidence level. This minimizes resource consumption by making use of the much faster S1 solving process when there is no need for S2—that is, when the solution proposed by S1 is “good enough”.
Nevertheless, as with the human brain, S1 may encounter problems that it cannot solve, either due to its lack of experience or to the inherent intricacy of the problem itself. These situations then require the use of more thought-out resolution processes, generally provided by S2 approaches. Notice that we do not assume S2 solvers to always be better than S1 solvers: given enough experience, some tasks could be better solved by S1 solvers. This behavior also happens in human reasoning [21]. In the particular case of MEP, we can consider as S2 solving procedures the tools that employ traditional planning strategies. These can be, for example, the planner RP-MEP [22] or EFP 2.0 [17]. While these two solvers adopt different strategies to solve a Multi-agent Epistemic Planning problem, they both explore the search space and do not rely on experience. In particular, as we will see later, in this work we will make use of EFP 2.0 by Fabiano et al. [17].

4. The Fast and Slow MEP Architecture

4.1. The SOFAI Architecture
As the main contribution of this paper, we present an architecture inspired by cognitive theories to solve the multi-agent epistemic planning problem. In particular, our tool is heavily based on a recent architecture called SOFAI [8] that is, in turn, inspired by the dual-system theory proposed by Kahneman [1]. Following Kahneman’s ideas, the architecture is equipped with two types of Systems dedicated to computing a solution to an incoming task, and a third agent in charge of orchestrating the overall reasoning task. In this architecture, incoming problems are initially handled by the S1 solvers that have the required skills to tackle them. S1 solvers compute a solution relying on the past experience collected by the architecture. The computation is not affected by the size of the input problem, and thus S1 solvers provide a solution in constant time. Past experience is maintained in the model of self.
A model updater agent is in charge of keeping all models updated as new events occur (e.g., new decisions are generated). Let us note that, in our case, the updater agent is in charge of storing each new solution found by S2. The solution computed by the S1 solver (from now on, for the sake of simplicity, let us assume there is just one S1 solver) and the corresponding confidence level are made available to the metacognitive (MC) module, which can then choose between adopting the proposed solution or engaging a S2 solver. An S2 agent is typically a reasoning model able to deal with the current problem; this kind of solver is more demanding in terms of time and other types of resources. For this reason, only the MC agent can decide to activate an S2 solver. The MC agent’s assessment is based on several checks that are devoted to establishing whether it is worth adopting the solution proposed by S1 or engaging S2 for a more accurate solution. This allows for minimizing time to action when there is no need for S2 processing.

4.2. The Metacognitive Module
For our MEP solver, following the SOFAI architecture of Ganapini et al. [8], we also defined a metacognition process. This means that we want our Plan-SOFAI to be equipped with a set of mechanisms that allow it to both monitor and control its own cognitive activities, processes, and structures. The goal of this form of control is to improve the quality of the system’s decisions [23]. Metacognition models have been largely studied [24, 25, 9, 26] in past years. Among the various proposed modalities, we envisioned our Plan-SOFAI to have a centralized metacognitive module that exploits both internal and external data and arbitrates between S1 and S2 solvers. Let us note that this module is structurally and conceptually different from algorithm portfolio selection [27, 28]. We propose a metacognitive module that itself follows the thinking fast and slow paradigm.
This means that our MC module comprises two main phases: the first takes intuitive decisions without considering many factors, while the second is in charge of carefully selecting the best solving strategy, considering all the available elements, whenever the first phase did not manage to return an adequate solution. We will refer to the former as MC-1 and to the latter as MC-2. MC-1 is in charge of deciding whether to accept the solution proposed by the S1 solver or to activate MC-2. MC-1 takes this decision considering, among a few other factors, the confidence of the S1 solver: if the confidence, which usually depends on the amount of experience, is high enough, MC-1 adopts the S1 solver’s solution. If MC-1 decides that the solution of the S1 solver is not “good enough”, it engages MC-2. Intuitively, this module needs to evaluate whether to accept the solution proposed by the S1 solver or which S2 solver to activate for the task. To do this, MC-2 compares the expected reward for the S2 solver with the expected reward of the S1 one: if the expected additional reward of running the S2 solver, compared to the S1 one, is large enough, then MC-2 activates the S2 solver. MC-2, following the human reasoning model [29], is designed to avoid costly reasoning processes unless the additional cost is compensated by an even greater expected reward for the solution that the S2 solver will devise.

5. Metacognition at Work
In what follows, we provide a “concrete” view of the S1/S2 framework for the multi-agent epistemic planning setting. To do so, we will make use of Algorithms 1–3. Before going into detail, let us briefly comment on the input and the parameters of these algorithms. The process requires the domain description (D), a particular instance (I) that we want to solve in such a domain, and the time limit (TL) within which the instance needs to be solved.
The parameters, instead, represent internal values that capture some sort of “inclination” of the architecture towards employing S1. In particular:
i) the acceptable correctness (A) represents the minimum ratio of solved goals, w.r.t. their total number, that defines an acceptable solution. Let us note that this measure could also be changed to depend on other factors or to account for the importance of goals, for example. Its default value is 0.5;
ii) T1 represents the minimal amount of experience required by the Plan-SOFAI to consider a solution proposed by S1. Its default value is set to 20;
iii) T2 represents the minimum number of S1 usages after which S1 is considered accountable for its mistakes. This threshold allows the architecture to initially employ S1 more freely, to augment its experience. Conversely, after the minimum number T2 of solutions, the metacognition actually uses the previous performances of S1 to check for S1 accountability. Its default value is set to 20;
iv) T3 is a value between 0 and 1 that represents the risk-aversion of the architecture: the higher the value, the more inclined the Plan-SOFAI is to use S2. The default value is set to 0.9;
v) 𝜖 is a factor used to scale the probability that the S1 solution may be employed even if it was not considered convenient. This is added to increase the number of S1 usages, and consequently its experience, in those situations where the low confidence of S1 itself may limit it too aggressively. Let us note that, in any case, the solution proposed by S1 needs to be validated before being accepted (Line 2 of Algorithm 2). Its default value is 0.1;
vi) the Memory (M) represents the experience of the Plan-SOFAI. Every time a solution for a problem is found, it is stored in the memory along with a series of useful pieces of information, e.g., the correctness value, the employed system (i.e., S1 or S2), the difficulty of the instance, the required time, and so on.
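To make the memory (M) concrete, a single stored entry might look like the following sketch. The field names are illustrative assumptions, not the authors’ actual data layout; the point is only which pieces of information the text says each entry carries.

```python
from dataclasses import dataclass

# Sketch of one entry in the Plan-SOFAI memory (M).
# Field names are hypothetical; they mirror the information listed
# in the text (correctness, employed system, difficulty, time).
@dataclass
class MemoryEntry:
    domain: str          # domain the instance belongs to
    plan: list           # the stored solution
    solver: str          # "S1" or "S2"
    solved_goals: int    # goals satisfied by the stored plan
    tot_goals: int       # total number of goals of the instance
    difficulty: float    # estimated difficulty of the instance
    time: float          # solving time, in seconds

    @property
    def correctness(self) -> float:
        """Ratio of solved goals, as used for A and avg_corr."""
        return self.solved_goals / self.tot_goals
```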
Algorithm 1 Fast and Slow MEP Architecture
Input: Domain (D), Instance (I), Time Limit (TL)
Parameter: Acceptable Correctness (A), T1, T2, T3, 𝜖, Memory (M)
Output: Plan (S), Correctness (C)
 1: Let p be the solution of I found by S1
 2: Let cx be the confidence of S1 on p
 3: if |M.solved_instances(D)| ≥ T1 then
 4:   if |M.solved_instances(D, S1)| < T2 then
 5:     Let K = 0
 6:   else
 7:     Let avg_corr = 0
 8:     for all i ∈ M.solved_instances(D, S1) do
 9:       avg_corr += |i.solved_goals()| / |i.tot_goals()|
10:     end for
11:     Let K = 1 − avg_corr / |M.solved_instances(D, S1)|
12:   end if
13:   if cx × (1 − K) ≥ T3 then
14:     return ⟨S, C⟩ = try_S1(p, D, I, TL)
15:   end if
16: end if
17: Let diff = I.compute_difficulty()
18: Let est_t = M.avg_t_from_diff(diff)
19: Let rem_t = TL − elapsed_t
20: Let est_cost = est_t / rem_t
21: if est_cost > 1 then
22:   return ⟨S, C⟩ = try_S1(p, D, I, TL)
23: else
24:   Let prob = (1 − T3) × 𝜖
25:   if prob ≥ generate_random_number(0, 1) then
26:     return ⟨S, C⟩ = try_S1(p, D, I, TL)
27:   else
28:     C = |I.solved_goals(p)| / |I.tot_goals()|
29:     if C ≥ A then
30:       if 1 − (est_cost × (1 − T3)) ≥ C × (1 − K) then
31:         return ⟨S, C⟩ = solve_with_S2(p, D, I, rem_t)
32:       else
33:         return ⟨S = p, C⟩
34:       end if
35:     else
36:       return ⟨S, C⟩ = solve_with_S2(null, D, I, rem_t)
37:     end if
38:   end if
39: end if

We are now ready to describe Algorithms 1–3 in more detail. Let us start with Algorithm 1. In particular, we can identify MC-1 in Lines 1–16 and MC-2 in Lines 17–39. As already mentioned, to better emulate the thinking fast and slow paradigm, we assume System-1 to automatically start and provide a solution at the beginning of MC-1. That is why we start the metacognitive process by storing the results of this process in the variables p and cx, which represent the solution found by S1 and the confidence that S1 has in the appropriateness of this solution, respectively. The metacognitive process then proceeds to check whether the experience accumulated by the architecture is enough to consider S1 reliable (Line 3).
If the architecture has enough experience, the metacognition considers the confidence of S1, adjusted to take into account the previous solutions proposed by S1 itself (Lines 4–12), and determines whether S1’s confidence is within the tolerated risk, identified by T3 (Line 13). If the confidence of S1 is sufficient, then the Plan-SOFAI tries to employ S1’s solution (Line 14). If, at any point, S1’s solution is considered not appropriate by the metacognitive process—because it violates some checks—then MC-2 starts. This part of the procedure begins by determining a value that represents the difficulty of the problem instance (derived from various factors such as the number of agents, possible actions, fluents, and so on) at Line 17. This measure is then used to determine the average solving time for a given difficulty and to estimate the cost of solving the given problem (Lines 18–20). If the cost exceeds 1, then there is not enough time to call S2, and Plan-SOFAI tries to employ S1. The system can also adopt S1 with a probability that is related to the risk-aversion T3 and the parameter 𝜖; this is done in Lines 24–26 to improve the exploration skill of the architecture itself. The Plan-SOFAI then evaluates the solution proposed by S1 and, if it is acceptable (Line 29), whether the extra time required by S2 is counterbalanced by the expected gain (Line 30). If the solution is not acceptable, or if the expected increase in correctness is big enough, S2 is called (Lines 36 and 31, respectively); otherwise, the solution proposed by S1 is used (Line 33).

Algorithm 2 try_S1 function
Input: Plan (p), Domain (D), Instance (I), Time Limit (TL)
Parameter: Acceptable Correctness (A)
Output: Plan (S), Correctness (C)
1: C = |I.solved_goals(p)| / |I.tot_goals()|
2: if C ≥ A then
3:   return ⟨S = p, C⟩
4: else
5:   return ⟨S, C⟩ = solve_with_S2(null, D, I, TL)
6: end if

Algorithms 2 and 3 are instead used to try to adopt the solution proposed by S1 and to try to solve the problem with S2, respectively.
In particular, Algorithm 2 takes the solution proposed by S1 and checks whether it has an acceptable degree of correctness. If it does, then the solution is employed; otherwise Algorithm 3 is called. This function simply calls the planner EFP 2.0 on the problem instance to solve and, if it terminates before the available time ends, returns the plan found by the S2 planner with confidence equal to 1.

Algorithm 3 solve_with_S2 function
Input: Plan (p), Domain (D), Instance (I), Time Limit (TL)
Output: Plan (S)
1: if EFP 2.0 (D, I) terminates within TL then
2:   return ⟨S = EFP 2.0.get_plan(), C = 1⟩
3: else if p != null then
4:   return ⟨S = p, C = |I.solved_goals(p)| / |I.tot_goals()|⟩
5: else
6:   OPT-OUT
7: end if

If EFP 2.0 cannot find the solution within the time limit, then the solution from S1, if acceptable, is adopted; otherwise Plan-SOFAI returns no solution and terminates.

5.1. MEP S1 and S2 solvers
While in the previous paragraphs we described how our architecture decides which solving approach is the most appropriate, here we provide a high-level overview of the solving processes themselves. In particular, we designed our S1 solver to rely solely on past experience, accumulated by using either S1 itself or S2. The S1 solver analyzes the memory and searches, through the various solved instances, for the one that is closest to the problem being tackled. Once the closest instance is identified, S1 returns the plan associated with it as a solution and the distance value as a measure of confidence. This distance can be calculated in two different ways, effectively generating two different S1 solvers1.
The first measure of distance considers the problems as sets of formulae, i.e., the ones that comprise the initial and goal states, and adopts the well-known Jaccard Similarity [20], that is, the ratio between the intersection and the union of the two sets under consideration, as a metric for finding the similarity between the input problem and the instances existing in the case library. The second metric is calculated by transforming the two instances into two distinct strings, once again comprising all the initial and goal state information (separated by the special character “|”), which are then compared using the Levenshtein distance [30] to determine the actual distance measure. Let us note that, since we are considering only instances of the same domain, there is no need to incorporate other information. Nonetheless, if we wanted to also compare instances of different domains, the domains’ descriptions could easily be added to the representative of each instance. While MEP S1 solvers did not exist in the literature and, therefore, we needed to implement ad-hoc solutions, the same is not true for S2 planners. In fact, as mentioned, we employed the state-of-the-art comprehensive epistemic planner EFP 2.0 as our S2 solver. Given that explaining how this planner works is beyond the scope of this paper, we can safely assume this approach to be a black box that returns the best possible solution to a MEP problem, if one exists. Nonetheless, we refer the interested reader to Fabiano et al. [17] for a detailed explanation of the internal mechanisms of EFP 2.0.
1 We compare the performances of these two S1 solvers later in the paper.

6. Experimental Results and Discussion

6.1. Experimental Setup
In this section, we compare the new multi-agent epistemic planning architecture introduced as the main contribution of this paper with EFP 2.0 [17] which, to the best of our knowledge, is the state-of-the-art comprehensive multi-agent epistemic solver.
All the experiments were performed on a 3.00GHz Intel Core i7-5500U machine with 16GB of memory. As benchmark, we used a small variation of the standard epistemic planning domain known as Coin in the Box (CB) [31, 32, 11]. In this domain 𝑛 ≥ 2 agents are in one of two rooms, one of which contains a box with a coin inside. In the initial configuration, everybody knows that: i) none of the agents knows whether the coin lies heads or tails up; ii) the box is locked; iii) only one agent has the key that opens the box; and iv) each agent might be attentive or not w.r.t. certain action executions. Moreover, we know that each agent can execute one of the following actions: i) move: an agent can move to the other room; ii) open: an agent, if it has the key, can open the box; iii) peek: to learn whether the coin lies heads or tails up, an agent can peek into the box, but this requires the box to be open; iv) announce: this will result in all the listening (i.e., attentive) agents believing that the coin lies heads or tails up, depending on the announced value; v) distract/signal another agent: these actions make an attentive agent no longer attentive, or vice-versa, respectively. The goals usually consist in some agents knowing whether the coin lies heads or tails up while other agents know that they know, or are ignorant about this. The experiments consist of a set of 240 different instances of the CB domain that differ in the initial configurations, the goals, and the number of acting agents. Regarding the various inputs and parameters of the architecture (used in Algorithms 1, 2, 3), we imposed: (i) a Time Limit (TL) of 90 seconds to solve each instance; (ii) an Acceptable Correctness (A) of 0.5, meaning that at least half of the goals must be satisfied for a S1 solution to be considered; (iii) the various thresholds (i.e., T1, T2, T3) and 𝜖 set to their default values; and (iv) the Memory (M) empty at the beginning of the overall solving procedure.

6.2.
Results

As a baseline for our experiments, we used the solver EFP 2.0 [17]. As mentioned, we let the solver tackle all 240 instances with a time-out of 90 seconds per instance. The main idea is that the results of the execution of EFP 2.0 represent the current capabilities of the MEP community and are comparable to a solely S2-based architecture. While this approach is guaranteed to find the best solution, if one exists (as it makes use of Breadth-First Search), it is not flexible enough to adapt to situations where the resources, namely time, are limited. We then compared the results of EFP 2.0 with four different configurations of the architecture presented in this paper (Table 1). To avoid unnecessary clutter, let us identify the various configurations with the following abbreviations:

Figure 1: Time comparison between our Jac and EFP 2.0.

• Jac: the configuration of the architecture where S1 makes use of the Jaccard similarity as its notion of distance;
• Lev: the configuration where the distance metric between problems is defined through the Levenshtein distance;
• Mix: the configuration where S1 chooses the instance in memory most similar to the given problem by selecting the highest (normalized) score among the ones calculated with both the Jaccard similarity and the Levenshtein distance;
• Rng: the configuration where S1 randomly picks one of the instances in memory as the "most similar" one, without even considering a notion of distance. This approach was included as a baseline that a well-thought-out S1 should outperform.

              EFP 2.0   Jac       Lev       Mix       Rng
  Solved      149       222       207       197       181
                        (+49%)    (+39%)    (+32%)    (+21%)
  Time (avg)  42.737    14.495    19.608    20.001    27.755
  Corr (avg)  0.62      0.81      0.89      0.67      0.66
  S1 calls    0         86        74        115       69

Table 1
Comparison between EFP 2.0 and the diverse configurations of our architecture on various parameters. The row Solved expresses the number of instances, out of the 240, that have been solved with correctness of at least 0.5.
Time is expressed in seconds, and the correctness (Corr), being the ratio between solved goals and their total number, is a measure between 0 and 1. All the results also account for the instances that could not be solved; to these we assigned 90 seconds as solving time and 0 as correctness.

Table 1 shows that Jac is the best all-round configuration, with the largest number of solved instances, the lowest average solving time, and a high average correctness. In fact, all configurations solved more problems than EFP 2.0 (ranging from 21% to 49% more) and took less time on average to find the plan. To provide further analysis we present, in Figure 1, a plot that shows the resulting solving times of this configuration against those of EFP 2.0. In this plot, the instances (x-axis) are sorted w.r.t. the time employed by Jac to solve them. As we can see, the solving time of Jac matches or outperforms that of EFP 2.0 on every instance (equal or lower height on the y-axis). Let us note that the instances placed exactly at the top of the plot, i.e., on the 90-second line, are the ones that timed out.

6.3. Discussion

Table 1 and Figure 1 highlight some interesting results about our architecture. In particular, thanks to its ability to "adapt" to resource constraints, our architecture provides a flexible tool to solve instances even in those settings where the inherent complexity of the problems (i.e., multi-agent epistemic planning) makes the solving process infeasible most of the time. Our proposal can be seen as a way to exploit accumulated experience, in a human fashion, to quickly solve known problems that would otherwise require a lot of effort. Moreover, the trade-off offered by allowing not-fully-correct, but sound, plans is also a way to produce a solution that, even if partial, can be more useful than no solution at all. Let us note that some of the problems solved by S1, and not by EFP 2.0, do not have a solution that can satisfy all the goals.
While the planning process of EFP 2.0 cannot detect these situations, given the undecidability of MEP [19], our architecture simply reports a plan to reach some of the subgoals, providing once again a better alternative to no plan at all. Nonetheless, the parametric nature of the architecture also allows tweaking it so that it only accepts fully correct plans. This can be done by setting the value of the Acceptable Correctness (A) to 1. With this small change, the architecture will only return plans that reach the goal state, while still taking advantage of its S1 capabilities (albeit S1 solutions would be adopted fewer times). The same goes for the other internal parameters, which easily allow the user to modify the architecture so that it is more prone to accept less accurate solutions in favor of saving time, and vice-versa. Another interesting result that we can deduce from Table 1 is that Mix performs worse than both Jac and Lev. While, at first glance, this might seem a contradictory result, it actually shows that the balance between the usage of S1 and S2 needs to be preserved by the architecture. In fact, as Mix combines Jac and Lev, the number of times that an S1 solution is employed is higher than with the other two approaches (115 against 86 and 74). This means that the architecture has fewer opportunities to accumulate experience that can be re-used later to solve different problems. While it is not easy to find the best trade-off for limiting the employment of S1, it is our intention to define ways to automatically tune the internal parameters of the architecture so that it can exploit its capabilities to the fullest.

7. Conclusions and Ongoing Work

In this work, we presented an architecture to solve multi-agent epistemic planning problems that is heavily inspired by the well-known cognitive theory Thinking Fast and Slow by Kahneman [1].
This tool builds on the SOFAI architecture [8], which makes use of a metacognitive process to arbitrate between the solving processes, and two solvers referred to as System-1 and System-2. While S2 is directly derived from the literature, being the comprehensive MEP solver known as EFP 2.0 [17], the S1 solver (and its variations) has been designed ad-hoc for the proposed architecture to exploit past experience by selecting a plan for a given planning instance. The SOFAI-inspired approach showed very promising results, outperforming the state-of-the-art epistemic planner EFP 2.0 in terms of both solved instances and average solving times on a set of 240 different planning problems. Another advantage of the proposed architecture is that it can be used to incorporate new solving techniques developed by the MEP community. In fact, our tool can be easily modified to employ different S1 or S2 solvers, or multiple versions of them. We are currently working on a version of the architecture that allows for multiple S2 solvers, in particular both EFP 2.0 and RP-MEP [22].

References

[1] D. Kahneman, Thinking, Fast and Slow, Macmillan, 2011.
[2] G. Booch, F. Fabiano, L. Horesh, K. Kate, J. Lenchner, N. Linck, A. Loreggia, K. Murugesan, N. Mattei, F. Rossi, B. Srivastava, Thinking fast and slow in AI, in: Proceedings of the 35th AAAI conference, 2021, pp. 15042–15046.
[3] G. Marcus, The next decade in AI: Four steps towards robust artificial intelligence, 2020. arXiv:2002.06177.
[4] A. Newell, SOAR as a unified theory of cognition: Issues and explanations, Behavioral and Brain Sciences 15 (1992) 464–492.
[5] D. Friedlander, S. Franklin, LIDA and a theory of mind, Frontiers in Artificial Intelligence and Applications 171 (2008) 137.
[6] J. G. Trafton, L. M. Hiatt, A. M. Harrison, F. P. Tamborello, S. S. Khemlani, A. C. Schultz, ACT-R/E: An embodied cognitive architecture for human-robot interaction, Journal of Human-Robot Interaction 2 (2013) 30–55.
[7] G. Goel, N. Chen, A.
Wierman, Thinking fast and slow: Optimization decomposition across timescales, in: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), IEEE, 2017, pp. 1291–1298.
[8] M. B. Ganapini, M. Campbell, F. Fabiano, L. Horesh, J. Lenchner, A. Loreggia, N. Mattei, F. Rossi, B. Srivastava, K. B. Venable, Thinking fast and slow in AI: the role of metacognition, CoRR abs/2110.01834 (2021). URL: https://arxiv.org/abs/2110.01834. arXiv:2110.01834.
[9] I. Kotseruba, J. K. Tsotsos, 40 years of cognitive architectures: core cognitive abilities and practical applications, Artificial Intelligence Review 53 (2020) 17–94. doi:10.1007/s10462-018-9646-y.
[10] R. Fagin, Y. Moses, J. Y. Halpern, M. Y. Vardi, Reasoning About Knowledge, MIT press, 1995.
[11] C. Baral, G. Gelfond, E. Pontelli, T. C. Son, An action language for multi-agent domains, Artificial Intelligence 302 (2022) 103601. URL: https://www.sciencedirect.com/science/article/pii/S0004370221001521. doi:10.1016/j.artint.2021.103601.
[12] H. Van Ditmarsch, W. van Der Hoek, B. Kooi, Dynamic Epistemic Logic, volume 337, Springer Netherlands, 2007. doi:10.1007/978-1-4020-5839-4.
[13] T. Bolander, A gentle introduction to epistemic planning: The DEL approach, arXiv preprint arXiv:1703.02192 (2017).
[14] J. Hintikka, Knowledge and Belief: An Introduction to the Logic of the Two Notions, Ithaca: Cornell University Press, 1962.
[15] T. Bolander, M. B. Andersen, Epistemic planning for single- and multi-agent systems, Journal of Applied Non-Classical Logics 21 (2011) 9–34.
[16] S. A. Kripke, Semantical analysis of modal logic I: normal modal propositional calculi, Mathematical Logic Quarterly 9 (1963) 67–96. doi:10.1002/malq.19630090502.
[17] F. Fabiano, A. Burigana, A. Dovier, E.
Pontelli, EFP 2.0: A multi-agent epistemic solver with multiple e-state representations, in: Proceedings of the Thirtieth International Conference on Automated Planning and Scheduling, Nancy, France, October 26-30, 2020, AAAI Press, 2020, pp. 101–109. URL: https://aaai.org/ojs/index.php/ICAPS/article/view/6650.
[18] F. Fabiano, A. Burigana, A. Dovier, E. Pontelli, T. C. Son, Multi-agent epistemic planning with inconsistent beliefs, trust and lies, in: PRICAI 2021, Hanoi, Vietnam, November 8-12, 2021, Proceedings, Part I, volume 13031 of Lecture Notes in Computer Science, Springer, 2021, pp. 586–597. doi:10.1007/978-3-030-89188-6_44.
[19] T. Bolander, M. Jensen, F. Schwarzentruber, Complexity results in epistemic planning, in: IJCAI International Joint Conference on Artificial Intelligence, volume 2015-January, 2015, pp. 2791–2797.
[20] K. Rinartha, W. Suryasa, L. G. S. Kartika, Comparative analysis of string similarity on dynamic query suggestions, in: 2018 Electrical Power, Electronics, Communications, Controls and Informatics Seminar (EECCIS), 2018, pp. 399–404. doi:10.1109/EECCIS.2018.8692996.
[21] G. Gigerenzer, H. Brighton, Homo heuristicus: why biased minds make better inferences, Top Cogn Sci 1 (2009) 107–143. doi:10.1111/j.1756-8765.2008.01006.x.
[22] C. J. Muise, V. Belle, P. Felli, S. A. McIlraith, T. Miller, A. R. Pearce, L. Sonenberg, Planning over multi-agent epistemic states: A classical planning approach, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, AAAI Press, 2015, pp. 3327–3334. URL: http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9974.
[23] M. T. Cox, A. Raja, Metareasoning: Thinking about Thinking, The MIT Press, 2011.
[24] M. T. Cox, Metacognition in computation: A selected research review, Artificial Intelligence 169 (2005) 104–141. URL: https://www.sciencedirect.com/science/article/pii/S0004370205001530.
doi:10.1016/j.artint.2005.10.009, special review issue.
[25] J. D. Kralik, J. H. Lee, P. S. Rosenbloom, P. C. Jackson, S. L. Epstein, O. J. Romero, R. Sanz, O. Larue, H. R. Schmidtke, S. W. Lee, K. McGreggor, Metacognition for a common model of cognition, Procedia Computer Science 145 (2018) 730–739. URL: https://www.sciencedirect.com/science/article/pii/S1877050918323329. doi:10.1016/j.procs.2018.11.046, postproceedings of the 9th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2018 (Ninth Annual Meeting of the BICA Society), held August 22-24, 2018 in Prague, Czech Republic.
[26] I. Posner, Robots thinking fast and slow: On dual process theory and metacognition in embodied AI, 2020. URL: https://openreview.net/forum?id=iFQJmvUect9.
[27] P. Kerschke, H. H. Hoos, F. Neumann, H. Trautmann, Automated algorithm selection: Survey and perspectives, Evol. Comput. 27 (2019) 3–45. doi:10.1162/evco_a_00242.
[28] A. Tarzariol, Evolution of algorithm portfolio for solving strategies, in: A. Casagrande, E. G. Omodeo (Eds.), Proceedings of 34th CILC, Trieste, Italy, June 19-21, 2019, volume 2396 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 327–341. URL: http://ceur-ws.org/Vol-2396/paper37.pdf.
[29] A. Shenhav, M. M. Botvinick, J. D. Cohen, The expected value of control: An integrative theory of anterior cingulate cortex function, Neuron 79 (2013) 217–240. doi:10.1016/j.neuron.2013.07.007.
[30] R. Haldar, D. Mukhopadhyay, Levenshtein distance technique in dictionary lookup methods: An improved approach, arXiv preprint arXiv:1101.1232 (2011).
[31] F. Kominis, H. Geffner, Beliefs in multiagent planning: From one agent to many, in: Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling, ICAPS 2015, Jerusalem, Israel, June 7-11, 2015, AAAI Press, 2015, pp. 147–155. URL: http://www.aaai.org/ocs/index.php/ICAPS/ICAPS15/paper/view/10617.
[32] X. Huang, B.
Fang, H. Wan, Y. Liu, A general multi-agent epistemic planner based on higher-order belief change, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, ijcai.org, 2017, pp. 1093–1101. doi:10.24963/ijcai.2017/152.