An Agent Framework for Manipulation Games

Javier M. Torres
Brainific S.L.
javier.m.torres@brainific.com

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Current agents use communication in a collaborative setting, exchanging truthful information to achieve a common plan. This paper defines games where agents may exchange information about the physical situation (both fluents and action events), arbitrarily nested beliefs, and action consequences, in order to manipulate other agents for their own goals, i.e. to guide the other agents' own reasoning and planning. We propose a model for an "agent mind" that can cater for all these aspects through revisable, prioritized belief bases; goal recognition including epistemic situations; and planning including speech acts with structured content. We also discuss recent algorithms to address each one of them, and propose a concrete implementation for a future stage.

Introduction

We define manipulation games as games in which players not only affect some shared physical state, but also exchange information in the hope of influencing the other players so that their goals can be achieved. Goals are often hidden from other players, and may or may not be conflicting. Observability is often restricted, and information can only be gained through third-party accounts. In this paper, we propose an action language specification and a doxastic model to reason about and share both actions and beliefs, enabling the manipulation of agents in games. Not all aspects of reasoning (e.g. fully reasoning about actions, or the planning of actions itself) have been implemented in source code, but suitable alternatives have been identified for each of the processes in the model.

Manipulation Games

Interactions with NPCs in RPG-like games can be modelled as non-zero-sum games with hidden goals. Although there are sources of knowledge, stemming from observability (either an agent observes a situation or another agent directly), most information is more or less grounded belief, especially about the beliefs and motivations of other agents. Since only actions can be truthfully observed, intent and plan recognition are the only ways to estimate what another agent wants and believes, so the agent can adjust their own plans. After these models are built, one agent can plan to provide such information that other agents conclude beliefs or take actions in a way that benefits the planning agent. We need to consider capabilities such as:

• Reasoning and representation: reason about predicates under an open world assumption with a base of prioritized beliefs, and represent communicative actions
• Goals, goal recognition and goal recognition design: find out about other agents' plans to guide the planning
• Discourse and action planning: plan actions and predicates to communicate in order to indirectly guide other agents' actions

Agents, in fact, assume that all of them perform the same loop when facing a change in the environment: first, non-obvious predicates are deduced from existing and new information; then the goals of the agents involved in the new situation are re-assessed; and then current actions are re-planned, or new actions are planned. The classic game Diplomacy, in fact, restricts its mechanics to such a degree that these mechanisms are the basis for the game.
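As a rough illustration only (not part of the proposed framework), the following Python sketch shows one way this shared deduce / re-assess / re-plan loop could be wired together; every name in it (Percept, AgentMind and its method) is an assumption of the sketch.

```python
# Illustrative sketch of the three-step loop described above, with trivial
# stand-ins for deduction, goal re-assessment and re-planning.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Percept:
    content: str      # an observed fluent or a reported statement
    source: str       # "observation" or the name of the reporting agent


@dataclass
class AgentMind:
    beliefs: List[str] = field(default_factory=list)
    goals: Dict[str, str] = field(default_factory=dict)   # agent -> suspected goal
    plan: List[str] = field(default_factory=list)

    def on_change(self, percept: Percept) -> None:
        # 1. deduce non-obvious predicates from existing and new information
        self.beliefs.append(percept.content)
        # 2. re-assess the goals of the agents involved in the new situation
        if percept.source != "observation":
            self.goals.setdefault(percept.source, "unknown")
        # 3. re-plan current actions, or plan new ones
        if not self.plan:
            self.plan = ["wait"]


mind = AgentMind()
mind.on_change(Percept("in(squad, spirit)", source="Aisha"))
print(mind.beliefs, mind.goals, mind.plan)
```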
A related field, from which we borrow, is that of persuasion and argumentation theory. A description of how automated planning techniques can be used to promote arguments can be found in (Black, Coles, and Bernardini 2014) and (Black, Coles, and Hampson 2017). We argue that manipulation expands the set of actions available in an argumentation setting by considering false or incomplete predicates, and by relying on the other agents' internal processing, such as their own goal recognition or higher order reasoning.

Small treatise about manipulation for honest people¹

¹ A humble tribute to (Joule, Beauvois, and Deschamps 1987).

Let us consider a sample fantasy RPG scenario with three actors: Aisha, Bo Yang and Chinira. The three of them are officers in the same army, and Chinira is the common boss of Aisha and Bo Yang.

Aisha currently has a goal of recovering the McGuffin of Diabolic Wisdom, which she hid in the common room of Bo Yang's squad instead of her own, so that nobody could think she was in its possession. The issue is not trespassing, since she can freely enter this room, but rather concealment; she cannot risk Bo Yang's soldiers and Bo Yang himself seeing the artifact and learning about it. At least one person from Bo Yang's squad is always present there, and they would question Aisha if she searched the room. Additionally, Bo Yang despises Chinira, believing her to act in a purely selfish manner and attributing to her a goal of self-promotion.

Throughout the paper we will show, within bounding boxes, a formal description of the messages exchanged and the beliefs and plans generated by the agents.

Previous Work

This framework is very similar to a BDI architecture (Rao, Georgeff, and others 1995), as it endows agents with beliefs and desires, or goals. We believe, however, that few, if any, BDI implementations use the kind of techniques that we propose to support manipulation games, like higher order reasoning, goal recognition or epistemic planning.

Horswill's MKULTRA (Horswill 2018) is a superb implementation of a manipulation game and an inspiration for the current work. The player can insert beliefs into other agents' minds to solve various puzzles. The areas where this work aims at improving on MKULTRA are the use of full-fledged logical reasoning, instead of logic programming; higher-order beliefs; and a more oblique manipulation through planning/goal recognition and the re-evaluation of source reliabilities.

Ryan's Talk of the Town (Ryan et al. 2015) presents a system where bounded rationality and memory in agents create a compelling narrative. Talk of the Town does not implement a complex model for agent reasoning, but, on the other hand, agents follow complex schedules through which they acquire first- and second-hand information about other agents.

(Ware and Siler 2021) describe a narrative planner that takes intentions and beliefs into consideration. However, characters themselves seem to use utility functions to choose actions, and do not reason about their own beliefs and those of other agents.

Within the automated planning community, epistemic planning (taking the epistemic state of other agents into consideration) has become so important as to have a dedicated workshop at ICAPS 2020 (https://icaps20subpages.icaps-conference.org/workshops/epip/). The work in (Shvo et al. 2020) includes epistemic plan recognition (which includes epistemic planning itself) leveraging the planners in (Le et al. 2018), (Wan, Fang, and Liu 2021) and (Muise et al. 2015). The authors are unsure whether belief (KD45) or knowledge (S5) is considered in these planners, and whether communication extends beyond the truth value of single fluents.

Multi-agent systems such as (Panisson et al. 2018) provide a good foundation when it comes to theory of mind and speech acts, but deal with agent collaboration, whether implicit or explicit. This work instead focuses on taking advantage of the agents' reasoning strategies to obtain the desired result, regardless of whether this result is beneficial to the other agent. A noteworthy exception is (Black, Coles, and Bernardini 2014), which studies persuasion (a component of manipulation). The work on prioritized belief bases with non-idempotent operations presented in (Velázquez-Quesada 2017) models, in our opinion, the manipulation of human agents.

Reasoning and Knowledge Representation

As we have mentioned before, each agent performs a loop of goal recognition and sensing followed by action planning, supported by doxastic reasoning, in a variation of the traditional Sense-Plan-Act loop.

The internal model of each agent, as illustrated in Figure 1, consists of a prioritized and time-versioned belief base including certain knowledge (sensing actions), a list of internal goals and associated plans, and a set of agent models with the same structure, rebuilt whenever new information is added. The state is updated by processes of belief insertion, planning, goal recognition and higher order epistemic reasoning, as described in the rest of the paper.

Figure 1: Internal Agent Model
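As a rough data-structure sketch of the model in Figure 1 (an illustration of the description above, not an existing implementation), the internal state of an agent could look as follows; all class and field names are assumptions of the sketch.

```python
# Sketch of the internal agent model of Figure 1: a prioritized, time-versioned
# belief base, separately tracked certain knowledge, goals and plans, and nested
# models of other agents with the same structure. Names are illustrative only.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BeliefLayer:
    plausibility: int        # rank within the base: lower means more plausible
    sentences: List[str]     # logical sentences kept syntactically
    source: str              # "observation", "account:<agent>", "abduction", ...


@dataclass
class AgentModel:
    # one list of layers per time step, so that past states can be referenced
    belief_timeline: Dict[int, List[BeliefLayer]] = field(default_factory=dict)
    certain_knowledge: List[str] = field(default_factory=list)   # from sensing
    goals: List[str] = field(default_factory=list)
    plans: List[List[str]] = field(default_factory=list)
    # models of other agents, rebuilt whenever new information is added
    others: Dict[str, AgentModel] = field(default_factory=dict)
```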
Dynamic Epistemic Logic has often been used to model reasoning about higher order beliefs such as those that can be expressed in this model. It is a formalism describing epistemic states and their changes after executing actions, and it has experienced substantial growth in the last 15 years. A very complete account of its evolution from public announcement logics to its current form can be found in (Van Ditmarsch, van Der Hoek, and Kooi 2007). It has been applied to areas such as cryptography or logic puzzles, and extended to different areas like modelling questions or epistemic planning. However, we have found two important shortcomings in this family of logics:

• it puts the burden on the problem to fully specify the initial model;
• it models actions as semantic changes that directly modify this model, with few guidelines about what conditions the actions should fulfill to preserve model properties (e.g. KD45 for belief or S5 for knowledge) across updates, as Herzig has pointed out in (Herzig 2017).

Hence, we have decided to focus on actions with syntactic effects as much as possible (e.g. forcing as a result that A believes in p after an action), so the user of the model needs to build partial models from the acting agent's point of view, using techniques like tableaux whenever necessary, trading additional computational burden for flexibility. We have nonetheless studied the formalization of prioritized belief bases from a DEL perspective as described in (Baltag and Smets 2008). Observability and sensing primitive actions also allow us to derive knowledge (S5) modal formulas, in a way similar to that described in (Baral et al. 2015). A separate language for action specifications describes what an agent plans to do or is doing, within the syntax of the belief logic.
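As a toy illustration of this choice of syntactic effects (the helper function and the plain-string bookkeeping below are assumptions of this sketch, not the paper's actual machinery), a say action could simply add belief sentences to the sender's models of the addressees, leaving any semantic checking to a later tableau pass:

```python
# Toy illustration of the "syntactic effect" policy: instead of transforming a
# semantic Kripke model, an action adds sentences to the listener models kept
# by the acting agent. Whether the addressee really adopts the belief is a
# separate question, re-checked later against the partial model.
from typing import Dict, Set


def apply_say(speaker: str, proposition: str, addressees: Set[str],
              models: Dict[str, Set[str]]) -> None:
    """Record, from the speaker's point of view, that each addressee now holds
    a belief about the proposition after say(speaker, proposition, addressees)."""
    for agent in addressees:
        models.setdefault(agent, set()).add(f"B_{agent}({proposition})")


models: Dict[str, Set[str]] = {}
apply_say("A", "in(squad, spirit)", {"B", "C"}, models)
print(models["B"])   # {'B_B(in(squad, spirit))'}
```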
Doxastic Reasoning

The doxastic model proposed consists of prioritized belief bases as described in (Rott 2009): an ordered list of sets of sentences, with the most plausible sentences placed closer to the head of the list, followed by first-hand, present-time, direct knowledge. We keep an open world assumption in our framework: having p or ¬p explicitly in an agent's belief base means that they actually B(p) or B(¬p). The lack of belief about p means that they will not commit to any valuation for p: complete uncertainties (formulas for which agents do not have any preference) will not be represented in the base. Prioritized belief bases generate a corresponding system-of-spheres model, where possible world sets are filtered by each layer in the base. Since sentences can be removed, due to their origin, during an agent's lifetime, conflicting sentences may be kept at different levels; the actual belief of the agent will depend on the relative position of each sentence. An example base is presented in Figure 2 with annotations about the source of the beliefs.

These structures are more succinct than, for example, POMDP models, since they use logic sentences to express sets of worlds, and human agents tend to use vaguely defined confidence or plausibility levels instead of exact probabilities.

Sentences in such a model keep track of their origin, such as:

• Past direct observations. A belief could be implicitly formed about the current situation depending on a state that was observable in the past, but not anymore, with its plausibility degrading with time up to complete uncertainty, at which point it is removed from the belief base.
• Accounts from other agents, accepted according to observed certainties and the perceived "honesty record" of other agents.
• Abduction, mainly targeted at action reasoning, so causes will be ordered according to their plausibility depending on the simplicity of their attributions to effects.
• Induction, for agents that perform some kind of statistical analysis of observed facts.

Note that the current paper does not propose explicit mechanisms for the inclusion of a belief in the base, apart from these suggestions.

A version timeline of the belief base is kept, so past and point temporal modalities like AT_{t=3}(p) can be used to refer to any (including the acting) agent's beliefs. Deduced propositions are indirectly referenced whenever a check using a tableau starts.

We allow the following types of predicates in the belief base (a representation sketch is given below):

• first order predicates; e.g. has(B, knife)
• visibility statements of first order predicates; e.g. see(A, has(B, knife))
• temporal statements of any other item; e.g. AT_{t=3}(has(B, knife))
• goal statements about agents; e.g. GOAL_A(catch_killer)
• statements about actions with preconditions and postconditions, with probabilistic outcomes; e.g. search_room{pre: empty(room_123); post: {t := t + 3 with p = 1; in(knife, room_123) with p = 0.5; ¬in(knife, room_123) with p = 0.5}}
• predicates that express that an action has just been performed; e.g. done(fired(A, B))
• beliefs from other agents; e.g. B_A(is_killer(B))

Special predicates like those described in (Marques and Rovatsos 2015) can be expressed using modalities about goals, actions and beliefs. The GOAL_A(p) modality expresses that agent A will take actions that make it more probable for proposition p to become true. One can express the preferred next action for agent A as GOAL_A(done(act())). Predicates about knowledge like unknown(a, que) (the answer to question que is unknown to agent a) can be expressed as ∀X(¬B_A(que(X))).
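One possible syntactic representation of these predicate types and modalities is sketched below; the constructor names mirror the notation above (see, AT, GOAL, done, B), but their exact shape is an assumption of this sketch rather than a fixed part of the framework.

```python
# Sketch of the predicate and modality types listed above as plain syntactic
# objects (no semantics attached); names and fields are illustrative only.
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Pred:          # first order predicate, e.g. has(B, knife)
    name: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Not:
    inner: "Formula"


@dataclass(frozen=True)
class See:           # see(A, has(B, knife))
    agent: str
    inner: "Formula"


@dataclass(frozen=True)
class At:            # AT_{t=3}(has(B, knife))
    t: int
    inner: "Formula"


@dataclass(frozen=True)
class Goal:          # GOAL_A(catch_killer)
    agent: str
    inner: "Formula"


@dataclass(frozen=True)
class Done:          # done(fired(A, B))
    action: Pred


@dataclass(frozen=True)
class Bel:           # B_A(is_killer(B))
    agent: str
    inner: "Formula"


Formula = Union[Pred, Not, See, At, Goal, Done, Bel]

# Example: B_A(AT_{t=3}(done(fire(X, D))))
example = Bel("A", At(3, Done(Pred("fire", ("X", "D")))))
```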
To allow some approximation to probabilistic reasoning, plausibilities are related to discrete probabilities. Values from 0 to 2^n, expressing probabilities from 0 to 1, are used. Operations that would result in intermediate values are rounded towards 2^(n-1), a probability of 0.5. This value is important since it represents uncertainty, and as such can be removed from the belief base. We expect long-term reasoning to be "diluted" in this way to control state explosion, since the further away a result is in terms of operations (e.g. a situation several steps ahead in a plan), the more probable it is to turn into an uncertainty. In no way are complex probabilistic logic frameworks (e.g. Markov logic networks) involved in these estimations: derived statements always inherit the least plausible value from among all the input statements.
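A minimal numeric sketch of this discrete scale follows, assuming n = 4 (values 0..16) and a linear mapping consistent with plausibility 0 ↔ probability 1 and 8 ↔ 0.5; both the mapping and the helper names are assumptions of the sketch.

```python
# Minimal sketch of the discrete plausibility scale described above, for n = 4.
import math

N = 4
MAX = 2 ** N               # 16
UNCERTAIN = 2 ** (N - 1)   # 8 <-> probability 0.5; never stored in the base


def to_probability(plaus: int) -> float:
    """Map a plausibility rank (0 = certain) to a probability in [0, 1]."""
    return (MAX - plaus) / MAX


def derived_plausibility(*inputs: int) -> int:
    """A derived statement inherits the least plausible (largest) input rank."""
    return max(inputs)


def round_towards_uncertainty(value: float) -> int:
    """Intermediate results are rounded towards 2^(n-1), i.e. towards 0.5."""
    return math.ceil(value) if value < UNCERTAIN else math.floor(value)


assert to_probability(1) == 15 / 16
assert derived_plausibility(1, 3, 6) == 6
```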
In Figure 2 we can see a prioritized belief base consisting of three layers, each with a certain plausibility. In this example, we use 0 to 16 levels of probability, with a plausibility of 0 corresponding to a probability of 1, and 8 corresponding to complete uncertainty (a 50/50 estimation) and therefore not represented. Note that believing p with a plausibility plaus higher than 8 is equivalent to believing ¬p with plausibility 16 − plaus. In the example, we see levels of plausibility from 1 to 6, which would correspond to probabilities from 15/16 (almost certain) to 9/16 (more likely than not). Note that certain knowledge is assumed to come only from direct observation in the current moment, so it is tracked separately.

Figure 2: Prioritized Belief Base

This structure induces a system of spheres, where each sphere includes layers from the base incrementally, as long as the sentences from a less plausible layer do not contradict those from a more plausible one. Let us see what spheres would be induced by this base:

• Plausibility 1 / Probability 15/16: all worlds complying with p, q ∨ r (e.g. pqrst, pq̄rst, pqr̄st̄)
• Plausibility 3 / Probability 12/16: all worlds complying with p, q, s ⊕ t (e.g. pqrst, pqr̄s̄t̄), within the previous sphere

The last layer contradicts previous, more plausible beliefs, and therefore does not induce any sphere.
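The sphere construction can be sketched as a simple filtering pass over candidate worlds, as below; since the text does not spell out the sentences of the third, contradicting layer, ¬p is used here purely as an illustrative contradiction, and all names are assumptions of the sketch.

```python
# Sketch of building the system of spheres from a prioritized base: add layers
# incrementally, stopping as soon as a layer contradicts the more plausible ones.
from itertools import product
from typing import Callable, List, Set, Tuple

World = Tuple[bool, bool, bool, bool, bool]    # valuations of p, q, r, s, t
Layer = List[Callable[[World], bool]]          # sentences as predicates on worlds

ALL_WORLDS: Set[World] = set(product([True, False], repeat=5))


def induced_spheres(layers: List[Layer]) -> List[Set[World]]:
    spheres: List[Set[World]] = []
    current = ALL_WORLDS
    for layer in layers:
        filtered = {w for w in current if all(sentence(w) for sentence in layer)}
        if not filtered:       # the layer contradicts more plausible beliefs
            break
        spheres.append(filtered)
        current = filtered
    return spheres


# The example above: layer 1 = {p, q ∨ r}, layer 2 = {q, s ⊕ t}, and an
# assumed contradicting layer 3 = {¬p} for illustration.
layers = [
    [lambda w: w[0], lambda w: w[1] or w[2]],
    [lambda w: w[1], lambda w: w[3] != w[4]],
    [lambda w: not w[0]],
]
print([len(s) for s in induced_spheres(layers)])   # -> [12, 4]
```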
Keeping past states of this base, and referring to them using dedicated past and point modalities, allows us non-monotonic reasoning, since only new beliefs are added to the base and all beliefs are implicitly tagged with the moment in which they were believed. A belief in p is not replaced; rather, it is asserted that AT_{t=3} ¬p but also AT_{t=6} p.

First-hand, certain knowledge is handled apart from belief. This may exclude certain scenarios of misunderstandings: the step where an agent creates a false belief from a truthful observation. In the film "Knives Out", Great Nana mistakes the character Marta for another, even though Marta is standing in plain sight in front of her. Marta then derives the incorrect belief that Great Nana has recognized her, since this model takes a "sensing" action by another agent as something about which we can have certain knowledge (and we see here how this may not always be the case).

Action Representation

Actions are first-class objects in the language. Preconditions and postconditions can be communicated and they are certainly used in planning, but in no way are they considered immutable or fixed. This has already been described in (Steedman and Petrick 2007), which uses a special purpose database. This is especially the case for communicative actions, which have few if any preconditions (e.g. ¬B_A(p) ∧ ¬B_A(¬p) for ask(A, "p", {}, {B})) and can be easily extended, as we will see later.

The postconditions of actions can have three different natures, as summarized in Figure 3:

• ontic, e.g. open(door). The specification for ontic (physical) actions is very similar to the traditional specification in STRIPS planning: a list of set/unset atomic predicates.
• epistemic, e.g. see(B, open(door)). Epistemic effects are tracked through observability predicates, as opposed to epistemic model modification as in logics of the DEL family, due to the issues with semantic action models explained before. Note that observability itself is directly observable and applicable (we know for sure whether agent A sees something if we see them), and hence is an epistemic, not doxastic, effect.
• doxastic, e.g. B_B(GOAL_A(out(A, room))). These effects can be computed using doxastic logic and goal recognition, as will be detailed later, and therefore depend on what the acting agent believes about the other agents; these complex effects will need to be evaluated again in every individual planning step, and may of course be incorrect if the higher order beliefs are themselves incorrect.

Figure 3: Action Postconditions

An agent may communicate action specifications (preconditions and effects). Both linear and contingent action plans can be communicated as composite actions using the sequence (;), nondeterministic choice (∪) and test (?) operators as in dynamic logics (e.g. as described in (Bolander et al. 2019)). We have decided not to cover unbounded iteration, since finite plans will be easier to check.

We consider the following basic actions in our framework (a representation sketch follows the list):

• Perform an action with pure ontic or epistemic effects; e.g. fire(E, D)
• Say a proposition to a set of agents; e.g. say(A, "∃X AT_{t=3}(done(fire(X, D)))", {B, C}), which means that A says to B and C that someone fired upon D at t=3.
• Ask a set of agents about something, that is, check the validity of a statement or request values for the free variables that make it true according to their beliefs; e.g. ask(A, "AT_{t=3}(done(fire(E, D)))", {}, {B, C}) or ask(A, "AT_{t=3}(done(fire(X, D)))", {X}, {B, C})
• Request an agent to do something; e.g. request(A, "ask(B, "BEL_C(AT_{t=3}(done(fire(X, D))))", {X}, {C})", {B})
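These basic actions can be represented directly as data; in the following sketch, formulas are kept as plain strings for brevity, and the field layout is an assumption of the sketch rather than a fixed part of the framework.

```python
# Sketch of the basic actions of the framework as data objects; formulas are
# plain strings here, and the field names are illustrative only.
from __future__ import annotations
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Do:                      # pure ontic/epistemic action, e.g. fire(E, D)
    name: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Say:                     # assert a proposition to a set of agents
    speaker: str
    proposition: str
    addressees: FrozenSet[str]


@dataclass(frozen=True)
class Ask:                     # check validity / request bindings for free vars
    speaker: str
    proposition: str
    free_vars: FrozenSet[str]  # empty set = yes/no question
    addressees: FrozenSet[str]


@dataclass(frozen=True)
class Request:                 # ask another agent to perform an action
    speaker: str
    action: "Action"
    addressees: FrozenSet[str]


Action = Union[Do, Say, Ask, Request]

# Chinira's request from the first act below: C asks A to request B to stay.
first_act = Request("C", Request("A", Do("stay", ()), frozenset({"B"})),
                    frozenset({"A"}))
```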
Internal actions are considered in planning, but they are not modelled as communicative acts. The reason is that they are clear enough to be modelled by each agent, and also known to happen whenever enough information is provided (a goal may be recognized as soon as there is evidence for it). There is no need to communicate anything about internal actions because all agents have enough stable knowledge to reason about them, even though the specific treatment by each agent of course depends on their beliefs. Agents, however, need to have bounded rationality, so we cannot rely on other agents reaching conclusions, even if those conclusions are logically valid.

First Act: Aisha Talks to Chinira

Aisha provides Chinira with the information that an incoming squad includes a spirit, vulnerable to a ritual from a certain book. This information comes from Deepak, an unreliable source, but Aisha omits this uncertainty from the message to let Chinira draw her own conclusions.

A
say(∀X(in(patrol, X) ⇒ B_X(in(squad, spirit))))

Chinira incorporates Aisha's account with high plausibility into her prioritized belief base, based on a previous track record of complete and accurate information. A lower plausibility could have been assigned if, for example, induction had shown that information from source Aisha is not reliable when checked against facts. Also, higher, conflicting evidence already present (e.g. the report from an inside informant) would have invalidated Aisha's information due to the construction of the spheres from the base. Using a simple breadth-first search, Chinira decides to request Bo Yang to stay in the garrison while she goes with her own squad to the ruins where this book is located, as other actions (sending Aisha or Bo Yang, perceived as inferiors; using some other strategy against the spirit; not doing anything in the hope of the spirit posing a lesser threat) would pay a lower performance/cost balance, always according to her current beliefs.

C
in(squad, spirit) : plaus 1
Plan: (go(ruins); get(book); fight(squad))
request(C, request(A, stay(), {B}), {A})

Goals, Goal Recognition and Goal Recognition Design

Beliefs and goals are not directly observable: an agent can only infer them in another agent through observation of their behaviors. Goal recognition is, thus, a very important piece of manipulation games. Whether an action is taken or not depends on the beliefs about preconditions and effects, and on whether the effects lead to a goal. Goal recognition is a kind of abduction process, where agents' goals are deemed the most probable or concise explanation for those agents' actions. To illustrate the importance of goals and goal recognition, let us take the muddy children puzzle, a staple of dynamic epistemic logic. Agents need to assume that everyone's goal includes a truthful account of their observations. Without this assumption, for example with a lying agent, the puzzle cannot proceed.

The current section only describes the open issue of the need for goal recognition mechanisms to compute the doxastic preconditions of actions in a setting where these include speech acts like saying, asking or requesting, and where the beliefs of other agents may not match the contents of such statements. We do identify some algorithms as potential candidates for implementation.

We approach the concepts of desire and intention from BDI logics (Rao, Georgeff, and others 1995) by deriving them from goal recognition in automated planning and dynamic logics, rather than from the usual Computation Tree Logic. The intuition is that an agent desires a formula φ if, when allowed a choice, that agent takes the action that maximizes the estimated probability of reaching a state where φ holds, when compared with any other action it could take, according to that agent's beliefs. A Bayesian formulation of goal recognition can be found in (Baker, Tenenbaum, and Saxe 2006). Goal recognition can be performed using the same planning algorithms as the agent uses for its own plans, as described in (Ramírez and Geffner 2009) and (Ramírez and Geffner 2010), instead of relying on a plan library. As described in the latter, a prior distribution P(G) over the goals G (priors for goal preference can be incorporated as explained in (Gusmão, Pereira, and Meneguzzi 2021)) is used to obtain the likelihoods P(O|G) of the observations O given the goals G, using the cost differences obtained by a classical planner. In (Sohrabi, Riabov, and Udrea 2016), additional features like unreliable observations and plan recognition, in addition to goal recognition, are introduced.
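A minimal sketch of this cost-difference formulation follows; the plan costs are assumed to come from external classical planner calls (not shown), and the logistic likelihood and helper names are assumptions of the sketch, written in the spirit of (Ramírez and Geffner 2010) rather than reproducing it exactly.

```python
# Sketch of goal recognition from cost differences: for each candidate goal G,
# compare the cost of the cheapest plan that embeds the observations with the
# cost of the cheapest plan that avoids them, and turn the difference into a
# likelihood P(O|G), combined with the prior P(G).
import math
from typing import Dict


def goal_posterior(
    prior: Dict[str, float],
    cost_complying: Dict[str, float],      # cheapest plan embedding the observations
    cost_not_complying: Dict[str, float],  # cheapest plan avoiding the observations
    beta: float = 1.0,
) -> Dict[str, float]:
    """P(G | O) ∝ P(O | G) P(G), with P(O | G) a sigmoid of c(G, O) - c(G, ¬O)."""
    weights = {}
    for goal, p_g in prior.items():
        delta = cost_complying[goal] - cost_not_complying[goal]
        likelihood = 1.0 / (1.0 + math.exp(beta * delta))
        weights[goal] = likelihood * p_g
    total = sum(weights.values())
    return {goal: w / total for goal, w in weights.items()}


# Toy example: the observations fit "recover_book" much better than "defend_city".
print(goal_posterior(
    prior={"recover_book": 0.5, "defend_city": 0.5},
    cost_complying={"recover_book": 4.0, "defend_city": 9.0},
    cost_not_complying={"recover_book": 4.0, "defend_city": 3.0},
))
```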
In Figure 4 we can see two observed steps, with two possible plan completions leading to different goals. We can rule out the goal in white due to the second event observed. However, without some evidence pointing towards one of the grey-patterned goals, we cannot predict further actions.

Figure 4: Plan and Goal Recognition

It is worth mentioning Geib's PHATT algorithm, as described in (Geib and Goldman 2009), for its use of probabilistic actions and AND-OR trees (suitable for contingent planning and very similar to the behaviour trees used in game agent logic). It relies, however, on a description of the structure of tasks and a preexisting plan library. Further research into the induction of task structure and hierarchies, as well as into using planning itself for plan library generation, would be necessary before considering this algorithm.

A further concept in automated planning is goal recognition design, whereby some action is designed to make an agent's goals as easy to discover as possible, as described in (Keren, Gal, and Karpas 2014). By making explicit statements about goals and beliefs in our doxastic model, planning algorithms as described in the following section can include actions that reduce uncertainty, like sensing actions and goal recognition design.

Action and Discourse Planning

Ontic actions can be planned, using the action specifications stored in the agent's belief base, by planners that can handle probabilistic outcomes, like probabilistic planners or a deterministic planner with replanning such as (Yoon, Fern, and Givan 2007). Also, other multiagent frameworks and proposals specify preconditions and postconditions for communicative actions, e.g. the FIPA standards (FIPA 2008). FIPA defines epistemic and doxastic preconditions (feasibility preconditions, FP) and postconditions (rational effects, RE). These preconditions and postconditions are asserted in the belief base if an agent detects that action. We could consider these as "social protocols" that state clearly the goals of the speaker.

However, a complication for communicative actions in our setting is that traditional agent frameworks are oriented toward collaborative agents. Feasibility preconditions express a socially agreed reason why the action is performed, but in fact nothing prevents an agent from saying whatever it wants. Furthermore, when seen from the perspective of the "sender", a communicative act may be issued precisely to guide the goal recognition process in the "receiver" towards a certain goal or plan. The possession or not of a certain piece of knowledge is not what enables us to ask a question; rather, our goal of reducing uncertainty, or of making another agent believe that we do not know something, is what compels us toward that action. In a similar way, evaluating the outcome of a communicative action needs to take goals into account.

Postconditions of communicative actions become complicated to compute: the sender has to try to replicate a goal recognition step, using the receiver's beliefs about the sender and its goals to the extent of the sender's own beliefs, and then try to predict what the receiver will believe about the sender's intentions. Note that even ontic actions may carry a doxastic effect, in the sense that any action is framed within an estimation of goals. Opening a door is evidence for the other agent having run through it, but it may have been left open on purpose to lure the observer into that conclusion. We believe that the increase in memory and computation power of user equipment justifies exploring this modelling. As mentioned before, re-planning or MCTS techniques in automated planning have yielded satisfactory results.

Selecting the content to present in a speech act can be guided by building a model of the receiver, including goals and beliefs, so that candidate items for sentences can be proposed from incomplete proofs (e.g. whether they can close open branches in a tableau) or from plans (e.g. communicating action preconditions to the receiver, regardless of their actual truth) that are related to the current goals. The whole loop of goal recognition, reasoning and planning is performed in the simulated model of the other agent, to the extent to which resources can be dedicated. If we believe that GOAL_A(catch_killer), informing A of has(B, knife) allows it to plan further actions, like asking B for the knife. If we believe that B_A(∀X(has(X, knife) ⇒ is_killer(X))), this information item is a particularly powerful lever to guide A's actions.
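As a sketch of this simulate-the-receiver selection (all classes, method names and the toy planning stub are assumptions of this sketch), one could score each candidate item by the reaction it is predicted to trigger in the receiver's model:

```python
# Sketch of speech-act content selection: insert each candidate item into a
# copy of the model kept for the receiver, run that model's planning stub, and
# keep the item whose predicted reaction best serves the sender's goal.
import copy
from typing import Callable, Iterable, List, Optional


class ReceiverModel:
    """Sender-side model of the receiver: beliefs plus a toy planning stub."""

    def __init__(self, beliefs: List[str]):
        self.beliefs = beliefs

    def insert(self, item: str) -> None:
        self.beliefs.append(item)

    def plan(self) -> List[str]:
        # stand-in for the receiver's own goal recognition + planning loop
        if "has(B, knife)" in self.beliefs:
            return ["ask(B, knife)"]
        return ["wait"]


def select_content(
    candidates: Iterable[str],
    receiver: ReceiverModel,
    serves_sender_goal: Callable[[List[str]], float],
) -> Optional[str]:
    best_item, best_score = None, float("-inf")
    for item in candidates:
        simulated = copy.deepcopy(receiver)     # never mutate the real model
        simulated.insert(item)
        score = serves_sender_goal(simulated.plan())
        if score > best_score:
            best_item, best_score = item, score
    return best_item


chosen = select_content(
    candidates=["has(B, knife)", "in(knife, room_123)"],
    receiver=ReceiverModel(beliefs=["GOAL_A(catch_killer)"]),
    serves_sender_goal=lambda plan: 1.0 if "ask(B, knife)" in plan else 0.0,
)
print(chosen)   # -> has(B, knife)
```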
Some authors use existing planners adapted with epistemic predicates. For example, (Marques and Rovatsos 2016) modifies the Contingent-FF planner to include requests and yes/no questions. Also, (Muise et al. 2015) apply a classical planner, the Fast Downward planner (Helmert 2006), to a multiagent epistemic setting with higher order beliefs, using additional fluents derived from epistemic logic axioms. However, such an approach must accommodate the expensive computations required for doxastic postconditions.

Second Act: Aisha Talks to Bo Yang

Aisha then takes Chinira's request to Bo Yang. However, she does mention Deepak when relaying the patrol report to Bo Yang. When issuing Chinira's request, Aisha does not reveal how Chinira has come to this conclusion, but she makes it clear that it comes from Chinira.

A
say(B_D(in(squad, spirit)))
say(done(request(C, request(A, stay(), {B}), {A})))
request(A, stay(), {B})

The specification language for higher order epistemic actions would allow Bo Yang to examine a possible plan in which Aisha's goal is represented and this observation is matched, but goal probability priors would rank the corresponding goal fairly lower than alternative goals from other agents, or it would have too many unknown factors. Bo Yang may suspect that ∃X(GOAL_X(¬in(B, city))), but he would not be able to progress the reasoning much further without a lot of time-consuming sensing actions. Bo Yang believes Aisha to be a loyal individual, due again to induction from past observations. A more plausible explanation for him is that Chinira has the goal of impressing higher officers (a wrong belief that has nonetheless crept up high in Bo Yang's prioritized belief base due to previous interactions with Aisha) by recovering the book and keeping him in the garrison. Bo Yang decides instead to face the incoming army (as he cannot go for the book himself and clash with Chinira, staying would result in a loss of face, and, in Bo Yang's belief, no sentence could possibly land high enough in Chinira's belief base to change her mind). This results in Bo Yang's soldiers leaving their barracks for a few days.

B
¬in(squad, spirit) : plaus 4
GOAL_C(honour(C) > honour(B)) : plaus 1
Plan: (fight(squad))

Conclusion

We have presented a definition of manipulation games as games with open communication and unknown goals, whose players use models of other agents to guide their actions. We have pointed out three aspects that enable these games: doxastic higher order reasoning, goal recognition, and epistemic planning. For each of these areas we have identified alternatives for data structures and algorithms that can support them. In future communications we plan to present prototypes for each of them.
References

Baker, C. L.; Tenenbaum, J. B.; and Saxe, R. 2006. Bayesian models of human action understanding. Advances in Neural Information Processing Systems 18:99.

Baltag, A., and Smets, S. 2008. A qualitative theory of dynamic interactive belief revision. Logic and the Foundations of Game and Decision Theory (LOFT 7) 3:9–58.

Baral, C.; Gelfond, G.; Pontelli, E.; and Son, T. C. 2015. An action language for multi-agent domains: Foundations. arXiv preprint arXiv:1511.01960.

Black, E.; Coles, A.; and Bernardini, S. 2014. Automated planning of simple persuasion dialogues. In International Workshop on Computational Logic and Multi-Agent Systems, 87–104. Springer.

Black, E.; Coles, A. J.; and Hampson, C. 2017. Planning for persuasion. In 16th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2017, 933–942. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).

Bolander, T.; Engesser, T.; Herzig, A.; Mattmüller, R.; and Nebel, B. 2019. The dynamic logic of policies and contingent planning. In European Conference on Logics in Artificial Intelligence, 659–674. Springer.

FIPA. 2008. FIPA communicative act library specification. Available online: https://www.fipa.org/specs/fipa00037/SC00037J.html (accessed on 19 July 2021).

Geib, C. W., and Goldman, R. P. 2009. A probabilistic plan recognition algorithm based on plan tree grammars. Artificial Intelligence 173(11):1101–1132.

Gusmão, K. M.; Pereira, R. F.; and Meneguzzi, F. 2021. Inferring agents preferences as priors for probabilistic goal recognition. arXiv preprint arXiv:2102.11791.

Helmert, M. 2006. The Fast Downward planning system. Journal of Artificial Intelligence Research 26:191–246.

Herzig, A. 2017. Dynamic epistemic logics: promises, problems, shortcomings, and perspectives. Journal of Applied Non-Classical Logics 27(3-4):328–341.

Horswill, I. 2018. Postmortem: MKULTRA, an experimental AI-based game. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 14.

Joule, R.-V.; Beauvois, J.-L.; and Deschamps, J. C. 1987. Petit traité de manipulation à l'usage des honnêtes gens. Presses Universitaires de Grenoble, Grenoble.

Keren, S.; Gal, A.; and Karpas, E. 2014. Goal recognition design. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 24.

Le, T.; Fabiano, F.; Son, T. C.; and Pontelli, E. 2018. EFP and PG-EFP: Epistemic forward search planners in multi-agent domains. In Twenty-Eighth International Conference on Automated Planning and Scheduling.

Marques, T., and Rovatsos, M. 2015. Toward domain-independent dialogue planning. In Fourth International Workshop on Human-Agent Interaction Design and Models (HAIDM 2015).

Marques, T., and Rovatsos, M. 2016. Classical planning with communicative actions. In Proceedings of the Twenty-second European Conference on Artificial Intelligence, 1744–1745.

Muise, C.; Belle, V.; Felli, P.; McIlraith, S.; Miller, T.; Pearce, A. R.; and Sonenberg, L. 2015. Planning over multi-agent epistemic states: A classical planning approach. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

Panisson, A. R.; Sarkadi, S.; McBurney, P.; Parsons, S.; and Bordini, R. H. 2018. On the formal semantics of theory of mind in agent communication. In International Conference on Agreement Technologies, 18–32. Springer.

Ramírez, M., and Geffner, H. 2009. Plan recognition as planning. In Twenty-First International Joint Conference on Artificial Intelligence.

Ramírez, M., and Geffner, H. 2010. Probabilistic plan recognition using off-the-shelf classical planners. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 24.

Rao, A. S.; Georgeff, M. P.; et al. 1995. BDI agents: From theory to practice. In ICMAS, volume 95, 312–319.

Rott, H. 2009. Shifting priorities: Simple representations for twenty-seven iterated theory change operators. In Towards Mathematical Philosophy. Springer. 269–296.

Ryan, J.; Summerville, A.; Mateas, M.; and Wardrip-Fruin, N. 2015. Toward characters who observe, tell, misremember, and lie. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 11.

Shvo, M.; Klassen, T. Q.; Sohrabi, S.; and McIlraith, S. A. 2020. Epistemic plan recognition. In Proceedings of the 19th International Conference on Autonomous Agents and Multi-Agent Systems, 1251–1259.

Sohrabi, S.; Riabov, A. V.; and Udrea, O. 2016. Plan recognition as planning revisited. In IJCAI, 3258–3264.

Steedman, M., and Petrick, R. 2007. Planning dialog actions. In Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, 265–272.

Van Ditmarsch, H.; van Der Hoek, W.; and Kooi, B. 2007. Dynamic Epistemic Logic, volume 337. Springer Science & Business Media.

Velázquez-Quesada, F. R. 2017. On subtler belief revision policies. In International Workshop on Logic, Rationality and Interaction, 314–329. Springer.

Wan, H.; Fang, B.; and Liu, Y. 2021. A general multi-agent epistemic planner based on higher-order belief change. Artificial Intelligence 301:103562.

Ware, S. G., and Siler, C. 2021. The Sabre narrative planner: multi-agent coordination with intentions and beliefs. In Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems, 1698–1700.

Yoon, S. W.; Fern, A.; and Givan, R. 2007. FF-Replan: A baseline for probabilistic planning. In ICAPS, volume 7, 352–359.