An Agent Framework for Manipulation Games

Javier M. Torres
Brainific S.L.
javier.m.torres@brainific.com

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Current agents use communication in a collaborative setting, exchanging truthful information to achieve a common plan. This paper defines games where agents may exchange information about the physical situation (both fluents and action events), arbitrarily nested beliefs, and action consequences, in order to manipulate other agents for their own goals, i.e. to guide the other agents' own reasoning and planning. We propose a model for an "agent mind" that can cater for all these aspects through revisable, prioritized belief bases; goal recognition including epistemic situations; and planning including speech acts with structured content. We also discuss recent algorithms to address each one of them, and propose a concrete implementation for a future stage.

Introduction

We define manipulation games as games in which players not only affect some shared physical state, but also exchange information in the hope of influencing the other players so that their goals can be achieved. Goals are often hidden from other players, and may or may not be conflicting. Observability is often restricted, and information can only be gained through third-party accounts. In this paper, we propose an action language specification and a doxastic model to reason about and share both actions and beliefs, enabling the manipulation of agents in games. Not all aspects of reasoning (e.g. fully reasoning about actions, or the planning of actions itself) have been implemented in source code, but suitable alternatives have been identified for each of the processes in the model.

Manipulation Games

Interactions with NPCs in RPG-like games can be modelled as non-zero-sum games with hidden goals. Although there are sources of knowledge, stemming from observability (either an agent observes a situation or another agent directly), most information is more or less grounded belief, especially about the beliefs and motivations of other agents. Since only actions can be truthfully observed, intent and plan recognition are the only ways to estimate what another agent wants and believes, so the agent can adjust their own plans. After these models are built, one agent can plan to provide such information that other agents conclude beliefs or take actions in a way that benefits the planning agent. We need to consider capabilities such as:

• Reasoning and representation: reason about predicates under an open world assumption with a base of prioritized beliefs, and represent communicative actions
• Goals, goal recognition and goal recognition design: find out about other agents' plans to guide the planning
• Discourse and action planning: plan actions and predicates to communicate in order to indirectly guide other agents' actions

Agents, in fact, assume that all of them perform the same loop when facing a change in the environment: first, non-obvious predicates are deduced from existing and new information; then the goals of the agents involved in the new situation are re-assessed; and then current actions are re-planned, or new actions are planned. The classic game Diplomacy, in fact, restricts its mechanics to such a degree that these mechanisms are the basis for the game.
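As a rough illustration only (not part of the proposed framework), the following Python sketch shows one way this shared deduce / re-assess / re-plan loop could be wired together; every name in it (Percept, AgentMind and its method) is an assumption of the sketch.

```python
# Illustrative sketch of the three-step loop described above, with trivial
# stand-ins for deduction, goal re-assessment and re-planning.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Percept:
    content: str      # an observed fluent or a reported statement
    source: str       # "observation" or the name of the reporting agent


@dataclass
class AgentMind:
    beliefs: List[str] = field(default_factory=list)
    goals: Dict[str, str] = field(default_factory=dict)   # agent -> suspected goal
    plan: List[str] = field(default_factory=list)

    def on_change(self, percept: Percept) -> None:
        # 1. deduce non-obvious predicates from existing and new information
        self.beliefs.append(percept.content)
        # 2. re-assess the goals of the agents involved in the new situation
        if percept.source != "observation":
            self.goals.setdefault(percept.source, "unknown")
        # 3. re-plan current actions, or plan new ones
        if not self.plan:
            self.plan = ["wait"]


mind = AgentMind()
mind.on_change(Percept("in(squad, spirit)", source="Aisha"))
print(mind.beliefs, mind.goals, mind.plan)
```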
A related field, from which we borrow, is that of persuasion and argumentation theory. A description of how automated planning techniques can be used to promote arguments can be found in (Black, Coles, and Bernardini 2014) and (Black, Coles, and Hampson 2017). We argue that manipulation expands the set of actions available in an argumentation setting by considering false or incomplete predicates, and by relying on the other agents' internal processing, such as their own goal recognition or higher order reasoning.

Small treatise about manipulation for honest people¹

¹ A humble tribute to (Joule, Beauvois, and Deschamps 1987).

Let us consider a sample fantasy RPG scenario with three actors: Aisha, Bo Yang and Chinira. The three of them are officers in the same army, and Chinira is the common boss of Aisha and Bo Yang.

Aisha currently has a goal of recovering the McGuffin of Diabolic Wisdom, which she hid in the common room of Bo Yang's squad instead of her own, so that nobody could think she was in its possession. The issue is not trespassing, since she can freely enter this room, but rather concealment; she cannot risk Bo Yang's soldiers and Bo Yang himself seeing the artifact and learning about it. At least one person from Bo Yang's squad is always present there, and they would question Aisha if she searched the room. Additionally, Bo Yang despises Chinira, believing her to act in a purely selfish manner and attributing to her a goal of self-promotion.

Throughout the paper we will show, within bounding boxes, a formal description of the messages exchanged and the beliefs and plans generated by the agents.

Previous Work

This framework is very similar to a BDI architecture (Rao, Georgeff, and others 1995), as it endows agents with beliefs and desires, or goals. We believe, however, that few, if any, BDI implementations use the kind of techniques that we propose to support manipulation games, like higher order reasoning, goal recognition or epistemic planning.

Horswill's MKULTRA (Horswill 2018) is a superb implementation of a manipulation game and an inspiration for the current work. The player can insert beliefs into other agents' minds to solve various puzzles. The areas where this work aims at improving on MKULTRA are the use of full-fledged logical reasoning, instead of logic programming; higher-order beliefs; and a more oblique manipulation through planning/goal recognition and the re-evaluation of source reliabilities.

Ryan's Talk of the Town (Ryan et al. 2015) presents a system where bounded rationality and memory in agents create a compelling narrative. Talk of the Town does not implement a complex model for agent reasoning, but, on the other hand, agents follow complex schedules through which they acquire first- and second-hand information about other agents.

(Ware and Siler 2021) describe a narrative planner that takes intentions and beliefs into consideration. However, characters themselves seem to use utility functions to choose actions, and do not reason about their own beliefs and those of other agents.

Within the automated planning community, epistemic planning (taking the epistemic state of other agents into consideration) has become so important as to have a dedicated workshop at ICAPS 2020 (https://icaps20subpages.icaps-conference.org/workshops/epip/). The work in (Shvo et al. 2020) includes epistemic plan recognition (which includes epistemic planning itself) leveraging the planners in (Le et al. 2018), (Wan, Fang, and Liu 2021) and (Muise et al. 2015). The authors are unsure whether belief (KD45) or knowledge (S5) is considered in these planners, and whether communication extends beyond the truth value of single fluents.

Multi-agent systems such as (Panisson et al. 2018) provide a good foundation when it comes to theory of mind and speech acts, but deal with agent collaboration, whether implicit or explicit. This work instead focuses on taking advantage of the agents' reasoning strategies to obtain the desired result, regardless of whether this result is beneficial to the other agent. A noteworthy exception is (Black, Coles, and Bernardini 2014), which studies persuasion (a component of manipulation). The work on prioritized belief bases with non-idempotent operations presented in (Velázquez-Quesada 2017) models, in our opinion, the manipulation of human agents.

Reasoning and Knowledge Representation

As we have mentioned before, each agent performs a loop of goal recognition and sensing followed by action planning, supported by doxastic reasoning, in a variation of the traditional Sense-Plan-Act loop.

The internal model of each agent, as illustrated in Figure 1, consists of a prioritized and time-versioned belief base including certain knowledge (sensing actions), a list of internal goals and associated plans, and a set of agent models with the same structure, rebuilt whenever new information is added. The state is updated by processes of belief insertion, planning, goal recognition and higher order epistemic reasoning, as described in the rest of the paper.

Figure 1: Internal Agent Model
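As a rough data-structure sketch of the model in Figure 1 (an illustration of the description above, not an existing implementation), the internal state of an agent could look as follows; all class and field names are assumptions of the sketch.

```python
# Sketch of the internal agent model of Figure 1: a prioritized, time-versioned
# belief base, separately tracked certain knowledge, goals and plans, and nested
# models of other agents with the same structure. Names are illustrative only.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BeliefLayer:
    plausibility: int        # rank within the base: lower means more plausible
    sentences: List[str]     # logical sentences kept syntactically
    source: str              # "observation", "account:<agent>", "abduction", ...


@dataclass
class AgentModel:
    # one list of layers per time step, so that past states can be referenced
    belief_timeline: Dict[int, List[BeliefLayer]] = field(default_factory=dict)
    certain_knowledge: List[str] = field(default_factory=list)   # from sensing
    goals: List[str] = field(default_factory=list)
    plans: List[List[str]] = field(default_factory=list)
    # models of other agents, rebuilt whenever new information is added
    others: Dict[str, AgentModel] = field(default_factory=dict)
```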
Dynamic Epistemic Logic has often been used to model reasoning about higher order beliefs such as those that can be expressed in this model. It is a formalism describing epistemic states and their changes after executing actions, and it has experienced substantial growth in the last 15 years. A very complete account of its evolution from public announcement logics to its current form can be found in (Van Ditmarsch, van Der Hoek, and Kooi 2007). It has been applied to areas such as cryptography or logic puzzles, and extended to different areas like modelling questions or epistemic planning. However, we have found two important shortcomings in this family of logics:

• it puts the burden on the problem to fully specify the initial model;
• it models actions as semantic changes that directly modify this model, with few guidelines about what conditions the actions should fulfill to preserve model properties (e.g. KD45 for belief or S5 for knowledge) across updates, as Herzig has pointed out in (Herzig 2017).

Hence, we have decided to focus on actions with syntactic effects as much as possible (e.g. forcing as a result that A believes in p after an action), so the user of the model needs to build partial models from the acting agent's point of view, using techniques like tableaux whenever necessary, trading additional computational burden for flexibility. We have nonetheless studied the formalization of prioritized belief bases from a DEL perspective as described in (Baltag and Smets 2008). Observability and sensing primitive actions also allow us to derive knowledge (S5) modal formulas, in a way similar to that described in (Baral et al. 2015). A separate language for action specifications describes what an agent plans to do or is doing, within the syntax of the belief logic.
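As a toy illustration of this choice of syntactic effects (the helper function and the plain-string bookkeeping below are assumptions of this sketch, not the paper's actual machinery), a say action could simply add belief sentences to the sender's models of the addressees, leaving any semantic checking to a later tableau pass:

```python
# Toy illustration of the "syntactic effect" policy: instead of transforming a
# semantic Kripke model, an action adds sentences to the listener models kept
# by the acting agent. Whether the addressee really adopts the belief is a
# separate question, re-checked later against the partial model.
from typing import Dict, Set


def apply_say(speaker: str, proposition: str, addressees: Set[str],
              models: Dict[str, Set[str]]) -> None:
    """Record, from the speaker's point of view, that each addressee now holds
    a belief about the proposition after say(speaker, proposition, addressees)."""
    for agent in addressees:
        models.setdefault(agent, set()).add(f"B_{agent}({proposition})")


models: Dict[str, Set[str]] = {}
apply_say("A", "in(squad, spirit)", {"B", "C"}, models)
print(models["B"])   # {'B_B(in(squad, spirit))'}
```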
Doxastic Reasoning

The doxastic model proposed consists of prioritized belief bases as described in (Rott 2009): an ordered list of sets of sentences, with the most plausible sentences placed closer to the head of the list, followed by first-hand, present-time, direct knowledge. We keep an open world assumption in our framework: having p or ¬p explicitly in an agent's belief base means that they actually B(p) or B(¬p). The lack of belief about p means that they will not commit to any valuation for p: complete uncertainties (formulas for which agents do not have any preference) will not be represented in the base. Prioritized belief bases generate a corresponding system-of-spheres model, where possible world sets are filtered by each layer in the base. Since sentences can be removed, due to their origin, during an agent's lifetime, conflicting sentences may be kept at different levels; the actual belief of the agent will depend on the relative position of each sentence. An example base is presented in Figure 2 with annotations about the source of the beliefs.

These structures are more succinct than, for example, POMDP models, since they use logic sentences to express sets of worlds, and human agents tend to use vaguely defined confidence or plausibility levels instead of exact probabilities.

Sentences in such a model keep track of their origin, such as:

• Past direct observations. A belief could be implicitly formed about the current situation depending on a state that was observable in the past, but not anymore, with its plausibility degrading with time up to complete uncertainty, at which point it is removed from the belief base.
• Accounts from other agents, accepted according to observed certainties and the perceived "honesty record" of other agents.
• Abduction, mainly targeted at action reasoning, so causes will be ordered according to their plausibility depending on the simplicity of their attributions to effects.
• Induction, for agents that perform some kind of statistical analysis of observed facts.

Note that the current paper does not propose explicit mechanisms for the inclusion of a belief in the base, apart from these suggestions.

A version timeline of the belief base is kept, so past and point temporal modalities like AT_{t=3}(p) can be used to refer to any (including the acting) agent's beliefs. Deduced propositions are indirectly referenced whenever a check using a tableau starts.

We allow the following types of predicates in the belief base (a representation sketch is given below):

• first order predicates; e.g. has(B, knife)
• visibility statements of first order predicates; e.g. see(A, has(B, knife))
• temporal statements of any other item; e.g. AT_{t=3}(has(B, knife))
• goal statements about agents; e.g. GOAL_A(catch_killer)
• statements about actions with preconditions and postconditions, with probabilistic outcomes; e.g. search_room{pre: empty(room_123); post: {t := t + 3 with p = 1; in(knife, room_123) with p = 0.5; ¬in(knife, room_123) with p = 0.5}}
• predicates that express that an action has just been performed; e.g. done(fired(A, B))
• beliefs from other agents; e.g. B_A(is_killer(B))

Special predicates like those described in (Marques and Rovatsos 2015) can be expressed using modalities about goals, actions and beliefs. The GOAL_A(p) modality expresses that agent A will take actions that make it more probable for proposition p to become true. One can express the preferred next action for agent A as GOAL_A(done(act())). Predicates about knowledge like unknown(a, que) (the answer to question que is unknown to agent a) can be expressed as ∀X(¬B_A(que(X))).
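One possible syntactic representation of these predicate types and modalities is sketched below; the constructor names mirror the notation above (see, AT, GOAL, done, B), but their exact shape is an assumption of this sketch rather than a fixed part of the framework.

```python
# Sketch of the predicate and modality types listed above as plain syntactic
# objects (no semantics attached); names and fields are illustrative only.
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Pred:          # first order predicate, e.g. has(B, knife)
    name: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Not:
    inner: "Formula"


@dataclass(frozen=True)
class See:           # see(A, has(B, knife))
    agent: str
    inner: "Formula"


@dataclass(frozen=True)
class At:            # AT_{t=3}(has(B, knife))
    t: int
    inner: "Formula"


@dataclass(frozen=True)
class Goal:          # GOAL_A(catch_killer)
    agent: str
    inner: "Formula"


@dataclass(frozen=True)
class Done:          # done(fired(A, B))
    action: Pred


@dataclass(frozen=True)
class Bel:           # B_A(is_killer(B))
    agent: str
    inner: "Formula"


Formula = Union[Pred, Not, See, At, Goal, Done, Bel]

# Example: B_A(AT_{t=3}(done(fire(X, D))))
example = Bel("A", At(3, Done(Pred("fire", ("X", "D")))))
```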
To allow some approximation to probabilistic reasoning, plausibilities are related to discrete probabilities. Values from 0 to 2^n, expressing probabilities from 0 to 1, are used. Operations that would result in intermediate values are rounded towards 2^(n-1), a probability of 0.5. This value is important since it represents uncertainty, and as such can be removed from the belief base. We expect long-term reasoning to be "diluted" in this way to control state explosion, since the further away a result is in terms of operations (e.g. a situation several steps ahead in a plan), the more probable it is to turn into an uncertainty. In no way are complex probabilistic logic frameworks (e.g. Markov logic networks) involved in these estimations: derived statements always inherit the least plausible value from among all the input statements.
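A minimal numeric sketch of this discrete scale follows, assuming n = 4 (values 0..16) and a linear mapping consistent with plausibility 0 ↔ probability 1 and 8 ↔ 0.5; both the mapping and the helper names are assumptions of the sketch.

```python
# Minimal sketch of the discrete plausibility scale described above, for n = 4.
import math

N = 4
MAX = 2 ** N               # 16
UNCERTAIN = 2 ** (N - 1)   # 8 <-> probability 0.5; never stored in the base


def to_probability(plaus: int) -> float:
    """Map a plausibility rank (0 = certain) to a probability in [0, 1]."""
    return (MAX - plaus) / MAX


def derived_plausibility(*inputs: int) -> int:
    """A derived statement inherits the least plausible (largest) input rank."""
    return max(inputs)


def round_towards_uncertainty(value: float) -> int:
    """Intermediate results are rounded towards 2^(n-1), i.e. towards 0.5."""
    return math.ceil(value) if value < UNCERTAIN else math.floor(value)


assert to_probability(1) == 15 / 16
assert derived_plausibility(1, 3, 6) == 6
```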
In Figure 2 we can see a prioritized belief base consisting of three layers, each with a certain plausibility. In this example, we use 0 to 16 levels of probability, with a plausibility of 0 corresponding to a probability of 1, and 8 corresponding to complete uncertainty (a 50/50 estimation) and therefore not represented. Note that believing p with a plausibility plaus higher than 8 is equivalent to believing ¬p with plausibility 16 − plaus. In the example, we see levels of plausibility from 1 to 6, which would correspond to probabilities from 15/16 (almost certain) to 9/16 (more likely than not). Note that certain knowledge is assumed to come only from direct observation in the current moment, so it is tracked separately.

Figure 2: Prioritized Belief Base

This structure induces a system of spheres, where each sphere includes layers from the base incrementally, as long as the sentences from a less plausible layer do not contradict those from a more plausible one. Let us see what spheres would be induced by this base:

• Plausibility 1 / Probability 15/16: all worlds complying with p, q ∨ r (e.g. pqrst, pq̄rst, pqr̄st̄)
• Plausibility 3 / Probability 12/16: all worlds complying with p, q, s ⊕ t (e.g. pqrst, pqr̄s̄t̄), within the previous sphere

The last layer contradicts previous, more plausible beliefs, and therefore does not induce any sphere.
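The sphere construction can be sketched as a simple filtering pass over candidate worlds, as below; since the text does not spell out the sentences of the third, contradicting layer, ¬p is used here purely as an illustrative contradiction, and all names are assumptions of the sketch.

```python
# Sketch of building the system of spheres from a prioritized base: add layers
# incrementally, stopping as soon as a layer contradicts the more plausible ones.
from itertools import product
from typing import Callable, List, Set, Tuple

World = Tuple[bool, bool, bool, bool, bool]    # valuations of p, q, r, s, t
Layer = List[Callable[[World], bool]]          # sentences as predicates on worlds

ALL_WORLDS: Set[World] = set(product([True, False], repeat=5))


def induced_spheres(layers: List[Layer]) -> List[Set[World]]:
    spheres: List[Set[World]] = []
    current = ALL_WORLDS
    for layer in layers:
        filtered = {w for w in current if all(sentence(w) for sentence in layer)}
        if not filtered:       # the layer contradicts more plausible beliefs
            break
        spheres.append(filtered)
        current = filtered
    return spheres


# The example above: layer 1 = {p, q ∨ r}, layer 2 = {q, s ⊕ t}, and an
# assumed contradicting layer 3 = {¬p} for illustration.
layers = [
    [lambda w: w[0], lambda w: w[1] or w[2]],
    [lambda w: w[1], lambda w: w[3] != w[4]],
    [lambda w: not w[0]],
]
print([len(s) for s in induced_spheres(layers)])   # -> [12, 4]
```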
Keeping past states of this base, and referring to them using dedicated past and point modalities, allows us non-monotonic reasoning, since only new beliefs are added to the base and all beliefs are implicitly tagged with the moment in which they were believed. A belief in p is not replaced; rather, it is asserted that AT_{t=3} ¬p but also AT_{t=6} p.

First-hand, certain knowledge is handled apart from belief. This may exclude certain scenarios of misunderstandings: the step where an agent creates a false belief from a truthful observation. In the film "Knives Out", Great Nana mistakes the character Marta for another, even though Marta is standing in plain sight in front of her. Marta then derives the incorrect belief that Great Nana has recognized her, since this model takes a "sensing" action by another agent as something about which we can have certain knowledge (and we see here how this may not always be the case).

Action Representation

Actions are first-class objects in the language. Preconditions and postconditions can be communicated and they are certainly used in planning, but in no way are they considered immutable or fixed. This has already been described in (Steedman and Petrick 2007), which uses a special purpose database. This is especially the case for communicative actions, which have few if any preconditions (e.g. ¬B_A(p) ∧ ¬B_A(¬p) for ask(A, "p", {}, {B})) and can be easily extended, as we will see later.

The postconditions of actions can have three different natures, as summarized in Figure 3:

• ontic, e.g. open(door). The specification for ontic (physical) actions is very similar to the traditional specification in STRIPS planning: a list of set/unset atomic predicates.
• epistemic, e.g. see(B, open(door)). Epistemic effects are tracked through observability predicates, as opposed to epistemic model modification as in logics of the DEL family, due to the issues with semantic action models explained before. Note that observability itself is directly observable and applicable (we know for sure whether agent A sees something if we see them), and hence is an epistemic, not doxastic, effect.
• doxastic, e.g. B_B(GOAL_A(out(A, room))). These effects can be computed using doxastic logic and goal recognition, as will be detailed later, and therefore depend on what the acting agent believes about the other agents; these complex effects will need to be evaluated again in every individual planning step, and may of course be incorrect if the higher order beliefs are themselves incorrect.

Figure 3: Action Postconditions

An agent may communicate action specifications (preconditions and effects). Both linear and contingent action plans can be communicated as composite actions using the sequence (;), nondeterministic choice (∪) and test (?) operators as in dynamic logics (e.g. as described in (Bolander et al. 2019)). We have decided not to cover unbounded iteration, since finite plans will be easier to check.

We consider the following basic actions in our framework (a representation sketch follows the list):

• Perform an action with pure ontic or epistemic effects; e.g. fire(E, D)
• Say a proposition to a set of agents; e.g. say(A, "∃X AT_{t=3}(done(fire(X, D)))", {B, C}), which means that A says to B and C that someone fired upon D at t=3.
• Ask a set of agents about something, that is, check the validity of a statement or request values for the free variables that make it true according to their beliefs; e.g. ask(A, "AT_{t=3}(done(fire(E, D)))", {}, {B, C}) or ask(A, "AT_{t=3}(done(fire(X, D)))", {X}, {B, C})
• Request an agent to do something; e.g. request(A, "ask(B, "BEL_C(AT_{t=3}(done(fire(X, D))))", {X}, {C})", {B})
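These basic actions can be represented directly as data; in the following sketch, formulas are kept as plain strings for brevity, and the field layout is an assumption of the sketch rather than a fixed part of the framework.

```python
# Sketch of the basic actions of the framework as data objects; formulas are
# plain strings here, and the field names are illustrative only.
from __future__ import annotations
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Do:                      # pure ontic/epistemic action, e.g. fire(E, D)
    name: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Say:                     # assert a proposition to a set of agents
    speaker: str
    proposition: str
    addressees: FrozenSet[str]


@dataclass(frozen=True)
class Ask:                     # check validity / request bindings for free vars
    speaker: str
    proposition: str
    free_vars: FrozenSet[str]  # empty set = yes/no question
    addressees: FrozenSet[str]


@dataclass(frozen=True)
class Request:                 # ask another agent to perform an action
    speaker: str
    action: "Action"
    addressees: FrozenSet[str]


Action = Union[Do, Say, Ask, Request]

# Chinira's request from the first act below: C asks A to request B to stay.
first_act = Request("C", Request("A", Do("stay", ()), frozenset({"B"})),
                    frozenset({"A"}))
```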
Internal actions are considered in planning, but they are not modelled as communicative acts. The reason is that they are clear enough to be modelled by each agent, and also known to happen whenever enough information is provided (a goal may be recognized as soon as there is evidence for it). There is no need to communicate anything about internal actions because all agents have enough stable knowledge to reason about them, even though the specific treatment by each agent of course depends on their beliefs. Agents, however, need to have bounded rationality, so we cannot rely on other agents reaching conclusions, even if those conclusions are logically valid.

First Act: Aisha Talks to Chinira

Aisha provides Chinira with the information that an incoming squad includes a spirit, vulnerable to a ritual from a certain book. This information comes from Deepak, an unreliable source, but Aisha omits this uncertainty from the message to let Chinira draw her own conclusions.

A
say(∀X(in(patrol, X) ⇒ B_X(in(squad, spirit))))

Chinira incorporates Aisha's account with high plausibility into her prioritized belief base, based on a previous track record of complete and accurate information. A lower plausibility could have been assigned if, for example, induction had shown that information from source Aisha is not reliable when checked against facts. Also, higher, conflicting evidence already present (e.g. the report from an inside informant) would have invalidated Aisha's information due to the construction of the spheres from the base. Using a simple breadth-first search, Chinira decides to request Bo Yang to stay in the garrison while she goes with her own squad to the ruins where this book is located, as other actions (sending Aisha or Bo Yang, perceived as inferiors; using some other strategy against the spirit; not doing anything in the hope of the spirit posing a lesser threat) would pay a lower performance/cost balance, always according to her current beliefs.

C
in(squad, spirit) : plaus 1
Plan: (go(ruins); get(book); fight(squad))
request(C, request(A, stay(), {B}), {A})

Goals, Goal Recognition and Goal Recognition Design

Beliefs and goals are not directly observable: an agent can only infer them in another agent through observation of their behaviors. Goal recognition is, thus, a very important piece of manipulation games. Whether an action is taken or not depends on the beliefs about preconditions and effects, and on whether the effects lead to a goal. Goal recognition is a kind of abduction process, where agents' goals are deemed the most probable or concise explanation for those agents' actions. To illustrate the importance of goals and goal recognition, let us take the muddy children puzzle, a staple of dynamic epistemic logic. Agents need to assume that everyone's goal includes a truthful account of their observations. Without this assumption, for example with a lying agent, the puzzle cannot proceed.

The current section only describes the open issue of the need for goal recognition mechanisms to compute the doxastic preconditions of actions in a setting where these include speech acts like saying, asking or requesting, and where the beliefs of other agents may not match the contents of such statements. We do identify some algorithms as potential candidates for implementation.

We approach the concepts of desire and intention from BDI logics (Rao, Georgeff, and others 1995) by deriving them from goal recognition in automated planning and dynamic logics, rather than from the usual Computation Tree Logic. The intuition is that an agent desires a formula φ if, when allowed a choice, that agent takes the action that maximizes the estimated probability of reaching a state where φ holds, when compared with any other action it could take, according to that agent's beliefs. A Bayesian formulation of goal recognition can be found in (Baker, Tenenbaum, and Saxe 2006). Goal recognition can be performed using the same planning algorithms as the agent uses for its own plans, as described in (Ramírez and Geffner 2009) and (Ramírez and Geffner 2010), instead of relying on a plan library. As described in the latter, a prior distribution P(G) over the goals G (priors for goal preference can be incorporated as explained in (Gusmão, Pereira, and Meneguzzi 2021)) is used to obtain the likelihoods P(O|G) of the observations O given the goals G, using the cost differences obtained by a classical planner. In (Sohrabi, Riabov, and Udrea 2016), additional features like unreliable observations and plan recognition, in addition to goal recognition, are introduced.
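A minimal sketch of this cost-difference formulation follows; the plan costs are assumed to come from external classical planner calls (not shown), and the logistic likelihood and helper names are assumptions of the sketch, written in the spirit of (Ramírez and Geffner 2010) rather than reproducing it exactly.

```python
# Sketch of goal recognition from cost differences: for each candidate goal G,
# compare the cost of the cheapest plan that embeds the observations with the
# cost of the cheapest plan that avoids them, and turn the difference into a
# likelihood P(O|G), combined with the prior P(G).
import math
from typing import Dict


def goal_posterior(
    prior: Dict[str, float],
    cost_complying: Dict[str, float],      # cheapest plan embedding the observations
    cost_not_complying: Dict[str, float],  # cheapest plan avoiding the observations
    beta: float = 1.0,
) -> Dict[str, float]:
    """P(G | O) ∝ P(O | G) P(G), with P(O | G) a sigmoid of c(G, O) - c(G, ¬O)."""
    weights = {}
    for goal, p_g in prior.items():
        delta = cost_complying[goal] - cost_not_complying[goal]
        likelihood = 1.0 / (1.0 + math.exp(beta * delta))
        weights[goal] = likelihood * p_g
    total = sum(weights.values())
    return {goal: w / total for goal, w in weights.items()}


# Toy example: the observations fit "recover_book" much better than "defend_city".
print(goal_posterior(
    prior={"recover_book": 0.5, "defend_city": 0.5},
    cost_complying={"recover_book": 4.0, "defend_city": 9.0},
    cost_not_complying={"recover_book": 4.0, "defend_city": 3.0},
))
```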
In Figure 4 we can see two observed steps, with two possible plan completions leading to different goals. We can rule out the goal in white due to the second event observed. However, without some evidence pointing towards one of the grey-patterned goals, we cannot predict further actions.

Figure 4: Plan and Goal Recognition

It is worth mentioning Geib's PHATT algorithm, as described in (Geib and Goldman 2009), for its use of probabilistic actions and AND-OR trees (suitable for contingent planning and very similar to the behaviour trees used in game agent logic). It relies, however, on a description of the structure of tasks and a preexisting plan library. Further research into the induction of task structure and hierarchies, as well as into using planning itself for plan library generation, would be necessary before considering this algorithm.

A further concept in automated planning is goal recognition design, whereby some action is designed to make an agent's goals as easy to discover as possible, as described in (Keren, Gal, and Karpas 2014). By making explicit statements about goals and beliefs in our doxastic model, planning algorithms as described in the following section can include actions that reduce uncertainty, like sensing actions and goal recognition design.

Action and Discourse Planning

Ontic actions can be planned, using the action specifications stored in the agent's belief base, by planners that can handle probabilistic outcomes, like probabilistic planners or a deterministic planner with replanning such as (Yoon, Fern, and Givan 2007). Also, other multiagent frameworks and proposals specify preconditions and postconditions for communicative actions, e.g. the FIPA standards (FIPA 2008). FIPA defines epistemic and doxastic preconditions (feasibility preconditions, FP) and postconditions (rational effects, RE). These preconditions and postconditions are asserted in the belief base if an agent detects that action. We could consider these as "social protocols" that state clearly the goals of the speaker.

However, a complication for communicative actions in our setting is that traditional agent frameworks are oriented toward collaborative agents. Feasibility preconditions express a socially agreed reason why the action is performed, but in fact nothing prevents an agent from saying whatever it wants. Furthermore, when seen from the perspective of the "sender", a communicative act may be issued precisely to guide the goal recognition process in the "receiver" towards a certain goal or plan. The possession or not of a certain piece of knowledge is not what enables us to ask a question; rather, our goal of reducing uncertainty, or of making another agent believe that we do not know something, is what compels us toward that action. In a similar way, evaluating the outcome of a communicative action needs to take goals into account.

Postconditions of communicative actions become complicated to compute: the sender has to try to replicate a goal recognition step, using the receiver's beliefs about the sender and its goals to the extent of the sender's own beliefs, and then try to predict what the receiver will believe about the sender's intentions. Note that even ontic actions may carry a doxastic effect, in the sense that any action is framed within an estimation of goals. Opening a door is evidence for the other agent having run through it, but it may have been left open on purpose to lure the observer into that conclusion. We believe that the increase in memory and computation power of user equipment justifies exploring this modelling. As mentioned before, re-planning or MCTS techniques in automated planning have yielded satisfactory results.

Selecting the content to present in a speech act can be guided by building a model of the receiver, including goals and beliefs, so that candidate items for sentences can be proposed from incomplete proofs (e.g. whether they can close open branches in a tableau) or from plans (e.g. communicating action preconditions to the receiver, regardless of their actual truth) that are related to the current goals. The whole loop of goal recognition, reasoning and planning is performed in the simulated model of the other agent, to the extent to which resources can be dedicated. If we believe that GOAL_A(catch_killer), informing A of has(B, knife) allows it to plan further actions, like asking B for the knife. If we believe that B_A(∀X(has(X, knife) ⇒ is_killer(X))), this information item is a particularly powerful lever to guide A's actions.
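As a sketch of this simulate-the-receiver selection (all classes, method names and the toy planning stub are assumptions of this sketch), one could score each candidate item by the reaction it is predicted to trigger in the receiver's model:

```python
# Sketch of speech-act content selection: insert each candidate item into a
# copy of the model kept for the receiver, run that model's planning stub, and
# keep the item whose predicted reaction best serves the sender's goal.
import copy
from typing import Callable, Iterable, List, Optional


class ReceiverModel:
    """Sender-side model of the receiver: beliefs plus a toy planning stub."""

    def __init__(self, beliefs: List[str]):
        self.beliefs = beliefs

    def insert(self, item: str) -> None:
        self.beliefs.append(item)

    def plan(self) -> List[str]:
        # stand-in for the receiver's own goal recognition + planning loop
        if "has(B, knife)" in self.beliefs:
            return ["ask(B, knife)"]
        return ["wait"]


def select_content(
    candidates: Iterable[str],
    receiver: ReceiverModel,
    serves_sender_goal: Callable[[List[str]], float],
) -> Optional[str]:
    best_item, best_score = None, float("-inf")
    for item in candidates:
        simulated = copy.deepcopy(receiver)     # never mutate the real model
        simulated.insert(item)
        score = serves_sender_goal(simulated.plan())
        if score > best_score:
            best_item, best_score = item, score
    return best_item


chosen = select_content(
    candidates=["has(B, knife)", "in(knife, room_123)"],
    receiver=ReceiverModel(beliefs=["GOAL_A(catch_killer)"]),
    serves_sender_goal=lambda plan: 1.0 if "ask(B, knife)" in plan else 0.0,
)
print(chosen)   # -> has(B, knife)
```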
Some authors use existing planners adapted with epistemic predicates. For example, (Marques and Rovatsos 2016) modifies the Contingent-FF planner to include requests and yes/no questions. Also, (Muise et al. 2015) apply a classical planner, the Fast Downward planner (Helmert 2006), to a multiagent epistemic setting with higher order beliefs, using additional fluents derived from epistemic logic axioms. However, such an approach must accommodate the expensive computations required for doxastic postconditions.

Second Act: Aisha Talks to Bo Yang

Aisha then takes Chinira's request to Bo Yang. However, she does mention Deepak when relaying the patrol report to Bo Yang. When issuing Chinira's request, Aisha does not reveal how Chinira has come to this conclusion, but she makes it clear that it comes from Chinira.

A
say(B_D(in(squad, spirit)))
say(done(request(C, request(A, stay(), {B}), {A})))
request(A, stay(), {B})

The specification language for higher order epistemic actions would allow Bo Yang to examine a possible plan in which Aisha's goal is represented and this observation is matched, but goal probability priors would rank the corresponding goal fairly lower than alternative goals from other agents, or it would have too many unknown factors. Bo Yang may suspect that ∃X(GOAL_X(¬in(B, city))), but he would not be able to progress the reasoning much further without a lot of time-consuming sensing actions. Bo Yang believes Aisha to be a loyal individual, due again to induction from past observations. A more plausible explanation for him is that Chinira has the goal of impressing higher officers (a wrong belief that has nonetheless crept up high in Bo Yang's prioritized belief base due to previous interactions with Aisha) by recovering the book and keeping him in the garrison. Bo Yang decides instead to face the incoming army (as he cannot go for the book himself and clash with Chinira, staying would result in a loss of face, and, in Bo Yang's belief, no sentence could possibly land high enough in Chinira's belief base to change her mind). This results in Bo Yang's soldiers leaving their barracks for a few days.

B
¬in(squad, spirit) : plaus 4
GOAL_C(honour(C) > honour(B)) : plaus 1
Plan: (fight(squad))

Conclusion

We have presented a definition of manipulation games as games with open communication and unknown goals, whose players use models of other agents to guide their actions. We have pointed out three aspects that enable these games: doxastic higher order reasoning, goal recognition, and epistemic planning. For each of these areas we have identified alternatives for data structures and algorithms that can support them. In future communications we plan to present prototypes for each of them.
References

Baker, C. L.; Tenenbaum, J. B.; and Saxe, R. 2006. Bayesian models of human action understanding. Advances in Neural Information Processing Systems 18:99.

Baltag, A., and Smets, S. 2008. A qualitative theory of dynamic interactive belief revision. Logic and the Foundations of Game and Decision Theory (LOFT 7) 3:9–58.

Baral, C.; Gelfond, G.; Pontelli, E.; and Son, T. C. 2015. An action language for multi-agent domains: Foundations. arXiv preprint arXiv:1511.01960.

Black, E.; Coles, A.; and Bernardini, S. 2014. Automated planning of simple persuasion dialogues. In International Workshop on Computational Logic and Multi-Agent Systems, 87–104. Springer.

Black, E.; Coles, A. J.; and Hampson, C. 2017. Planning for persuasion. In 16th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2017, 933–942. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).

Bolander, T.; Engesser, T.; Herzig, A.; Mattmüller, R.; and Nebel, B. 2019. The dynamic logic of policies and contingent planning. In European Conference on Logics in Artificial Intelligence, 659–674. Springer.

FIPA. 2008. FIPA communicative act library specification. Available online: https://www.fipa.org/specs/fipa00037/SC00037J.html (accessed on 19 July 2021).

Geib, C. W., and Goldman, R. P. 2009. A probabilistic plan recognition algorithm based on plan tree grammars. Artificial Intelligence 173(11):1101–1132.

Gusmão, K. M.; Pereira, R. F.; and Meneguzzi, F. 2021. Inferring agents preferences as priors for probabilistic goal recognition. arXiv preprint arXiv:2102.11791.

Helmert, M. 2006. The Fast Downward planning system. Journal of Artificial Intelligence Research 26:191–246.

Herzig, A. 2017. Dynamic epistemic logics: promises, problems, shortcomings, and perspectives. Journal of Applied Non-Classical Logics 27(3-4):328–341.

Horswill, I. 2018. Postmortem: MKULTRA, an experimental AI-based game. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 14.

Joule, R.-V.; Beauvois, J.-L.; and Deschamps, J. C. 1987. Petit traité de manipulation à l'usage des honnêtes gens. Presses Universitaires de Grenoble, Grenoble.

Keren, S.; Gal, A.; and Karpas, E. 2014. Goal recognition design. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 24.

Le, T.; Fabiano, F.; Son, T. C.; and Pontelli, E. 2018. EFP and PG-EFP: Epistemic forward search planners in multi-agent domains. In Twenty-Eighth International Conference on Automated Planning and Scheduling.

Marques, T., and Rovatsos, M. 2015. Toward domain-independent dialogue planning. In Fourth International Workshop on Human-Agent Interaction Design and Models (HAIDM 2015).

Marques, T., and Rovatsos, M. 2016. Classical planning with communicative actions. In Proceedings of the Twenty-second European Conference on Artificial Intelligence, 1744–1745.

Muise, C.; Belle, V.; Felli, P.; McIlraith, S.; Miller, T.; Pearce, A. R.; and Sonenberg, L. 2015. Planning over multi-agent epistemic states: A classical planning approach. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

Panisson, A. R.; Sarkadi, S.; McBurney, P.; Parsons, S.; and Bordini, R. H. 2018. On the formal semantics of theory of mind in agent communication. In International Conference on Agreement Technologies, 18–32. Springer.

Ramírez, M., and Geffner, H. 2009. Plan recognition as planning. In Twenty-First International Joint Conference on Artificial Intelligence.

Ramírez, M., and Geffner, H. 2010. Probabilistic plan recognition using off-the-shelf classical planners. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 24.

Rao, A. S.; Georgeff, M. P.; et al. 1995. BDI agents: From theory to practice. In ICMAS, volume 95, 312–319.

Rott, H. 2009. Shifting priorities: Simple representations for twenty-seven iterated theory change operators. In Towards Mathematical Philosophy. Springer. 269–296.

Ryan, J.; Summerville, A.; Mateas, M.; and Wardrip-Fruin, N. 2015. Toward characters who observe, tell, misremember, and lie. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 11.

Shvo, M.; Klassen, T. Q.; Sohrabi, S.; and McIlraith, S. A. 2020. Epistemic plan recognition. In Proceedings of the 19th International Conference on Autonomous Agents and Multi-Agent Systems, 1251–1259.

Sohrabi, S.; Riabov, A. V.; and Udrea, O. 2016. Plan recognition as planning revisited. In IJCAI, 3258–3264.

Steedman, M., and Petrick, R. 2007. Planning dialog actions. In Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, 265–272.

Van Ditmarsch, H.; van Der Hoek, W.; and Kooi, B. 2007. Dynamic Epistemic Logic, volume 337. Springer Science & Business Media.

Velázquez-Quesada, F. R. 2017. On subtler belief revision policies. In International Workshop on Logic, Rationality and Interaction, 314–329. Springer.

Wan, H.; Fang, B.; and Liu, Y. 2021. A general multi-agent epistemic planner based on higher-order belief change. Artificial Intelligence 301:103562.

Ware, S. G., and Siler, C. 2021. The Sabre narrative planner: multi-agent coordination with intentions and beliefs. In Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems, 1698–1700.

Yoon, S. W.; Fern, A.; and Givan, R. 2007. FF-Replan: A baseline for probabilistic planning. In ICAPS, volume 7, 352–359.