<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning to Coordinate without Communication under Incomplete Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shenghui Chen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shufang Zhu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe De Giacomo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ufuk Topcu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Liverpool</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Texas at Austin</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Achieving seamless coordination in cooperative games is a crucial challenge in artificial intelligence, particularly when players operate under incomplete information. While communication helps, it is not always feasible. In this paper, we explore how effective coordination can be achieved without verbal communication, relying solely on observing each other's actions. Our method enables an agent to develop a strategy by interpreting its partner's action sequences as intent signals, constructing a finite-state transducer built from deterministic finite automata, one for each possible action the agent can take. Experiments show that these strategies significantly outperform uncoordinated ones and closely match the performance of coordinating via direct communication. A full version with appendix is available at https://arxiv.org/abs/2409.12397v3.</p>
      </abstract>
      <kwd-group>
        <kwd>Games under incomplete information</kwd>
        <kwd>implicit communication</kwd>
        <kwd>shared-control games</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In artificial intelligence, autonomous agents often compete or cooperate, reflecting real-world
interactions. Games offer structured settings to study such behaviors. Much of the research has focused on
adversarial games, where agents pursue goals despite adversarial environments [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. Conversely,
cooperative games [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] require agents to collaborate toward a shared goal. In this paper, we are interested
in shared-control games [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a form of cooperative games in which two players, the seeker and the helper,
collectively control a single token to achieve a goal. For instance, in robotic warehouses, a human
operator (seeker) navigates to retrieve items while a support robot (helper) clears obstacles, allowing
the operator to progress to its location (token). Helper agents with such assistive abilities have the
potential to enhance collaboration with humans in various settings, from virtual games [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] to physical
applications like assistive wheelchairs [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Shared-control games are especially challenging when players have incomplete or differing
information. Such asymmetry, arising from partial observations or limited game understanding, can cause misaligned
or suboptimal actions. In robotic warehouses, poor inference can reduce efficiency and pose safety risks.
Direct communication offers a solution by enabling the exchange of relevant information between
players. Recent work leverages large language models to express and interpret intentions via natural
language, improving coordination in human-AI teams [
        <xref ref-type="bibr" rid="ref5 ref9">9, 10, 5</xref>
        ]. However, direct communication is not
always feasible due to constraints like limited bandwidth, latency, noise, or task demands. In such cases,
coordination must rely on inferring intent from observed behavior alone.
      </p>
      <p>
        In this paper, we consider scenarios where direct verbal communication is unavailable. In such
settings, the helper must infer when assistance is needed based solely on the seeker’s trajectory. Our
framework generalizes shared-control games [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by allowing multi-step control for the seeker and
introducing a helper strategy that interprets the observed trajectory for effective coordination. To
obtain a helper strategy, we represent it as a finite-state transducer composed of several deterministic
finite automata (DFAs), each corresponding to a specific helper action. Each DFA is learned using a
variant of Angluin’s L* algorithm [11]. The learning process is based on sequences of observed seeker
moves, with each DFA accepting those sequences that align with the intention to trigger its associated
action and rejecting those that do not. The learned DFAs are then combined into a finite-state transducer
that encodes the helper’s overall strategy.
      </p>
      <p>
        We empirically evaluate our proposed solution in Gnomes at Night™, the same testbed used by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
We compare the helper’s performance in our no-communication coordination approach with two other
cases: a worst-case scenario where the helper does not try to coordinate at all, and a best-case scenario
where the helper coordinates through direct communication. We measure success rates and the number
of steps to complete the game across a given number of trials and different maze configurations. We
test on 9 × 9 and larger 12 × 12 mazes to assess the solution’s ability to generalize across maze sizes.
Results show that no-communication coordination with our solution significantly improves success
rates over no coordination in both maze sizes and performs comparably to direct communication. It
also reduces steps, wall memory, and wall error rate by more than half.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The problem of achieving coordination in multi-agent systems involves enabling autonomous agents to
work together toward shared goals. Prior work spans distributed AI [12], swarm intelligence (stigmergy
[13]), and game theory (correlated equilibrium [14]). However, these approaches typically assume that
all agents know the goal. In contrast, we study coordination where only
one agent knows the goal.</p>
      <p>
        A common approach to address these challenges under incomplete information is through explicit
communication, using discrete signals as in Hanabi [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], or natural language in negotiation and
coordination games like Deal-or-No-Deal [15, 16], Diplomacy [17], and MutualFriends [18]. Recently,
Gnomes at Night™ was used to highlight the challenges of shared control under incomplete
information while leveraging natural language dialogue for communication [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In contrast, our study examines
coordination without direct communication, using a mute version of Gnomes at Night™.
      </p>
      <p>Another approach to understanding coordination is through multi-agent reinforcement learning
(MARL), where agents learn cooperative strategies via trial and error in complex environments,
particularly through self-play and opponent modeling [19]. However, most MARL approaches use neural
networks to represent policies, which often obscures the intent inference process within the learning
model. The automata-learning-based solution technique proposed in this paper provides a more explicit
representation, potentially offering better explainability. Pedestrian trajectory prediction similarly
involves anticipating future actions from past behavior, environmental conditions, and interactions with
others—analogous to the helper inferring the seeker’s intent. Approaches include knowledge-based
models [20] and supervised deep learning methods [21].</p>
      <p>A process mirroring the challenge of the helper attempting to infer the seeker’s intended actions
is plan recognition in planning [22]. Goal recognition involves identifying all potential goals an agent
might pursue based on a sequence of observed actions [23, 24, 25, 26, 27, 28]. In this context, the domain
is entirely visible, allowing for the calculation of possible goals that can be achieved through an optimal
policy aligning with these observations. However, in our setting, the helper lacks information on the
seeker’s domain. Efficient coordination could aid the mutual understanding of each player’s
domain. Exploring how to develop such coordination aligns with the focus of this study.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <p>A deterministic finite automaton (DFA) is a tuple 𝒜 = (2^Prop, Q, q_0, δ, F), where Prop is a finite set of
propositions (so 2^Prop is the alphabet), Q is the finite set of states, q_0 ∈ Q is the initial state,
δ : Q × 2^Prop → Q is the transition function, and F ⊆ Q is the set of accepting states. The language ℒ(𝒜)
denotes the set of traces accepted by 𝒜.</p>
      <p>We use Angluin’s L* algorithm [11] to learn DFAs via two query types: (1) Membership queries, where
the learner asks whether a trace ρ is accepted; and (2) Equivalence queries, where the learner submits a
hypothesized DFA and, if incorrect, receives a counterexample to refine it.</p>
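      <p>To make the automaton model concrete, the following is a minimal Python sketch of a DFA over seeker moves, with the membership test used by the L* learner; the class, method names, and toy language are our illustration, not taken from the paper.</p>
      <preformat>
# Minimal DFA sketch; states, symbols, and the toy language are illustrative.
class DFA:
    def __init__(self, states, alphabet, initial, transitions, accepting):
        self.states = states            # finite set of states Q
        self.alphabet = alphabet        # input symbols (here: seeker moves)
        self.initial = initial          # initial state q_0
        self.transitions = transitions  # dict mapping (state, symbol) to state
        self.accepting = accepting      # accepting states F

    def accepts(self, trace):
        """Membership test: run the trace and check acceptance."""
        state = self.initial
        for symbol in trace:
            state = self.transitions[(state, symbol)]
        return state in self.accepting

# Toy example: accept traces that end with two consecutive 'up' moves.
MOVES = {"up", "down", "left", "right"}
dfa = DFA(
    states={0, 1, 2},
    alphabet=MOVES,
    initial=0,
    transitions={(q, a): (min(q + 1, 2) if a == "up" else 0)
                 for q in {0, 1, 2} for a in MOVES},
    accepting={2},
)
assert dfa.accepts(["left", "up", "up"])
assert not dfa.accepts(["up", "down"])
      </preformat>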
    </sec>
    <sec id="sec-4">
      <title>4. Formal Framework</title>
      <p>
        We extend the shared-control game under incomplete information [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to allow the seeker to retain control
for multiple steps before transferring it to the helper. This modification enables intent to be expressed
over action sequences rather than isolated moves.
      </p>
      <p>A shared-control game with seeker multi-step dynamics is defined as a tuple Γ =
(𝒮, s_init, s_final, A_S, A_H, T_S, T_H), where 𝒮 is the finite state space; s_init and s_final are the initial and goal
states; A_i and T_i : 𝒮 × A_i → 𝒮 are the private action sets and deterministic transition functions for
each agent i ∈ {S, H}. We extend the seeker’s transition to action sequences via</p>
      <p>T*_S(s, [a_1, . . . , a_n]) = T_S(. . . T_S(T_S(s, a_1), a_2), . . . , a_n).</p>
      <p>A common reward function ℛ : 𝒮 × (A_S ∪ A_H) → ℝ captures the cooperative objective of minimizing
steps to the goal. The seeker S takes the initial turn.</p>
      <p>Problem. Given Γ and a reward function ℛ, the seeker follows a policy π_S : 𝒮 × A_H → (A_S)^+
unknown to the helper, but whose resulting actions the helper can observe. The goal is to learn a helper
policy π_H : 𝒮 × (A_S)^+ → A_H that maximizes cumulative reward:</p>
      <p>max_{π_H} ∑_{t=0}^{T} ℛ(s_t, a_t)
s.t. a_0 = [], s_0 = s_init, ∃t ∈ {0, . . . , T} s.t. s_t = s_final,</p>
      <p>a^S_{t+1} = π_S(s_t, a^H_t) on S’s turn, (1a)
a^H_{t+1} = π_H(s_t, a^S_t) on H’s turn, (1b)
s_{t+1} = T*_S(s_t, a^S_{t+1}) on S’s turn, and s_{t+1} = T_H(s_t, a^H_{t+1}) on H’s turn, (1c)</p>
      <p>where t indexes turns, and T denotes the total number of turns allowed.</p>
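      <p>To illustrate these dynamics, the sketch below simulates the turn-taking protocol of (1a)-(1c); the function and argument names are our own placeholders, and the policies and transition functions are assumed to be supplied.</p>
      <preformat>
# Hedged sketch of the turn-taking dynamics; pi_S, pi_H, T_S_star, T_H
# are assumed to be given callables matching the definitions above.
def play_episode(s_init, s_final, pi_S, pi_H, T_S_star, T_H, max_turns=300):
    s, seeker_seq, helper_act = s_init, [], None
    for t in range(max_turns):
        if t % 2 == 0:                        # seeker's turn (1a)
            seeker_seq = pi_S(s, helper_act)  # emits an action sequence
            s = T_S_star(s, seeker_seq)       # extended transition T*_S (1c)
        else:                                 # helper's turn (1b)
            helper_act = pi_H(s, seeker_seq)  # one action from the sequence
            s = T_H(s, helper_act)            # helper transition (1c)
        if s == s_final:
            return True, t + 1                # goal reached within budget
    return False, max_turns
      </preformat>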
    </sec>
    <sec id="sec-5">
      <title>5. Solution Technique</title>
      <p>The key challenge for the helper is to infer the seeker’s required help by observing its action sequences,
as direct communication is disallowed. We propose an automata-learning-based approach in which the
helper constructs intent-response DFAs—one per helper action—to recognize patterns in the seeker’s
behavior that imply expected responses. These DFAs, unknown to the seeker, are combined into a
finite-state transducer that maps seeker action sequences to helper actions.</p>
      <sec id="sec-5-1">
        <title>5.1. Learning Helper’s Intent-Response DFAs</title>
        <p>The seeker pre-determines a policy π_S to express intent through action sequences. The helper must
learn to strategically perform the actions the seeker expects when the seeker cannot proceed. To
develop a corresponding strategy π_H for the helper, we introduce an automata-learning-based technique.
The key insight is that when the seeker does not need assistance, it naturally follows the shortest
path; when the action sequence taken deviates from the shortest path, the extra actions
are interpreted as intent information. We capture such intent information by associating each helper
action with a DFA that accepts such indicative sequences, and use Angluin’s L* algorithm to learn
these intent-response DFAs. The helper plays the role of the learner, querying the seeker (as the teacher)
through membership and equivalence queries, learning one DFA per action in parallel.</p>
        <p>Membership query. The seeker generates an action sequence, knowing which action it expects the
helper to perform. The helper extracts intent segments from the observed sequence, infers an expected
action, and performs it. If the performed action matches the seeker’s intent, the seeker replies “Yes”,
and all extracted segments are positive examples for the corresponding intent-response DFA 𝒜_a (where
a is the helper’s action) and negative for all others. A “No” indicates negative membership for 𝒜_a.</p>
        <preformat>
Algorithm 1: No-Communication Coordination (NCC)
Input: current state s, seeker action sequence a^S, action space A_H,
       transition function T_H, intent-response DFAs D
Output: a set of helper actions A*_H
1: Initialize frequency count f(a) = 0 for all a ∈ A_H
2: {σ_1, σ_2, . . .} ← Capping(a^S)
3: for each segment σ_i and each action a ∈ A_H do
4:    if 𝒜_a accepts σ_i then
5:       f(a) ← f(a) + 1
6:    end if
7: end for
8: Set f(a) = 0 where T_H(s, a) is invalid
9: return the set of actions with maximum frequency A*_H = argmax_{a ∈ A_H} f(a)
        </preformat>
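        <p>For concreteness, a direct Python transcription of Algorithm 1 might look as follows; the Capping routine, the DFA accepts method, and the validity test are assumed from the surrounding text rather than taken from the paper’s code.</p>
        <preformat>
def ncc(state, seeker_seq, helper_actions, is_valid, dfas, capping):
    """Sketch of Algorithm 1. `dfas` maps each helper action to its
    intent-response DFA; `capping` extracts intent segments; `is_valid`
    tests whether T_H(state, action) is defined (all assumed helpers)."""
    freq = {a: 0 for a in helper_actions}          # line 1
    segments = capping(seeker_seq)                 # line 2
    for seg in segments:                           # lines 3-7
        for a in helper_actions:
            if dfas[a].accepts(seg):
                freq[a] += 1
    for a in helper_actions:                       # line 8
        if not is_valid(state, a):
            freq[a] = 0
    best = max(freq.values())                      # line 9
    return {a for a in helper_actions if freq[a] == best}
        </preformat>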
        <p>By counterfactual reasoning, if no coordination were needed, the seeker would naturally follow the
shortest path. Hence, redundancies in the sequence suggest that the seeker’s intent is embedded in
segments outside this shortest path. To identify these “informative" segments, the helper constructs a
subgraph of visited states, computes the shortest path from prior to current location, and removes it
from the action sequence. The remaining segments are hence intent segments.</p>
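        <p>One way to realize this segment extraction is sketched below: a breadth-first shortest path is computed over the visited subgraph and stripped from the observed trajectory. The representation (state trajectories, adjacency maps) and all names are our assumptions.</p>
        <preformat>
from collections import deque

def intent_segments(trajectory, edges):
    """Extract intent segments from a seeker trajectory (list of states).
    `edges` is an adjacency map over the subgraph of visited states.
    States on the shortest start-to-end path carry no intent; detours
    off that path are returned as intent segments (an assumption)."""
    start, goal = trajectory[0], trajectory[-1]
    parent, frontier = {start: None}, deque([start])
    while frontier:                      # BFS over visited states
        u = frontier.popleft()
        if u == goal:
            break
        for v in edges.get(u, ()):
            if v not in parent:
                parent[v] = u
                frontier.append(v)
    path, v = set(), goal                # recover shortest-path states
    while v is not None:
        path.add(v)
        v = parent.get(v)
    segments, current = [], []           # detour states form segments
    for s in trajectory:
        if s in path:
            if current:
                segments.append(current)
                current = []
        else:
            current.append(s)
    if current:
        segments.append(current)
    return segments
        </preformat>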
        <p>Equivalence query. For the equivalence query, it is not feasible for the seeker to compare the learned
DFAs with the oracle DFAs it has in mind, as the seeker’s strategy π_S inherently embeds these oracles.
We instead conduct the equivalence query by issuing a bounded number of membership queries. Once the bound
is reached, we conclude that the learned intent-response DFAs, denoted as D = {𝒜_a}_{a ∈ A_H} where
𝒜_a = (2^{A_S}, Q_a, q_0^a, δ_a, F_a) for each helper action a ∈ A_H, are equivalent to the oracle DFAs.</p>
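        <p>Since no oracle DFA is available for an exact equivalence test, the check can be approximated with a budget of sampled membership queries, as in this hedged sketch (the random-sampling scheme is our own illustration, not the paper’s procedure):</p>
        <preformat>
import random

def approx_equivalent(hypothesis, teacher_label, alphabet,
                      budget=1000, max_len=8):
    """Approximate equivalence query: sample bounded-length traces and
    compare the hypothesis DFA against the teacher's yes/no answers.
    Returns a counterexample trace, or None once the budget is spent."""
    symbols = list(alphabet)
    for _ in range(budget):
        trace = [random.choice(symbols)
                 for _ in range(random.randint(1, max_len))]
        if hypothesis.accepts(trace) != teacher_label(trace):
            return trace        # counterexample refines the hypothesis
    return None                 # bound reached: declare DFAs equivalent
        </preformat>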
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Helper’s Strategy Construction</title>
        <p>The learned intent-response DFAs D allow the helper to recognize the seeker’s intent solely by analyzing
the seeker’s action sequences. When the seeker cannot proceed, it becomes the helper’s turn to
strategically provide assistance. Given a game Γ = (𝒮, s_init, s_final, A_S, A_H, T_S, T_H) and the learned
intent-response DFAs D, we define a strategy generator, i.e., a finite-state transducer 𝒯, from which we
can immediately obtain a helper’s strategy π_H : 𝒮 × (A_S)^+ → A_H that solves the problem in Section 4,
though there is no guarantee of optimality in general.</p>
        <p>During the helper’s turn, the helper uses the current state s and the seeker’s previous action sequence
a^S to infer the expected next action. Informative segments are extracted from a^S and evaluated against
each intent-response DFA in D. Accepted actions are filtered by the helper’s transition function,
and those with the highest frequency are returned as intended actions. Formally, the strategy generator
𝒯 = (𝒮, s_init, A_S, A_H, T_S, T_H, δ, ω) is constructed as follows:
• 𝒮, s_init, A_S, A_H, T_S, T_H are the same as in Γ.
• δ : 𝒮 × (A_S)^+ → 2^𝒮 is the transition function, where a^S_{t+1} = [a^S_1, . . . , a^S_n] is the observed seeker’s
action sequence, such that δ(s, a^S_{t+1}) = {T_H(s, a^H) | a^H ∈ ω(s, a^S_{t+1})}.
• ω : 𝒮 × (A_S)^+ → 2^{A_H} is the output function such that ω(s, a^S_{t+1}) = NCC(s, a^S_{t+1}, A_H, T_H, D);
see Algorithm 1.</p>
        <p>This construction avoids the exponential blowup of DFA composition by evaluating each DFA
independently on extracted segments. Hence, the transducer size is linear in the size of the DFAs, and
the cost of obtaining intended actions is also linear in |A_H|. 𝒯 generates a strategy by allowing the
helper to arbitrarily select an action returned by the output function ω(s, a^S), which provides all equally
likely intended actions. The strategy is non-Markovian, as ω depends on the full seeker sequence rather
than just the last state or action.</p>
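        <p>A minimal wrapper for the strategy generator 𝒯, reusing the ncc sketch above (class and method names are our assumptions), makes the non-Markovian dependence on the full seeker sequence explicit:</p>
        <preformat>
class StrategyGenerator:
    """Hedged sketch of the transducer: stores the game components and
    the learned DFAs D, and maps (state, seeker sequence) to actions."""
    def __init__(self, T_H, is_valid, helper_actions, dfas, capping):
        self.T_H, self.is_valid = T_H, is_valid
        self.helper_actions, self.dfas = helper_actions, dfas
        self.capping = capping

    def output(self, state, seeker_seq):
        # omega(s, a^S): all equally likely intended actions (Algorithm 1)
        return ncc(state, seeker_seq, self.helper_actions,
                   self.is_valid, self.dfas, self.capping)

    def step(self, state, seeker_seq):
        # delta(s, a^S): successor states under the intended actions
        return {self.T_H(state, a) for a in self.output(state, seeker_seq)}
        </preformat>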
        <p>It is worth noting that every helper intent-response DFA 𝒜_a in D is defined only based on the
seeker’s actions. Consequently, as long as the seeker utilizes the same policy π_S to express its intentions,
we can apply these DFAs D across various games that share the same action spaces of both players.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Simulation Experiment</title>
      <p>Gnomes at Night™ Testbed. We illustrate the coordination challenge in shared-control games with
incomplete information using the setup shown in Figure 1. The left board displays the feasible moves
for the seeker, while the right board shows those for the helper. Notably, each player is constrained by
their own set of walls, leading to distinct feasible moves for each. In the example shown in Figure 1,
the token starts in the top-left corner and must reach the bottom-left goal state. However, the seeker
begins inside a T-shaped enclosure that prevents independent progress, making cooperation with the
helper essential. For example, when the token is at L1, the helper must move right to L2 to free the
seeker from the enclosure. Later, at L3, the helper must move down to L4 so the seeker can continue
toward the goal.</p>
      <fig id="fig1">
        <caption>
          <p>Figure 1: Example Gnomes at Night™ boards. The seeker’s board (left) and the helper’s board (right) each have their own walls; the token must travel from s_init to s_final, passing coordination points L1-L4.</p>
        </caption>
      </fig>
      <p>
        Configurations. We evaluate our approach in the Gnomes at Night™ testbed [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where each
configuration consists of a maze layout and a treasure location. To assess generalization, we use 10 unseen
layouts each for 9 × 9 and 12 × 12 mazes, each with 5 distinct treasure positions, yielding 50 configurations
per size. Experiments were run on a MacBook Pro (Apple M1, 8GB RAM, Python 3.9+).
Baselines. We evaluate three coordination types, each with a different level of information exchange:
In the no coordination (NC) setting, the seeker plans its path using a modified A* algorithm (see
Algorithm 2 in the appendix of the full version), while the helper attempts to guess the seeker’s desired
next action, but due to a lack of communication, its actions are essentially random. With direct
communication coordination (DCC), both players have a clear communication channel, allowing
the seeker to directly inform the helper of its desired help, which the helper then executes on its turn.
In our proposed no communication coordination (NCC) setting, the seeker incorporates its required
help into its trajectory using the proposed DFA-based approach. The helper interprets the seeker’s
trajectory and chooses its next action based on the perceived intent. For all conditions, the seeker
implementation remains the same, and only the helper strategy varies.
      </p>
      <p>Each coordination type is evaluated with 100 trials per configuration. A trial is considered
successful if either agent reaches the treasure within 300 steps (for 9 × 9) or 600 steps (for
12 × 12); otherwise, it is marked as a failure.</p>
      <p>Seeker and Helper Implementations. The seeker plans paths with a modified A* algorithm that
minimizes wall violations under its own and inferred partner constraints. Upon violation, it replans
and inserts intent-expressive actions. Deviations by the helper trigger belief updates about unknown
walls. See Appendix A and B of the full version for details.</p>
      <p>To train the helper, we collect 100 trajectories from 9 × 9 mazes and use the L* algorithm to learn
intent-response DFAs for right, up, left, and down. Average learning time is under 0.4s. Jaccard
similarity [29] with oracle DFAs ranges from 0.58 to 0.80.</p>
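      <p>Jaccard similarity between a learned DFA and an oracle DFA can be estimated by comparing the sets of traces each accepts up to a bounded length; the exhaustive enumeration below is our illustration, not the paper’s evaluation code.</p>
      <preformat>
from itertools import product

def jaccard_similarity(dfa_a, dfa_b, alphabet, max_len=6):
    """Jaccard similarity of two DFA languages, restricted to traces of
    length at most max_len (a tractability assumption)."""
    acc_a, acc_b = set(), set()
    for length in range(1, max_len + 1):
        for trace in product(alphabet, repeat=length):
            if dfa_a.accepts(trace):
                acc_a.add(trace)
            if dfa_b.accepts(trace):
                acc_b.add(trace)
    union = acc_a.union(acc_b)
    if not union:
        return 1.0  # both languages empty on this bounded domain
    return len(acc_a.intersection(acc_b)) / len(union)
      </preformat>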
      <sec id="sec-6-1">
        <title>No Coordination (NC)</title>
      </sec>
      <sec id="sec-6-2">
        <title>No-Communication Coordination (NCC) Direct-Communication Coordination (DCC)</title>
        <p>100
90.1694.08</p>
        <p>Metrics. We report three metrics for each coordination type: (1) Success rate, defined as the fraction
of successful trials averaged over 50 configurations; (2) Steps taken, reported as the mean and standard
deviation of steps to termination across trials and configurations; and (3) Seeker memory, evaluated
by comparing the seeker’s memorized wall constraints with the helper’s actual maze layout, reporting
the mean and standard deviation of both the number of memorized walls and their error rate.
Hypotheses. (H1) NCC outperforms NC in success rate, but underperforms DCC. (H2) NCC yields
fewer steps than NC, but more than DCC. (H3) NCC lowers both the number and error rate of memorized
walls compared to NC.</p>
        <sec id="sec-6-2-1">
          <title>6.1. Results</title>
          <p>On H1 (Success Rate). The left plot in Figure 2 shows that NCC significantly outperforms NC,
improving success rates by 61.54% (9 × 9) and 72.84% (12 × 12). NCC approaches oracle-level
performance, with success rates within 4-7% of DCC. A Mann-Whitney U test [30] confirms NCC
significantly outperforms NC (p &lt; 0.001) in both sizes, while differences between NCC and DCC
are not statistically significant (p &gt; 0.1). These results not only support H1 but surpass our initial
expectations.</p>
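          <p>The significance test can be reproduced with SciPy’s Mann-Whitney U implementation; the success-rate arrays below are placeholders standing in for the per-configuration results.</p>
          <preformat>
import numpy as np
from scipy.stats import mannwhitneyu

# Placeholder data: one success rate per maze configuration (50 each);
# the real values come from the trials described above.
rng = np.random.default_rng(0)
ncc_rates = rng.uniform(0.8, 1.0, size=50)
nc_rates = rng.uniform(0.1, 0.5, size=50)

# Two-sided Mann-Whitney U test comparing the two conditions.
stat, p_value = mannwhitneyu(ncc_rates, nc_rates, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3g}")
          </preformat>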
          <p>On H2 (Steps Taken). The right plot in Figure 2 shows that NCC reduces steps compared to NC
in both maze sizes (p &lt; 0.001), but requires more steps than DCC (p &lt; 0.001), as expected, since NCC
needs additional steps to effectively express its intentions through its trajectory. These results support H2.
On H3 (Seeker Memory). Table 1 shows NCC reduces both constraint count and error rate versus NC:
by 49.4% and 56.2% in 9 × 9, and 69.8% and 60.8% in 12 × 12, respectively. These results support H3,
showing NCC minimizes unnecessary exploration and improves intent-identification efficiency.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and Future Work</title>
      <p>We studied how a helper agent can learn to coordinate with a seeker in cooperative games without
communication. Our approach uses automata learning to infer the seeker’s intent by constructing a
DFA for each helper action. Experiments in Gnomes at Night™ show that this method approaches the
performance of an oracle with direct communication.</p>
      <p>Future work includes developing an iterative version that refines the helper’s strategy over time,
extending from standard reachability to temporal objectives, and adapting to settings with greater
non-determinism, such as human or environmental interactions.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported in part by the UKRI Erlangen AI Hub on Mathematical and Computational
Foundations of AI (Grant No. EP/Y028872/1), the National Science Foundation (NSF Grant No. 1836900),
and the Army Research Office (ARO Grant No. W911NF-23-1-0317).</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4o for grammar and spelling checks,
paraphrasing and rewording, and improving writing style. After using this tool, the authors reviewed
and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roveri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Traverso</surname>
          </string-name>
          ,
          <article-title>Strong planning in non-deterministic domains via model checking</article-title>
          ,
          <source>in: International Conference on Artificial Intelligence Planning Systems</source>
          ,
          <year>1998</year>
          , p.
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pistore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roveri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Traverso</surname>
          </string-name>
           ,
           <article-title>Weak, strong, and strong cyclic planning via symbolic model checking</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>147</volume>
          (
          <year>2003</year>
          )
          <fpage>35</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Geffner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bonet</surname>
          </string-name>
           ,
           <article-title>A Concise Introduction to Models and Methods for Automated Planning</article-title>
          ,
          <source>Synthesis Lectures on Artificial Intelligence and Machine Learning</source>
          , Springer Cham,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dafoe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bachrach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hadfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Larson</surname>
          </string-name>
           ,
           <string-name>
             <given-names>T.</given-names>
             <surname>Graepel</surname>
           </string-name>
           ,
           <article-title>Cooperative AI: Machines must learn to find common ground</article-title>
          ,
          <source>Nature</source>
          <volume>593</volume>
          (
          <year>2021</year>
          )
          <fpage>33</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fried</surname>
          </string-name>
           ,
           <string-name>
             <given-names>U.</given-names>
             <surname>Topcu</surname>
           </string-name>
           ,
          <article-title>Human-agent cooperation in games under incomplete information through natural language communication</article-title>
          ,
          <source>in: International Joint Conference on Artificial Intelligence</source>
           , Human-Centred AI track,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Carroll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Griffiths</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seshia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dragan</surname>
          </string-name>
          ,
          <article-title>On the Utility of Learning about Humans for Human-AI Coordination</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Foerster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chandar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanctot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Parisotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dumoulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Moitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hughes</surname>
          </string-name>
           ,
           <string-name>
             <given-names>I.</given-names>
             <surname>Dunning</surname>
           </string-name>
           ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mourad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Bellemare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bowling</surname>
          </string-name>
          ,
           <article-title>The Hanabi challenge: A new frontier for AI research</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>280</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Goil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Derry</surname>
          </string-name>
          ,
           <string-name>
             <given-names>B. D.</given-names>
             <surname>Argall</surname>
           </string-name>
           ,
          <article-title>Using machine learning to blend human and robot controls for assisted wheelchair navigation</article-title>
          ,
          <source>in: IEEE International Conference on Rehabilitation Robotics</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
           <article-title>Efficient human-AI coordination via preparatory language-based convention</article-title>
          ,
          <year>2023</year>
           . arXiv:2311.00416.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] J. Liu, C. Yu, J. Gao, Y. Xie, Q. Liao, Y. Wu, Y. Wang, LLM-powered hierarchical language agent for real-time human-AI coordination, in: International Conference on Autonomous Agents and Multiagent Systems, 2024, pp. 1219–1228.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] D. Angluin, Learning regular sets from queries and counterexamples, Information and Computation 75 (1987) 87–106.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] M. R. Genesereth, M. L. Ginsberg, J. S. Rosenschein, Cooperation without communication, in: Readings in Distributed Artificial Intelligence, Elsevier, 1988, pp. 220–226.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] L. Marsh, C. Onof, Stigmergic epistemology, stigmergic cognition, Cognitive Systems Research 9 (2008) 136–149.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] R. J. Aumann, Subjectivity and correlation in randomized strategies, Journal of Mathematical Economics 1 (1974) 67–96.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] M. Lewis, D. Yarats, Y. Dauphin, D. Parikh, D. Batra, Deal or no deal? End-to-end learning of negotiation dialogues, in: Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2443–2453.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] H. He, D. Chen, A. Balakrishnan, P. Liang, Decoupling strategy and generation in negotiation dialogues, in: Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2333–2343.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] P. Paquette, Y. Lu, S. S. Bocco, M. Smith, S. O-G, J. K. Kummerfeld, J. Pineau, S. Singh, A. C. Courville, No-press Diplomacy: Modeling multi-agent gameplay, in: Advances in Neural Information Processing Systems, 2019.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] H. He, A. Balakrishnan, M. Eric, P. Liang, Learning symmetric collaborative dialogue agents with dynamic knowledge graph embeddings, in: Annual Meeting of the Association for Computational Linguistics, 2017, pp. 1766–1776.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] J. Foerster, R. Y. Chen, M. Al-Shedivat, S. Whiteson, P. Abbeel, I. Mordatch, Learning with opponent-learning awareness, in: International Conference on Autonomous Agents and MultiAgent Systems, 2018, pp. 122–130.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] D. Helbing, P. Molnár, Social force model for pedestrian dynamics, Physical Review E 51 (1995) 4282–4286.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, S. Savarese, Social LSTM: Human trajectory prediction in crowded spaces, in: IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 961–971.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] H. A. Kautz, J. F. Allen, Generalized plan recognition, in: AAAI National Conference on Artificial Intelligence, 1986, pp. 32–37.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] M. B. Vilain, Getting serious about parsing plans: A grammatical analysis of plan recognition, in: AAAI National Conference on Artificial Intelligence, 1990, pp. 190–197.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] E. Charniak, R. P. Goldman, A Bayesian model of plan recognition, Artificial Intelligence 64 (1993) 53–79.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] N. Lesh, O. Etzioni, A sound and fast goal recognizer, in: International Joint Conference on Artificial Intelligence, 1995, pp. 1704–1710.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] R. P. Goldman, C. W. Geib, C. A. Miller, A new model of plan recognition, in: Conference on Uncertainty in Artificial Intelligence, 1999, pp. 245–254.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] D. Avrahami-Zilberbrand, G. A. Kaminka, Fast and complete symbolic plan recognition, in: International Joint Conference on Artificial Intelligence, 2005, pp. 653–658.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] M. Ramírez, H. Geffner, Plan recognition as planning, in: International Joint Conference on Artificial Intelligence, 2009, pp. 1778–1783.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] P. Jaccard, Étude comparative de la distribution florale dans une portion des Alpes et des Jura, Bull Soc Vaudoise Sci Nat 37 (1901) 547–579.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] H. B. Mann, D. R. Whitney, On a test of whether one of two random variables is stochastically larger than the other, The Annals of Mathematical Statistics (1947) 50–60.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>