<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Strategy Repair in Reachability Games (short paper)⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierre Gaillard</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Patrizi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Perelli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ENS Paris-Saclay, University Paris-Saclay</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sapienza University of Rome</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We introduce Strategy Repair, the problem of finding a minimal amount of modifications to turn a strategy for a reachability game from losing into winning. The problem is relevant for a number of settings in Planning and Synthesis, where solutions essentially correspond to winning strategies in a suitably defined reachability game. We show, via reduction from Vertex Cover, that Strategy Repair is NP-complete and devise two algorithms, one optimal and exponential and one polynomial but sub-optimal.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Reachability Games</kwd>
        <kwd>Synthesis</kwd>
        <kwd>Strategic Reasoning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Reachability Games (RGs) [2] can serve as semantic models for reasoning about dynamic
domains, with the resulting strategy representing the behavior that an agent can execute, in
order to achieve a desired state. Typically, however, at execution time, models deviate from the
actual trajectory that stems from strategy execution, resulting in a situation where the actual
state does not match that of the model. There may also be situations where the goal changes
during strategy execution. In both these examples, the agent is unable to keep executing the
computed strategy (which was originally winning) and take appropriate actions to achieve the
desired goal. Thus, the problem arises of coming up with a new strategy that guarantees goal
achievement.</p>
      <p>The original strategy might have been designed to guarantee not only goal achievement,
but also a number of additional properties, such as cost minimization, reward maximization,
or forbidden-state avoidance, which might yield a significant additional computational efort.
Thus, when the unexpected changes are small and yield only a slightly diferent problem wrt
the original one, i.e., only few target states are added or removed and state mismatches occur
rarely, it is reasonable to seek for a solution obtained as a slight modification of the original one,
under the assumption that the new strategy will retain all (or part of) the properties featured
by the initial strategy, without needing the computational overhead required to achieve such
properties.</p>
      <p>This paper investigates this approach from the general perspective of RGs. We introduce a
problem, called Strategy Repair, which requires, given a losing strategy  0, to find a minimal
amount of modifications which turn  0 into a winning strategy.</p>
      <p>
        We make the following contributions. Firstly, we formally define the problem by introducing
a notion of distance between two strategies, which intuitively corresponds to the number of
states over which the strategies difer. Then, based on this notion, we devise a solution algorithm
and characterize its complexity. Specifically, we prove, by reduction from Vertex Cover, that
the decision version of Strategy Repair is NP-complete. We then investigate more eficient,
but sub-optimal, alternatives, devising a polynomial greedy algorithm. The full version of this
paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] also reports on an experimental analysis, which shows that the polynomial algorithm
yields impressive results in terms of running time, scalability and accuracy (measured as distance
from the optimal solution).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>A 2-player arena, or simply arena is a tuple  = ⟨, 0, 1, E⟩, where  is the set of nodes, or
vertices, with  = 0 ∪ 1 and 0 ∩ 1 = ∅, and E ⊆  ×  is the set of edges of the arena.
We say that 0 is the set of nodes controlled by player 0, (0), whereas 1 is the set of nodes
controlled by player 1 (1).</p>
      <p>Definition 1 (Reachability game). A Reachability game is a pair  = ⟨,  ⟩, where  is an
arena, and  ⊆  is a subset of nodes, sometimes called target.</p>
      <p>A path in the arena is a sequence  = 0 · 1 · 2 . . . ∈   such that (, +1) ∈ E for each
 ∈ N. As usual, by  , we denote the -th node occurring in the sequence  , whereas by  ≤ 
we denote the prefix of  up to node  , also called partial path. We say that a path  is winning
for player 0 if   ∈  for some  ∈ N, otherwise it is winning for player 1. A strategy for player
0 is a function  0 :  * · 0 → E mapping partial paths to edges, such that  0(0 . . . ) is an
edge outgoing from , for each partial path in  * · 0. A strategy  1 for player 1 is defined
accordingly. A path  is compatible with strategy  0 if  0( ≤ ) = ( ,  +1) for each   ∈ 0.
Analogously, it is compatible with strategy  1 if  1( ≤ ) = ( ,  +1) for each   ∈ 1.</p>
      <p>
        We say that a strategy  0 is winning for player 0 from , if every path  starting from  and
compatible with  0 is winning. We say that a node  is winning for player 0 if there exists a
strategy  0 winning from . We denote by Win0() and Win1() the sets of nodes in  that
are winning for player 0 and 1, respectively. Finally, a strategy is said to be simply winning
if it is winning from every vertex in Win0(). It is well known that reachability games are
memoryless determined [3], that is, every node  is either winning for player 0 or winning for
player 1 and that there always exists a memoryless winning strategy, i.e., a winning strategy
that is defined as  0 : 0 → E mapping each node belonging to an agent to an outgoing edge.
Therefore, from now on we restrict our attention to only memoryless strategies. Such restriction
allows us to define a very natural distance between two player 0 strategies  0 and  0′ over the
same game, that is dist( 0,  0′) = |{ ∈ 0 |  0() ̸=  0′()}| Intuitively, we count the number
of nodes on which the two strategies map to a diferent outgoing edge. This can be proved to
be an actual distance [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>We conclude this section by introducing some useful notation. For a given game  and an
edge  = (1, 2) ∈ E, by  we denote the game induced from  by removing every edge
(1′, 2′) incompatible with , that is, such that 1′ = 1 and 2′ ̸= 2. This can be extended to
subsets E′ ⊆ E of edges, where E′ = (E′∖{}) is recursively defined by projecting the edges
 of E′ one by one. Notice that a (memoryless) strategy  0 can be regarded as a subset of edges,
one for each node in 0, therefore  0 denotes the game induced from  by removing every
edge (, ′) incompatible with  0, that is, such that  ∈ 0 and (, ′) ̸=  0(). Note that every
vertex of 0 has only one successor in  0 , which means that player 0 has only strategy  0
available in the game.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The Strategy Repair Problem</title>
      <p>We now introduce the strategy repair problem for reachability games. First, for a given
reachability game  and a player 0 strategy  0, define Win0(,  0) to be the set of nodes from
which  0 is winning. It is not hard to show that Win0(,  0) = Win0( 0 ), that is, the nodes
that are winning for player 0 when it is using strategy  0 can be obtained by considering the
game  0 where the choices incompatible with  0 have already been ruled out. Observe that it
always holds that Win0(,  0) ⊆ Win0(), with Win0(,  0) = Win0() if, and only if,  0 is
winning for player 0. We define the strategy repair problem as follows.</p>
      <p>Definition 2 (Strategy repair problem). For a given reachability game  and a strategy  0,
ifnd a winning strategy  0′ such that dist( 0,  0′) ≤ dist( 0,  0′′) for each winning strategy  0′′.</p>
      <p>The problem introduced requires to minimize the number of modifications that are required to
turn a strategy  0 into a strategy  0′ winning for a given reachability game . The corresponding
decision problem, instead, consists in fixing a given threshold  ∈ N and checking whether
some winning strategy  0′ exists with dist( 0,  0′) ≤ . We now prove that the decision version
of the strategy repair problem for reachability games is NP-complete. To do so, we show a
reduction from the NP-complete problem vertex cover [4]. Given a vertex cover instance, the
idea is to construct a RG with one cycle for each edge in such a way that selecting a vertex  onto
the cover corresponds to one change in the strategy that breaks all the cycles corresponding to
adjacent edges of .</p>
      <p>
        Theorem 1. The strategy repair problem for reachability games is NP-complete [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Algorithmic Solutions</title>
      <p>We now present two algorithms for Strategy Repair, which we called Opt and Greedy,
respectively. The former returns the optimal solution to the problem, but runs in exponential time.
The latter, instead, returns a sub-optimal solution but runs in polynomial time. It is important to
remark that they both produce correct winning strategies for the game. However, the algorithm
Greedy does not provide the best one in terms of distance from the originally specified strategy.</p>
      <p>We now proceed with the description of Algorithm Opt. In order to do so, we first introduce
some useful definition. For a given game  and a set  ⊆  of nodes, the Frontier of ,
denoted Frontier0() = ((0 ∖ ) × ) ∩ E, is the set of edges that are outgoing from
a Player 0 node and incoming to a node in . Intuitively, the edges in Frontier0 can be
used by Player 0 to enter in a single step the region . Consider a game  and a strategy
 0, and let  = Win0(,  0) be the set of nodes that are winning for strategy  0. Observe
that for an edge (, ′) ∈ Frontier0(), it holds that  0() ̸= (, ′), otherwise  would
have been winning for  0 in the first place. Moreover, it is trivial to show that the strategy
 0′ =  0[ ↦→ (, ′)] is such that Win0(,  0) ⊊ Win0(,  0′), with the inclusion being proper
because  ∈ Win0(,  0′) ∖ Win0(,  0). Algorithm 1: Opt.</p>
      <p>We are now ready to present the
algorithm Opt, which is reported in Algo- Input:  a reachability game,  0 a strategy
rithm 1. The algorithm works as follow. for player 0
First, it computes the winning region fol- Output: Winning strategy for 
lowing  0 denoted Win0(,  0), and com- minimizing the distance from  0
pares it with the winning region of the Fix(,  0) :
game Win0(). If the two sets are equal,  ′ ← Win0(,  0)
it means that  0 is already winning, so it if  ′ = Win0() then
returns the optimal solution ( 0, 0), with return ( 0, 0)
the second component denoting the cost else
of fixing. If that is not the case, the al- select (, ′) from Frontier( ′)
gorithm proceeds by first computing the ( 0′,  ′) ← Fix(,  0[ ↦→ (, ′)])
frontier of Win0(,  0), in order to se- ′ ←   0()
lect an edge (, ′) from it, then it com- if  ∈ Win0(′) then
pares two possible solutions. The first is ( 0′′,  ′′) ← Fix(′,  0)
obtained by solving the problem where if  ′′ &lt;  ′ + 1 then
the initial strategy is  0[ ↦→ (, ′)], ob- return ( 0′′,  ′′)
tained from  0 by diverting the choice on  end
with the frontier edge (, ′). The second end
is obtained by solving the problem when return ( 0′,  ′ + 1)
Player 0 is forced to select edge  0() in . end
This is obtained by considering the game
′ =  0(), where all other outgoing edges from  are removed. Both solutions are computed
with their relative costs  ′ and  ′′, which are then compared to select the best between the two.
Note that the latter solution might not exist, as the choice of  0 in  might lead, for instance, out
of the winning region. The algorithm then first checks whether such solution is viable before
making a useless recursive call on (′,  0). Observe that in the first case the total modification
cost  ′ must be increased by 1, as the initial strategy  0[ ↦→ (, ′)] is at distance 1 from  0
itself. We have the following.</p>
      <p>Theorem 2. The algorithm Opt returns the optimal solution to the Strategy Repair problem.</p>
      <p>The algorithm Opt presented in the previous section is of exponential complexity, as it
requires two recursive calls at each iteration to compare the distances between the initial
strategy and two candidate best solutions. Also, notice that the recursive call that makes use of
the selected edge in the frontier always computes a correct solution, although it might not be
the optimal one. Therefore, a suboptimal but polynomially computable solution could be found
by just selecting the one obtained from such call, disregarding the other.</p>
      <p>This is how the algorithm Greedy is
conceived. However, in order to improve Algorithm 2: Greedy.
the quality of the solution, i.e., the ac- Input:  a reachability game,  0 a strategy
curacy w.r.t. the optimum, we employ for player 0
a selection criterion for the edge in the Fix(,  0) :
frontier set. Indeed, consider an instance  ′ ← Win0(,  0)
(,  0) of Strategy Repair, and an edge if  ′ = Win0() then
(, ′) ∈ Frontier0(Win0(,  0)). First, return ( 0, 0)
note that  0() ̸= (, ′), otherwise, the end
node  would be winning for  0 and  ← Frontier0( ′)
(, ′) would not be in the frontier. There- (, ′) ← argmax{|Repair 0 (, ′)|;
fore, consider the set Repair 0 (, ′) = (, ′) ∈  }
Win0(,  0[ ↦→ (, ′)]) ∖ Win0(,  0), ( 0′,  ′) ← Fix(,  0[ ↦→ (, ′)])
that is, the set of nodes that are indirectly return ( 0′,  ′ + 1)
repaired by using the frontier edge (, ′)
in the solution. Therefore, when selecting the frontier edge, one might decide to greedily
maximize the number of nodes that are indirectly repaired by such a selection. This is how the
algorithm Greedy works, as it is presented in Algorithm 2.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Future Work</title>
      <p>This work is an initial investigation into the problem of Strategy Repair and leaves at least
two interesting questions open. Firstly, while the polynomial algorithm exhibits outstanding
experimental performance, no approximation guarantee was obtained. For future work, we aim
at studying such a property. Secondly, it is interesting to go beyond simple reachability and
apply the repair approach to other problems. In particular, one immediate extension would
be to investigate applicability and efectiveness of the approach for strong cyclic [5] solutions.
More in general, the repair approach could be applied to more complex games, such as parity or
Büchi games [2, 3], which would have an immediate impact on more complex forms of planning,
such as Classical or FOND Planning for temporally extended goals.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Patrizi and Perelli were supported by the PNRR MUR project PE0000013-FAIR. Perelli was also
supported by the PRIN 2020 projects PINPOINT and by Sapienza University of Rome under the
“Progetti Grandi di Ateneo” programme, grant RG123188B3F7414A (ASGARD - Autonomous and
Self-Governing Agent-Based Rule Design).
Intelligence and Applications, IOS Press, 2023, pp. 780–787. URL: https://doi.org/10.3233/
FAIA230344. doi:10.3233/FAIA230344.
[2] E. Grädel, W. Thomas, T. Wilke (Eds.), Automata, Logics, and Infinite Games: A Guide to
Current Research [outcome of a Dagstuhl seminar, February 2001], volume 2500 of
Lecture Notes in Computer Science, Springer, 2002. URL: https://doi.org/10.1007/3-540-36387-4.
doi:10.1007/3-540-36387-4.
[3] D. Perrin, J. Pin, Infinite words - automata, semigroups, logic and games, volume 141 of</p>
      <p>Pure and applied mathematics series, Elsevier Morgan Kaufmann, 2004.
[4] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, 3rd Edition,</p>
      <p>MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms.
[5] M. Daniele, P. Traverso, M. Y. Vardi, Strong cyclic planning revisited, in: S. Biundo,
M. Fox (Eds.), Recent Advances in AI Planning, 5th European Conference on Planning,
ECP’99, Durham, UK, September 8-10, 1999, Proceedings, volume 1809 of Lecture Notes
in Computer Science, Springer, 1999, pp. 35–48. URL: https://doi.org/10.1007/10720246_3.
doi:10.1007/10720246\_3.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gaillard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Patrizi</surname>
          </string-name>
          , G. Perelli, Strategy Repair in Reachability Games., in: K.
          <string-name>
            <surname>Gal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Nowé</surname>
            ,
            <given-names>G. J.</given-names>
          </string-name>
          <string-name>
            <surname>Nalepa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Fairstein</surname>
          </string-name>
          , R. Radulescu (Eds.),
          <source>ECAI'23</source>
          , volume
          <volume>372</volume>
          of Frontiers in Artificial
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>