1. Introduction

Strategy Repair in Reachability Games (short paper)⋆

Pierre Gaillard

Fabio Patrizi

Giuseppe Perelli

1 0 ENS Paris-Saclay, University Paris-Saclay 1 Sapienza University of Rome

We introduce Strategy Repair, the problem of finding a minimal amount of modifications to turn a strategy for a reachability game from losing into winning. The problem is relevant for a number of settings in Planning and Synthesis, where solutions essentially correspond to winning strategies in a suitably defined reachability game. We show, via reduction from Vertex Cover, that Strategy Repair is NP-complete and devise two algorithms, one optimal and exponential and one polynomial but sub-optimal.

eol>Reachability Games Synthesis Strategic Reasoning

1. Introduction

Reachability Games (RGs) [2] can serve as semantic models for reasoning about dynamic domains, with the resulting strategy representing the behavior that an agent can execute, in order to achieve a desired state. Typically, however, at execution time, models deviate from the actual trajectory that stems from strategy execution, resulting in a situation where the actual state does not match that of the model. There may also be situations where the goal changes during strategy execution. In both these examples, the agent is unable to keep executing the computed strategy (which was originally winning) and take appropriate actions to achieve the desired goal. Thus, the problem arises of coming up with a new strategy that guarantees goal achievement.

The original strategy might have been designed to guarantee not only goal achievement, but also a number of additional properties, such as cost minimization, reward maximization, or forbidden-state avoidance, which might yield a significant additional computational efort. Thus, when the unexpected changes are small and yield only a slightly diferent problem wrt the original one, i.e., only few target states are added or removed and state mismatches occur rarely, it is reasonable to seek for a solution obtained as a slight modification of the original one, under the assumption that the new strategy will retain all (or part of) the properties featured by the initial strategy, without needing the computational overhead required to achieve such properties.

This paper investigates this approach from the general perspective of RGs. We introduce a problem, called Strategy Repair, which requires, given a losing strategy 0, to find a minimal amount of modifications which turn 0 into a winning strategy.

We make the following contributions. Firstly, we formally define the problem by introducing a notion of distance between two strategies, which intuitively corresponds to the number of states over which the strategies difer. Then, based on this notion, we devise a solution algorithm and characterize its complexity. Specifically, we prove, by reduction from Vertex Cover, that the decision version of Strategy Repair is NP-complete. We then investigate more eficient, but sub-optimal, alternatives, devising a polynomial greedy algorithm. The full version of this paper [ 1 ] also reports on an experimental analysis, which shows that the polynomial algorithm yields impressive results in terms of running time, scalability and accuracy (measured as distance from the optimal solution).

2. Preliminaries

A 2-player arena, or simply arena is a tuple = ⟨, 0, 1, E⟩, where is the set of nodes, or vertices, with = 0 ∪ 1 and 0 ∩ 1 = ∅, and E ⊆ × is the set of edges of the arena. We say that 0 is the set of nodes controlled by player 0, (0), whereas 1 is the set of nodes controlled by player 1 (1).

Definition 1 (Reachability game). A Reachability game is a pair = ⟨, ⟩, where is an arena, and ⊆ is a subset of nodes, sometimes called target.

A path in the arena is a sequence = 0 · 1 · 2 . . . ∈ such that (, +1) ∈ E for each ∈ N. As usual, by , we denote the -th node occurring in the sequence , whereas by ≤ we denote the prefix of up to node , also called partial path. We say that a path is winning for player 0 if ∈ for some ∈ N, otherwise it is winning for player 1. A strategy for player 0 is a function 0 : * · 0 → E mapping partial paths to edges, such that 0(0 . . . ) is an edge outgoing from , for each partial path in * · 0. A strategy 1 for player 1 is defined accordingly. A path is compatible with strategy 0 if 0( ≤ ) = ( , +1) for each ∈ 0. Analogously, it is compatible with strategy 1 if 1( ≤ ) = ( , +1) for each ∈ 1.

We say that a strategy 0 is winning for player 0 from , if every path starting from and compatible with 0 is winning. We say that a node is winning for player 0 if there exists a strategy 0 winning from . We denote by Win0() and Win1() the sets of nodes in that are winning for player 0 and 1, respectively. Finally, a strategy is said to be simply winning if it is winning from every vertex in Win0(). It is well known that reachability games are memoryless determined [3], that is, every node is either winning for player 0 or winning for player 1 and that there always exists a memoryless winning strategy, i.e., a winning strategy that is defined as 0 : 0 → E mapping each node belonging to an agent to an outgoing edge. Therefore, from now on we restrict our attention to only memoryless strategies. Such restriction allows us to define a very natural distance between two player 0 strategies 0 and 0′ over the same game, that is dist( 0, 0′) = |{ ∈ 0 | 0() ̸= 0′()}| Intuitively, we count the number of nodes on which the two strategies map to a diferent outgoing edge. This can be proved to be an actual distance [ 1 ].

We conclude this section by introducing some useful notation. For a given game and an edge = (1, 2) ∈ E, by we denote the game induced from by removing every edge (1′, 2′) incompatible with , that is, such that 1′ = 1 and 2′ ̸= 2. This can be extended to subsets E′ ⊆ E of edges, where E′ = (E′∖{}) is recursively defined by projecting the edges of E′ one by one. Notice that a (memoryless) strategy 0 can be regarded as a subset of edges, one for each node in 0, therefore 0 denotes the game induced from by removing every edge (, ′) incompatible with 0, that is, such that ∈ 0 and (, ′) ̸= 0(). Note that every vertex of 0 has only one successor in 0 , which means that player 0 has only strategy 0 available in the game.

3. The Strategy Repair Problem

We now introduce the strategy repair problem for reachability games. First, for a given reachability game and a player 0 strategy 0, define Win0(, 0) to be the set of nodes from which 0 is winning. It is not hard to show that Win0(, 0) = Win0( 0 ), that is, the nodes that are winning for player 0 when it is using strategy 0 can be obtained by considering the game 0 where the choices incompatible with 0 have already been ruled out. Observe that it always holds that Win0(, 0) ⊆ Win0(), with Win0(, 0) = Win0() if, and only if, 0 is winning for player 0. We define the strategy repair problem as follows.

Definition 2 (Strategy repair problem). For a given reachability game and a strategy 0, ifnd a winning strategy 0′ such that dist( 0, 0′) ≤ dist( 0, 0′′) for each winning strategy 0′′.

The problem introduced requires to minimize the number of modifications that are required to turn a strategy 0 into a strategy 0′ winning for a given reachability game . The corresponding decision problem, instead, consists in fixing a given threshold ∈ N and checking whether some winning strategy 0′ exists with dist( 0, 0′) ≤ . We now prove that the decision version of the strategy repair problem for reachability games is NP-complete. To do so, we show a reduction from the NP-complete problem vertex cover [4]. Given a vertex cover instance, the idea is to construct a RG with one cycle for each edge in such a way that selecting a vertex onto the cover corresponds to one change in the strategy that breaks all the cycles corresponding to adjacent edges of .

Theorem 1. The strategy repair problem for reachability games is NP-complete [ 1 ].

4. Algorithmic Solutions

We now present two algorithms for Strategy Repair, which we called Opt and Greedy, respectively. The former returns the optimal solution to the problem, but runs in exponential time. The latter, instead, returns a sub-optimal solution but runs in polynomial time. It is important to remark that they both produce correct winning strategies for the game. However, the algorithm Greedy does not provide the best one in terms of distance from the originally specified strategy.

We now proceed with the description of Algorithm Opt. In order to do so, we first introduce some useful definition. For a given game and a set ⊆ of nodes, the Frontier of , denoted Frontier0() = ((0 ∖ ) × ) ∩ E, is the set of edges that are outgoing from a Player 0 node and incoming to a node in . Intuitively, the edges in Frontier0 can be used by Player 0 to enter in a single step the region . Consider a game and a strategy 0, and let = Win0(, 0) be the set of nodes that are winning for strategy 0. Observe that for an edge (, ′) ∈ Frontier0(), it holds that 0() ̸= (, ′), otherwise would have been winning for 0 in the first place. Moreover, it is trivial to show that the strategy 0′ = 0[ ↦→ (, ′)] is such that Win0(, 0) ⊊ Win0(, 0′), with the inclusion being proper because ∈ Win0(, 0′) ∖ Win0(, 0). Algorithm 1: Opt.

We are now ready to present the algorithm Opt, which is reported in Algo- Input: a reachability game, 0 a strategy rithm 1. The algorithm works as follow. for player 0 First, it computes the winning region fol- Output: Winning strategy for lowing 0 denoted Win0(, 0), and com- minimizing the distance from 0 pares it with the winning region of the Fix(, 0) : game Win0(). If the two sets are equal, ′ ← Win0(, 0) it means that 0 is already winning, so it if ′ = Win0() then returns the optimal solution ( 0, 0), with return ( 0, 0) the second component denoting the cost else of fixing. If that is not the case, the al- select (, ′) from Frontier( ′) gorithm proceeds by first computing the ( 0′, ′) ← Fix(, 0[ ↦→ (, ′)]) frontier of Win0(, 0), in order to se- ′ ← 0() lect an edge (, ′) from it, then it com- if ∈ Win0(′) then pares two possible solutions. The first is ( 0′′, ′′) ← Fix(′, 0) obtained by solving the problem where if ′′ < ′ + 1 then the initial strategy is 0[ ↦→ (, ′)], ob- return ( 0′′, ′′) tained from 0 by diverting the choice on end with the frontier edge (, ′). The second end is obtained by solving the problem when return ( 0′, ′ + 1) Player 0 is forced to select edge 0() in . end This is obtained by considering the game ′ = 0(), where all other outgoing edges from are removed. Both solutions are computed with their relative costs ′ and ′′, which are then compared to select the best between the two. Note that the latter solution might not exist, as the choice of 0 in might lead, for instance, out of the winning region. The algorithm then first checks whether such solution is viable before making a useless recursive call on (′, 0). Observe that in the first case the total modification cost ′ must be increased by 1, as the initial strategy 0[ ↦→ (, ′)] is at distance 1 from 0 itself. We have the following.

Theorem 2. The algorithm Opt returns the optimal solution to the Strategy Repair problem.

The algorithm Opt presented in the previous section is of exponential complexity, as it requires two recursive calls at each iteration to compare the distances between the initial strategy and two candidate best solutions. Also, notice that the recursive call that makes use of the selected edge in the frontier always computes a correct solution, although it might not be the optimal one. Therefore, a suboptimal but polynomially computable solution could be found by just selecting the one obtained from such call, disregarding the other.

This is how the algorithm Greedy is conceived. However, in order to improve Algorithm 2: Greedy. the quality of the solution, i.e., the ac- Input: a reachability game, 0 a strategy curacy w.r.t. the optimum, we employ for player 0 a selection criterion for the edge in the Fix(, 0) : frontier set. Indeed, consider an instance ′ ← Win0(, 0) (, 0) of Strategy Repair, and an edge if ′ = Win0() then (, ′) ∈ Frontier0(Win0(, 0)). First, return ( 0, 0) note that 0() ̸= (, ′), otherwise, the end node would be winning for 0 and ← Frontier0( ′) (, ′) would not be in the frontier. There- (, ′) ← argmax{|Repair 0 (, ′)|; fore, consider the set Repair 0 (, ′) = (, ′) ∈ } Win0(, 0[ ↦→ (, ′)]) ∖ Win0(, 0), ( 0′, ′) ← Fix(, 0[ ↦→ (, ′)]) that is, the set of nodes that are indirectly return ( 0′, ′ + 1) repaired by using the frontier edge (, ′) in the solution. Therefore, when selecting the frontier edge, one might decide to greedily maximize the number of nodes that are indirectly repaired by such a selection. This is how the algorithm Greedy works, as it is presented in Algorithm 2.

5. Future Work

This work is an initial investigation into the problem of Strategy Repair and leaves at least two interesting questions open. Firstly, while the polynomial algorithm exhibits outstanding experimental performance, no approximation guarantee was obtained. For future work, we aim at studying such a property. Secondly, it is interesting to go beyond simple reachability and apply the repair approach to other problems. In particular, one immediate extension would be to investigate applicability and efectiveness of the approach for strong cyclic [5] solutions. More in general, the repair approach could be applied to more complex games, such as parity or Büchi games [2, 3], which would have an immediate impact on more complex forms of planning, such as Classical or FOND Planning for temporally extended goals.

Acknowledgments

Patrizi and Perelli were supported by the PNRR MUR project PE0000013-FAIR. Perelli was also supported by the PRIN 2020 projects PINPOINT and by Sapienza University of Rome under the “Progetti Grandi di Ateneo” programme, grant RG123188B3F7414A (ASGARD - Autonomous and Self-Governing Agent-Based Rule Design). Intelligence and Applications, IOS Press, 2023, pp. 780–787. URL: https://doi.org/10.3233/ FAIA230344. doi:10.3233/FAIA230344. [2] E. Grädel, W. Thomas, T. Wilke (Eds.), Automata, Logics, and Infinite Games: A Guide to Current Research [outcome of a Dagstuhl seminar, February 2001], volume 2500 of Lecture Notes in Computer Science, Springer, 2002. URL: https://doi.org/10.1007/3-540-36387-4. doi:10.1007/3-540-36387-4. [3] D. Perrin, J. Pin, Infinite words - automata, semigroups, logic and games, volume 141 of

Pure and applied mathematics series, Elsevier Morgan Kaufmann, 2004. [4] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, 3rd Edition,

MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms. [5] M. Daniele, P. Traverso, M. Y. Vardi, Strong cyclic planning revisited, in: S. Biundo, M. Fox (Eds.), Recent Advances in AI Planning, 5th European Conference on Planning, ECP’99, Durham, UK, September 8-10, 1999, Proceedings, volume 1809 of Lecture Notes in Computer Science, Springer, 1999, pp. 35–48. URL: https://doi.org/10.1007/10720246_3. doi:10.1007/10720246\_3.

[1]

Gaillard ,

Patrizi , G. Perelli, Strategy Repair in Reachability Games., in: K. Gal , A.

Nowé , G. J.

Nalepa , R.

Fairstein , R. Radulescu (Eds.), ECAI'23 , volume 372 of Frontiers in Artificial