Towards Strategy Repair for Adjustable Autonomy

Pierre Gaillard¹, Fabio Patrizi² and Giuseppe Perelli²
¹ ENS Paris-Saclay, University Paris-Saclay
² Sapienza University of Rome

Abstract
Strategy Repair is the problem of finding a minimal number of modifications that turn a strategy for a game on graphs from losing into winning. This allows for minimally changing the programmed behavior of an agent, in response to unexpected changes or additional requirements one might need to fulfill at execution time. In this paper, we report on an extension of previous work concerning Strategy Repair for Reachability Games, and discuss the usefulness of Strategy Repair in addressing some issues related to Adjustable Autonomy, i.e., the idea that agents, in order to be autonomous, might also need to stop execution and ask for external (typically human) intervention.

Keywords
Strategic Reasoning, Formal Games

1. Introduction
Strategy Repair is the problem of finding a minimal number of modifications that turn a strategy for a game from losing into winning. This problem has recently been studied [1] in the setting of Reachability Games [2], i.e., finite-state games played by two players, where one, Player 0, aims at reaching a state from a target set, no matter how the other, Player 1, behaves. To fulfill its purpose, Player 0 needs a strategy, i.e., a function telling the player which move to perform next in order to eventually reach a target state. Synthesizing a winning strategy (for Player 0) in reachability games is a very well-studied problem in several areas, including Formal Synthesis [3, 4, 5, 6] and Fully Observable Nondeterministic (FOND) Planning [7], with the decision version known to be P-complete wrt the game size [2, 8, 9, 10].

Synthesized strategies may become invalid at runtime, as a result of, e.g., deviations from the model, goal changes following unexpected situations, or the detection of a critical situation where, e.g., safety or ethical aspects are crucial, possibly calling for human assistance. In these situations, a new strategy needs to be defined that guarantees goal achievement while fulfilling a number of additional (soft) requirements that may have emerged at execution time and were not explicitly modeled; such requirements might require significant effort to satisfy, or might not be automatically addressable at all. This relates to the notion of Adjustable Autonomy (see, e.g., [11]), i.e., the idea that autonomous agents sometimes need to be assisted, typically by humans, in carrying out their tasks. In other words, the agent's autonomy might need to be adjusted at execution time in order to obtain the desired behavior, according not only to the primary goal the agent was designed for, such as reaching a desired position, but also to other requirements involving aspects that are difficult to model or guarantee, such as never harming people or acting fairly. In general, the human might intervene directly, by taking control of the agent at execution time, or indirectly, by revising the strategy under execution (in a way that the agent could not do) and then, once done, letting the agent execute it. However, in certain conditions, modifying a strategy (either at synthesis or at execution time) to a large extent may negatively affect the quality of the delivered solution, for a number of reasons.
For instance, the agent might have been designed or optimized for the original strategy (think, e.g., of when the agent is equipped with specific hardware tailored to the actions prescribed by the original strategy), and deviating from it might require additional overhead at execution time in order to fulfill the objectives; or there might be communication inefficiencies due to low bandwidth or a noisy channel, which prevent efficient communication of the revised strategy from the human to the agent. Obviously, many more situations exist where a strategy change may have a negative impact on the actual solution.

This work makes three contributions. First, it presents a motivating scenario for Strategy Repair in the context of Adjustable Autonomy, for an environment where communication is critical and the amount of transmitted data should be minimized. Second, it reports a sound and complete solution approach for Strategy Repair in Reachability Games, together with a greedy (polynomial) approach which, while sub-optimal, returns solutions that are much more similar to the original strategy than those obtained from scratch. This second contribution appeared in [1]. Finally, we present preliminary results concerning the extension of the problem to Büchi Games, i.e., a more general setting, where the agent's goal consists in guaranteeing that some states from a target set are visited infinitely often.

2. Preliminaries
A 2-player reachability game, or simply game, is a tuple 𝒢 = ⟨𝑉, 𝑉0, 𝑉1, E, 𝒯⟩, where 𝑉 is the set of nodes, or vertices, with 𝑉 = 𝑉0 ∪ 𝑉1 and 𝑉0 ∩ 𝑉1 = ∅, E ⊆ 𝑉 × 𝑉 is the set of edges, and 𝒯 ⊆ 𝑉 is a subset of nodes, sometimes called the target. We say that 𝑉0 is the set of nodes controlled by player 0 (𝑃0), whereas 𝑉1 is the set of nodes controlled by player 1 (𝑃1). A structure 𝒜 = ⟨𝑉, 𝑉0, 𝑉1, E⟩ satisfying these properties is called an arena.

A path in the game is a sequence 𝜋 = 𝑣0 · 𝑣1 · 𝑣2 . . . ∈ 𝑉^𝜔 such that (𝑣𝑖, 𝑣𝑖+1) ∈ E for each 𝑖 ∈ ℕ. As usual, by 𝜋𝑖 we denote the 𝑖-th node occurring in the sequence 𝜋, whereas by 𝜋≤𝑖 we denote the prefix of 𝜋 up to node 𝜋𝑖, also called a partial path. We say that a path 𝜋 is winning for player 0 if 𝜋𝑖 ∈ 𝒯 for some 𝑖 ∈ ℕ, otherwise it is winning for player 1.

A strategy for player 0 is a function 𝜎0 : 𝑉* · 𝑉0 → E mapping partial paths to edges, such that 𝜎0(𝑣0 . . . 𝑣𝑛) is an edge outgoing from 𝑣𝑛, for each partial path in 𝑉* · 𝑉0. A strategy 𝜎1 for player 1 is defined accordingly. A path 𝜋 is compatible with strategy 𝜎0 if 𝜎0(𝜋≤𝑖) = (𝜋𝑖, 𝜋𝑖+1) for each 𝜋𝑖 ∈ 𝑉0. Analogously, it is compatible with strategy 𝜎1 if 𝜎1(𝜋≤𝑖) = (𝜋𝑖, 𝜋𝑖+1) for each 𝜋𝑖 ∈ 𝑉1. We say that a strategy 𝜎0 is winning for player 0 from 𝑣 if every path 𝜋 starting from 𝑣 and compatible with 𝜎0 is winning. We say that a node 𝑣 is winning for player 0 if there exists a strategy 𝜎0 winning from 𝑣. We denote by Win0(𝒢) and Win1(𝒢) the sets of nodes in 𝒢 that are winning for player 0 and 1, respectively. Finally, a strategy is said to be simply winning if it is winning from every vertex in Win0(𝒢).
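To make the definitions above concrete, the following Python fragment sketches one possible encoding of a reachability game together with Player 0's winning region Win0(𝒢), computed with the standard attractor construction. The class and function names are ours and purely illustrative, and we assume, as usual, that every node has at least one outgoing edge.

```python
from dataclasses import dataclass

@dataclass
class Game:
    v0: set        # nodes controlled by Player 0
    v1: set        # nodes controlled by Player 1
    edges: set     # set of pairs (u, v) in E
    target: set    # target set T

    def successors(self, u):
        return {v for (x, v) in self.edges if x == u}

def win0(game: Game) -> set:
    """Win0(G): the Player-0 attractor of the target set."""
    attr = set(game.target)
    changed = True
    while changed:
        changed = False
        for u in (game.v0 | game.v1) - attr:
            succ = game.successors(u)
            if u in game.v0 and succ & attr:
                # Player 0 can move into the attractor in one step
                attr.add(u); changed = True
            elif u in game.v1 and succ <= attr:
                # Player 1 cannot avoid moving into the attractor
                attr.add(u); changed = True
    return attr
```

The quadratic fixpoint loop above favors readability; a worklist-based implementation would yield the linear-time bound underlying the P-completeness result mentioned in the introduction.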
It is well known that reachability games are memoryless determined [12], that is, every node 𝑣 is either winning for player 0 or winning for player 1, and there always exists a memoryless winning strategy. Therefore, from now on we restrict our attention to memoryless strategies, defined as functions 𝜎0 : 𝑉0 → E mapping each player 0 node to one of its outgoing edges. This restriction allows us to define a very natural distance between two player 0 strategies 𝜎0 and 𝜎0′ over the same game, namely dist(𝜎0, 𝜎0′) = |{𝑣 ∈ 𝑉0 | 𝜎0(𝑣) ≠ 𝜎0′(𝑣)}|. Intuitively, the distance between two strategies counts the number of nodes on which they prescribe different outgoing edges [1].

We conclude this section by introducing some useful notation. For a given subset E′ ⊆ E of edges, by 𝒢E′ we denote the game where we remove every edge (𝑣1′, 𝑣2′) incompatible with E′, that is, such that 𝑣1′ = 𝑣1 and 𝑣2′ ≠ 𝑣2 for some (𝑣1, 𝑣2) ∈ E′. Notice that a (memoryless) strategy 𝜎0 can be regarded as a subset of edges, one for each node in 𝑉0; therefore 𝒢𝜎0 denotes the game induced from 𝒢 by removing every edge (𝑣, 𝑣′) incompatible with 𝜎0, that is, such that 𝑣 ∈ 𝑉0 and (𝑣, 𝑣′) ≠ 𝜎0(𝑣). Note that every vertex of 𝑉0 has only one successor in 𝒢𝜎0, which means that player 0 has only strategy 𝜎0 available in that game.

3. The Strategy Repair Problem
We now introduce the strategy repair problem for reachability games. First, for a given reachability game 𝒢 and a player 0 strategy 𝜎0, define Win0(𝒢, 𝜎0) to be the set of nodes from which 𝜎0 is winning. It is not hard to show that Win0(𝒢, 𝜎0) = Win0(𝒢𝜎0), that is, the nodes that are winning for player 0 when it uses strategy 𝜎0 can be obtained by considering the game 𝒢𝜎0, where the choices incompatible with 𝜎0 have already been ruled out. Observe that it always holds that Win0(𝒢, 𝜎0) ⊆ Win0(𝒢), with Win0(𝒢, 𝜎0) = Win0(𝒢) if, and only if, 𝜎0 is winning for player 0. We define the strategy repair problem as follows.

Definition 1 (Strategy repair problem). For a given reachability game 𝒢 and a strategy 𝜎0, find a winning strategy 𝜎0′ such that dist(𝜎0, 𝜎0′) ≤ dist(𝜎0, 𝜎0″) for each winning strategy 𝜎0″.

The problem just introduced asks to minimize the number of modifications required to turn a strategy 𝜎0 into a strategy 𝜎0′ winning for a given reachability game 𝒢. The corresponding decision problem, instead, fixes a given threshold 𝑘 ∈ ℕ and asks whether some winning strategy 𝜎0′ exists with dist(𝜎0, 𝜎0′) ≤ 𝑘. In [1] we prove that the strategy repair problem for reachability games is NP-complete, via a reduction from vertex cover [10].

Theorem 1. The strategy repair problem for reachability games is NP-complete [1].
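Continuing the illustrative Python encoding introduced in Section 2 (the helper names are again ours, and a memoryless strategy is represented as a dictionary mapping each Player 0 node to its chosen successor rather than to an edge), the distance dist and the restricted game 𝒢𝜎0 can be sketched as follows.

```python
def dist(sigma, sigma_prime) -> int:
    """Number of Player-0 nodes on which the two strategies disagree."""
    # both strategies are assumed to be defined on all of V0
    return sum(1 for v in sigma if sigma[v] != sigma_prime[v])

def restrict(game: Game, sigma) -> Game:
    """G_sigma: keep, for every Player-0 node, only the edge chosen by sigma."""
    kept = {(u, v) for (u, v) in game.edges if u not in game.v0 or sigma[u] == v}
    return Game(game.v0, game.v1, kept, game.target)

def win0_under(game: Game, sigma) -> set:
    """Win0(G, sigma) = Win0(G_sigma): nodes from which sigma is winning."""
    return win0(restrict(game, sigma))

def is_winning(game: Game, sigma) -> bool:
    """sigma is winning iff it wins from every node of Win0(G)."""
    return win0_under(game, sigma) == win0(game)
```

A brute-force solution of the decision problem would then enumerate the memoryless strategies 𝜎0′ with dist(𝜎0, 𝜎0′) ≤ 𝑘 and test is_winning on each; this also illustrates the guess-and-check argument behind membership in NP.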
4. Motivating Scenario
Consider a robot operating in a vast and distant environment with the objective of rescuing humans or finding a precious resource. The robot is autonomous but constantly connected to a control center which supervises the operations, collects data, and makes decisions at any time. Unfortunately, due to the nature of the environment the robot operates in, the communication channel is very noisy and requires a huge number of re-transmissions before data is actually received at the other end. As a result, in order to guarantee communication effectiveness and reliability, the amount of data to be transmitted, in both directions, must be minimized. The robot thus needs to be as autonomous as possible.

The precise location of the objective (human or resource) is unknown, but the control center has identified a candidate search area and devised a corresponding search strategy, which was loaded onto the robot for execution while at the control center, before the operation started. To the robot, the environment features many sources of uncertainty. For instance, some actions, such as moving forward, might be imprecise and make the robot deviate from its nominal path, the robot might encounter obstacles along its path, and so on.

Figure 1: Robot Rescue (sketch: the control center communicates the changes turning the initial strategy, leading to Area 3, into a modified strategy leading to the search area containing the objective, i.e., Areas 1 and 2).

The scenario can be modelled as a two-player Reachability Game where the target set corresponds to the search area and the opponent resolves the uncertainty about the robot's moves. That is, the opponent controls the occurrence of the events over which the robot has uncertainty, such as when moving forward yields drifting, or when an obstacle is present along the path.

Assume that the strategy guaranteeing that the search area is reached, loaded onto the robot at the control center, is currently under execution and the robot is in the field. While the search progresses, data is sent to the control center, which collects and analyzes it, possibly revising the current target areas. For instance, after a while, the control center might realize that the initial area should be abandoned in favor of another, or that the robot is close to a human to rescue and thus particular care should be put in interacting with the person. In both cases, the strategy followed by the robot needs to be adjusted. In theory, the control center could re-compute the new strategy and communicate it to the robot. Yet, the communication channel might prevent an entire strategy from being transmitted, due to noise and losses. It thus becomes crucial to reduce the amount of information sent to the robot, which can be done by computing a new strategy that achieves the (new) desired objective while remaining as similar as possible to the current one. In this case, indeed, only the differences wrt the current strategy need to be communicated.

Figure 1 depicts this situation, where the robot initially follows the original strategy, which leads to Area 3. However, when the control center realizes that the objective is located in a different place, it must adapt the strategy to reach the new area, consisting of Areas 1 and 2. To this aim, in order to minimize communication, the control center can repair the strategy and then send only the adjustments to the robot. If time-efficiency is required, instead of computing an optimal solution, the control center might compute a sub-optimal one with a greedy algorithm, as discussed later in this work.

5. Algorithmic Solutions
We now present two algorithms for Strategy Repair, called Opt and Greedy, respectively. The former returns the optimal solution to the problem, but runs in exponential time. The latter, instead, returns a sub-optimal solution but runs in polynomial time. It is important to remark that both produce correct winning strategies for the game; however, Greedy does not, in general, return an optimal one in terms of distance from the original strategy.

We now proceed with the description of Algorithm Opt. In order to do so, we first introduce some useful definitions.
For a given game 𝒢 and a set 𝑋 ⊆ 𝑉 of nodes, the Frontier of 𝑋, denoted Frontier0(𝑋) = ((𝑉0 ∖ 𝑋) × 𝑋) ∩ E, is the set of edges that are outgoing from a Player 0 node outside 𝑋 and incoming to a node in 𝑋. Intuitively, the edges in Frontier0(𝑋) can be used by Player 0 to enter the region 𝑋 in a single step. Consider a game 𝒢 and a strategy 𝜎0, and let 𝑋 = Win0(𝒢, 𝜎0) be the set of nodes that are winning for strategy 𝜎0. Observe that for an edge (𝑣, 𝑣′) ∈ Frontier0(𝑋) it holds that 𝜎0(𝑣) ≠ (𝑣, 𝑣′), otherwise 𝑣 would have been winning for 𝜎0 in the first place. Moreover, it is trivial to show that the strategy 𝜎0′ = 𝜎0[𝑣 ↦→ (𝑣, 𝑣′)] is such that Win0(𝒢, 𝜎0) ⊊ Win0(𝒢, 𝜎0′), with the inclusion being proper because 𝑣 ∈ Win0(𝒢, 𝜎0′) ∖ Win0(𝒢, 𝜎0).

Algorithm 1: Opt.
  Input: 𝒢 a reachability game, 𝜎0 a strategy for player 0
  Output: Winning strategy minimizing the distance from 𝜎0
  Fix(𝒢, 𝜎0):
    𝑇′ ← Win0(𝒢, 𝜎0)
    if 𝑇′ = Win0(𝒢) then
      return (𝜎0, 0)
    else
      select (𝑣, 𝑣′) from Frontier0(𝑇′)
      (𝜎0′, 𝛽′) ← Fix(𝒢, 𝜎0[𝑣 ↦→ (𝑣, 𝑣′)])
      𝒢′ ← 𝒢𝜎0(𝑣)
      if 𝑣 ∈ Win0(𝒢′) then
        (𝜎0″, 𝛽″) ← Fix(𝒢′, 𝜎0)
        if 𝛽″ < 𝛽′ + 1 then
          return (𝜎0″, 𝛽″)
        end
      end
      return (𝜎0′, 𝛽′ + 1)
    end

We are now ready to present the algorithm Opt, reported in Algorithm 1. The algorithm works as follows. First, it computes the winning region 𝑇′ = Win0(𝒢, 𝜎0) obtained by following 𝜎0 and compares it with the winning region Win0(𝒢) of the game. If the two sets are equal, then 𝜎0 is already winning, so the algorithm returns the optimal solution (𝜎0, 0), with the second component denoting the cost of fixing. If that is not the case, the algorithm first computes the frontier of Win0(𝒢, 𝜎0) and selects an edge (𝑣, 𝑣′) from it, and then compares two possible solutions. The first is obtained by solving the problem where the initial strategy is 𝜎0[𝑣 ↦→ (𝑣, 𝑣′)], obtained from 𝜎0 by diverting the choice on 𝑣 to the frontier edge (𝑣, 𝑣′). The second is obtained by solving the problem when Player 0 is forced to select edge 𝜎0(𝑣) in 𝑣; this is obtained by considering the game 𝒢′ = 𝒢𝜎0(𝑣), where all other edges outgoing from 𝑣 are removed. Both solutions are computed together with their respective costs 𝛽′ and 𝛽″, which are then compared to select the better of the two. Note that the latter solution might not exist, as the choice of 𝜎0 in 𝑣 might lead, for instance, out of the winning region; the algorithm therefore first checks whether such a solution is viable before making a useless recursive call on (𝒢′, 𝜎0). Observe that in the first case the total modification cost 𝛽′ must be increased by 1, as the initial strategy 𝜎0[𝑣 ↦→ (𝑣, 𝑣′)] is at distance 1 from 𝜎0 itself. We have the following.

Theorem 2. The algorithm Opt returns the optimal solution to the Strategy Repair problem.
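The recursion of Algorithm 1 can be rendered in the same illustrative Python setting as follows; frontier0 and the (arbitrary) edge-selection policy are our own additions, and the sketch is meant to show the branching scheme rather than the implementation evaluated in [1].

```python
def frontier0(game: Game, region: set) -> set:
    """Player-0 edges entering `region` from outside it."""
    return {(u, v) for (u, v) in game.edges
            if u in game.v0 and u not in region and v in region}

def fix_opt(game: Game, sigma):
    """Return (sigma', cost): a winning strategy and its distance from sigma."""
    region = win0_under(game, sigma)
    if region == win0(game):
        return dict(sigma), 0
    # pick any frontier edge (u, w); one is guaranteed to exist in this branch
    u, w = next(iter(frontier0(game, region)))
    # Option 1: divert sigma at u along the frontier edge and repair the rest
    diverted = dict(sigma); diverted[u] = w
    cand1, cost1 = fix_opt(game, diverted)
    # Option 2: commit to sigma's original choice at u by pruning u's other edges
    committed = Game(game.v0, game.v1,
                     {(x, y) for (x, y) in game.edges if x != u or y == sigma[u]},
                     game.target)
    if u in win0(committed):
        cand2, cost2 = fix_opt(committed, sigma)
        if cost2 < cost1 + 1:
            return cand2, cost2
    return cand1, cost1 + 1
```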
The algorithm Opt is of exponential complexity, as it makes two recursive calls at each step in order to compare the distances between the initial strategy and two candidate best solutions. Also, notice that the recursive call that makes use of the selected frontier edge always computes a correct solution, although possibly not an optimal one. Therefore, a sub-optimal but polynomially computable solution can be found by just keeping the result of that call and disregarding the other. This is how the algorithm Greedy is conceived. However, in order to improve the quality of the solution, i.e., its accuracy with respect to the optimum, we employ a selection criterion for the edge in the frontier set. Indeed, consider an instance (𝒢, 𝜎0) of Strategy Repair and an edge (𝑣, 𝑣′) ∈ Frontier0(Win0(𝒢, 𝜎0)). First, note that 𝜎0(𝑣) ≠ (𝑣, 𝑣′), otherwise the node 𝑣 would be winning for 𝜎0 and (𝑣, 𝑣′) would not be in the frontier. Therefore, consider the set Repair𝜎0(𝑣, 𝑣′) = Win0(𝒢, 𝜎0[𝑣 ↦→ (𝑣, 𝑣′)]) ∖ Win0(𝒢, 𝜎0), that is, the set of nodes that are indirectly repaired by using the frontier edge (𝑣, 𝑣′) in the solution. When selecting the frontier edge, one might decide to greedily maximize the number of nodes that are indirectly repaired by such a selection. This is how the algorithm Greedy works, as presented in Algorithm 2.

Algorithm 2: Greedy.
  Input: 𝒢 a reachability game, 𝜎0 a strategy for player 0
  Fix(𝒢, 𝜎0):
    𝑇′ ← Win0(𝒢, 𝜎0)
    if 𝑇′ = Win0(𝒢) then
      return (𝜎0, 0)
    end
    𝐹 ← Frontier0(𝑇′)
    (𝑣, 𝑣′) ← argmax{|Repair𝜎0(𝑣, 𝑣′)| : (𝑣, 𝑣′) ∈ 𝐹}
    (𝜎0′, 𝛽′) ← Fix(𝒢, 𝜎0[𝑣 ↦→ (𝑣, 𝑣′)])
    return (𝜎0′, 𝛽′ + 1)

6. Strategy Repair for Büchi Games
We now study the strategy repair problem on Büchi games. A Büchi game is defined by 𝒢 = ⟨𝒜, 𝒯⟩, where 𝒜 = ⟨𝑉, 𝑉0, 𝑉1, E⟩ is an arena and 𝒯 ⊆ 𝑉 is a target set. In a Büchi game, a path 𝜋 is winning for 𝑃0 when for all 𝑗 there exists 𝑖 ⩾ 𝑗 such that 𝜋𝑖 ∈ 𝒯, i.e., when the target is reached infinitely many times. The notions of winning strategy and winning region are defined as for reachability games, but with the Büchi winning condition.

From the complexity analysis of reachability games, it immediately follows that strategy repair in Büchi games is NP-hard. In addition, notice that the guess-and-check approach for solving strategy repair still works with Büchi objectives, because Büchi games can also be solved in polynomial time. This provides us with the following result.

Theorem 3. The strategy repair problem for Büchi games is NP-complete.

We now turn to solving the problem in practice. To do so, we introduce the operator Pre0, defined by Pre0(𝑋) = {𝑢 ∈ 𝑉0 | ∃(𝑢, 𝑣) ∈ E, 𝑣 ∈ 𝑋} ∪ {𝑢 ∈ 𝑉1 | ∀(𝑢, 𝑣) ∈ E, 𝑣 ∈ 𝑋} for a given set of vertices 𝑋 ⊆ 𝑉, which corresponds to the set of vertices from which 𝑃0 can ensure reaching 𝑋 at the next step. The algorithm for strategy repair uses the following key lemma, inspired by [13]:

Lemma 1. Let 𝒢 = ⟨𝒜, 𝒯⟩ be a Büchi game and 𝑊 be its winning region. Then 𝑊 is equal to the winning region of the reachability game ⟨𝒜, 𝒯′⟩, where 𝒯′ = 𝒯 ∩ Pre0(𝑊).

Intuitively, to reach 𝒯 infinitely many times from 𝑊, 𝑃0 must ensure reaching the part of the target from which it is possible to remain in 𝑊 at the next step. As a consequence, writing 𝒯′ for 𝒯 ∩ Pre0(𝑊), any strategy ensuring that player 0 reaches 𝒯′ from 𝑊 ∖ 𝒯′ and moves back into 𝑊 from 𝒯′ (which is possible since 𝒯′ ⊆ Pre0(𝑊)) is winning, since any induced play is of the form ((𝑊 ∖ 𝒯′)* · 𝒯′)^𝜔 with 𝒯′ ⊆ 𝒯; computing such strategies is therefore equivalent to solving a reachability game where the target is 𝒯 ∩ Pre0(𝑊).
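In the same illustrative Python setting, the operator Pre0 and the restricted target 𝒯′ of Lemma 1 can be sketched as follows; the Büchi winning region 𝑊 is assumed to be supplied by a standard Büchi solver, which we do not reproduce here.

```python
def pre0(game: Game, region: set) -> set:
    """Vertices from which Player 0 can force entering `region` in one step."""
    res = set()
    for u in game.v0 | game.v1:
        succ = game.successors(u)
        if u in game.v0 and succ & region:        # some edge of Player 0 enters region
            res.add(u)
        elif u in game.v1 and succ <= region:     # every edge of Player 1 enters region
            res.add(u)
    return res

def buchi_repair_target(game: Game, buchi_winning: set) -> set:
    """T' = T ∩ Pre0(W): by Lemma 1, repairing sigma0 against the reachability
    game with target T' (and then redirecting choices inside T' back into W)
    yields a repair for the Büchi game."""
    return game.target & pre0(game, buchi_winning)
```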
This reduction induces Algorithm 3, which uses the strategy repair algorithm for reachability games as a subroutine.

Algorithm 3: Algorithm solving the strategy repair problem for Büchi games.
  Input: 𝒢 a Büchi game, 𝜎0 a strategy for player 0
  Output: Winning strategy for 𝒢 minimizing the distance from 𝜎0
  Fix_Buchi(𝒢, 𝜎0):
    if 𝜎0 is winning in 𝒢 then
      return 𝜎0
    else
      𝑊 ← Win0(𝒢)
      𝒯′ ← 𝒯 ∩ Pre0(𝑊)
      𝜎0′ ← Fix_Reachability(⟨𝒜, 𝒯′⟩, 𝜎0)        // minimal modifications to reach 𝒯′
      for 𝑣 ∈ 𝑉0 ∩ 𝒯′ do
        if 𝜎0(𝑣) ∉ 𝑊 then
          find (𝑣, 𝑣′) ∈ E such that 𝑣′ ∈ 𝑊      // fixing the strategy in 𝒯′
          𝜎0′(𝑣) ← (𝑣, 𝑣′)
        end
      end
      return 𝜎0′
    end

Note that both algorithms Opt and Greedy can be used for finding the minimal modifications to reach 𝒯′, producing an exact solution in exponential time or a sub-optimal solution in polynomial time, respectively. As for reachability games, note that in the second case the strategy obtained is winning even if it does not minimize the distance.

7. Future Work
This work is an initial investigation into the problem of Strategy Repair and leaves at least two interesting questions open. First, while the polynomial algorithm exhibits outstanding experimental performance, no approximation guarantee has been obtained; in future work, we aim to study such a property. Second, it would be interesting to go beyond reachability and Büchi objectives and apply the repair approach to other problems. In particular, one immediate extension would be to investigate the applicability and effectiveness of the approach for strong cyclic [7] solutions.

Acknowledgments
The authors are supported by the PNRR MUR project PE0000013-FAIR, by the Italian Ministry of University and Research (MUR) under PRIN grant B87G22000450001 (PINPOINT), by Sapienza University of Rome under the Progetti Grandi di Ateneo programme, grant RG123188B3F7414A (ASGARD - Autonomous and Self-Governing Agent-Based Rule Design), and by the Sapienza project MARLeN (Multi-layer Abstraction for Reinforcement Learning with Non-Markovian Rewards).

Declaration on Generative AI
The author(s) have not employed any Generative AI tools.

References
[1] P. Gaillard, F. Patrizi, G. Perelli, Strategy Repair in Reachability Games, in: K. Gal, A. Nowé, G. J. Nalepa, R. Fairstein, R. Radulescu (Eds.), ECAI'23, volume 372 of Frontiers in Artificial Intelligence and Applications, IOS Press, 2023, pp. 780–787. URL: https://doi.org/10.3233/FAIA230344. doi:10.3233/FAIA230344.
[2] E. Grädel, W. Thomas, T. Wilke (Eds.), Automata, Logics, and Infinite Games: A Guide to Current Research [outcome of a Dagstuhl seminar, February 2001], volume 2500 of Lecture Notes in Computer Science, Springer, 2002. URL: https://doi.org/10.1007/3-540-36387-4. doi:10.1007/3-540-36387-4.
[3] G. De Giacomo, M. Vardi, Synthesis for LTL and LDL on finite traces, in: Q. Yang, M. J. Wooldridge (Eds.), Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, AAAI Press, 2015, pp. 1558–1564. URL: http://ijcai.org/Abstract/15/223.
[4] A. Camacho, C. Muise, J. Baier, S. McIlraith, LTL realizability via safety and reachability games, in: J. Lang (Ed.), Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, ijcai.org, 2018, pp. 4683–4691. URL: https://doi.org/10.24963/ijcai.2018/651. doi:10.24963/ijcai.2018/651.
[5] A. Camacho, J. Baier, C. Muise, S. A. McIlraith, Finite LTL synthesis as planning, in: M. de Weerdt, S. Koenig, G. Röger, M. T. J.
Spaan (Eds.), Proceedings of the Twenty-Eighth International Conference on Automated Planning and Scheduling, ICAPS 2018, Delft, The Netherlands, June 24-29, 2018, AAAI Press, 2018, pp. 29–38. URL: https://aaai.org/ocs/index.php/ICAPS/ICAPS18/paper/view/17790.
[6] G. De Giacomo, M. Favorito, J. Li, M. Vardi, S. Xiao, S. Zhu, LTLf synthesis as AND-OR graph search: Knowledge compilation at work, in: L. D. Raedt (Ed.), Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, ijcai.org, 2022, pp. 2591–2598. URL: https://doi.org/10.24963/ijcai.2022/359. doi:10.24963/ijcai.2022/359.
[7] M. Daniele, P. Traverso, M. Y. Vardi, Strong cyclic planning revisited, in: S. Biundo, M. Fox (Eds.), Recent Advances in AI Planning, 5th European Conference on Planning, ECP'99, Durham, UK, September 8-10, 1999, Proceedings, volume 1809 of Lecture Notes in Computer Science, Springer, 1999, pp. 35–48. URL: https://doi.org/10.1007/10720246_3. doi:10.1007/10720246_3.
[8] C. H. Papadimitriou, Computational complexity, Academic Internet Publ., 2007.
[9] S. Dasgupta, C. H. Papadimitriou, U. V. Vazirani, Algorithms, McGraw-Hill, 2008.
[10] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, 3rd Edition, MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms.
[11] Proc. of the AAAI Spring Symposium on Agents with Adjustable Autonomy, AAAI, 1999. URL: https://aaai.org/proceeding/spring-1999-06/.
[12] D. Perrin, J. Pin, Infinite words - automata, semigroups, logic and games, volume 141 of Pure and Applied Mathematics Series, Elsevier Morgan Kaufmann, 2004.
[13] F. Horn, N. Fijalkow, Büchi games, in: N. Fijalkow (Ed.), Games on Graphs, Online, 2023.