Additional Winning Strategies in Two-Player Games? Vadim Malvone and Aniello Murano Università degli Studi di Napoli Federico II Abstract. We study the problem of checking whether a two-player reach- ability game admits more than a winning strategy. We investigate this in case of perfect and imperfect information, and, by means of an automata approach we provide a linear-time procedure and an exponential-time procedure, respectively. In both cases, the results are tight. 1 Introduction Game theory is a powerful mathematical framework largely exploited in computer science and AI [1,9,16]. In the basic setting, we refer to two-player turn-based games where the players, conventionally named P layer0 and P layer1 , play for a finite number of rounds. The states of the arena are partitioned among the players and each player can move only from the states he owns. Solving such a game amounts to check whether P layer0 has a winning strategy, i.e., a sequence of moves that allows him to achieve his goal, no matter how his opponent plays. In several game settings having a more precise (quantitative) information about how many winning strategies a player has at his disposal turns out to be crucial. For example, in Nash Equilibrium, such an information amounts to solve the challenging question of checking whether the equilibrium is unique [2]. In this paper, we address the quantitative question of checking whether P layer0 has more than a strategy to win a finite two-player turn-based game, under the reachability condition. We consider both the cases that the players have perfect or imperfect information about the moves performed by the opponent. For the solution we use an automata-theoretic approach. Precisely, we build an automaton that accepts all tree witnesses for more than one winning strategy for P layer0 . Hence, we reduce the addressed quantitative question to the emptiness of this automaton. In the perfect information setting, this results in a linear-time upper bound complexity. In the imperfect information setting, instead, it results in an exponential-time solution. In both cases, the results are tight. ? This work is a communication based on the works [12] and [11]. Copyright © by the paper’s authors. Copying permitted for private and academic pur- poses. V. Biló, A. Caruso (Eds.): ICTCS 2016, Proceedings of the 17th Italian Conference on Theoretical Computer Science, 73100 Lecce, Italy, September 7–9 2016, pp. 251–256 published in CEUR Workshop Proceedins Vol-1720 at http://ceur-ws.org/Vol-1720 252 Vadim Malvone and Aniello Murano Related works. Counting strategies has been deeply exploited in reactive sys- tems formal verification by means of specification logics extended with graded modalities, interpreted over games of infinite duration [2,6,10]. These works suit- ably extend preliminary reasonings in closed system verification [3,4,7]. 2 The Game Model In this section, we present our model and its semantics. Definition 1. A turn-based two-player reachability game with imperfect infor- mation ( 2TRGI), P layer0 vs P layer1 , is a tuple G = < St, sI , Ac, tr, W, ∼ =>, where St = St0 ∪ St1 is a finite non-empty set of states, with Sti set of states of P layeri , sI ∈ St is an initial state, Ac = Ac0 ∪ Ac1 is the set of actions, W is a set of target states, tr : Sti × Aci → St1−i , for i ∈ {0, 1} is a transition function mapping a state of a player and its action to a state belonging to the other player, and ∼==∼ =0 ∪ ∼=1 is a set of equivalence relations on Ac. Let i ∈ {0, 1} and a, a0 ∈ Aci be two actions. If a ∼ =1−i a0 we say that a and 0 a are indistinguishable to P layer1−i . By [Aci ] ⊆ Aci we denote the subset of actions that are distinguishable/visible for P layer1−i . In particular, for each set of equivalent actions over Aci , we put a representative action in [Aci ]. From any s ∈ Sti , if a ∼ =1−i a0 then also the reached states are indistinguishable, ie tr(s, a) = s and tr(s, a0 ) = s00 are indistinguishable for P layer1−i . A relation ∼ 0 =i is an identity if a ∼ =i a0 iff a = a0 . A 2TRGI has perfect information (a 2TRG, for short) if ∼= contains only identities. To give the semantics of 2TRGIs, we introduce the concepts of track, strategy and play. A track is a finite sequence of states ρ ∈ St∗ such that, for all i < |ρ|, if (ρ)i ∈ St0 then there exists an action a0 ∈ Ac0 such that (ρ)i+1 = tr((ρ)i , a0 ), else there exists an action a1 ∈ Ac1 such that (ρ)i+1 = tr((ρ)i , a1 ), where (ρ)i denotes the i-st element of ρ. By last(ρ) we denote the last element of ρ and by ρ≤i the prefix track (ρ)0 . . . (ρ)i . By Trk ⊆ St∗ we denote the set of tracks over St and by Trki the set of tracks ρ in which last(ρ) ∈ Sti . For simplicity, we assume that all tracks in Trk start at the initial state sI ∈ St. A strategy for P layeri is a function σi : Trki → Aci that maps a track to an action. A strategy is uniform if it adheres on the visibility (visible actions) of the players. In the rest of the paper we only refer to uniform strategies. The composition of strategies, one for each player, induces a computation called play. Precisely, assume P layer0 and P layer1 take strategies σ0 and σ1 , respectively. Their composition induces a play ρ such that (ρ)0 = sI and for each i ≥ 0 if (ρ)i ∈ St0 then (ρ)i+1 = tr((ρ)i , σ0 (ρ≤i )), else (ρ)i+1 = tr((ρ)i , σ1 (ρ≤i )). We can now give the definition of reachability winning condition. Definition 2. P layer0 wins a 2TRGI, under the reachability condition, if he has a strategy that for all strategies of P layer1 the resulting induced play will enter a state in W. Such a strategy is said winning for P layer0 . Additional Winning Strategies in Two-Player Games 253 Given a 2TRGI and one of its play ρ, it is possible to check who is the winner in ρ by considering only (ρ)0 , . . . , (ρ)|St|+1 . In fact, if (ρ)i ∈ W for some i ∈ {0, . . . , |St| + 1} then P layer0 wins the play ρ, otherwise there exists a loop in which P layer1 can stay infinitely, and then he wins the game. We formalize the winning condition by means of a tree structure that we call schema strategy tree. To proper introduce it, we provide the concept of decision tree. We start with some basic notion about trees. Let Υ be a set. An Υ -tree is a prefix closed subset T ⊆ Υ ∗ . The elements of T are called nodes and the empty word ε is the root of T . Given a node v = y · x, with y ∈ Υ ∗ and x ∈ Υ , we define prf (v) to be y and last(v) to be x. For an alphabet Σ, a Σ-labeled Υ -tree is a pair < T, V > where T is an Υ −tree and V : T → Σ maps each node of T to a symbol in Σ. A decision tree is the unwinding of the game structure along with all possible combinations of player actions, ie a tree that collects all tracks over the game. A winning strategy can be seen as an opportune mapping, over the decision tree, of a player’s "strategy schema" built over the visibility. In other words, the player first makes a decision over a set S of indistinguishable states and then this choice is used in the decision tree for each state in S. This makes the decision tree uniform. However, observe that we use synchronous memoryfull strategies. This means that in a decision tree, the set S of indistinguishable states resides at the same level. To make this idea more precise, we now formalize the concept of schema strategy tree and uniform strategy tree. Definition 3. Given a 2TRGI and a uniform strategy σ for P layeri , a schema strategy tree for P layeri is a {>, ⊥}-labeled (Aci ∪ [Ac1−i ])-tree < T, V >, with T ⊂ (Aci ∪ [Ac1−i ])∗ and V as follows: (i) V (ε) = >; (ii) for all v ∈ T , if last(v) ∈ [Ac1−i ] then V (v) = >, else let ρ = (ρ)0 . . . (ρ)|v|−1 be a track from sI , with (ρ)k = tr((ρ)k−1 , last(v≤k )) for each 0 ≤ k ≤ |v| − 1, if last(v) = σ(ρ) then V (v) = >, else V (v) = ⊥. In a schema strategy tree the > label indicates that P layeri selects the corresponding set of visible states in the decision tree and ⊥ is used conversely. In particular, the starting node of the game is the root of the schema strategy tree and it is always enabled (condition (i)); all nodes belonging to the adversarial player are always enabled; and one of the successors of P layeri nodes is enabled in accordance with the uniform strategy σ (condition (ii)). Simply, a uniform strategy tree is a projection of the decision tree along the schema strategy tree. 3 Additional Winning Strategies for 2TRGI In this section we provide the main result of this work. Technically, we make use of alternating tree automata [14]. An alternating tree automaton (ATA) is a tuple A =< Σ, D, Q, q0 , δ, F >, where Σ is the alphabet, D a finite set of directions, Q the set of states, q0 ∈ Q the initial state, δ : Q×Σ → B + (D×Q) the transition function, where B + (D×Q) is the set of all positive Boolean combinations of pairs (d, q) with d direction and 254 Vadim Malvone and Aniello Murano q state, and F ⊆ Q the set of the accepting states. An ATA recognizes (finite) trees by means of runs. For a Σ-labeled tree < T, V >, with T = D∗ , a run is a (D∗ × Q)-labeled N-tree < Tr , r > such that the root is labeled with (ε, q0 ) and the labels of each node and its successors satisfy the transition relation. A run is accepting if all its leaves are labeled with accepting states. An input tree is accepted if there exists a corresponding accepting run. By L(A) we denote the set of trees accepted by A. We say that A is not empty if L(A) 6= ∅. For our purpose, we formalize the concept of schema additional strategy tree. Definition 4. Given a 2TRGI and two uniform strategies σ and σ 0 for P layeri , a schema additional strategy tree for P layeri is a {>, ⊥}-labeled (Aci ∪ [Ac1−i ])- tree < T, V >, with T ⊂ (Aci ∪ [Ac1−i ])∗ and V as follows: (i) V (ε) = >; (ii) for all v ∈ T , if last(v) ∈ [Ac1−i ] then V (v) = >, else let ρ = (ρ)0 . . . (ρ)|v|−1 be a track from sI , with (ρ)k = tr((ρ)k−1 , last(v≤k )) for each 0 ≤ k ≤ |v| − 1, if last(v) = σ(ρ) or last(v) = σ 0 (ρ) then V (v) = >, else V (v) = ⊥. Now, we have all ingredients to give the following result. Theorem 1. Given a 2TRGI the problem of deciding whether P layer0 has more than a uniform winning strategy is ExpTime-Complete. Proof. For the lower bound, we recall that 2-player turn-based games with imper- fect information is ExpTimeH [13]. For the upper bound, we use an automata ap- proach. Precisely, we build an ATA A that accepts all schema additional strategy trees for P layer0 . It has as set of states Q = St×St×{>, ⊥}×{0, 1}×{ok, split} and alphabet Σ = {>, ⊥}. We use in Q a duplication of states as we want to remember the state associated to the parent node while traversing the tree. The flag split is used to remember we have to “enter” two winning strategy paths, so moving to ok. The flag f ∈ {1, 0} indicates whether along a path we have entered or not a target state. As initial state we set q0 = (sI , sI , >, 0, split). Given a state q = (s, s0 , t, f, f¯), the transition relation δ(q, t0 ) is defined as: V   0 00 0 a0 ∈Ac0 (d, (s , s , >, f , ok)) if s0 ∈ St0 ∧ t0 = > ∧ t = > ∧ f¯= ok  V W 0 00 0 ¯0 0 0 ¯ Va0 ∈Ac0 f¯0∈{ok,split} (d,(s , s , >, f , f )) if s ∈ St0 ∧ t = > ∧ t = > ∧ f = split    0 00 0 ¯ 0 0  a1 ∈Ac1 (d, (s , s , >, f , f )) if s ∈ St1 ∧ t = > ∧ t = > f alse if t0 = > and t = ⊥     if t0 = ⊥  true if s0 ∈ St0 then s00 = tr(s0 , a0 ) and d is in accordance with |Ac1 |, else s00 = tr(s0 , a1 ) and d is in accordance with |Ac0 |; if q 0 ∈ W then f 0 = 1, otherwise f0 = f. The set of accepting states is F = {(s, s0 , t, f, f¯) : s, s0 ∈ St ∧ t = > ∧ f = ¯ 1∧ f = ok}. Recall that an input tree is accepted if there exists a run whose leaves are all labeled with accepting states. In our setting this means that an input tree simulates a schema additional strategy tree for P layer0 . So, if the automaton is not empty then P layer0 wins the game with more than one strategy, ie, there exists a schema additional strategy tree for him. Additional Winning Strategies in Two-Player Games 255 The desired computational complexity follows by considering that: (i) the size of the automaton is polynomial in the size of the game, (ii) to check its emptiness can be performed in exponential time [5,8]. In case we study a 2TRG the automaton provided in the above proof sends one copy in each direction. So, the automaton is nondeterministic. By recalling that the emptiness problem in this case is solvable in linear-time [15] and the PTime-hardness for alternating reachability games [9] the following result holds. Theorem 2. Given a 2TRG the problem of deciding whether P layer0 has more than a winning strategy is PTime-Complete. 4 Conclusion and Future Work In this paper we have shown an automata-theoretic approach to check whether a player has more than a winning strategy in a two-player reachability game. Our approach works with optimal asymptotic complexity both in the case the players have perfect information about the moves performed by their adversarial or not. We believe that our results can be used as core engine to count strategies in more involved game scenarios and in many solution concepts reasoning. For example, it can be used to solve the Unique Nash Equilibrium problem, in an extensive game form of finite duration. As future work, it would be useful to check additional winning strategies in multi-agent concurrent games. References 1. R. Alur, T. Henzinger, and O. Kupferman. Alternating-Time Temporal Logic. Journal of the ACM, 49(5):672–713, 2002. 2. B. Aminof, V. Malvone, A. Murano, and S. Rubin. Graded strategy logic: Reasoning about uniqueness of nash equilibria. In AAMAS 2016, pages 698–706, 2016. 3. B. Aminof, A. Murano, and S. Rubin. On ctl* with graded path modalities. In LPAR-20, pages 281–296, 2015. 4. P. Bonatti, C. Lutz, A. Murano, and M. Vardi. The Complexity of Enriched muCalculi. Logical Methods in Computer Science, 4(3):1–27, 2008. 5. E. Emerson and C. Jutla. The Complexity of Tree Automata and Logics of Programs (Extended Abstract). In FOCS’88, pages 328–337. IEEE Computer Society, 1988. 6. A. Ferrante, A. Murano, and M. Parente. Enriched Mu-Calculi Module Checking. Logical Methods in Computer Science, 4(3):1–21, 2008. 7. O. Kupferman, U. Sattler, and M. Vardi. The Complexity of the Graded muCalculus. In CADE’02, LNCS 2392, pages 423–437. Springer, 2002. 8. O. Kupferman, M. Vardi, and P. Wolper. An Automata Theoretic Approach to Branching-Time Model Checking. Journal of the ACM, 47(2):312–360, 2000. 9. O. Kupferman, M. Vardi, and P. Wolper. Module Checking. Information and Computation, 164(2):322–344, 2001. 10. V. Malvone, F. Mogavero, A. Murano, and L. Sorrentino. On the counting of strategies. In TIME 2015, pages 170–179, 2015. 256 Vadim Malvone and Aniello Murano 11. V. Malvone, A. Murano, and L.Sorrentino. Additional Winning Strategies in Finite Games. Under submission. 12. V. Malvone, A. Murano, and L. Sorrentino. Games with additional winning strate- gies. In CILC’15, CEUR Workshop Proceedings, vol. 1459, pages 175–180. CEUR- WS.org, 2015. 13. J. H. Reif. The complexity of two-player games of incomplete information. J. Comput. Syst. Sci., 29(2):274–301, 1984. 14. G. Slutzki. Alternating tree automata. Theoretical Computer Science, 41:305–318, 1985. 15. W. Thomas. Infinite trees and automaton definable relations over omega-words. In STACS’90, pages 263–277, 1990. 16. M. Wooldridge. An Introduction to Multi Agent Systems. John Wiley & Sons, 2002.