Additional Winning Strategies in Two-Player
                     Games?

                       Vadim Malvone and Aniello Murano

                     Università degli Studi di Napoli Federico II


       Abstract. We study the problem of checking whether a two-player reach-
       ability game admits more than a winning strategy. We investigate this in
       case of perfect and imperfect information, and, by means of an automata
       approach we provide a linear-time procedure and an exponential-time
       procedure, respectively. In both cases, the results are tight.


1    Introduction

Game theory is a powerful mathematical framework largely exploited in computer
science and AI [1,9,16]. In the basic setting, we refer to two-player turn-based
games where the players, conventionally named P layer0 and P layer1 , play for
a finite number of rounds. The states of the arena are partitioned among the
players and each player can move only from the states he owns. Solving such a
game amounts to check whether P layer0 has a winning strategy, i.e., a sequence
of moves that allows him to achieve his goal, no matter how his opponent plays.
    In several game settings having a more precise (quantitative) information
about how many winning strategies a player has at his disposal turns out to be
crucial. For example, in Nash Equilibrium, such an information amounts to solve
the challenging question of checking whether the equilibrium is unique [2].
    In this paper, we address the quantitative question of checking whether
P layer0 has more than a strategy to win a finite two-player turn-based game,
under the reachability condition. We consider both the cases that the players have
perfect or imperfect information about the moves performed by the opponent.
For the solution we use an automata-theoretic approach. Precisely, we build an
automaton that accepts all tree witnesses for more than one winning strategy for
P layer0 . Hence, we reduce the addressed quantitative question to the emptiness
of this automaton. In the perfect information setting, this results in a linear-time
upper bound complexity. In the imperfect information setting, instead, it results
in an exponential-time solution. In both cases, the results are tight.
?
  This work is a communication based on the works [12] and [11].
Copyright © by the paper’s authors. Copying permitted for private and academic pur-
poses.
V. Biló, A. Caruso (Eds.): ICTCS 2016, Proceedings of the 17th Italian Conference on
Theoretical Computer Science, 73100 Lecce, Italy, September 7–9 2016, pp. 251–256
published in CEUR Workshop Proceedins Vol-1720 at http://ceur-ws.org/Vol-1720
252     Vadim Malvone and Aniello Murano

Related works. Counting strategies has been deeply exploited in reactive sys-
tems formal verification by means of specification logics extended with graded
modalities, interpreted over games of infinite duration [2,6,10]. These works suit-
ably extend preliminary reasonings in closed system verification [3,4,7].


2     The Game Model

In this section, we present our model and its semantics.

Definition 1. A turn-based two-player reachability game with imperfect infor-
mation ( 2TRGI), P layer0 vs P layer1 , is a tuple G = < St, sI , Ac, tr, W, ∼    =>,
where St = St0 ∪ St1 is a finite non-empty set of states, with Sti set of states of
P layeri , sI ∈ St is an initial state, Ac = Ac0 ∪ Ac1 is the set of actions, W is a
set of target states, tr : Sti × Aci → St1−i , for i ∈ {0, 1} is a transition function
mapping a state of a player and its action to a state belonging to the other player,
and ∼==∼  =0 ∪ ∼=1 is a set of equivalence relations on Ac.

Let i ∈ {0, 1} and a, a0 ∈ Aci be two actions. If a ∼      =1−i a0 we say that a and
  0
a are indistinguishable to P layer1−i . By [Aci ] ⊆ Aci we denote the subset of
actions that are distinguishable/visible for P layer1−i . In particular, for each set
of equivalent actions over Aci , we put a representative action in [Aci ]. From
any s ∈ Sti , if a ∼
                   =1−i a0 then also the reached states are indistinguishable, ie
tr(s, a) = s and tr(s, a0 ) = s00 are indistinguishable for P layer1−i . A relation ∼
            0
                                                                                       =i
is an identity if a ∼
                    =i a0 iff a = a0 . A 2TRGI has perfect information (a 2TRG,
for short) if ∼= contains only identities. To give the semantics of 2TRGIs, we
introduce the concepts of track, strategy and play.
    A track is a finite sequence of states ρ ∈ St∗ such that, for all i < |ρ|, if
(ρ)i ∈ St0 then there exists an action a0 ∈ Ac0 such that (ρ)i+1 = tr((ρ)i , a0 ),
else there exists an action a1 ∈ Ac1 such that (ρ)i+1 = tr((ρ)i , a1 ), where (ρ)i
denotes the i-st element of ρ. By last(ρ) we denote the last element of ρ and
by ρ≤i the prefix track (ρ)0 . . . (ρ)i . By Trk ⊆ St∗ we denote the set of tracks
over St and by Trki the set of tracks ρ in which last(ρ) ∈ Sti . For simplicity, we
assume that all tracks in Trk start at the initial state sI ∈ St.
    A strategy for P layeri is a function σi : Trki → Aci that maps a track to an
action. A strategy is uniform if it adheres on the visibility (visible actions) of
the players. In the rest of the paper we only refer to uniform strategies.
    The composition of strategies, one for each player, induces a computation
called play. Precisely, assume P layer0 and P layer1 take strategies σ0 and σ1 ,
respectively. Their composition induces a play ρ such that (ρ)0 = sI and for each
i ≥ 0 if (ρ)i ∈ St0 then (ρ)i+1 = tr((ρ)i , σ0 (ρ≤i )), else (ρ)i+1 = tr((ρ)i , σ1 (ρ≤i )).
    We can now give the definition of reachability winning condition.

Definition 2. P layer0 wins a 2TRGI, under the reachability condition, if he has
a strategy that for all strategies of P layer1 the resulting induced play will enter
a state in W. Such a strategy is said winning for P layer0 .
                         Additional Winning Strategies in Two-Player Games          253

    Given a 2TRGI and one of its play ρ, it is possible to check who is the
winner in ρ by considering only (ρ)0 , . . . , (ρ)|St|+1 . In fact, if (ρ)i ∈ W for some
i ∈ {0, . . . , |St| + 1} then P layer0 wins the play ρ, otherwise there exists a loop
in which P layer1 can stay infinitely, and then he wins the game.
    We formalize the winning condition by means of a tree structure that we call
schema strategy tree. To proper introduce it, we provide the concept of decision
tree. We start with some basic notion about trees.
    Let Υ be a set. An Υ -tree is a prefix closed subset T ⊆ Υ ∗ . The elements of
T are called nodes and the empty word ε is the root of T . Given a node v = y · x,
with y ∈ Υ ∗ and x ∈ Υ , we define prf (v) to be y and last(v) to be x. For an
alphabet Σ, a Σ-labeled Υ -tree is a pair < T, V > where T is an Υ −tree and
V : T → Σ maps each node of T to a symbol in Σ.
    A decision tree is the unwinding of the game structure along with all possible
combinations of player actions, ie a tree that collects all tracks over the game. A
winning strategy can be seen as an opportune mapping, over the decision tree, of
a player’s "strategy schema" built over the visibility. In other words, the player
first makes a decision over a set S of indistinguishable states and then this choice
is used in the decision tree for each state in S. This makes the decision tree
uniform. However, observe that we use synchronous memoryfull strategies. This
means that in a decision tree, the set S of indistinguishable states resides at the
same level. To make this idea more precise, we now formalize the concept of
schema strategy tree and uniform strategy tree.

Definition 3. Given a 2TRGI and a uniform strategy σ for P layeri , a schema
strategy tree for P layeri is a {>, ⊥}-labeled (Aci ∪ [Ac1−i ])-tree < T, V >, with
T ⊂ (Aci ∪ [Ac1−i ])∗ and V as follows: (i) V (ε) = >; (ii) for all v ∈ T , if
last(v) ∈ [Ac1−i ] then V (v) = >, else let ρ = (ρ)0 . . . (ρ)|v|−1 be a track from sI ,
with (ρ)k = tr((ρ)k−1 , last(v≤k )) for each 0 ≤ k ≤ |v| − 1, if last(v) = σ(ρ) then
V (v) = >, else V (v) = ⊥.

    In a schema strategy tree the > label indicates that P layeri selects the
corresponding set of visible states in the decision tree and ⊥ is used conversely.
In particular, the starting node of the game is the root of the schema strategy tree
and it is always enabled (condition (i)); all nodes belonging to the adversarial
player are always enabled; and one of the successors of P layeri nodes is enabled
in accordance with the uniform strategy σ (condition (ii)). Simply, a uniform
strategy tree is a projection of the decision tree along the schema strategy tree.


3    Additional Winning Strategies for 2TRGI
In this section we provide the main result of this work. Technically, we make use
of alternating tree automata [14].
    An alternating tree automaton (ATA) is a tuple A =< Σ, D, Q, q0 , δ, F >,
where Σ is the alphabet, D a finite set of directions, Q the set of states, q0 ∈ Q
the initial state, δ : Q×Σ → B + (D×Q) the transition function, where B + (D×Q)
is the set of all positive Boolean combinations of pairs (d, q) with d direction and
254       Vadim Malvone and Aniello Murano

q state, and F ⊆ Q the set of the accepting states. An ATA recognizes (finite)
trees by means of runs. For a Σ-labeled tree < T, V >, with T = D∗ , a run is a
(D∗ × Q)-labeled N-tree < Tr , r > such that the root is labeled with (ε, q0 ) and
the labels of each node and its successors satisfy the transition relation. A run
is accepting if all its leaves are labeled with accepting states. An input tree is
accepted if there exists a corresponding accepting run. By L(A) we denote the
set of trees accepted by A. We say that A is not empty if L(A) 6= ∅.
    For our purpose, we formalize the concept of schema additional strategy tree.

Definition 4. Given a 2TRGI and two uniform strategies σ and σ 0 for P layeri ,
a schema additional strategy tree for P layeri is a {>, ⊥}-labeled (Aci ∪ [Ac1−i ])-
tree < T, V >, with T ⊂ (Aci ∪ [Ac1−i ])∗ and V as follows: (i) V (ε) = >; (ii)
for all v ∈ T , if last(v) ∈ [Ac1−i ] then V (v) = >, else let ρ = (ρ)0 . . . (ρ)|v|−1
be a track from sI , with (ρ)k = tr((ρ)k−1 , last(v≤k )) for each 0 ≤ k ≤ |v| − 1, if
last(v) = σ(ρ) or last(v) = σ 0 (ρ) then V (v) = >, else V (v) = ⊥.

      Now, we have all ingredients to give the following result.

Theorem 1. Given a 2TRGI the problem of deciding whether P layer0 has more
than a uniform winning strategy is ExpTime-Complete.

Proof. For the lower bound, we recall that 2-player turn-based games with imper-
fect information is ExpTimeH [13]. For the upper bound, we use an automata ap-
proach. Precisely, we build an ATA A that accepts all schema additional strategy
trees for P layer0 . It has as set of states Q = St×St×{>, ⊥}×{0, 1}×{ok, split}
and alphabet Σ = {>, ⊥}. We use in Q a duplication of states as we want to
remember the state associated to the parent node while traversing the tree. The
flag split is used to remember we have to “enter” two winning strategy paths, so
moving to ok. The flag f ∈ {1, 0} indicates whether along a path we have entered
or not a target state. As initial state we set q0 = (sI , sI , >, 0, split). Given a
state q = (s, s0 , t, f, f¯), the transition relation δ(q, t0 ) is defined as:
V


                   0 00        0
     a0 ∈Ac0 (d, (s , s , >, f , ok))            if s0 ∈ St0 ∧ t0 = > ∧ t = > ∧ f¯= ok
  V        W                     0   00  0 ¯0       0           0               ¯
Va0 ∈Ac0 f¯0∈{ok,split} (d,(s , s , >, f , f )) if s ∈ St0 ∧ t = > ∧ t = > ∧ f = split



                   0 00        0 ¯                   0           0
    a1 ∈Ac1 (d, (s , s , >, f , f ))            if s ∈ St1 ∧ t = > ∧ t = >
   f alse                                        if t0 = > and t = ⊥




                                                 if t0 = ⊥

true

    if s0 ∈ St0 then s00 = tr(s0 , a0 ) and d is in accordance with |Ac1 |, else s00 =
tr(s0 , a1 ) and d is in accordance with |Ac0 |; if q 0 ∈ W then f 0 = 1, otherwise
f0 = f.
    The set of accepting states is F = {(s, s0 , t, f, f¯) : s, s0 ∈ St ∧ t = > ∧ f =
    ¯
1∧ f = ok}. Recall that an input tree is accepted if there exists a run whose leaves
are all labeled with accepting states. In our setting this means that an input tree
simulates a schema additional strategy tree for P layer0 . So, if the automaton is
not empty then P layer0 wins the game with more than one strategy, ie, there
exists a schema additional strategy tree for him.
                        Additional Winning Strategies in Two-Player Games       255

    The desired computational complexity follows by considering that: (i) the
size of the automaton is polynomial in the size of the game, (ii) to check its
emptiness can be performed in exponential time [5,8].

   In case we study a 2TRG the automaton provided in the above proof sends
one copy in each direction. So, the automaton is nondeterministic. By recalling
that the emptiness problem in this case is solvable in linear-time [15] and the
PTime-hardness for alternating reachability games [9] the following result holds.

Theorem 2. Given a 2TRG the problem of deciding whether P layer0 has more
than a winning strategy is PTime-Complete.


4   Conclusion and Future Work

In this paper we have shown an automata-theoretic approach to check whether
a player has more than a winning strategy in a two-player reachability game.
Our approach works with optimal asymptotic complexity both in the case the
players have perfect information about the moves performed by their adversarial
or not. We believe that our results can be used as core engine to count strategies
in more involved game scenarios and in many solution concepts reasoning. For
example, it can be used to solve the Unique Nash Equilibrium problem, in an
extensive game form of finite duration. As future work, it would be useful to
check additional winning strategies in multi-agent concurrent games.


References

 1. R. Alur, T. Henzinger, and O. Kupferman. Alternating-Time Temporal Logic.
    Journal of the ACM, 49(5):672–713, 2002.
 2. B. Aminof, V. Malvone, A. Murano, and S. Rubin. Graded strategy logic: Reasoning
    about uniqueness of nash equilibria. In AAMAS 2016, pages 698–706, 2016.
 3. B. Aminof, A. Murano, and S. Rubin. On ctl* with graded path modalities. In
    LPAR-20, pages 281–296, 2015.
 4. P. Bonatti, C. Lutz, A. Murano, and M. Vardi. The Complexity of Enriched
    muCalculi. Logical Methods in Computer Science, 4(3):1–27, 2008.
 5. E. Emerson and C. Jutla. The Complexity of Tree Automata and Logics of Programs
    (Extended Abstract). In FOCS’88, pages 328–337. IEEE Computer Society, 1988.
 6. A. Ferrante, A. Murano, and M. Parente. Enriched Mu-Calculi Module Checking.
    Logical Methods in Computer Science, 4(3):1–21, 2008.
 7. O. Kupferman, U. Sattler, and M. Vardi. The Complexity of the Graded muCalculus.
    In CADE’02, LNCS 2392, pages 423–437. Springer, 2002.
 8. O. Kupferman, M. Vardi, and P. Wolper. An Automata Theoretic Approach to
    Branching-Time Model Checking. Journal of the ACM, 47(2):312–360, 2000.
 9. O. Kupferman, M. Vardi, and P. Wolper. Module Checking. Information and
    Computation, 164(2):322–344, 2001.
10. V. Malvone, F. Mogavero, A. Murano, and L. Sorrentino. On the counting of
    strategies. In TIME 2015, pages 170–179, 2015.
256    Vadim Malvone and Aniello Murano

11. V. Malvone, A. Murano, and L.Sorrentino. Additional Winning Strategies in Finite
    Games. Under submission.
12. V. Malvone, A. Murano, and L. Sorrentino. Games with additional winning strate-
    gies. In CILC’15, CEUR Workshop Proceedings, vol. 1459, pages 175–180. CEUR-
    WS.org, 2015.
13. J. H. Reif. The complexity of two-player games of incomplete information. J.
    Comput. Syst. Sci., 29(2):274–301, 1984.
14. G. Slutzki. Alternating tree automata. Theoretical Computer Science, 41:305–318,
    1985.
15. W. Thomas. Infinite trees and automaton definable relations over omega-words.
    In STACS’90, pages 263–277, 1990.
16. M. Wooldridge. An Introduction to Multi Agent Systems. John Wiley & Sons, 2002.