A Note on the Approximation of Mean-Payoff Games

Raffaella Gentilini
University of Perugia, Italy

Abstract. We consider the problem of designing approximation schemes for the values of mean-payoff games. It was recently shown that (1) mean-payoff games with rational weights scaled on [−1, 1] admit additive fully-polynomial approximation schemes, and (2) mean-payoff games with positive weights admit relative fully-polynomial approximation schemes. We show that the problem of designing additive/relative approximation schemes for general mean-payoff games (i.e. with no constraint on their edge-weights) is P-time equivalent to determining their exact solution.

1 Introduction

Two-player mean-payoff games are played on weighted graphs (in which every edge has a positive or negative rational weight) with two types of vertices: in player-0 vertices, player 0 chooses the successor vertex from the set of outgoing edges; in player-1 vertices, player 1 chooses the successor vertex from the set of outgoing edges. The game results in an infinite path through the graph. The long-run average of the edge-weights along this path, called the value of the play, is won by player 0 and lost by player 1. The decision problem for mean-payoff games asks, given a vertex v and a threshold ν ∈ Q, whether player 0 has a strategy to win a value at least ν when the game starts in v. The value problem consists in computing the maximal (rational) value that player 0 can achieve from each vertex v of the game. The associated (optimal) strategy synthesis problem is to construct a strategy for player 0 that secures the maximal value.

Mean-payoff games were first studied by Ehrenfeucht and Mycielski in [1], where it is shown that memoryless (or positional) strategies suffice to achieve the optimal value. This result entails that the decision problem for these games lies in NP ∩ coNP [2, 18], and it was later shown to belong to UP ∩ coUP [10]. (The complexity class UP is the class of problems recognizable by unambiguous polynomial-time nondeterministic Turing machines [14]. Obviously, P ⊆ UP ∩ coUP ⊆ NP ∩ coNP.) Despite many efforts [19, 18, 13, 5, 6, 20, 9, 12], no polynomial-time algorithm for the mean-payoff game problems is known so far.

Besides such a theoretically engaging complexity status, mean-payoff games have plenty of applications, especially in the synthesis, analysis and verification of reactive (non-terminating) systems. Many natural models of such systems include quantitative information, and the corresponding questions require the solution of quantitative games, like mean-payoff games. Concrete examples of applications include various kinds of scheduling, finite-window online string matching, or more generally, analysis of online problems and algorithms, as well as selection with limited storage [18]. Mean-payoff games can even be used for solving the max-plus algebra problem Ax = Bx, which in turn has further applications [6].

Algorithm | Decision problem | Value problem | Note
[12] | O(|E|·|V|·W) | O(|E|·|V|²·W·(log|V| + log W)) | Deterministic
[18] | Θ(|E|·|V|²·W) | Θ(|E|·|V|³·W) | Deterministic
[20] | O(|E|·|V|·2^|V|) | O(|E|·|V|·2^|V|·log W) | Deterministic
[9] | min(O(|E|·|V|²·W), 2^(O(√(|V|·log|V|)))·log W) | min(O(|E|·|V|³·W·(log|V| + log W)), 2^(O(√(|V|·log|V|)))·log W) | Randomized

Table 1. Complexity of the main algorithms to solve mean-payoff games.
Besides their applicability to the modeling of quantitative problems, mean-payoff games have tight connections with important problems in game theory and logic. For instance, parity games [8] and the model-checking problem for the modal mu-calculus [11] are poly-time reducible to mean-payoff games [7], and it is a long-standing open question whether these problems are in P.

Table 1 summarizes the complexity of the main algorithms for solving mean-payoff games in the literature. In particular, all deterministic algorithms for mean-payoff games are either pseudopolynomial (i.e., polynomial in the number of vertices |V|, the number of edges |E|, and the maximal absolute weight W, rather than in the binary representation of W) or exponential [19, 18, 13, 12, 20, 17]. The works in [9, 3] define a randomized algorithm which is both subexponential and pseudopolynomial. Recently, the authors of [15, 4] showed that the pseudopolynomial procedures in [18, 13, 12] can be used to design (fully) polynomial value approximation schemes for certain classes of mean-payoff games: namely, mean-payoff games with positive (integer) weights or rational weights with absolute value less than or equal to 1. In this paper, we consider the problem of extending such positive approximation results to general mean-payoff games, i.e. mean-payoff games with weights arbitrarily shifted/scaled on the line of rational numbers.

2 Preliminaries and Definitions

Game graphs. A game graph is a tuple Γ = (V, E, w, ⟨V0, V1⟩) where G_Γ = (V, E, w) is a weighted graph and ⟨V0, V1⟩ is a partition of V into the set V0 of player-0 vertices and the set V1 of player-1 vertices. An infinite game on Γ is played for infinitely many rounds by two players moving a pebble along the edges of the weighted graph G_Γ. In the first round, the pebble is on some vertex v ∈ V. In each round, if the pebble is on a vertex v ∈ V_i (i = 0, 1), then player i chooses an edge (v, v′) ∈ E and the next round starts with the pebble on v′.

A play in the game graph Γ is an infinite sequence p = v_0 v_1 … v_n … such that (v_i, v_{i+1}) ∈ E for all i ≥ 0. A strategy for player i (i = 0, 1) is a function σ : V*·V_i → V such that for all finite paths v_0 v_1 … v_n with v_n ∈ V_i, we have (v_n, σ(v_0 v_1 … v_n)) ∈ E. A strategy profile is a pair of strategies ⟨σ0, σ1⟩, where σ0 (resp. σ1) is a strategy for player 0 (resp. player 1). We denote by Σ_i (i = 0, 1) the set of strategies for player i. A strategy σ for player i is memoryless if σ(p) = σ(p′) for all sequences p = v_0 v_1 … v_n and p′ = v′_0 v′_1 … v′_m such that v_n = v′_m. We denote by Σ_i^M the set of memoryless strategies of player i. A play v_0 v_1 … v_n … is consistent with a strategy σ for player i if v_{j+1} = σ(v_0 v_1 … v_j) for all positions j ≥ 0 such that v_j ∈ V_i. Given an initial vertex v ∈ V, the outcome of the strategy profile ⟨σ0, σ1⟩ in v is the (unique) play outcome_Γ(v, σ0, σ1) that starts in v and is consistent with both σ0 and σ1. Given a memoryless strategy π_i for player i in the game Γ, we denote by G_Γ(π_i) = (V, E_{π_i}, w) the weighted graph obtained by removing from G_Γ all edges (v, v′) such that v ∈ V_i and v′ ≠ π_i(v).
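These definitions translate directly into a small data model. The following Python sketch is ours (the names GameGraph, outcome_prefix and the strategy dictionaries are illustrative conventions, not taken from the cited works): a memoryless strategy is a dictionary mapping each vertex of its owner to the chosen successor, and the outcome of a memoryless strategy profile is unrolled round by round.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List, Set, Tuple

Vertex = str  # any hashable label would do

@dataclass
class GameGraph:
    """Game graph Γ = (V, E, w, ⟨V0, V1⟩): successors, weights, and V0."""
    succ: Dict[Vertex, List[Vertex]]               # v -> outgoing successors (E)
    weight: Dict[Tuple[Vertex, Vertex], Fraction]  # (v, v') -> w(v, v')
    player0: Set[Vertex]                           # V0; the rest of V is V1

def outcome_prefix(g: GameGraph, v: Vertex,
                   pi0: Dict[Vertex, Vertex], pi1: Dict[Vertex, Vertex],
                   rounds: int) -> List[Vertex]:
    """First `rounds` vertices of outcome_Γ(v, π0, π1), where the memoryless
    strategies π0, π1 map each owned vertex to the successor they choose
    (consistency with E must hold by construction)."""
    play = [v]
    for _ in range(rounds - 1):
        v = pi0[v] if v in g.player0 else pi1[v]
        play.append(v)
    return play
```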
Mean-Payoff Games. A mean-payoff game (MPG) [1] is an infinite game played on a game graph Γ where player 0 wins a payoff value defined as the long-run average weight of the play, while player 1 loses that value. Formally, the payoff value of a play v_0 v_1 … v_n … in Γ is

  MP(v_0 v_1 … v_n …) = lim inf_{n→∞} (1/n) · Σ_{i=0}^{n−1} w(v_i, v_{i+1}).

The value secured by a strategy σ0 ∈ Σ0 in a vertex v is

  val_{σ0}(v) = inf_{σ1 ∈ Σ1} MP(outcome_Γ(v, σ0, σ1))

and the (optimal) value of a vertex v in a mean-payoff game Γ is

  val_Γ(v) = sup_{σ0 ∈ Σ0} inf_{σ1 ∈ Σ1} MP(outcome_Γ(v, σ0, σ1)).

We say that σ0 is optimal if val_{σ0}(v) = val_Γ(v) for all v ∈ V. Secured value and optimality are defined analogously for strategies of player 1. Ehrenfeucht and Mycielski [1] show that mean-payoff games are memoryless determined, i.e., memoryless strategies are sufficient for optimality and the optimal (maximum) value that player 0 can secure is equal to the optimal (minimum) value that player 1 can achieve.

Theorem 1 ([1]). For every MPG Γ = (V, E, w, ⟨V0, V1⟩) and for all vertices v ∈ V, we have

  val_Γ(v) = sup_{σ0 ∈ Σ0} inf_{σ1 ∈ Σ1} MP(outcome_Γ(v, σ0, σ1)) = inf_{σ1 ∈ Σ1} sup_{σ0 ∈ Σ0} MP(outcome_Γ(v, σ0, σ1)),

and there exist two memoryless strategies π0 ∈ Σ0^M and π1 ∈ Σ1^M such that val_Γ(v) = val_{π0}(v) = val_{π1}(v).

Moreover, uniform optimal strategies exist for both players, i.e. there exists a strategy profile ⟨σ0, σ1⟩ that can be used to secure the optimal value independently of the initial vertex [1]. Such a strategy profile is called an optimal strategy profile. The following lemma characterizes the shape of the values in an MPG Γ = (V, E, w, ⟨V0, V1⟩) with integer weights in {−W, …, W}. Note that solving MPGs with rational weights is P-time reducible to solving MPGs with integer weights [20, 18].

Lemma 1 ([1, 20]). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG with integer weights and let W = max_{(v,v′)∈E} |w(v, v′)|. For each vertex v ∈ V, the optimal value val_Γ(v) is a rational number n/d such that 1 ≤ d ≤ |V| and |n| ≤ d · W.

We consider the following three classical problems [18, 9] for an MPG Γ = (V, E, w, ⟨V0, V1⟩):

1. Decision Problem. Given a threshold ν ∈ Q and a vertex v ∈ V, decide if val_Γ(v) ≥ ν.
2. Value Problem. Compute for each vertex v ∈ V the value val_Γ(v).
3. (Optimal) Strategy Problem. Given an MPG Γ, compute an (optimal) strategy profile for Γ.

Approximate Solutions for MPG. Dealing with approximate MPG solutions, we can take into consideration either absolute or relative error measures, and define the notions of additive and relative MPG approximate value.

Definition 1 (MPG additive ε-value). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG, let v ∈ V and consider ε ≥ 0. The value ṽal ∈ Q is said to be an additive ε-value on v if and only if:

  |ṽal − val_Γ(v)| ≤ ε.

Definition 2 (MPG relative ε-value). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG, let v ∈ V and consider ε ≥ 0. The value ṽal ∈ Q is said to be a relative ε-value on v if and only if:

  |ṽal − val_Γ(v)| / |val_Γ(v)| ≤ ε.

Note that additive MPG ε-values are shift-invariant. More precisely, if ṽal is an additive ε-value on the vertex v in Γ = (V, E, w, ⟨V0, V1⟩), then ṽal + k is an additive ε-value on v in the MPG Γ′ = (V, E, w + k, ⟨V0, V1⟩), where all the weights are shifted by k. On the contrary, additive MPG ε-values are not scale-invariant. In fact, if ṽal is an additive ε-value for v in the MPG Γ = (V, E, w, ⟨V0, V1⟩), then k · ṽal is an additive (ε · k)-value for v in the MPG Γ′ = (V, E, w · k, ⟨V0, V1⟩), where all the weights are multiplied by k. In other words, the additive error on Γ is amplified by a factor k in the scaled version of the game, Γ′. Conversely, relative MPG ε-values are scale-invariant but not shift-invariant.
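Under a memoryless strategy profile the outcome play is a lasso: after at most |V| rounds it enters a cycle, and the lim inf in the definition of MP is exactly the mean weight of that cycle. The following sketch (our own code, reusing the dictionary conventions of the previous one) computes this value with exact rational arithmetic; note how the result is a rational with denominator at most |V|, matching Lemma 1.

```python
from fractions import Fraction
from typing import Dict, Tuple

def mp_value(weight: Dict[Tuple[str, str], int],
             succ: Dict[str, str], v: str) -> Fraction:
    """Mean payoff of the play from v induced by a memoryless strategy profile.
    `succ` maps every vertex to the successor chosen by its owner's strategy
    (π0 and π1 merged into one map).  The induced play is a lasso: it enters
    a cycle within |V| rounds, and the lim inf of the running averages equals
    the mean weight of that cycle, a rational with denominator at most |V|
    (cf. Lemma 1)."""
    position: Dict[str, int] = {}
    play = []
    while v not in position:            # walk until a vertex repeats
        position[v] = len(play)
        play.append(v)
        v = succ[v]
    cycle = play[position[v]:] + [v]    # the repeated cycle, closed at v
    total = sum(weight[(cycle[i], cycle[i + 1])] for i in range(len(cycle) - 1))
    return Fraction(total, len(cycle) - 1)
```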
The notions of (fully) polynomial approximation schemes w.r.t. relative and additive errors are formally defined below. In the following, size(Γ) = |E| + |V| + log(W), where W is the maximum absolute value of a weight in Γ = (V, E, w, ⟨V0, V1⟩).

Definition 3 (MPG Fully Polynomial Time Approximation Scheme (FPTAS)). An additive (resp. relative) fully polynomial approximation scheme for the MPG Γ = (V, E, w, ⟨V0, V1⟩) is an algorithm A such that for all ε > 0, A computes an additive (resp. relative) ε-value in time polynomial w.r.t. the size of Γ and 1/ε.

Definition 4 (MPG Polynomial Time Approximation Scheme (PTAS)). An additive (resp. relative) polynomial approximation scheme for the MPG Γ = (V, E, w, ⟨V0, V1⟩) is an algorithm A such that for all ε > 0, A computes an additive (resp. relative) ε-value in time polynomial w.r.t. the size of Γ.

3 Mean-Payoff Games and Additive Approximation Schemes

Recently, [15] provided an additive fully polynomial approximation scheme for the MPG value problem on graphs with rational weights in the interval [−1, +1]. A natural question is whether we could efficiently approximate the value in MPGs with no restrictions on the weights. The next theorem shows that a generalization of the positive approximation result in [15] to MPGs with arbitrary (rational) weights would indeed provide a polynomial-time exact solution to the MPG value problem.

Theorem 2. The MPG value problem does not admit an additive FPTAS, unless it is in P.

Proof. We first consider the MPG value problem on graphs with integer weights. Assume that the MPG value problem on graphs with integer weights admits an additive FPTAS. Given an MPG Γ = (V, E, w, ⟨V0, V1⟩) and a vertex v ∈ V, let |V| = n and ε = 1/(2n(n−1)). Then, our FPTAS computes an additive ε-value ṽal on v in time polynomial w.r.t. the size of Γ, since 1/ε = 2n(n−1) is polynomial in n. By Lemma 1, val_Γ(v) is a rational number with denominator d such that 1 ≤ d ≤ n. Two distinct rationals with denominators between 1 and n have distance at least 1/(n(n−1)): if their denominators coincide the distance is at least 1/n, and otherwise it is at least the inverse of the product of the two denominators, which is at most n(n−1). Hence, there is a unique rational with denominator d, 1 ≤ d ≤ n, within the interval I = {q ∈ Q | ṽal − ε ≤ q ≤ ṽal + ε}, where ε = 1/(2n(n−1)). Such a unique rational is val_Γ(v), and it can be easily found in time logarithmic w.r.t. n [16]. Thus, we have an algorithm to solve the value problem on Γ in time polynomial w.r.t. the size of Γ. Finally, the MPG value problem on graphs with rational weights can be reduced in polynomial time (w.r.t. the size of Γ) to the MPG value problem on graphs with integer weights by simply resizing the weights in the original graph [20, 12]. ⊓⊔
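The rounding step at the heart of this proof is easy to make concrete. The sketch below is ours: a linear scan over denominators standing in for the O(log n)-comparison search of [16]. Given an additive ε-value with ε = 1/(2n(n−1)), it returns the unique rational with denominator at most n within distance ε of the approximation.

```python
from fractions import Fraction

def exact_from_additive(approx: Fraction, n: int) -> Fraction:
    """Recover val_Γ(v) from an additive ε-value, ε = 1/(2n(n-1)), n = |V| >= 2.
    By Lemma 1 the exact value has denominator between 1 and n, and distinct
    such rationals are at least 1/(n(n-1)) apart, so at most one of them fits
    in [approx - ε, approx + ε].  A linear scan over denominators suffices
    here; [16] shows how to run the same search with O(log n) comparisons."""
    eps = Fraction(1, 2 * n * (n - 1))
    for d in range(1, n + 1):
        candidate = Fraction(round(approx * d), d)  # closest fraction with denominator d
        if abs(candidate - approx) <= eps:
            return candidate
    raise ValueError("no rational with denominator <= n lies within eps of approx")
```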
In view of the proof of the above theorem, we could still hope to obtain some positive approximation results for general (i.e. arbitrarily scaled) MPGs by considering weaker notions of approximation than FPTAS. Unfortunately, the next theorem shows that the following is already sufficient to place the MPG value problem in P: determining, in time polynomial w.r.t. the size of a given MPG Γ, an additive k-approximate value of a vertex v ∈ V, where k is an arbitrary constant.

Theorem 3. For any constant k: if the problem of computing an additive k-approximate MPG value can be solved in polynomial time (w.r.t. the size of the input MPG), then the MPG value problem belongs to P.

Proof. We first consider MPGs with integer weights. Let v be a vertex in the MPG Γ = (V, E, w : E → [−W, W], ⟨V0, V1⟩) and denote |V| = n. If 2k + 1 > (n − 2)!, then n is bounded by a constant depending only on k, and the problem of determining val_Γ(v) can be solved in constant time (for fixed k) by simply enumerating all the memoryless strategies available to the players.

Otherwise, assume 2k + 1 ≤ (n − 2)!. Consider the game Γ′ = (V, E, w′, ⟨V0, V1⟩), where w′(e) = w(e) · n! for all e ∈ E. By hypothesis, there is an algorithm A that computes a k-approximate value ṽal for v on Γ′ in time T polynomial w.r.t. the size of Γ′. Since log(W · n!) = O(n · log(n) + log(W)), T is also polynomial w.r.t. the size of Γ. By construction, val_Γ′(v) is an integer. There are at most 2k + 1 integers in the interval [ṽal − k, ṽal + k], thus we have at most 2k + 1 candidates {⌈ṽal − k⌉/n!, …, ⌊ṽal + k⌋/n!} for val_Γ(v). Moreover, those candidates lie in an interval of length L ≤ (2k + 1)/n! ≤ (n − 2)!/n! = 1/(n·(n−1)). The minimum distance between two rationals with denominator at most n is 1/(n·(n−1)), as observed in the proof of Theorem 2. The exact value val_Γ(v) is thus the unique rational number with denominator at most n that lies in the interval [⌈ṽal − k⌉/n!, ⌊ṽal + k⌋/n!], and it can be easily found in time logarithmic w.r.t. n [16]. Finally, the MPG value problem on graphs with rational weights can be reduced in polynomial time (w.r.t. the size of Γ) to the MPG value problem on graphs with integer weights by simply resizing the weights in the original graph [20, 12]. ⊓⊔
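The scaling argument of this proof can be sketched end to end. In the code below, approx_value is a hypothetical black box returning an additive k-approximate value (no polynomial-time implementation is known; that is the point of the theorem); the rest follows the proof: scale the integer weights by n!, collect the at most 2k + 1 integer candidates, and keep the one whose reduced denominator is at most n.

```python
import math
from fractions import Fraction
from typing import Callable, Dict, Tuple

def exact_via_k_approx(weight: Dict[Tuple[str, str], int], n: int, v: str, k: int,
                       approx_value: Callable[[Dict[Tuple[str, str], int], str], Fraction]
                       ) -> Fraction:
    """Sketch of the reduction in Theorem 3 (integer weights, 2k+1 <= (n-2)!).
    `approx_value(weights, vertex)` is a hypothetical black box returning an
    additive k-approximation of the value.  Scaling by n! makes the value of
    Γ' an integer, leaving at most 2k+1 candidates m/n!, of which only the
    true value can reduce to a denominator <= n (boundary ties are glossed
    over, as in the proof)."""
    f = math.factorial(n)
    scaled = {e: w * f for e, w in weight.items()}           # the game Γ'
    approx = approx_value(scaled, v)                         # ṽal for Γ'
    lo, hi = math.ceil(approx - k), math.floor(approx + k)   # integer candidates
    for m in range(lo, hi + 1):
        candidate = Fraction(m, f)
        if candidate.denominator <= n:                       # Lemma 1
            return candidate
    raise ValueError("no candidate with denominator <= n")
```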
A direct consequence of Theorem 3 is that the MPG value problem does not admit an additive PTAS, unless it is in P. More precisely, Theorem 2 and Theorem 3 entail a result of P-time equivalence between the exact MPG value problem and the three classes of approximations listed in Corollary 1.

Corollary 1. The following problems are P-time equivalent:
1. Solving the MPG value problem.
2. Determining an additive FPTAS for the MPG value problem.
3. Determining an additive PTAS for the MPG value problem.
4. Computing an additive k-approximate MPG value in polynomial time, for any constant k.

4 Mean-Payoff Games and Relative Approximation Schemes

In the recent work [4], the authors consider the design of approximation schemes for the MPG value problem based on the relative, rather than absolute, error. In particular, they provide a relative FPTAS for the MPG value problem on graphs with nonnegative weights. Note that negative weights are necessary to encode parity games and µ-calculus model checking into MPGs [10]. The following theorem considers the problem of designing (fully) polynomial approximation schemes for the MPG value problem on graphs with arbitrary (positive and negative) rational weights. It shows that solving such a problem would indeed provide an exact solution to the MPG value problem, computable in time polynomial w.r.t. the size of the MPG.

Theorem 4. The MPG value problem does not admit a relative PTAS, unless it is in P.

Proof. Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG and let v ∈ V. Assume that MPGs admit a relative PTAS and consider ε = 1/2. Our assumption entails that we have an algorithm A that computes a relative 1/2-value ṽal for v in time polynomial w.r.t. the size of Γ. We show that ṽal ≥ 0 if and only if val_Γ(v) ≥ 0. In other words, we show that the MPG decision problem is P-time reducible to the computation of a relative 1/2-value. By definition of relative ε-value, for ε = 1/2, we have:

  |ṽal − val_Γ(v)| / |val_Γ(v)| ≤ 1/2    (1)

We have four cases to consider:

1. In the first case, assume that ṽal − val_Γ(v) ≥ 0 and ṽal ≥ 0. By contradiction, suppose val_Γ(v) < 0. Then, Inequality (1) implies

  ṽal − val_Γ(v) ≤ (1/2) · |val_Γ(v)|  ⇒  ṽal ≤ val_Γ(v) + (1/2) · |val_Γ(v)| < 0,

which contradicts our hypothesis.

2. In the second case, assume that ṽal − val_Γ(v) ≥ 0 and ṽal < 0. Then 0 > ṽal ≥ val_Γ(v), hence val_Γ(v) < 0 and the signs of ṽal and val_Γ(v) agree.

3. In the third case, assume that ṽal − val_Γ(v) < 0 and ṽal < 0. By contradiction, suppose val_Γ(v) ≥ 0. Note that val_Γ(v) = 0 would force ṽal = 0 by Inequality (1), so val_Γ(v) > 0. Then, Inequality (1) implies

  val_Γ(v) − ṽal ≤ (1/2) · |val_Γ(v)|  ⇒  ṽal ≥ val_Γ(v) − (1/2) · |val_Γ(v)| ≥ 0,

which contradicts our hypothesis.

4. The last case to consider is ṽal − val_Γ(v) < 0 and ṽal ≥ 0. Then val_Γ(v) > ṽal ≥ 0, and again the signs agree.

Provided a P-time algorithm for deciding whether val_Γ(v) ≥ 0, a dichotomic search can be used to determine val_Γ(v) in time polynomial w.r.t. the size of Γ [12, 20]; a sketch of such a search is given below. ⊓⊔
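The concluding dichotomic search admits a simple sketch. Below, geq is an assumed black-box oracle for the decision problem val_Γ(v) ≥ ν, obtainable, e.g., from the sign test of the proof applied to the game whose (integer-rescaled) weights are shifted by −ν, in line with the shift-invariance discussion of Section 2. By Lemma 1 the value lies in [−W, W] and has denominator at most n = |V|, so shrinking the search interval below 1/(n(n−1)) isolates it.

```python
import math
from fractions import Fraction
from typing import Callable

def value_by_dichotomy(n: int, W: int, geq: Callable[[Fraction], bool]) -> Fraction:
    """Dichotomic search concluding the proof of Theorem 4.  `geq(nu)` is an
    assumed black-box oracle deciding val_Γ(v) >= nu.  By Lemma 1 the value
    lies in [-W, W] with denominator at most n, so once the search interval
    is shorter than 1/(n(n-1)) a single such rational survives in it."""
    lo, hi = Fraction(-W), Fraction(W)
    gap = Fraction(1, n * (n - 1))
    while hi - lo >= gap:                       # O(log(W·n²)) oracle calls
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if geq(mid) else (lo, mid)
    for d in range(1, n + 1):                   # extract the surviving candidate
        num = math.ceil(lo * d)                 # smallest p/d that is >= lo
        if Fraction(num, d) <= hi:
            return Fraction(num, d)
    raise ValueError("no rational with denominator <= n in the final interval")
```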
As a direct consequence of Theorem 4, we obtain the following result of P-time equivalence involving the computation of MPG exact and approximate solutions.

Corollary 2. The following problems are P-time equivalent:
1. Solving the MPG value problem.
2. Determining a relative FPTAS for the MPG value problem.
3. Determining a relative PTAS for the MPG value problem.

References

1. A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8:109–113, 1979.
2. A. V. Karzanov and V. N. Lebedev. Cyclical games with prohibitions. Mathematical Programming, 60:277–293, 1993.
3. D. Andersson and S. Vorobyov. Fast algorithms for monotonic discounted linear programs with two variables per inequality. Technical Report NI06019-LAA, Isaac Newton Institute for Mathematical Sciences, Cambridge, UK, 2006.
4. E. Boros, K. Elbassioni, M. Fouz, V. Gurvich, K. Makino, and B. Manthey. Stochastic mean-payoff games: smoothed analysis and approximation schemes. In Proc. of ICALP: Colloquium on Automata, Languages and Programming, 2011.
5. A. Condon. On algorithms for simple stochastic games. In Advances in Computational Complexity Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 51–73. American Mathematical Society, 1993.
6. V. Dhingra and S. Gaubert. How to solve large scale deterministic games with mean payoff by policy iteration. In Proc. of Performance Evaluation Methodologies and Tools, article no. 12. ACM, 2006.
7. E. A. Emerson, C. S. Jutla, and A. P. Sistla. On model checking for fragments of the µ-calculus. In Proc. of CAV: Computer Aided Verification, LNCS 697, pages 385–396. Springer, 1993.
8. Y. Gurevich and L. Harrington. Trees, automata, and games. In Proc. of STOC: Symposium on Theory of Computing, pages 60–65. ACM, 1982.
9. H. Björklund and S. Vorobyov. A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. Discrete Applied Mathematics, 155:210–229, 2007.
10. M. Jurdziński. Deciding the winner in parity games is in UP ∩ coUP. Information Processing Letters, 68(3):119–124, 1998.
11. D. Kozen. Results on the propositional mu-calculus. Theoretical Computer Science, 27:333–354, 1983.
12. L. Brim, J. Chaloupka, L. Doyen, R. Gentilini, and J.-F. Raskin. Faster algorithms for mean-payoff games. Formal Methods in System Design, 38(2):97–118, 2011.
13. N. Pisaruk. Mean cost cyclical games. Mathematics of Operations Research, 24(4):817–828, 1999.
14. C. H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, Massachusetts, 1994.
15. A. Roth, M. Balcan, A. Kalai, and Y. Mansour. On the equilibria of alternating move games. In Proc. of SODA: Symposium on Discrete Algorithms, pages 805–816. ACM, 2010.
16. S. Kwek and K. Mehlhorn. Optimal search for rationals. Information Processing Letters, 86:23–26, 2003.
17. S. Schewe. From parity and payoff games to linear programming. In Proc. of MFCS: Mathematical Foundations of Computer Science, LNCS 5734, pages 675–686. Springer, 2009.
18. U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343–359, 1996.
19. V. A. Gurvich, A. V. Karzanov, and L. G. Khachiyan. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Computational Mathematics and Mathematical Physics, 28(5):85–91, 1988.
20. Y. Lifshits and D. Pavlov. Potential theory for mean payoff games. Journal of Mathematical Sciences, 145(3):4967–4974, 2007.