A Note on the Approximation of Mean-Payoff Games

Raffaella Gentilini
University of Perugia, Italy

Abstract. We consider the problem of designing approximation schemes for the values of mean-payoff games. It was recently shown that (1) mean-payoff games with rational weights scaled on [−1, 1] admit additive fully-polynomial approximation schemes, and (2) mean-payoff games with positive weights admit relative fully-polynomial approximation schemes. We show that the problem of designing additive/relative approximation schemes for general mean-payoff games (i.e. with no constraint on their edge-weights) is P-time equivalent to determining their exact solution.

1 Introduction

Two-player mean-payoff games are played on weighted graphs (in which every edge has a positive or negative rational weight) with two types of vertices: in player-0 vertices, player 0 chooses the successor vertex from the set of outgoing edges; in player-1 vertices, player 1 chooses the successor vertex from the set of outgoing edges. The game results in an infinite path through the graph. The long-run average of the edge-weights along this path, called the value of the play, is won by player 0 and lost by player 1. The decision problem for mean-payoff games asks, given a vertex v and a threshold ν ∈ Q, whether player 0 has a strategy to win a value at least ν when the game starts in v. The value problem consists in computing the maximal (rational) value that player 0 can achieve from each vertex v of the game. The associated (optimal) strategy synthesis problem is to construct a strategy for player 0 that secures the maximal value.

Mean-payoff games were first studied by Ehrenfeucht and Mycielski in [1], where it is shown that memoryless (or positional) strategies suffice to achieve the optimal value. This result entails that the decision problem for these games lies in NP ∩ coNP [2, 18], and it was later shown to belong to UP ∩ coUP [10]. (The complexity class UP is the class of problems recognizable by unambiguous polynomial-time nondeterministic Turing machines [14]. Obviously, P ⊆ UP ∩ coUP ⊆ NP ∩ coNP.) Despite many efforts [19, 18, 13, 5, 6, 20, 9, 12], no polynomial-time algorithm for the mean-payoff game problems is known so far.

Besides such a theoretically engaging complexity status, mean-payoff games have plenty of applications, especially in the synthesis, analysis and verification of reactive (non-terminating) systems. Many natural models of such systems include quantitative information, and the corresponding questions require the solution of quantitative games, like mean-payoff games. Concrete examples of applications include various kinds of scheduling, finite-window online string matching, or more generally, analysis of online problems and algorithms, as well as selection with limited storage [18]. Mean-payoff games can even be used for solving the max-plus algebra problem Ax = Bx, which in turn has further applications [6].

Algorithm | Decision problem | Value problem | Note
[12] | O(|E|·|V|·W) | O(|E|·|V|²·W·(log|V| + log W)) | Deterministic
[18] | Θ(|E|·|V|²·W) | Θ(|E|·|V|³·W) | Deterministic
[20] | O(|E|·|V|·2^|V|) | O(|E|·|V|·2^|V|·log W) | Deterministic
[9] | min(O(|E|·|V|²·W), 2^(O(√(|V|·log|V|)))·log W) | min(O(|E|·|V|³·W·(log|V| + log W)), 2^(O(√(|V|·log|V|)))·log W) | Randomized

Table 1. Complexity of the main algorithms to solve mean-payoff games.
Besides their applicability to the modeling of quantitative problems, mean-payoff games have tight connections with important problems in game theory and logic. For instance, parity games [8] and the model-checking problem for the modal mu-calculus [11] are poly-time reducible to mean-payoff games [7], and it is a long-standing open question whether these problems are in P.

Table 1 summarizes the complexity of the main algorithms for solving mean-payoff games in the literature. In particular, all deterministic algorithms for mean-payoff games are either pseudopolynomial (i.e., polynomial in the number of vertices |V|, the number of edges |E|, and the maximal absolute weight W, rather than in the binary representation of W) or exponential [19, 18, 13, 12, 20, 17]. The works in [9, 3] define a randomized algorithm which is both subexponential and pseudopolynomial. Recently, the authors of [15, 4] showed that the pseudopolynomial procedures in [18, 13, 12] can be used to design (fully) polynomial value approximation schemes for certain classes of mean-payoff games: namely, mean-payoff games with positive (integer) weights or rational weights with absolute value less than or equal to 1. In this paper, we consider the problem of extending such positive approximation results to general mean-payoff games, i.e. mean-payoff games with weights arbitrarily shifted/scaled on the line of rational numbers.

2 Preliminaries and Definitions

Game graphs. A game graph is a tuple Γ = (V, E, w, ⟨V0, V1⟩) where G_Γ = (V, E, w) is a weighted graph and ⟨V0, V1⟩ is a partition of V into the set V0 of player-0 vertices and the set V1 of player-1 vertices. An infinite game on Γ is played for infinitely many rounds by two players moving a pebble along the edges of the weighted graph G_Γ. In the first round, the pebble is on some vertex v ∈ V. In each round, if the pebble is on a vertex v ∈ V_i (i = 0, 1), then player i chooses an edge (v, v′) ∈ E and the next round starts with the pebble on v′.

A play in the game graph Γ is an infinite sequence p = v_0 v_1 … v_n … such that (v_i, v_{i+1}) ∈ E for all i ≥ 0. A strategy for player i (i = 0, 1) is a function σ : V*·V_i → V such that for all finite paths v_0 v_1 … v_n with v_n ∈ V_i, we have (v_n, σ(v_0 v_1 … v_n)) ∈ E. A strategy profile is a pair of strategies ⟨σ0, σ1⟩, where σ0 (resp. σ1) is a strategy for player 0 (resp. player 1). We denote by Σ_i (i = 0, 1) the set of strategies for player i. A strategy σ for player i is memoryless if σ(p) = σ(p′) for all sequences p = v_0 v_1 … v_n and p′ = v′_0 v′_1 … v′_m such that v_n = v′_m. We denote by Σ_i^M the set of memoryless strategies of player i. A play v_0 v_1 … v_n … is consistent with a strategy σ for player i if v_{j+1} = σ(v_0 v_1 … v_j) for all positions j ≥ 0 such that v_j ∈ V_i. Given an initial vertex v ∈ V, the outcome of the strategy profile ⟨σ0, σ1⟩ in v is the (unique) play outcome_Γ(v, σ0, σ1) that starts in v and is consistent with both σ0 and σ1. Given a memoryless strategy π_i for player i in the game Γ, we denote by G_Γ(π_i) = (V, E_{π_i}, w) the weighted graph obtained by removing from G_Γ all edges (v, v′) such that v ∈ V_i and v′ ≠ π_i(v).
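These definitions translate directly into a small data model. The following Python sketch is ours (the names GameGraph, outcome_prefix and the strategy dictionaries are illustrative conventions, not taken from the cited works): a memoryless strategy is a dictionary mapping each vertex of its owner to the chosen successor, and the outcome of a memoryless strategy profile is unrolled round by round.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List, Set, Tuple

Vertex = str  # any hashable label would do

@dataclass
class GameGraph:
    """Game graph Γ = (V, E, w, ⟨V0, V1⟩): successors, weights, and V0."""
    succ: Dict[Vertex, List[Vertex]]               # v -> outgoing successors (E)
    weight: Dict[Tuple[Vertex, Vertex], Fraction]  # (v, v') -> w(v, v')
    player0: Set[Vertex]                           # V0; the rest of V is V1

def outcome_prefix(g: GameGraph, v: Vertex,
                   pi0: Dict[Vertex, Vertex], pi1: Dict[Vertex, Vertex],
                   rounds: int) -> List[Vertex]:
    """First `rounds` vertices of outcome_Γ(v, π0, π1), where the memoryless
    strategies π0, π1 map each owned vertex to the successor they choose
    (consistency with E must hold by construction)."""
    play = [v]
    for _ in range(rounds - 1):
        v = pi0[v] if v in g.player0 else pi1[v]
        play.append(v)
    return play
```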
Mean-Payoff Games. A mean-payoff game (MPG) [1] is an infinite game played on a game graph Γ where player 0 wins a payoff value defined as the long-run average weight of the play, while player 1 loses that value. Formally, the payoff value of a play v_0 v_1 … v_n … in Γ is

  MP(v_0 v_1 … v_n …) = lim inf_{n→∞} (1/n) · Σ_{i=0}^{n−1} w(v_i, v_{i+1}).

The value secured by a strategy σ0 ∈ Σ0 in a vertex v is

  val_{σ0}(v) = inf_{σ1 ∈ Σ1} MP(outcome_Γ(v, σ0, σ1))

and the (optimal) value of a vertex v in a mean-payoff game Γ is

  val_Γ(v) = sup_{σ0 ∈ Σ0} inf_{σ1 ∈ Σ1} MP(outcome_Γ(v, σ0, σ1)).

We say that σ0 is optimal if val_{σ0}(v) = val_Γ(v) for all v ∈ V. Secured value and optimality are defined analogously for strategies of player 1. Ehrenfeucht and Mycielski [1] show that mean-payoff games are memoryless determined, i.e., memoryless strategies are sufficient for optimality and the optimal (maximum) value that player 0 can secure is equal to the optimal (minimum) value that player 1 can achieve.

Theorem 1 ([1]). For every MPG Γ = (V, E, w, ⟨V0, V1⟩) and for all vertices v ∈ V, we have

  val_Γ(v) = sup_{σ0 ∈ Σ0} inf_{σ1 ∈ Σ1} MP(outcome_Γ(v, σ0, σ1)) = inf_{σ1 ∈ Σ1} sup_{σ0 ∈ Σ0} MP(outcome_Γ(v, σ0, σ1)),

and there exist two memoryless strategies π0 ∈ Σ0^M and π1 ∈ Σ1^M such that val_Γ(v) = val_{π0}(v) = val_{π1}(v).

Moreover, uniform optimal strategies exist for both players, i.e. there exists a strategy profile ⟨σ0, σ1⟩ that can be used to secure the optimal value independently of the initial vertex [1]. Such a strategy profile is called an optimal strategy profile. The following lemma characterizes the shape of the values in an MPG Γ = (V, E, w, ⟨V0, V1⟩) with integer weights in {−W, …, W}. Note that solving MPGs with rational weights is P-time reducible to solving MPGs with integer weights [20, 18].

Lemma 1 ([1, 20]). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG with integer weights and let W = max_{(v,v′)∈E} |w(v, v′)|. For each vertex v ∈ V, the optimal value val_Γ(v) is a rational number n/d such that 1 ≤ d ≤ |V| and |n| ≤ d · W.

We consider the following three classical problems [18, 9] for an MPG Γ = (V, E, w, ⟨V0, V1⟩):

1. Decision Problem. Given a threshold ν ∈ Q and a vertex v ∈ V, decide if val_Γ(v) ≥ ν.
2. Value Problem. Compute for each vertex v ∈ V the value val_Γ(v).
3. (Optimal) Strategy Problem. Given an MPG Γ, compute an (optimal) strategy profile for Γ.

Approximate Solutions for MPG. Dealing with approximate MPG solutions, we can take into consideration either absolute or relative error measures, and define the notions of additive and relative MPG approximate value.

Definition 1 (MPG additive ε-value). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG, let v ∈ V and consider ε ≥ 0. The value ṽal ∈ Q is said to be an additive ε-value on v if and only if:

  |ṽal − val_Γ(v)| ≤ ε.

Definition 2 (MPG relative ε-value). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG, let v ∈ V and consider ε ≥ 0. The value ṽal ∈ Q is said to be a relative ε-value on v if and only if:

  |ṽal − val_Γ(v)| / |val_Γ(v)| ≤ ε.

Note that additive MPG ε-values are shift-invariant. More precisely, if ṽal is an additive ε-value on the vertex v in Γ = (V, E, w, ⟨V0, V1⟩), then ṽal + k is an additive ε-value on v in the MPG Γ′ = (V, E, w + k, ⟨V0, V1⟩), where all the weights are shifted by k. On the contrary, additive MPG ε-values are not scale-invariant. In fact, if ṽal is an additive ε-value for v in the MPG Γ = (V, E, w, ⟨V0, V1⟩), then k · ṽal is an additive (ε · k)-value for v in the MPG Γ′ = (V, E, w · k, ⟨V0, V1⟩), where all the weights are multiplied by k. In other words, the additive error on Γ is amplified by a factor k in the scaled version of the game, Γ′. Conversely, relative MPG ε-values are scale-invariant but not shift-invariant.
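Under a memoryless strategy profile the outcome play is a lasso: after at most |V| rounds it enters a cycle, and the lim inf in the definition of MP is exactly the mean weight of that cycle. The following sketch (our own code, reusing the dictionary conventions of the previous one) computes this value with exact rational arithmetic; note how the result is a rational with denominator at most |V|, matching Lemma 1.

```python
from fractions import Fraction
from typing import Dict, Tuple

def mp_value(weight: Dict[Tuple[str, str], int],
             succ: Dict[str, str], v: str) -> Fraction:
    """Mean payoff of the play from v induced by a memoryless strategy profile.
    `succ` maps every vertex to the successor chosen by its owner's strategy
    (π0 and π1 merged into one map).  The induced play is a lasso: it enters
    a cycle within |V| rounds, and the lim inf of the running averages equals
    the mean weight of that cycle, a rational with denominator at most |V|
    (cf. Lemma 1)."""
    position: Dict[str, int] = {}
    play = []
    while v not in position:            # walk until a vertex repeats
        position[v] = len(play)
        play.append(v)
        v = succ[v]
    cycle = play[position[v]:] + [v]    # the repeated cycle, closed at v
    total = sum(weight[(cycle[i], cycle[i + 1])] for i in range(len(cycle) - 1))
    return Fraction(total, len(cycle) - 1)
```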
The notions of (fully) polynomial approximation schemes w.r.t. relative and additive errors are formally defined below. In the following, size(Γ) = |E| + |V| + log(W), where W is the maximum absolute value of a weight in Γ = (V, E, w, ⟨V0, V1⟩).

Definition 3 (MPG Fully Polynomial Time Approximation Scheme (FPTAS)). An additive (resp. relative) fully polynomial approximation scheme for the MPG Γ = (V, E, w, ⟨V0, V1⟩) is an algorithm A such that for all ε > 0, A computes an additive (resp. relative) ε-value in time polynomial w.r.t. the size of Γ and 1/ε.

Definition 4 (MPG Polynomial Time Approximation Scheme (PTAS)). An additive (resp. relative) polynomial approximation scheme for the MPG Γ = (V, E, w, ⟨V0, V1⟩) is an algorithm A such that for all ε > 0, A computes an additive (resp. relative) ε-value in time polynomial w.r.t. the size of Γ.

3 Mean-Payoff Games and Additive Approximation Schemes

Recently, [15] provided an additive fully polynomial approximation scheme for the MPG value problem on graphs with rational weights in the interval [−1, +1]. A natural question is whether we could efficiently approximate the value in MPGs with no restrictions on the weights. The next theorem shows that a generalization of the positive approximation result in [15] to MPGs with arbitrary (rational) weights would indeed provide a polynomial-time exact solution to the MPG value problem.

Theorem 2. The MPG value problem does not admit an additive FPTAS, unless it is in P.

Proof. We first consider the MPG value problem on graphs with integer weights. Assume that the MPG value problem on graphs with integer weights admits an additive FPTAS. Given an MPG Γ = (V, E, w, ⟨V0, V1⟩) and a vertex v ∈ V, let |V| = n and ε = 1/(2n(n−1)). Then, our FPTAS computes an additive ε-value ṽal on v in time polynomial w.r.t. the size of Γ, since 1/ε = 2n(n−1) is polynomial in n. By Lemma 1, val_Γ(v) is a rational number with denominator d such that 1 ≤ d ≤ n. Two distinct rationals with denominators between 1 and n have distance at least 1/(n(n−1)): if their denominators coincide the distance is at least 1/n, and otherwise it is at least the inverse of the product of the two denominators, which is at most n(n−1). Hence, there is a unique rational with denominator d, 1 ≤ d ≤ n, within the interval I = {q ∈ Q | ṽal − ε ≤ q ≤ ṽal + ε}, where ε = 1/(2n(n−1)). Such a unique rational is val_Γ(v), and it can be easily found in time logarithmic w.r.t. n [16]. Thus, we have an algorithm to solve the value problem on Γ in time polynomial w.r.t. the size of Γ. Finally, the MPG value problem on graphs with rational weights can be reduced in polynomial time (w.r.t. the size of Γ) to the MPG value problem on graphs with integer weights by simply resizing the weights in the original graph [20, 12]. ⊓⊔
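The rounding step at the heart of this proof is easy to make concrete. The sketch below is ours: a linear scan over denominators standing in for the O(log n)-comparison search of [16]. Given an additive ε-value with ε = 1/(2n(n−1)), it returns the unique rational with denominator at most n within distance ε of the approximation.

```python
from fractions import Fraction

def exact_from_additive(approx: Fraction, n: int) -> Fraction:
    """Recover val_Γ(v) from an additive ε-value, ε = 1/(2n(n-1)), n = |V| >= 2.
    By Lemma 1 the exact value has denominator between 1 and n, and distinct
    such rationals are at least 1/(n(n-1)) apart, so at most one of them fits
    in [approx - ε, approx + ε].  A linear scan over denominators suffices
    here; [16] shows how to run the same search with O(log n) comparisons."""
    eps = Fraction(1, 2 * n * (n - 1))
    for d in range(1, n + 1):
        candidate = Fraction(round(approx * d), d)  # closest fraction with denominator d
        if abs(candidate - approx) <= eps:
            return candidate
    raise ValueError("no rational with denominator <= n lies within eps of approx")
```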
In view of the proof of the above theorem, we could still hope to obtain some positive approximation results for general (i.e. arbitrarily scaled) MPGs by considering weaker notions of approximation than FPTAS. Unfortunately, the next theorem shows that the following is already sufficient to place the MPG value problem in P: determining, in time polynomial w.r.t. the size of a given MPG Γ, an additive k-approximate value of a vertex v ∈ V, where k is an arbitrary constant.

Theorem 3. For any constant k: if the problem of computing an additive k-approximate MPG value can be solved in polynomial time (w.r.t. the size of the input MPG), then the MPG value problem belongs to P.

Proof. We first consider MPGs with integer weights. Let v be a vertex in the MPG Γ = (V, E, w : E → [−W, W], ⟨V0, V1⟩) and denote |V| = n. If 2k + 1 > (n − 2)!, then n is bounded by a constant depending only on k, and the problem of determining val_Γ(v) can be solved in constant time (for fixed k) by simply enumerating all the memoryless strategies available to the players.

Otherwise, assume 2k + 1 ≤ (n − 2)!. Consider the game Γ′ = (V, E, w′, ⟨V0, V1⟩), where w′(e) = w(e) · n! for all e ∈ E. By hypothesis, there is an algorithm A that computes a k-approximate value ṽal for v on Γ′ in time T polynomial w.r.t. the size of Γ′. Since log(W · n!) = O(n · log(n) + log(W)), T is also polynomial w.r.t. the size of Γ. By construction, val_Γ′(v) is an integer. There are at most 2k + 1 integers in the interval [ṽal − k, ṽal + k], thus we have at most 2k + 1 candidates {⌈ṽal − k⌉/n!, …, ⌊ṽal + k⌋/n!} for val_Γ(v). Moreover, those candidates lie in an interval of length L ≤ (2k + 1)/n! ≤ (n − 2)!/n! = 1/(n·(n−1)). The minimum distance between two rationals with denominator at most n is 1/(n·(n−1)), as observed in the proof of Theorem 2. The exact value val_Γ(v) is thus the unique rational number with denominator at most n that lies in the interval [⌈ṽal − k⌉/n!, ⌊ṽal + k⌋/n!], and it can be easily found in time logarithmic w.r.t. n [16]. Finally, the MPG value problem on graphs with rational weights can be reduced in polynomial time (w.r.t. the size of Γ) to the MPG value problem on graphs with integer weights by simply resizing the weights in the original graph [20, 12]. ⊓⊔
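The scaling argument of this proof can be sketched end to end. In the code below, approx_value is a hypothetical black box returning an additive k-approximate value (no polynomial-time implementation is known; that is the point of the theorem); the rest follows the proof: scale the integer weights by n!, collect the at most 2k + 1 integer candidates, and keep the one whose reduced denominator is at most n.

```python
import math
from fractions import Fraction
from typing import Callable, Dict, Tuple

def exact_via_k_approx(weight: Dict[Tuple[str, str], int], n: int, v: str, k: int,
                       approx_value: Callable[[Dict[Tuple[str, str], int], str], Fraction]
                       ) -> Fraction:
    """Sketch of the reduction in Theorem 3 (integer weights, 2k+1 <= (n-2)!).
    `approx_value(weights, vertex)` is a hypothetical black box returning an
    additive k-approximation of the value.  Scaling by n! makes the value of
    Γ' an integer, leaving at most 2k+1 candidates m/n!, of which only the
    true value can reduce to a denominator <= n (boundary ties are glossed
    over, as in the proof)."""
    f = math.factorial(n)
    scaled = {e: w * f for e, w in weight.items()}           # the game Γ'
    approx = approx_value(scaled, v)                         # ṽal for Γ'
    lo, hi = math.ceil(approx - k), math.floor(approx + k)   # integer candidates
    for m in range(lo, hi + 1):
        candidate = Fraction(m, f)
        if candidate.denominator <= n:                       # Lemma 1
            return candidate
    raise ValueError("no candidate with denominator <= n")
```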
A direct consequence of Theorem 3 is that the MPG value problem does not admit an additive PTAS, unless it is in P. More precisely, Theorem 2 and Theorem 3 entail a result of P-time equivalence between the exact MPG value problem and the three classes of approximations listed in Corollary 1.

Corollary 1. The following problems are P-time equivalent:
1. Solving the MPG value problem.
2. Determining an additive FPTAS for the MPG value problem.
3. Determining an additive PTAS for the MPG value problem.
4. Computing an additive k-approximate MPG value in polynomial time, for any constant k.

4 Mean-Payoff Games and Relative Approximation Schemes

In the recent work [4], the authors consider the design of approximation schemes for the MPG value problem based on the relative, rather than absolute, error. In particular, they provide a relative FPTAS for the MPG value problem on graphs with nonnegative weights. Note that negative weights are necessary to encode parity games and µ-calculus model checking into MPGs [10]. The following theorem considers the problem of designing (fully) polynomial approximation schemes for the MPG value problem on graphs with arbitrary (positive and negative) rational weights. It shows that solving such a problem would indeed provide an exact solution to the MPG value problem, computable in time polynomial w.r.t. the size of the MPG.

Theorem 4. The MPG value problem does not admit a relative PTAS, unless it is in P.

Proof. Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG and let v ∈ V. Assume that MPGs admit a relative PTAS and consider ε = 1/2. Our assumption entails that we have an algorithm A that computes a relative 1/2-value ṽal for v in time polynomial w.r.t. the size of Γ. We show that ṽal ≥ 0 if and only if val_Γ(v) ≥ 0. In other words, we show that the MPG decision problem is P-time reducible to the computation of a relative 1/2-value. By definition of relative ε-value, for ε = 1/2, we have:

  |ṽal − val_Γ(v)| / |val_Γ(v)| ≤ 1/2    (1)

We have four cases to consider:

1. In the first case, assume that ṽal − val_Γ(v) ≥ 0 and ṽal ≥ 0. By contradiction, suppose val_Γ(v) < 0. Then, Inequality (1) implies

  ṽal − val_Γ(v) ≤ (1/2) · |val_Γ(v)|  ⇒  ṽal ≤ val_Γ(v) + (1/2) · |val_Γ(v)| < 0,

which contradicts our hypothesis.

2. In the second case, assume that ṽal − val_Γ(v) ≥ 0 and ṽal < 0. Then 0 > ṽal ≥ val_Γ(v), hence val_Γ(v) < 0 and the signs of ṽal and val_Γ(v) agree.

3. In the third case, assume that ṽal − val_Γ(v) < 0 and ṽal < 0. By contradiction, suppose val_Γ(v) ≥ 0. Note that val_Γ(v) = 0 would force ṽal = 0 by Inequality (1), so val_Γ(v) > 0. Then, Inequality (1) implies

  val_Γ(v) − ṽal ≤ (1/2) · |val_Γ(v)|  ⇒  ṽal ≥ val_Γ(v) − (1/2) · |val_Γ(v)| ≥ 0,

which contradicts our hypothesis.

4. The last case to consider is ṽal − val_Γ(v) < 0 and ṽal ≥ 0. Then val_Γ(v) > ṽal ≥ 0, and again the signs agree.

Provided a P-time algorithm for deciding whether val_Γ(v) ≥ 0, a dichotomic search can be used to determine val_Γ(v) in time polynomial w.r.t. the size of Γ [12, 20]; a sketch of such a search is given below. ⊓⊔
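The concluding dichotomic search admits a simple sketch. Below, geq is an assumed black-box oracle for the decision problem val_Γ(v) ≥ ν, obtainable, e.g., from the sign test of the proof applied to the game whose (integer-rescaled) weights are shifted by −ν, in line with the shift-invariance discussion of Section 2. By Lemma 1 the value lies in [−W, W] and has denominator at most n = |V|, so shrinking the search interval below 1/(n(n−1)) isolates it.

```python
import math
from fractions import Fraction
from typing import Callable

def value_by_dichotomy(n: int, W: int, geq: Callable[[Fraction], bool]) -> Fraction:
    """Dichotomic search concluding the proof of Theorem 4.  `geq(nu)` is an
    assumed black-box oracle deciding val_Γ(v) >= nu.  By Lemma 1 the value
    lies in [-W, W] with denominator at most n, so once the search interval
    is shorter than 1/(n(n-1)) a single such rational survives in it."""
    lo, hi = Fraction(-W), Fraction(W)
    gap = Fraction(1, n * (n - 1))
    while hi - lo >= gap:                       # O(log(W·n²)) oracle calls
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if geq(mid) else (lo, mid)
    for d in range(1, n + 1):                   # extract the surviving candidate
        num = math.ceil(lo * d)                 # smallest p/d that is >= lo
        if Fraction(num, d) <= hi:
            return Fraction(num, d)
    raise ValueError("no rational with denominator <= n in the final interval")
```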
As a direct consequence of Theorem 4, we obtain the following result of P-time equivalence involving the computation of MPG exact and approximate solutions.

Corollary 2. The following problems are P-time equivalent:
1. Solving the MPG value problem.
2. Determining a relative FPTAS for the MPG value problem.
3. Determining a relative PTAS for the MPG value problem.

References

1. A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8:109–113, 1979.
2. A. V. Karzanov and V. N. Lebedev. Cyclical games with prohibitions. Mathematical Programming, 60:277–293, 1993.
3. D. Andersson and S. Vorobyov. Fast algorithms for monotonic discounted linear programs with two variables per inequality. Technical Report NI06019-LAA, Isaac Newton Institute for Mathematical Sciences, Cambridge, UK, 2006.
4. E. Boros, K. Elbassioni, M. Fouz, V. Gurvich, K. Makino, and B. Manthey. Stochastic mean-payoff games: smoothed analysis and approximation schemes. In Proc. of ICALP: Colloquium on Automata, Languages and Programming, 2011.
5. A. Condon. On algorithms for simple stochastic games. In Advances in Computational Complexity Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 51–73. American Mathematical Society, 1993.
6. V. Dhingra and S. Gaubert. How to solve large scale deterministic games with mean payoff by policy iteration. In Proc. of Performance Evaluation Methodologies and Tools, article no. 12. ACM, 2006.
7. E. A. Emerson, C. S. Jutla, and A. P. Sistla. On model checking for fragments of the µ-calculus. In Proc. of CAV: Computer Aided Verification, LNCS 697, pages 385–396. Springer, 1993.
8. Y. Gurevich and L. Harrington. Trees, automata, and games. In Proc. of STOC: Symposium on Theory of Computing, pages 60–65. ACM, 1982.
9. H. Björklund and S. Vorobyov. A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. Discrete Applied Mathematics, 155:210–229, 2007.
10. M. Jurdziński. Deciding the winner in parity games is in UP ∩ coUP. Information Processing Letters, 68(3):119–124, 1998.
11. D. Kozen. Results on the propositional mu-calculus. Theoretical Computer Science, 27:333–354, 1983.
12. L. Brim, J. Chaloupka, L. Doyen, R. Gentilini, and J.-F. Raskin. Faster algorithms for mean-payoff games. Formal Methods in System Design, 38(2):97–118, 2011.
13. N. Pisaruk. Mean cost cyclical games. Mathematics of Operations Research, 24(4):817–828, 1999.
14. C. H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, Massachusetts, 1994.
15. A. Roth, M. Balcan, A. Kalai, and Y. Mansour. On the equilibria of alternating move games. In Proc. of SODA: Symposium on Discrete Algorithms, pages 805–816. ACM, 2010.
16. S. Kwek and K. Mehlhorn. Optimal search for rationals. Information Processing Letters, 86:23–26, 2003.
17. S. Schewe. From parity and payoff games to linear programming. In Proc. of MFCS: Mathematical Foundations of Computer Science, LNCS 5734, pages 675–686. Springer, 2009.
18. U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343–359, 1996.
19. V. A. Gurvich, A. V. Karzanov, and L. G. Khachiyan. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Computational Mathematics and Mathematical Physics, 28(5):85–91, 1988.
20. Y. Lifshits and D. Pavlov. Potential theory for mean payoff games. Journal of Mathematical Sciences, 145(3):4967–4974, 2007.