Norm Creation in Proposition Control Games

Xin Sun (xin.sun@uni.lu) and Livio Robaldo (livio.robaldo@uni.lu)
Faculty of Science, Technology and Communication, University of Luxembourg

Xin Sun has received funding from the European Union's H2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 690974 for the project "MIREL: MIning and REasoning with Legal texts". Livio Robaldo has received funding from the European Union's H2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 661007 for the project "ProLeMAS: PROcessing LEgal language in normative Multi-Agent Systems".

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: T. Ågotnes, B. Liao, Y. N. Wang (eds.): Proceedings of the First Chinese Conference on Logic and Argumentation (CLAR 2016), Hangzhou, China, 2-3 April 2016, published at http://ceur-ws.org.

Abstract

In this paper we study how to create norms in games. We view norms as deontic statements describing the prescriptions of certain randomized signals whose purpose is to guide agents' behavior. Such randomized signals are implemented from correlated equilibria of games. Our proposal belongs to the offline norm creation approach [ST96], for which agents' compliance and computational tractability have been identified as key problems. Four types of norms are studied: utilitarian norms, egalitarian norms, Nash product norms and opportunity-balanced norms. We show that in our framework all four types of norms can be created in polynomial time and that all rational agents will comply with them.

1 Introduction

Norms, or social laws, have been extensively studied as a mechanism for coordinating interactions within multi-agent systems. Two general approaches to the design of norms have been investigated in the literature: the offline approach [ST96, vdHRW07, ÅW10, ÅvdHW12] and the online approach [ST97, SA07, MLE11, MLR+15]. The online approach aims to equip agents with the ability to dynamically coordinate their activities, for example by explicitly reasoning about coordination at run-time or by learning from the interaction with other agents. In contrast, offline techniques develop a coordination regime at design-time and build this regime into the system for use at run-time. There are arguments in favor of both approaches: the online approach is potentially more flexible and may be more robust against unanticipated events, while the offline approach benefits from design-time reasoning about coordination, thereby reducing the run-time decision-making burden on agents. This paper belongs to the offline approach.

One of the most popular offline approaches is the social law paradigm, originally introduced by Shoham and Tennenholtz [ST96]. Social laws are understood as a set of rules imposed upon a multi-agent system with the goal of ensuring some desirable states. Social laws work by constraining the behavior of the agents in the system, that is, by forbidding agents from performing certain actions in certain states. The social law paradigm has two disadvantages. Firstly, creating social laws that effectively coordinate a multi-agent system is computationally intractable. In Shoham and Tennenholtz [ST96], designing a social law amounts to finding suitable restrictions on the system, which is NP-complete. In van der Hoek et al. [vdHRW07], the design of social laws is framed as a model checking problem for alternating-time temporal logic, which is again NP-complete.
Ågotnes and Wooldridge [ÅW10] extend the model by taking into account both the implementation costs of social laws and multiple (possibly conflicting) design objectives with different priorities. In this setting, the design of social laws becomes an optimization problem and is FP^NP-complete. Secondly, there is no guarantee that agents will comply with the created social laws: the interests of the agents may conflict with those of the designer. Generating social laws with which every agent will choose to comply is even more intractable.

In this paper, following ideas from the game-theoretical analysis of norms [Gin10], we develop a different offline paradigm in which the norm creation problem is tractable and every rational agent will comply with the created norms. The main features of our framework are the following:

1. Instead of relational structures, we use a kind of succinct game called proposition control game to represent multi-agent systems.

2. Instead of constraints, norms in our framework are viewed as randomized signals, like traffic lights, which guide agents' behavior. We generate randomized signals by computing correlated equilibria of proposition control games. Norms are descriptions of those randomized signals in terms of deontic statements.

3. Four kinds of norms are created: utilitarian norms, egalitarian norms, Nash product norms and opportunity-balanced norms.

   (a) Utilitarian norms are created from correlated equilibria which maximize the sum of the expected utilities of all agents.

   (b) Egalitarian norms are created from correlated equilibria which maximize the expected utility of the weakest agent.

   (c) Nash product norms are created from correlated equilibria which maximize the product of the expected utilities of all agents.

   (d) Opportunity-balanced norms are created from correlated equilibria obtained by averaging, over all agents, a correlated equilibrium that maximizes the expected utility of each single agent.

The procedure of norm creation in this paper is as follows. First a proposition control game is given. Then we compute a correlated equilibrium of the given game. The resulting correlated equilibrium is a probability distribution over agents' strategy profiles. We then transform this probability distribution into randomized signals that guide agents' behavior. Finally we use deontic statements to describe the prescriptions given by those randomized signals. Those deontic statements are norms.

The structure of the rest of this paper is as follows. We briefly review proposition control games in Section 2. In Section 3 we explain our framework in detail and show how norms can be created in polynomial time. In Section 4 we discuss related work. We summarize the paper in Section 5 with suggestions for future work.

2 Proposition control game

Proposition control games are a variant of Boolean games. Boolean games are super succinct in the sense that agents' strategies and utility functions are represented implicitly. Such succinctness comes at a cost: many decision problems in Boolean games are intractable. For example, deciding whether a given Boolean game has a pure strategy Nash equilibrium is Σ^p_2-hard [BLLZ06]. To find a balance between succinctness and tractability, we introduce proposition control games.
In a proposition control game, the strategies available to each agent consist in assigning a truth value to each variable he controls. The goal of each agent is represented by a set of weighted formulas.

Formally, let P = {p_0, p_1, ...} be a finite set of propositional variables and let L_P be the propositional language built from P. 2^P is the set of all valuations for P, with the usual convention that for V ∈ 2^P and p ∈ P, V gives the value true to p if p ∈ V and false otherwise. For X ⊆ P, 2^X is the set of X-valuations. A partial valuation (for P) is an X-valuation for some X ⊆ P. Partial valuations are denoted by listing all variables of X, with a "+" symbol when the variable is set to true and a "−" symbol when the variable is set to false: for instance, for X = {p, q, r}, the X-valuation V = {p, r} is denoted {+p, −q, +r}. If {P_1, ..., P_n} is a partition of P and V_1, ..., V_n are partial valuations with V_i ∈ 2^{P_i}, then (V_1, ..., V_n) denotes the valuation V_1 ∪ ... ∪ V_n.

Definition 1 (proposition control game) A proposition control game is a tuple (Agent, P, π, S_1, ..., S_n, Goal), where

1. Agent = {1, ..., n} is a set of agents.

2. P is a finite set of propositional variables.

3. π : Agent → 2^P is a control assignment function such that {π(1), ..., π(n)} forms a partition of P.

4. For each agent i, S_i ⊆ 2^{π(i)} is his strategy set.

5. Goal = {Goal_1, ..., Goal_n} is a set of sets of weighted formulas of L_P. Each Goal_i is a finite set {⟨φ_1, m_1⟩, ..., ⟨φ_k, m_k⟩} where φ_j ∈ L_P and m_j is a real number representing the weight of φ_j.

A strategy for agent i is a π(i)-valuation. Note that since {π(1), ..., π(n)} forms a partition of P, a strategy profile s = (s_1, ..., s_n) is a valuation for P. Agents' utilities in proposition control games are induced by their goals: for every agent i and every strategy profile s,

  u_i(s) = Σ { m_j : ⟨φ_j, m_j⟩ ∈ Goal_i and s ⊨ φ_j }.

An agent's preference over strategy profiles is induced by his utility function in the natural way: s ≤_i s′ iff u_i(s) ≤ u_i(s′). For a strategy profile s = (s_1, ..., s_n), we use s_{−i} to denote the projection of s on Agent − {i}, i.e. s_{−i} = (s_1, ..., s_{i−1}, s_{i+1}, ..., s_n), and s_i to denote the projection of s on agent i's strategy.

The main difference between proposition control games and Boolean games lies in the representation of agents' strategy sets. In a proposition control game, an agent's strategy set is a subset of the power set of the propositional variables he controls, while in a Boolean game the strategy set of each agent is the entire power set of the propositional variables he controls. This difference is the reason why proposition control games are computationally easier than Boolean games. (Footnote 1: As a side effect, proposition control games are not as succinct as Boolean games, because in Boolean games strategy sets are represented implicitly. However, proposition control games are still more succinct than normal form games. In a normal form game, both the strategy sets and the utility function are represented explicitly, while in a proposition control game the utility function is compactly represented by propositional formulas. Every proposition control game can be transformed into a normal form game in polynomial time. Moreover, we could improve the succinctness of proposition control games without damaging computational tractability by representing a strategy set as a propositional formula whose true valuations can be effectively computed. A similar treatment has been studied by Bonzon [Bon07] for Boolean games. We leave this task for proposition control games as future work.)

Moreover, reality supports the restriction we impose in proposition control games. In real life, it is common that we can set something to be true but are not able to set it to be false, or vice versa. For example, the president of the USA is able to start a war (+w), but has no ability to stop a war (−w); that is, +w is in the president's strategy set while −w is not. As another example, when throwing a dart we are able to make sure we do not hit the bull's eye (−b), but not able to make sure we hit it (+b), which means −b is in our strategy set while +b is not. For the sake of tractability and of being realistic, we sacrifice the super-succinctness of Boolean games and use proposition control games instead.
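For concreteness, the following is a minimal Python sketch (our own illustration, not code from the paper) of how a proposition control game can be represented and how the induced utility function is evaluated. Goals are encoded here as Boolean functions over valuations rather than as formula syntax, and all names (strategies, goals, utility) are ours; the sketch reproduces the payoff matrix of Example 1 below.

```python
from itertools import product

# A proposition control game, following Definition 1: each agent i controls
# the variables in pi(i) and chooses a strategy from strategies[i] (a set of
# subsets of the controlled variables). Goals are weighted formulas; here each
# formula is encoded as a Boolean function over a valuation (the set of
# variables that are true). This encoding is our own illustrative choice.

strategies = {
    1: [frozenset({"p", "r"}), frozenset({"p"}), frozenset({"r"})],   # S_1
    2: [frozenset({"q", "s"}), frozenset({"q"}), frozenset({"s"})],   # S_2
}

goals = {
    1: [(lambda v: ("p" in v) == ("q" in v), 1),   # <p <-> q, 1>
        (lambda v: "s" in v, 2)],                  # <s, 2>
    2: [(lambda v: "p" in v and "q" in v, 2),      # <p & q, 2>
        (lambda v: "s" not in v, 1)],              # <not s, 1>
}

def utility(agent, valuation):
    """u_i(s) = sum of weights of agent i's goal formulas satisfied by s."""
    return sum(w for formula, w in goals[agent] if formula(valuation))

# Enumerate all strategy profiles; each profile is a valuation of P = {p,q,r,s}.
for s1, s2 in product(strategies[1], strategies[2]):
    valuation = s1 | s2
    print(sorted(valuation), (utility(1, valuation), utility(2, valuation)))
```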
Example 1 Let G = (Agent, P, π, S_1, S_2, Goal) where Agent = {1, 2}, P = {p, q, r, s}, π(1) = {p, r}, π(2) = {q, s}, S_1 = {{p, r}, {p}, {r}}, S_2 = {{q, s}, {q}, {s}}, Goal_1 = {⟨p ↔ q, 1⟩, ⟨s, 2⟩}, Goal_2 = {⟨p ∧ q, 2⟩, ⟨¬s, 1⟩}. This proposition control game is depicted by the following payoff matrix:

              +q, +s    +q, −s    −q, +s
  +p, +r      (3, 2)    (1, 3)    (2, 0)
  −p, +r      (2, 0)    (0, 1)    (3, 0)
  +p, −r      (3, 2)    (1, 3)    (2, 0)

3 Norms and correlated equilibrium

In game theory, correlated equilibrium is a solution concept that is more general than the well-known (mixed-strategy) Nash equilibrium, but it is insufficiently studied within the normative multi-agent system community, if not completely ignored. Correlated equilibrium was first discussed by Aumann [Aum74]. A correlated equilibrium is a probability distribution over agents' strategy profiles, which can be understood as a public randomized signal: each agent chooses his action according to the recommendation of the signal. If no agent would want to deviate from the recommended strategy (assuming the others do not deviate), the probability distribution is called a correlated equilibrium.

Definition 2 (correlated equilibrium) Given a proposition control game (Agent, P, π, S_1, ..., S_n, Goal), a correlated equilibrium is a probability distribution τ over S = S_1 × ... × S_n such that for every agent i ∈ Agent and every pair of strategies s_i, s'_i ∈ S_i:

  Σ_{s_{−i} ∈ S_{−i}} τ(s_i, s_{−i}) × u_i(s_i, s_{−i})  ≥  Σ_{s_{−i} ∈ S_{−i}} τ(s_i, s_{−i}) × u_i(s'_i, s_{−i}),

where S_{−i} = S_1 × ... × S_{i−1} × S_{i+1} × ... × S_n.

Intuitively, in a correlated equilibrium the expected utility of an agent who follows the randomized signal produced by the correlated equilibrium, Σ_{s_{−i} ∈ S_{−i}} τ(s_i, s_{−i}) × u_i(s_i, s_{−i}), is at least as large as his expected utility of deviating from the signal, Σ_{s_{−i} ∈ S_{−i}} τ(s_i, s_{−i}) × u_i(s'_i, s_{−i}). An immediate positive consequence of this property is that all norms created from correlated equilibria will be obeyed by rational agents, because they cannot be better off by violating those norms, i.e. by deviating from the randomized signal.

An equivalent characterization of correlated equilibrium is

  Σ_{s_{−i} ∈ S_{−i}} τ(s_i, s_{−i}) × (u_i(s_i, s_{−i}) − u_i(s'_i, s_{−i})) ≥ 0.

The difference u_i(s'_i, s_{−i}) − u_i(s_i, s_{−i}) can be understood as a measure of the regret of agent i for choosing s_i instead of s'_i when the other agents choose s_{−i}; the regret grows as u_i(s'_i, s_{−i}) − u_i(s_i, s_{−i}) grows. When u_i(s_i, s_{−i}) − u_i(s'_i, s_{−i}) is positive, agent i has no regret for choosing s_i rather than s'_i when the other agents choose s_{−i}.
Therefore this characterization shows that in a correlated equilibrium no agent has positive expected regret, which is why all agents are willing to follow the signal produced by a correlated equilibrium.

Example 2 Let G = (Agent, P, π, S_1, S_2, Goal) where Agent = {1, 2}, P = {p, q}, π(1) = {p}, π(2) = {q}, S_1 = {{p}, ∅}, S_2 = {{q}, ∅}, Goal_1 = {⟨p ∧ q, 4⟩, ⟨p ∧ ¬q, 1⟩, ⟨¬p ∧ q, 5⟩, ⟨¬p ∧ ¬q, 0⟩}, Goal_2 = {⟨p ∧ q, 4⟩, ⟨p ∧ ¬q, 5⟩, ⟨¬p ∧ q, 1⟩, ⟨¬p ∧ ¬q, 0⟩}. This game is known as the chicken game in the game theory literature. A realistic scenario for this game is two drivers meeting at a single-lane bridge from opposite directions. The first to swerve away (playing +p or +q) yields the bridge to the other. If neither player swerves, the result is a costly deadlock in the middle of the bridge, or a potentially fatal head-on collision. The game is depicted by the following payoff matrix:

           +q        −q
  +p     (4, 4)    (1, 5)
  −p     (5, 1)    (0, 0)

This game has two pure Nash equilibria: the pure strategy profile (−p, +q) with utility vector (5, 1) and (+p, −q) with utility vector (1, 5). It has one mixed Nash equilibrium, the mixed strategy profile (1/2, 1/2) which assigns probability 1/2 to +p and to +q, with expected utility vector (2.5, 2.5). A distribution τ is a correlated equilibrium of this game if the following conditions are satisfied:

• τ(+p, +q) × 4 + τ(+p, −q) × 1 ≥ τ(+p, +q) × 5 + τ(+p, −q) × 0

• τ(−p, +q) × 5 + τ(−p, −q) × 0 ≥ τ(−p, +q) × 4 + τ(−p, −q) × 1

• τ(+p, +q) × 4 + τ(−p, +q) × 1 ≥ τ(+p, +q) × 5 + τ(−p, +q) × 0

• τ(+p, −q) × 5 + τ(−p, −q) × 0 ≥ τ(+p, −q) × 4 + τ(−p, −q) × 1

Therefore τ_1 with τ_1(+p, +q) = 0.2, τ_1(+p, −q) = 0.3, τ_1(−p, +q) = 0.4, τ_1(−p, −q) = 0.1 is a correlated equilibrium with expected utilities (3.1, 2.7). The distribution τ_2 with τ_2(+p, +q) = τ_2(+p, −q) = τ_2(−p, +q) = 1/3 is another correlated equilibrium, with expected utilities (10/3, 10/3).

One advantage of correlated equilibria from a computational perspective is that they are computationally less expensive than Nash equilibria. (Footnote 2: Gintis [Gin10] presents more conceptual justifications for viewing norms as correlated equilibria instead of Nash equilibria.) This is reflected by the fact that computing a correlated equilibrium only requires solving a linear program, whereas computing a Nash equilibrium requires solving a mixed integer programming problem. It is known from the algorithmic game theory literature that correlated equilibria of normal form games can be computed in polynomial time [PR08], while computing a Nash equilibrium is PPAD-complete [DGP09].

Although norms are created from correlated equilibria in our view, this does not mean that every correlated equilibrium should be implemented as a norm. Instead, we are only interested in those correlated equilibria which satisfy certain desired properties, for example maximizing social welfare, protecting the weakest member of society, or ensuring fairness. In the theory of multiagent resource allocation [CDE+06], three types of social welfare are widely used:

• Utilitarian social welfare is the sum of the utilities of all agents.

• Egalitarian social welfare is the utility of the worst-off agent.

• Nash product social welfare is the product of the utilities of all agents.

Correspondingly, we introduce utilitarian, egalitarian and Nash product correlated equilibria. Moreover, we introduce opportunity-balanced correlated equilibria, which in a certain sense ensure fairness. In this paper we only use these types of correlated equilibria to create norms.
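The inequalities of Definition 2 are straightforward to check mechanically. The following Python sketch (our own illustration; the function name is_correlated_equilibrium and the tolerance eps are ours, not from the paper) verifies the distribution τ_1 of Example 2 above against the payoff matrix of the chicken game.

```python
from itertools import product

# Check the correlated-equilibrium inequalities of Definition 2 for the chicken
# game of Example 2. Profiles are pairs (a1, a2) with a1 in {"+p","-p"} and
# a2 in {"+q","-q"}; utilities are read off the payoff matrix.

S1, S2 = ["+p", "-p"], ["+q", "-q"]
U = {  # payoff matrix of Example 2: (u1, u2)
    ("+p", "+q"): (4, 4), ("+p", "-q"): (1, 5),
    ("-p", "+q"): (5, 1), ("-p", "-q"): (0, 0),
}

def is_correlated_equilibrium(tau, eps=1e-9):
    # Agent 1: for every recommended s1 and every deviation t1,
    # sum_{s2} tau(s1,s2) * (u1(s1,s2) - u1(t1,s2)) >= 0; symmetrically for agent 2.
    for s1, t1 in product(S1, S1):
        if sum(tau[(s1, s2)] * (U[(s1, s2)][0] - U[(t1, s2)][0]) for s2 in S2) < -eps:
            return False
    for s2, t2 in product(S2, S2):
        if sum(tau[(s1, s2)] * (U[(s1, s2)][1] - U[(s1, t2)][1]) for s1 in S1) < -eps:
            return False
    return True

tau1 = {("+p", "+q"): 0.2, ("+p", "-q"): 0.3, ("-p", "+q"): 0.4, ("-p", "-q"): 0.1}
print(is_correlated_equilibrium(tau1))                      # True
print(sum(p * U[s][0] for s, p in tau1.items()),            # expected utilities:
      sum(p * U[s][1] for s, p in tau1.items()))            # approximately 3.1 and 2.7
```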
3.1 Utilitarian correlated equilibrium

For each agent i, his expected utility in a correlated equilibrium τ is

  EU_i(τ) = Σ_{s ∈ S} τ(s) × u_i(s).

Given a proposition control game (Agent, P, π, S_1, ..., S_n, Goal) and a correlated equilibrium τ, the utilitarian social welfare induced by τ is

  SW^u(τ) = Σ_{i ∈ Agent} EU_i(τ).

Utilitarian social welfare sums up the agents' expected utilities in a given correlated equilibrium, thus providing a useful measure of the overall benefit of the society.

Definition 3 (utilitarian correlated equilibrium) Given a proposition control game, let Θ be the set of all correlated equilibria of this game. τ ∈ Θ is a utilitarian correlated equilibrium if τ maximizes SW^u, i.e. for all τ' ∈ Θ, SW^u(τ) ≥ SW^u(τ').

A utilitarian correlated equilibrium can be computed by linear programming, with the maximization of SW^u(τ) as the objective function and the requirements of correlated equilibrium as constraints. Utilitarian norms are deontic statements which describe the randomized signals implemented from utilitarian correlated equilibria. For example, in the chicken game the utilitarian correlated equilibria are the solutions of the following linear program:

  max τ(+p, +q) × (4 + 4) + τ(−p, +q) × (5 + 1) + τ(+p, −q) × (1 + 5) + τ(−p, −q) × (0 + 0)

subject to

• τ(+p, +q) × 4 + τ(+p, −q) × 1 ≥ τ(+p, +q) × 5 + τ(+p, −q) × 0

• τ(−p, +q) × 5 + τ(−p, −q) × 0 ≥ τ(−p, +q) × 4 + τ(−p, −q) × 1

• τ(+p, +q) × 4 + τ(−p, +q) × 1 ≥ τ(+p, +q) × 5 + τ(−p, +q) × 0

• τ(+p, −q) × 5 + τ(−p, −q) × 0 ≥ τ(+p, −q) × 4 + τ(−p, −q) × 1

• τ(+p, +q) + τ(−p, +q) + τ(+p, −q) + τ(−p, −q) = 1

• τ(+p, +q), τ(−p, +q), τ(+p, −q), τ(−p, −q) ≥ 0

Writing x_1 for τ(+p, +q), x_2 for τ(−p, +q), x_3 for τ(+p, −q) and x_4 for τ(−p, −q), this linear program becomes:

  max 8x_1 + 6x_2 + 6x_3

subject to

• x_1 − x_3 ≤ 0

• −x_2 + x_4 ≤ 0

• x_1 − x_2 ≤ 0

• −x_3 + x_4 ≤ 0

• x_1 + x_2 + x_3 + x_4 = 1

• x_1, x_2, x_3, x_4 ≥ 0

Using standard linear programming techniques, we find the solution x_1 = x_2 = x_3 = 1/3, x_4 = 0. Such a utilitarian correlated equilibrium can be transformed into randomized signals. For example, we can use a random number generator to generate a real number between 0 and 1. If the number is less than 1/3, we show a red signal to agent 1 and a green signal to agent 2. If the number is larger than 2/3, we show a green signal to agent 1 and a red signal to agent 2. Otherwise we show a red signal to both agents. We then create the following two norms for agent 1:

• if the signal you see is red, then p is obligatory;

• if the signal you see is green, then ¬p is obligatory.

And we create the following two norms for agent 2:

• if the signal you see is red, then q is obligatory;

• if the signal you see is green, then ¬q is obligatory.

Note that to ensure that each agent complies with the norms, we should make the probability distribution of the signals public while keeping each agent's signal private. If an agent could see both agents' signals, he might want to violate the norms. For example, when agent 1 sees a red signal for himself as well as a red signal for agent 2, he has an incentive to choose −p, which violates his norm. But if agent 1 only sees his own red signal, then he knows that the probability of agent 2's signal being red is 1/2. Therefore if he chooses −p, his expected utility is 5/2, which is no larger than his expected utility of choosing +p. The incentive to violate disappears. Such treatment of publicity and privacy is also required when implementing the other types of norms below, in order to ensure agents' compliance.
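The signalling scheme just described is simple to mechanize. Below is a small Python sketch (our own illustration; the names draw_signals and NORMS are ours and the paper does not prescribe any implementation) that draws the random number, issues each agent only his private signal, and states the corresponding obligation.

```python
import random

# Turn the utilitarian correlated equilibrium of the chicken game,
# tau(+p,+q) = tau(-p,+q) = tau(+p,-q) = 1/3, into private randomized signals
# as described above: each agent only ever sees his own signal.

def draw_signals(rng=random):
    x = rng.random()                 # uniform number in [0, 1)
    if x < 1/3:                      # profile (+p, -q)
        return {1: "red", 2: "green"}
    elif x > 2/3:                    # profile (-p, +q)
        return {1: "green", 2: "red"}
    else:                            # profile (+p, +q)
        return {1: "red", 2: "red"}

# The norms map each private signal to an obligation.
NORMS = {
    1: {"red": "p is obligatory", "green": "not-p is obligatory"},
    2: {"red": "q is obligatory", "green": "not-q is obligatory"},
}

signals = draw_signals()
for agent in (1, 2):
    # Each agent is shown only signals[agent]; the joint draw stays private.
    print(f"agent {agent} sees {signals[agent]}: {NORMS[agent][signals[agent]]}")
```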
Now consider a different example, in which an ambulance meets a bus at a crossroads.

Example 3 (ambulance game) Let G = (Agent, P, π, S_1, S_2, Goal) where Agent = {1, 2}, P = {p, q}, π(1) = {p}, π(2) = {q}, S_1 = {{p}, ∅}, S_2 = {{q}, ∅}, Goal_1 = {⟨p, −10⟩, ⟨¬p ∧ q, 10⟩, ⟨¬p ∧ ¬q, −20⟩}, Goal_2 = {⟨q, 0⟩, ⟨p ∧ ¬q, 5⟩, ⟨¬p ∧ ¬q, −10⟩}. Here +p means that the ambulance stops, while +q means that the bus stops. The game is depicted by the following payoff matrix:

            +q            −q
  +p    (−10, 0)      (−10, 5)
  −p     (10, 0)     (−20, −10)

A distribution τ is a correlated equilibrium of this game if the following conditions are satisfied:

• τ(+p, +q) × (−10) + τ(+p, −q) × (−10) ≥ τ(+p, +q) × 10 + τ(+p, −q) × (−20)

• τ(−p, +q) × 10 + τ(−p, −q) × (−20) ≥ τ(−p, +q) × (−10) + τ(−p, −q) × (−10)

• τ(+p, +q) × 0 + τ(−p, +q) × 0 ≥ τ(+p, +q) × 5 + τ(−p, +q) × (−10)

• τ(+p, −q) × 5 + τ(−p, −q) × (−10) ≥ τ(+p, −q) × 0 + τ(−p, −q) × 0

Writing x_1 for τ(+p, +q), x_2 for τ(−p, +q), x_3 for τ(+p, −q) and x_4 for τ(−p, −q), the linear program for computing a utilitarian correlated equilibrium is:

  max z = −10x_1 + 10x_2 − 5x_3 − 30x_4

subject to

• 20x_1 − 10x_3 ≤ 0

• −20x_2 + 10x_4 ≤ 0

• 5x_1 − 10x_2 ≤ 0

• −5x_3 + 10x_4 ≤ 0

• x_1 + x_2 + x_3 + x_4 = 1

• x_1, x_2, x_3, x_4 ≥ 0

Using standard linear programming techniques, we find the solution x_2 = 1, x_1 = x_3 = x_4 = 0. Such a utilitarian correlated equilibrium can be implemented as a single norm:

• ¬p ∧ q is obligatory.

The norm created in the ambulance game matches real traffic rules quite well.

Theorem 1 Utilitarian correlated equilibria can be computed in polynomial time.

Proof: A utilitarian correlated equilibrium can be computed by linear programming, with maximizing the sum of the expected utilities of all agents as the objective function. A linear program can be solved in polynomial time using standard techniques such as the ellipsoid algorithm (Alevras and Padberg [AP01]). □
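As a sanity check on Theorem 1, the utilitarian linear program of the ambulance game above can be handed to an off-the-shelf LP solver. The sketch below uses SciPy's linprog (our choice of tooling; the paper does not prescribe a solver) with the same variable order x_1, ..., x_4 as in the text.

```python
import numpy as np
from scipy.optimize import linprog

# Utilitarian correlated equilibrium of the ambulance game.
# Variables: x1 = tau(+p,+q), x2 = tau(-p,+q), x3 = tau(+p,-q), x4 = tau(-p,-q).

c = [10, -10, 5, 30]          # linprog minimizes, so negate -10x1 + 10x2 - 5x3 - 30x4
A_ub = [[20,   0, -10,  0],   # 20*x1 - 10*x3 <= 0
        [ 0, -20,   0, 10],   # -20*x2 + 10*x4 <= 0
        [ 5, -10,   0,  0],   # 5*x1 - 10*x2 <= 0
        [ 0,   0,  -5, 10]]   # -5*x3 + 10*x4 <= 0
b_ub = [0, 0, 0, 0]
A_eq, b_eq = [[1, 1, 1, 1]], [1]          # probabilities sum to 1
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
print(np.round(res.x, 6), -res.fun)       # expect tau(-p,+q) = 1 and welfare 10
```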
3.2 Egalitarian correlated equilibrium

The concept of egalitarian social welfare is inspired by the work of Rawls [Raw71]. Intuitively, egalitarian social welfare is measured by the situation of the weakest member of society. Given a proposition control game (Agent, P, π, S_1, ..., S_n, Goal) and a correlated equilibrium τ, the egalitarian social welfare induced by τ is

  SW^e(τ) = min{EU_i(τ) : i ∈ Agent}.

Egalitarian social welfare offers a measure of fairness in cases where the minimum needs of all agents have to be satisfied.

Definition 4 (egalitarian correlated equilibrium) Given a proposition control game, let Θ be the set of all correlated equilibria of this game. τ ∈ Θ is an egalitarian correlated equilibrium if τ maximizes SW^e, i.e. for all τ' ∈ Θ, SW^e(τ) ≥ SW^e(τ').

Egalitarian norms are deontic statements describing the prescriptions of the randomized signals transformed from egalitarian correlated equilibria. At first sight, computing an egalitarian correlated equilibrium seems harder than computing a utilitarian one, because the egalitarian social welfare is not a linear function. The good news is that we can still use linear programming to compute egalitarian correlated equilibria efficiently, as the following theorem shows.

Theorem 2 Egalitarian correlated equilibria can be computed in polynomial time.

Proof: We use linear programming to compute an egalitarian correlated equilibrium. The variables of the linear program are the values τ(s), τ(s'), ..., representing the probability assigned to each strategy profile of the game, together with a variable z representing the egalitarian social welfare. The objective function is to maximize z. There are four groups of constraints in this linear program.

1. τ(s) ≥ 0, for all s ∈ S_1 × ... × S_n.

2. Σ_{s ∈ S_1 × ... × S_n} τ(s) = 1.

These two groups of constraints together ensure that τ is a probability distribution over strategy profiles.

3. For all i and for all s_i, s'_i ∈ S_i,

  Σ_{s_{−i} ∈ S_{−i}} τ(s_i, s_{−i}) × u_i(s_i, s_{−i}) ≥ Σ_{s_{−i} ∈ S_{−i}} τ(s_i, s_{−i}) × u_i(s'_i, s_{−i}).

The third group of constraints says that every solution of this linear program must be a correlated equilibrium.

4. For all i,

  Σ_{s ∈ S_1 × ... × S_n} u_i(s) × τ(s) ≥ z.

The fourth group of constraints ensures that z is no larger than the expected utility of any agent in the correlated equilibrium.

These four groups of constraints together ensure that a solution of this linear program indeed maximizes the egalitarian social welfare. It can be verified that the size of this linear program is polynomial in the size of the given proposition control game. Since a linear program can be solved in polynomial time, egalitarian correlated equilibria can be computed in polynomial time. □
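For the chicken game, the max-min linear program of Theorem 2's proof can be written out explicitly and solved as follows (our sketch, again using SciPy as an assumed solver; the variable order x_1, ..., x_4 is as before and z is the auxiliary variable for the minimum expected utility).

```python
from scipy.optimize import linprog

# Egalitarian correlated equilibrium of the chicken game.
# Variables: [x1, x2, x3, x4, z].

c = [0, 0, 0, 0, -1]                 # maximize z  <=>  minimize -z
A_ub = [
    [ 1,  0, -1,  0, 0],             # correlated-equilibrium constraints
    [ 0, -1,  0,  1, 0],             # (same four inequalities as in Section 3.1)
    [ 1, -1,  0,  0, 0],
    [ 0,  0, -1,  1, 0],
    [-4, -5, -1,  0, 1],             # z <= EU_1 = 4x1 + 5x2 + x3
    [-4, -1, -5,  0, 1],             # z <= EU_2 = 4x1 + x2 + 5x3
]
b_ub = [0] * 6
A_eq, b_eq = [[1, 1, 1, 1, 0]], [1]  # x1 + x2 + x3 + x4 = 1
bounds = [(0, None)] * 4 + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
print(res.x[:4], -res.fun)           # expect x1 = x2 = x3 = 1/3 and z = 10/3
```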
3.3 Nash product correlated equilibrium

Given a proposition control game (Agent, P, π, S_1, ..., S_n, Goal) in which every goal weight of every agent is non-negative, and a correlated equilibrium τ, the Nash product social welfare induced by τ is

  SW^n(τ) = Π_{i ∈ Agent} EU_i(τ).

Nash product social welfare can be regarded as a compromise between utilitarian and egalitarian social welfare. On the one hand, just like utilitarian social welfare, the Nash product increases whenever a single agent's utility increases. On the other hand, just like egalitarian social welfare, the Nash product reaches its maximum, for a fixed total utility, when the utilities are distributed equally over all agents.

Definition 5 (Nash product correlated equilibrium) Given a proposition control game, let Θ be the set of all correlated equilibria of this game. τ ∈ Θ is a Nash product correlated equilibrium if τ maximizes SW^n, i.e. for all τ' ∈ Θ, SW^n(τ) ≥ SW^n(τ').

Nash product norms are deontic statements describing the prescriptions of the randomized signals transformed from Nash product correlated equilibria. The following theorem shows that the computation of Nash product correlated equilibria is tractable.

Theorem 3 Nash product correlated equilibria can be computed in polynomial time.

Proof: We use convex optimization to compute a Nash product correlated equilibrium. (Footnote 3: A comprehensive introduction to convex optimization can be found in Boyd and Vandenberghe [BV04].) Let s^1, ..., s^k be all strategy profiles of the given game and let the variables τ(s^1), ..., τ(s^k) represent the probability assigned to each strategy profile. To compute a Nash product correlated equilibrium we need to maximize the Nash product social welfare

  SW^n(τ) = Π_{i ∈ Agent} Σ_{s ∈ S} τ(s) × u_i(s) = Π_{i ∈ Agent} (τ(s^1) × u_i(s^1) + ... + τ(s^k) × u_i(s^k)).

Since every goal weight is non-negative, every u_i(s^j) is non-negative, and hence every expected utility EU_i(τ) = τ(s^1) × u_i(s^1) + ... + τ(s^k) × u_i(s^k) is a non-negative affine function of the variables τ(s^1), ..., τ(s^k). Maximizing SW^n(τ) is therefore equivalent to minimizing

  −log SW^n(τ) = Σ_{i ∈ Agent} −log EU_i(τ).     (⋆)

Since f(x) = −log x is a convex function and each EU_i(τ) is affine, each term −log EU_i(τ) is convex, and therefore (⋆) is a convex function. We take minimizing (⋆) as the objective function and build a convex optimization problem by imposing the first three groups of constraints from the proof of Theorem 2, namely the probability constraints and the correlated equilibrium constraints. Such a convex optimization problem can be solved by standard techniques from convex optimization, for instance interior-point methods, in polynomial time. □

3.4 Opportunity-balanced correlated equilibrium

A fourth kind of correlated equilibrium of interest to us is what we call opportunity-balanced correlated equilibrium. Intuitively, an opportunity-balanced correlated equilibrium is the average of correlated equilibria each of which maximizes a single agent's expected utility.

Definition 6 (opportunity-balanced correlated equilibrium) Given a proposition control game (Agent, P, π, S_1, ..., S_n, Goal), an opportunity-balanced correlated equilibrium is a correlated equilibrium τ such that τ(s) = (τ_1(s) + ... + τ_n(s))/n, where each τ_i is a correlated equilibrium which maximizes the expected utility of agent i.

The set of all correlated equilibria of a game is a convex set, because correlated equilibria are defined by linear constraints. Therefore the average of finitely many correlated equilibria is again a correlated equilibrium.

Example 4 (opportunity-balanced correlated equilibrium for the chicken game) We first solve two linear programs, with objective functions max 4x_1 + 5x_2 + x_3 and max 4x_1 + x_2 + 5x_3 respectively, subject to the correlated equilibrium constraints of Section 3.1. Solving the first linear program gives τ_1 with τ_1(−p, +q) = 1 and probability 0 on the other profiles. Solving the second linear program gives τ_2 with τ_2(+p, −q) = 1 and probability 0 on the other profiles. We then obtain the opportunity-balanced correlated equilibrium τ by taking the average of τ_1 and τ_2: τ(−p, +q) = τ(+p, −q) = 1/2 and τ(+p, +q) = τ(−p, −q) = 0.

Example 5 (opportunity-balanced correlated equilibrium for the ambulance game) We first solve two linear programs, with objective functions max −10x_1 + 10x_2 − 10x_3 − 20x_4 and max 5x_3 − 10x_4 respectively, subject to the correlated equilibrium constraints of Example 3. Solving the first linear program gives τ_1 with τ_1(−p, +q) = 1 and probability 0 on the other profiles. Solving the second linear program gives τ_2 with τ_2(+p, −q) = 1 and probability 0 on the other profiles. The opportunity-balanced correlated equilibrium τ is their average: τ(−p, +q) = τ(+p, −q) = 1/2 and τ(+p, +q) = τ(−p, −q) = 0.

Opportunity-balanced norms are deontic statements describing the prescriptions of the randomized signals transformed from opportunity-balanced correlated equilibria. The following theorem shows that the computation of opportunity-balanced correlated equilibria is tractable.

Theorem 4 Opportunity-balanced correlated equilibria can be computed in polynomial time.

Proof: Given an arbitrary proposition control game with n agents, an opportunity-balanced correlated equilibrium can be computed as follows:

1. For each agent, use linear programming to compute a correlated equilibrium which maximizes that agent's expected utility.

2. Take the average of the n correlated equilibria obtained in the previous step.

Since each linear program can be solved in polynomial time, an opportunity-balanced correlated equilibrium can be computed in polynomial time. □
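The two-step procedure of Theorem 4 is easy to reproduce for Example 4. The sketch below (our illustration, again assuming SciPy's LP solver; the helper name maximize_agent is ours) solves the two per-agent linear programs for the chicken game and averages the results.

```python
import numpy as np
from scipy.optimize import linprog

# Opportunity-balanced correlated equilibrium of the chicken game (Example 4):
# maximize each agent's expected utility subject to the correlated-equilibrium
# constraints, then average the two solutions.
# Variables: x1 = tau(+p,+q), x2 = tau(-p,+q), x3 = tau(+p,-q), x4 = tau(-p,-q).

A_ub = [[1,  0, -1, 0],              # correlated-equilibrium constraints of Section 3.1
        [0, -1,  0, 1],
        [1, -1,  0, 0],
        [0,  0, -1, 1]]
b_ub = [0] * 4
A_eq, b_eq = [[1, 1, 1, 1]], [1]

def maximize_agent(expected_utility_coeffs):
    res = linprog([-w for w in expected_utility_coeffs], A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, method="highs")
    return np.array(res.x)

tau_1 = maximize_agent([4, 5, 1, 0])   # EU_1 = 4x1 + 5x2 + x3
tau_2 = maximize_agent([4, 1, 5, 0])   # EU_2 = 4x1 + x2 + 5x3
print((tau_1 + tau_2) / 2)             # expect [0, 0.5, 0.5, 0], as in Example 4
```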
4 Related work

Besides the literature mentioned in the introductory section, the generation of norms has also been studied in sociology and philosophy. In Binmore [Bin05], social norms are tools for solving the equilibrium selection problem in coordination games with multiple Nash equilibria. His approach is based on evolutionary game theory, as he views the interaction of agents' strategies as infinitely repeated games. Since not all equilibria of a game are Pareto optimal, norms arise to solve the equilibrium selection problem. Other evolutionary game-theoretical approaches to norm emergence can be found in Alexander [Ale07] and Skyrms [Sky14].

Gintis [Gin10] presents his work as a complementary step to the work of Binmore. Instead of Nash equilibrium, Gintis proposes to use correlated equilibrium to generate norms. He regards norms as choreographers who send signals to agents to guide their behavior. A rational agent will follow the suggestion of the choreographer if he believes that the other agents are following it too. This paper is a further development of Gintis' original work from a computational perspective.

We understand creation and emergence as two different methods of norm generation. Concepts from evolutionary game theory, like evolutionarily stable strategies and population dynamics, seem more suitable for norm emergence, while concepts from traditional game theory, like Nash and correlated equilibrium, are more suitable for norm creation.

5 Summary and future work

In this paper we have studied how to create norms in proposition control games. We consider norms as deontic statements which describe the prescriptions given by certain randomized signals whose purpose is to guide agents' behavior. Such randomized signals are implemented from correlated equilibria of games. Four types of norms are studied: utilitarian norms, egalitarian norms, Nash product norms and opportunity-balanced norms. All of these norms can be created in polynomial time in our framework. Moreover, since all of them are created from correlated equilibria, it is in each agent's interest to comply with them.

An avenue for future work is to investigate the limitations of norms. In games with a unique equilibrium in dominant strategies, like the well-known prisoner's dilemma, the only norm our method creates is the norm indicating that the Nash equilibrium is obligatory, because that Nash equilibrium is also the unique correlated equilibrium. In such situations norms are therefore not helpful for avoiding Pareto-dominated outcomes. What we need there is a stronger mechanism, for example laws that enforce punishment, to help agents avoid undesirable outcomes.

References

[Ale07] J. McKenzie Alexander. The Structural Evolution of Morality. Cambridge University Press, 2007.

[AP01] Dimitris Alevras and Manfred W. Padberg. Linear Optimization and Extensions: Problems and Solutions. Springer, 2001.

[Aum74] Robert Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67–96, 1974.

[ÅvdHW12] Thomas Ågotnes, Wiebe van der Hoek, and Michael Wooldridge. Conservative social laws. In Luc De Raedt, Christian Bessière, Didier Dubois, Patrick Doherty, Paolo Frasconi, Fredrik Heintz, and Peter J. F. Lucas, editors, ECAI 2012, 20th European Conference on Artificial Intelligence, Montpellier, France, August 27-31, 2012, volume 242 of Frontiers in Artificial Intelligence and Applications, pages 49–54. IOS Press, 2012.

[ÅW10] Thomas Ågotnes and Michael Wooldridge. Optimal social laws.
In Wiebe van der Hoek, Gal A. Kaminka, Yves Lespérance, Michael Luck, and Sandip Sen, editors, 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), Toronto, Canada, May 10-14, 2010, pages 667–674. IFAAMAS, 2010.

[Bin05] Ken Binmore. Natural Justice. Oxford University Press, 2005.

[BLLZ06] Elise Bonzon, Marie-Christine Lagasquie-Schiex, Jérôme Lang, and Bruno Zanuttini. Boolean games revisited. In Gerhard Brewka, Silvia Coradeschi, Anna Perini, and Paolo Traverso, editors, ECAI 2006, 17th European Conference on Artificial Intelligence, Riva del Garda, Italy, volume 141 of Frontiers in Artificial Intelligence and Applications, pages 265–269. IOS Press, 2006.

[Bon07] Elise Bonzon. Modélisation des interactions entre agents rationnels : les jeux booléens. PhD thesis, Université Paul Sabatier, 2007.

[BV04] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[CDE+06] Yann Chevaleyre, Paul E. Dunne, Ulle Endriss, Jérôme Lang, Michel Lemaître, Nicolas Maudet, Julian A. Padget, Steve Phelps, Juan A. Rodríguez-Aguilar, and Paulo Sousa. Issues in multiagent resource allocation. Informatica (Slovenia), 30(1):3–31, 2006.

[DGP09] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. Communications of the ACM, 52(2):89–97, 2009.

[Gin10] Herbert Gintis. Social norms as choreography. Politics, Philosophy & Economics, 9(3):251–264, 2010.

[MLE11] Javier Morales, Maite López-Sánchez, and Marc Esteva. Using experience to generate new regulations. In Toby Walsh, editor, IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, pages 307–312. IJCAI/AAAI, 2011.

[MLR+15] Javier Morales, Maite López-Sánchez, Juan Antonio Rodríguez-Aguilar, Michael Wooldridge, and Wamberto Weber Vasconcelos. Synthesising liberal normative systems. In Gerhard Weiss, Pinar Yolum, Rafael H. Bordini, and Edith Elkind, editors, Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015), Istanbul, Turkey, pages 433–441. ACM, 2015.

[PR08] Christos H. Papadimitriou and Tim Roughgarden. Computing correlated equilibria in multi-player games. Journal of the ACM, 55(3), 2008.

[Raw71] John Rawls. A Theory of Justice. Oxford University Press, 1971.

[SA07] Sandip Sen and Stéphane Airiau. Emergence of norms through social learning. In Manuela M. Veloso, editor, IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, pages 1507–1512, 2007.

[Sky14] Brian Skyrms. Evolution of the Social Contract. Cambridge University Press, 2014.

[ST96] Yoav Shoham and Moshe Tennenholtz. On social laws for artificial agent societies: Off-line design. Artificial Intelligence, 73(1-2):231–252, 1996.

[ST97] Yoav Shoham and Moshe Tennenholtz. On the emergence of social conventions: Modeling, analysis, and simulations. Artificial Intelligence, 94(1-2):139–166, 1997.

[vdHRW07] Wiebe van der Hoek, Mark Roberts, and Michael Wooldridge. Social laws in alternating time: effectiveness, feasibility, and synthesis. Synthese, 156(1):1–19, 2007.