The Force of Innovation: Emergence and Extinction of Messages in Signaling Games

Roland Mühlenbernd, Jonas David Nick, and Christian Adam
University of Tübingen

Abstract. Lewis [L1] invented signaling games to show that meaning conventions can arise simply from regularities in communicative behavior. The precondition for the emergence of such conventions is a so-called perfect signaling system. In a series of articles the emergence of such signaling systems has been addressed by combining signaling games with learning dynamics, and researchers have frequently examined the circumstances that impede the emergence of perfect signaling. It has been shown that perfect signaling becomes more and more improbable, in particular as the number of states, messages and actions of a signaling game increases. This paper addresses the question of how the capability of innovation, through the emergence of new messages and the extinction of unused messages, changes these outcomes. Our results show that innovation in fact supports the emergence of perfect signaling.

1 Introduction

Signaling games have recently become a leading model for exploring the evolution of semantic meaning. In line with this trend, researchers have used simulations to explore agents' behavior in repeated signaling games. Within this field of study two different lines of research are apparent: i) the simulation of a repeated 2-player signaling game combined with agent-based learning dynamics, in the majority of cases reinforcement learning (e.g. [B1], [BZ1], [S1]), and ii) evolutionary models that simulate population behavior, wherein signaling games are usually combined with population-based replicator dynamics (e.g. [HH1], [HSRZ1]). To fill the gap between both accounts, recent work applies repeated signaling games combined with agent-based dynamics to social network structures, or at least to multi-agent accounts (e.g. [Z1], [W1], [M1], [MF1]). With this paper we want to contribute to this line of research.

Barrett [B1] showed that i) for the simplest variant of a signaling game, the so-called Lewis game, combined with a basic version of reinforcement learning in a repeated 2-player game, conventions of meaningful language use emerge in every case, but ii) by extending the domains¹ of the signaling game, such conventions become more and more improbable. Furthermore, the number of possible perfect signaling systems increases dramatically. This may explain why researchers have so far applied only the simple Lewis game to populations and have kept their hands off domain-extended signaling games: if even two players fail to learn perfect signaling from time to time, multiple players will not only share this problem, but will also be confronted with an environment evolving towards Babylon, where a great many different signaling systems may emerge.

¹ With domains we refer to the sets of states, messages and actions, which will be introduced in the following section.
With this article we will show that by extending the learning dynamics to allow for innovation we can observe i) an improvement of the probability that perfect signaling emerges for domain-extended signaling games and ii) a restriction of the number of perfect signaling systems that evolve in a population, even if the number of possible systems is huge. This article is structured as follows: in Section 2 we introduce some basic notions of repeated signaling games, reinforcement learning dynamics and multi-agent accounts; in Section 3 we take a closer look at the variant of reinforcement dynamics we used, a derivative of Bush-Mosteller reinforcement; Section 4 shows how implementing the innovation of new messages and the extinction of unused messages significantly improves our results; we finish with some implications of our approach in Section 5.

2 Signaling Games and Learning

A signaling game SG = ⟨{S, R}, T, M, A, Pr, U⟩ is a game played between a sender S and a receiver R. Initially, nature selects a state t ∈ T with prior probability Pr(t) ∈ ∆(T)², which the sender observes, but the receiver doesn't. S then selects a message m ∈ M, and R responds with a choice of action a ∈ A. For each round of play, players receive utilities depending on the actual state t and the response action a. We will here be concerned with a variant of this game where the number of states equals the number of actions (|T| = |A|). For each state t ∈ T there is exactly one action a ∈ A that leads to successful communication. This is expressed by the utility function U(t_i, a_j) = 1 if i = j, and 0 otherwise. This utility function expresses the particular nature of a signaling game: because successful communication doesn't depend on the message used, there is no predefined meaning of messages. A signaling game with n states and n messages is called an n × n-game, whereby n is called the domain of the game.

² ∆(X) : X → ℝ denotes a probability distribution over the random variable X.

2.1 Strategies and Signaling Systems

Although messages are initially meaningless in this game, meaningfulness arises from regularities in behavior. Behavior is defined in terms of strategies. A behavioral sender strategy is a function σ : T → ∆(M), and a behavioral receiver strategy is a function ρ : M → ∆(A). A behavioral strategy can be interpreted as a single agent's probabilistic choice or as a population average. For a 2 × 2-game, also called a Lewis game, exactly two isomorphic strategy profiles constitute perfect signaling systems. In these, strategies are pure (i.e. action choices have probabilities 1 or 0) and messages associate states and actions uniquely, as depicted in Figure 1. It is easy to show that for an n × n-game the number of perfect signaling systems is n!. This means that while for a Lewis game we get the 2 signaling systems mentioned above, for a 3 × 3-game we get 6, for a 4 × 4-game 24, and for an 8 × 8-game more than 40,000 perfect signaling systems. Moreover, for n × n-games with n > 2 there is the possibility of partial pooling equilibria, which transmit information in only a fraction of all possible cases.

Fig. 1. Two perfect signaling systems L1 and L2 of a 2 × 2-game, each consisting of a pure sender and receiver strategy: in L1, state t1 is linked to message m1 and action a1, and state t2 to m2 and a2; in L2 the two messages are swapped.
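The n! count is easy to verify computationally. The following short Python sketch (our own illustration, not part of the original paper; all function names are ours) enumerates the perfect signaling systems of an n × n-game as permutations and checks that each of them yields utility 1 for every state:

```python
from itertools import permutations

def perfect_signaling_systems(n):
    """Enumerate all perfect signaling systems of an n x n-game.

    A system is a pair (sender, receiver) of pure strategies such that the
    receiver maps the sender's message for state t_i back to action a_i,
    i.e. communication succeeds (U = 1) for every state.
    """
    systems = []
    for sender in permutations(range(n)):      # sender: state index -> message index
        receiver = [0] * n                     # receiver: message index -> action index
        for state, message in enumerate(sender):
            receiver[message] = state          # invert the sender permutation
        systems.append((list(sender), receiver))
    return systems

def expected_utility(sender, receiver):
    """Average utility over equally distributed states for a pure strategy pair."""
    n = len(sender)
    return sum(1 for t in range(n) if receiver[sender[t]] == t) / n

if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        systems = perfect_signaling_systems(n)
        assert all(expected_utility(s, r) == 1.0 for s, r in systems)
        print(f"{n} x {n}-game: {len(systems)} perfect signaling systems")
```

For n = 2, 3, 4 and 8 this prints 2, 6, 24 and 40,320 systems, i.e. exactly the n! values mentioned above.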
2.2 Models of Reinforcement Learning

The simplest model of reinforcement learning is Roth-Erev reinforcement (see [RE1]). It can be captured by a simple urn model, known as Pólya urns, which works in the following way: an urn contains balls of different types, each type corresponding to an action choice. Drawing a ball means performing the corresponding action. An action choice can be successful or unsuccessful, and in the former case the number of balls of the corresponding type is increased by one, such that the probability of this action choice is increased for subsequent draws. All in all, this model ensures that the probability of making a particular decision depends on the number of balls in the urn and therefore on the success of past action choices. This leads to the effect that the more successful an action choice is, the more probable it becomes in subsequent draws.

Roth-Erev reinforcement, however, has the property that after a while the learning effect³ slows down: while the number of additional balls for a successful action is a static number α (in the general case α = 1, as mentioned above), the overall number of balls in the urn increases over time. For example, if the number of balls in the urn at time τ is n, the number at a later time τ + ε must be m ≥ n. Thus the learning effect changes from α/n to α/m and can therefore only decrease over time.

³ The learning effect is the ratio of the number of additional balls for a successful action choice to the overall number of balls in the urn.

Bush-Mosteller reinforcement (see [BM1]) is similar to Roth-Erev reinforcement, but without the learning effect slowing down. After a reinforcement the overall number of balls in an urn is adjusted to a fixed value c, while preserving the ratio of the different ball types. Thus the number of balls in the urn at time τ is c, the number at a later time τ + ε is also c, and consequently the learning effect stays stable over time at α/c.

A further modification is the addition of negative reinforcement: while in the standard account unsuccessful actions have no effect on the urn content, with negative reinforcement unsuccessful communication is punished by decreasing the number of balls leading to the unsuccessful action.

By combining Bush-Mosteller reinforcement with negative reinforcement, the resulting learning dynamic follows the concept of lateral inhibition. In particular, a successful action will not only increase its own probability, but also decrease the probability of competing actions. In our account lateral inhibition applies to negative reinforcement as well: for an unsuccessful action the number of the corresponding balls is decreased, while the number of each other type of ball is increased.

2.3 Applying Reinforcement Learning to Repeated Signaling Games

To apply reinforcement learning to signaling games, sender and receiver both have urns for the different states and messages and make their decisions by drawing a ball from the appropriate urn. We assume that states are equally distributed. The sender has an urn ✵t for each state t ∈ T, which contains balls for the different messages m ∈ M. The number of balls of type m in urn ✵t is designated by m(✵t), the overall number of balls in urn ✵t by |✵t|. If the sender is faced with a state t, she draws a ball from urn ✵t and sends message m if the ball is of type m.
Accordingly, the receiver has an urn ✵m for each message m ∈ M, which contains balls for the different actions a ∈ A, whereby the number of balls of type a in urn ✵m is designated by a(✵m), the overall number of balls in urn ✵m by |✵m|. For a received message m the receiver draws a ball from urn ✵m and plays action a if the ball is of type a. Thus the sender's behavioral strategy σ and the receiver's behavioral strategy ρ can be defined in the following way:

σ(m|t) = m(✵t) / |✵t|    (1)
ρ(a|m) = a(✵m) / |✵m|    (2)

The learning dynamics are realized by changing the urn contents depending on communicative success. For a Roth-Erev reinforcement account with a positive update value α ∈ ℕ>0 and a lateral inhibition value γ ∈ ℕ≥0, the following update process is executed after each round of play: if communication via t, m and a is successful, the number of balls in the sender's urn ✵t is increased by α balls of type m and reduced by γ balls of each type m′ ≠ m. Similarly, the number of balls in the receiver's urn ✵m is increased by α balls of type a and reduced by γ balls of each type a′ ≠ a.

Furthermore, for an account with negative reinforcement, urn contents also change in the case of unsuccessful communication, with a negative update value β ∈ ℕ≥0, in the following way: if communication via t, m and a is unsuccessful, the number of balls in the sender's urn ✵t is decreased by β balls of type m and increased by γ balls of each type m′ ≠ m; the number of balls in the receiver's urn ✵m is decreased by β balls of type a and increased by γ balls of each type a′ ≠ a. The lateral inhibition value γ ensures that the probability of an action can become zero, and it speeds up the learning process.

We extended Bush-Mosteller reinforcement in order to apply it to games with more than two messages. The contents of the sender and receiver urns involved in a round are adjusted to a predefined value in the following way: for the given value c of fixed urn content it is assumed that before a round of play every sender and receiver urn has content |✵| = c. After a round of play it may be the case that the urn content is |✵| = d ≠ c. Then the number n_i of balls of each type i is multiplied by c/d.⁴ For two messages this extension is equivalent to the original Bush-Mosteller model when its learning parameter is set to φ = c·α/(c + α).

⁴ In this account urn contents and numbers of balls are real numbers.
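As an illustration of the urn dynamics just described, here is a minimal Python sketch (ours, not the authors' implementation; the class and method names are our own, while the parameters α, β, γ and c follow the text):

```python
import random

class UrnAgent:
    """Urn-based sender/receiver strategies with the extended Bush-Mosteller update.

    Parameter names follow the text: alpha (positive update), beta (negative update),
    gamma (lateral inhibition), c (fixed urn content). Urn contents are real numbers,
    as in the paper's extension.
    """

    def __init__(self, states, messages, actions, alpha=1.0, beta=1.0, c=20.0):
        self.alpha, self.beta, self.c = alpha, beta, c
        self.gamma = 1.0 / len(states)
        # one sender urn per state, one receiver urn per message, each filled uniformly to content c
        self.sender = {t: {m: c / len(messages) for m in messages} for t in states}
        self.receiver = {m: {a: c / len(actions) for a in actions} for m in messages}

    @staticmethod
    def _draw(urn):
        balls, weights = zip(*urn.items())
        return random.choices(balls, weights=weights)[0]

    def send(self, state):
        return self._draw(self.sender[state])        # behavioral strategy sigma

    def act(self, message):
        return self._draw(self.receiver[message])    # behavioral strategy rho

    def _update(self, urn, chosen, success):
        # reinforce or punish the chosen ball type, laterally inhibit the competing types
        for ball in urn:
            if ball == chosen:
                urn[ball] += self.alpha if success else -self.beta
            else:
                urn[ball] += -self.gamma if success else self.gamma
            urn[ball] = max(urn[ball], 0.0)
        # Bush-Mosteller step: rescale so that the urn again contains c balls
        total = sum(urn.values())
        if total > 0:
            for ball in urn:
                urn[ball] *= self.c / total

    def learn(self, state, message, action, success):
        self._update(self.sender[state], message, success)
        self._update(self.receiver[message], action, success)
```

A round of play between two such agents then amounts to drawing a state, calling send and act, comparing state and action, and calling learn for the urns involved.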
2.4 Multi-Agent Accounts

It is interesting to examine not only the classical 2-player sender-receiver game, but also the behavior of agents in a society (e.g. [Z1], [W1], [M1], [MF1]): more than 2 agents interact with each other and switch between the sender and the receiver role. In this way an agent can learn both a sender and a receiver strategy. If such a combination forms a signaling system, it is called a signaling language and the corresponding agent is called a learner. Thus the number of different possible signaling languages is given by the number of possible signaling systems, and therefore for an n × n-game there are n! different languages an agent can learn. Furthermore, if an agent's combination of sender and receiver strategy forms a pooling system, it is called a pooling language. It is easy to show that the number of possible pooling languages exceeds the number of possible signaling languages for any kind of n × n-game.

3 Simulating Bush-Mosteller

Barrett (see [B1]) simulated repeated signaling games with Roth-Erev reinforcement in the classical sender-receiver variant and computed the run failure rate (RFR): the proportion of runs not ending with communication via a perfect signaling system. Barrett started 10⁵ runs for n × n-games with n ∈ {2, 3, 4, 8}. His results show that 100% of the 2 × 2-game runs were successful (RFR = 0), but for n × n-games with n > 2 the RFR increases rapidly (Figure 2, left).

To compare different dynamics, we started two lines of simulation runs for Bush-Mosteller reinforcement in the sender-receiver variant with urn content parameter c = 20 and reinforcement value α = 1. For the second line we additionally used lateral inhibition with value γ = 1/|T|. We tested the same games as Barrett, with correspondingly 10⁵ runs per game. In comparison with Barrett's findings, our simulation outcomes i) also resulted in an RFR of 0 for the 2 × 2-game, but ii) revealed an improvement with Bush-Mosteller reinforcement for the other games, especially in combination with lateral inhibition (see Figure 2, right). Nevertheless, the RFR is never 0 for n × n-games with n > 2 and gets worse for increasing n-values, independently of the dynamics.

Fig. 2. Left: Barrett's results (RFR of Roth-Erev reinforcement) for different n × n-games:

Game    RFR
2 × 2   0%
3 × 3   9.6%
4 × 4   21.9%
8 × 8   59.4%

Right: comparison of different learning dynamics for the 3 × 3-, 4 × 4- and 8 × 8-games: Barrett's results for Roth-Erev reinforcement and our results for Bush-Mosteller reinforcement without and with lateral inhibition.

To analyze the behavior of agents in a multi-agent account, we started with the smallest group of agents in our simulations: three agents arranged in a complete network. In contrast to our first simulations, all agents communicate both as sender and as receiver and can learn not only a perfect signaling system, but a signaling language. Furthermore, we examined not only whether the agents learned a language, but how many agents learned one. With this account we started between 500 and 1000 simulation runs with Bush-Mosteller reinforcement (α = 1, c = 20) for n × n-games with n = 2 . . . 8. Each simulation run stopped when every agent in the network had learned a signaling language or a pooling language. We measured the percentage of simulation runs ending with no, one, two or three agents having learned a signaling language.

Fig. 3. Left: Percentage of simulation runs ending with a specific number of learners of signaling languages in a network with three agents for different n × n-games with n = 2 . . . 8. Right: Average percentage of agents learning a signaling language over all runs for different n × n-games with n = 2 . . . 8; comparison of the results for a complete network of 3 agents (white circles) and 5 agents (black circles).

We got the following results: for a 2 × 2-game, all three agents learned the same signaling language in more than 80% of all simulation runs. But for a 3 × 3-game, in less than a third of all runs all three agents learned a signaling language; in more than 40% of all runs two agents learned a signaling language and the third one a pooling language. And it gets even worse for higher n × n-games. For example,
for an 8 × 8-game, in almost 80% of all runs no agent learned a signaling language, and in no run did all agents learn one. Figure 3 (left) depicts the distribution of how many agents learned a signaling language (no learner, only one learner, two learners, or all three agents being learners of a signaling language) for n × n-games with n = 2 . . . 8.⁵

⁵ Note: further tests with Bush-Mosteller reinforcement in combination with negative reinforcement and/or lateral inhibition revealed that in some cases the results could be improved for 2 × 2-games, but they were in every case worse for all games with larger domains.

In addition we were interested in whether and how the results change when the number of agents is increased. Thus, in another line of experiments we tested the behavior of a complete network of 5 agents for comparison with the results of the 3-agent account. Figure 3 (right) shows the average number of agents who learned a signaling language per run for different n × n-games. As can be seen, for 2 × 2-games and 3 × 3-games the larger population leads to a higher average percentage of agents learning a signaling language. But for games with larger domains the results are by and large the same.

The results for the classical sender-receiver game reveal that by extending the learning accounts the probability of the emergence of perfect signaling systems can be improved, but it is never one for an n × n-game if n is large enough. Furthermore, the results for the multi-agent account with only three agents show that even for a 2 × 2-game not all agents learn a language in every run. And for games with larger domains, the results get worse. Moreover, the results do not change substantially with the number of agents, as the multi-agent account with 5 agents shows. But how could natural languages arise, if we assume that they emerged from n × n-games with a huge n-value and in a society of many more interlocutors? We will show that by allowing for the extinction of unused messages and the emergence of new messages, perfect signaling systems emerge for huge n-values and multiple agents in every case. In other words, we will show that stabilization needs innovation.

4 Innovation

The idea of innovation in our account is that messages can become extinct and new messages can emerge; thus the number of messages during a repeated play can vary, whereas the number of states is fixed. The idea of innovation and extinction for reinforcement learning applied to signaling games stems from Skyrms ([S1]); to our knowledge, however, it is completely new i) to combine it with Bush-Mosteller reinforcement plus negative reinforcement and ii) to use it in multi-agent accounts.

The process of the emergence of new messages works as follows: in addition to the balls for each message type, each sender urn contains an amount of innovative balls (following Skyrms, we call them black balls). If the sender draws a black ball, she sends a completely new message, never used by any agent of the population. Because the receiver has no receiver urn for this new message, he chooses a random action.
If action and state match, the new message is adopted into the set of known messages of both interlocutors in the following way: i) both agents get a receiver urn for the new message, in which the balls for all actions are equiprobably distributed, ii) both agents' sender urns are filled with a predefined amount of balls for the new message, and iii) the sender and receiver urns involved in this round are updated according to the learning dynamics. If the newly invented message does not lead to successful communication, the message is discarded and there is no change in the agents' strategies.

As mentioned before, messages can become extinct, and that happens in the following way: because of lateral inhibition, the number of balls for infrequently used or unused messages in the sender urns gets lower and lower. Once the number of balls for a message is 0 in all sender urns, the message is no longer in the agent's active use (in other words, she cannot send the message anymore), and it is also removed from the agent's passive use by deleting the corresponding receiver urn. At this point the message is no longer in this agent's set of known messages. Apart from this, there is no other interference between the sender and receiver urns of one agent. Some further notes:

– it is possible for an agent to receive a message that is not in her set of known messages. In this case she adopts the new message as described for the case of innovation. Note that in a multi-agent setup this allows for a spread of new messages
– the black balls are also affected by lateral inhibition. This means that the number of black balls can decrease and increase during runtime; in particular, it can be zero
– a game with innovation has a dynamic number of messages, starting with 0 messages but ending with |M| = |T|. Thus we call an innovation game with n states and n ultimate messages an n × n∗-game

4.1 The Force of Innovation

The total number of black balls in an agent's sender urns describes this agent's personal force of innovation. Note that black balls can only increase by lateral inhibition in the case of unsuccessful communication and decrease by lateral inhibition in the case of successful communication. This interrelationship leads to the following dynamics: successful communication lowers the personal force of innovation, whereas unsuccessful communication raises it. If we define the global force of innovation of a group of connected agents X as the average personal force of innovation over all x ∈ X, then the following holds: the better the communication between the agents in a group X, the lower the global force of innovation of this group, and vice versa. In other words, this account realizes a plausible social dynamic: if communication works, there is no need to change and therefore the force of innovation is low (or zero), whereas if communication does not work, the force of innovation rises.
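The following Python fragment sketches the innovation and extinction mechanics described above (again our own illustration, not the authors' code; urns are plain dictionaries as in the earlier sketch, and helpers such as fresh_message are hypothetical):

```python
import random

BLACK = "<black>"   # label for the innovative (black) balls; the name is our own convention

def draw_message(sender_urn, fresh_message):
    """Draw from a sender urn that also contains black balls.

    Drawing a black ball means sending a completely new message; `fresh_message`
    is assumed to return a message never used by any agent of the population.
    """
    balls, weights = zip(*sender_urn.items())
    draw = random.choices(balls, weights=weights)[0]
    return fresh_message() if draw == BLACK else draw

def adopt_message(sender_urns, receiver_urns, message, actions, init_balls=1.0):
    """Adopt a newly encountered message after successful communication:
    i) a new receiver urn with equiprobably distributed action balls,
    ii) a predefined amount of balls for the message in every sender urn."""
    receiver_urns[message] = {a: 1.0 for a in actions}
    for urn in sender_urns.values():
        urn[message] = init_balls

def remove_extinct(sender_urns, receiver_urns):
    """Extinction: a message with zero balls in all sender urns leaves the agent's
    active use, and its receiver urn (passive use) is deleted as well."""
    for m in list(receiver_urns):
        if all(urn.get(m, 0.0) == 0.0 for urn in sender_urns.values()):
            for urn in sender_urns.values():
                urn.pop(m, None)
            del receiver_urns[m]
```

If a receiver is confronted with a message she does not know, or if a freshly invented message leads to successful communication, adopt_message would be called for both interlocutors; remove_extinct can be run after each update.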
4.2 Learning Languages by Innovation: A Question of Time

We showed in Section 3 that the percentage of agents learning a signaling language in a multi-agent context decreases as the domain size of the game increases. To find out whether innovation can improve these results, we started simulation runs with the following settings:

– network types: complete network with 3 agents and with 5 agents
– learning dynamics: Bush-Mosteller reinforcement with negative reinforcement and lateral inhibition (α = 1, β = 1, γ = 1/|T|) and innovation
– initial state: every sender urn is filled with black balls and the receiver does not have any a priori urns
– experiments: 100 simulation runs per n × n∗-game with n = 2 . . . 8
– break condition: a simulation stops if the communicative success of every agent exceeds 99% or if the run passes the limit of 500,000 communication steps (the runtime limit)

These simulation runs gave the following results: i) for the 3-agent account in combination with n × n∗-games for n = 2 . . . 7, and for the 5-agent account in combination with n × n∗-games for n = 2 . . . 5, all agents learned a signaling language in every simulation run, and ii) for the remaining account-game combinations all simulation runs exceeded the runtime limit (see Table 1). We expect that for these remaining combinations all agents would learn a signaling language as well, but it takes extremely long.

Game        2 × 2∗   3 × 3∗   4 × 4∗   5 × 5∗    6 × 6∗     7 × 7∗     8 × 8∗
3 agents    1,052    2,120    4,064    9,640     21,712     136,110    > 500,000
5 agents    2,093    5,080    18,053   192,840   > 500,000  > 500,000  > 500,000

Table 1. Runtime table for n × n∗-games with n = 2 . . . 8, for a complete network of 3 agents and of 5 agents.

All in all, we could show that the integration of innovation and extinction of messages leads to a final situation in which all agents have learned the same signaling language, whenever the runtime does not exceed the limit. Nevertheless, we expect the same result for the account-game combinations whose runs exceeded our limit for a manageable runtime.
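To make the experimental protocol concrete, here is a schematic sketch of the main loop of one simulation run (our reconstruction; the function play_round and the per-agent success window are assumptions, since the paper does not spell out these details):

```python
import random

RUNTIME_LIMIT = 500_000      # maximal number of communication steps per run
SUCCESS_THRESHOLD = 0.99     # every agent's communicative success must exceed 99%
WINDOW = 20                  # per-agent success window (assumption, cf. Section 4.3)

def run_simulation(agents, states, play_round):
    """Schematic main loop of one simulation run in a complete network.

    `play_round(sender, receiver, state)` is assumed to play one signaling round,
    including urn updates, innovation and extinction, and to return the utility
    (1 for successful, 0 for unsuccessful communication).
    """
    history = {agent: [] for agent in agents}
    for step in range(1, RUNTIME_LIMIT + 1):
        sender, receiver = random.sample(agents, 2)   # any pair of the complete network may interact
        state = random.choice(states)                 # states are equally distributed
        utility = play_round(sender, receiver, state)
        history[sender].append(utility)
        history[receiver].append(utility)
        # break condition: every agent's recent communicative success exceeds the threshold
        if all(len(h) >= WINDOW and sum(h[-WINDOW:]) / WINDOW > SUCCESS_THRESHOLD
               for h in history.values()):
            return step                               # run ended successfully
    return RUNTIME_LIMIT                              # runtime limit exceeded
```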
4.3 The Development of Signaling Languages by Innovation

As our experiments in the last section showed, by applying Bush-Mosteller reinforcement learning with innovation, all agents learn the same signaling language for a small group of agents and any n × n∗-game with n = 2 . . . 7. Let's take a closer look at how a 3 × 3∗-game develops during a simulation run by analyzing i) one randomly chosen agent's parameters and ii) the parameters of the whole population. Three parameters are of interest to us:

– communicative success: the utility value averaged over the last 20 communication steps, averaged over all agents in the population
– number of messages in use: the number of actually used messages in the whole population
– force of innovation: the absolute number of black balls, averaged over all agents

Fig. 4. Simulation run of a 3 × 3∗-game with innovation in a 3-agent population: communicative success, number of used messages and force of innovation of all agents in the population; simulation steps on the x-axis.

Figure 4 shows the resulting values for the whole population: in the beginning the agents try out a lot of messages, which reduces the number of black balls in the urns, because balls for the new messages are added and the urn content is then normalized. Note that during the first communication steps the force of innovation drops rapidly, while the number of messages rises until it reaches, here, 21 messages. As the course of the success graph shows, the work is not done at this point. Once the agents have more or less agreed on which messages might be useful, they keep trying them out, and only when a subset of those messages finally becomes probabilistically favored does the success increase, while the number of known messages decreases, until the success finally reaches a perfect 1 on average, the number of messages equals the number of states (3), and the force of innovation is zero.

What can also be seen in the figures is that even though there is no one-to-one correspondence between the number of messages and the average success, their graphs do show some sort of mirroring at the micro level. The interrelationship between the force of innovation and the average success is not clearly visible in Figure 4 because of the coarse scaling of the force-of-innovation values. Figure 5 shows the force of innovation and the communicative success between steps 50 and 350 of the simulation run already depicted in Figure 4, where the force-of-innovation value is plotted on a 20 times finer scale. Here the interrelationship between both values is clearly recognizable: one measure's peak is simultaneously the other measure's valley. Admittedly the mirroring is not perfect, but it improves with an increasing number of agents.

Fig. 5. Simulation run of a 3 × 3∗-game with innovation in a 3-agent population: comparison of communicative success and force of innovation between simulation steps 50 and 350; simulation steps on the x-axis.

5 Conclusion and Outlook

Let's recap: we started out by comparing Roth-Erev and Bush-Mosteller reinforcement, finding that Bush-Mosteller yields better results for repeated signaling games. Extending Bush-Mosteller with lateral inhibition led to even better results, but still far from perfect. And the results were even worse for multi-agent accounts with 3 or 5 agents: with increasing n, fewer agents develop a signaling language in the first place, and pooling strategies in particular turned out to be a common outcome. In a next step we extended classical Bush-Mosteller reinforcement by adding negative reinforcement, thereby achieving lateral inhibition, and by adding innovation and extinction. We found that these tweaks result in perfect communication between 3 agents in n × n∗-games for n < 8 and between 5 agents for n < 6; higher values for n or for the number of agents require much longer runtimes, which exceed our limit. Especially the force of innovation seems to be responsible for this achievement, since it makes sure that new messages are introduced when communication is not successful, while the combination of negative reinforcement and lateral inhibition ensures that unused or useless messages become extinct. Consequently, the result is an agreement on one single perfect signaling language with no other messages that might interfere.

The purpose of this direction of research is mostly to find reasonable extensions of simple learning algorithms that lead to more explanatory results, assuming that more sophisticated learning dynamics might be more adequate for eventually describing human language behavior.
We think the extensions we introduced in this article are of that kind: especially negative reinforcement, since we're rather certain that failure has a learning effect, and innovation and extinction, because it seems unreasonable to assume that all messages are available right from the start and that everything is kept in the lexicon, even if it has been successfully used only once. Further research in this direction should clarify how memory restrictions could be modeled and how the sender and receiver roles of one agent should influence each other. What remains to be shown is that our results in fact hold for higher numbers of agents and states. It would further be interesting to see what influence different, again more realistic network types (say, small-world or scale-free networks) have on the results, and what happens if two or more languages interact.

References

L1. Lewis, David: Convention. Cambridge: Harvard University Press (1969)
B1. Barrett, Jeffrey A.: The Evolution of Coding in Signaling Games. Theory and Decision 67 (2009), pp. 223–237
BZ1. Barrett, Jeffrey A., Zollman, Kevin J. S.: The Role of Forgetting in the Evolution and Learning of Language. Journal of Experimental and Theoretical Artificial Intelligence 21.4 (2009), pp. 293–309
BM1. Bush, Robert, Mosteller, Frederick: Stochastic Models of Learning. New York: John Wiley & Sons (1955)
HH1. Hofbauer, Josef, Huttegger, Simon M.: Feasibility of communication in binary signaling games. Journal of Theoretical Biology 254.4 (2008), pp. 843–849
HSRZ1. Huttegger, Simon M., Skyrms, Brian, Smead, Rory, Zollman, Kevin J.: Evolutionary dynamics of Lewis signaling games: signaling systems vs. partial pooling. Synthese 172.1 (2010), pp. 177–191
HZ1. Huttegger, Simon M., Zollman, Kevin J.: Signaling Games: Dynamics of Evolution and Learning. In: Language, Games, and Evolution. Ed. by Anton Benz et al. LNAI 6207. Springer (2011), pp. 160–176
M1. Mühlenbernd, Roland: Learning with Neighbours. Synthese 183.S1 (2011), pp. 87–109
MF1. Mühlenbernd, Roland, Franke, Michael: Signaling Conventions: Who Learns What Where and When in a Social Network. Proceedings of EvoLang IX (2011)
RE1. Roth, Alvin, Erev, Ido: Learning in extensive-form games: experimental data and simple dynamic models in the intermediate term. Games and Economic Behavior 8 (1995), pp. 164–212
S1. Skyrms, Brian: Signals: Evolution, Learning & Information. Oxford: Oxford University Press (2010)
W1. Wagner, Elliott: Communication and Structured Correlation. Erkenntnis 71.3 (2009), pp. 377–393
Z1. Zollman, Kevin J. S.: Talking to neighbors: The evolution of regional meaning. Philosophy of Science 72.1 (2005), pp. 69–85