<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extending Fictitious Play with Pattern Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ronald Chu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerard Vreeswijk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Computing Sciences, Faculty of Science, Utrecht University</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Fictitious play, an algorithm that predicts the opponent's next move based on the observed history of play, is one of the oldest simple yet very effective algorithms in game theory. Although using pattern recognition as a more sophisticated way to analyze the history of play seems a logical step, there is little research available on this subject. In this paper we examine two different types of pattern recognition, and formulate several algorithms that incorporate these approaches. These algorithms and the basic fictitious play variants they extend are empirically tested in eight tournaments on some well-known normal-form games. The results obtained show that adding pattern recognition to fictitious play improves performance, and demonstrate the general possibilities of applying pattern recognition to agents in game theory.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
The field of game theory studies strategic decision making, where a fixed number of players have a limited number of possible actions to choose from. The combination of the actions of all players, their joint actions, determines the outcome of the game. The mathematician George W. Brown describes his fictitious play (FP) algorithm as follows[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]:
"The iterative method in question can be loosely characterized by the fact that it rests on the traditional statistician's philosophy of basing future decisions on the relevant history. Visualize two statisticians, perhaps ignorant of min-max theory, playing many plays of the same discrete zero-sum game. One might naturally expect a statistician to keep track of the opponent's past plays and, in the absence of a more sophisticated calculation, perhaps to choose at each play the optimum pure strategy against the mixture represented by all the opponent's past plays."
Fictitious play uses the observations from the past to build an observed frequency distribution of the actions of its opponent(s) and uses the action(s) with the highest observed frequency as a prediction of the opponent(s)' next move. It then chooses a strictly myopic response to maximize its expected payoff. It was created as a heuristic for computing Nash equilibria by playing a fictitious game against itself, and proven valid by Robinson[
        <xref ref-type="bibr" rid="ref11">11</xref>
]. Note that the term fictitious play may be misleading in its current use as a model used by an actual player, since there is nothing fictional about the observed history of play and the algorithm does not "play a fictional game in its head" to make its predictions.
      </p>
      <p>
A next logical step in the evolution of the fictitious play algorithm is to perform a more sophisticated analysis of the history of play. There are of course many different approaches one can take to perform such an analysis. In this paper we focus on pattern recognition: looking at sequences of actions instead of single actions in the history of play. Although pattern recognition is a broad and established field with much ongoing research, there exists, as Spiliopoulos puts it, "a conspicuous gap in the literature regarding learning in games – the absence of empirical verification of learning rules involving pattern recognition"[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
In the next section we will give a thorough overview of the basic fictitious play algorithm and two common adaptations. We will then examine two different approaches to adding pattern recognition, and formulate several algorithms that extend fictitious play with the proposed pattern recognition. Finally we will formulate a set of experiments to obtain empirical data on the performance of the pattern-recognizing algorithms and the basic fictitious play algorithms they extend. An analysis of the data obtained will show if, how and why pattern recognition influences performance. An extensive analysis and schematic overview of the data obtained can be found in[
        <xref ref-type="bibr" rid="ref4">4</xref>
].
      </p>
      <p>Fictitious play</p>
    </sec>
    <sec id="sec-2">
      <title>Brown's original algorithm</title>
<p>A normal-form game in game theory is a description of a game in the form of all players' strategy spaces and payoff functions. An n-player game has a finite strategy space X = ∏i Xi. The strategy space Xi for each individual player i = 1, 2, ..., n consists of all actions available to player i. There is a payoff (or utility) function ui(x) for each player i which determines the utility received by player i given the joint actions x ∈ X played that round.</p>
<p>An example 2×2 coordination game (row player chooses A or B, column player chooses a or b):

        a       b
A     1, 1    0, 0
B     0, 0    2, 2</p>
<p>As mentioned before, FP uses the observed history of play to determine its actions. Because of the lack of history, the first action xi1 is specified arbitrarily1. The algorithm consists of two components: a forecasting and a response component. The game is played over several rounds, and for each time index (or round number) t ∈ N0 the n-tuple xt = (xt1, xt2, ..., xtn), where xt ∈ X, consists of the actions played that round by all players. Let ht = (x1, x2, ..., xt) be the observed history up to round t. The forecasting function pit(xj), where xj ∈ Xj, returns the proportion of the time that xj was played in the history ht:</p>
      <p>pit(xj) = (1/t) Σ(u=1..t) Iu(xj).    (2.1)</p>
      <p>Let pt = ∏i pit be the associated product distribution. The indicator function It(xj) returns 1 if xj = xtj (action xj was played by player j at time t) and returns 0 otherwise. Note that Σ(xi∈Xi) pit(xi) = 1 for every i and every t.</p>
<p>The response is an action from the best-reply correspondence BRi(pt): the set of all actions of i that maximize expected utility given pt. The utility of a probability distribution is defined as the expected utility of its outcomes by defining ui(p) = Σ(x∈X) ui(x)p(x). Let Δi denote the set of all probability distributions on the set Xi, and let Δ = ∏i Δi. Let pit ∈ Δi and pt ∈ Δ denote the observed probability distribution in round t for the actions of player i and the joint actions of all players, respectively. For every p ∈ Δ and every x ∈ X the probability of x is pt(x) = ∏i pit(xi). To be able to predict the expected utility for each action we need to be able to distinguish between our actions Xi and the actions of the other players X−i = ∏(j≠i) Xj. In the same fashion let Δ−i = ∏(j≠i) Δj and let pt,−i ∈ Δ−i. Now we can combine an action xi and a partial probability distribution p−i to obtain a full probability distribution (xi, p−i) ∈ Δ that places a probability 1 on xi, and define the best-reply correspondence as:</p>
      <p>BRi(pt,−i) = {xi ∈ Xi : ui(xi, pt,−i) ≥ ui(x′i, pt,−i) for all x′i ∈ Xi}.</p>
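To make the two components concrete, here is a minimal Python sketch of the forecasting rule (eq. 2.1) and the best-reply response with random tie-breaking. The function names and the dictionary-based payoff encoding are our own illustration; the payoffs are the row player's in the asymmetric coordination game used in the experiment.

```python
import random
from collections import Counter

def forecast(history):
    """Empirical frequency distribution of the opponent's past actions (eq. 2.1)."""
    counts = Counter(history)
    t = len(history)
    return {a: counts[a] / t for a in counts}

def best_reply(payoff, actions, p_opponent):
    """Best-reply correspondence BR_i: the own actions that maximize expected
    utility against the forecast distribution; ties are broken at random."""
    expected = {
        a: sum(p * payoff[(a, b)] for b, p in p_opponent.items())
        for a in actions
    }
    best = max(expected.values())
    return random.choice([a for a, u in expected.items() if u == best])

# Row player's payoffs in the asymmetric coordination game (section "games used")
payoff = {("A", "A"): 5, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 3}
history = ["A", "A", "B"]   # opponent played A twice, B once
p = forecast(history)       # {'A': 2/3, 'B': 1/3}
reply = best_reply(payoff, ["A", "B"], p)  # A yields 10/3, B yields 1, so "A"
```

Note that the strictly myopic response ignores how its own action shapes the opponent's future beliefs; this is exactly the rigidity the adaptations in section 2.2 relax.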
    </sec>
    <sec id="sec-3">
      <title>Adaptations</title>
      <p>
The original algorithm has a very rigid specification of predicting according to the empirical distribution of play and choosing an action to maximize the immediate expected utility.
1 The superscript t in xt indicates a time index, not an exponent. When exponentiation is intended the base is a number or is placed between brackets.
Fortunately there are possible modifications that do not lose the properties of convergence[
        <xref ref-type="bibr" rid="ref6">6</xref>
]. We will discuss two adaptations: smoothed and weighted fictitious play.
2.2.1 Smoothed fictitious play In smoothed fictitious play (SFP) the players' responses are smoothed by small trembles or random shocks. It was developed by Fudenberg and Kreps[
        <xref ref-type="bibr" rid="ref6">6</xref>
] along the lines of Harsanyi's purification theorem[
        <xref ref-type="bibr" rid="ref8">8</xref>
]. They define smooth fictitious play as the family of algorithms that employ a standard FP forecasting rule, and a response rule that maximizes the actual utility Ui. Here the smoothing function wi : Δi → R is any smooth, differentiable and strictly concave function such that |∇wi(qi)| → ∞ whenever qi approaches the boundary of Δi, and the response parameter λi &gt; 0.
      </p>
<p>To understand the variant of SFP we will be using, suppose the choice probabilities of player i are described by the logistic function2 qi(xi|p−i), which denotes the probability that action xi will be played next:</p>
      <p>qi(xi|p−i) = exp(ui(xi, p−i)/λi) / Σ(x′i∈Xi) exp(ui(x′i, p−i)/λi).    (2.3)</p>
      <p>If the response parameter λi is close to zero this function closely approximates the best-reply correspondence, while with a larger λi the probabilities are spread more evenly over all actions3. From an observer's standpoint a player using eq. 2.3 may look like he is using a best-reply function which is perturbed by small utility trembles; each period the actual utility Ui = ui + εit is perturbed by the extreme-valued variable εit (whose cumulative distribution is ln P(εit ≤ z) = −exp(−z/λi)). Another argument for using the logistic function involves the Shannon entropy. The amount of information conveyed by a probability distribution qi can be represented by the entropy function −qi ln(qi):</p>
      <p>
−Σ(x′i∈Xi) qi(x′i, p−i) ln(qi(x′i, p−i)).
2 See McKelvey and Palfrey[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for details about the quantal response equilibrium.
      </p>
      <p>
The general idea is that players make errors, but since the probability of an action being chosen is related to its utility, it is unlikely that costly errors are made.
3 When λi → ∞ the function qi(xi|p−i) → 1/|Xi|, thus all actions have an equal chance of being chosen next.
It can be shown[
        <xref ref-type="bibr" rid="ref7">7</xref>
] that the optimal qi is given by the logistic function if we define the actual utility (eq. 2.2) as a weighted combination of the utility ui and the information gained from experimenting:
Ui(qi) = ui(qi, p−i) − λi Σ(x′i∈Xi) qi(x′i) ln qi(x′i).    (2.2)
2.2.2 Weighted fictitious play The second adaptation of classic FP we will discuss involves techniques where the observations from the past decay over time, making them less influential than more recent observations. If, as classic fictitious play does, the entire history is considered, it is harder to obtain change in beliefs and behavior as the history gets larger, thus making it harder for fictitious play to catch up when the opponent changes its strategy. In weighted fictitious play (WFP), also known as exponential FP, a weight factor 0 ≤ γ ≤ 1 is applied to the history[
        <xref ref-type="bibr" rid="ref3">3</xref>
]. It uses the original best-reply response rule and a modification of the forecasting rule (eq. 2.1), where every round every prior observation is multiplied with the weight factor (i.e. at time t an observation from t0 ≤ t is multiplied with γ^(t−t0)):
pit(xj) = Σ(u=1..t) γ^(t−u) Iu(xj) / Σ(u=1..t) γ^(t−u).    (2.4)
The weight factor γ determines the rate of decay. When γ = 1 there is no decay and weighted fictitious play behaves like classic fictitious play. When γ = 0 the entire history decays instantly and weighted fictitious play behaves like the learning rule introduced by Cournot[
        <xref ref-type="bibr" rid="ref5">5</xref>
].
      </p>
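Both adaptations can be sketched in a few lines of Python. The names `smoothed_response` and `weighted_forecast` are our own illustration, with the response parameter λ and the decay factor γ as keyword arguments as described above.

```python
import math

def smoothed_response(expected_utility, lam=1.0):
    """Logistic choice probabilities (section 2.2.1): q(x) proportional
    to exp(u(x)/lam), so utility-costly errors stay unlikely."""
    weights = {a: math.exp(u / lam) for a, u in expected_utility.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def weighted_forecast(history, gamma=0.9):
    """Decayed empirical frequencies (section 2.2.2): an observation from
    round u at time t carries weight gamma**(t - u)."""
    t = len(history)
    weights = {}
    for u, action in enumerate(history, start=1):
        weights[action] = weights.get(action, 0.0) + gamma ** (t - u)
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

q = smoothed_response({"A": 1.0, "B": 0.0}, lam=1.0)
# With lam = 1 the better action gets probability e/(e+1), roughly 0.73
p = weighted_forecast(["A", "A", "B"], gamma=0.9)
# The most recent observation (B) counts fully; the older A's are discounted
```

With `lam` close to 0 the logistic response collapses onto the best reply, and with `gamma = 1` the weighted forecast reduces to eq. 2.1, matching the limiting cases discussed above.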
      <p>
        Pattern recognition
In this section we examine two distinct implementations of pattern recognition in fictitious play. Rothelli[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Lahav[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Spiliopoulos[
        <xref ref-type="bibr" rid="ref14 ref15">14,15</xref>
        ]
have created similar pattern-recognizing algorithms to describe human learning behavior. Their models search the entire history of play for possible patterns that match the last few moves played, apply a form of decay and finally use the frequency distribution obtained to predict the next move. This approach is similar to conditional fictitious play proposed by Aoyagi[
        <xref ref-type="bibr" rid="ref1">1</xref>
], who proved that if two players both apply conditional fictitious play that recognizes patterns of the same length, their beliefs converge to the equilibrium in zero-sum games with a unique Nash equilibrium. We will examine Spiliopoulos' model because it is an intuitive extension of weighted fictitious play and in its simplest form does not apply any additional transformations designed to model how humans form beliefs and make decisions.
      </p>
      <p>
        Sonsino[
        <xref ref-type="bibr" rid="ref13">13</xref>
] proposes confused learning with cycle detection at the end of the realized history of play and proves convergence to a fixed pattern of pure Nash equilibria for a large class of games. We will briefly touch on confused learning and focus on the cycle detection he proposes.
3.1
Spiliopoulos[
        <xref ref-type="bibr" rid="ref14 ref15">14,15</xref>
] proposes n-period fictitious play (FPN), where n ∈ N+, as an extension to weighted fictitious play which keeps track of sequences of actions of length n in the observed history of play.
      </p>
<p>It does not directly search for patterns, but uses the observations of the last n−1 rounds as a premise and uses only the matching part of the history when it forecasts the opponent's next move4. It does respond to patterns as they will be visible in the history; with n = 3 and a pattern AAB the probability p(B|AA), for example, will be high. Even patterns longer than n can sometimes be derived indirectly from the history. The pattern AABBA will result in higher probabilities p(B|AB), p(A|BB) and p(A|BA), but both p(A|AA) and p(B|AA) will be about equal.</p>
<p>FP1 is exactly the same as WFP, and FP3, for example, keeps track of the observed frequencies of all possible patterns of length three, which enables a forecast of the next round given what was played in the last two rounds. First the indicator function It is extended to indicate whether a sequence of actions was played. Let x̄j be a sequence of n actions. The indicator It(x̄j) returns 1 if the sequence matches the actions played in rounds t−n+1, ..., t−1, t, and 0 otherwise.
4 This is similar to Aoyagi's conditional history. An important difference is that conditional FP uses the joint actions of both players as a premise, while we only consider the actions of the opponent.
The forecasting function pit in equation 2.4 is extended to return the proportion of the time the sequence x̄j was played, which can then be used to calculate the proportion of the time the action xj was played given that the actions x̄j were played in the previous n−1 rounds:</p>
      <p>pit(xj | x̄j) = pit(x̄j + xj) / Σ(x′j∈Xj) pit(x̄j + x′j),</p>
      <p>where x̄j + xj denotes the sequence x̄j followed by the action xj.</p>
<p>Spiliopoulos continues by extending the FPN algorithm to allow the players' perceptions to follow commonly ascribed psychophysics principles, which we will not discuss here because human learning behavior is beyond the scope of our investigation. We will use the n-period fictitious play algorithm in its simplest form in our experiment.</p>
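A minimal sketch, under our own naming, of how FPN's conditional forecast can be computed from decayed counts of length-n windows of the opponent's history:

```python
def n_period_forecast(history, n=2, gamma=0.9):
    """n-period FP sketch: weighted counts of every length-n window in the
    history, then the conditional distribution of the next action given
    the premise formed by the last n-1 observed actions."""
    t = len(history)
    counts = {}
    for end in range(n, t + 1):
        # window ending at round `end`, decayed by how long ago it ended
        seq = tuple(history[end - n:end])
        counts[seq] = counts.get(seq, 0.0) + gamma ** (t - end)
    premise = tuple(history[t - (n - 1):]) if n > 1 else ()
    # keep only windows whose first n-1 actions match the premise
    matching = {seq[-1]: w for seq, w in counts.items() if seq[:-1] == premise}
    total = sum(matching.values())
    return {a: w / total for a, w in matching.items()} if total else {}

# With the repeating pattern AAB, p(B | AA) dominates p(A | AA), as in the text
p = n_period_forecast(list("AABAABAABAA"), n=3)
```

With `n = 1` the premise is empty and the sketch reduces to the weighted forecast of WFP, mirroring the observation that FP1 is exactly the same as WFP.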
    </sec>
    <sec id="sec-4">
      <title>Cyclic pattern detection</title>
      <p>
        Sonsino[
        <xref ref-type="bibr" rid="ref13">13</xref>
] proposes superimposing the ability of strategic recognition of cyclic patterns that repeat successively at the end of the observed path of play on a learning model called confused learning5. Detection is restricted to basic patterns: patterns that do not contain a smaller subpattern6. If a player recognizes such a repeated pattern it will assume the other players will continue to follow that pattern and, being strictly myopic, plays a best-reply response against the predicted next joint actions of the pattern.
      </p>
<p>For every pattern p with length lp there is a uniform bound Tp such that the model must detect p with probability 1 if it has appeared almost Tp times (i.e. if the next round may complete Tp uninterrupted occurrences of p). To eliminate the possible inconsistency that the model detects more than one pattern with probability one at the same time, it is assumed that only patterns with length shorter than some fixed bound L are detected and that Tp·lp ≥ 3L. This ensures Tp is large enough relative to L. The detection of patterns after a history-independent fixed number of repetitions is too stylized to represent any realistic learning effort. To accommodate any form of non-stationary pattern detection a set of necessary conditions is added to the model. Patterns longer than 2 must appear almost two times (again, if the next round may complete two uninterrupted occurrences of the pattern) in the past 2lp − 1 rounds. Patterns of length 2 must appear almost 2½ times in the past 4 rounds and patterns of length 1 must appear 2 times in the past 2 rounds. For the pattern ABCD, for example, the last 7 observations must be ABCDABC, while a shorter pattern AB requires the last 4 observations to be BABA.
5 In confused learning players learn to choose the best learning model and are confused in the sense that they do not a priori know what the best strategy is.
6 A and AAB are examples of basic patterns; ABAB and AA are not.</p>
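The basic-pattern restriction and the "almost two occurrences" condition for patterns longer than 2 can be sketched as follows; the helper names are ours, and patterns are encoded as strings over the action alphabet.

```python
def is_basic(pattern):
    """A pattern is basic if it is not a repetition of a shorter subpattern
    (e.g. A and AAB are basic, ABAB and AA are not)."""
    n = len(pattern)
    for k in range(1, n):
        if n % k == 0 and pattern[:k] * (n // k) == pattern:
            return False
    return True

def almost_two_occurrences(history, pattern):
    """Necessary condition for patterns longer than 2: the next round may
    complete two uninterrupted occurrences, i.e. the last 2*lp - 1
    observations equal the pattern followed by all but its last action."""
    lp = len(pattern)
    window = 2 * lp - 1
    return "".join(history[-window:]) == pattern + pattern[:-1]

assert is_basic("AAB") and not is_basic("ABAB") and not is_basic("AA")
# For the pattern ABCD the last 7 observations must be ABCDABC (section 3.2)
assert almost_two_occurrences(list("XXABCDABC"), "ABCD")
```

The special thresholds for patterns of length 1 and 2 (2 occurrences in 2 rounds, 2½ occurrences in 4 rounds) would need separate cases and are omitted from this sketch.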
<p>A detected pattern p remains detected until it is contradicted, and a pattern p may only be detected lp rounds after the previous pattern was detected. The existence of these minimal requirements and the necessary conditions define a range in which pattern detection can occur. Sonsino shows that convergence of behavior to any fixed strategy that is not a pure equilibrium is incompatible with pattern recognition7, and that convergence of behavior to some mixed strategy profile is only possible if agents consider arbitrarily long histories (and is thus impossible for agents with bounded memory).</p>
      <p>Experiment Setup
The algorithms used in our experiment (listed in Table 1) are de ned by
their forecasting and response components so we can quickly inspect not
only the performance of the algorithm as a whole but also the performance
of the individual components without diving into the internal beliefs. This
enables us to quickly identify which behavior to examine in more detail.</p>
<p>Model                   Name     Forecast      Response    Parameters
Brown's original FP     FP       Simple        Best reply
Smooth FP               SFP      Simple        Smoothed    λ = 1
Weighted FP             WFP      Weighted      Best reply
Smooth weighted FP      SWFP     Weighted      Smoothed    λ = 1
N-pattern FP            FPN      N-pattern     Best reply  N = 2, 3
Smoothed n-pattern      SFPN     N-pattern     Smoothed    N = 2, 3, λ = 1
Weak cycle              FPwCL    Weak cycle    Best reply  L = 2, 3, 20
Smoothed weak cycle     SFPwCL   Weak cycle    Smoothed    L = 2, 3, 20, λ = 1
Strong cycle            FPsCL    Strong cycle  Best reply  L = 2, 3, 20
Smoothed strong cycle   SFPsCL   Strong cycle  Smoothed    L = 2, 3, 20, λ = 1
7 A detected pattern can only sustain itself if it consists only of pure Nash equilibria, because the player will break other patterns by playing a best-reply action.
We will use a pattern length n of both 2 and 3, resulting in four different algorithms: FP2, SFP2, FP3 and SFP3. The weight factor γ is 0.9.</p>
    </sec>
    <sec id="sec-5">
      <title>Weak cycle forecasting (section 3.2)</title>
<p>Patterns are only detected at the upper bound, when the next round may complete Tp consecutive occurrences of the pattern. The pattern length L affects how quickly a pattern is detected because Tp = floor(3L/lp). We will use the same short pattern lengths L of 2 and 3, and another large L of 20, resulting in six different algorithms. When a pattern is detected the algorithm returns a probability distribution where the next action in the pattern has probability one. If no pattern is detected it employs the Simple forecasting algorithm.</p>
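The detection threshold can be made concrete with a one-line helper, assuming Tp = floor(3L/lp) as stated above (the function name is ours):

```python
from math import floor

def detection_threshold(L, lp):
    """Weak cycle detection: a pattern of length lp is detected after
    Tp = floor(3L / lp) uninterrupted repetitions."""
    return floor(3 * L / lp)

# With L = 20 even a 1-cycle needs 60 repetitions before detection, which is
# why FPwC-20 falls back to the Simple forecast so often; with L = 2 a
# 2-cycle is already detected after 3 repetitions.
t_slow = detection_threshold(20, 1)  # 60
t_fast = detection_threshold(2, 2)   # 3
```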
    </sec>
    <sec id="sec-6">
      <title>Strong cycle forecasting (section 3.2)</title>
<p>Patterns are detected at the lower bound described by Sonsino. The pattern length L does not influence the speed with which patterns are detected. We will use the same pattern lengths 2, 3 and 20, and this algorithm also employs the Simple forecasting algorithm when no pattern is detected.</p>
    </sec>
    <sec id="sec-7">
      <title>Best reply response (section 2.1)</title>
      <p>The best reply correspondence. If there is a tie (i.e. the set BR contains
more than one action) then one is chosen at random.</p>
    </sec>
    <sec id="sec-8">
      <title>Smoothed response (section 2.2.1)</title>
<p>The trembled best reply algorithm with a response parameter λ of 1.</p>
<p>There are 40 algorithms for 2×2 games and 60 for 3×3 games in our experiment8. For each game, each algorithm faces all algorithms, themselves included, in a tournament where each match lasts 1000 rounds and is repeated 100 times. Finally they are tested with 100 randomly generated games, one tournament per game. The games used are:</p>
    </sec>
    <sec id="sec-9">
      <title>Asymmetric Coordination game</title>
<p>There are two pure strategy equilibria and a mixed strategy equilibrium. Both players prefer the same equilibrium, which Pareto dominates the other.

        A       B
A     5, 5    0, 0
B     0, 0    3, 3</p>
      <p>
8 There are 10 forecasters (FP, WFP, FP2, FP3, FPwC-2, FPwC-3, FPwC-20, FPsC-2, FPsC-3 and FPsC-20) times 2 responders (Best reply, Smoothed) times 2 or 3 possible initial moves, which equals 40 or 60 algorithms.</p>
    </sec>
    <sec id="sec-10">
      <title>Symmetric Coordination game</title>
<p>There are two pure strategy equilibria and a mixed strategy equilibrium. Neither equilibrium is preferred or dominated.

        A       B
A     1, 1    0, 0
B     0, 0    1, 1</p>
      <p>Shapley's game
The only equilibrium is a mixed strategy where each player plays each strategy with equal probability. The game is a non-zero-sum variant of rock-paper-scissors designed to cause cyclic behavior in fictitious play algorithms.

        A       B       C
A     1, 0    0, 0    0, 1
B     0, 1    1, 0    0, 0
C     0, 0    0, 1    1, 0</p>
    </sec>
    <sec id="sec-11">
      <title>Battle of the sexes</title>
<p>There are two Pareto optimal pure strategy equilibria and a mixed strategy equilibrium. A different pure strategy equilibrium is preferred by each player.

        A       B
A     3, 2    0, 0
B     0, 0    2, 3</p>
    </sec>
    <sec id="sec-12">
      <title>Chicken</title>
<p>A variant of Battle of the Sexes with the same equilibria, where both players playing the action corresponding to their preferred pure equilibrium yields the worst outcome for both.

        A          B
A     0, 0     −1, 1
B     1, −1  −10, −10</p>
    </sec>
    <sec id="sec-13">
      <title>Matching pennies</title>
<p>A zero-sum game where the only equilibrium is a mixed strategy where each player plays each strategy with equal probability.

        A        B
A     1, −1   −1, 1
B    −1, 1     1, −1</p>
    </sec>
    <sec id="sec-14">
      <title>Prisoner's dilemma</title>
<p>Each player has a dominant strategy, resulting in a pure strategy equilibrium. The pure strategy joint action A, A is Pareto optimal.

        A       B
A     2, 2    0, 3
B     3, 0    1, 1</p>
    </sec>
    <sec id="sec-15">
      <title>Random games</title>
<p>We've generated 100 random 2×2 games using GAMUT. All eight payoffs lie between 0 and 100.</p>
<p>Pattern Recognition Results
We will compare FPC with FP and FPN with WFP to see the difference with the basic fictitious play variants they are based on. All observations are based on an in-depth analysis of the results per game, which can be found in the full thesis.
Both strong and weak pattern recognition perform better than FP in the random games, where only FPN scores higher still. The exception is FPwC-20, which performs only marginally better than FP. This is to be expected since FPwC-20 detects patterns very slowly, in 3L = 60 rounds, and thus falls back to FP very often. FPC scores higher than FP in Shapley's game and matching pennies; there is no performance improvement in the rest of the games. The behavior is exactly the same as FP in both coordination games and the prisoner's dilemma, almost the same with no clear advantage or disadvantage in chicken, and slightly disadvantageous in the battle of the sexes. FPsC performs better than FPwC in matching pennies and Shapley's game; there is no clear performance difference between strong and weak pattern recognition in all other games.</p>
<p>The pattern length L does not influence the performance of strong pattern recognition. Shorter patterns are allowed to be detected sooner, and with only two actions {A, B} a pattern longer than three actions always contains either ABAB or AA, making it impossible to detect longer patterns. Detection speed is independent of L, and the ability to detect longer patterns is not an advantage because the detection of a shorter pattern always blocks the detection of a longer pattern.</p>
<p>The difference between weak pattern recognition with different L that is responsible for the observed performance differences is simply that FPwC-3 is slightly slower in detecting patterns because of the higher pattern length L. In comparison with FPwC-2 and FPwC-3, FPwC-20 is much slower in detecting patterns and very likely not to detect any patterns at all, because patterns need 60/lp repetitions to be detected; short patterns are likely to be broken before they can be detected and long patterns are unlikely to occur at all. FPwC-20 performs exactly like FP in matching pennies and Shapley's game, and better than FP but still worse than FPwC-2 and FPwC-3 in the battle of the sexes, matching pennies, Shapley's game and in the random games.
5.2
In the randomly generated games, Shapley's game and matching pennies FPN outperforms all other algorithms tested in our experiment. It performs better than WFP in the symmetric coordination game. In the battle of the sexes WFP scores higher than FP3 but lower than FP2, and in the asymmetric coordination game WFP scores equal to FP3 and lower than FP2.</p>
<p>FPN quickly detects change if the opponent switches from one singleton pattern to another. Because AA and BB will occur more often than AB and BA in the observed history of play, FPN will forecast the same action the opponent played in the previous round. This forecast is only wrong for one round when the opponent switches actions.</p>
<p>The differences between N = 2 and N = 3 are marginal. FP3 performs slightly better in the random games and the asymmetric coordination game, where FP2's forecast is wrong in one round. FP2 performs slightly better in matching pennies. There is no difference in performance between FP2 and FP3 in the remaining games.</p>
<p>Conclusion
In this paper we have studied two distinctly different approaches to performing pattern recognition on the observed history of play, obtained empirical data on the performance of those algorithms and the fictitious play variants they extend, and found the new pattern-recognition based FP variants to be significantly more effective than the traditional FP variants.</p>
<p>N-period fictitious play is an extension of weighted fictitious play inspired by pattern recognition, which has proved itself to be significantly more effective than WFP. FP3 performs only slightly better than FP2 in two, and slightly worse in one, of the games tested in our experiment. We cannot at this stage, however, recommend against using a pattern length N &gt; 2. The costs are higher, but the effectiveness of a higher pattern length against other opponents, which are not related to FP, remains to be tested.</p>
<p>In contrast to FPN, the strong and weak pattern recognition capabilities of FPC have no relation at all to the fictitious play forecaster algorithm, which in our experiments only serves as a fallback forecaster in case no patterns are detected. It can easily be replaced by any other adaptive algorithm, as Sonsino describes. Furthermore, the strong and weak detection algorithms are merely implementations of the lower and upper bounds of the area where, in Sonsino's model, pattern recognition may occur. This does not mean that they are bad algorithms per se, nor does it mean that good or bad results necessarily imply that cyclic pattern recognition at the end of the observed history of play is a good or bad strategy. The results of these two approaches do, however, give valuable insights that should be taken into account when designing a cyclic pattern recognizing algorithm.</p>
<p>The results we obtained are encouraging and illustrate the possibilities and merits of applying pattern recognition in machine learning and game theory. Adding a layer of pattern recognition on top of FP improves performance significantly in comparison to FP on its own, and FPN nearly always outperforms WFP.</p>
<p>Fictitious play is a basic algorithm which has its limitations and weaknesses, yet is very effective for its simplicity. The pattern recognition algorithms we studied share some of those strengths and weaknesses because they are either an extension of FP or use FP as a fallback strategy, and the matches show similar behavior because only FP-based opponents participated in the tournament. But FPN and FPC do show different behavior, overcoming FP's vulnerability to cyclic behavior and introducing new problems coordinating in self-play, for example.</p>
<p>The field of pattern recognition, however, is vast and offers many ideas and possibilities for (advanced) pattern recognition that have no relation at all to fictitious play. Confining further research to fictitious play is an unnecessary limitation on the broad spectrum of possibilities that pattern recognition has to offer. Despite the effectiveness of the pattern-detection based fictitious play algorithms studied, pattern detection should be seen as a stand-alone approach to machine learning. The best reply response works well in combination with the tested pattern detecting forecasts, but we should not ignore possibilities where pattern recognition provides both components, or a complete learning algorithm where such a distinction is not applicable at all.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Aoyagi</surname>
          </string-name>
          .
<article-title>Evolution of beliefs and the Nash equilibrium of normal form games</article-title>
          .
          <source>Journal of Economic Theory</source>
          ,
          <volume>70</volume>
          (
          <issue>2</issue>
          ):
          <volume>444</volume>
–
          <fpage>469</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>G.W.</given-names>
            <surname>Brown</surname>
          </string-name>
          .
<article-title>Iterative solution of games by fictitious play</article-title>
          .
          <source>Activity analysis of production and allocation</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>374</fpage>
          –
          <lpage>376</lpage>
          ,
          <year>1951</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Y.W.</given-names>
            <surname>Cheung</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <article-title>Individual learning in normal form games: Some laboratory results</article-title>
          .
          <source>Games and Economic Behavior</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ):
          <fpage>46</fpage>
          –
          <lpage>76</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R.</given-names>
            <surname>Chu</surname>
          </string-name>
          .
          <article-title>Extending Fictitious Play with Pattern Recognition</article-title>
          .
          <source>Master's thesis</source>
          , Universiteit Utrecht,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Cournot</surname>
          </string-name>
          .
          <source>Research into the mathematical principles of the theory of wealth (1838), trans. Bacon</source>
          . New York, 2,
          <year>1897</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Fudenberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Kreps</surname>
          </string-name>
          .
          <article-title>Learning mixed equilibria</article-title>
          . MIT Press,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>D.</given-names>
            <surname>Fudenberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.K.</given-names>
            <surname>Levine</surname>
          </string-name>
          .
          <article-title>The theory of learning in games</article-title>
          , volume
          <volume>2</volume>
          . The MIT press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.C.</given-names>
            <surname>Harsanyi</surname>
          </string-name>
          .
          <article-title>Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points</article-title>
          .
          <source>International Journal of Game Theory</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>23</lpage>
          ,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lahav</surname>
          </string-name>
          .
          <article-title>Behavioral pattern learning models for decision making in games</article-title>
          .
          <source>Journal of Pattern Recognition Research</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>133</fpage>
          –
          <lpage>151</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>R.D.</given-names>
            <surname>McKelvey</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.R.</given-names>
            <surname>Palfrey</surname>
          </string-name>
          .
          <article-title>Quantal response equilibria for normal form games</article-title>
          .
          <source>Games and economic behavior</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ):
          <fpage>6</fpage>
          –
          <lpage>38</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J.</given-names>
            <surname>Robinson</surname>
          </string-name>
          .
          <article-title>An iterative method of solving a game</article-title>
          .
          <source>The Annals of Mathematics</source>
          ,
          <volume>54</volume>
          (
          <issue>2</issue>
          ):
          <fpage>296</fpage>
          –
          <lpage>301</lpage>
          ,
          <year>1951</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>T.F.</given-names>
            <surname>Rötheli</surname>
          </string-name>
          .
          <article-title>Pattern recognition and procedurally rational expectations</article-title>
          .
          <source>Journal of Economic Behavior &amp; Organization</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ):
          <fpage>71</fpage>
          –
          <lpage>90</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>D.</given-names>
            <surname>Sonsino</surname>
          </string-name>
          .
          <article-title>Learning to learn, pattern recognition, and Nash equilibrium</article-title>
          .
          <source>Games and Economic Behavior</source>
          ,
          <volume>18</volume>
          (
          <issue>2</issue>
          ):
          <fpage>286</fpage>
          –
          <lpage>331</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>L.</given-names>
            <surname>Spiliopoulos</surname>
          </string-name>
          .
          <article-title>Pattern recognition and subjective belief learning in repeated mixed strategy games</article-title>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>L.</given-names>
            <surname>Spiliopoulos</surname>
          </string-name>
          .
          <article-title>Pattern recognition and subjective belief learning in a repeated constant-sum game</article-title>
          .
          <source>Games and Economic Behavior</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>H. Peyton</given-names>
            <surname>Young</surname>
          </string-name>
          .
          <source>Learning. In Individual Strategy and Social Structure: An Evolutionary Theory of Institutions</source>
          , chapter 2, pages
          <fpage>25</fpage>
          –
          <lpage>43</lpage>
          . Princeton University Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>