<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extending Fictitious Play with Pattern Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ronald Chu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerard Vreeswijk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Computing Sciences, Faculty of Science, Utrecht University</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Fictitious play, an algorithm that predicts the opponent's next move based on the observed history of play, is one of the oldest simple yet very effective algorithms in game theory. Although using pattern recognition as a more sophisticated way to analyze the history of play seems a logical step, there is little research available on this subject. In this paper we examine two different types of pattern recognition, and formulate several algorithms that incorporate these approaches. These algorithms and the basic fictitious play variants they extend are empirically tested in eight tournaments on some well-known normal-form games. The results obtained show that adding pattern recognition to fictitious play improves performance, and demonstrate the general possibilities of applying pattern recognition to agents in game theory.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
The field of game theory studies strategic decision making, where a fixed number of players have a limited number of possible actions to choose from. The combination of the actions of all players, their joint actions, determines the outcome of the game. The mathematician George W. Brown describes his fictitious play (FP) algorithm as follows[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]:
"The iterative method in question can be loosely characterized by the fact that it rests on the traditional statistician's philosophy of basing future decisions on the relevant history. Visualize two statisticians, perhaps ignorant of min-max theory, playing many plays of the same discrete zero-sum game. One might naturally expect a statistician to keep track of the opponent's past plays and, in the absence of a more sophisticated calculation, perhaps to choose at each play the optimum pure strategy against the mixture represented by all the opponent's past plays."
Fictitious play uses the observations from the past to build an observed frequency distribution of the actions of its opponent(s) and uses the action(s) with the highest observed frequency as a prediction of the opponent(s)' next move. It then chooses a strictly myopic response to maximize its expected payoff. It was created as a heuristic for computing Nash equilibria by playing a fictitious game against itself, and proven valid by Robinson[
        <xref ref-type="bibr" rid="ref11">11</xref>
]. Note that the term fictitious play may be misleading in its current use as a model used by an actual player, since there is nothing fictional about the observed history of play and the algorithm does not "play a fictional game in its head" to make its predictions.
      </p>
      <p>
A next logical step in the evolution of the fictitious play algorithm is to perform a more sophisticated analysis of the history of play. There are of course many different approaches one can take to perform such an analysis. In this paper we focus on pattern recognition: looking at sequences of actions instead of single actions in the history of play. Although pattern recognition is a broad and established field with much ongoing research, there exists, as Spiliopoulos puts it, "a conspicuous gap in the literature regarding learning in games – the absence of empirical verification of learning rules involving pattern recognition"[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
In the next section we will give a thorough overview of the basic fictitious play algorithm and two common adaptations. We will then examine two different approaches to adding pattern recognition, and formulate several algorithms that extend fictitious play with the proposed pattern recognition. Finally we will formulate a set of experiments to obtain empirical data on the performance of the pattern-recognizing algorithms and the basic fictitious play algorithms they extend. An analysis of the data obtained will show if, how and why pattern recognition influences performance. An extensive analysis and schematic overview of the data obtained can be found in[
        <xref ref-type="bibr" rid="ref4">4</xref>
].
      </p>
      <p>Fictitious play</p>
    </sec>
    <sec id="sec-2">
      <title>Brown's original algorithm</title>
<p>A normal-form game in game theory is a description of a game in the form of all players' strategy spaces and payoff functions. An n-player game has a finite strategy space X = ∏i Xi. The strategy space Xi for each individual player i = 1, 2, ..., n consists of all actions available to player i. There is a payoff (or utility) function ui(x) for each player i which determines the utility received by player i given the joint actions x ∈ X played that round.</p>
<p>An example 2×2 coordination game (row player chooses A or B, column player chooses a or b):

        a       b
A     1, 1    0, 0
B     0, 0    2, 2</p>
<p>As mentioned before, FP uses the observed history of play to determine its actions. Because of the lack of history, the first action xi1 is specified arbitrarily1. The algorithm consists of two components: a forecasting and a response component. The game is played over several rounds, and for each time index (or round number) t ∈ N0 the n-tuple xt = (xt1, xt2, ..., xtn), where xt ∈ X, consists of the actions played that round by all players. Let ht = (x1, x2, ..., xt) be the observed history up to round t. The forecasting function pit(xj), where xj ∈ Xj, returns the proportion of the time that xj was played in the history ht:</p>
      <p>pit(xj) = (1/t) Σ(u=1..t) Iu(xj).    (2.1)</p>
      <p>Let pt = ∏i pit be the associated product distribution. The indicator function It(xj) returns 1 if xj = xtj (action xj was played by player j at time t) and returns 0 otherwise. Note that Σ(xi∈Xi) pit(xi) = 1 for every i and every t.</p>
<p>The response is an action from the best-reply correspondence BRi(pt): the set of all actions of i that maximize expected utility given pt. The utility of a probability distribution is defined as the expected utility of its outcomes by defining ui(p) = Σ(x∈X) ui(x)p(x). Let Δi denote the set of all probability distributions on the set Xi, and let Δ = ∏i Δi. Let pit ∈ Δi and pt ∈ Δ denote the observed probability distribution in round t for the actions of player i and the joint actions of all players, respectively. For every p ∈ Δ and every x ∈ X the probability of x is pt(x) = ∏i pit(xi). To be able to predict the expected utility for each action we need to be able to distinguish between our actions Xi and the actions of the other players X−i = ∏(j≠i) Xj. In the same fashion let Δ−i = ∏(j≠i) Δj and let pt,−i ∈ Δ−i. Now we can combine an action xi and a partial probability distribution p−i to obtain a full probability distribution (xi, p−i) ∈ Δ that places a probability 1 on xi, and define the best-reply correspondence as:</p>
      <p>BRi(pt,−i) = {xi ∈ Xi : ui(xi, pt,−i) ≥ ui(x′i, pt,−i) for all x′i ∈ Xi}.</p>
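To make the two components concrete, here is a minimal Python sketch of the forecasting rule (eq. 2.1) and the best-reply response with random tie-breaking. The function names and the dictionary-based payoff encoding are our own illustration; the payoffs are the row player's in the asymmetric coordination game used in the experiment.

```python
import random
from collections import Counter

def forecast(history):
    """Empirical frequency distribution of the opponent's past actions (eq. 2.1)."""
    counts = Counter(history)
    t = len(history)
    return {a: counts[a] / t for a in counts}

def best_reply(payoff, actions, p_opponent):
    """Best-reply correspondence BR_i: the own actions that maximize expected
    utility against the forecast distribution; ties are broken at random."""
    expected = {
        a: sum(p * payoff[(a, b)] for b, p in p_opponent.items())
        for a in actions
    }
    best = max(expected.values())
    return random.choice([a for a, u in expected.items() if u == best])

# Row player's payoffs in the asymmetric coordination game (section "games used")
payoff = {("A", "A"): 5, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 3}
history = ["A", "A", "B"]   # opponent played A twice, B once
p = forecast(history)       # {'A': 2/3, 'B': 1/3}
reply = best_reply(payoff, ["A", "B"], p)  # A yields 10/3, B yields 1, so "A"
```

Note that the strictly myopic response ignores how its own action shapes the opponent's future beliefs; this is exactly the rigidity the adaptations in section 2.2 relax.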
    </sec>
    <sec id="sec-3">
      <title>Adaptations</title>
      <p>
The original algorithm has a very rigid specification of predicting according to the empirical distribution of play and choosing an action to maximize the immediate expected utility.
1 The superscript t in xt indicates a time index, not an exponent. When exponentiation is intended the base is a number or is placed between brackets.
Fortunately there are possible modifications that do not lose the properties of convergence[
        <xref ref-type="bibr" rid="ref6">6</xref>
]. We will discuss two adaptations: smoothed and weighted fictitious play.
2.2.1 Smoothed fictitious play In smoothed fictitious play (SFP) the players' responses are smoothed by small trembles or random shocks. It was developed by Fudenberg and Kreps[
        <xref ref-type="bibr" rid="ref6">6</xref>
] along the lines of Harsanyi's purification theorem[
        <xref ref-type="bibr" rid="ref8">8</xref>
]. They define smooth fictitious play as the family of algorithms that employ a standard FP forecasting rule, and a response rule that maximizes the actual utility Ui. Here the smoothing function wi : Δi → R is any smooth, differentiable and strictly concave function such that |∇wi(qi)| → ∞ whenever qi approaches the boundary of Δi, and the response parameter λi &gt; 0.
      </p>
<p>To understand the variant of SFP we will be using, suppose the choice probabilities of player i are described by the logistic function2 qi(xi|p−i), which denotes the probability that action xi will be played next:</p>
      <p>qi(xi|p−i) = exp(ui(xi, p−i)/λi) / Σ(x′i∈Xi) exp(ui(x′i, p−i)/λi).    (2.3)</p>
      <p>If the response parameter λi is close to zero this function closely approximates the best-reply correspondence, while with a larger λi the probabilities are spread more evenly over all actions3. From an observer's standpoint a player using eq. 2.3 may look like he is using a best-reply function which is perturbed by small utility trembles; each period the actual utility Ui = ui + εit is perturbed by the extreme-valued variable εit (whose cumulative distribution is ln P(εit ≤ z) = −exp(−z/λi)). Another argument for using the logistic function involves the Shannon entropy. The amount of information conveyed by a probability distribution qi can be represented by the entropy function −qi ln(qi):</p>
      <p>
−Σ(x′i∈Xi) qi(x′i, p−i) ln(qi(x′i, p−i)).
2 See McKelvey and Palfrey[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for details about the quantal response equilibrium.
      </p>
      <p>
The general idea is that players make errors, but since the probability of an action being chosen is related to its utility, it is unlikely that costly errors are made.
3 When λi → ∞ the function qi(xi|p−i) → 1/|Xi|, thus all actions have an equal chance of being chosen next.
It can be shown[
        <xref ref-type="bibr" rid="ref7">7</xref>
] that the optimal qi is given by the logistic function if we define the actual utility (eq. 2.2) as a weighted combination of the utility ui and the information gained from experimenting:
Ui(qi) = ui(qi, p−i) − λi Σ(x′i∈Xi) qi(x′i) ln qi(x′i).    (2.2)
2.2.2 Weighted fictitious play The second adaptation of classic FP we will discuss involves techniques where the observations from the past decay over time, making them less influential than more recent observations. If, as classic fictitious play does, the entire history is considered, it is harder to obtain change in beliefs and behavior as the history gets larger, thus making it harder for fictitious play to catch up when the opponent changes its strategy. In weighted fictitious play (WFP), also known as exponential FP, a weight factor 0 ≤ γ ≤ 1 is applied to the history[
        <xref ref-type="bibr" rid="ref3">3</xref>
]. It uses the original best-reply response rule and a modification of the forecasting rule (eq. 2.1), where every round every prior observation is multiplied with the weight factor (i.e. at time t an observation from t0 ≤ t is multiplied with γ^(t−t0)):
pit(xj) = Σ(u=1..t) γ^(t−u) Iu(xj) / Σ(u=1..t) γ^(t−u).    (2.4)
The weight factor γ determines the rate of decay. When γ = 1 there is no decay and weighted fictitious play behaves like classic fictitious play. When γ = 0 the entire history decays instantly and weighted fictitious play behaves like the learning rule introduced by Cournot[
        <xref ref-type="bibr" rid="ref5">5</xref>
].
      </p>
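Both adaptations can be sketched in a few lines of Python. The names `smoothed_response` and `weighted_forecast` are our own illustration, with the response parameter λ and the decay factor γ as keyword arguments as described above.

```python
import math

def smoothed_response(expected_utility, lam=1.0):
    """Logistic choice probabilities (section 2.2.1): q(x) proportional
    to exp(u(x)/lam), so utility-costly errors stay unlikely."""
    weights = {a: math.exp(u / lam) for a, u in expected_utility.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def weighted_forecast(history, gamma=0.9):
    """Decayed empirical frequencies (section 2.2.2): an observation from
    round u at time t carries weight gamma**(t - u)."""
    t = len(history)
    weights = {}
    for u, action in enumerate(history, start=1):
        weights[action] = weights.get(action, 0.0) + gamma ** (t - u)
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

q = smoothed_response({"A": 1.0, "B": 0.0}, lam=1.0)
# With lam = 1 the better action gets probability e/(e+1), roughly 0.73
p = weighted_forecast(["A", "A", "B"], gamma=0.9)
# The most recent observation (B) counts fully; the older A's are discounted
```

With `lam` close to 0 the logistic response collapses onto the best reply, and with `gamma = 1` the weighted forecast reduces to eq. 2.1, matching the limiting cases discussed above.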
      <p>
        Pattern recognition
In this section we examine two distinct implementations of pattern recognition in fictitious play. Rothelli[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Lahav[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Spiliopoulos[
        <xref ref-type="bibr" rid="ref14 ref15">14,15</xref>
        ]
have created similar pattern-recognizing algorithms to describe human learning behavior. Their models search the entire history of play for possible patterns that match the last few moves played, apply a form of decay and finally use the frequency distribution obtained to predict the next move. This approach is similar to conditional fictitious play proposed by Aoyagi[
        <xref ref-type="bibr" rid="ref1">1</xref>
], who proved that if two players both apply conditional fictitious play that recognizes patterns of the same length, their beliefs converge to the equilibrium in zero-sum games with a unique Nash equilibrium. We will examine Spiliopoulos' model because it is an intuitive extension of weighted fictitious play and in its simplest form does not apply any additional transformations designed to model how humans form beliefs and make decisions.
      </p>
      <p>
        Sonsino[
        <xref ref-type="bibr" rid="ref13">13</xref>
] proposes confused learning with cycle detection at the end of the realized history of play and proves convergence to a fixed pattern of pure Nash equilibria for a large class of games. We will briefly touch on confused learning and focus on the cycle detection he proposes.
3.1
Spiliopoulos[
        <xref ref-type="bibr" rid="ref14 ref15">14,15</xref>
] proposes n-period fictitious play (FPN), where n ∈ N+, as an extension to weighted fictitious play which keeps track of sequences of actions of length n in the observed history of play.
      </p>
<p>It does not directly search for patterns, but uses the observations of the last n−1 rounds as a premise and uses only the matching part of the history when it forecasts the opponent's next move4. It does respond to patterns as they will be visible in the history; with n = 3 and a pattern AAB the probability p(B|AA), for example, will be high. Even patterns longer than n can sometimes be derived indirectly from the history. The pattern AABBA will result in higher probabilities p(B|AB), p(A|BB) and p(A|BA), but both p(A|AA) and p(B|AA) will be about equal.</p>
<p>FP1 is exactly the same as WFP, and FP3, for example, keeps track of the observed frequencies of all possible patterns of length three, which enables a forecast of the next round given what was played in the last two rounds. First the indicator function It is extended to indicate whether a sequence of actions was played. Let x̄j be a sequence of n actions. The indicator It(x̄j) returns 1 if the sequence matches the actions played in rounds t−n+1, ..., t−1, t, and 0 otherwise.
4 This is similar to Aoyagi's conditional history. An important difference is that conditional FP uses the joint actions of both players as a premise, while we only consider the actions of the opponent.
The forecasting function pit in equation 2.4 is extended to return the proportion of the time the sequence x̄j was played, which can then be used to calculate the proportion of the time the action xj was played given that the actions x̄j were played in the previous n−1 rounds:</p>
      <p>pit(xj | x̄j) = pit(x̄j + xj) / Σ(x′j∈Xj) pit(x̄j + x′j),</p>
      <p>where x̄j + xj denotes the sequence x̄j followed by the action xj.</p>
<p>Spiliopoulos continues by extending the FPN algorithm to allow the players' perceptions to follow commonly ascribed psychophysics principles, which we will not discuss here because human learning behavior is beyond the scope of our investigation. We will use the n-period fictitious play algorithm in its simplest form in our experiment.</p>
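A minimal sketch, under our own naming, of how FPN's conditional forecast can be computed from decayed counts of length-n windows of the opponent's history:

```python
def n_period_forecast(history, n=2, gamma=0.9):
    """n-period FP sketch: weighted counts of every length-n window in the
    history, then the conditional distribution of the next action given
    the premise formed by the last n-1 observed actions."""
    t = len(history)
    counts = {}
    for end in range(n, t + 1):
        # window ending at round `end`, decayed by how long ago it ended
        seq = tuple(history[end - n:end])
        counts[seq] = counts.get(seq, 0.0) + gamma ** (t - end)
    premise = tuple(history[t - (n - 1):]) if n > 1 else ()
    # keep only windows whose first n-1 actions match the premise
    matching = {seq[-1]: w for seq, w in counts.items() if seq[:-1] == premise}
    total = sum(matching.values())
    return {a: w / total for a, w in matching.items()} if total else {}

# With the repeating pattern AAB, p(B | AA) dominates p(A | AA), as in the text
p = n_period_forecast(list("AABAABAABAA"), n=3)
```

With `n = 1` the premise is empty and the sketch reduces to the weighted forecast of WFP, mirroring the observation that FP1 is exactly the same as WFP.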
    </sec>
    <sec id="sec-4">
      <title>Cyclic pattern detection</title>
      <p>
        Sonsino[
        <xref ref-type="bibr" rid="ref13">13</xref>
] proposes superimposing the ability of strategic recognition of cyclic patterns that repeat successively at the end of the observed path of play on a learning model called confused learning5. Detection is restricted to basic patterns: patterns that do not contain a smaller subpattern6. If a player recognizes such a repeated pattern it will assume the other players will continue to follow that pattern and, being strictly myopic, plays a best-reply response against the predicted next joint actions of the pattern.
      </p>
<p>For every pattern p with length lp there is a uniform bound Tp such that the model must detect p with probability 1 if it has appeared almost Tp times (i.e. if the next round may complete Tp uninterrupted occurrences of p). To eliminate the possible inconsistency that the model detects more than one pattern with probability one at the same time, it is assumed that only patterns with length shorter than some fixed bound L are detected and that Tp·lp ≥ 3L. This ensures Tp is large enough relative to L. The detection of patterns after a history-independent fixed number of repetitions is too stylized to represent any realistic learning effort. To accommodate any form of non-stationary pattern detection a set of necessary conditions is added to the model. Patterns longer than 2 must appear almost two times (again, if the next round may complete two uninterrupted occurrences of the pattern) in the past 2lp − 1 rounds. Patterns of length 2 must appear almost 2½ times in the past 4 rounds and patterns of length 1 must appear 2 times in the past 2 rounds. For the pattern ABCD, for example, the last 7 observations must be ABCDABC, while a shorter pattern AB requires the last 4 observations to be BABA.
5 In confused learning players learn to choose the best learning model and are confused in the sense that they do not a priori know what the best strategy is.
6 A and AAB are examples of basic patterns; ABAB and AA are not.</p>
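The basic-pattern restriction and the "almost two occurrences" condition for patterns longer than 2 can be sketched as follows; the helper names are ours, and patterns are encoded as strings over the action alphabet.

```python
def is_basic(pattern):
    """A pattern is basic if it is not a repetition of a shorter subpattern
    (e.g. A and AAB are basic, ABAB and AA are not)."""
    n = len(pattern)
    for k in range(1, n):
        if n % k == 0 and pattern[:k] * (n // k) == pattern:
            return False
    return True

def almost_two_occurrences(history, pattern):
    """Necessary condition for patterns longer than 2: the next round may
    complete two uninterrupted occurrences, i.e. the last 2*lp - 1
    observations equal the pattern followed by all but its last action."""
    lp = len(pattern)
    window = 2 * lp - 1
    return "".join(history[-window:]) == pattern + pattern[:-1]

assert is_basic("AAB") and not is_basic("ABAB") and not is_basic("AA")
# For the pattern ABCD the last 7 observations must be ABCDABC (section 3.2)
assert almost_two_occurrences(list("XXABCDABC"), "ABCD")
```

The special thresholds for patterns of length 1 and 2 (2 occurrences in 2 rounds, 2½ occurrences in 4 rounds) would need separate cases and are omitted from this sketch.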
<p>A detected pattern p remains detected until it is contradicted, and a pattern p may only be detected lp rounds after the previous pattern was detected. The existence of these minimal requirements and the necessary conditions define a range in which pattern detection can occur. Sonsino shows that convergence of behavior to any fixed strategy that is not a pure equilibrium is incompatible with pattern recognition7, and that convergence of behavior to some mixed strategy profile is only possible if agents consider arbitrarily long histories (and is thus impossible for agents with bounded memory).</p>
      <p>Experiment Setup
The algorithms used in our experiment (listed in Table 1) are de ned by
their forecasting and response components so we can quickly inspect not
only the performance of the algorithm as a whole but also the performance
of the individual components without diving into the internal beliefs. This
enables us to quickly identify which behavior to examine in more detail.</p>
<p>Model                   Name     Forecast      Response    Parameters
Brown's original FP     FP       Simple        Best reply
Smooth FP               SFP      Simple        Smoothed    λ = 1
Weighted FP             WFP      Weighted      Best reply
Smooth weighted FP      SWFP     Weighted      Smoothed    λ = 1
N-pattern FP            FPN      N-pattern     Best reply  N = 2, 3
Smoothed n-pattern      SFPN     N-pattern     Smoothed    N = 2, 3, λ = 1
Weak cycle              FPwCL    Weak cycle    Best reply  L = 2, 3, 20
Smoothed weak cycle     SFPwCL   Weak cycle    Smoothed    L = 2, 3, 20, λ = 1
Strong cycle            FPsCL    Strong cycle  Best reply  L = 2, 3, 20
Smoothed strong cycle   SFPsCL   Strong cycle  Smoothed    L = 2, 3, 20, λ = 1
7 A detected pattern can only sustain itself if it consists only of pure Nash equilibria, because the player will break other patterns by playing a best-reply action.
We will use a pattern length n of both 2 and 3, resulting in four different algorithms: FP2, SFP2, FP3 and SFP3. The weight factor γ is 0.9.</p>
    </sec>
    <sec id="sec-5">
      <title>Weak cycle forecasting (section 3.2)</title>
<p>Patterns are only detected at the upper bound, when the next round may complete Tp consecutive occurrences of the pattern. The pattern length L affects how quickly a pattern is detected because Tp = floor(3L/lp). We will use the same short pattern lengths L of 2 and 3, and another large L of 20, resulting in six different algorithms. When a pattern is detected the algorithm returns a probability distribution where the next action in the pattern has probability one. If no pattern is detected it employs the Simple forecasting algorithm.</p>
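The detection threshold can be made concrete with a one-line helper, assuming Tp = floor(3L/lp) as stated above (the function name is ours):

```python
from math import floor

def detection_threshold(L, lp):
    """Weak cycle detection: a pattern of length lp is detected after
    Tp = floor(3L / lp) uninterrupted repetitions."""
    return floor(3 * L / lp)

# With L = 20 even a 1-cycle needs 60 repetitions before detection, which is
# why FPwC-20 falls back to the Simple forecast so often; with L = 2 a
# 2-cycle is already detected after 3 repetitions.
t_slow = detection_threshold(20, 1)  # 60
t_fast = detection_threshold(2, 2)   # 3
```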
    </sec>
    <sec id="sec-6">
      <title>Strong cycle forecasting (section 3.2)</title>
<p>Patterns are detected at the lower bound described by Sonsino. The pattern length L does not influence the speed with which patterns are detected. We will use the same pattern lengths 2, 3 and 20, and this algorithm also employs the Simple forecasting algorithm when no pattern is detected.</p>
    </sec>
    <sec id="sec-7">
      <title>Best reply response (section 2.1)</title>
      <p>The best reply correspondence. If there is a tie (i.e. the set BR contains
more than one action) then one is chosen at random.</p>
    </sec>
    <sec id="sec-8">
      <title>Smoothed response (section 2.2.1)</title>
<p>The trembled best reply algorithm with a response parameter λ of 1.</p>
<p>There are 40 algorithms for 2×2 games and 60 for 3×3 games in our experiment8. For each game, each algorithm faces all algorithms, themselves included, in a tournament where each match lasts 1000 rounds and is repeated 100 times. Finally they are tested with 100 randomly generated games, one tournament per game. The games used are:</p>
    </sec>
    <sec id="sec-9">
      <title>Asymmetric Coordination game</title>
<p>There are two pure strategy equilibria and a mixed strategy equilibrium. Both players prefer the same equilibrium, which Pareto dominates the other.

        A       B
A     5, 5    0, 0
B     0, 0    3, 3</p>
      <p>
8 There are 10 forecasters (FP, WFP, FP2, FP3, FPwC-2, FPwC-3, FPwC-20, FPsC-2, FPsC-3 and FPsC-20) times 2 responders (Best reply, Smoothed) times 2 or 3 possible initial moves, which equals 40 or 60 algorithms.</p>
    </sec>
    <sec id="sec-10">
      <title>Symmetric Coordination game</title>
<p>There are two pure strategy equilibria and a mixed strategy equilibrium. Neither equilibrium is preferred or dominated.

        A       B
A     1, 1    0, 0
B     0, 0    1, 1</p>
      <p>Shapley's game
The only equilibrium is a mixed strategy where each player plays each strategy with equal probability. The game is a non-zero-sum variant of rock-paper-scissors designed to cause cyclic behavior in fictitious play algorithms.

        A       B       C
A     1, 0    0, 0    0, 1
B     0, 1    1, 0    0, 0
C     0, 0    0, 1    1, 0</p>
    </sec>
    <sec id="sec-11">
      <title>Battle of the sexes</title>
<p>There are two Pareto optimal pure strategy equilibria and a mixed strategy equilibrium. A different pure strategy equilibrium is preferred by each player.

        A       B
A     3, 2    0, 0
B     0, 0    2, 3</p>
    </sec>
    <sec id="sec-12">
      <title>Chicken</title>
<p>A variant of Battle of the Sexes with the same equilibria, where both players playing the action corresponding to their preferred pure equilibrium yields the worst outcome for both.

        A          B
A     0, 0     −1, 1
B     1, −1  −10, −10</p>
    </sec>
    <sec id="sec-13">
      <title>Matching pennies</title>
<p>A zero-sum game where the only equilibrium is a mixed strategy where each player plays each strategy with equal probability.

        A        B
A     1, −1   −1, 1
B    −1, 1     1, −1</p>
    </sec>
    <sec id="sec-14">
      <title>Prisoner's dilemma</title>
<p>Each player has a dominant strategy, resulting in a pure strategy equilibrium. The pure strategy joint action A, A is Pareto optimal.

        A       B
A     2, 2    0, 3
B     3, 0    1, 1</p>
    </sec>
    <sec id="sec-15">
      <title>Random games</title>
<p>We've generated 100 random 2×2 games using GAMUT. All eight payoffs lie between 0 and 100.</p>
<p>Pattern Recognition Results
We will compare FPC with FP and FPN with WFP to see the difference with the basic fictitious play variants they are based on. All observations are based on an in-depth analysis of the results per game, which can be found in the full thesis.
Both strong and weak pattern recognition perform better than FP in the random games, where only FPN scores higher still. The exception is FPwC-20, which performs only marginally better than FP. This is to be expected since FPwC-20 detects patterns very slowly, in 3L = 60 rounds, and thus falls back to FP very often. FPC scores higher than FP in Shapley's game and matching pennies; there is no performance improvement in the rest of the games. The behavior is exactly the same as FP in both coordination games and the prisoner's dilemma, almost the same with no clear advantage or disadvantage in chicken, and slightly disadvantageous in the battle of the sexes. FPsC performs better than FPwC in matching pennies and Shapley's game; there is no clear performance difference between strong and weak pattern recognition in all other games.</p>
<p>The pattern length L does not influence the performance of strong pattern recognition. Shorter patterns are allowed to be detected sooner, and with only two actions {A, B} a pattern longer than three actions always contains either ABAB or AA, making it impossible to detect longer patterns. Detection speed is independent of L, and the ability to detect longer patterns is not an advantage because the detection of a shorter pattern always blocks the detection of a longer pattern.</p>
<p>The difference between weak pattern recognition with different L that is responsible for the observed performance differences is simply that FPwC-3 is slightly slower in detecting patterns because of the higher pattern length L. In comparison with FPwC-2 and FPwC-3, FPwC-20 is much slower in detecting patterns and very likely not to detect any patterns at all, because patterns need 60/lp repetitions to be detected; short patterns are likely to be broken before they can be detected and long patterns are unlikely to occur at all. FPwC-20 performs exactly like FP in matching pennies and Shapley's game, and better than FP but still worse than FPwC-2 and FPwC-3 in the battle of the sexes, matching pennies, Shapley's game and in the random games.
5.2
In the randomly generated games, Shapley's game and matching pennies FPN outperforms all other algorithms tested in our experiment. It performs better than WFP in the symmetric coordination game. In the battle of the sexes WFP scores higher than FP3 but lower than FP2, and in the asymmetric coordination game WFP scores equal to FP3 and lower than FP2.</p>
<p>FPN quickly detects change if the opponent switches from one singleton pattern to another. Because AA and BB will occur more often than AB and BA in the observed history of play, FPN will forecast the same action the opponent played in the previous round. This forecast is only wrong for one round when the opponent switches actions.</p>
<p>The differences between N = 2 and N = 3 are marginal. FP3 performs slightly better in the random games and the asymmetric coordination game, where FP2's forecast is wrong in one round. FP2 performs slightly better in matching pennies. There is no difference in performance between FP2 and FP3 in the remaining games.</p>
<p>Conclusion
In this paper we have studied two distinctly different approaches to performing pattern recognition on the observed history of play, obtained empirical data on the performance of those algorithms and the fictitious play variants they extend, and found the new pattern-recognition based FP variants to be significantly more effective than the traditional FP variants.</p>
<p>N-period fictitious play is an extension of weighted fictitious play inspired by pattern recognition, which has proved itself to be significantly more effective than WFP. FP3 performs only slightly better than FP2 in two, and slightly worse in one, of the games tested in our experiment. We cannot at this stage, however, recommend against using a pattern length N &gt; 2. The costs are higher, but the effectiveness of a higher pattern length against other opponents, which are not related to FP, remains to be tested.</p>
<p>In contrast to FPN, the strong and weak pattern recognition capabilities of FPC have no relation at all to the fictitious play forecaster algorithm, which in our experiments only serves as a fallback forecaster in case no patterns are detected. It can easily be replaced by any other adaptive algorithm, as Sonsino describes. Furthermore, the strong and weak detection algorithms are merely implementations of the lower and upper bounds of the area where, in Sonsino's model, pattern recognition may occur. This does not mean that they are bad algorithms per se, nor does it mean that good or bad results necessarily imply that cyclic pattern recognition at the end of the observed history of play is a good or bad strategy. The results of these two approaches do, however, give valuable insights that should be taken into account when designing a cyclic pattern recognizing algorithm.</p>
<p>The results we obtained are encouraging and illustrate the possibilities and merits of applying pattern recognition in machine learning and game theory. Adding a layer of pattern recognition on top of FP improves performance significantly in comparison to FP on its own, and FPN nearly always outperforms WFP.</p>
<p>Fictitious play is a basic algorithm which has its limitations and weaknesses, yet is very effective for its simplicity. The pattern recognition algorithms we studied share some of those strengths and weaknesses because they are either an extension of FP or use FP as a fallback strategy, and the matches show similar behavior because only FP-based opponents participated in the tournament. But FPN and FPC do show different behavior, overcoming FP's vulnerability to cyclic behavior and introducing new problems coordinating in self-play, for example.</p>
<p>The field of pattern recognition, however, is vast and offers many ideas and possibilities for (advanced) pattern recognition that have no relation at all to fictitious play. Confining further research to fictitious play is an unnecessary limitation on the broad spectrum of possibilities that pattern recognition has to offer. Despite the effectiveness of the pattern-detection based fictitious play algorithms studied, pattern detection should be seen as a stand-alone approach to machine learning. The best reply response works well in combination with the tested pattern detecting forecasts, but we should not ignore possibilities where pattern recognition provides both components, or a complete learning algorithm where such a distinction is not applicable at all.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Aoyagi</surname>
          </string-name>
          .
<article-title>Evolution of beliefs and the Nash equilibrium of normal form games</article-title>
          .
          <source>Journal of Economic Theory</source>
          ,
          <volume>70</volume>
          (
          <issue>2</issue>
          ):
          <volume>444</volume>
–
          <fpage>469</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>G.W.</given-names>
            <surname>Brown</surname>
          </string-name>
          .
<article-title>Iterative solution of games by fictitious play</article-title>
          .
          <source>Activity analysis of production and allocation</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>374</fpage>
          –
          <lpage>376</lpage>
          ,
          <year>1951</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Y.W.</given-names>
            <surname>Cheung</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <article-title>Individual learning in normal form games: Some laboratory results</article-title>
          .
          <source>Games and Economic Behavior</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ):
          <fpage>46</fpage>
          –
          <lpage>76</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R.</given-names>
            <surname>Chu</surname>
          </string-name>
          .
          <article-title>Extending Fictitious Play with Pattern Recognition</article-title>
          .
          <source>Master's thesis</source>
          , Universiteit Utrecht,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Cournot</surname>
          </string-name>
          .
          <source>Research into the mathematical principles of the theory of wealth (1838), trans. Bacon</source>
          . New York, 2,
          <year>1897</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Fudenberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Kreps</surname>
          </string-name>
          .
          <article-title>Learning mixed equilibria</article-title>
          . MIT Press,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>D.</given-names>
            <surname>Fudenberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.K.</given-names>
            <surname>Levine</surname>
          </string-name>
          .
          <article-title>The theory of learning in games</article-title>
          , volume
          <volume>2</volume>
          . The MIT press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.C.</given-names>
            <surname>Harsanyi</surname>
          </string-name>
          .
          <article-title>Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points</article-title>
          .
          <source>International Journal of Game Theory</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>23</lpage>
          ,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lahav</surname>
          </string-name>
          .
          <article-title>Behavioral pattern learning models for decision making in games</article-title>
          .
          <source>Journal of Pattern Recognition Research</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>133</fpage>
          –
          <lpage>151</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>R.D.</given-names>
            <surname>McKelvey</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.R.</given-names>
            <surname>Palfrey</surname>
          </string-name>
          .
          <article-title>Quantal response equilibria for normal form games</article-title>
          .
          <source>Games and economic behavior</source>
          ,
          <volume>10</volume>
          (
          <issue>1</issue>
          ):
          <fpage>6</fpage>
          –
          <lpage>38</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J.</given-names>
            <surname>Robinson</surname>
          </string-name>
          .
          <article-title>An iterative method of solving a game</article-title>
          .
          <source>The Annals of Mathematics</source>
          ,
          <volume>54</volume>
          (
          <issue>2</issue>
          ):
          <fpage>296</fpage>
          –
          <lpage>301</lpage>
          ,
          <year>1951</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>T.F.</given-names>
            <surname>Rötheli</surname>
          </string-name>
          .
          <article-title>Pattern recognition and procedurally rational expectations</article-title>
          .
          <source>Journal of Economic Behavior &amp; Organization</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ):
          <fpage>71</fpage>
          –
          <lpage>90</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>D.</given-names>
            <surname>Sonsino</surname>
          </string-name>
          .
          <article-title>Learning to learn, pattern recognition, and Nash equilibrium</article-title>
          .
          <source>Games and Economic Behavior</source>
          ,
          <volume>18</volume>
          (
          <issue>2</issue>
          ):
          <fpage>286</fpage>
          –
          <lpage>331</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>L.</given-names>
            <surname>Spiliopoulos</surname>
          </string-name>
          .
          <article-title>Pattern recognition and subjective belief learning in repeated mixed strategy games</article-title>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>L.</given-names>
            <surname>Spiliopoulos</surname>
          </string-name>
          .
          <article-title>Pattern recognition and subjective belief learning in a repeated constant-sum game</article-title>
          .
          <source>Games and Economic Behavior</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>H. Peyton</given-names>
            <surname>Young</surname>
          </string-name>
          .
          <source>Learning. In Individual Strategy and Social Structure: An Evolutionary Theory of Institutions</source>
          , chapter 2, pages
          <fpage>25</fpage>
          –
          <lpage>43</lpage>
          . Princeton University Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>