<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Euro 2016 Predictions Using Team Rating Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Lasek</string-name>
          <email>jan.lasek@deepsense.io</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>In this study we employ several rating systems to generate predictions for the outcome of the 2016 European Championship in association football. To this end, we first estimate probabilities of match results between all competing nations using the rating systems. Secondly, via Monte Carlo simulations we compute probabilities of advancing past a given stage of the tournament. The approach was developed for the Euro 2016 Prediction Competition organized within Sport Analytics Workshop at ECML/PKDD 2016.</p>
      </abstract>
      <kwd-group>
        <kwd>team ratings</kwd>
        <kwd>rating systems</kwd>
        <kwd>predictions</kwd>
        <kwd>Euro 2016</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Rating systems for football teams</title>
      <p>Many rating systems have been developed for various sports over
the years. Here we discuss three different models for rating teams in
association football: the ordinal logistic regression model, the least squares model and
the Poisson model.</p>
      <p>
        Ordinal regression ratings. The first rating system discussed uses ordered
logistic regression as the match results model [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ]. Under this model, each team
is associated with a single parameter, a rating, reflecting its strength. Teams'
strength parameters are estimated from the outcomes of games between
the teams. Let <italic>r<sub>i</sub></italic>, <italic>r<sub>j</sub></italic> be the ratings of two teams <italic>i</italic> and <italic>j</italic>, with team <italic>i</italic> playing
at its home ground. If <italic>H</italic> and <italic>A</italic> denote a home and away
team win, respectively, and <italic>D</italic> corresponds to a draw, the model links the probabilities of these
events to the teams' rating parameters through the following equations:
        <disp-formula><tex-math><![CDATA[
\begin{aligned}
P(H) &= \frac{1}{1 + e^{c - (r_i - r_j + h)}}, \\
P(D) &= \frac{1}{1 + e^{-c - (r_i - r_j + h)}} - \frac{1}{1 + e^{c - (r_i - r_j + h)}}, \\
P(A) &= 1 - \frac{1}{1 + e^{-c - (r_i - r_j + h)}},
\end{aligned}
]]></tex-math></disp-formula>
where <italic>c</italic> &gt; 0 is an intercept and <italic>h</italic> is a parameter introduced to account for
the home team advantage [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The model's parameters can be estimated by the weighted
maximum likelihood method, as follows.
Let <italic>r</italic> = (<italic>r</italic><sub>1</sub>, <italic>r</italic><sub>2</sub>, …, <italic>r<sub>n</sub></italic>) denote the vector of teams' ratings, and let
<italic>L</italic>(<italic>M</italic> | <italic>r</italic>, <italic>h</italic>, <italic>c</italic>) denote the weighted log-likelihood of the results observed in
a dataset of matches <italic>M</italic> given model parameters <italic>r</italic>, <italic>h</italic>, <italic>c</italic>. Each match <italic>m</italic> ∈ <italic>M</italic> is
described as a tuple <italic>m</italic> = (<italic>i</italic>, <italic>j</italic>, <italic>k</italic>, <italic>t</italic>), where <italic>i</italic>, <italic>j</italic> are indices encoding particular
teams, <italic>k</italic> is the type of match (friendly, qualifier for a major tournament, or
a major tournament match) and <italic>t</italic> is the time elapsed between the match and the estimation period,
that is, the point in time for which we want to estimate the
ratings. The log-likelihood function is weighted by both the time at which the game
took place and the importance of the match. It is defined as
      </p>
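      <p>
        <disp-formula><tex-math><![CDATA[
L(M \mid r, h, c) = \frac{1}{|M|} \sum_{m \in M} \varphi(m) \log P(R_m),
]]></tex-math></disp-formula>
where <italic>R<sub>m</sub></italic> ∈ {<italic>H</italic>, <italic>D</italic>, <italic>A</italic>} is the actual result of match <italic>m</italic>. We assume that the
weighting function has the form φ(<italic>m</italic>) = ν(<italic>k</italic>) <italic>e</italic><sup>−κ<italic>t</italic></sup>, where ν(·) is a function that
maps match type to a numerical value representing its importance and κ is a time
decay parameter. The idea is to give a higher weight to recent results as
well as different weights according to match type. To estimate team ratings we
minimise the negative weighted log-likelihood, −<italic>L</italic>(<italic>M</italic> | <italic>r</italic>, <italic>h</italic>, <italic>c</italic>).
      </p>
      <p>
        Least squares ratings. The next rating system is based on the simple
observation that the difference <italic>s<sub>i</sub></italic> − <italic>s<sub>j</sub></italic> in the scores produced by the teams should
correspond to the difference in ratings:
        <disp-formula><tex-math><![CDATA[
s_i - s_j = r_i - r_j + h.
]]></tex-math></disp-formula>
Again, <italic>h</italic> is a correction for the advantage of the home team <italic>i</italic>. The rating system's name
originates from its estimation method: one finds ratings <italic>r<sub>i</sub></italic> such that the sum
of squared differences (over a set of games) between the two sides of the above
equation is minimal [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The sum of squares function can be weighted in a
manner analogous to that discussed for the ordinal logistic regression model.
      </p>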
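      <p>As an illustration of the least squares estimation just described, the following minimal sketch fits ratings and a home advantage with NumPy. The function name, the optional per-match <monospace>weights</monospace> argument and the toy results are assumptions made for this example; they are not taken from the study.</p>
      <preformat><![CDATA[
```python
import numpy as np


def fit_least_squares_ratings(matches, n_teams, weights=None):
    """Fit ratings r and home advantage h so that s_i - s_j ~ r_i - r_j + h.

    matches: iterable of (i, j, s_i, s_j) tuples with team i playing at home.
    weights: optional per-match weights (an assumption of this sketch).
    """
    matches = list(matches)
    design = np.zeros((len(matches), n_teams + 1))
    target = np.zeros(len(matches))
    for row, (i, j, s_i, s_j) in enumerate(matches):
        design[row, i] = 1.0         # +r_i
        design[row, j] = -1.0        # -r_j
        design[row, n_teams] = 1.0   # +h
        target[row] = s_i - s_j
    w = np.ones(len(matches)) if weights is None else np.asarray(weights, dtype=float)
    root_w = np.sqrt(w)
    # Scaling rows by sqrt(weight) turns weighted least squares into ordinary least squares.
    solution, *_ = np.linalg.lstsq(design * root_w[:, None], target * root_w, rcond=None)
    ratings, home_advantage = solution[:n_teams], solution[n_teams]
    # Ratings are identified only up to an additive constant; centre them at zero.
    return ratings - ratings.mean(), home_advantage


# Toy data (made up): results consistent with ratings (1, 0, -1) and h = 1.
toy_matches = [(0, 1, 3, 1), (1, 2, 2, 0), (2, 0, 0, 1), (1, 0, 1, 1)]
ratings, home_advantage = fit_least_squares_ratings(toy_matches, n_teams=3)
print(ratings, home_advantage)
```
]]></preformat>
      <p>Because only rating differences enter the model, the ratings are centred at zero here; any other normalisation would give the same predictions.</p>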
      <p>
        For the least squares model, we still need to generate probabilities for
particular outcomes. This is done by first fitting a logistic regression model with
binary outcomes as described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Next, the binary outcomes are mapped to
a three-way outcome by a method proposed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Poisson model. The final rating system that we discuss is based on the
assumption that the goals scored by a team can be modelled as a Poisson-distributed
variable. The mean rate of this variable depends on the attacking
capabilities of a team and the defensive skills of its opponent. This extends ratings to
two parameters per team, offensive and defensive skills, as opposed to a single
parameter in the methods discussed above.</p>
      <p>Given the attacking and defensive skills of teams <italic>i</italic> and <italic>j</italic>, <italic>a<sub>i</sub></italic>, <italic>a<sub>j</sub></italic> and <italic>d<sub>i</sub></italic>, <italic>d<sub>j</sub></italic>,
respectively, the rates of the Poisson variables for a home team <italic>i</italic> and a visiting team
<italic>j</italic>, λ and μ respectively, are modelled as:
        <disp-formula><tex-math><![CDATA[
\lambda = c + h + a_i - d_j, \qquad \mu = c + a_j - d_i,
]]></tex-math></disp-formula>
where <italic>c</italic> is an intercept and <italic>h</italic> accounts for home team advantage. Under this
model, the probability of a score of <italic>x</italic> to <italic>y</italic> is the product of two individual Poisson
probabilities with rates λ and μ, respectively, and is equal to
        <disp-formula><tex-math><![CDATA[
\frac{\lambda^{x} e^{-\lambda}}{x!} \cdot \frac{\mu^{y} e^{-\mu}}{y!}.
]]></tex-math></disp-formula>
Given
a dataset of matches, one can estimate the team rating parameters using the
maximum likelihood method. The likelihood can be weighted in a manner similar
to that discussed for the ordinal logistic regression model. Here, for simplicity, we
employ the basic version of the model, which assumes that the Poisson variables
corresponding to the goals scored by the teams, given their rating parameters,
are independent [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. There are studies which relax this assumption [
        <xref ref-type="bibr" rid="ref4 ref8">8, 4</xref>
        ].
      </p>
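      <p>The Poisson model above can be turned into win, draw and loss probabilities by summing score probabilities over a grid of scorelines. The sketch below assumes independent Poisson goal counts, as in the basic model; the parameter values, the function names and the truncation at <monospace>max_goals</monospace> are illustrative assumptions only.</p>
      <preformat><![CDATA[
```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals for a Poisson variable with the given rate."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


def score_probability(x, y, lam, mu):
    """Probability of the score x:y under independent Poisson goal counts."""
    return poisson_pmf(x, lam) * poisson_pmf(y, mu)


def outcome_probabilities(lam, mu, max_goals=15):
    """(home win, draw, away win), summing scorelines up to max_goals per team."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = score_probability(x, y, lam, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise the (tiny) truncated tail
    return home / total, draw / total, away / total


# Made-up parameters; the additive rates must be positive to be valid.
c, h = 1.0, 0.3
a_i, d_i = 0.4, 0.3   # home team i: attack, defence
a_j, d_j = 0.2, 0.1   # away team j: attack, defence
lam = c + h + a_i - d_j
mu = c + a_j - d_i
p_home, p_draw, p_away = outcome_probabilities(lam, mu)
print(p_home, p_draw, p_away)
```
]]></preformat>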
    </sec>
    <sec id="sec-3">
      <title>Tuning the predictive performance</title>
      <p>We used the rating systems presented here to estimate win, draw and loss
probabilities for every possible match-up among the 24 teams
participating in Euro 2016. Given these probabilities, we simulated the
tournament multiple times and computed each team's probability of winning it.
We used the database of international football match results provided at
http://laenderspiel.cmuck.de/.<sup>1</sup></p>
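      <p>As a sketch of how such a table of match-up probabilities might be built, the example below evaluates the ordinal regression formulas of Section 2 for every pair of teams. The ratings, the values of <monospace>c</monospace> and <monospace>h</monospace>, and the choice to grant the home advantage only to the host are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import math
from itertools import combinations


def ordinal_probabilities(r_i, r_j, c, h=0.0):
    """Return (P(H), P(D), P(A)) for team i (home) against team j under the ordered logit."""
    d = r_i - r_j + h
    p_home = 1.0 / (1.0 + math.exp(c - d))
    p_home_or_draw = 1.0 / (1.0 + math.exp(-c - d))
    return p_home, p_home_or_draw - p_home, 1.0 - p_home_or_draw


def matchup_probabilities(ratings, c, h, host):
    """(win, draw, loss) for the first-named team in every pair; only the host gets h."""
    table = {}
    for a, b in combinations(sorted(ratings), 2):
        if b == host:
            a, b = b, a  # put the host first so that it receives the home advantage
        advantage = h if a == host else 0.0
        table[(a, b)] = ordinal_probabilities(ratings[a], ratings[b], c, advantage)
    return table


# Made-up ratings and parameters, for illustration only.
toy_ratings = {"France": 0.9, "Italy": 0.5, "Portugal": 0.6}
table = matchup_probabilities(toy_ratings, c=0.6, h=0.3, host="France")
for (a, b), (win, draw, loss) in table.items():
    print(f"{a} v {b}: {win:.3f} {draw:.3f} {loss:.3f}")
```
]]></preformat>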
      <p><sup>1</sup> Thanks to the website's maintainer Christian Muck for generously exporting
the data.</p>
      <p>The rating systems involve some adjustable parameters, e.g.,
weights for the importance of matches, a weighting function for the most recent results
and regularization parameters. We tuned these parameters (by exploring a grid
of values) to maximize the predictive accuracy of the models: using a sample
of games, we predicted their results and evaluated the predictions. For tuning the
parameters, we chose matches from major international tournaments: World Cup
finals, European Championships and Copa America.</p>
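      <p>To make these adjustable parameters concrete, the sketch below computes a match weight as an importance value for the match type times an exponential decay in elapsed time, and enumerates a small grid of candidate settings. The grid values, the dictionary keys and the function names are hypothetical; the values explored in the study are not given here.</p>
      <preformat><![CDATA[
```python
import math
from itertools import product


def match_weight(kind, elapsed, importance, decay):
    """Importance of the match type times an exponential decay in elapsed time."""
    return importance[kind] * math.exp(-decay * elapsed)


def candidate_settings(tournament_weights, qualifier_weights, decays):
    """Enumerate a grid of (importance mapping, decay) pairs; friendlies fixed at 1."""
    for tw, qw, decay in product(tournament_weights, qualifier_weights, decays):
        yield {"friendly": 1.0, "qualifier": qw, "tournament": tw}, decay


grid = list(candidate_settings([2.0, 3.0], [1.5, 2.0], [0.1, 0.3]))
for importance, decay in grid:
    # In a full pipeline each setting would be used to fit ratings and then be
    # scored on the tuning matches; here we only show the weight of one old match.
    w = match_weight("tournament", elapsed=4.0, importance=importance, decay=decay)
    print(importance, decay, round(w, 3))
```
]]></preformat>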
      <p>
        The parameters of the rating systems were tuned on World Cup finals held
between 1994 and 2010 (5 tournaments), UEFA European Championships
1996–2008 (4) and Copa America finals 1999–2011 (5). This amounts to a set of 562
matches. In the competition, prediction accuracy is evaluated using
logarithmic loss (logloss). Accordingly, we use this metric to tune the models'
parameters. It is calculated as
        <disp-formula><tex-math><![CDATA[
-\frac{1}{m} \sum_{i=1}^{m} \log P(R_i),
]]></tex-math></disp-formula>
where <italic>P</italic>(<italic>R<sub>i</sub></italic>) is
the probability attributed by the model to the final outcome of the <italic>i</italic>-th game in the data,
<italic>i</italic> = 1, 2, …, <italic>m</italic>. A more direct interpretation is provided by accuracy,
defined as the percentage of matches that were correctly predicted by a given
method. To estimate the final efficacy of the methods we present results on
the validation sample comprising the 2014 World Cup finals, the 2012 UEFA
European Championship and the 2015 Copa America. To provide some context for
the numbers, we present a benchmark solution of random guessing and
probabilities derived from an average of bookmakers' odds. A random guess yields
a logloss of −log(1/3) ≈ 1.1 and an accuracy of 33% for a three-way outcome. We
also show scores achieved by two benchmark solutions based on the Elo model:
EloRatings.net and the FIFA Women's World Ranking methodology [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ].
      </p>
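      <p>The two evaluation measures can be computed in a few lines. The sketch below uses made-up predictions and results; it also checks the random-guess benchmark, for which the logloss equals log 3.</p>
      <preformat><![CDATA[
```python
import math


def logloss(predictions, results):
    """Mean negative log-probability assigned to the observed results."""
    return -sum(math.log(p[r]) for p, r in zip(predictions, results)) / len(results)


def accuracy(predictions, results):
    """Share of matches whose most probable outcome was the observed one."""
    hits = sum(max(p, key=p.get) == r for p, r in zip(predictions, results))
    return hits / len(results)


results = ["H", "D", "A", "H"]  # made-up outcomes
uniform = [{"H": 1 / 3, "D": 1 / 3, "A": 1 / 3}] * len(results)
model = [
    {"H": 0.6, "D": 0.25, "A": 0.15},
    {"H": 0.4, "D": 0.3, "A": 0.3},
    {"H": 0.2, "D": 0.3, "A": 0.5},
    {"H": 0.5, "D": 0.3, "A": 0.2},
]
print(logloss(uniform, results))  # log(3), about 1.0986
print(logloss(model, results), accuracy(model, results))
```
]]></preformat>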
      <p>The results achieved by bookmakers (in terms of logloss) are better than
those of all the individual rating methods. Of course, the bookmakers can include
additional information on player injuries, suspensions or a team's form during
the contest, which gives them an advantage over the models. Including
such external information would be the next step towards enhancing the accuracy of
the presented models. Nevertheless, the accuracy of predictions is slightly better
for the rating systems. The bottom row of the table presents results for
an ensemble method, which is the average of the predictions of the three best
performing methods: least squares, Poisson and ordinal regression ratings. It is
a simple method for increasing the predictive power of individual models. We
observe that this method slightly improves logloss while maintaining accuracy.</p>
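      <p>The ensemble amounts to an element-wise average of the probability tables produced by the individual models. A minimal sketch, with made-up probabilities for two matches:</p>
      <preformat><![CDATA[
```python
import numpy as np

# Rows are matches, columns are (home win, draw, away win); all numbers are made up.
least_squares = np.array([[0.50, 0.30, 0.20], [0.30, 0.30, 0.40]])
poisson = np.array([[0.60, 0.20, 0.20], [0.20, 0.30, 0.50]])
ordinal = np.array([[0.40, 0.40, 0.20], [0.40, 0.30, 0.30]])

# Averaging valid probability vectors yields valid probability vectors.
ensemble = np.mean([least_squares, poisson, ordinal], axis=0)
print(ensemble)
```
]]></preformat>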
    </sec>
    <sec id="sec-4">
      <title>Simulations of tournament outcome</title>
      <p>Given match outcome probabilities for each possible match-up, we simulated
1,000,000 tournaments (that many repetitions appear to provide stable results).
We sampled only win, draw and loss results. If, after considering head-to-head
results, teams were still tied in the group stage, we resolved such ties
randomly. According to the tournament's official rules, we should use goal
differences; however, this information is not available in our simulation.<sup>2</sup> If there is
a draw in the play-offs, we sample the result again.</p>
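      <p>The simulation logic can be sketched on a toy format: one group of four followed by a final between the top two. This is not the Euro 2016 format, the probabilities are made up, and for brevity ties on points are broken at random without the head-to-head step; the function names are assumptions of this example.</p>
      <preformat><![CDATA[
```python
import random
from itertools import combinations

OUTCOMES = ["W", "D", "L"]  # from the first-named team's perspective


def pair_probs(a, b, probs):
    """Look up (win, draw, loss) for a against b, reversing a stored (b, a) entry if needed."""
    if (a, b) in probs:
        return probs[(a, b)]
    win_b, draw, loss_b = probs[(b, a)]
    return loss_b, draw, win_b


def play_group(teams, probs, rng):
    """Round robin with 3/1/0 points; ties on points are broken at random."""
    points = {team: 0 for team in teams}
    for a, b in combinations(teams, 2):
        result = rng.choices(OUTCOMES, weights=pair_probs(a, b, probs))[0]
        if result == "W":
            points[a] += 3
        elif result == "L":
            points[b] += 3
        else:
            points[a] += 1
            points[b] += 1
    return sorted(teams, key=lambda team: (points[team], rng.random()), reverse=True)


def play_knockout(a, b, probs, rng):
    """Sample until the result is decisive, since a knock-out tie needs a winner."""
    while True:
        result = rng.choices(OUTCOMES, weights=pair_probs(a, b, probs))[0]
        if result != "D":
            return a if result == "W" else b


def simulate(teams, probs, n_sims, seed=0):
    """Toy format: one group of four, then a final between the top two."""
    rng = random.Random(seed)
    titles = {team: 0 for team in teams}
    for _ in range(n_sims):
        first, second = play_group(teams, probs, rng)[:2]
        titles[play_knockout(first, second, probs, rng)] += 1
    return {team: count / n_sims for team, count in titles.items()}


teams = ["A", "B", "C", "D"]
probs = {pair: (0.35, 0.30, 0.35) for pair in combinations(teams, 2)}
probs.update({("A", other): (0.60, 0.25, 0.15) for other in teams[1:]})
title_probs = simulate(teams, probs, n_sims=20000)
print(title_probs)
```
]]></preformat>
      <p>Extending the sketch to the real tournament would require all six groups, the ranking of third-placed teams and the full knock-out bracket.</p>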
      <p><sup>2</sup> Notably, coin tosses were used to resolve ties (if the game was tied after extra time)
before the penalty shoot-out was "invented." For instance, on its way to winning
Euro 1968, Italy "won" its semi-final against the USSR through a coin toss.</p>
      <p>Table 2 presents the predictions generated using the ensemble of the three
rating systems introduced above. The consecutive columns indicate the probability of
advancing to a given stage of the competition. For example, the number next
to Portugal in the first column indicates that there is a 91.37% chance that it
will advance past the group stage. The last column indicates a team's chance of
winning the whole tournament.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>We see that France tops the ranking for the championship race in terms of
associated probability. The 12th man is behind them: they are playing at home and
the methods we used give them some edge because of this. On the other hand,
the prediction for four-time World Cup winners Italy is somewhat
discouraging. In recent years, Italy has seen disappointing results, including draws with
Armenia, Haiti and Luxembourg (not to mention their 2010 and 2014 World
Cup records). However, what the rating systems could not infer is that
the Italian team usually rises to the occasion when faced with a major challenge,
which usually happens at the big tournaments.</p>
      <p>The rating methods presented here have some limitations. There are many
factors influencing match results and we only covered simple predictive models
based on historical data. Naturally, one could use external and more
sophisticated information, e.g., players and their skills, and include it in a model.
This could greatly improve the models' accuracy.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aitchison</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Silvey</surname>
          </string-name>
          , S.D.:
          <article-title>The Generalization of Probit Analysis to the Case of Multiple Responses</article-title>
          . Biometrika Vol.
          <volume>44</volume>
          , No.
          <volume>1</volume>
          –
          <issue>2</issue>
          , pp.
          <volume>131</volume>
          –
          <issue>140</issue>
          (
          <year>1957</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. EloRatings.net, http://eloratings.net/, on-line resources,
          <source>last access date 6 July 2016</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <article-title>Euro 2016 Predictions Using Team Rating Systems</article-title>
          , http://deepsense.io/euro-2016-predictions/, on-line resources,
          <source>last access date 6 July 2016</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dixon</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Coles</surname>
            ,
            <given-names>S.G.</given-names>
          </string-name>
          :
          <article-title>Modelling Association Football Scores and Inefficiencies in the Football Betting Market</article-title>
          .
          <source>Journal of the Royal Statistical Society: Series C (Applied Statistics)</source>
          , Vol.
          <volume>46</volume>
          , No.
          <issue>2</issue>
          , pp.
          <volume>265</volume>
          –
          <issue>280</issue>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <article-title>FIFA Women's World Ranking Methodology</article-title>
          , http://www.fifa.com/fifa-world-ranking/procedure/women.html, on-line resources,
          <source>last access date 4 July 2016</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Glickman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : Parameter Estimation in Large Dynamic Paired Comparison Experiments, Applied Statistics, Vol.
          <volume>48</volume>
          , No.
          <issue>3</issue>
          , pp.
          <volume>377</volume>
          –
          <issue>394</issue>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Koning</surname>
            <given-names>R.H.</given-names>
          </string-name>
          :
          <article-title>Balance in Competition in Dutch Soccer</article-title>
          .
          <source>Journal of the Royal Statistical Society: Series D (The Statistician)</source>
          , Vol.
          <volume>49</volume>
          , No.
          <issue>3</issue>
          , pp.
          <volume>419</volume>
          –
          <issue>431</issue>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Maher</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          : Modelling Association Football Scores.
          <source>Statistica Neerlandica</source>
          , Vol.
          <volume>36</volume>
          , No.
          <issue>3</issue>
          , pp.
          <volume>109</volume>
          –
          <issue>118</issue>
          (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pollard</surname>
          </string-name>
          , R.:
          <article-title>Home Advantage in Football: A Current Review of an Unsolved Puzzle</article-title>
          ,
          <source>The Open Sports Sciences Journal</source>
          , Vol.
          <volume>1</volume>
          , No.
          <issue>1</issue>
          , pp.
          <volume>12</volume>
          –
          <issue>14</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Schrader</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Developing an Elo Rating for Major League Soccer and Predicting End of Season Finish</article-title>
          , https://drive.google.com/file/d/0Bxr6KEe4KY_OYnJuLUw1WF9GcGs/view?pli=1, on-line resources,
          <source>last access date 6 July 2016</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Stefani</surname>
          </string-name>
          , R.:
          <article-title>Football and Basketball Predictions Using Least Squares</article-title>
          ,
          <source>IEEE Transactions on Systems, Man and Cybernetics</source>
          , Vol.
          <volume>7</volume>
          , No.
          <issue>2</issue>
          , pp.
          <volume>117</volume>
          –
          <issue>121</issue>
          (
          <year>1977</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hastie</surname>
          </string-name>
          , T.:
          <article-title>Regularization and Variable Selection via the Elastic Net</article-title>
          .
          <source>Journal of the Royal Statistical Society, Series B (Statistical Methodology)</source>
          , Vol.
          <volume>67</volume>
          , pp.
          <volume>301</volume>
          –
          <issue>320</issue>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>