Introduction

Euro 2016 Predictions Using Team Rating Systems

Jan Lasek

jan.lasek@deepsense.io

2016

In this study we employ several rating systems to generate predictions for the outcome of the 2016 European Championship in association football. To this end, we first estimate the probabilities of match results between all competing nations using the rating systems. Second, via Monte Carlo simulations, we compute each team's probability of advancing past a given stage of the tournament. The approach was developed for the Euro 2016 Prediction Competition organized within the Sport Analytics Workshop at ECML/PKDD 2016.

Keywords: team ratings, rating systems, predictions, Euro 2016

Rating systems for football teams

Multiple rating systems have been developed for various sports over the years. Here we discuss three different models for rating teams in association football: the ordinal logistic regression model, the least squares model and the Poisson model.

Ordinal regression ratings. The first rating system discussed uses ordered logistic regression as the model for match results [1, 7]. Under this model, each team is associated with a single parameter, a rating, reflecting its strength. Teams' strength parameters are estimated from the outcomes of games between the teams. Let $r_i$ and $r_j$ be the ratings of two teams $i$ and $j$, with team $i$ playing at its home ground. If $H$ and $A$ denote a home and an away team win, respectively, and $D$ corresponds to a draw, the model links the probabilities of these events to the teams' ratings through the following equations:

$$P(H) = \frac{1}{1 + e^{c - (r_i - r_j + h)}},$$

$$P(D) = \frac{1}{1 + e^{-c - (r_i - r_j + h)}} - \frac{1}{1 + e^{c - (r_i - r_j + h)}},$$

$$P(A) = 1 - \frac{1}{1 + e^{-c - (r_i - r_j + h)}},$$

where $c > 0$ is an intercept and $h$ is a parameter introduced to account for the home team advantage [9].

To estimate the model's parameters, the weighted maximum likelihood method can be used. The estimation proceeds as follows. Let $r = (r_1, r_2, \ldots, r_n)$ denote the vector of teams' ratings. Let us denote by $L(M \mid r, h, c)$ the weighted log-likelihood function of the results observed in a dataset of matches $M$ given the model parameters $r$, $h$, $c$. Each match $m \in M$ is described as a tuple $m = (i, j, k, t)$, where $i$, $j$ are indexes encoding particular teams, $k$ is the type of the match (friendly, qualifier to a major tournament or a major tournament match) and $t$ is the time elapsed between the match and the estimation period. The estimation period is understood as the time for which we want to estimate the ratings. The log-likelihood function is weighted by both the time at which the game took place and the importance of the match. It is defined as

$$L(M \mid r, h, c) = \frac{1}{|M|} \sum_{m \in M} \omega(m) \log P(R_m),$$

where $R_m \in \{H, D, A\}$ is the actual result of match $m$. We assume that the weighting function has the form $\omega(m) = \varphi(k)\, e^{-\alpha t}$, where $\varphi(\cdot)$ is a function that maps the match type to a numerical value representing its importance and $\alpha$ is a time decay parameter. The idea is to give a higher weight to recent results as well as different weights according to match type. To estimate the team ratings, we minimise the negative weighted log-likelihood $-L(M \mid r, h, c)$ with respect to the parameters.

Least squares ratings. The next rating system is based on the simple observation that the difference $s_i - s_j$ in the scores produced by the teams should correspond to the difference in their ratings:

$$s_i - s_j = r_i - r_j + h.$$

Again, $h$ is a correction for the advantage of the home team $i$. The rating system's name originates from its estimation method: one finds ratings $r_i$ such that the sum of squared differences (over a set of games) between the two sides of the above equation is minimal [11]. The sum of squares function can be weighted in the same manner as discussed for the ordinal logistic regression model.

For the least squares model, we still need to generate probabilities for particular outcomes. This is done first by fitting a logistic regression model with binary outcomes, as described in [6]. Next, the binary outcomes are mapped to a three-way outcome by the method proposed in [10].
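The least squares estimation with match weights can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names, the importance values in `IMPORTANCE`, the decay rate and the toy matches are all assumptions made for the example. The weight of a match is taken to be an importance factor for its type multiplied by an exponential decay in the time elapsed, and the ratings are obtained with NumPy's standard least squares solver.

```python
import math

import numpy as np

# Hypothetical importance values per match type (the study tunes such weights on a grid).
IMPORTANCE = {"friendly": 1.0, "qualifier": 2.0, "tournament": 3.0}


def match_weight(match_type, years_ago, decay=0.2):
    """Importance of the match type times an exponential time decay (decay is assumed)."""
    return IMPORTANCE[match_type] * math.exp(-decay * years_ago)


def fit_least_squares_ratings(matches, n_teams, weights=None):
    """Fit ratings r and home advantage h so that s_i - s_j ~ r_i - r_j + h.

    matches: list of (home_index, away_index, home_goals, away_goals).
    """
    n = len(matches)
    design = np.zeros((n, n_teams + 1))  # last column carries the home advantage h
    target = np.zeros(n)
    for row, (i, j, s_i, s_j) in enumerate(matches):
        design[row, i] = 1.0
        design[row, j] = -1.0
        design[row, n_teams] = 1.0
        target[row] = s_i - s_j
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    root_w = np.sqrt(w)
    # Scaling rows by sqrt(weight) turns weighted least squares into ordinary least squares.
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], target * root_w, rcond=None)
    # Ratings are identified only up to an additive constant, so centre them at zero.
    ratings = coef[:n_teams] - coef[:n_teams].mean()
    return ratings, coef[n_teams]


# Toy data: three teams, each pair playing home and away.
toy_matches = [
    (0, 1, 2, 0), (1, 0, 1, 1),
    (0, 2, 3, 0), (2, 0, 0, 1),
    (1, 2, 2, 0), (2, 1, 0, 0),
]
toy_weights = [match_weight("qualifier", t) for t in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
ratings, home_advantage = fit_least_squares_ratings(toy_matches, 3, toy_weights)
print(ratings, home_advantage)
```

Converting the fitted rating differences into win, draw and loss probabilities requires the additional steps from [6] and [10], which are not reproduced in this sketch.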

Poisson model. The final rating system that we discuss is based on the assumption that the goals scored by a team can be modelled as a Poisson distributed variable. The mean rate of this variable depends on the attacking capabilities of the team and the defensive skills of its opponent. This extends the ratings to two parameters per team, offensive and defensive skills, as opposed to a single parameter in the methods discussed above.

Given the attacking and defensive skills of teams $i$ and $j$, denoted $a_i$, $a_j$ and $d_i$, $d_j$, respectively, the rates of the Poisson variables for the home team $i$ and the visiting team $j$, $\lambda$ and $\mu$ respectively, are modelled as

$$\lambda = c + h + a_i - d_j, \qquad \mu = c + a_j - d_i,$$

where $c$ is an intercept and $h$ accounts for the home team advantage. Under this model, the probability of a score of $x$ to $y$ is the product of two Poisson probabilities with rates $\lambda$ and $\mu$, respectively, and equals

$$\frac{\lambda^x e^{-\lambda}}{x!} \cdot \frac{\mu^y e^{-\mu}}{y!}.$$

Given a dataset of matches, one can estimate the team rating parameters using the maximum likelihood method. The likelihood can be weighted in the same manner as discussed for the ordinal logistic regression model. Here, for simplicity, we employ the basic version of the model, which assumes that the Poisson variables corresponding to the goals scored by the teams, given their rating parameters, are independent [8]. There are studies which relax this assumption [8, 4].
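To show how the independent Poisson model turns two scoring rates into win, draw and loss probabilities, the sketch below sums the scoreline probabilities over a truncated grid of scores. The function names, the truncation at 15 goals and all parameter values are assumptions chosen for illustration; the rates are computed as the intercept plus home advantage plus attack minus opposing defence, as described above.

```python
import math


def poisson_pmf(k, rate):
    """Probability that a Poisson variable with the given rate equals k."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


def scoring_rates(c, h, attack_home, defence_home, attack_away, defence_away):
    """Rates for the home and away team: c + h + a_i - d_j and c + a_j - d_i."""
    return c + h + attack_home - defence_away, c + attack_away - defence_home


def outcome_probabilities(rate_home, rate_away, max_goals=15):
    """Sum independent scoreline probabilities into (home win, draw, away win)."""
    p_home = p_draw = p_away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = poisson_pmf(x, rate_home) * poisson_pmf(y, rate_away)
            if x > y:
                p_home += p
            elif x == y:
                p_draw += p
            else:
                p_away += p
    # Renormalise to absorb the small mass lost by truncating at max_goals.
    total = p_home + p_draw + p_away
    return p_home / total, p_draw / total, p_away / total


# Made-up parameter values, for illustration only.
lam, mu = scoring_rates(c=1.0, h=0.3, attack_home=0.4, defence_home=0.2,
                        attack_away=0.1, defence_away=0.1)
print(outcome_probabilities(lam, mu))
```

Because the rates enter additively, a fitted model of this form must keep them positive; the sketch simply assumes positive inputs.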

Tuning the predictive performance

We used the rating systems presented here to estimate win, draw and loss probabilities for every possible match-up among the 24 teams participating in Euro 2016. Given these probabilities, we simulated the tournament multiple times and computed each team's probability of winning it. We used the database of international football match results provided at http://laenderspiel.cmuck.de/ (footnote 1).
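As an illustration of how a rating system yields a probability table for every match-up, the sketch below uses the ordered logistic form, with P(H) = 1 / (1 + e^(c - d)) and P(A) = 1 - 1 / (1 + e^(-c - d)), where d is the rating difference plus home advantage. The ratings and the values of `c` and `h` are invented for the example, and the first-named team is treated as the home side throughout, which is a simplification.

```python
import math
from itertools import permutations


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probabilities(r_home, r_away, c, h=0.0):
    """Return (P(home win), P(draw), P(away win)) under the ordered logistic model."""
    d = r_home - r_away + h
    p_home = sigmoid(d - c)          # 1 / (1 + exp(c - d))
    p_home_or_draw = sigmoid(d + c)  # 1 / (1 + exp(-c - d))
    return p_home, p_home_or_draw - p_home, 1.0 - p_home_or_draw


# Hypothetical ratings and parameters, for illustration only.
ratings = {"France": 0.9, "Italy": 0.6, "Portugal": 0.7}
c, h = 0.6, 0.3

matchups = {
    (home, away): ordinal_probabilities(ratings[home], ratings[away], c, h)
    for home, away in permutations(ratings, 2)
}
for pair, probs in matchups.items():
    print(pair, tuple(round(p, 3) for p in probs))
```

Since c > 0, the draw probability is always positive and the three probabilities sum to one by construction.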

First of all, the rating systems involve some adjustable parameters, e.g., weights for the importance of matches, a weighting function for the most recent results and regularization parameters. We tuned these parameters (by exploring a grid of values) to maximize the predictive accuracy of the models: using a sample of games, we predicted their results and evaluated the predictions. For tuning the parameters, we chose matches from major international tournaments: World Cup finals, European Championships and Copa America.

Footnote 1: Thanks to the website's maintainer Christian Muck for generously exporting the data.

The parameters of the rating systems are chosen using World Cup finals held between 1994 and 2010 (5 tournaments), UEFA European Championships 1996–2008 (4) and Copa America finals 1999–2011 (5). This accounts for a set of 562 matches. In the competition, the prediction accuracy is evaluated using logarithmic loss (logloss). Accordingly, we use this metric to tune the models' parameters. This error metric is calculated as

$$-\frac{1}{m} \sum_{i=1}^{m} \log P(R_i),$$

where $P(R_i)$ is the probability that the model attributes to the actual outcome of the $i$-th game in the data, $i = 1, 2, \ldots, m$. A more direct interpretation is provided by accuracy, defined as the percentage of matches that were correctly predicted by a given method. To estimate the final efficacy of the methods, we present results on a validation sample comprising the 2014 World Cup finals, the 2012 UEFA European Championship and the 2015 Copa America. To provide some context for the numbers, we present a benchmark solution of random guessing and probabilities derived from an average of bookmakers' odds. A random guess yields a logloss of $-\log(1/3) \approx 1.1$ and an accuracy of 33% for a three-way outcome. We also show scores achieved by two benchmark solutions based on the Elo model: EloRatings.net and the FIFA Women's World Ranking methodology [2, 5].
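The two evaluation metrics can be computed as in the following sketch. The data layout (one dictionary of probabilities per match, keyed by "H", "D" and "A") and the example predictions are assumptions made for illustration.

```python
import math


def log_loss(predictions, results):
    """Negative mean log-probability assigned to the observed results."""
    total = sum(math.log(p[r]) for p, r in zip(predictions, results))
    return -total / len(results)


def accuracy(predictions, results):
    """Share of matches whose most probable outcome was the observed one."""
    hits = sum(max(p, key=p.get) == r for p, r in zip(predictions, results))
    return hits / len(results)


# Invented predictions for three matches, for illustration only.
predictions = [
    {"H": 0.6, "D": 0.25, "A": 0.15},
    {"H": 0.3, "D": 0.3, "A": 0.4},
    {"H": 0.5, "D": 0.3, "A": 0.2},
]
results = ["H", "A", "D"]
print(log_loss(predictions, results), accuracy(predictions, results))

# The random-guess benchmark assigns 1/3 to every outcome.
uniform = [{"H": 1 / 3, "D": 1 / 3, "A": 1 / 3}] * 3
print(log_loss(uniform, results))
```

Logloss rewards well-calibrated probabilities, whereas accuracy looks only at the most probable outcome, which is why the two metrics can rank methods differently.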

The results achieved by the bookmakers (in terms of logloss) are better than those of all the individual rating methods. Of course, the bookmakers can include additional information on player injuries, suspensions or a team's form during the contest, which gives them an advantage over the models. Including such external information would be the next step towards enhancing the accuracy of the presented models. In any case, the accuracy of predictions is slightly better in the case of the rating systems. The bottom row of the table presents results for an ensemble method, which is the average of the predictions of the three best performing methods: least squares, Poisson and ordinal regression ratings. It is a simple method for increasing the predictive power of individual models. We observe that this method slightly improves logloss while maintaining accuracy.
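The ensemble described here amounts to an element-wise average of the probability vectors produced by the individual models. A minimal sketch follows; the function name and the three probability vectors are invented for illustration.

```python
def average_predictions(*model_probs):
    """Average (win, draw, loss) probability triples from several models."""
    n = len(model_probs)
    return tuple(sum(p[k] for p in model_probs) / n for k in range(3))


# Invented outputs of three models for a single match.
least_squares = (0.5, 0.3, 0.2)
poisson = (0.3, 0.3, 0.4)
ordinal = (0.4, 0.3, 0.3)
print(average_predictions(least_squares, poisson, ordinal))
```

An average of valid probability vectors is itself a valid probability vector, so no renormalisation is needed.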

Simulations of tournament outcome

Given match outcome probabilities for each possible match-up, we simulated 1,000,000 tournaments (that many repetitions appear to provide stable results). We sampled only win, draw and loss results. If, after considering head-to-head results, teams were still tied in the group stage, we resolved such ties randomly. According to the tournament's official rules, we should use goal differences; however, this information is not available in our simulation (footnote 2). If there is a draw in the play-offs, we sample the result again.
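The sampling scheme can be sketched for a single group and a single knockout tie as follows. This is a simplified illustration under stated assumptions, not the simulation used in the study: team names and probabilities are invented, the head-to-head criterion is skipped so that all ties on points are broken randomly, only the top two teams advance (the actual Euro 2016 format also admits the best third-placed teams), and far fewer repetitions are run.

```python
import random
from itertools import combinations

OUTCOMES = ("W", "D", "L")  # from the perspective of the first-named team


def simulate_group(teams, probs, rng):
    """Play a round robin and return the teams ordered from first to last."""
    points = {t: 0 for t in teams}
    for a, b in combinations(teams, 2):
        outcome = rng.choices(OUTCOMES, weights=probs[(a, b)])[0]
        if outcome == "W":
            points[a] += 3
        elif outcome == "L":
            points[b] += 3
        else:
            points[a] += 1
            points[b] += 1
    # Teams level on points are ordered by a random key (random tie-break).
    return sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)


def knockout_winner(a, b, probs, rng):
    """Resample the match until the result is not a draw."""
    while True:
        outcome = rng.choices(OUTCOMES, weights=probs[(a, b)])[0]
        if outcome != "D":
            return a if outcome == "W" else b


def advance_probabilities(teams, probs, n_sims, seed=0):
    """Estimate each team's probability of finishing in the top two."""
    rng = random.Random(seed)
    counts = {t: 0 for t in teams}
    for _ in range(n_sims):
        for t in simulate_group(teams, probs, rng)[:2]:
            counts[t] += 1
    return {t: counts[t] / n_sims for t in teams}


# Hypothetical group in which every match has the same outcome probabilities.
teams = ["A", "B", "C", "D"]
probs = {pair: (0.4, 0.3, 0.3) for pair in combinations(teams, 2)}
group_result = advance_probabilities(teams, probs, n_sims=10_000)
print(group_result)
```

Chaining such group and knockout draws along the tournament bracket, and counting how often each team reaches each stage, gives the stage-by-stage probabilities.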

Table 2 presents the predictions generated using the ensemble of the three introduced rating systems. The consecutive columns indicate the probability of advancing to a given stage of the competition. For example, the number next to Portugal in the first column indicates that there is a 91.37% chance that it will advance past the group stage. The last column indicates a team's chance of winning the whole tournament.

Footnote 2: Notably, coin tosses were used to resolve ties (if the game was tied after extra time) before the penalty shoot-out was "invented." For instance, on its way to winning Euro 1968, Italy "won" its semifinal with the USSR through a coin toss.

Discussion

We see that France has the highest estimated probability of winning the championship. The 12th man is behind them: they are playing at home, and the methods we used give them some edge because of this. On the other hand, the prediction for four-time World Cup winners Italy is somewhat discouraging. In recent years, Italy has seen disappointing results, including draws with Armenia, Haiti and Luxembourg (not to mention their 2010 and 2014 World Cup records). However, what the rating systems could not infer is the fact that the Italian team usually rises to the occasion when faced with a major challenge, which usually happens at the big tournaments.

The rating methods presented here have some limitations. There are many factors influencing match results, and we only covered simple predictive models based on historical data. Naturally, one could use external and more sophisticated information, e.g., players and their skills, and include it in a model. This could greatly improve the models' accuracy.

1. Aitchison, J. and Silvey, S.D.: The Generalization of Probit Analysis to the Case of Multiple Responses. Biometrika, Vol. 44, No. 1–2, pp. 131–140 (1957)

2. EloRatings.net, http://eloratings.net/, on-line resources, last access date 6 July 2016

3. Euro 2016 Predictions Using Team Rating Systems, http://deepsense.io/euro-2016-predictions/, on-line resources, last access date 6 July 2016

4. Dixon, M.J. and Coles, S.G.: Modelling Association Football Scores and Inefficiencies in the Football Betting Market. Journal of the Royal Statistical Society: Series C (Applied Statistics), Vol. 46, No. 2, pp. 265–280 (1997)

5. FIFA Women's World Ranking Methodology, http://www.fifa.com/fifa-world-ranking/procedure/women.html, on-line resources, last access date 4 July 2016

6. Glickman, M.: Parameter Estimation in Large Dynamic Paired Comparison Experiments. Applied Statistics, Vol. 48, No. 3, pp. 377–394 (1999)

7. Koning, R.H.: Balance in Competition in Dutch Soccer. Journal of the Royal Statistical Society: Series D (The Statistician), Vol. 49, No. 3, pp. 419–431 (2000)

8. Maher, M.J.: Modelling Association Football Scores. Statistica Neerlandica, Vol. 36, No. 3, pp. 109–118 (1982)

9. Pollard, R.: Home Advantage in Football: A Current Review of an Unsolved Puzzle. The Open Sports Sciences Journal, Vol. 1, No. 1, pp. 12–14 (2008)

10. Schrader, A.: Developing an Elo Rating for Major League Soccer and Predicting End of Season Finish, https://drive.google.com/file/d/0Bxr6KEe4KY_OYnJuLUw1WF9GcGs/view?pli=1, on-line resources, last access date 6 July 2016

11. Stefani, R.: Football and Basketball Predictions Using Least Squares. IEEE Transactions on Systems, Man and Cybernetics, Vol. 7, No. 2, pp. 117–121 (1977)

12. Zou, H. and Hastie, T.: Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 67, pp. 301–320 (2005)