<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Ranking Footballers with Multilevel Modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gregor Grbec</string-name>
          <email>ggrbec@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nino Bašić</string-name>
          <email>nino.basic@famnit.upr.si</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marko Tkalčič</string-name>
          <email>marko.tkalcic@famnit.upr.si</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies</institution>
          ,
          <addr-line>Glagoljaška 8, Koper, Slovenia</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Although football is a team sport, the question of who is the best player is a frequent topic of debate. The discussion disproportionately highlights attacking players, which creates a bias, since every team role matters. Our study aimed to separate player performance from team performance and to ensure that players from all positions can appear in the final ranking of the best players. We sourced data from FBref, covering every player in every league match played by a top 20 European team in the top 5 European leagues in the current century. Using a multilevel linear mixed-effects model, we took team points as the response variable and accounted for both player and opponent team strength. Level-2 player residuals, averaged by player, were then used to build a ranking of the best players of this century. Surprisingly, two players widely regarded as among the best of all time, Messi and Ronaldo, placed relatively low on our list (Ronaldo 12th and Messi 14th).</p>
      </abstract>
      <kwd-group>
        <kwd>ranking</kwd>
        <kwd>football</kwd>
        <kwd>multi-level modeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Despite football being a team sport, determining the best male football player remains a widely
debated topic. In every era, a standout candidate emerges—Pelé in the 1960s, Diego Maradona in
the 1980s, and more recently, Lionel Messi and Cristiano Ronaldo dominating the past 15 years.
The ongoing debate centers on which of these four players is the all-time best. Our attempt to
provide a data-driven solution was hindered by the historical match data’s poor quality, leading
[...]
the best player debate. Goals scored, being the most coveted statistic, contribute to this bias.
Defenders’ performance is typically measured by goals conceded per game, yet this metric is
assigned to the entire defense, creating an imbalance in player position evaluation. Our second
research objective aimed to rectify this bias and provide a fair comparison among players
irrespective of their position or style of play.</p>
      <p>To fulfill our research goals (ranking players by impact and giving all positions an equal
opportunity), we employed a multilevel mixed-effects model. The model uses the points a team
achieved in a game as the performance metric and was fitted to the last 23 seasons
of league matches involving the top 20 European football teams across the top 5 European
leagues.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>In this section, we explore prior research on ranking individual ability and multilevel modeling
in team sports.</p>
      <p>
        Brooks et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] assessed players’ offensive ability by analyzing completed passes leading
to shots. They predicted pass quality by training a model on La Liga data, ranking players
based on the quality of their passes. McHale and Scarf [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] also ranked players, correlating a
team’s and player’s contributions to match outcomes. Their index awarded points for player
contributions, validated in the Premier League, with a focus on eliminating player role bias.
      </p>
      <p>
        Pappalardo et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] used extensive event data from various leagues to rank players, employing
weights for metrics. While successful in extracting player performance, these studies did not
account for player role bias.
      </p>
      <p>
        Mixed-effects modeling in football research includes Grund [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], who studied passing
structures’ impact on match outcomes. Beyond football, Casals and Martinez [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] analyzed basketball
player performance, while Gerber and Craig [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] predicted baseball players’ performance across
leagues.
      </p>
      <p>
        Inspired by Bell et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], our study adopted multilevel modeling to extract player performance
from team performance. Their F1 driver analysis, employing a cross-classified model, served as
a valuable model for our approach. The model, controlling for team switches and opponent
strength, allowed us to eliminate player role bias and extract meaningful player performance
metrics.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Modeling</title>
      <p>
        The linear mixed-effects model, also referred to as a mixed model, random effects model,
multilevel model, or hierarchical model, is a statistical model tailored to hierarchical
data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>The model allows controlling for variables at higher levels, accounting for the
variation and correlation within the data structure to yield more precise estimates. It comprises
fixed effects, whose coefficients have the same impact on the response variable
across all groups, and at least one random effect, whose impact varies by
group.</p>
      <p>
        Building upon a basic linear regression model, the mixed-effects model allows the
intercepts and/or slopes of the regression line to vary across different groups of data for selected
variables. For instance, in a study by Bell et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], team strength was controlled for, recognizing
that the performance of drivers from stronger teams, like Ferrari, may differ from that of drivers in
weaker teams like Renault. This flexibility allowed for a nuanced assessment of driver quality,
accounting for team effects on intercepts and slopes.
      </p>
      <sec id="sec-4-1">
        <title>3.1. Toy Example</title>
        <p>Consider predicting students’ performance on the fictional National Test of Mathematics based
on their average percentage of points achieved in their Mathematics class. The data is nested
on two levels: the school (School A and School B) and the student. Each data point represents a
student’s average percentage in class and the competition.</p>
        <p>School A is known for its strict grading, while School B is more lenient. Predicting overall
performance without accounting for school variations would be inaccurate due to the substantial
difference in expected competition scores between the schools.</p>
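        <p>The mixed-effects multilevel model addresses this issue by allowing control for school,
enhancing prediction accuracy. In our case, random slopes are not suitable; thus, random
intercepts and fixed percentages of points in school are included in the formula:
          <disp-formula><tex-math><![CDATA[y = (\beta_0 + u_{0,\mathrm{school}}) + (\beta_1 + u_{1,\mathrm{school}}) \cdot \mathrm{math} + \varepsilon]]></tex-math></disp-formula>
Here, <inline-formula><tex-math><![CDATA[y]]></tex-math></inline-formula> is the student’s test score,
<inline-formula><tex-math><![CDATA[\mathrm{math}]]></tex-math></inline-formula> is the student’s average percentage in the Mathematics class,
<inline-formula><tex-math><![CDATA[\beta_0]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[\beta_1]]></tex-math></inline-formula> represent the overall intercept and slope,
<inline-formula><tex-math><![CDATA[u_{0,\mathrm{school}}]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[u_{1,\mathrm{school}}]]></tex-math></inline-formula> account for
variations by school, and <inline-formula><tex-math><![CDATA[\varepsilon]]></tex-math></inline-formula> is the student’s residual. The overall regression line is:
          <disp-formula><tex-math><![CDATA[y = \mathrm{math} - 5.3]]></tex-math></disp-formula>
The school-specific lines are:
          <disp-formula><tex-math><![CDATA[y = 0.81 \cdot \mathrm{math} + 24.5 \quad \text{(School A)}]]></tex-math></disp-formula>
          <disp-formula><tex-math><![CDATA[y = 1.18 \cdot \mathrm{math} - 35.1 \quad \text{(School B)}]]></tex-math></disp-formula>
        </p>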
        <p>Intercept and slope values for the overall and school-specific cases are presented in Table ??.</p>
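        <p>As a minimal sketch of how the school-specific lines relate to the overall line, the Python snippet below recovers the school-level deviations from the coefficients quoted above and uses them to predict a test score. The function and variable names are illustrative assumptions, and the deviations are obtained by simple subtraction rather than by fitting a model.</p>
        <preformat><![CDATA[
```python
# Overall (fixed-effect) line from the text: score = 1.0 * pct - 5.3
OVERALL = {"intercept": -5.3, "slope": 1.0}

# School-specific lines quoted in the text.
SCHOOL_LINES = {
    "A": {"intercept": 24.5, "slope": 0.81},
    "B": {"intercept": -35.1, "slope": 1.18},
}


def school_deviations(school):
    """Return (u0, u1): the school's intercept and slope offsets from the overall line."""
    line = SCHOOL_LINES[school]
    return (line["intercept"] - OVERALL["intercept"],
            line["slope"] - OVERALL["slope"])


def predict(pct, school=None):
    """Predict the test score from the class percentage, optionally for a given school."""
    u0, u1 = school_deviations(school) if school else (0.0, 0.0)
    return (OVERALL["intercept"] + u0) + (OVERALL["slope"] + u1) * pct


for name in SCHOOL_LINES:
    u0, u1 = school_deviations(name)
    print(f"School {name}: u0 = {u0:+.2f}, u1 = {u1:+.2f}, "
          f"prediction at 80% = {predict(80, name):.1f}")
```
]]></preformat>
        <p>With these numbers, the same class percentage maps to a higher predicted test score in the strictly grading School A than in the more lenient School B, which is the effect that ignoring the school level would hide.</p>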
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Footballer’s Ranking Case</title>
        <p>In our study, we employed a similar framework featuring two fixed effects (opponent’s points
per game and a home or away indicator) and three random effects (team, team in a particular season,
and player). These incorporate random intercepts and slopes, varying based on the predictor
variables. The data is nested across four levels, comprising 20 teams, each spanning multiple
seasons, players associated with one or multiple clubs across different seasons, and repeated
measures for every match of every player.</p>
        <p>For instance, Cristiano Ronaldo participated in 597 matches over 14 seasons for 3 different
clubs. Teams varied in participation, with RB Leipzig, for instance, joining the Bundesliga from
the 18/19 season onwards but achieving significant success in those four seasons, securing a
spot in our top 20 teams list.</p>
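        <p>For orientation, one simplified way to write a model with this structure is sketched below. It shows random intercepts only; the notation is an assumption introduced here for illustration and is not taken from the fitted model, which also includes random slopes.</p>
        <disp-formula><tex-math><![CDATA[\mathrm{points}_{m,p} = \beta_0 + \beta_1\,\mathrm{oppPPG}_{m} + \beta_2\,\mathrm{home}_{m} + u^{(\mathrm{team})}_{t} + u^{(\mathrm{season})}_{t,s} + u^{(\mathrm{player})}_{p} + \varepsilon_{m,p}]]></tex-math></disp-formula>
        <p>In this sketch, m indexes matches, p players, t teams, and s seasons; oppPPG is the opponent’s points per game, home is the home or away indicator, each u term is a group-level deviation assumed to have mean zero, and the last term is the match-level residual.</p>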
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Good Player Definition</title>
        <p>To identify the best football player, we established criteria defining a standout player as someone
who consistently elevated top teams in the premier European leagues—English, Spanish, Italian,
German, and French divisions—over an extended period.</p>
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Data Acquisition</title>
        <p>
          Data from the top five European leagues was scraped from FBref [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], a division of Sports
Reference [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. We used the Python libraries "requests" by Reitz [11] and "bs4" by
Crummy [12] for web scraping. The dataset encompassed matches from the 2000/2001 to
2022/2023 seasons, including columns such as team points, player name, team name, season,
opponent’s points per game, minutes played, and home or away status.
        </p>
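        <p>To make the listed columns concrete, the following Python sketch derives team points and the opponent’s points per game from raw match results. The record layout, the team names, and the choice to average points over all of a team’s matches in the season are assumptions made for illustration; the actual scraping and preprocessing code may differ.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical match records: (season, home_team, away_team, home_goals, away_goals).
MATCHES = [
    ("2022/2023", "Team A", "Team B", 2, 1),
    ("2022/2023", "Team B", "Team C", 0, 0),
    ("2022/2023", "Team C", "Team A", 1, 3),
]


def points(goals_for, goals_against):
    """Standard league scoring: 3 points for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


def points_per_game(matches):
    """Average points per match for every (season, team) pair."""
    totals = defaultdict(lambda: [0, 0])  # (season, team) -> [points, games]
    for season, home, away, hg, ag in matches:
        for team, gf, ga in ((home, hg, ag), (away, ag, hg)):
            totals[(season, team)][0] += points(gf, ga)
            totals[(season, team)][1] += 1
    return {key: pts / games for key, (pts, games) in totals.items()}


def team_match_rows(matches):
    """One row per team per match: the response (team points) plus both covariates."""
    ppg = points_per_game(matches)
    rows = []
    for season, home, away, hg, ag in matches:
        rows.append({"season": season, "team": home, "home": 1,
                     "team_points": points(hg, ag), "opp_ppg": ppg[(season, away)]})
        rows.append({"season": season, "team": away, "home": 0,
                     "team_points": points(ag, hg), "opp_ppg": ppg[(season, home)]})
    return rows


for row in team_match_rows(MATCHES):
    print(row)
```
]]></preformat>
        <p>Each team-level row would then be repeated for every player who appeared for that team in the match, giving the player-by-match layout described above.</p>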
        <p>We retained only players with a minimum of 340 matches for the top 20 teams across the leagues,
setting the threshold close to a full season. Additionally, appearances of fewer than 15 minutes
were excluded, so that only matches in which a player could make a meaningful contribution were counted.</p>
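        <p>The two filters can be sketched in a few lines of Python. The thresholds are the ones stated above; the record layout and the order of the two steps (dropping short appearances before counting matches) are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
from collections import Counter

MIN_MINUTES = 15    # appearances shorter than this are dropped (from the text)
MIN_MATCHES = 340   # minimum number of retained appearances per player (from the text)


def filter_appearances(appearances, min_minutes=MIN_MINUTES, min_matches=MIN_MATCHES):
    """Keep appearances of at least `min_minutes` by players with at least `min_matches` of them."""
    long_enough = [a for a in appearances if a["minutes"] >= min_minutes]
    counts = Counter(a["player"] for a in long_enough)
    return [a for a in long_enough if counts[a["player"]] >= min_matches]


# Hypothetical toy data; a threshold of 2 matches keeps the example small.
toy = [
    {"player": "P1", "minutes": 90},
    {"player": "P1", "minutes": 10},   # dropped: under 15 minutes
    {"player": "P1", "minutes": 75},
    {"player": "P2", "minutes": 90},   # dropped: P2 has only one retained appearance
]
kept = filter_appearances(toy, min_matches=2)
print(kept)
```
]]></preformat>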
      </sec>
      <sec id="sec-4-5">
        <title>3.5. Model Building</title>
        <p>
          For the multilevel mixed-effects model, we used the "lmer" function from lme4 [13] in R.
Model building, inspired by Bell et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], involved iterative development, comparing versions
using AIC and BIC values. The final model includes fixed effects (opponent’s points per game
and home/away), and random effects for team, team in a season, and player, with random
intercepts and slopes.
        </p>
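        <p>Comparing candidate models by AIC and BIC can be illustrated with a short Python sketch. It uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where lower values are preferred. The candidate names, log-likelihoods, parameter counts, and sample size are hypothetical values chosen for illustration; in practice these quantities would come from the fitted lme4 models.</p>
        <preformat><![CDATA[
```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidate models: name -> (log-likelihood, number of parameters).
candidates = {
    "random intercepts only": (-1250.0, 6),
    "random intercepts and slopes": (-1235.0, 9),
}
N_OBS = 1000  # hypothetical number of player-match rows

for name, (ll, k) in candidates.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, N_OBS):.1f}")

# The preferred model is the one with the lowest criterion value.
best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], N_OBS))
```
]]></preformat>
        <p>BIC penalizes additional parameters more heavily than AIC once the sample is large, so the two criteria can disagree; checking both gives some protection against overfitting.</p>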
      </sec>
      <sec id="sec-4-6">
        <title>3.6. Level-2 Residual Extraction</title>
        <p>Player-specific intercepts and slopes were obtained using the "ranef" function from lme4 [13]. A
custom function calculated player contributions to matches, extracting level-2 residuals. Team
residuals were similarly extracted for the top 20 list.</p>
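        <p>The final averaging and ranking step can be sketched as follows. The input is assumed to be a list of (player, per-match contribution) pairs already computed from the model; the function name and the toy values are hypothetical and do not reproduce the custom function mentioned above.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def rank_players(contributions):
    """Average per-match contributions by player and sort from highest to lowest mean."""
    by_player = defaultdict(list)
    for player, value in contributions:
        by_player[player].append(value)
    means = {player: sum(vals) / len(vals) for player, vals in by_player.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (player, per-match level-2 residual) pairs.
toy = [("P1", 0.25), ("P2", -0.5), ("P1", 0.75), ("P3", -0.25), ("P2", 0.5)]
ranking = rank_players(toy)
for position, (player, mean_residual) in enumerate(ranking, start=1):
    print(position, player, mean_residual)
```
]]></preformat>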
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Results</title>
      <p>Table 1 ranks the players with over 340 league games for the top 20 teams in the
top 5 European leagues. Players are ordered by mean residual value, which reflects their impact
on team performance. Giorgio Chiellini leads the ranking with a mean value of 4.091820 × 10<sup>−9</sup>,
signifying an average improvement in his team’s performance when he played: on average, his team
exceeded its predicted performance by 4.091820 × 10<sup>−9</sup> points. Conversely, Marcelo had a
negative impact on his team, indicated by a mean value of −4.280433 × 10<sup>−9</sup>.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>Football, despite being a team sport, perpetually raises the question of the best player, creating
endless debates and discussions. Opinions, often subjective, vary based on personal criteria.
Notably, offensive players dominate discussions, overshadowing the defensive side of the game, which is
crucial but overlooked. This study aimed to objectively extract player performances from team data,
offering an equitable assessment of all roles.</p>
      <p>Our definition of a good player hinges on their team’s reliance on them: a good player is missed when
absent, and their absence affects team performance. To ensure an accurate evaluation, we employed a multilevel
linear mixed-effects model, controlling for team strength. Data from FBref encompassed player and
team details, match points, home/away status, season, and the opposition’s average points per game.
The level-2 player residuals extracted from the model formed the final rankings.</p>
      <p>The list featured impactful players of this century, with Giorgio Chiellini at the top, followed
by Andrea Pirlo and Petr Čech. Surprisingly, iconic players like Ronaldo and Messi ranked
12th and 14th. The top 30 showed balance across positions: 8 goalkeepers, 8 defenders, 8
midfielders, and 6 attackers.</p>
      <p>Player level-2 residuals were small due to players’ extensive playing time, emphasizing
team and team-season effects. Future work could widen the timeframe, create
league-specific rankings, and incorporate diverse metrics for player contribution, potentially examining
managerial impact and extending related studies to longer periods.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has received support from the research program CogniCom (0013103) at the University
of Primorska.</p>
      <p>[11] K. Reitz, requests: Python HTTP for Humans, 2023. URL: https://requests.readthedocs.io.</p>
      <p>[12] Crummy, Beautiful Soup 4.12.0 documentation, 2023. URL: https://www.crummy.com/software/BeautifulSoup/bs4/doc/.</p>
      <p>[13] lme4, lme4 package, 2023. URL: https://github.com/lme4/lme4.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brooks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kerr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guttag</surname>
          </string-name>
          ,
          <article-title>Developing a Data-Driven Player Ranking in Soccer Using Predictive Model Weights</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          ,
          <publisher-name>ACM</publisher-name>
          ,
          <publisher-loc>San Francisco, California, USA</publisher-loc>
          ,
          <year>2016</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>55</lpage>
          . URL: https://dl.acm.org/doi/10.1145/2939672.2939695. doi:
          <pub-id pub-id-type="doi">10.1145/2939672.2939695</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>McHale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Scarf</surname>
          </string-name>
          ,
          <article-title>Ranking Football Players</article-title>
          ,
          <source>Significance</source>
          <volume>2</volume>
          (
          <year>2005</year>
          )
          <fpage>54</fpage>
          -
          <lpage>57</lpage>
          . URL: https://doi.org/10.1111/j.1740-9713.2005.00091.x. doi:
          <pub-id pub-id-type="doi">10.1111/j.1740-9713.2005.00091.x</pub-id>
          , eprint: https://academic.oup.com/jrssig/article-pdf/2/2/54/49108761/sign_2_2_54.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cintia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferragina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Massucco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <article-title>PlayeRank: Data-driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          . URL: https://dl.acm.org/doi/10.1145/3343172. doi:
          <pub-id pub-id-type="doi">10.1145/3343172</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T. U.</given-names>
            <surname>Grund</surname>
          </string-name>
          ,
          <article-title>Network structure and team performance: The case of English Premier League soccer teams</article-title>
          ,
          <source>Social Networks</source>
          <volume>34</volume>
          (
          <year>2012</year>
          )
          <fpage>682</fpage>
          -
          <lpage>690</lpage>
          . URL: https://linkinghub.elsevier.com/retrieve/pii/S0378873312000500. doi:
          <pub-id pub-id-type="doi">10.1016/j.socnet.2012.08.004</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <article-title>Modelling player performance in basketball through mixed models</article-title>
          ,
          <source>International Journal of Performance Analysis in Sport</source>
          <volume>13</volume>
          (
          <year>2013</year>
          )
          <fpage>64</fpage>
          -
          <lpage>82</lpage>
          . URL: https://www.tandfonline.com/doi/full/10.1080/24748668.2013.11868632. doi:
          <pub-id pub-id-type="doi">10.1080/24748668.2013.11868632</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E. A. E.</given-names>
            <surname>Gerber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Craig</surname>
          </string-name>
          ,
          <article-title>A mixed effects multinomial logistic-normal model for forecasting baseball performance</article-title>
          ,
          <source>Journal of Quantitative Analysis in Sports</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>221</fpage>
          -
          <lpage>239</lpage>
          . URL: https://www.degruyter.com/document/doi/10.1515/jqas-2020-0007/html. doi:
          <pub-id pub-id-type="doi">10.1515/jqas-2020-0007</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Sabel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>Formula for success: Multilevel modelling of Formula One Driver and Constructor performance, 1950-2014</article-title>
          ,
          <source>Journal of Quantitative Analysis in Sports</source>
          <volume>12</volume>
          (
          <year>2016</year>
          )
          <fpage>99</fpage>
          -
          <lpage>112</lpage>
          . URL: https://www.degruyter.com/document/doi/10.1515/jqas-2015-0050/html. doi:
          <pub-id pub-id-type="doi">10.1515/jqas-2015-0050</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bryk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Raudenbush</surname>
          </string-name>
          ,
          <source>Hierarchical Linear Models: Applications and Data Analysis Methods</source>
          , Advanced Quantitative Techniques in the Social Sciences,
          <publisher-name>SAGE Publications</publisher-name>
          ,
          <year>1992</year>
          . URL: https://books.google.si/books?id=eE-CAAAAIAAJ.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] FBref, Football Statistics and History,
          <year>2023</year>
          . URL: https://fbref.com/en/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          Sports Reference, Sports Reference | Sports Stats, fast, easy, and up-to-date,
          <year>2023</year>
          . URL: https://www.sports-reference.com/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>