=Paper= {{Paper |id=Vol-3933/Paper_8.pdf |storemode=property |title=An Intelligent System For Analyzing The Dependence Of A Chess Player's Rating On Inaccurate Moves |pdfUrl=https://ceur-ws.org/Vol-3933/Paper_8.pdf |volume=Vol-3933 |authors=Andriy Dudnik,Yurii Moroz,Olga Leshchenko,Oleksandr Makhovych,Ihor Kolisnyk |dblpUrl=https://dblp.org/rec/conf/iti2/DudnikMLMK24 }} ==An Intelligent System For Analyzing The Dependence Of A Chess Player's Rating On Inaccurate Moves== https://ceur-ws.org/Vol-3933/Paper_8.pdf
                                An intelligent system for analyzing the dependence of a
                                chess player's rating on inaccurate moves
                                Andriy Dudnik1,2*, Yurii Moroz2, Olga Leshchenko1, Oleksandr Makhovych1 and Ihor Kolisnyk2
                                1 Kyiv National Taras Shevchenko University, 60, Volodymyrska Str., Kyiv, 03022, Ukraine
                                2 Interregional Academy of Personnel Management, 2 Frometivska str., Kyiv, 03039, Ukraine



                                                   Abstract
                                                   This paper presents an intelligent system for analyzing the dependence between chess players'
                                                   ratings and the mistakes they make during the game. To achieve the goal of this study, we used various
                                                   machine learning techniques and algorithms. With the help of big data tools, we analyzed thousands of
                                                   chess games and obtained a large number of statistical insights about them. We created an algorithm that
                                                   processes the obtained data and returns the results of the computations. In further development, this
                                                   algorithm could serve as the basis for numerous frameworks, such as human-like chess bots.

                                                   Keywords
                                                   chess, ACPL, Elo rating, correlation, FIDE, chess analysis, move evaluation, chess engines, rating
                                                   prediction, machine learning.



                                1. Introduction
                                Information technologies play an important role in our lives. They have brought people closer together
                                to collaborate and change the world for the better. As more and more new methods are created, they have
                                transformed many industries and made processes more efficient. Today, they help us build tools and
                                frameworks for collecting, processing, and analyzing huge datasets. To achieve the goal of our
                                research, we combined different data-processing techniques to create an algorithm that accurately
                                spots even the slightest mistakes made by chess players.
                                   In chess, the main value that represents a player's strength is their rating. However, the rating
                                depends not on the decisions made on the board but on the results of the games. That is why we
                                compare players' rating performance with their actual moves to analyze the connection between the
                                two more deeply.
                                   This study aims to find out how strongly the rating correlates with different types of
                                errors, such as blunders, mistakes, and inaccuracies, that players make at different levels, from
                                average players to top grandmasters. The main tools we used to achieve this goal are Python,
                                JupyterLab, ChessBase, and Stockfish.
                                   We chose the Python programming language because it has many useful libraries for working
                                with big data, data analysis, and machine learning. For our research, we mostly used Pandas, NumPy,
                                and Seaborn. Python is also convenient for working with and manipulating chess data because it can
                                handle not only CSV (Comma-Separated Values) files but also PGN (Portable Game Notation) files, which
                                is crucial for our research.

Information Technology and Implementation (IT&I-2024), November 20-21, 2024, Kyiv, Ukraine
* Corresponding author.
These authors contributed equally.
a.s.dudnik@gmail.com (A. Dudnik); yuri.moroz91@gmail.com (Y. Moroz); olga.leshchenko@knu.ua (O. Leshchenko);
umismag@gmail.com (O. Makhovych); ihorko95@gmail.com (I. Kolisnyk)
0000-0003-1339-7820 (A. Dudnik); 0009-0002-1254-9984 (Y. Moroz); 0000-0002-3997-2785 (O. Leshchenko);
0000-0002-4684-9881 (O. Makhovych); 0009-0007-7113-7652 (I. Kolisnyk)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
    JupyterLab is a practical development environment for data science and scientific
computing. It is a browser-based tool that lets users write code and Markdown commentary
interactively and create well-structured documents in one place. It is also handy to see
the results of each cell immediately. Its integration of data analysis with visualization makes it a
good option for data scientists and researchers.
    ChessBase is a powerful database program that stores millions of games, played by everyone from
beginners to top grandmasters and dating from the 15th century to the present. While primarily designed
for storing games, it also provides many additional features that help to analyze games, prepare for
tournaments, study openings and endgames, run engine analysis, create personal game
databases, etc.
    Stockfish is an engine that combines traditional search algorithms with neural networks, which makes it
the strongest of the available engines. It has a huge community of enthusiasts who help make it stronger in
many different ways, such as volunteering their computers' processing power or writing code that makes
the evaluation function more accurate. Because it is regularly updated and is an open-source
project, it is a great tool for evaluating games precisely.
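Stockfish is usually controlled through the UCI (Universal Chess Interface) text protocol: a script sends commands such as `position` and `go depth 20`, and the engine answers with `info` lines containing the search depth and a score either in centipawns (`cp`) or in moves to mate (`mate`). The paper does not show its engine-handling code, so the following is only a minimal sketch, assuming that standard output format, of how such a line might be parsed; `parse_info_line` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# "depth <n>" must start a word so that "seldepth <n>" is not matched.
_DEPTH_RE = re.compile(r"\bdepth (\d+)")
# "score cp <n>" (centipawns) or "score mate <n>" (moves to mate).
_SCORE_RE = re.compile(r"\bscore (cp|mate) (-?\d+)")


def parse_info_line(line: str) -> Optional[Tuple[int, str, int]]:
    """Return (depth, kind, value) from a UCI "info" line, or None.

    kind is "cp" or "mate". UCI scores are reported from the point of
    view of the side to move.
    """
    if not line.startswith("info "):
        return None
    # Bound scores are not exact evaluations, so they are skipped.
    if " lowerbound" in line or " upperbound" in line:
        return None
    depth = _DEPTH_RE.search(line)
    score = _SCORE_RE.search(line)
    if depth is None or score is None:
        return None
    return int(depth.group(1)), score.group(1), int(score.group(2))


line = "info depth 20 seldepth 28 multipv 1 score cp 35 nodes 123456 pv e2e4"
print(parse_info_line(line))  # (20, 'cp', 35)
```

In practice, the evaluation of a position would be taken from the last exact-score line reported at the target depth. Mate scores have to be mapped to some large finite centipawn value before they can enter an average; how the authors handled this is not stated.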
    Combining all these tools, we built an algorithm that fetches all the games, analyzes them, and
computes the statistics needed for our research.
   This study highlights the importance of information technologies in creating the software that made
this research possible. With the help of big data, machine learning, and statistical analysis, an
intelligent system for analyzing chess games was built. Our approach could also be applied to other
board games.

2. Analysis of literature and problem statement

Guid and Bratko were among the first researchers to compare the quality of players' games, in 2006 [1].
Their research aimed to compare the world chess champions by the quality of their play. The question
was complicated because the champions were from different eras and had different playing styles. For
that purpose, they analyzed all games played in World Championship matches using Crafty, one of the
strongest engines at that time. They slightly modified the engine for their research, adding more
search depth for tactically sharp positions. Apart from using ACPL as the main value to rank the
players, they also used the complexity of a position as a weighting factor to make the results more
accurate.

    Another approach, which estimated players' skill from their moves rather than from the results of
the games, was proposed by Regan and Haworth in 2011 [2]. When one of the players is far ahead in the
evaluation, they can make suboptimal moves to convert their advantage safely. Also, the player who is
behind can take much more risk in an attempt to save the game. To penalize such moves less, the
authors developed formulas with intrinsic skill parameters for sensitivity and consistency, tested on
large sets of games. One of the main points of the research was to find out whether a player with a
certain current rating represents the same strength as one with the same rating several decades ago.
For that purpose, they analyzed 150,000 games with Rybka 3, the strongest engine at that time.
    More recent research in the field was done by Romero, Parra, Cuenca, and Lloret in 2019 [3]. They
repeated the analysis of the first study with a newer and much stronger engine, Stockfish 10, on much
more powerful computers. This helped them obtain more accurate results and find some new insights.
They also compared their findings with the Guid-Bratko study and obtained similar results. However,
the 2006 study gave a slightly higher average error because Crafty overestimated blunders and searched
at a much lower depth. The authors added several new values to the ACPL, such as the percentage of best
moves and the average number of blunders. They also added world championship runners-up to complete
their rankings.
    The study conducted by McIlroy-Young, Sen, Kleinberg, and Anderson in 2020 focused on modeling
human-like play with a neural network [4]. Even before that, existing engines could be weakened so
that they played less precisely, at a particular level, by choosing suboptimal moves or by limiting
the depth of the search tree. The issue is that such engines tend to choose moves that a human player
is unlikely to opt for. The authors aimed to build a bridge between human and artificial intelligence
by introducing Maia, a customized version of the AlphaZero engine, to achieve maximum accuracy in
predicting human decisions at a specific level. They trained it on 12 million online games played on
Lichess, a free and open-source chess platform. The games were played by players of all skill levels,
from amateurs to the current world champion, Gukesh Dommaraju, and with several different time
controls: HyperBullet, Bullet, Blitz, Rapid, and Classical.
    The same team of researchers, together with Wang, continued their exploration in 2022, aiming to
characterize the individual behavior of human chess players [5]. Instead of characterizing players by
skill, rating, or performance, they wanted artificial intelligence systems to treat humans as
individuals and to understand their strengths, weaknesses, and style. In addition to the model from
their previous research, which could mimic a human player of a given rating, they also developed a
system that can capture the style of a specific player. They trained it on 63.7 million games played
by 16,181 players. The results they achieved are remarkable: their model could correctly identify a
player out of thousands of candidate players with 98% accuracy given only 100 games.
    In our research, we propose an approach similar to [1-3] to analyze the errors chess players make
in their games and to study their impact on the rating. We analyzed thousands of games played at
different levels over two years. Apart from average centipawn loss, we also added several statistical
indicators, such as the numbers of blunders, mistakes, and inaccuracies, to estimate the cost of
errors and to better understand what types of errors a player tends to make. Based on our results,
further studies could build more realistic behavioral models of chess players. Such models could also
be used to train artificial intelligence to increase accuracy in such studies, and to create a chess
bot that captures the human decision-making process and mimics the behavior of chess players, as in
[4-5].

3. Methodology
The Elo rating system was originally invented by Arpad Emmerich Elo to predict the outcomes of chess
games and is based on the results of the games. It was adopted by FIDE in 1970 and is still used today.
Ratings are updated by the following formula [6-12]:

                                       𝑅𝑛 = π‘…π‘œ + 𝐾(π‘Š βˆ’ π‘Šπ‘’ )
where $R_n$ is the new rating of the player after the event, $R_o$ is the rating prior to the event, $W$ is
the player's actual score (wins, with each draw counting as half a win), $W_e$ is the expected score, and
$K$ is a scaling factor that is higher for young and lower-rated players.
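As a worked example of the update rule, the sketch below assumes the classical logistic expected score of the original Elo model, $W_e = 1/(1 + 10^{(R_{opp} - R)/400})$ per game, summed over the games of the event; this expected-score formula is not given in the paper, and FIDE's regulations add further rules that the sketch ignores. The function names are illustrative.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score of one game in the classical logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def new_rating(r_old: float, k: float, score: float, expected: float) -> float:
    """R_n = R_o + K * (W - W_e), with W and W_e summed over the event."""
    return r_old + k * (score - expected)


# Illustrative: a 2000-rated player scores 1.5/2 (a win and a draw)
# against two 2000-rated opponents, with K = 20.
w_e = expected_score(2000, 2000) + expected_score(2000, 2000)  # 0.5 + 0.5
print(new_rating(2000, 20, 1.5, w_e))  # 2010.0
```

Because the player scored 0.5 more than expected, the rating rises by $K \times 0.5 = 10$ points.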
   Average centipawn loss (ACPL) measures how accurately a player plays their moves during the game.
For each move, the loss is the difference between the engine's evaluation of the best move and its
evaluation of the move actually played. ACPL is the sum of these differences divided by the number of
moves the player made in the game [13-22]:
                                                    𝑛
                                             1
                                       𝐴𝐢𝑃𝐿 = βˆ‘|𝐸𝑖 βˆ’ 𝑀𝑖 |
                                             𝑛
                                                   𝑖=1


                                                                                                     95
where $n$ is the number of moves made by the player in the game, $E_i$ is the engine evaluation of the
position after the optimal move, and $M_i$ is the engine evaluation of the position after the move
actually played.
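The ACPL formula translates directly into code. The sketch below assumes that both evaluations for each analyzed move are already available in centipawns from the moving player's point of view, and that mate scores have been converted to finite values; the function name `acpl` is illustrative.

```python
from typing import Sequence


def acpl(best: Sequence[float], played: Sequence[float]) -> float:
    """Average centipawn loss: (1/n) * sum(|E_i - M_i|).

    best[i]:   evaluation E_i after the engine's best move (centipawns)
    played[i]: evaluation M_i after the move actually played
    Both are assumed to be from the moving player's point of view.
    """
    if len(best) != len(played):
        raise ValueError("every move needs both evaluations")
    if not best:
        raise ValueError("no analyzed moves")
    return sum(abs(e - m) for e, m in zip(best, played)) / len(best)


print(acpl([30, 20, 50], [30, 10, 0]))  # 20.0
```

With these illustrative evaluations the per-move losses are 0, 10, and 50 centipawns, so ACPL = 60 / 3 = 20.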
   Correlation is a statistical relationship between two variables. The Pearson correlation coefficient,
often denoted as $r$, measures the strength and direction of the linear relationship between two
continuous variables. It ranges from -1 to 1 and is defined by the following formula [23-27]:

                                          𝑛(βˆ‘ π‘₯𝑦) βˆ’ (βˆ‘ π‘₯)(βˆ‘ 𝑦)
                             π‘Ÿ=
                                  √[𝑛 βˆ‘ π‘₯ 2 βˆ’ (βˆ‘ π‘₯)2 ][𝑛 βˆ‘ 𝑦 2 βˆ’ (βˆ‘ 𝑦)2 ]
where $n$ is the number of pairs of scores, and $x$ and $y$ are the two variables being compared.
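The raw-sums form of the Pearson formula can be checked with a direct, term-by-term implementation. This is an illustrative sketch rather than the authors' code; for large values such as Elo ratings, a mean-centered computation (for example, `statistics.correlation` in Python 3.10+) is numerically more robust.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r computed from raw sums, as in the formula in the text."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```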
   The games used for this study were collected from ChessBase 17. The code for fetching and
processing the games was written in Python 3.12. We analyzed 8000 games using Stockfish 16 at
depth 20 on an Intel Core i7 processor at 2.5 GHz. The 80 players are divided into 8 rating classes
according to their standard Elo rating in January 2024 [28]. For class A, we chose 7 players from the
2024 Candidates Tournament (all except Nijat Abasov, who has a lower rating) and the 3 strongest … that
did not have big rating fluctuations during the last 2 years and represent all genders, all age groups,
and as many nationalities as possible. For each player, we took exactly 100 games with a standard time
control played during 2022 and 2023, excluding short games that lasted fewer than 15 moves. For several
players who did not play enough games, we also took games from 2021. Engine analysis starts after the
end of the opening and stops after move 60 to prevent ACPL from being artificially lowered [29-33].
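The selection rules above (100 standard games per player from 2022 and 2023, no games shorter than 15 moves, and engine analysis only from the end of the opening up to move 60) can be sketched as follows. The `GameRecord` type and its fields are hypothetical; in particular, the paper does not describe how the end of the opening book is detected, so `book_end_move` is simply assumed to be known. The fallback to 2021 games for players with too few games is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class GameRecord:
    """Hypothetical per-game record; the field names are illustrative."""
    player: str
    year: int
    full_moves: int     # number of full moves played in the game
    book_end_move: int  # last move still treated as opening theory (assumed known)


def analysed_moves(game: GameRecord, last_move: int = 60) -> range:
    """Move numbers sent to the engine: after the book, up to move 60."""
    return range(game.book_end_move + 1, min(game.full_moves, last_move) + 1)


def select_games(games: Iterable[GameRecord],
                 years: Tuple[int, ...] = (2022, 2023),
                 min_moves: int = 15,
                 per_player: int = 100) -> Dict[str, List[GameRecord]]:
    """Keep games from `years` with at least `min_moves` moves,
    up to `per_player` games per player, in input order."""
    picked: Dict[str, List[GameRecord]] = {}
    for game in games:
        if game.year in years and game.full_moves >= min_moves:
            bucket = picked.setdefault(game.player, [])
            if len(bucket) < per_player:
                bucket.append(game)
    return picked


game = GameRecord("player A", 2022, full_moves=72, book_end_move=12)
print(len(analysed_moves(game)))  # 48: moves 13 to 60
```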

4. Results
Figures 1 and 2 show a scatter plot and a box plot of the ACPL values of all games played by all the
players we analyzed. The players are sorted by their mean ACPL in descending order. According to our
results, Wesley So has the lowest mean ACPL, which means that he is the most accurate player. The
classes in the plots represent the player's rating group: class H stands for 2000-2099 Elo, class G
for 2100-2199, and so on.
    Figure 1 shows the ACPL of all the games we analyzed. Only 4 games out of 8000 had an ACPL above
100. From a human perspective, this means that a player loses the equivalent of a pawn on every move.
All of these games were played by players rated below 2200, so it is quite normal for them to have one
such game out of 100. Among the stronger players, the least accurate game was played by Nijat Abasov,
with an ACPL value of 95. It was a game against Nikolas Theodorou in Saint Louis in 2022. Nijat played
a rare move in the Tarrasch Defence, and the whole game was a total mess. Abasov is a very dangerous
player who likes to choose dubious moves and rare lines in the openings. Among the super grandmasters,
Magnus Carlsen's least accurate game had an ACPL value of 57, which is very unusual for him. It was one
of the most controversial games in chess history: his game against Hans Niemann at the 2022 Sinquefield
Cup. After losing with the white pieces, Magnus immediately withdrew from the tournament, the first
time he had done so in his career. Many interpreted his withdrawal as a tacit accusation that Niemann
had cheated.
    The highest-rated player, Magnus Carlsen, has the second-lowest mean ACPL, and Nihal Sarin has the
third-lowest.
    Figure 2 shows a box plot of the same games, which gives more statistical insight into them. The
whiskers (the horizontal lines outside the box) show the minimum and maximum of the data, excluding
outliers. The edges of the box mark the first and third quartiles, meaning that half of the data lies
inside the box. The line inside the box shows the median, and the dots above the whiskers are
outliers. According to our results, Wesley So also has the lowest median ACPL. The previous World
Champion, Ding Liren, has the second-lowest median, and Fabiano Caruana has the third-lowest.
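The box-plot elements described above can be computed directly. The sketch below follows the default rule of Matplotlib, on which Seaborn's `boxplot` is built: quartiles by linear interpolation, and whiskers extending to the most extreme data points that lie within 1.5 × IQR of the box, with anything beyond counted as an outlier. It is an illustration, not the code that produced Figure 2, and it needs at least two values.

```python
import statistics
from typing import Sequence


def box_stats(values: Sequence[float], whis: float = 1.5) -> dict:
    """Quartiles, whisker ends and outliers for one box of a box plot."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])["outliers"])  # [100]
```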

Figure 1. Scatter plot of the ACPL values of all analyzed games.




Figure 2. Box plot of the ACPL values of all analyzed games.

   Apart from using ACPL as the main value of the analysis, we decided to add several indicators of the
quality of the moves. Taking an equal position as an example: a blunder is a serious error after which
the position becomes lost; a mistake makes the position worse, but it remains playable; an inaccuracy
slightly worsens the position. "Ok" means that the move was good, but there was a better one. "Missed
win" means that the player did not find a winning move.
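A move can be labeled by comparing the engine evaluation after the best move with the evaluation after the move actually played. The paper does not give its thresholds, so the values below (losses of 50, 100, and 300 centipawns for inaccuracy, mistake, and blunder, and +300 as a "winning" evaluation) are assumptions chosen only for illustration, and the extra "best" label is added for moves with no loss. Fixed centipawn thresholds also ignore the point made above that the same loss matters differently in an equal position and in an already decided one.

```python
def classify_move(best_cp: int, played_cp: int,
                  inaccuracy: int = 50, mistake: int = 100,
                  blunder: int = 300, winning: int = 300) -> str:
    """Label one move; all default thresholds are illustrative assumptions.

    best_cp:   evaluation after the engine's best move
    played_cp: evaluation after the move actually played
    Both in centipawns from the moving player's point of view.
    """
    loss = best_cp - played_cp
    if loss <= 0:
        return "best"
    # A winning move existed, but the played move is no longer winning.
    if best_cp >= winning and played_cp < winning and loss >= mistake:
        return "missed win"
    if loss >= blunder:
        return "blunder"
    if loss >= mistake:
        return "mistake"
    if loss >= inaccuracy:
        return "inaccuracy"
    return "ok"  # a good move, but a slightly better one existed


for played in (20, 0, -40, -100, -300):
    print(played, classify_move(20, played))
```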

   According to our analysis, Wesley So is the most accurate of the players we analyzed. He also made the
fewest inaccuracies and mistakes. Many chess experts call him one of the best at pure calculation. The
second most accurate player is Magnus Carlsen, who has been the highest-rated player since 2011 and was
World Chess Champion from 2013 to 2023. Many people call him the best player in chess history. The third
most accurate player is the young rising Indian star Nihal Sarin. Figure 3 shows all the players with a
mean ACPL lower than 15.




Figure 3. Players with ACPL value lower than 15.

   The values in the columns show how many times, on average, the player makes each type of error in one
game. For example, Nihal Sarin made 3 blunders (big mistakes) in his 100 games, so on average he makes
0.03 blunders per game. This is the lowest value in the column, so we can say that Nihal is the best at
avoiding blunders. Wesley So makes the fewest mistakes and inaccuracies.
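Per-game error rates like those in Figure 3 are counts divided by the number of games. A minimal sketch, assuming each analyzed move has already been given one of the labels described above; `per_game_rates` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def per_game_rates(labels: Iterable[str], games: int) -> Dict[str, float]:
    """Average number of moves with each label per game, for one player."""
    if games <= 0:
        raise ValueError("games must be positive")
    return {label: count / games for label, count in Counter(labels).items()}


# The example from the text: 3 blunders in 100 games.
print(per_game_rates(["blunder"] * 3, games=100))  # {'blunder': 0.03}
```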
   Figure 4 shows the relationship between Elo rating and mean ACPL for all players. As we can see,
there are no large residuals. Of course, some players play better than others in their rating class,
and some play worse. However, the distance of each player to the regression line is relatively small.
    Larger distances can be seen in the lower rating classes. This can be explained by the fact that the
moves of weaker players are less consistent than those of stronger players: they are less predictable
and can alternate good play with bad. We can therefore assume that the lower the rating class, the
wider the ACPL range.
    Among the super grandmasters (class A), Wesley So lies slightly below the others, and Alireza
Firouzja lies much higher than them.
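The regression line in Figure 4 and the residuals discussed above can be obtained with ordinary least squares. The sketch assumes a simple linear fit of mean ACPL against Elo rating (the paper does not state which fitting method its plot used); `statistics.linear_regression` in Python 3.10+ returns the same slope and intercept. The data in the usage lines are toy numbers, not results from the paper.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x: Sequence[float], y: Sequence[float],
              slope: float, intercept: float) -> List[float]:
    """Vertical distance of each point from the fitted line."""
    return [b - (slope * a + intercept) for a, b in zip(x, y)]


# Toy data, not taken from the paper: x = rating, y = mean ACPL.
ratings, mean_acpl = [2000, 2100, 2200], [30, 26, 20]
slope, intercept = fit_line(ratings, mean_acpl)
print(residuals(ratings, mean_acpl, slope, intercept))
```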




Figure 4. Relationship between Elo rating and mean ACPL.

   Figure 5 shows a correlation heatmap of all numerical features. A negative value means a negative
correlation; for example, the higher the rating, the lower the ACPL. Rating correlates strongly not only
with ACPL but also with all types of mistakes made by the player, and its correlation with ACPL, -0.95,
is especially strong.




Figure 5. Correlation heatmap.

   According to our findings, Wesley So is the most precise player. Magnus Carlsen, the highest-rated
player and former World Champion, is second. Nihal Sarin, who has not yet reached a 2700 Elo rating, is
third. Wesley also made the fewest mistakes and inaccuracies, while Nihal made the fewest blunders (big
mistakes).

5. Discussion
The findings of this study clearly show a strong correlation not only between a player's rating and
average centipawn loss but also between the rating and the types of mistakes the player makes. The exact
ACPL can vary greatly depending on the level of play and the players involved. Generally, stronger
players tend to make fewer mistakes, which is reflected in the strong correlation between average
centipawn loss and Elo rating.
    However, a player's ACPL also depends on other factors, such as the openings they choose, the types
of positions they reach, the strength of the opponent, psychological factors, etc.
    There are several outliers in our results, most of them in the lower-rated classes. Among the super
grandmasters, we can point out Alireza Firouzja, who has a higher ACPL than the other players in his
rating group. One explanation could be that he prefers an aggressive style of play, which leads to
complicated positions where it is very hard to make precise moves.
    This study was limited by the number of players and the depth of analysis. Our original intention
was to compare all players starting from a 1000 Elo rating, but there were two main problems. The
first was that we could not find 100 games from the last 2 years for such players.
    The second was that FIDE decided to raise the rating floor from 1000 to 1400 Elo. This affected all
players rated below 2000 Elo, whose ratings were increased by the following formula:

                                     𝑅𝑛 = π‘…π‘œ + 0.4(2000 βˆ’ π‘…π‘œ )
    where $R_n$ is the new rating of the player, and $R_o$ is the old rating.
    The new rating regulations came into force in March 2024. With that in mind, we decided to compare
players starting from 2000 Elo.
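A worked example of the March 2024 adjustment: a player at the old floor of 1000 moves to 1000 + 0.4 × (2000 − 1000) = 1400, exactly the new floor, while a 1800-rated player gains 0.4 × 200 = 80 points. The sketch below applies the formula only below 2000, as described above, and leaves out the rounding to whole numbers used in published ratings; the function name is illustrative.

```python
def fide_2024_adjustment(r_old: float) -> float:
    """March 2024 one-off change: R_n = R_o + 0.4 * (2000 - R_o) below 2000."""
    if r_old >= 2000:
        return r_old  # ratings of 2000 and above were not changed
    return r_old + 0.4 * (2000 - r_old)


print(fide_2024_adjustment(1000))  # 1400.0
print(fide_2024_adjustment(1800))  # 1880.0
print(fide_2024_adjustment(2150))  # 2150
```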
    The engine's search depth is another important question: the greater the depth, the more accurate the
evaluation. We tried greater depths, up to 25, with several players, but the results were similar to
those at depth 20. With 8000 games to analyze, increasing the depth would require much more computing
time, so we consider depth 20 a reasonable choice.
    The next question is where to start and where to stop the engine. We started it after the end of the
opening book because players tend to follow well-studied theory, and we stopped it after move 60
because many long games end in a draw a…
    Many other features could be added in future work, such as the style of play or the complexity of the
position. Moreover, it is open to question whether ACPL is a sensible measure of precision. The largest
online chess servers have also introduced an Accuracy metric to make evaluations clearer and easier for
players to understand.

6. Conclusion
In this paper, we compared the quality of play of a wide range of players. It was proposed to use …,
which improved the accuracy of the analysis results.
   We also added move classifications such as blunder, mistake, and inaccuracy to provide more
statistical insight.
   A high correlation was found not only between Elo rating and ACPL but also between Elo rating
and different types of player errors. The correlation between Elo and ACPL is -0.95.
   In future work, some other features, such as playing style or position complexity, will be added
to more accurately model the behavior of real players for use in creating human-like chess bots.

Declaration on Generative AI
The authors have not employed any Generative AI tools.

References
[1]
      2006, pp.65-73.
[2]
      Artificial Intelligence.
[3]
    The Fourteenth International Conference on Software Engineering Advances, 2019, pp. 200-205.
[4] McIlroy-Young, Reid & Sen, Siddhartha & Kleinberg, Jon & Anderson, Ashton. (2020). Aligning
    Superhuman AI with Human Behavior: Chess as a Model System. 1677-1687.
    10.1145/3394486.3403219.
[5] McIlroy-Young, Reid & Wang, Russell & Sen, Siddhartha & Kleinberg, Jon & Anderson, Ashton.
    (2022). Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess.
    10.48550/arXiv.2208.01366.
[6]

[7]
    2016, pp. 76-91.
[8] E. Vázquez-

     World Automation Congress (WAC 2012), 24-28 June 2012, Puerto Vallarta, Mexico, pp. 1-6.
[9]                                                                                        -by-
     move dynamics of the advantage in chess matches reveals population-level learning of the
                                  -7.
[10]                                                         Jeeves algorithm with a stochastic
     Newton Raphson-like step-
     March 2019, pp. 166-175.
[11]

     277, 2014, pp. 656-679.
[12] Ashton Anderson, Jon Kleinberg, and Sendhil Mullainathan. 2017. Assessing human error
     against a benchmark of perfection. ACM Transactions on Knowledge Discovery from Data
     (TKDD) 11, 4 (2017), 45.
[13] John R Anderson, C Franklin Boyle, and Brian J Reiser. 1985. Intelligent tutoring systems.
     Science 228, 4698 (1985), 456-462.
[14] Tamal Biswas and Kenneth W Regan. 2015. Measuring Level-K Reasoning, Satisficing, and
     Human Error in Game-Play Data. In 2015 IEEE 14th International Conference on Machine
     Learning and Applications. IEEE, Miami, FL, 941-947.
[15] Neil Charness. 1992. The impact of chess research on cognitive science. Psychological research
     54, 1 (1992), 4-9.
[16] Finale Doshi-Velez and Been Kim. 2017. A Roadmap for a Rigorous Science of Interpretability.
     Technical Report 1702.08608. arxiv.org.
[17] Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and
     Sebastian Thrun. 2017. Dermatologist-level classification of skin cancer with deep neural
     networks. Nature 542, 7639 (2017), 115-118.
[18] Ashton Anderson, Jon Kleinberg, and Sendhil Mullainathan. Assessing human error against a
     benchmark of perfection. ACM Transactions on Knowledge Discovery from Data (TKDD),
     11(4):45, 2017.
[19] Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom
     Schaul, Brendan Shillingford, and Nando De Freitas. Learning to learn by gradient descent by
     gradient descent. In Advances in neural information processing systems, pages 3981-3989, 2016.
[20] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization, 2016.
[21] Philip Bachman, Alessandro Sordoni, and Adam Trischler. Learning algorithms for active
     learning. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International
     Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research,
     pages 301-310. PMLR, 06-11 Aug 2017.
[22]                                                                                            . Real or
     fake? learning to discriminate machine from human generated text. ArXiv, abs/1906.03351, 2019.
[23] Gurieiev, V., Kutsan, Y., Iatsyshyn, A., (...), Artemchuk, V., Popov, O. (2020). Simulating systems
     for advanced training and professional development of energy specialists in power sector. CEUR
     Workshop Proceedings, 2732, pp. 693-708.
[24] Dutchak, S., Opolska, N., Shchokin, R., Durman, O., Shevtsiv, M. (2020). International aspects of
     legal regulation of information relations in the global internet network. Journal of Legal, Ethical
     and Regulatory Issues, 23(3), pp. 1-7.
[25] FIDE standard ratings. Accessed: Nov. 5, 2024. [Online]. Available:
     https://ratings.fide.com/rankings.phtml?rating=standard&period=2024-01-01.
[26] Dudnik, A., Kravchenko, Y., Trush, O., Leshchenko, O., Dakhno, N., & Rakytskyi, V. (2021,
     December). Study of the Features of Ensuring Quality Indicators in Multiservice Networks of
     the Wi-Fi Standard. In 2021 IEEE 3rd International Conference on Advanced Trends in
     Information Theory (ATIT) (pp. 93-98). IEEE.
[27] Kuzmych, L., Ornatskyi, D., Kvasnikov, V., Kuzmych, A., Dudnik, A., & Kuzmych, S. (2022,
     December). Development of the intelligent instrument system for measurement parameters of
     the stress-strain state of complex structures. In 2022 IEEE 4th International Conference on
     Advanced Trends in Information Theory (ATIT) (pp. 120-124). IEEE.
[28] Bakhov, I., Rudenko, Y., Dudnik, A., Dehtiarova, N., & Petrenko, S. (2021). Problems of Teaching
     Future Teachers of Humanities the Basics of Fuzzy Logic and Ways to Overcome Them.
     International Journal of Early Childhood Special Education, 13(2).
[29] Trush, O., Dudnik, A., Trush, M., Leshchenko, O., Shmat, K., & Mykolaichuk, R. (2022,
     December). Mask Mode Monitoring Systems Using IT Technologies. In 2022 IEEE 4th
     International Conference on Advanced Trends in Information Theory (ATIT) (pp. 219-224). IEEE.



