=Paper= {{Paper |id=Vol-2667/paper17 |storemode=property |title=Comparative analysis of football statistics data clustering algorithms based on deep learning and Gaussian mixture model |pdfUrl=https://ceur-ws.org/Vol-2667/paper17.pdf |volume=Vol-2667 |authors=Nikita Andriyanov }} ==Comparative analysis of football statistics data clustering algorithms based on deep learning and Gaussian mixture model == https://ceur-ws.org/Vol-2667/paper17.pdf
        Comparative analysis of football statistics data
       clustering algorithms based on deep learning and
                    Gaussian mixture model
                                                                  Nikita Andriyanov
                                                        JSC "RPC "Istok" named after Shokin
                                                          Fryazino, Moscow Region, Russia
                                                           Telecommunication department
                                                        Ulyanovsk State Technical University
                                                                  Ulyanovsk, Russia
                                                               nikita-and-nov@mail.ru


    Abstract—The paper considers the Gaussian mixture model and the possibilities of its application to clustering tasks. First, the case is considered in which the Gaussian mixture model is formed so that all parameters of the model are known. Next, the case is considered in which normally distributed data are approximated by a Gaussian mixture model. Finally, the article presents a study of the accuracy of clustering two-dimensional football statistics for the medal-position teams, middle-table teams and worst teams of the top 5 European football championships: the English Premier League, Spanish La Liga, German Bundesliga, Italian Serie A and French Ligue 1. The results of the algorithm based on Gaussian mixture models are compared with the results of clustering performed using neural networks.

    Keywords—Gaussian mixture models; machine learning; data clustering; data analysis; football statistics

                          I. INTRODUCTION

    Today data mining as intelligent analysis allows specialists in various fields to greatly simplify their work. For example, on the basis of such analysis, deliberately insolvent customers who apply to a bank for a loan can be screened out, and the number of taxi service orders can be predicted [1, 2]. Indeed, the ongoing digitalization of various areas of the economy and of state activity provides significant amounts of information. As a result, the range of tasks solved using data mining is very wide.

    One of the most interesting tasks in this area is the problem of data clustering [3, 4], which is closely related to recognition, classification and segmentation tasks [5-9]. However, in these tasks it is usually possible to distinguish several groups of objects. The simplest example is separating male students and female students in a group. Every person here can be described by height and weight, so each object in such a sample can be displayed as a point on a two-dimensional plane. The dimensionality can be expanded if a new parameter, for example hair length, is introduced; this simplifies the clustering task. Each group of objects can then be represented by an ellipsoid, and the clustering decision for a new object depends on which ellipsoid is closest to the point characterizing that object.

    So the further research considers a clustering algorithm based on Gaussian mixture models (GMM) [10, 11], because real data can quite often be well approximated by Gaussian distributions. The comparison algorithm is clustering by a trained neural network. It should be noted that this is the first time a comparison of the GMM and trained neural networks is performed for the task of analyzing football statistics. In addition, a combination of the proposed clustering methods can lead to a new type of clustering based simultaneously on supervised and unsupervised learning.

          II. BRIEF CLASSIFICATION OF CLUSTERING ALGORITHMS

    Known clustering algorithms [3] can be divided according to two basic principles. Let us consider their main features.

    First, clustering can be crisp or fuzzy. In the first case, each object is assigned to exactly one group as a result of clustering. With fuzzy clustering, a set of values is usually determined that characterizes the probability of each object belonging to each group, i.e. such clustering yields a probability distribution.

    Secondly, cluster analysis can be flat (single-level) or hierarchical (multi-level). In the first case, the initial set of objects is divided into several classes according to some criterion in the form of a single partition, for example, clustering university students only by gender. If further clustering separates the male students and female students while keeping the first level, a deeper clustering is obtained: an object in the sample can then be characterized not just as a male student or female student, but as an excellent ("A") male student, excellent ("A") female student, bad ("F") male student or bad ("F") female student. Such separation constitutes hierarchical clustering. It should be noted that the deep Gaussian mixture model (DGMM) considered in [11] copes well with the goals of hierarchical clustering, while the assignment of an object to a particular group is carried out according to the principle of crisp clustering.

    Finally, neural networks are gaining more and more popularity in clustering problems [12]. Depending on the training parameters and the type of network, various models for clustering can be obtained, and deep learning is now a very promising tool for such tasks.

    Thus, before choosing a clustering algorithm, it is necessary to first formulate the clustering problem itself, and then perform the data splitting.
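The crisp/fuzzy distinction above can be illustrated with a Gaussian mixture directly: a minimal sketch (not from the paper) on synthetic two-dimensional data, using scikit-learn as an assumed toolkit — the paper does not name the software it used.

```python
# Sketch (assumed scikit-learn API): crisp vs. fuzzy assignments from a GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D data: three Gaussian groups, loosely mimicking
# (points, goals scored) for bottom, middle-table and top teams.
X = np.vstack([
    rng.normal([30, 35], 5, size=(30, 2)),
    rng.normal([50, 55], 5, size=(30, 2)),
    rng.normal([80, 85], 5, size=(30, 2)),
])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

crisp = gmm.predict(X)        # crisp clustering: exactly one label per object
fuzzy = gmm.predict_proba(X)  # fuzzy clustering: membership probabilities

print(crisp[:5])              # cluster labels (numbering is arbitrary)
print(fuzzy.shape)            # one probability per object per cluster
```

Each row of `predict_proba` is a probability distribution over the groups, which is exactly the fuzzy-clustering output described above; taking its argmax recovers the crisp assignment.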

Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Data Science

               III. GAUSSIAN MIXTURE MODEL

    The application of flat, crisp clustering is considered on the example of football statistics from the top 5 European championships (England, Spain, Germany, Italy, France). Since the problem of multilevel clustering is not posed, it is possible to use a GMM [10]. This is a model whose probability density function (PDF) is described by a sum of the PDFs of Gaussian distributions, where the number of terms in the sum equals the number of clusters. Thus, the total distribution has several peaks; during clustering, the proximity of each object to each peak is considered and the peak with the smallest distance is selected. Moreover, each object can be characterized not by one but by several parameters, for which multidimensional PDFs are found. Fig. 1 presents an example of the PDF of a GMM of three distributions with two parameters.

Fig. 1. PDF of a 3-distribution GMM.

    An analysis of Fig. 1 shows that there are two groups of objects characterized by a large variance along one of the axes (ordinate or abscissa), and one group with approximately the same variance along both axes. In addition, three characteristic peaks, or mathematical expectations, can be seen in Fig. 1.

    The advantage of using the GMM is that, for a given number of objects, the model itself estimates the component distributions. This allows real data to be approximated by such a model. Even if the number of clusters is not known in advance, it is possible to build several mixture models and choose the optimal one according to some criterion. Most often, the Akaike information criterion (AIC) [13] and the Bayesian information criterion (BIC) [14] are used. These criteria make it possible to cope with a priori uncertainty regarding the number of classes.

   IV. CLUSTERING WITH A GAUSSIAN MIXTURE MODEL

    Consider an example of applying the GMM to clustering teams playing in the European football championships of England, Spain, Germany, Italy and France. Only two parameters are included in the initial sample: goals scored and points. However, to make it more convenient to check the accuracy of clustering, it is a good idea to exclude some teams from the selection. The thinning keeps 3 teams from the upper part of the tournament table (places 1-3), 3 teams from the middle of the table (places 9/8-11/10) and 3 teams from the lower part of the table (places 18/16-20/18). Such thinning is done for each championship. In addition, statistics for these teams are taken not only for the last season but also for the previous 2 seasons. On the one hand, this increases the information content of the sample; on the other hand, it can also increase the number of anomalous points ("too successful", "too unsuccessful" or "strange" seasons for one team or in general). Fig. 2 shows the collected statistics: points are plotted on the abscissa axis, and goals scored on the ordinate axis.

Fig. 2. Statistics of the top 5 football championships for the seasons 2016/2017, 2017/2018 and 2018/2019.

    From Fig. 2 it can be seen that the selected parameters have an almost linear relationship, and visually the most preferable division seems to be simply dividing by vertical lines along the abscissa (points), with the values 40 (points) and 60 (points) chosen as visual thresholds. In fact, such a division produces only one erroneously clustered point. Fig. 3 shows 3 clusters according to the real championship tables.

    An analysis of Fig. 3 shows that there is a point in the 3rd cluster which is closer to the center and other points of the 1st cluster than to the cluster to which it really belongs.

    Next it is necessary to approximate the statistics of Fig. 2 by GMMs with various parameters. Let us use the following parameters:
    1) The number of clusters k = 1…5.
    2) The covariance matrix (CM), which can be diagonal/full and shared/unshared. The diagonal or full structure of the CM characterizes the relationships between the parameters of one cluster, while the shared or unshared structure of the CM characterizes the relationships between different classes. For the diagonal structure of the CM, the axes of the ellipse are parallel or perpendicular to the abscissa and ordinate axes; for the shared structure, the dimensions and orientation of all ellipses are the same.
    3) The regularization parameter R = 0.01 or R = 0.1, introduced to ensure a positive determinant of the CM.

Fig. 3. Clustering of teams into classes according to the championship tables.



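The parameter sweep described above (k = 1…5, diagonal/full and shared/unshared CM, regularization R) can be sketched as follows. This is a hypothetical reconstruction using scikit-learn, which the paper does not name: its `covariance_type` values map only roughly to the paper's CM structures ('full' ~ full/unshared, 'tied' ~ full/shared, 'diag' ~ diagonal/unshared), and `reg_covar` plays the role of R.

```python
# Sketch (assumed scikit-learn API): choose a GMM by AIC/BIC over
# the number of clusters, covariance structure and regularization.
import itertools
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic stand-in for the (points, goals) statistics of Fig. 2
X = np.vstack([rng.normal(m, 5, size=(45, 2))
               for m in ([30, 35], [50, 55], [80, 85])])

best = None
for k, cov, r in itertools.product(range(1, 6),
                                   ['full', 'tied', 'diag'],
                                   [0.01, 0.1]):
    gmm = GaussianMixture(n_components=k, covariance_type=cov,
                          reg_covar=r, random_state=0).fit(X)
    bic = gmm.bic(X)            # gmm.aic(X) is available the same way
    if best is None or bic < best[0]:
        best = (bic, k, cov, r)

print(best[1:])  # for well-separated 3-group data, k = 3 is typically chosen
```

Minimizing BIC over the grid is the same selection principle the paper applies in Fig. 4; AIC can be swept in the same loop.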
VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)                                                         72




Fig. 4. AIC and BIC for various models: a) AIC; b) BIC.

   By changing the above parameters, one can obtain several Gaussian mixture distributions, for which the AIC and BIC coefficients can then be calculated. Fig. 4a and Fig. 4b show the AIC and BIC coefficients, respectively, for the investigated football statistics with different parameters.

   According to Fig. 4, the minimum values of AIC and BIC are provided by the model with k = 3 clusters, which has a full and unshared CM structure with regularization parameter R = 0.01. Fig. 5 shows the PDF of this model, and Fig. 6 shows the result of clustering using this model.

Fig. 5. PDF of the best-approximation GMM.

Fig. 6. Data clustering using GMM.

    Comparison with the clustering presented in Fig. 3 shows that the clustering error was 1.48%, i.e. 2 incorrect assignments out of the 135 teams in the sample (9 teams × 5 championships × 3 seasons). Thus, high accuracy was obtained when clustering with the GMM.

         V. CLUSTERING USING NEURAL NETWORKS

    In this section clustering based on neural networks is performed. Since the sample size is small, a feed-forward network with back propagation of error, consisting of 1 layer of 15 neurons, is used. The network is trained on data for the seasons 2016/2017 (train dataset) and 2017/2018 (validation dataset); statistics of the season 2015/2016 are used as the test dataset. A pair of parameters, goals scored and points, is fed to the input of the network, and the cluster number is obtained at the output. Fig. 7 shows the structure of the neural network, and Fig. 8 shows the learning process.

Fig. 7. Neural network structure.

Fig. 8. Neural network training.

    The analysis of Fig. 8 shows that the network converges quite quickly, by the 12th epoch, achieving minimal error on the validation data. Fig. 9 shows the correct clustering (a), clustering using the GMM (b) and clustering by the neural network (c).

    Fig. 9 shows that the neural network also provides satisfactory clustering, with an error of 1.48%, i.e. 2 objects (teams). Moreover, while the Gaussian mixture model mistakenly assigned one team from the group of outsiders (worst teams) to the middle-table teams and one team from the group of leaders (medal-position teams) to the middle-table teams, the neural network incorrectly assigned two teams from the middle of the table (middle-table teams) to the teams of the upper part (medal-position teams). It should also be noted that the use of deep learning




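The network described above — one hidden layer of 15 neurons mapping a (goals scored, points) pair to a cluster number — can be sketched as a supervised classifier. This is a hypothetical reconstruction with scikit-learn (the paper's framework is not named), trained on synthetic labelled data rather than the paper's actual season statistics.

```python
# Sketch (assumed scikit-learn API): feed-forward net, 1 hidden layer
# of 15 neurons, input = (goals scored, points), output = cluster number.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
# Hypothetical stand-in for labelled season statistics
X_train = np.vstack([rng.normal(m, 5, size=(45, 2))
                     for m in ([30, 35], [50, 55], [80, 85])])
y_train = np.repeat([0, 1, 2], 45)  # 0 = worst, 1 = middle, 2 = medal-position

clf = MLPClassifier(hidden_layer_sizes=(15,), solver='lbfgs',
                    max_iter=2000, random_state=0).fit(X_train, y_train)

# A new (goals, points) pair is mapped to a cluster number
print(clf.predict([[78, 82]]))
```

Unlike the GMM, which infers the groups itself, this network needs the cluster labels in advance — the supervised/unsupervised distinction the conclusion returns to.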

(increasing the number of layers to 5 and the number of neurons to 128) does not lead to improved results.

Fig. 9. Comparison of clustering results: a) correct clustering; b) clustering using GMM; c) clustering by the neural network.

                          VI. CONCLUSION

    The paper studies data clustering algorithms using the example of clustering football statistics. A clustering algorithm based on the GMM and a neural network algorithm are considered. A comparative analysis of clustering accuracy showed that, for the presented example, both algorithms provide the same result, with a clustering error of only 1.48%. However, the Gaussian mixture model looks preferable for several reasons. Firstly, it can determine the number of clusters by an information criterion. Secondly, training the neural network required data for which the clustering was already known. Thirdly, training the neural network involved additional, though insignificant, computational costs. The results obtained indicate that with intelligent clustering algorithms it is possible to build a more adequate team rating, since, for example, the FIFA rating existing today does not reflect the actual strength of teams. Thus, the use of the GMM for data mining is currently advisable. Moreover, in the future it is also planned to investigate the operation of the DGMM.

                        ACKNOWLEDGMENT

   This work was supported by the RFBR and the Government of the Ulyanovsk Region, Grant Project No. 19-47-730011, and partly by RFBR Grant Project No. 19-29-09048.

                          REFERENCES
[1]  A.N. Danilov, N.A. Andriyanov and P.T. Azanov, "Ensuring the effectiveness of the taxi order service by mathematical modeling and machine learning," Journal of Physics: Conference Series, vol. 1096, pp. 1-8, 2018. DOI: 10.1088/1742-6596/1096/1/012188.
[2]  N.A. Andriyanov and V.A. Sonin, "Using mathematical modeling of time series for forecasting taxi service orders amount," CEUR Workshop Proceedings, vol. 2258, pp. 462-472, 2018.
[3]  K.V. Vorontsov, "Clustering and multidimensional scaling algorithms," Lecture course, Moscow State University, 2007. [Online]. URL: http://www.ccas.ru/voron/download/Clustering.pdf.
[4]  I.A. Rytsarev, D.V. Kirsh and A.V. Kupriyanov, "Clustering media content from social networks using BigData technology," Computer Optics, vol. 42, no. 5, pp. 921-927, 2018. DOI: 10.18287/2412-6179-2018-42-5-921-927.
[5]  V.B. Nemirovsky and A.K. Stoyanov, "Clustering face images," Computer Optics, vol. 41, no. 1, pp. 59-66, 2017. DOI: 10.18287/2412-6179-2017-41-1-59-66.
[6]  Y. Tarabalka, J.A. Benediktsson and J. Chanussot, "Spectral-spatial classification of hyperspectral imagery based on partitional clustering techniques," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 8, pp. 2973-2987, 2009.
[7]  N.A. Andriyanov and V.E. Dementiev, "Developing and studying the algorithm for segmentation of simple images using detectors based on doubly stochastic random fields," Pattern Recognition and Image Analysis, vol. 29, no. 1, pp. 1-9, 2019. DOI: 10.1134/S105466181901005X.
[8]  N.A. Andriyanov and V.E. Dement'ev, "Application of mixed models of random fields for the segmentation of satellite images," CEUR Workshop Proceedings, vol. 2210, pp. 219-226, 2018.
[9]  K.K. Vasiliev, V.E. Dementyiev and N.A. Andriyanov, "Using probabilistic statistics to determine the parameters of doubly stochastic models based on autoregression with multiple roots," Journal of Physics: Conference Series, vol. 1368, pp. 1-7, 2019. DOI: 10.1088/1742-6596/1368/3/032019.
[10] Y.A. Philin and A.A. Lependin, "Application of the Gaussian mixture model for speaker verification by arbitrary speech and counteracting spoofing attacks," Multicore processors, parallel programming, FPGAs, signal processing systems, vol. 1, no. 6, pp. 64-66, 2016.
[11] C. Viroli and G.J. McLachlan, "Deep Gaussian mixture models," Statistics and Computing, vol. 29, pp. 43-51, 2019. DOI: 10.1007/s11222-017-9793-z.
[12] J. Guérin and B. Boots, "Improving image clustering with multiple pretrained CNN feature extractors," arXiv preprint arXiv:1807.07760, 2018.
[13] H. Akaike, "A new look at the statistical model identification," IEEE Transactions on Automatic Control, vol. 19, pp. 716-723, 1974.
[14] H.S. Bhat and N. Kumar, "On the derivation of the Bayesian Information Criterion." [Online]. URL: https://faculty.ucmerced.edu/hbhat/BICderivation.pdf.