<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparative analysis of football statistics data clustering algorithms based on deep learning and Gaussian mixture model</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
<institution>Telecommunication Department, Ulyanovsk State Technical University</institution>
, Ulyanovsk,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>71</fpage>
      <lpage>74</lpage>
      <abstract>
<p>The paper considers the Gaussian mixture model and the possibilities of its application to clustering tasks. First, the case is considered in which the Gaussian mixture model is formed with all model parameters known. Next, the case is considered in which normally distributed data are approximated by a Gaussian mixture model. Finally, the article presents a study of the accuracy of clustering two-dimensional football statistics for medal-position teams, middle-table teams and bottom teams of the top 5 European football championships: the English Premier League, Spanish La Liga, German Bundesliga, Italian Serie A and French Ligue 1. The results of the algorithm based on Gaussian mixture models are compared with the results of clustering performed using neural networks.</p>
      </abstract>
      <kwd-group>
        <kwd>Gaussian mixture models</kwd>
        <kwd>machine learning</kwd>
        <kwd>data clustering</kwd>
        <kwd>data analysis</kwd>
<kwd>football statistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
Today data mining, as a form of intelligent analysis, allows
specialists in various fields to greatly simplify their work.
For example, on the basis of such an analysis, deliberately
insolvent customers applying to a bank for a loan can be
screened out, and the number of taxi service orders can be
predicted [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ]. Indeed, the ongoing digitalization of various sectors
of the economy and of state activity provides significant
amounts of information. This is why the range of tasks
solved using data mining is so wide.
      </p>
      <p>
One of the most interesting tasks in this area is the
problem of data clustering [
        <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
        ], which is closely related to recognition,
classification and segmentation tasks [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8 ref9">5-9</xref>
        ]. In these tasks it is usually possible to distinguish
several groups of objects. The simplest example is separating
male students and female students in a group. Every person
here can be described by height and weight, so each object in
the sample maps to a specific point on a plane; in this case
the plane is two-dimensional. The dimensionality can be
expanded if a new parameter, for example hair length, is
introduced, and the solution of the clustering task is then
simplified. Each group of objects can be represented by some
ellipsoid in this space, and the clustering decision for a
particular new object depends on which ellipsoid is closest
to the point characterizing this object.
      </p>
      <p>
The further research therefore considers a clustering
algorithm based on Gaussian mixture models (GMM) [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ], because real data can quite often be well approximated
by Gaussian distributions. The comparison algorithm is
clustering by a trained neural network. It should be noted
that this is the first time the GMM and trained neural
networks are compared within the task of analyzing football
statistics. In addition, a combination of the proposed
clustering methods can lead to a new type of clustering
based simultaneously on supervised and unsupervised
learning.
      </p>
    </sec>
    <sec id="sec-1b">
      <title>II. BRIEF CLASSIFICATION OF CLUSTERING ALGORITHMS</title>
      <p>
Known clustering algorithms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] can be divided according to two basic principles. Let
us consider their main features.
      </p>
<p>First, clustering can be crisp or fuzzy. In the first case
each object is assigned to exactly one group as a result of
clustering. With fuzzy clustering, a set of values is
determined that characterizes the probability of each object
belonging to each group, i.e. such clustering yields a
probability distribution.</p>
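<p>As an illustrative sketch of this distinction (using scikit-learn's GaussianMixture as one assumed implementation, with synthetic data rather than the paper's dataset): predict yields the crisp label per object, while predict_proba yields the fuzzy membership probabilities.</p>

```python
# Sketch: crisp vs. fuzzy assignments from one Gaussian mixture fit.
# Synthetic 2-D data (illustrative only, not the paper's dataset).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal([160, 55], 2.0, (50, 2)),   # e.g. one height/weight group
    rng.normal([180, 80], 2.0, (50, 2)),   # a second, well-separated group
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

crisp = gmm.predict(data)        # crisp: exactly one cluster per object
fuzzy = gmm.predict_proba(data)  # fuzzy: a probability per cluster
```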
      <p>
Secondly, cluster analysis can be flat (single-level) or
hierarchical (multi-level). In the first case the initial
sample of objects is divided according to some criterion into
several classes in a single partition, for example,
clustering university students only by gender. If a further
level of clustering separates the students by academic
performance while keeping the first level, then a deeper
clustering is obtained: an object in the sample is
characterized not just as a male or female student, but as an
excellent (“A”) male student, excellent (“A”) female student,
bad (“F”) male student or bad (“F”) female student. This
separation is provided by hierarchical clustering. It should
be noted that the deep Gaussian mixture model (DGMM)
considered in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] copes well with the goals of hierarchical clustering.
Moreover, it assigns an object to a particular group
according to the principle of crisp clustering.
      </p>
      <p>
Finally, neural networks are gaining more and more
popularity in clustering problems [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Depending on the training parameters and the type of
network, various clustering models can be obtained, and deep
learning is now a very promising tool for such tasks.
      </p>
<p>Thus, before choosing a clustering algorithm, it is
necessary first to formulate the clustering problem itself,
and then to perform the data splitting.</p>
    </sec>
    <sec id="sec-2">
      <title>III. GAUSSIAN MIXTURE MODEL</title>
      <p>
The application of flat, crisp clustering is considered
using the example of football statistics from the top 5
European championships (England, Spain, Germany, Italy,
France). Since a multilevel clustering problem is not posed,
it is possible to use a GMM [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. This is a model whose probability density function
(PDF) is described by the sum of the PDFs of Gaussian
distributions, where the number of terms in the sum is the
number of clusters. Thus, the total distribution has several
peaks, and during clustering the proximity of each object to
each peak is considered and the peak with the smallest
distance is selected. Moreover, each object can be
characterized not by one but by several parameters, for which
multidimensional PDFs are found. Fig. 1 presents an example
of the PDF of a GMM of three distributions with two
parameters.
Fig. 1. PDF of a three-component GMM.
      </p>
<p>An analysis of Fig. 1 allows one to conclude that there
are two groups of objects characterized by a large variance
along one of the axes (ordinate or abscissa), and one group
with approximately the same variance along both axes. In
addition, three characteristic peaks, i.e. mathematical
expectations, can be seen in Fig. 1.</p>
      <p>
The advantage of using the GMM is that, for a given number
of components, the model itself estimates the component
distributions. This allows real data to be approximated by
such a model. Even if the number of clusters is not known in
advance, it is possible to build several mixture models and
choose the optimal one according to some criterion. Most
often, the Akaike information criterion (AIC) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and the Bayesian information criterion (BIC) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] are used. Applying these criteria makes it possible to
cope with a priori uncertainty regarding the number of
classes.
      </p>
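<p>A minimal sketch of this selection procedure, assuming scikit-learn and synthetic three-component data (the paper's own data are not reproduced here): fit a mixture for each candidate number of clusters and keep the one with the smallest AIC or BIC.</p>

```python
# Sketch: choosing the number of mixture components by AIC/BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic 2-D data drawn from three well-separated Gaussians.
data = np.vstack([
    rng.normal([0, 0], 0.5, (60, 2)),
    rng.normal([4, 4], 0.5, (60, 2)),
    rng.normal([0, 4], 0.5, (60, 2)),
])

candidates = list(range(1, 6))  # k = 1 ... 5, as in the study
models = [GaussianMixture(n_components=k, random_state=0).fit(data)
          for k in candidates]
aic = [m.aic(data) for m in models]
bic = [m.bic(data) for m in models]

best_k = candidates[int(np.argmin(bic))]  # the criterion picks k for us
```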
    </sec>
    <sec id="sec-3">
      <title>IV. CLUSTERING WITH A GAUSSIAN MIXTURE MODEL</title>
<p>Consider an example of GMM application to clustering the
teams playing in the European football championships of
England, Spain, Germany, Italy and France. Only two
parameters are included in the initial sample: goals scored
and points. However, to make it more convenient to check the
clustering accuracy, it is a good idea to exclude some teams
from the selection. Thus, the thinned sample includes 3 teams
from the upper part of the tournament table (places 1-3), 3
teams from the middle of the table (places 9/8-11/10) and 3
teams from the lower part of the table (places 18/16-20/18).
Such thinning is done for each championship. In addition,
statistics for these teams are taken not only for the last
season but also for the previous 2 seasons. On the one hand,
this increases the information content of the sample; on the
other hand, it can also increase the number of anomalous
points (a “too successful”, “too unsuccessful” or “strange”
season for one team or in general). Fig. 2 shows the
collected statistics: points are plotted on the abscissa axis
and goals scored on the ordinate axis.</p>
<p>From Fig. 2 it can be seen that the selected parameters
have an almost linear relationship, and visually the most
preferable division seems to be simply splitting by vertical
lines along the abscissa (points). In this case the values of
40 points and 60 points can be chosen as the visual
thresholds. In fact, such a division produces only one
erroneously clustered point. Fig. 3 shows 3 clusters
according to the real championship tables.</p>
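<p>The visual thresholding described above amounts to a one-dimensional three-way split along the points axis; a minimal sketch (with hypothetical point totals, not the actual league data):</p>

```python
# Sketch: flat three-way split by the visual thresholds of 40 and 60 points.
import numpy as np

points = np.array([25, 38, 45, 59, 61, 88])   # hypothetical season totals
labels = np.digitize(points, bins=[40, 60])   # 0: <40, 1: 40-59, 2: >=60
```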
      <p>An analysis of Fig. 3 shows that there is a point in the 3rd
cluster which is closer to the center and other points of the
1st cluster than to the cluster to which it really belongs.</p>
      <p>Next it is necessary to approximate the statistics of Fig. 2
by GMMs with various parameters. Let use the following
parameters:
1) The number of clusters k=1…5.</p>
<p>2) The covariance matrix (CM), which can be described by
the following properties: diagonal/full and shared/unshared.
The diagonal or full structure of the CM characterizes the
relationships between the parameters within one cluster,
while the shared or unshared structure characterizes the
relationships between different classes. For a diagonal CM,
the axes of each ellipse are parallel or perpendicular to the
abscissa and ordinate axes, and for a shared CM, the
dimensions and orientation of all ellipses are the same.</p>
<p>3) The regularization parameter R = 0.01 or R = 0.1,
introduced to keep the CM positive definite.</p>
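<p>In scikit-learn terms (one possible implementation, assumed here since the paper does not name its tooling), these options map only partially onto the covariance_type parameter: 'diag' corresponds to a diagonal unshared CM, 'full' to a full unshared CM, and 'tied' to a full shared CM, while a shared diagonal CM has no direct counterpart; reg_covar plays the role of R, being added to the diagonal of each covariance estimate.</p>

```python
# Sketch: covariance-structure and regularization options of a GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
data = rng.normal(size=(90, 2))   # illustrative 2-D data

fits = {
    cov: GaussianMixture(n_components=3, covariance_type=cov,
                         reg_covar=0.01,   # analogue of R in the text
                         random_state=0).fit(data)
    for cov in ("diag", "full", "tied")
}
```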
<p>By varying the above parameters, one can obtain several
Gaussian mixture distributions, for which the AIC and BIC
coefficients can then be calculated. Fig. 4a and Fig. 4b show
the AIC and BIC coefficients, respectively, for the
investigated football statistics with different
parameters.</p>
<p>According to Fig. 4, the minimum values of AIC and BIC
are provided by the model with k = 3 clusters, a full and
unshared CM structure and a regularization parameter
R = 0.01. Fig. 5 shows the PDF of this model, and Fig. 6
shows the result of clustering using this model.</p>
<p>Comparison with the clustering presented in Fig. 3 shows
that the clustering error was 1.48%, i.e. 2 incorrect
assignments of teams to groups. Thus, high accuracy was
obtained when clustering with the GMM.</p>
    </sec>
    <sec id="sec-4">
      <title>V. CLUSTERING USING NEURAL NETWORKS</title>
<p>In this section clustering based on neural networks is
performed. Since the sample size is small, a feed-forward
network with error backpropagation, consisting of 1 layer of
15 neurons, is used. The network is trained on data for the
2016/2017 season (training dataset) and the 2017/2018 season
(validation dataset); statistics for the 2015/2016 season are
used as the test dataset. A pair of parameters, goals scored
and points, is fed to the input of the network, and the
cluster number is obtained at the output. Fig. 7 shows the
structure of the neural network, and Fig. 8 shows the
learning process.</p>
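<p>A comparable setup can be sketched with scikit-learn's MLPClassifier (an assumption; the paper does not name its tooling): one hidden layer of 15 neurons trained by backpropagation on hypothetical (points, goals) pairs for three team tiers.</p>

```python
# Sketch: a one-hidden-layer feed-forward classifier for team tiers.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
# Hypothetical (points, goals scored) pairs for three team tiers.
X = np.vstack([
    rng.normal([30, 35], 3.0, (30, 2)),   # lower-table teams
    rng.normal([50, 55], 3.0, (30, 2)),   # middle-table teams
    rng.normal([80, 85], 3.0, (30, 2)),   # medal-position teams
]) / 100.0                                # scale features for stable training
y = np.repeat([0, 1, 2], 30)

net = MLPClassifier(hidden_layer_sizes=(15,),  # 1 layer of 15 neurons
                    max_iter=3000, random_state=0).fit(X, y)
pred = net.predict(X)
```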
<p>The analysis of Fig. 8 shows that the network converges
quite quickly, achieving minimal error on the validation data
by the 12th epoch. Fig. 9 shows the correct clustering (a),
clustering using the GMM (b) and clustering by the neural
network (c).</p>
<p>So Fig. 9 shows that the neural network also provides
satisfactory clustering, with an error percentage of 1.48%,
i.e. 2 objects (teams). Moreover, while the Gaussian mixture
model mistakenly assigned one team from the group of
outsiders (worst teams) and one team from the group of
leaders (medal-position teams) to the middle-table teams, the
neural network incorrectly assigned two teams from the middle
of the table (middle-table teams) to the teams of the upper
part (medal-position teams). It should also be noted that the
use of deep learning (increasing the number of layers to 5
and the number of neurons to 128) does not lead to improved
results.
Fig. 9. Comparison of clustering results.</p>
    </sec>
    <sec id="sec-5">
      <title>VI. CONCLUSION</title>
<p>The paper studies data clustering algorithms using the
example of clustering football statistics. Clustering
algorithms based on the GMM and a neural network algorithm
are considered. A comparative analysis of clustering accuracy
showed that, for the presented example, both algorithms
provide the same result, with a clustering error of only
1.48%. However, the Gaussian mixture model looks preferable
for several reasons. Firstly, it can determine the number of
clusters by an information criterion. Secondly, training the
neural network required data drawn from the same set for
which clustering was performed. Thirdly, the neural network
algorithm incurred additional computational costs for
training. The results obtained indicate that with the use of
intelligent clustering algorithms it is possible to build a
more adequate team rating, since, for example, the FIFA
rating existing today does not reflect the actual strength of
teams. Thus, the use of the GMM for data mining is currently
advisable. Moreover, in the future it is also planned to
investigate the operation of the DGMM.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENT</title>
<p>This work was supported by the RFBR and the Government of
the Ulyanovsk Region, Grant Project No. 19-47-730011, and
partly by RFBR Grant, Project No. 19-29-09048.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.N.</given-names>
            <surname>Danilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Andriyanov</surname>
          </string-name>
          and P.T. Azanov, “
          <article-title>Ensuring the effectiveness of the taxi order service by mathematical modeling and machine learning</article-title>
          ,
          <source>” Journal of Physics: Conference Series</source>
          , vol.
          <volume>1096</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2018</year>
. DOI: 10.1088/1742-6596/1096/1/012188.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Andriyanov</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.A.</given-names>
            <surname>Sonin</surname>
          </string-name>
          , “
          <article-title>Using mathematical modeling of time series for forecasting taxi service orders amount</article-title>
          ,
          <source>” CEUR Workshop Proceedings</source>
          , vol.
          <volume>2258</volume>
          , pp.
          <fpage>462</fpage>
          -
          <lpage>472</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.V.</given-names>
            <surname>Vorontsov</surname>
          </string-name>
          , “
          <article-title>Clustering and multidimensional scaling algorithms</article-title>
          ,” Lecture course. Moscow State University,
          <year>2007</year>
          . [Online]. URL: http://www.ccas.ru/voron/download/Clustering.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.A.</given-names>
            <surname>Rytsarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.V.</given-names>
            <surname>Kirsh</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.V.</given-names>
            <surname>Kupriyanov</surname>
          </string-name>
          , “
          <article-title>Clustering media content from social networks using BigData technology</article-title>
          ,”
          <source>Computer Optics</source>
          , vol.
          <volume>42</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>921</fpage>
          -
          <lpage>927</lpage>
          ,
          <year>2018</year>
. DOI: 10.18287/2412-6179-2018-42-5-921-927.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.B.</given-names>
            <surname>Nemirovsky</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.K.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , “Clustering face images,”
          <source>Computer Optics</source>
          , vol.
          <volume>41</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>2017</year>
. DOI: 10.18287/2412-6179-2017-41-1-59-66.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tarabalka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Benediktsson</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Chanussot</surname>
          </string-name>
          , “
          <article-title>Spectral-spatial classification of hyperspectral imagery based on partitional clustering techniques</article-title>
          ,
          <source>” IEEE Transactions on Geoscience and Remote Sensing</source>
          , vol.
          <volume>47</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>2973</fpage>
          -
          <lpage>2987</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Andriyanov</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.E.</given-names>
            <surname>Dementiev</surname>
          </string-name>
          , “
<article-title>Developing and studying the algorithm for segmentation of simple images using detectors based on doubly stochastic random fields</article-title>
          ,”
          <source>Pattern Recognition and Image Analysis</source>
          , vol.
          <volume>29</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>2019</year>
. DOI: 10.1134/S105466181901005X.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Andriyanov</surname>
          </string-name>
          and
<string-name>
            <given-names>V.E.</given-names>
            <surname>Dement'ev</surname>
          </string-name>
          , “
          <article-title>Application of mixed models of random fields for the segmentation of satellite images</article-title>
          ,
          <source>” CEUR Workshop Proceedings</source>
          , vol.
          <volume>2210</volume>
          , pp.
          <fpage>219</fpage>
          -
          <lpage>226</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.K.</given-names>
            <surname>Vasiliev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.E.</given-names>
            <surname>Dementyiev</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.A.</given-names>
            <surname>Andriyanov</surname>
          </string-name>
          , “
          <article-title>Using probabilistic statistics to determine the parameters of doubly stochastic models based on autoregression with multiple roots</article-title>
          ,
          <source>” Journal of Physics: Conference Series</source>
          , vol.
          <volume>1368</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          ,
          <year>2019</year>
. DOI: 10.1088/1742-6596/1368/3/032019.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.A.</given-names>
            <surname>Philin</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.A.</given-names>
            <surname>Lependin</surname>
          </string-name>
          , “
<article-title>Application of the Gaussian mixture model for speaker verification by arbitrary speech and counteracting spoofing attacks</article-title>
          ,”
          <source>Multicore processors, parallel programming, FPGAs, signal processing systems</source>
          , vol.
          <volume>1</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Viroli</surname>
          </string-name>
          and
<string-name>
            <given-names>G.J.</given-names>
            <surname>McLachlan</surname>
          </string-name>
          , “
          <article-title>Deep Gaussian mixture models</article-title>
          ,
          <source>” Stat Comput</source>
          , vol.
          <volume>29</volume>
          , pp.
          <fpage>43</fpage>
          -
          <lpage>51</lpage>
          ,
          <year>2019</year>
. DOI: 10.1007/s11222-017-9793-z.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Guérin</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Boots</surname>
          </string-name>
          , “
          <article-title>Improving Image Clustering With Multiple Pretrained CNN Feature Extractors</article-title>
,” arXiv preprint arXiv:1807.07760.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Akaike</surname>
          </string-name>
          , “
          <article-title>A new look at the statistical model identification</article-title>
          ,
          <source>” IEEE Transactions on Automatic Control</source>
          , vol.
          <volume>19</volume>
          , pp.
          <fpage>716</fpage>
          -
          <lpage>723</lpage>
          ,
          <year>1974</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.S.</given-names>
            <surname>Bhat</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , “
<article-title>On the derivation of the Bayesian Information Criterion</article-title>
          ,” [Online]. URL: https://faculty.ucmerced.edu/hbhat/BICderivation.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>