<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Explaining soccer match outcomes with goal scoring opportunities predictive analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harm Eggels</string-name>
          <email>h.p.h.eggels@student.tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruud van Elk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykola Pechenizkiy</string-name>
          <email>m.pechenizkiy@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>P.O. Box 513, NL-5600 MB, Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>PSV</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In elite soccer, decisions are often based on recent results and emotions. In this paper, we propose a method to determine the expected winner of a match in elite soccer. The expected result of a soccer match is determined by estimating the probability of scoring for the individual goal scoring opportunities. The outcome of a match is then obtained by integrating these probabilities. In our experimental study, we show that the probabilities of goal scoring opportunities accurately match reality.</p>
      </abstract>
      <kwd-group>
        <kwd>soccer analytics</kwd>
        <kwd>scoring opportunity</kwd>
        <kwd>predictive modeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The use of advanced big data analytics in soccer is starting to show its potential;
however, sports analytics as a research area is still only emerging. Some of the
problems are well-defined, e.g. many studies have attempted to predict the
result of soccer matches before the match has actually started. Various perspectives
have been used to tackle this problem. A common one is the prediction of soccer
matches from a betting perspective [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
Both machine learning approaches, e.g. an ensemble of k-nn predictors [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and
statistical approaches, e.g. modeling the goals scored by a team as Poisson
processes [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], have been applied. However, based on the reported results, the practical applicability
of the obtained models is still rather limited.
      </p>
      <p>Other problem formulations are less straightforward to formalize, e.g.
providing insights into how well each of the teams or individual players did in the
match. Here the approaches range from plotting histograms to heatmaps aligned
with the playing field, and from counting successful actions to computing
complex features based on domain knowledge.</p>
      <p>In our work, we take a complementary perspective. We consider the use of
predictive modeling to explain the outcome of a match based on the available
data from the match (rather than trying to predict the outcome of the game
before the game starts).</p>
      <p>By explaining the match outcome we mean accumulating evidence of which
team should have won the match based on the created goal scoring opportunities
and accounting for both the quantity and quality of such opportunities. The
demand for such an approach comes from soccer clubs themselves. These soccer
clubs often base their decisions on recent results, even if they do not completely
understand where these results come from.</p>
      <p>In this paper, we provide an empirical illustration that inducing a predictor
from past soccer matches and applying it to the data of the current match provides
us with accurate estimates of the probability that a scoring opportunity results in
a goal.</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>An important aspect of this paper is to lay a sound foundation for reasoning
about scoring opportunities. We want to be able to answer two kinds of
questions: "how can we quantify the value of a shot given the scoring
opportunity?" and "how can we quantify the value of a goal scoring opportunity created
by a team (regardless of whether it was realized or not)?" If we can provide
good estimates, then one can see how many opportunities (and of what
quality) each of the teams produced during a match and how many of them each
team realized.</p>
      <p>In particular, if we get probability estimates for each scoring opportunity, we
can simply sum up these estimates to get the expected number of goals, as follows
from the Poisson binomial distribution. Thus, if we let $p_i$ be the probability
that a goal is scored in scoring opportunity $i$, and model each opportunity $i$ as a Bernoulli
random variable $y_i \sim \mathrm{Ber}(p_i)$, then the expected number of goals in a match with
$n$ scoring opportunities is equal to:
<disp-formula><tex-math>E[\#\text{goals}] = E\left(\sum_{i=1}^{n} y_i\right) = \sum_{i=1}^{n} E(y_i) = \sum_{i=1}^{n} p_i.</tex-math></disp-formula></p>
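      <p>To make the summation concrete, the following minimal Python sketch computes the expected number of goals and, as an extension that the text does not spell out, the full Poisson binomial distribution over the goal count. The function names and the example probabilities are illustrative assumptions, not values taken from the data.</p>
      <preformat preformat-type="code">
```python
from typing import List


def expected_goals(probabilities: List[float]) -> float:
    """Expected number of goals: the sum of the per-opportunity probabilities."""
    return sum(probabilities)


def goal_count_distribution(probabilities: List[float]) -> List[float]:
    """Poisson binomial distribution of the number of goals.

    Entry k of the result is P(exactly k goals), built up one
    Bernoulli opportunity at a time by dynamic programming.
    """
    dist = [1.0]  # with no opportunities, zero goals is certain
    for p in probabilities:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this opportunity is missed
            nxt[k + 1] += mass * p      # this opportunity is scored
        dist = nxt
    return dist


# Hypothetical probabilities for one team's opportunities in a match.
team_probs = [0.05, 0.30, 0.10, 0.75]
xg = expected_goals(team_probs)
dist = goal_count_distribution(team_probs)
```
      </preformat>
      <p>The mean of the returned distribution equals the sum of the probabilities, which is the identity stated above.</p>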
      <p>
        The idea of applying predictive modeling for quantifying the quality of scoring
opportunities is not new. E.g. logistic regression was used in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to determine the
quality of individual goal scoring opportunities. In our approach, however, we
make sure that the scores we obtain can be treated as probabilities of scoring a goal
in the considered match situation. For this purpose, we ensured that the predictive
model learned from the data has good generalization performance and low
variance.
      </p>
      <p>Formally, we learn a classifier that is a functional mapping $y = f(X)$,
where each scoring opportunity $X_i$ is a feature vector describing the opportunity
and any additional contextual information, and the classifier should accurately
predict $y_i \in \{\text{goal}, \text{no goal}\}$. The classifier must generalize well to previously unseen
scoring opportunities and avoid overfitting to the training data. We expect that
a classification technique that is not only accurate (i.e. has low error, bias, and
variance) but can also provide good confidence estimates for the predicted output
would work best for our purpose.</p>
      <p>One could argue that training the classifiers on only the best players of the
world would lead to a more accurate and insightful model of desirable player
performance. Such a model would, however, have only limited business value
since it would not be directly applicable to poorer players. Adding the player
quality to the model allows the model to learn the relations between player
quality and the value of a goal scoring opportunity. The scores could be corrected
accordingly. Furthermore, the quality of a goal scoring opportunity is influenced
by the opposing team. Since the location of the defenders is already defined, the
influence of the defenders is likely to be limited. The goalkeeper, however, could
have a significant influence on the quality of a goal scoring opportunity.</p>
      <p>Consider two situations in which a player attempts to score, with the only
difference being the opposing goalkeeper: one of the goalkeepers is
the best of the league and the other is an average goalkeeper.
Intuitively, these situations would not have the same probability of resulting in
a goal. Therefore, the quality of the opposing goalkeeper is taken into account.</p>
      <p>
        Since goals are so rare in soccer, more non-goals than goals exist in the data
set. Therefore, a combination of over-sampling (SMOTE) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and cluster-based
under-sampling [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is used before applying classification algorithms to deal
with class imbalance.
      </p>
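      <p>The over-sampling half of this step can be sketched as follows. This is a simplified, pure-Python illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the implementation used in the experiments; cluster-based under-sampling of the majority class is not shown. The function name, the parameters, and the two-feature example points are assumptions made for illustration.</p>
      <preformat preformat-type="code">
```python
import math
import random
from typing import List


def smote_oversample(minority: List[List[float]], n_synthetic: int,
                     k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical goal samples described by (distance to goal, angle to goal).
goals = [[8.0, 0.9], [11.0, 0.6], [6.0, 1.1], [14.0, 0.4]]
new_goals = smote_oversample(goals, n_synthetic=6, k=2)
```
      </preformat>
      <p>Every synthetic point lies between two existing goal samples, so the minority class is enlarged without duplicating samples exactly.</p>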
      <p>
        We also apply calibration so that the scores obtained with the classifier can
be interpreted as the probability of scoring a goal given the scoring opportunity
situation. The classification algorithms provide class membership probabilities,
i.e. the confidence that a sample belongs to a certain class. These class membership
probabilities cannot be directly interpreted as the probability that a goal attempt results
in a goal. Calibrating the classifier ensures that its output can be interpreted as
the probability that a goal attempt results in a goal. Two main calibration
techniques exist: Platt scaling [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and isotonic regression [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. Niculescu-Mizil and
Caruana show that Platt scaling outperforms isotonic regression when the data
set is relatively small. When the data set is larger (1000
samples or more), however, isotonic regression outperforms Platt scaling [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Therefore,
we use isotonic regression.
      </p>
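      <p>As an illustration of what isotonic calibration does, the sketch below fits a non-decreasing step function from classifier scores to observed goal frequencies using the pool-adjacent-violators procedure. It is a minimal pure-Python version written for this explanation, not the implementation used in the experiments; the function names and the toy scores and labels are assumptions.</p>
      <preformat preformat-type="code">
```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def fit_isotonic(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (score threshold, calibrated probability) steps, sorted by threshold."""
    # Pool samples that share the same score: score -> [sum of labels, count].
    grouped: Dict[float, List[float]] = {}
    for score, label in zip(scores, labels):
        entry = grouped.setdefault(score, [0.0, 0.0])
        entry[0] += label
        entry[1] += 1.0

    # Each block is [lowest score in block, sum of labels, count].
    blocks: List[List[float]] = []
    for score in sorted(grouped):
        total, count = grouped[score]
        blocks.append([score, total, count])
        # Merge backwards while the previous block has a higher goal rate.
        while len(blocks) > 1 and (
            blocks[-2][1] / blocks[-2][2] > blocks[-1][1] / blocks[-1][2]
        ):
            _, total, count = blocks.pop()
            blocks[-1][1] += total
            blocks[-1][2] += count
    return [(b[0], b[1] / b[2]) for b in blocks]


def calibrate(score: float, steps: List[Tuple[float, float]]) -> float:
    """Look up the calibrated probability of a new score in the step function."""
    thresholds = [t for t, _ in steps]
    idx = max(bisect_right(thresholds, score) - 1, 0)
    return steps[idx][1]


# Toy classifier scores and outcomes (1 = goal), for illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
steps = fit_isotonic(scores, labels)
```
      </preformat>
      <p>In this toy example the scores 0.3, 0.4, and 0.5 are pooled into one block with a calibrated probability of one third, because one of those three attempts resulted in a goal.</p>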
      <p>With the use of classification algorithms and calibration techniques, point
estimates can be determined for the goal scoring opportunities. In order to avoid
misleading interpretation of the quality scores, we also estimate prediction
intervals. To determine the standard deviation of similar goal scoring opportunities,
the samples are first clustered. The standard deviation of the samples in the
cluster is then used to determine the prediction intervals. Gaussian mixture
clustering is used since this technique is often used in kernel density estimation.
Intuitively, if the variation of the point estimate is too large, no valid statements
about individual point estimates can be made. When the data are aggregated,
however, valid statements can still be made due to the law of large numbers.</p>
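      <p>One way to turn cluster-level spread into intervals is sketched below. The text does not specify the exact interval construction, so the choice of the point estimate plus or minus 1.96 standard deviations, clipped to the unit interval, is an assumption, as are the function name and the example values. The cluster labels are taken as given (for instance from a Gaussian mixture model fitted beforehand).</p>
      <preformat preformat-type="code">
```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_intervals(estimates: List[float], clusters: List[int],
                      z: float = 1.96) -> List[Tuple[float, float]]:
    """Interval per point estimate: estimate +/- z * std of its cluster,
    clipped to [0, 1] because the estimates are probabilities."""
    by_cluster: Dict[int, List[float]] = defaultdict(list)
    for est, c in zip(estimates, clusters):
        by_cluster[c].append(est)
    # Population standard deviation of the point estimates in each cluster.
    spread = {c: statistics.pstdev(v) for c, v in by_cluster.items()}
    return [(max(0.0, e - z * spread[c]), min(1.0, e + z * spread[c]))
            for e, c in zip(estimates, clusters)]


# Hypothetical point estimates and cluster assignments.
estimates = [0.10, 0.12, 0.08, 0.60, 0.40]
clusters = [0, 0, 0, 1, 1]
intervals = cluster_intervals(estimates, clusters)
```
      </preformat>
      <p>Opportunities in a tight cluster receive narrow intervals, while those in a heterogeneous cluster receive wide ones, which signals that the individual estimate should be read with care.</p>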
    </sec>
    <sec id="sec-3">
      <title>Experimental Study</title>
      <sec id="sec-3-1">
        <title>Data</title>
        <p>
          We had access to three different data sources: 1) data about the
main events during a match, tracked by (employees of) ORTEC; 2) data about
the quality of players [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], taken from the soccer game FIFA and extracted from
the web; and 3) spatiotemporal data about players, tracked by Inmotio during
matches with the help of cameras.
        </p>
        <p>In total, we have data from seven different leagues over three seasons. This
leads to a total of 5017 matches in which 128667 goal attempts were performed.
Of these goal attempts, only 14109 resulted in a goal.</p>
        <p>It is worth noting that each data source has its own data quality problems
that can affect classifiers and the conclusions derived from their outputs. The
data quality issues of the ORTEC data are related to the tracking of the events
by ORTEC employees, who have to record each event at the right
time and at the right location, which is hard to do, especially in a near
real-time setting. The main data quality issue with the FIFA data comes from the
determination of the stats, which is somewhat vague and could be incorrect for
some of the players. Finally, the data quality issues of the Inmotio data come
from the cases in which the cameras lose the correct player or accidentally select
the wrong player. In such cases, the location of the player is incorrect and it is
difficult to recover the correct location. We had too little data from the cameras and
hence did not use it for inducing classifiers.</p>
        <p>With the use of the considered data sources, various features can be
extracted. A list of the extracted features for each data source is provided in
Table 1.</p>
        <sec id="sec-3-1-1">
          <title>ORTEC</title>
          <p>Table 1. Extracted features (ORTEC): Context; Part of body; Dist to goal; Angle to goal; Originates from.</p>
          <p>
            We experimented with four different classification algorithms (as
implemented in scikit-learn [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]): Logistic Regression, Decision Tree, Random
Forest, and a decision tree boosted with AdaBoost. 10-fold cross-validation with
inner and outer folds is used for parameter selection and generalization performance
estimation: in the inner folds, the best parameters are selected, and the generalization
performance is then computed on the outer fold.
          </p>
          <p>Figure 1 illustrates two examples of probabilities of the individual goal scoring
opportunities obtained with our approach.</p>
          <p>The features for these examples and the probabilities are shown in Table 2.</p>
          <p>Figure 1 panels: (a) Example 1; (b) Example 2; (c) Example 3.</p>
          <p>Since we want our classifiers to provide higher scores for better goal scoring
opportunities, we report AUC performance, but also provide the precision, recall,
and F-score values for reference. Table 3 summarizes the results. The standard
deviation of the AUC over the different cross-validation folds is also provided.
We can see from the table that Random Forest performs reasonably well and
outperforms the other classifiers.</p>
          <p>
            Next, we perform the calibration step to make the scores more accurate. We
used the reliability graph introduced in [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], which shows how close the predicted
values are to the actual ratios of goals scored. The obtained reliability graph is
shown in Figure 2.
          </p>
          <p>Figure 2 shows that the predicted values are indeed close to the actual ratios
of goal scoring opportunities resulting in goals. We also show the confidence
intervals of the predicted values in the bins. Since these confidence intervals
are narrow, it is safe to consider the scores as the probability of a goal scoring
opportunity to result in a goal.</p>
          <p>Figure 2. Calibration plots (Brier score 0.0815); vertical axis: fraction of positives.</p>
          <p>
            We also report the Brier score [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], which measures the accuracy of
probabilistic predictions over a set of mutually exclusive discrete outcomes. The Brier
score of the calibrated scores is low (0.0815, as reported with the calibration plots), which supports the conclusions
drawn from the reliability graph.
          </p>
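          <p>For reference, both diagnostics can be computed in a few lines. The sketch below shows the binary form of the Brier score and the binning behind a reliability graph; the function names, the number of bins, and the toy predictions are illustrative assumptions and do not reproduce the reported values.</p>
          <preformat preformat-type="code">
```python
from typing import List, Tuple


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_bins(probs: List[float], outcomes: List[int],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """For each non-empty bin of predicted probability, return
    (mean predicted probability, observed fraction of goals, sample count)."""
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    return [(s[0] / s[2], s[1] / s[2], s[2]) for s in sums if s[2] > 0]


# Toy calibrated probabilities and outcomes (1 = goal).
probs = [0.1, 0.1, 0.2, 0.8, 0.9]
outcomes = [0, 0, 1, 1, 1]
score = brier_score(probs, outcomes)
bins = reliability_bins(probs, outcomes, n_bins=5)
```
          </preformat>
          <p>A well-calibrated model yields bins in which the mean predicted probability is close to the observed fraction of goals, and a Brier score close to zero.</p>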
          <p>To determine whether the expected goals can be used to explain match
results, the predicted match outcomes are compared to the actual match outcomes.
Table 4 provides the number of matches whose exact score is correctly predicted with the
expected goals model. It also shows the number of matches in which the
predicted score was at most one goal off, and the number of
matches in which the result (win team 1, draw, win team 2) was correct. Finally,
the Mean Squared Error (MSE) for the number of goals per match is provided.</p>
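          <p>The comparison can be sketched as follows. How expected goals are turned into a predicted scoreline is not stated in the text, so this sketch assumes that each team's expected goals are rounded to the nearest integer, and it computes the squared error on the goal difference; both choices, the function names, and the example matches are assumptions.</p>
          <preformat preformat-type="code">
```python
from typing import Dict, List, Tuple

# (expected goals team 1, expected goals team 2, actual goals team 1, actual goals team 2)
Match = Tuple[float, float, int, int]


def result(goals_1: int, goals_2: int) -> str:
    """Three-way match result from a scoreline."""
    if goals_1 > goals_2:
        return "team1"
    if goals_2 > goals_1:
        return "team2"
    return "draw"


def evaluate(matches: List[Match]) -> Dict[str, float]:
    exact = within_one = correct_result = 0
    squared_error = 0.0
    for xg1, xg2, g1, g2 in matches:
        # Assumption: predicted score = rounded expected goals.
        # Note that Python rounds exact halves to the nearest even integer.
        p1, p2 = round(xg1), round(xg2)
        goals_off = abs(p1 - g1) + abs(p2 - g2)
        exact += goals_off == 0
        within_one += 1 >= goals_off
        correct_result += result(p1, p2) == result(g1, g2)
        # Squared error of the expected goal difference.
        squared_error += ((xg1 - xg2) - (g1 - g2)) ** 2
    return {
        "exact": exact,
        "within_one": within_one,
        "correct_result": correct_result,
        "mse": squared_error / len(matches),
    }


# Hypothetical matches.
matches = [(1.8, 0.6, 2, 1), (0.9, 1.2, 1, 1), (2.3, 1.7, 1, 2), (0.4, 0.3, 0, 0)]
summary = evaluate(matches)
```
          </preformat>
          <p>In the third example match the prediction is only one goal off, yet that goal turns a predicted draw into an actual win for team 2, so the match counts as within one goal but not as a correct result.</p>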
          <p>What stands out from Table 4 is that in only 1366 of the 5020 matches, the
exact score of the match was predicted based on the expected goals. If, however,
a difference of one goal is accepted, 3443 of the 5020 matches have correctly predicted
scores. Therefore, it seems that the expected goals model is, in most cases, almost
correct. The match MSE strengthens this statement: the
MSE of the result of a match is 2.366. Therefore, the predicted goal
difference between both teams deviates by about $\sqrt{2.366} \approx 1.538$ goals from the
actual difference in goals.</p>
          <p>So far, only the exact scores have been examined. Perhaps even more interesting is
how often the expected goals model predicted the correct winner. This is given
by the number of correct results in Table 4. Unsurprisingly, the number of correctly
predicted results is higher than the number of correctly predicted scores. What stands
out, however, is that the number of correctly predicted results is not close to the
number of scores predicted correctly when a difference of one goal is allowed. This
shows that in games where the model is one goal off, this one goal
often also influences the result of the match. To evaluate in which cases the one-goal
difference most often influences the result, the problem of predicting the winner
of a match is defined as a three-class problem where either Team 1 wins, Team 2
wins, or the game ends in a draw. The confusion matrix of the three-class problem
is provided in Table 5.</p>
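          <p>A confusion matrix for the three-class formulation can be built as in the sketch below. The label names and the example results are assumptions made for illustration; they are not the contents of Table 5.</p>
          <preformat preformat-type="code">
```python
from collections import Counter
from typing import List

LABELS = ["team1", "draw", "team2"]


def confusion_matrix(actual: List[str], predicted: List[str]) -> List[List[int]]:
    """Rows are actual results, columns are predicted results, both in LABELS order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in LABELS] for a in LABELS]


# Hypothetical actual and predicted results for four matches.
actual = ["team1", "draw", "team2", "draw"]
predicted = ["team1", "draw", "draw", "draw"]
matrix = confusion_matrix(actual, predicted)
```
          </preformat>
          <p>Off-diagonal entries show which results are confused with each other, for example actual wins that the model predicts as draws.</p>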
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Potential Applications for Soccer Clubs</title>
      <p>We consider three immediate applications of the quality scores by soccer clubs:
1) performance evaluation over a given period of time; 2) analysis of matches and
team performance; and 3) assessment of players and management of individual
training sessions.</p>
      <sec id="sec-4-1">
        <p>Soccer clubs tend to base decisions on the results of a short period of time and on
emotions. Since many factors influence the results of soccer matches, these
results may not closely reflect reality. These decisions could, therefore, be
based on misperceptions. A more objective metric of the quality of goal scoring
opportunities would provide a more objective basis for decision making. Before
firing staff, for example, an expected league table could be created to determine
whether the team is actually performing badly.</p>
        <p>Furthermore, the results of matches could be plotted together with injuries,
suspensions, fired coaches and many more factors to find out the effect of
events that happened during a season on the performance of the team.</p>
        <p>Goals are very rare in soccer, which leads to a high influence of a single goal on
the result of a match. By analyzing goal scoring opportunities instead of actual
goals scored, a more objective way of analyzing the result is obtained. Adding
the quality of the goal scoring opportunities makes this analysis even better.</p>
        <p>Figure 3 provides an illustrative example of a simple visualization of the scoring
opportunities in a match and whether they were realized. On the left side of this
figure, the progress of a match in terms of expected goals is provided. Here, one
can see that the upper team (represented by red) should have been in front
from the beginning of the match and had the upper hand during the match. The
right side of the figure shows from which players the expected goals (the bars)
and the actual goals (the numbers) were coming. Since this data is confidential, the
names of the players on the x-axis are removed.</p>
        <p>Furthermore, similar to the analysis for periods of time, the expected result
of a match could be plotted over time. By adding important events such as goals
scored, cards, substitutions and many more factors, the influence of these factors
could be researched in more detail.</p>
        <sec id="sec-4-1-1">
          <title>Player Evaluation, Training, and Acquisition</title>
          <p>A major advantage of determining the quality of the individual goal scoring
opportunities is that it generates more possibilities than only determining match
results. Aggregating over players, instead of matches, leads to insights into player
performance. These insights could be used to evaluate players, adjust training
programs or perform player acquisition.</p>
          <p>An example of interesting insights from the expected goals arises when the
expected goals are plotted for different locations on the field. This could, for
example, show that a player often shoots from one specific part of the field but
never scores. If the probabilities of these goal scoring opportunities are high, the
player is likely doing something wrong in these cases and his actions could
be analyzed in more detail. If, however, the probabilities are low for a player at a given part of
the field, but that player shoots very often from there, someone could point out to him that
shooting might not be the best decision at that part of the field.</p>
          <p>Another example comes from the case where players, especially strikers, score
many goals in one season (take for example Jamie Vardy of Leicester City during
the English Barclays Premier League in 2015-2016). Those strikers are often
bought by big clubs since they scored a lot. It could, however, be the case
that such a player scored a lot but had a much lower number of expected
goals. This could suggest that the specific player was lucky during that season.
Of course, more research has to be performed on that player's performance, but
the expected goals indicator could be a useful tool in player acquisition.</p>
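          <p>Aggregating over players instead of matches can be sketched as below: for each player, the probabilities of his goal attempts are summed and compared with the goals he actually scored. The function name and the example shots are hypothetical.</p>
          <preformat preformat-type="code">
```python
from collections import defaultdict
from typing import Dict, List, Tuple

Shot = Tuple[str, float, int]  # (player, scoring probability, 1 if scored else 0)


def player_summary(shots: List[Shot]) -> Dict[str, Tuple[float, int, float]]:
    """Per player: (expected goals, actual goals, actual minus expected)."""
    xg: Dict[str, float] = defaultdict(float)
    goals: Dict[str, int] = defaultdict(int)
    for player, prob, scored in shots:
        xg[player] += prob
        goals[player] += scored
    return {p: (xg[p], goals[p], goals[p] - xg[p]) for p in xg}


# Hypothetical goal attempts of two anonymized players.
shots = [("A", 0.1, 1), ("A", 0.2, 1), ("A", 0.1, 0), ("B", 0.5, 0), ("B", 0.4, 1)]
summary_by_player = player_summary(shots)
```
          </preformat>
          <p>A large positive difference, as for player A in this example, indicates that the player scored far more than the quality of his opportunities would suggest, which may point to exceptional finishing or to luck and warrants a closer look.</p>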
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we presented a method with which the results of soccer matches
can be described in a more objective manner by evaluating the quality of the
goal scoring opportunities for both teams during that match. It is shown that
the proposed method performs well in terms of classification performance as well
as calibration of probabilistic estimates. Further applications of the expected
goals are given by evaluating seasons, matches, and individual players.</p>
      <p>An important point to make when using the probability estimates of the goal
scoring opportunities is that these estimates may have a high standard deviation.
The scores for goal scoring opportunities could, therefore, vary quite a bit, even
though the goal scoring opportunities are similar. Users should, therefore, be
very careful when making statements about individual goal scoring opportunities
or statements based on too few point estimates.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.C.</given-names>
            <surname>Constantinou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.E.</given-names>
            <surname>Fenton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neil</surname>
          </string-name>
          ,
          <article-title>"Profiting from an inefficient Association Football gambling market: Prediction, Risk and Uncertainty using Bayesian networks."</article-title>
          <source>Knowledge-Based Systems 50</source>
          (
          <year>2013</year>
          ):
          <fpage>60</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>H.</given-names>
            <surname>Langseth</surname>
          </string-name>
          ,
          <article-title>"Beating the bookie: A look at statistical models for prediction of football matches."</article-title>
          <source>SCAI</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>V.</given-names>
            <surname>Hoekstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bison</surname>
          </string-name>
          , G. Eiben,
          <article-title>"Predicting football results with an evolutionary ensemble classifier."</article-title>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Karlis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ntzoufras</surname>
          </string-name>
          ,
          <article-title>"Analysis of sports data by using bivariate Poisson models."</article-title>
          <source>Journal of the Royal Statistical Society: Series D (The Statistician) 52.3</source>
          (
          <year>2003</year>
          ):
          <fpage>381</fpage>
          -
          <lpage>393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Heuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Rubner</surname>
          </string-name>
          ,
          <article-title>"Soccer: Is scoring goals a predictable Poissonian process?."</article-title>
          <source>EPL (Europhysics Letters) 89.3</source>
          (
          <year>2010</year>
          ):
          <fpage>38007</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Lucey</surname>
          </string-name>
          et al.
          <article-title>"Quality vs quantity: Improved shot prediction in soccer using strategic features from spatiotemporal data."</article-title>
          <source>Proc. 8th Annual MIT Sloan Sports Analytics Conference</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          et al.
          <article-title>"Scikit-learn: Machine learning in Python."</article-title>
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ):
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Platt</surname>
          </string-name>
          ,
          <article-title>"Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods."</article-title>
          <source>Advances in large margin classifiers 10.3</source>
          (
          <year>1999</year>
          ):
          <fpage>61</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>B.</given-names>
            <surname>Zadrozny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Elkan</surname>
          </string-name>
          ,
          <article-title>"Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers."</article-title>
          <source>ICML</source>
          . Vol.
          <volume>1</volume>
          .
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>B.</given-names>
            <surname>Zadrozny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Elkan</surname>
          </string-name>
          .
          <article-title>"Transforming classifier scores into accurate multiclass probability estimates."</article-title>
          <source>Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Niculescu-Mizil</surname>
          </string-name>
          , R. Caruana,
          <article-title>"Predicting good probabilities with supervised learning."</article-title>
          <source>Proceedings of the 22nd international conference on Machine learning. ACM</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>K</given-names>
            <surname>Stuart</surname>
          </string-name>
          .
          <article-title>Why clubs are using football manager as a real-life scouting tool</article-title>
          . The Guardian,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>N.V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          et al.
          <article-title>"SMOTE: synthetic minority over-sampling technique."</article-title>
          <source>Journal of artificial intelligence research 16</source>
          (
          <year>2002</year>
          ):
          <fpage>321</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>S.J.</given-names>
            <surname>Yen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>"Cluster-based under-sampling approaches for imbalanced data distributions."</article-title>
          <source>Expert Systems with Applications 36.3</source>
          (
          <year>2009</year>
          ):
          <fpage>5718</fpage>
          -
          <lpage>5727</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>G.W.</given-names>
            <surname>Brier</surname>
          </string-name>
          ,
          <article-title>"Verification of forecasts expressed in terms of probability."</article-title>
          <source>Monthly weather review 78.1</source>
          (
          <year>1950</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Anderson</surname>
            , Chris, and
            <given-names>David</given-names>
          </string-name>
          <string-name>
            <surname>Sally</surname>
          </string-name>
          .
          <article-title>The numbers game: Why everything you know about football is wrong</article-title>
          .
          <source>Penguin UK</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>