<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analyzing and predicting NCAA volleyball match outcome using machine learning techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dhvanil Sanghvi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Priya Deshpande</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suhas Shanbhogue</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vishwa Shah</string-name>
        </contrib>
        <aff>BITS Pilani, India</aff>
      </contrib-group>
      <fpage>99</fpage>
      <lpage>116</lpage>
      <abstract>
        <p>In this paper, we perform a thorough match prediction analysis of our newly mined NCAA (National Collegiate Athletic Association) volleyball data set. We also investigate the comparative power of two distinct yet comparable models, namely team aggregates and player aggregates, to predict the outcome of an NCAA volleyball match. The predictor variables for both models are mainly hitting rates of serves, recepts, attacks, and assists. The output variable is the winning team. Apart from the features specific to volleyball, we also incorporated a few general match statistics. Among the multitude of machine learning models available for classification, the study focuses on three primary ones: Logistic Regression, Decision Trees, and Neural Networks. Results show that Decision Trees and Neural Networks perform considerably well in both the team and player models on the ROC metric and accuracy, with Neural Networks giving marginally better results. Logistic Regression on team aggregates performs only slightly better than randomized outcomes, whereas for the player model it performs markedly better. In terms of model structure, player aggregates give much better classification than team aggregates, with a maximum ROC of 0.98. This suggests that volleyball, despite being a team sport, is intrinsically more impacted by the players who make up the team than by the team as a whole. Our model accuracy suggests that this model can be successfully used to predict the outcome of an NCAA volleyball match.</p>
      </abstract>
      <kwd-group>
        <kwd>NCAA</kwd>
        <kwd>Data Mining</kwd>
        <kwd>Volleyball</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        One of the most common tasks in Supervised Machine Learning is the Classification task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
This is mainly due to its use-case in sports prediction. Sports prediction
is part of an enormous market and forms the crux of a team analyst's work. In some sports, getting
the strategy and team build right can make the difference between winning and losing the whole
tournament. Stakeholders of a team, like owners, coaches and analysts, rely on computer
simulations and models to predict team performance with respect to strategies and tactics.
Large monetary rewards in betting further underline the necessity of good, accurate models
for predicting sports matches. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] states that betting markets are highly volatile and subject to
negative returns in the long run. In most literature, historical stats, player performance stats,
and opposition information have traditionally been used as features [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], in their paper
'Using Bookmaker Odds to Predict the Final Result of Football Matches', state that bookmakers'
odds correlate significantly with match outcomes and can be used for predicting matches. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
look into numeric predictions, where the authors examine winning margins in college
football. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], use a unique ranking method to predict and model the English Premier League.
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] also make the keen observation that treating the prediction as a classification problem rather
than as regression-based classification gave higher accuracy.
      </p>
      <p>Our paper focuses on match prediction as a classification problem of win and loss. While
most existing literature focuses on mainstream sports, we draw our attention to another
popular sport, volleyball, and fill a gap in the current literature. Additionally, we
compare how holistic team features stack up against amalgamated individual player
features.</p>
      <p>In recent times, volleyball has gained immense popularity in the world of both professional
sports and recreational leagues. The sport is played at the Olympics and also has many European
and American leagues associated with it. Volleyball is played both on turf and on the beach. It is
important to note that these are entirely different sports; our focus in this paper is the turf
volleyball of the popular NCAA (National Collegiate Athletic Association) league.</p>
      <sec id="sec-1-1">
        <title>1.1. Volleyball: Rules and Regulations</title>
        <p>
          Before we dive deep into the machine learning aspects, let us try to learn more about
volleyball to get an intuition for the game. We briefly explain the structure and terminology of
volleyball. The rules, as stated in the NCAA Women's Volleyball Handbook, are as
follows: a typical volleyball game has 6 members on each side of the court. The full team, however,
consists of 10-12 players with rotating substitutions. Every time a particular side wins back the serve,
there is a rotation among the 6 players so that no player serves twice continuously. The sport
can be played indoors as well as outdoors. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
        </p>
        <p>Most of the basic rules remain the same, although the size of the court and the position of boundaries
change slightly. The rules of volleyball are the same for women and men, with the only
exception that the official height of the net is lower for women. The two ends of the net must
be at the same height, and it cannot exceed the official height by more than 2 cm. Some basic rules
of volleyball include:
• A team cannot touch the ball more than 3 times before it crosses the net.
• A particular player cannot touch the ball twice in succession.</p>
        <p>• The ball may not be lifted, held or carried.</p>
        <p>Two forms of scoring are observed in volleyball. In side-out scoring, a serving team loses the serve if it makes a mistake:
a point is awarded to the serving team only if the non-serving team makes an error; otherwise the error
leads to a service change. In the formally adopted rally scoring mechanism, each serve results in a
point for one of the two teams. Games are played to 25 points, and in order to win a game a team must
have a 2-point lead; otherwise the score keeps accumulating beyond 25 until either team gains such a lead.
The match ends when a team wins the majority of the games, i.e. 2 of 3. For this research we have used historical data for features of both
teams and players to predict the winning team using the features mentioned in the following
sections. Important terms and definitions used in this text can be found in the appendix section.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Review of existing literature</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] two models were developed for forecasting the point spread in Women's Volleyball: one for
predicting the point spread using a regression model, and a second for predicting the probability
of winning using a logistic regression model. The difference between the averages of the in-game
statistics of the two teams was calculated and placed in the model. The score-margin
model had an accuracy of 68% when the differences in the averages of the in-game statistics were
used. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] uses Logistic Regression for predicting football results, with data from the Barclays Premier League
and sofifa.com. They highlight the most significant features used by previous researchers, which
include Home Offense, Home Defense, Away Offense, and Away Defense. The paper gives
additional insight into the coefficients obtained from Logistic Regression, identifying the most
significant variables as Home Defense and Away Defense.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] the authors discuss using a machine learning approach, ANN, to predict the outcomes of one
week of matches, specifically applied to the Iran Pro League (IPL) 2013-2014 football season. The data
obtained from past matches in the last seven leagues is used to make better predictions for
future matches. Some unique features used as input to the ANNs include Quality
of Opponents in Recent Matches, Condition of Teams in Recent Matches, and Condition of Teams in
the Overall League. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] gives a good insight into our dataset and the features of the box statistics
of a volleyball match. The paper used the 1994 NCAA Women's volleyball tournament and
calculated the mean and standard deviation across each division, along with correlation coefficients.
The authors use multiple regression to predict a match using the aforementioned features. The
paper found that the attack coefficient correlates best with a team's success. Blocking stats were
next in importance for Divisions I and II, while the serve was the next most important stat for Division III.
This simple regression method explained 60% of the variance in team success across the
divisions.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a review paper which gave us a bird's-eye view of the existing research on match
prediction: which sports researchers most often choose for prediction and how
frequently each ML algorithm appears in the existing literature. The paper shows that ANNs
are the most frequently used models for predicting team sport matches: 65% of the
papers consider ANN models as part of their experiments and 23% of the papers solely use ANNs
in their work. However, the authors note that using ANN models does not necessarily
lead to high prediction accuracy, and it is unclear why ANN models have historically been so
widely used. The authors also point out that ANN models are black boxes, making it very difficult for analysts
and coaches to reverse-engineer the outcomes of the ANN model predictions.
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] uses multiple algorithms, and even multiple sports, in order to make sports analytics
more accurate. Although volleyball is not among the sports discussed, the paper was very
instrumental in pointing us toward using individual player statistics to predict the final
winning team. The paper also confirms our belief that individual player feature estimates are
strongly correlated with the team features.
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] provides the framework and basis for the usage of Artificial Neural Networks to predict
male professional volleyball league rankings. It also gives some insights on the features that can
be used to predict the rankings, such as wins, defeats, home/away, etc. The paper concludes by
suggesting that the best ANN is a single-hidden-layer, 4-neuron model with the
“logsig” transfer function, “trainlm” training function, and “learngdm” adaptive learning
function. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] proposes a Bayesian hierarchical model to predict the rankings of national volleyball
teams. The model also allows the estimation of the result of each match played in the
league. The model consumes efficiencies in four categories: Serve, Attack, Defense and Block.
The efficiencies are calculated as follows:
      </p>
      <p>efficiency = (number of successful actions − number of errors) / (total number of attempts) (1)</p>
      <p>
        [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] adopts a Logistic Regression model based on the efficiencies of the players at the different
positions. The efficiency is calculated in a similar manner to Andrea Gabrio (2020), as in Equation (1). The different
efficiency variables used were Libero player efficiency, Middle Blocker efficiency, Setter efficiency,
Middle Blocker efficiency, Outside Hitter efficiency and Universal Hitter efficiency.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <sec id="sec-3-1">
        <title>3.1. Raw data</title>
        <p>
          The data was obtained from the National Collegiate Athletic Association's (NCAA) official website
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. It includes data for all the Division 1 Women's Volleyball matches played from 2011 through
2015. Generally, volleyball statistics are split into the following 6 categories: Attacking, Serving,
Setting, Passing, Defending and Blocking [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The analysis in this paper consumes statistics
from the following four categories:
• Attacks
• Assists
• Serves
• Blocks
        </p>
        <p>First, we made a web crawler to crawl the NCAA website and extract the links to the statistics
pages of individual matches. The data was then scraped from these links of the NCAA
website using BeautifulSoup, a package available in Python. The code for this scraper is available
in this repository. We use an HTML parser to parse the stats page; all the different tables were
extracted into Pandas dataframes and stored in a Dictionary object. The data on the website
was organized match-wise, so the links of all matches were extracted first and then
iterated over to obtain the data for each match. We made a dictionary to hold the match
statistics of all the matches up to the current match. This dictionary was used as our database
for generating priors and features for the current match prediction. The key of the dictionary
was either a team name, to hold team stats, or a tuple of (team name, player ID, player name), to
hold the player stats.</p>
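        <p>The table-extraction step can be sketched as below with BeautifulSoup, storing stats keyed by team name as our dictionary database does. The HTML layout and column names here are simplified stand-ins, not the actual NCAA page structure.</p>

```python
# Hedged sketch of the scraping step: the HTML below is a simplified
# stand-in for an NCAA box-score page, not the real markup.
from bs4 import BeautifulSoup

html = """
<table id="box">
  <tr><th>Team</th><th>Kills</th><th>Errors</th></tr>
  <tr><td>Arizona</td><td>52</td><td>18</td></tr>
  <tr><td>Stanford</td><td>47</td><td>21</td></tr>
</table>
"""

def parse_box_score(page):
    """Parse a box-score table into {team: {stat: value}}."""
    soup = BeautifulSoup(page, "html.parser")
    rows = soup.find("table", id="box").find_all("tr")
    headers = [th.get_text() for th in rows[0].find_all("th")]
    stats = {}
    for row in rows[1:]:
        cells = [td.get_text() for td in row.find_all("td")]
        # Key the store by team name, as in our dictionary database.
        stats[cells[0]] = dict(zip(headers[1:], cells[1:]))
    return stats
```

        <p>On the real pages, the same loop runs over every match link returned by the crawler before the parsed tables are moved into Pandas dataframes.</p>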
        <p>For a given match, we look into our Dictionary database and engineer the features
for the current match. This ensures that there is no data leakage whatsoever into our model
predictions. After feature engineering is done, we feed the current match's data into the dictionary
so that its stats can also be used as a prior for the next upcoming match. We discuss the feature
engineering in the next section.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Generating Priors for the Current Match</title>
        <p>This is the generation of features from the primary dataset to make it ready to be consumed by the
machine learning model. It is natural to assume that to predict match i (match number i of the
tournament), the data available to us comes only from matches 1 through i − 1.</p>
        <p>Phase 1: In phase 1, only the overall team statistics from past matches are used to predict
the outcome of the match. Therefore, a weighted average of the different statistics available
was taken. A weight of one would give recent and older matches equal importance, so
the decay factor was taken to be 0.9 so as to incorporate a decaying effect with the age of
the match stat. This gives the performance in recent matches more weight than the
performances in matches played quite some time back. This can be understood simply with the
help of the following expression:
F_i = (F_{i−1} + 0.9·F_{i−2} + 0.9²·F_{i−3} + ... + 0.9^{i−1}·F_0) / (1 + 0.9 + 0.9² + ... + 0.9^{i−1}) (2)
Here, F_m refers to a particular feature's value in match number m. Therefore, for match number i
played by a team, let's say Arizona, the features for that match would be a weighted average running
from Arizona's previous match back to Arizona's first match in that season.</p>
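        <p>The decayed weighted average above can be sketched as follows; the function name and list layout are our own illustration, with the paper's decay factor of 0.9 as the default.</p>

```python
# Decay-weighted average of a feature's history (decay factor 0.9, as
# in the paper). `history` lists the feature value for each past match,
# oldest first; the most recent match gets weight 1.
def decayed_average(history, decay=0.9):
    if not history:
        return 0.0  # assumption: neutral prior when no past matches exist
    weights = [decay ** k for k in range(len(history))]
    recent_first = list(reversed(history))
    num = sum(w * f for w, f in zip(weights, recent_first))
    den = sum(weights)
    return num / den
```

        <p>For two past matches with feature values 0.2 (older) and 0.4 (recent), this yields (0.4 + 0.9·0.2) / 1.9, weighting the recent match more heavily.</p>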
        <p>Phase 2: In phase 2, player-wise statistics were used for the study. For each player of a
particular team, the features of that player were engineered from their performance in the
previous matches playing for the same team. The features were weighted in the same way as in
phase 1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology and Feature Engineering</title>
      <p>
        In order to predict wins in a volleyball match, we decided upon two different paradigms.
The first is a structure that lays emphasis only on team features and decides match outcomes
from team statistics, without much direct focus on the players. The other approach
is to treat the players as the strength of the team and incorporate the impact
of the stronger players more directly, rather than through averages. The final features of both approaches are
calculated as historical averages. There are broadly six skills in volleyball that are considered
crucial to a player's or team's strength. Four of these are considered in this paper due to their relative
importance: Attack, Assist, Serve, and Recept [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The concept of PCT (a term popularly used
in volleyball to abbreviate percentage) is used to calculate both teams' and players' features [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Next, we select the machine learning models on which we are going to train the data.
Figure 1, adapted from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] captures the frequency of usage of the different machine learning
models applied to match prediction in the existing literature. We will train the same machine
learning models for both phases so that we can compare and contrast their performances.
      </p>
      <p>[Figure: flowchart of the data pipeline. Match links for 2011-15 are crawled from the NCAA website and the stat tables scraped with BeautifulSoup's HTML parser; a historical database of team and player match stats is used to generate decay-weighted features for the current match, after which the database is updated with that match; the resulting team and player feature datasets are class-balanced and written to CSV format.]</p>
      <p>The first model that we choose is Logistic Regression. Our label is a binary variable and
hence, the Logistic Regression model calculates the probability in the following way:</p>
      <p>P(Y = 1 | X, β) = 1 / (1 + e^−(β0 + β1·x1 + ... + βn·xn)) (3)</p>
      <p>Here, β is the parameter set for the model. The probability of Y belonging to label 0 can be
calculated simply as P(Y = 0 | X, β) = 1 − P(Y = 1 | X, β).</p>
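      <p>Written out directly, the logistic model's probability can be computed as below; the function and variable names are our own illustration.</p>

```python
# Probability that team2 wins (label 1) under the logistic model:
# sigmoid of the intercept beta[0] plus the weighted feature sum.
import math

def prob_team2_wins(x, beta):
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))
```

      <p>With all parameters at zero the model is maximally uncertain and returns 0.5 for either team.</p>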
      <p>
        As our second machine learning model, we choose Artificial Neural Networks, or more
precisely, feed-forward neural networks. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] suggests that it is the most widely used
model for prediction of matches.
      </p>
      <p>The third model is Decision Trees. A minimal cost-complexity approach is used for pruning
the model. The splitting criterion used is to randomly initialize the thresholds for all features
and then iterate to find the best split. This helps reduce overfitting.</p>
      <p>Overcoming class imbalance
Due to some cultural or circumstantial reason, the NCAA data inherently listed the teams such that the
second team on the list won 90% of the matches. Upon further analysis we found no reasonable
explanation for this; moreover, most of the NCAA matches were played at neutral venues. To
remove this bias from our data set, we balance the number of 0's and 1's by randomly shuffling
the ordering of the teams so that there is an equal probability of either team winning. The
shuffling leads to no loss of generality, and the data set ends up balanced.
4.1. Phase 1
In phase 1, we focus on team aggregates as a whole to predict wins. Individual match data was
collected from the NCAA website. The train set consisted of six features which are calculated
using the match statistics.</p>
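      <p>The class-balancing step described above can be sketched as follows: with label 1 meaning team2 won, each row's two feature blocks are randomly swapped (and the label flipped), so wins end up split roughly evenly between the two label values. The row layout is an assumption for illustration.</p>

```python
# Balance classes by randomly swapping team1/team2 (and flipping the
# label) for each match row; rows are (team1_feats, team2_feats, label).
import random

def balance_rows(rows, seed=0):
    rng = random.Random(seed)
    balanced = []
    for team1_feats, team2_feats, team2_won in rows:
        if rng.random() < 0.5:
            team1_feats, team2_feats = team2_feats, team1_feats
            team2_won = 1 - team2_won  # the winner is now listed first
        balanced.append((team1_feats, team2_feats, team2_won))
    return balanced
```

      <p>Because each row is swapped independently with probability 0.5, no match is dropped and no loss of generality occurs.</p>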
      <p>The data is cumulative in nature, in the sense that every following year contains the
information cumulatively up to, but not including that year.</p>
      <p>
        The following features are used:
• Attack PCT: Attack PCT measures the average attacking power of the team. Attacks
are the most important way for an offensive team to win points. Attack PCT is directly
proportional to the win rate [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]:
AttackPCT = (Kills − Attack Errors) / Total Attack Attempts (4)
      </p>
      <p>• Serve PCT: Serve PCT is a measure of the number of serve aces by a team.
ServePCT = Service Aces / Total Serve Attempts (5)</p>
      <p>• Assist PCT: A player is awarded an assist if he/she passes the ball to a teammate who
then closes in on a kill or attack.
AssistPCT = Assists / Total Attack Attempts (6)</p>
      <p>• Recept PCT: How well a team handles a potential serve ace is measured by the Recept PCT.
ReceptPCT = (Receptions − Reception Errors) / Total Reception Attempts (7)</p>
      <p>• Set Win Ratio: It is the ratio of total set wins to all sets played.
SetWinRatio = Sets Won / Sets Played (8)</p>
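      <p>As a sketch, the five team features can be computed from a match-stat record as below. The dictionary field names are our own stand-ins for the NCAA box-score columns, but each ratio follows the definitions above.</p>

```python
# Team features from aggregated match stats; the keys are illustrative
# stand-ins for the NCAA box-score columns.
def team_features(s):
    return {
        "AttackPCT": (s["kills"] - s["attack_errors"]) / s["attack_attempts"],
        "ServePCT": s["aces"] / s["serve_attempts"],
        "AssistPCT": s["assists"] / s["attack_attempts"],
        "ReceptPCT": (s["receptions"] - s["reception_errors"]) / s["reception_attempts"],
        "SetWinRatio": s["sets_won"] / s["sets_played"],
    }
```

      <p>In practice these ratios are computed over the decayed historical aggregates from Equation (2), not over a single match's raw counts.</p>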
      <p>
        One important thing to note is that the features mentioned above show a high level of
correlation with each other, as shown in Figure 2. This can be attributed to the fact that a
proficient player might possess more than one skill to a reasonable extent [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. A similar
reasoning applies to team features as well. Therefore, we have considered these as individual
features and maintain that they would individually add soundness and robustness to the model
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
4.2. Phase 2
In phase 2, player-wise statistics were used for the study. To represent the efficiency and strength
of a player we have used AttackPCT, AssistPCT, and ServePCT. These features are measured
similarly to what we did for teams. It is assumed that the squad playing a given match is not
known a priori. For each match played, we use the 10 players of each team. For each player of
a particular team, the features of that player were engineered from their performance in
previous matches playing for the same team. The features were generated with a decay on
past performances (similar to Phase 1).
      </p>
      <p>
        Similar to [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] we have defined the Block Efficiency of a player, where a blocking dig is a ball that has been touched
by blockers and then played by the defence.
      </p>
      <p>BlockEfficiency = (BD + BS + BA − BE − BHE) / (BD + BS + BA + BE + BHE) (9)</p>
      <p>where BD is Blocking Digs, BS is Block Solos, BA is Block Assists, BE is Block Errors, and BHE is
Ball Handling Errors. X_ij refers to player j of a particular team i.</p>
      <p>An Ace is a serve which lands in the opponent's court without being touched, or is touched but
unable to be kept in play by one or more receiving-team players, resulting in a point for the
serving team. Since an Ace is a special kind of serve, this ratio is representative of the player's
serving skills and hence is used as a feature:</p>
      <p>AceRatio = Service Aces / Total Serves (10)</p>
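      <p>A minimal sketch of these two player features, using the block-stat abbreviations defined above (the function names are our own):</p>

```python
# Block efficiency: (BD + BS + BA - BE - BHE) over the sum of all five
# block stats, plus the ace ratio, for one player's accumulated stats.
def block_efficiency(bd, bs, ba, be, bhe):
    total = bd + bs + ba + be + bhe
    return (bd + bs + ba - be - bhe) / total if total else 0.0

def ace_ratio(aces, total_serves):
    return aces / total_serves if total_serves else 0.0
```

      <p>Both ratios default to zero for a player with no recorded attempts, an assumption we make so that new players contribute a neutral prior.</p>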
      <sec id="sec-4-1">
        <title>4.2.1. Overcoming high dimensional data</title>
        <p>
          The authors collected data for 299 matches spread across five years. For a particular year, there
are fifty to sixty matches. But using player-wise data leads to an explosion in the number of
features, because there are five features for each of the twenty players who are going to play in
that match (ten per team), summing up to a total of 100 features. This is a severe problem [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]
because the number of data points in a given year is much smaller than the number of features.
Hence, it is imperative that measures are taken to reduce the dimension of the data [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
        </p>
        <p>It can be observed that all five features for the players are efficiency ratios. Therefore, they
are already normalized between zero and one. To reduce the number of features for prediction
without losing important information, a single metric is allotted to every player. The metric
(playerScore) acts as a proxy for the player's strength and provides information on how valuable
the player is to the team. It is calculated in the following way:
playerScore = (AttackPCT + AssistPCT + ServePCT + BlockEfficiency + AceRatio) / 5 (11)</p>
        <p>Here, similar to the conventions followed above, j represents a particular player of a team i.
The playerScore is thus just the mean of those five features. These scores are generated for every player
who is (predicted to) play in that match. After this is done, the data set is transformed into a
new lower-dimensional data set. The new data set has the five features mentioned above (of
which the mean is taken) for three players of each team. The three players are said to be the
representatives of the team for that match.</p>
        <p>The first representative player's stats are an average of the best three players of that team on
the basis of the calculated playerScore. Similarly, the second representative is an average of the
three players of medium strength: the fourth, fifth and sixth best players of that
team. The third representative is an average of the weakest teammates, those with the
four lowest playerScores. These three representatives are calculated in the same manner for the
second team. Thus, for each match we have a total of 30 features, containing the features of the 3
representatives of each team.</p>
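        <p>The reduction to three representatives per team can be sketched as below: players are ranked by playerScore (the mean of their five features), then the top three, middle three, and bottom four are averaged feature-wise. The function name and list layout are illustrative.</p>

```python
# Collapse a team's ten player feature vectors into three
# representatives: feature-wise means of the best-3, middle-3, and
# worst-4 players, ranked by playerScore (mean of the five features).
def representatives(players):
    ranked = sorted(players, key=lambda f: sum(f) / len(f), reverse=True)
    groups = [ranked[:3], ranked[3:6], ranked[6:]]

    def mean_vec(group):
        return [sum(col) / len(group) for col in zip(*group)]

    return [mean_vec(g) for g in groups]
```

        <p>Three representatives of five features each, for both teams, gives the 30 features per match described above.</p>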
        <p>This allows us to capture important relationships between the match winner and the
3 representatives. Using the coefficients and the feature importances in the case of logistic
regression and the decision tree respectively, we can empirically establish relations between the
match winner and the best players, the average players and the worst players.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2.2. Model construction</title>
        <p>In this section we give details about the architecture of our models. These are the three models
that were constructed for training on the data set for our prediction task. Each model was tuned for the
best hyperparameters.</p>
        <p>1. Logistic Regression model
2. Decision Tree Classifier
3. Artificial Neural Network</p>
        <p>Logistic Regression: A tuned Logistic Regression was used as a baseline for model training. The
limit on the maximum number of iterations was set to 100000. The optimizer is set to limited-memory BFGS (lbfgs)
for our model.</p>
        <p>Decision Tree Classifier Decision Tree Classifier from the sklearn library was used to
implement the decision tree classifier. A critical factor is that with such limited data points and
without any hyperparameter tuning, the tree over-fits the training set completely. We set a
particular value for ccp_alpha, which is the hyperparameter for cost complexity pruning in
Decision Tree provided by the sklearn library. To observe the variation in impurities in the
leaves with the changing ccp_alpha, we plot the following graph.</p>
        <p>We set ccp_alpha to 0.06 in our final model so that the model does not overfit and remains robust.
The model uses entropy as the criterion to judge the quality of a split. The class weight is set to
"balanced" so as to learn both class labels equally.</p>
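        <p>Assuming the scikit-learn API mentioned above, the pruned tree can be constructed as below; only the hyperparameters stated in this section are set, and everything else is left at its default.</p>

```python
# Pruned decision tree as described: entropy split criterion,
# cost-complexity pruning with ccp_alpha=0.06, balanced class weights.
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="entropy",      # quality-of-split criterion
    ccp_alpha=0.06,           # minimal cost-complexity pruning strength
    class_weight="balanced",  # learn both class labels equally
    random_state=0,
)
```

        <p>Raising ccp_alpha prunes more aggressively; 0.06 was the value at which impurity in the leaves stopped trading off favorably against tree size in our plot.</p>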
        <p>Artificial Neural Networks</p>
        <p>After thorough experimentation with regularization in our ANN models, we zeroed in on the
architectures mentioned in the results. We use the Adam optimizer and disable
shuffling to prevent data leakage, as our data is time-series in nature.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>Since this is a classification task, we have taken the ROC-AUC metric to analyze our models.
The second metric we have used is the F1-score, the harmonic mean of precision and recall,
which conveys the balance between the two. We have used data spanning
2011 to 2015, and since our data is dependent on time and the sequence in which matches
were played, we have used the following train-test split:</p>
      <p>For example, data from matches played in 2011 and 2012 would be used to predict matches in
2013. We take the average of the ROC-AUC scores obtained in each of the above 4 cases. We have
taken care that within a season, too, the match data is not shuffled and is ordered
according to the dates on which the matches took place.</p>
      <sec id="sec-5-1">
        <title>5.1. Phase 1 - Team Data</title>
        <p>In this section we compare the results of the models developed using Team Data and the features
mentioned in section 4.1.</p>
        <sec id="sec-5-1-1">
          <title>5.1.1. Logistic Regression</title>
          <p>We develop a Logistic Regression model for binary classification. We set the maximum iterations to
100 and set the class weight parameter to "balanced" to automatically adjust weights inversely
proportional to the class frequencies in the input data. The value of C was chosen by tuning it
across various values (200 values on a logarithmic scale) to get appropriate regularization. The
output label is 1 or 0 depending on which team is predicted to win (0: team1 wins, 1:
team2 wins).</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>5.1.2. Decision Tree</title>
          <p>We use the Decision Tree Classifier with the splitting criterion set to gini and the splitter set to
'random', so that candidate split thresholds are drawn at random for each feature. As
we have limited data, we want to ensure that the decision tree does not overfit. To ensure this,
we have tuned max_depth and min_samples_leaf to 10 and 20 respectively. The ccp_alpha
parameter here has been chosen as 0.01.</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>5.1.3. Artificial Neural Networks</title>
          <p>We use the Sequential model provided by the Keras library for the Artificial Neural Network. The 5
features from each of the two teams, i.e. 10 units, form the input layer. We use the ReLU (Rectified Linear
Unit) activation function at both hidden stages to learn a non-linear mapping for the classification task.
The final output is passed through a Sigmoid function, which gives an output in the range
(0,1) denoting the probability of the team winning for our binary classification task. We train
the neural network for 100 epochs. We set shuffle=False, as we want to prevent data leakage,
our data being time-series in nature.</p>
<p>Input (10 units) → Dense (15 units, Actn: ReLU) → Dense (25 units, Actn: ReLU) → Output (1 unit, Actn: Sigmoid)</p>
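<p>Assuming the standard Keras API, the architecture above can be sketched as follows; the loss and optimizer are not stated in the text for this phase and are our assumptions.</p>

```python
from tensorflow import keras
from tensorflow.keras import layers

# 10 input units (5 features x 2 teams), two ReLU hidden layers, sigmoid output.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(15, activation="relu"),
    layers.Dense(25, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
# Assumption: binary cross-entropy is the usual loss for a sigmoid binary
# classifier; the paper does not specify loss or optimizer for this phase.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
# model.fit(X_train, y_train, epochs=100, shuffle=False)  # no shuffling: time-series data
```
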
          <p>In the above 3 models, there is a similar trend in the ROC/F1-score vs the Split being trained
on. As the training data increases, the metrics of the model improve as the model has learned
on more data.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Phase 2 - Player-wise Data</title>
        <p>In this section, we document the results obtained from the diferent models trained on
playerwise statistics for a given Volleyball match.</p>
        <sec id="sec-5-2-1">
          <title>5.2.1. Logistic Regression</title>
          <p>Similar to Phase 1 Logistic regression on Team data, we tuned and trained a Logistic Regression
Model for Players data to set a baseline . We set the class weight parameter as ”balanced” to
automatically adjust weights inversely proportional to class frequencies in the input data. We
tuned across multiple C values (2000 of them divided on a logarithmic scale) to get the best
regularization .</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>5.2.2. Decision Tree</title>
          <p>The Decision Tree is trained with ccp_alpha at 0.06. The class weights are set to balanced so
that both the classes are learned equally. The criterion to decide the quality of a split is taken to
be entropy.</p>
        </sec>
        <sec id="sec-5-2-3">
          <title>5.2.3. Artificial Neural Networks</title>
          <p>We used player vectors of 30 features as input to the Neural Network and set shufle = FALSE
as we want prevent data leakage as our data is time series in nature. We trained for 10 epochs
with Adam optimizer. Below is the architecture which gave the best results after tuning.</p>
<p>Input (20 units) → Dense (20 units, Actn: ReLU, Dropout: 0.25) → Dense (25 units, Actn: ReLU, Dropout: 0.25) → Dense (30 units, Actn: ReLU, Dropout: 0.25) → Output (1 unit, Actn: Sigmoid)</p>
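<p>A Keras sketch of the architecture above. Note that the prose mentions 30 input features while the diagram shows a 20-unit input; the sketch follows the diagram, and the loss is our assumption.</p>

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),   # diagram shows 20 input units (prose says 30 features)
    layers.Dense(20, activation="relu"),
    layers.Dropout(0.25),
    layers.Dense(25, activation="relu"),
    layers.Dropout(0.25),
    layers.Dense(30, activation="relu"),
    layers.Dropout(0.25),
    layers.Dense(1, activation="sigmoid"),
])
# Loss assumed (binary cross-entropy); the Adam optimizer is stated in the text.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
# model.fit(X_train, y_train, epochs=10, shuffle=False)  # time-ordered data
```
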
        </sec>
        <sec id="sec-5-2-4">
          <title>Results per split</title>
          <p>[Table: ROC AUC Score and F1 Score for Split1, Split2, Split3, Split4, and their mean.]</p>
          <p>In the above three models there is a clear trend of better metrics as more and more data
becomes available with every passing year.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion and Conclusion</title>
      <p>The results that volleyball despite being a team sport , results are intrinsically more impacted
by players who make the team than the team as a whole. In other words , models trained on
Player data contain finer statistics than model trained on team data. This shows that in sports
prediction bifurcated chunks of features which make the entity which is team gives better
information to the model and is a better predictor.</p>
      <p>Our classification results achieved better performance than strict guessing in all cases, with
prediction ROC’s ranging from 0.74 to almost 0.98 in some cases. In phase 1, the roc scores
are 0.64, 0.75, 0.84 for LR, DT and NN’s. Similarly for Phase 2 the corresponding scores are
0.98, 0.95, 0.98. Neural Networks perform better in both phases. The diferences in accuracy are
primarily due to the diferent approaches that we have chosen for this classification task. The
aggregate player approach seems to have picked up the key causes that result in a team win as
it incorporates significantly more information in its features. Although our model parameters
were finalized through a series of experiments, we are aware that more specialized models could
result in higher accuracy. Some general features such as average age of players, average age of
the team, number of new players etc can be used as well. We were unable to use the same due
to a lack of relevant data.</p>
      <p>The metric for accuracy used was ROC AUC, as it is not biased towards the size of test or
evaluation data. While accuracy is measured on predicted classes, roc auc is measured on
predicted scores which makes roc scores and f1 scores better for classification tasks. A high
accuracy moreover could be due to over-fitting.</p>
      <p>In phase 1, the mean ROC is highest for Artificial Neural Networks as they have the ability to
learn and model non-linear and complex relationships between inputs and outputs. We observe
Decision Tree performs much better in split4 when it is supplied the maximum test data.</p>
      <p>Further extension of this model could use the extensive NCAA data available to make the
current model more robust and versatile. Moreover, it could also incorporate a home and away
team feature. In the current literature, we were unable to do so because the NCAA games are
not necessarily conducted in the home or away grounds. We believe this is a crucial factor that
must be used for sports predictions. We can also experiment with Recurrent Neural Network
(RNN) architectures to learn from the temporal property of the data.</p>
      <sec id="sec-6-1">
        <title>6.1. Usage of priors</title>
        <p>
          An essential point of consideration for any probabilistic model is the inclusion of prior
probabilities for all possible outcomes. One way to estimate priors, as mentioned in [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], is to use
historical data to arrive at a reasonable value: that paper uses the previous year's output to
determine the probabilities for the current year. This approach, however, poses several
questions for time series data. How many years of data should be used? What if a new team or player
joins the tournament? With only five years of data and a few hundred matches, will the priors
be biased towards the training data? These questions require extensive research and are beyond the
scope of this work. Moreover, the NCAA data pertains to university/college-level matches, which
implies that the players in a particular team may change considerably over time; priors
on teams would thus be rendered useless. For players as well, priors would have to take into
consideration their growing experience and skill level, data to which we did not have
access. After much discussion and deliberation, we concluded that we should use equally probable
priors, i.e., we initially assume that both teams are equally likely to win and that the only factors
that affect the match outcome are the posteriors. An enhancement of the model could take these
factors into consideration and estimate the relevant priors to further improve the classification
accuracy.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Generating a network of players</title>
        <p>
          The models that we have trained do not capture the synergistic relations between the different
players. Although Artificial Neural Networks might capture these relations implicitly, no
inferences can be drawn from a trained neural network about these synergies between
the players. [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] uses an edge-centric multi-view network analysis to predict the performance
of a given basketball lineup in the NBA: the nodes of the network are the players, and the
weights on the edges represent the performance inhibitors/boosters due to the
other players in the match. Using this technique could significantly improve our understanding of the
game of volleyball. For instance, if we find that the setter and outside hitter have a significant impact on
each other, the team manager could choose not to replace these players in the current lineup.
Similarly, calculating centralities, such as eigenvector centrality, on the network could give insights
into each player's impact on team performance.
[
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] propose analyzing these player interactions via social network
theory. They re-conceptualize the sports team as a social network, so that the relations
between the nodes capture the interactions between the players. For example, in basketball
the network could be a ball-passing network.
        </p>
        <p>These network-based approaches move away from the conventional machine learning
methods, and include richer information into the model. Useful features can be generated after
network analysis which can be helped generate robust models for the same conventional
machine learning algorithms.</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Using K means for Clustering and Merging Similar Players</title>
        <p>
          While we used an algorithm that sorts and merges players into sections of 3, 3, and 4, K-means could
have been employed to cluster similar players together and replace the vectors of the clustered
players with a single vector. [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], in their paper, merge different players into relevant clusters
to find a beta player, which increased their model accuracy. The issue with this method, though,
is that it is difficult to predict the number of players in each cluster: clusters with more
players may need to be weighted differently from clusters with fewer players to prevent bias.
Finding the optimal value of K is another difficult task.
        </p>
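<p>A minimal sketch of this K-means alternative with scikit-learn; the feature dimension and the idea of replacing each cluster with its centroid are illustrative assumptions.</p>

```python
import numpy as np
from sklearn.cluster import KMeans

def merge_players(player_vectors, k):
    """Cluster per-player stat vectors and replace each cluster with its
    centroid, yielding k representative 'merged' players."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(player_vectors)
    return km.cluster_centers_, km.labels_
```
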
      </sec>
      <sec id="sec-6-4">
        <title>6.4. Decay Factor</title>
        <p>We have chosen a decay factor of 0.9 with respect to the previous matches. But, technically
the selection of the decay factor is in itself a search problem. Further research can be done
to understand how the decay factor should vary within earlier matches of same season and
matches from previous season. Here we have considered a geometrical decay factor. There
are other questions like should this decay factors be varying for the past years? For example,
should matches from 2 years back have the same decay factor as the matches from the past year.
Most teams in NCAA play only for about 1-2 matches in a season. Hence, impacts of past 2-3
years can be seen in the data if we just use a simple decay factor without analyzing these things.
The decay factors can also be adjusted on the basis of the volatility of players’ skill. There are a
lot of possible future directions this paper can be extended to and it will be interesting to see
the impacts of them on the predictability.</p>
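<p>For concreteness, the geometric decay we describe weights a team's k-th most recent match by 0.9^k. A hedged sketch, with the helper name and input layout (oldest match first) as our assumptions:</p>

```python
import numpy as np

def decayed_average(values, decay=0.9):
    """Geometrically decayed average of match statistics: the most recent
    match gets weight 1, the one before it `decay`, then decay**2, and so on.

    `values` is assumed to be ordered oldest -> newest."""
    values = np.asarray(values, dtype=float)
    weights = decay ** np.arange(len(values))[::-1]  # newest match weighted 1
    return float(np.sum(weights * values) / np.sum(weights))
```

<p>Replacing the fixed 0.9 with a searched or season-dependent schedule is exactly the extension discussed above.</p>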
      </sec>
      <sec id="sec-6-5">
        <title>6.5. Concluding remarks</title>
        <p>In conclusion, we believe taking into consideration our limiting factors such as limited data
availability, class imbalance, and inability to use cross-validation or shufling as we were
constrained by time-series nature of the data, the aggregative player model implements a sound
classification task of predicting a volleyball win and can be used successfully for the given task.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>R. P.</given-names> <surname>Bunker</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Thabtah</surname></string-name>, <article-title>A machine learning framework for sport result prediction</article-title>, <source>Applied Computing and Informatics</source> <volume>15</volume> (<year>2019</year>) <fpage>27</fpage>-<lpage>33</lpage>. URL: http://www.sciencedirect.com/science/article/pii/S2210832717301485. doi:10.1016/j.aci.2017.09.005.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>S.</given-names> <surname>Wilkens</surname></string-name>, <article-title>Sports prediction and betting models in the machine learning age: The case of tennis</article-title>, <source>SSRN Electronic Journal</source> (<year>2019</year>). doi:10.2139/ssrn.3506302.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>K.</given-names> <surname>Odachowski</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Grekow</surname></string-name>, <article-title>Using bookmaker odds to predict the final result of football matches</article-title>, volume <volume>7828</volume>, <year>2012</year>, pp. <fpage>196</fpage>-<lpage>205</lpage>. doi:10.1007/978-3-642-37343-5_20.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. C. D.</given-names>
            <surname>Delen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kasap</surname>
          </string-name>
          ,
          <article-title>A comparative analysis of data mining methods in predicting ncaa bowl outcomes</article-title>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>R.</given-names> <surname>Baboota</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Kaur</surname></string-name>, <article-title>Predictive analysis and modelling football results using machine learning approach for english premier league</article-title>, <year>2018</year>. doi:10.1016/j.ijforecast.2018.01.003.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] NCAA, <article-title>Women's volleyball rules of the game</article-title>, <year>2020</year>. URL: http://www.ncaa.org/playing-rules/womens-volleyball-rules-game.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Forecasting point spread for women's volleyball</article-title>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] <string-name><given-names>D.</given-names> <surname>Prasetio</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Harlili</surname></string-name>, <article-title>Predicting football match results with logistic regression</article-title>, <source>2016 International Conference On Advanced Informatics: Concepts, Theory And Application (ICAICTA)</source> (<year>2016</year>). doi:10.1109/ICAICTA.2016.7803111.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Arabzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Araghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-N.</given-names>
            <surname>Soheil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ghofrani</surname>
          </string-name>
          ,
          <article-title>Football match results prediction using artificial neural networks; the case of iran pro league</article-title>
          ,
          <source>International Journal of Applied Research on Industrial Engineering</source>
          <volume>1</volume>
          (
          <year>2014</year>
          )
          <fpage>159</fpage>
          -
          <lpage>179</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] <string-name><given-names>N. L.</given-names> <surname>Estabrook</surname></string-name>, <article-title>The relationship between NCAA volleyball statistics and team performance in women's intercollegiate volleyball</article-title>, Kinesiology, Sport Studies, and Physical Education Master's Theses (<year>1996</year>).</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] <string-name><given-names>R.</given-names> <surname>Bunker</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Susnjak</surname></string-name>, <article-title>The application of machine learning techniques for predicting results in team sport: A review</article-title>, <year>2019</year>. arXiv:1912.11762.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] <string-name><given-names>K.</given-names> <surname>Apostolou</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Tjortjis</surname></string-name>, <article-title>Sports analytics algorithms for performance prediction</article-title>, in: <source>2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA)</source>, <year>2019</year>, pp. <fpage>1</fpage>-<lpage>4</lpage>. doi:10.1109/IISA.2019.8900754.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] <string-name><given-names>A. E.</given-names> <surname>Tümer</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Koçer</surname></string-name>, <article-title>Prediction of team league's rankings in volleyball by artificial neural network method</article-title>, <source>International Journal of Performance Analysis in Sport</source> <volume>17</volume> (<year>2017</year>) <fpage>202</fpage>-<lpage>211</lpage>. URL: https://doi.org/10.1080/24748668.2017.1331570. doi:10.1080/24748668.2017.1331570.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] <string-name><given-names>A.</given-names> <surname>Gabrio</surname></string-name>, <article-title>Bayesian hierarchical models for the prediction of volleyball results</article-title>, <source>Journal of Applied Statistics</source> (<year>2020</year>) <fpage>1</fpage>-<lpage>21</lpage>. URL: https://doi.org/10.1080/02664763.2020.1723506. doi:10.1080/02664763.2020.1723506.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] <string-name><given-names>C.</given-names> <surname>Akarçeşme</surname></string-name>, <article-title>Is it possible to estimate match result in volleyball: A new prediction model</article-title>, <source>Central European Journal of Sport Sciences and Medicine</source> <volume>19</volume> (<year>2017</year>). doi:10.18276/cej.2017.3-01.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] ncaa.org, <source>Women's volleyball statistics</source>, <year>2020</year>. URL: http://www.ncaa.org/championships/statistics/womens-volleyball-statistics.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] <string-name><given-names>A.</given-names> <surname>Papageorgiou</surname></string-name>, <article-title>6 basic skills in volleyball</article-title>, <year>2020</year>. URL: https://www.strength-and-power-for-volleyball.com/basic-volleyball-skills.html.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] American Volleyball Coaches Association, Bonnie Johnson et al. (eds.), <article-title>2020 women's volleyball statisticians' manual</article-title>, <year>2020</year>. URL: http://fs.ncaa.org/Docs/stats/Stats_Manuals/VB/2020.pdf.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] <string-name><given-names>R.</given-names> <surname>Ferraz</surname></string-name>, et al., <article-title>Pacing behaviour of players in team sports: Influence of match status manipulation and task duration knowledge</article-title>, <source>PLoS One</source> <volume>13</volume> (<year>2018</year>). URL: https://doi.org/10.1371/journal.pone.0192399.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] <string-name><given-names>S.</given-names> <surname>Senthilnathan</surname></string-name>, <article-title>Usefulness of correlation analysis</article-title>, <source>SSRN Electronic Journal</source> (<year>2019</year>). doi:10.2139/ssrn.3416918.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] <string-name><given-names>A.</given-names> <surname>Gabrio</surname></string-name>, <article-title>Bayesian hierarchical models for the prediction of volleyball results</article-title>, <source>Journal of Applied Statistics</source> (<year>2020</year>). URL: https://doi.org/10.1080/02664763.2020.1723506. doi:10.1080/02664763.2020.1723506.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Feature selection for high-dimensional data: A fast correlation-based filter solution</article-title>
          ,
          <source>Proceedings of the 20th international conference on machine learning</source>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Feature selection for high-dimensional data</article-title>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] <string-name><given-names>L.</given-names> <surname>Hervert-Escobar</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Hernandez-Gress</surname></string-name>, <article-title>Bayesian based approach learning for outcome prediction of soccer matches</article-title>, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] <string-name><given-names>M.</given-names> <surname>Ahmadalinezhad</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Makrehchi</surname></string-name>, <article-title>Basketball lineup performance prediction using edge-centric multi-view network analysis</article-title>, <source>Social Network Analysis and Mining</source> <volume>10</volume> (<year>2020</year>) 72. URL: https://doi.org/10.1007/s13278-020-00677-0. doi:10.1007/s13278-020-00677-0.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] <string-name><given-names>J.</given-names> <surname>Ribeiro</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Silva</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Duarte</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Davids</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Garganta</surname></string-name>, <article-title>Team sports performance analysed through the lens of social network theory: Implications for research and practice</article-title>, <source>Sports Medicine</source> <volume>47</volume> (<year>2017</year>) <fpage>1689</fpage>-<lpage>1696</lpage>. URL: https://doi.org/10.1007/s40279-017-0695-1. doi:10.1007/s40279-017-0695-1.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] <string-name><given-names>R.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Zamri</surname></string-name>, <article-title>Sports competition stressors based on k-means algorithm</article-title>, <source>Malaysian Sports Journal</source> (<year>2019</year>) <fpage>04</fpage>-<lpage>07</lpage>. doi:10.26480/msj.01.2019.04.07.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>