Comparison of Off the Shelf Data Mining Methodologies in
             Educational Game Analytics
           David J. Gagnon                                              Erik Harpstead                                                  Stefan Slater
   University of Wisconsin-Madison                              Carnegie Mellon University                                     University of Pennsylvania
             Madison, WI                                             Pittsburgh, PA                                                Philadelphia, PA
     david.gagnon@wisc.edu                                      eharpste@cs.cmu.edu                                       slater.research@gmail.com


ABSTRACT
In this paper we compare the accuracy of nine common machine learning algorithms in predicting quitting and performance on knowledge assessment tests in the context of two middle school science learning games. The games being studied, the Crystal Cave and Wave Combinator, are both short duration (played for an average of 25 and 28 minutes respectively), web-based games designed for use in classroom contexts. We used samples of 1,254 and 5,308 anonymous internet players respectively collected during Fall of 2018. We recorded raw clickstream data and used feature engineering methods to calculate simple descriptive features such as average timings between events and the number and types of player moves. We then used these features to model players quitting the game at each level, as well as content knowledge measured by subsequent assessment. We found that logistic regression produced the best models overall and model quality was influenced by specific game levels and assessment items. We conclude by discussing future work to improve predicting player quitting and player knowledge assessment.

Keywords
Feature engineering, digital games, videogames, modeling, prediction, quitting, assessment

1. INTRODUCTION
Digital games are increasingly being used to support learning in educational contexts across a wide variety of subjects, including social studies [4], mathematics [3], physics [9], and history [10]. Beyond content knowledge, games have also been used to support the development of cognitive and noncognitive skills, such as persistence and spatial reasoning [8]. As video games see increasing use in classroom contexts, the need to analyze the rich interaction data that they produce for meaningful behavioral and learning indicators from play becomes greater as well.

Educational data mining (EDM) is well-suited to the problem of analyzing digital games which feature rich interaction data, and methods common to EDM have been frequently deployed to better understand data produced by digital games. For instance, EDM techniques have been used to model quitting behavior among students playing an educational physics simulation game [2], problem-solving in a game-based programming task [5], and computational thinking skills in Zoombinis [7, 14].

In this paper, we use EDM techniques to predict quitting behavior and content knowledge within two middle school science games, Crystal Cave and Wave Combinator [1,11]. We sought to model these outcomes because of their relevance for the use of these games in educational contexts. The identification of quitting behavior affords game designers and educators the opportunity to intervene with scaffolds or feedback that can help keep students on-task and working productively [2]. Prediction and modeling of content understanding in students gives game designers and educators the opportunity to generate additional opportunities for a student to practice a given skill, or correct specific misconceptions that might exist about that content. While such techniques have been employed in intelligent tutoring systems via knowledge tracing and knowledge inference methods [14], the open-ended structure of many educational games makes these methods difficult to employ successfully.

1.1 Games Being Studied
The two games used in this study, Crystal Cave and Wave Combinator, are available online for free public use and are short duration experiences, played for an average of 25 and 28 minutes respectively. They are primarily used in classroom contexts.

In Wave Combinator, players must manipulate the amplitude, frequency and offset of a wave in order to match the shape of a target wave (Figure 1). Once the player’s wave is within a certain range of the target wave, they are allowed to continue to the next level. At key points of the game, a multiple-choice question appears on screen that assesses the vocabulary used in the game (Figure 2). While these assessment items are presented as being asked by in-game characters, they are not situated within a broader narrative context, but were retrofitted into the game for the sake of this research. This study will be examining play data from the first 7 levels of the game and the 2 multiple choice items that follow.

Figure 1. Initial levels of Wave Combinator.


                   Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Figure 2. A multiple choice question embedded in Wave Combinator.

In Crystal Cave, players assemble differently shaped molecules to form crystals of varying stabilities (Figure 3). Stability is determined by the density of the resultant molecular pattern as well as the proper alignment of the positive and negative charges on portions of the molecules. For each level, different thresholds of stability for the players’ molecular design will result in completing the level with 1 to 3 stars. Each level unlocks when the player has achieved a certain number of stars, leading to a semi-structured progression through the game that allows students to repeat challenges to find optimal molecular arrangements or to progress to new challenges. As with the Wave Combinator, multiple choice quiz items are presented by game characters, but without meaningful integration into the game context. These questions appear after completing specific levels (Figure 4). This study will be examining play data from the entire set of 7 levels of the game and the 3 multiple choice items that are intermixed.

Figure 3. Assembling simple molecules to create stable crystals in Crystal Cave.

Figure 4. A multiple choice question embedded in Crystal Cave.

These two games were chosen because they represent different archetypes of educational games. Wave Combinator provides players with controls that manipulate the outputs of a simple simulation in real-time. Players have to construct meanings about the purpose of each control in order to find a solution. Crystal Cave is a more constructive task that delays feedback for several moves and requires players to apply simple chemistry rules to develop reasonable strategies. Our goal in looking at two different games was to explore the degree to which similar minimal feature engineering approaches would perform across differing game structures and design attributes.

2. METHODS

2.1 Process of Data Collection & Instrumentation
Data were collected from the games using both Google Analytics (GA) and a researcher-developed event logging system. GA was used to quickly record and visualize overall game metrics such as number and location of player sessions, session length, and high-level progression through each game (e.g., completed level 5, then quit). GA was used primarily during development and to understand audiences; its data are not included in the current analyses beyond characterizing the audience and usage patterns.

Multiple choice knowledge assessment measures were designed by the researchers for both games. Each item was aligned with the documented learning goals of the game. The instruments were designed to use a similar visual style as the rest of the game play (see Figures 2 and 4). Players completed the assessment measures after finishing gameplay; the assessment items were not embedded into the game itself.

Our analyses focused on two labels: quitting behavior, when a player quits a level before it is completed and leaves the game, and performance on the post-test assessment measures.

Population and Sampling Process
According to GA, 93% of the games’ usage came from the United States, based on IP addresses. Gameplay sessions were primarily recorded during school hours and on weekdays, leading researchers to believe that the games are used primarily in classroom contexts. Gameplay sessions were recorded anonymously, making it impossible to tell if a session represented a new or returning player. While we acknowledge this limitation,
our analysis assumes that an individual session represents a unique player. During the data collection period of September 1 through December 31, 2018 the Crystal Cave was played 20,963 times with an average of 24.68 minutes/session. The Wave Combinator was played 23,353 times with an average of 28.78 minutes/session. Of these sessions, 1,254 of the Crystal Cave sessions and 5,308 Wave Combinator sessions are included in this study based on the availability of the logging system and data exclusion rules described below.

2.2 Data Logging
Within each game, a JavaScript based logging client captures and transmits clickstream events to a server for storage. Events are recorded for all discrete player actions such as starting a level, making a move, and completing a challenge. Each event is recorded with a timestamp from the client browser’s native time, an automatically generated session identifier, and details about the event that took place. The events are encoded as JSON and sent via an HTTP POST request. These requests are scheduled for delivery to the backend logging server using a first-in-first-out queue and are only dismissed after delivery is confirmed.

The backend server is a researcher-built, open-source, PHP-based web service. Client requests are parsed, appended with the server’s system time, and inserted as individual records into a MySQL database. As each clickstream event is sent as a separate network request and recorded as an individual row, the system is easily parallelized for large numbers of clients. For this study, a single quad core Apache / PHP server Virtual Machine and a single quad core MySQL Virtual Machine server were provisioned in a university data center.

2.3 Feature Engineering and Distribution
We designed features that describe the actions players are able to take in each game. We intentionally explored basic features that could conceivably be extended to other educational games. We developed features that describe the counts of each (game specific) move type, the average number of each move type in each level of play, timings and attributes of each move, the scoring the game provided to the player in each level, and attributes of re-starting and replaying challenges by players. Features were calculated using data collected chronologically before the outcome being modeled. For example, features to model quitting in level 5 were calculated using play data derived from levels 1 through 4, and any available data from level 5 play, but not level 6 or greater. This was done to preserve the predictive nature of the research and to create a model that could conceivably be used to make predictions in real-time within a gameplay session. Play sessions with fewer than 10 total moves were excluded from the final dataset. Table 1 describes the features used for each game and attempts to align them across games when appropriate.

We defined “quitting” on a given level as a session where the session ends (log events halt abruptly) before the current level is completed. Using this definition, each session is labeled with either a “quit” or a “complete” for each level. The distribution for these events leans strongly toward completing, with an average of 70.3% and 81.4% of sessions completing each level in Crystal Cave and Wave Combinator respectively. We use each level’s quitting distributions as our baseline model.

We defined “incorrect” answers as a session selecting any of the 3 options that were not correct for each assessment item. As with quitting, the distribution for the assessments leans toward correct answers being provided. An average of 65.6% of sessions selected correct answers for the three Crystal Cave questions while an average of 65.8% of sessions selected correct answers for the two Wave Combinator questions. As with quitting, we use each question’s distribution as the baseline model.

Table 1. Gameplay features for Wave Combinator and Crystal Cave.

                   Wave Combinator                    Crystal Cave
Move Counts        Total Slider Moves                 Total Molecule Moves
                   Total Offset Moves                 Total Molecule Rotates
                   Total Amplitude Moves              Total Stamp Rotates
                   Total Wavelength Moves
                   Total Move Type Changes

Averages / Level   Av. Slider Moves                   Av. Molecule Moves
                   Av. Offset Moves                   Av. Molecule Rotates
                   Av. Amplitude Moves                Av. Stamp Rotates
                   Av. Wavelength Moves
                   Av. Move Type Changes
                   Av. % Offset Moves
                   Av. % Amplitude Moves
                   Av. % Wavelength Moves

Timing / Move      Slider St. Dev.                    Av. Molecule Move Time
Attributes         Slider Min / Max                   Total Time
                   Av. Slider St. Dev. / Level        Av. Time / Level
                   Av. Slider Max / Min               Total Time in “Museum”
                   Total Time
                   Av. Time / Level

Scoring                                               Av. Score / Level

Resets / Replays   Av. Valid / Invalid Transitions    Av. Resets / Level
                   Total Valid / Invalid Transitions  Av. Completes / Level

2.4 Modeling process
We modeled the data using several algorithms provided by RapidMiner, a multiplatform data science tool [6]. This tool was chosen for ease of use and free use in educational contexts across the vast majority of computational platforms (OSX, Windows, and Linux). Individual models were generated for quitting at levels 1, 3, 5 and 7 for each game. Individual models were also generated for each knowledge assessment, where the results of the assessment were represented as a binomial indicating a correct vs. incorrect response. Models that were used included RapidMiner’s implementations of Naive Bayes, Generalized Linear Model, Logistic Regression, Fast Large Margin, Deep Learning¹, Decision Tree, Random Forest, Gradient Boosted Trees and Support Vector Machine. RapidMiner’s default hyperparameters were used for all models, including a preprocessing step to standardize all values to have zero mean and unit variance as well as the option to use a single thread to ensure reproducibility. Model specific hyperparameters are listed in Appendix A. A single 60/40 split process randomly divided the source data into a 60% training set and 40% validation set. Accuracy percentage of each model was determined, along with baseline accuracies for quitting and knowledge assessment. The same initial feature space was used to predict both quitting behavior and post-test performance.

¹ RapidMiner’s Deep Learner component is based on a multi-layer feed-forward artificial neural network. For more details see: https://docs.rapidminer.com/latest/studio/operators/modeling/predictive/neural_nets/deep_learning.html

3. RESULTS

Table 3. Performance for predicting instances of quitting at each level within Crystal Cave.

                           Accuracy
                           LV1      LV3      LV5      LV7      Av.
Baseline                   0.789    0.769    0.679    0.576    0.703
Naive Bayes                0.819    0.749    0.685    0.687    0.735
Generalized Linear Model   0.883    0.772    0.785    0.687    0.782
Logistic Regression        0.897    0.763    0.786    0.667    0.778
Fast Large Margin          0.863    0.771    0.666    0.460    0.690
Deep Learning              0.908    0.737    0.786    0.707    0.784
Decision Tree              0.863    0.767    0.666    0.649    0.736
Random Forest              0.861    0.762    0.676    0.660    0.740
Gradient Boosted Trees     0.850    0.762    0.777    0.686    0.769
Support Vector Machine     0.762    1.091

Table 4. Performance for predicting instances of quitting within Wave Combinator.

                           Accuracy
                           LV1      LV3      LV5      LV7      Av.
Baseline                   0.899    0.819    0.625    0.911    0.814
Naive Bayes                0.956    0.819    0.629    0.914    0.830
Generalized Linear Model   1.000    0.818    0.624    0.914    0.839
Logistic Regression        0.999    0.819    0.630    0.914    0.841
Fast Large Margin          1.000    0.819    0.619    0.914    0.838
Deep Learning              0.999    0.657    0.425    0.717    0.700
Decision Tree              0.997    0.819    0.629    0.914    0.840
Random Forest              0.974    0.819    0.630    0.914    0.834
Gradient Boosted Trees     n/a      0.699    0.582    0.888    n/a
Support Vector Machine     0.999    0.818    0.621    0.908    0.836

Baseline calculations were quite high for predicting quitting at each of the different levels across both games. For example, 91.1% of players who start level 7 in Wave Combinator also complete it. By predicting that all players will complete level 7, a model will have a 0.911 accuracy, leaving very little room for improvement. Across all levels, a baseline model that always predicts completing the level will have an average accuracy of 0.703 for Crystal Cave and 0.814 for Wave Combinator.

For predictions of quitting at each level in Crystal Cave, Deep Neural Networks (the Deep Learning rows in the tables) performed best on average. The most accurate prediction was 0.908 for level 1, followed by 0.786 for level 5, 0.737 for level 3 and 0.707 for level 7. The largest improvement over the baseline was for level 7, with the model performing 22.7% more accurately. This was followed by level 5 with a 15.8% improvement over baseline, and level 1 with a 15.1% improvement over baseline. The model performed slightly worse than baseline when predicting quitting for level 3 (0.958 times the baseline accuracy). On average, the Deep Neural Networks predicted quitting with an accuracy of 0.784. All models performed better than the baseline model except for Fast Large Margin.

For predictions of quitting at each level in Wave Combinator, Logistic Regression was the most accurate, but offered little improvement over baseline for most levels. The most accurate prediction was 0.999 for quitting in level 1 followed by 0.914 for level 7, 0.819 for level 3 and 0.630 for level 5. The largest improvement over baseline was seen in level 1 with the model performing 11.2% more accurately. This advantage quickly dissolves with only a 0.8% improvement in level 5, a 0.3% improvement for level 7 and a 0.1% improvement for level 3. On average, Logistic Regression predicted quitting with an accuracy of 0.841. Deep Learning and Gradient Boosted Tree algorithms failed to perform better than the baseline model for this prediction.

Table 5. Performance for predicting incorrect answers for each assessment question in Crystal Cave.

                           Accuracy
                           Q0       Q1       Q2       Av.
Baseline                   0.588    0.724    0.574    0.656
Naive Bayes                0.594    0.732    0.575    0.634
Generalized Linear Model   0.594    0.732    0.575    0.634
Logistic Regression        0.603    0.745    0.600    0.649
Fast Large Margin          0.585    0.732    0.625    0.647
Deep Learning              0.500    0.591    0.450    0.514
Decision Tree              0.581    0.732    0.600    0.638
Random Forest              0.585    0.732    0.500    0.606
Gradient Boosted Trees     0.543    0.706    0.525    0.591
Support Vector Machine     0.568    0.691    0.650    0.637
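The baseline comparison used in the quitting results can be sketched in a few lines of Python. This is only an illustration, not the authors' RapidMiner workflow: it assumes that the baseline is the accuracy of always predicting the majority label and that the reported "improvement" percentages are the ratio of model accuracy to baseline accuracy, an interpretation that is consistent with the Crystal Cave figures in Table 3. The function names are hypothetical.

```python
from statistics import mean


def baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / len(labels)


def relative_improvement(model_acc, baseline_acc):
    """Percent change of model accuracy relative to baseline accuracy (assumed definition)."""
    return (model_acc / baseline_acc - 1.0) * 100.0


# Accuracies copied from Table 3 (Crystal Cave quitting).
baseline = {"LV1": 0.789, "LV3": 0.769, "LV5": 0.679, "LV7": 0.576}
deep_learning = {"LV1": 0.908, "LV3": 0.737, "LV5": 0.786, "LV7": 0.707}

# Per-level improvement over baseline, then the mean across the four levels.
improvements = {
    level: relative_improvement(deep_learning[level], baseline[level])
    for level in baseline
}
average_improvement = mean(improvements.values())

for level, value in improvements.items():
    print(f"{level}: {value:+.1f}%")
print(f"Average: {average_improvement:+.1f}%")
```

Under this reading, a negative value (as for level 3) means the model was less accurate than always predicting "complete".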
  Table 6. Performance for predicting incorrect answers for           That said, there is room for improvement in the performance of
       each assessment question in Wave Combinator.                   these models. More complex, move sequence features may lead to
                                                                      more meaningful descriptors of the player’s thinking. While the
                                     Q0          Q1         Av.
                                                                      features that were used in this paper were certainly grounded in
Baseline                             0.540       0.776      0.658     the interactions afforded to the player, they were only computed
Naive Bayes                          0.446       0.718      0.582     in terms of simple counts and averages. One possible next step
                                                                      would be to use sequential pattern mining to first identify
Generalized Linear Model             0.588       0.771      0.680     common sequences of moves that correlate with outcomes of
Logistic Regression                  0.590       0.776      0.683     interest [12]. The presence of these patterns could then be used as
                                                                      an engineered feature to train the models.
Fast Large Margin                    0.453       0.774      0.614
                                                                      The extreme accuracy (0.999) of level 1 quitting predictions for
Deep Learning                        0.489       0.585      0.537
                                                                      the Wave Combinator invites speculation about the usefulness of
Decision Tree                        0.549       0.774      0.661     building models based only on very recent events. The approach
Random Forest                        0.546       0.774      0.660     used here was to use all player actions leading up to the quitting
                                                                      or assessment event. This may have the unintended consequence
Gradient Boosted Trees               0.578       0.728      0.653     of diluting player moves that may immediately lead to a success
Support Vector Machine               0.550       0.776      0.663     in a specific level, with moves from much earlier in the gameplay
                                                                      that are now irrelevant to the challenge at hand. A next step would
                                                                      be to modify the feature generating scripts to experiment with
Baseline predictions for the assessment items were lower than for     different time windows for modeling.
quitting, but still much higher than a fair coin toss. Averaging
                                                                      Another limitation of this work is that accuracy may not be the
across the 3 items in the Crystal Cave and the 2 items in the Wave
Combinator, a baseline model that always predicts a correct           best measure of the effectiveness of the predictions. In future
                                                                      work, the performance of the models should be reported by
answer will have an accuracy of 0.656 and 0.658, respectively.
                                                                      providing precision, recall and F1 scores. This issue is
For predicting incorrect answers on the 3 assessment items in the     compounded by the fact that baseline predictions, based only on
Crystal Cave, Logistic Regression was the most accurate. The          the percentages of players that complete a level or correctly
model best predicts the outcome of question 1 with an accuracy of     answer a quiz item, are quite high, leaving very little room for
0.745, followed by question 0 with an accuracy of 0.603 and           improvement. The authors are unable to conclude that the models
question 2 with an accuracy of 0.600. Compared to the baseline,       are deriving their accuracy from the strength of the features and
the greatest improvement was 4.4% on question 2. The model            not simply the unbalanced distribution of the phenomena.
demonstrated a 2.8% improvement on question 1 and 2.5%
improvement on question 0. On average, the model predicted            Finally, the validity of the answers provided for the multiple
incorrect answers with an accuracy of 0.649.                          choice assessment items could be studied. These items are not
                                                                      standardized measures, but reasonable assessments designed by
For predicting incorrect answers on the 2 assessment items in the     the researchers. Further evaluating their validity and reliability
Wave Combinator, Logistic Regression was the most accurate.           may highlight insights as to why they are harder to predict.
The model has an accuracy of 0.776 for question 1 followed by an      Additionally, modifying the system to record the time spent
accuracy of 0.590 for question 0. This translates to an               answering each assessment would help identify obvious issues
improvement of 9.2% for question 0 and identical accuracy to          such as spending less than 1 second before answering, not nearly
baseline for question 1. On average, the model predicted incorrect    enough time to read and decide on a correct answer.
answers with an accuracy of 0.683.
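
The baseline comparison behind these results, and the concern that accuracy alone may reflect class imbalance, can be illustrated with a minimal sketch. It assumes the baseline is the share of the most common outcome, which is one reading of "the percentages of players that complete a level or correctly answer a quiz item". The labels are made-up toy data, the function names are hypothetical, and this is not the authors' RapidMiner pipeline. The last two lines only re-derive the mean accuracies from the per-question values reported above.

```python
from collections import Counter
from statistics import mean


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def classification_report(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy, made-up labels: 1 = incorrect answer (event of interest), 0 = correct.
y_true = [0] * 8 + [1] * 2
# A "model" that ignores all features and always predicts the majority class.
always_majority = [0] * 10

baseline = majority_baseline(y_true)
report = classification_report(y_true, always_majority)
print("baseline accuracy:", baseline)
print("always-majority report:", report)

# Mean of the per-question accuracies reported in the text.
crystal_cave_mean = round(mean([0.745, 0.603, 0.600]), 3)
wave_combinator_mean = round(mean([0.776, 0.590]), 3)
print("mean accuracy:", crystal_cave_mean, wave_combinator_mean)
```

In this toy case the always-majority predictor matches the baseline accuracy exactly while its recall on the minority class is zero, which is why precision, recall and F1 are needed to tell feature-driven accuracy apart from an unbalanced outcome distribution.
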
                                                                      5. SUMMARY
4. DISCUSSION                                                         In summary, logistic regressions performed better than all
This paper compares the accuracy of 9 common modeling                 competing algorithms for quitting in Wave Combinator and
algorithms for predicting quitting and knowledge assessment in        content knowledge tests in both games. Deep Learning models
two different learning games using the simplest possible feature      performed best in predicting quitting in the Crystal Cave game.
engineering. We found that, on the whole, these models were able      Level quits can be predicted with an average accuracy of 0.784 for
to successfully predict quitting behavior and correct answers in      Crystal Cave and 0.841 for Wave Combinator, an improvement of
our two games and their associated post-tests. This is a promising    12.4% and 3.1% over baseline, respectively. Correct answers
finding for continuing to deploy educational data mining methods      across the embedded knowledge assessment items can be
in order to capture and identify learning and behaviors of interest   predicted with an average accuracy of 0.649 for Crystal Cave and
within digital games.                                                 0.683 for the Wave Combinator. The models provided a 3.3% and
Accurate prediction of quitting behaviors and post-test               4.6% improvement over baseline for these games.
performance has a number of practical applications within             These results show that educational data mining techniques can
educational settings. For instance, players who are identified as     provide some predictive value to different kinds of educational
being at-risk for quitting a level may be given targeted behavioral   games even with relatively minimal feature engineering. We hope
or affective scaffolds to keep them on-task and working               that other researchers can be encouraged to apply similar methods
productively. Players who have a low predicted score for a post-      to their own games given our results.
test assessment can be given additional practice opportunities on-
demand, based on the specific misconceptions or difficulties they     6. ACKNOWLEDGMENTS
are having.                                                           The authors gratefully acknowledge partial support of this
                                                                      research by NSF through the University of Wisconsin Materials
Research Science and Engineering Center (DMR-1720415) and
the Wisconsin Department of Public Instruction.

7. REFERENCES
[1] Crystal Cave [Computer Software]. (2017). Madison: Field
    Day.
[2] Karumbaiah, S., Baker, R.S., Shute, V. (2018). Predicting
    Quitting in Students Playing a Learning Game. Proceedings
    of the 11th International Conference on Educational Data
    Mining, 21-31.
[3] Kiili, K., Devlin, K., Perttula, A., Tuomi, P., & Lindstedt, A.
    (2015). Using video games to combine learning and
    assessment in mathematics education. International Journal
    of Serious Games, 2(4), 37-55.
[4] Maguth, B., List, S., & Wunderle, M. (2015). Teaching
    Social Studies with Video Games. The Social Studies,
    106(1), 32-36.
[5] Malkiewich, L., Baker, R.S., Shute, V., Kai, S., Paquette, L.
    (2016). Classifying behavior to elucidate elegant problem
    solving in an educational game. Proceedings of the 9th
    International Conference on Educational Data Mining,
    448-453.
[6] Mierswa, I., & Klinkenberg, R. (2019). RapidMiner Studio
    (9.2) [Data science, machine learning, predictive analytics].
    Retrieved from https://rapidminer.com/
[7] Rowe, E., Asbell-Clarke, J., Baker, R., Gasca, S., Bardar, E.,
    & Scruggs, R. (2018). Labeling Implicit Computational
    Thinking in Pizza Pass Gameplay. In Extended Abstracts of
    the 2018 CHI Conference on Human Factors in Computing
    Systems.
[8] Shute, V. J., Ventura, M., & Ke, F. (2015). The power of
    play: The effects of Portal 2 and Lumosity on cognitive and
    noncognitive skills. Computers & Education, 80, 58-67.
[9] Shute, V. J., Ventura, M., & Kim, Y. J. (2013). Assessment
    and learning of qualitative physics in Newton's
    Playground. The Journal of Educational Research, 106(6),
    423-430.
[10] Watson, W. R., Mong, C. J., & Harris, C. A. (2011). A case
     study of the in-class use of a video game for teaching high
     school history. Computers & Education, 56(2), 466-474.
[11] Wave Combinator [Computer Software]. (2017). Madison:
     Field Day.
[12] Wallner, G. (2015). Sequential Analysis of Player Behavior.
     In CHI PLAY ’15 Proceedings of the 2015 Annual
     Symposium on Computer-Human Interaction in Play,
     349–358. https://doi.org/10.1145/2793107.2793112
[13] Pavlik Jr., P.I., Cen, H., & Koedinger, K.R. (2009).
     Performance Factors Analysis – A New Alternative to
     Knowledge Tracing. In V. Dimitrova & R. Mizoguchi (Eds.),
     Proceedings of the 14th International Conference on Artificial
     Intelligence in Education. Brighton, England.
[14] Zoombinis [Computer Software]. (2015). TERC.

8. APPENDIX A: Hyperparameters used for each model

Model                      Hyperparameters
Naive Bayes                n/a
Generalized Linear Model   Family = binomial
                           Solver = L_BFGS
Logistic Regression        Solver = L_BFGS
Fast Large Margin          Strategy = 1 against all
Deep Learning              Activation = rectifier
                           Hidden layer sizes = 50,50
                           Epochs = 10.0
Decision Tree              Criterion = gain_ratio
                           Maximal depth = 2
                           Apply Pruning
                           Confidence = 0.1
                           Minimal Gain = 0.05
                           Minimal Leaf Size = 2
Random Forest              Trees = 20
                           Criterion = gain_ratio
                           Max Depth = 7
                           Apply Pruning
                           Confidence = 0.25
                           Minimal gain = 0.05
                           Minimal Leaf Size = 2
                           Guess subset ratio
                           Voting Strategy = confidence vote
Gradient Boosted Trees     Trees = 60
                           Max Depth = 2
                           Min Rows = 10
                           Min Split Improvement = 0
                           Bins = 20
                           Learning Rate = 0.1
                           Sample Rate = 1.0
Support Vector Machine     Type = C-SVC
                           Kernel = rbf
                           Gamma = 1.0E-4
                           C = 100.0
                           Epsilon = 0.001