<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How Long is Enough? Predicting Student Outcomes with Same-Day Gameplay Data in an Educational Math Game</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rachel Harred</string-name>
          <email>rlharred@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Preya Shabrina</string-name>
          <email>pshabri@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christa Cody</string-name>
          <email>cncody@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiffany Barnes</string-name>
          <email>tmbarnes@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehak Maniktala</string-name>
          <email>mmanikt@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Collin Lynch</string-name>
          <email>cflynch@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>North Carolina State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Curriculum-integrated games can provide teachers with data to help them decide when and how to intervene with individual students. Based on our prior work observing teachers using ST Math, teachers may not be able to attend to a dashboard or student screens to determine who might need intervention. We therefore set out to determine how much data we need from the current ST Math gameplay session to predict performance. Using the available log data, which tracks student performance over sets of puzzles, we performed two experiments to predict performance. The first uses data from one game level, which takes about 3 minutes, to predict the performance on the next level. The second uses the first 6 minutes of gameplay to predict how many levels a student can complete in 20 minutes, a typical class length. Our results show that our data are not fine-grained enough to allow for paired level prediction, but that 6 minutes of gameplay can be used to rank students in order of performance for a class session. These results can be used as a basis for an alert system that could help teachers prioritize their time in the classroom.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Educational games can be a useful tool for teachers to
provide additional practical learning for students [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. As more
educational games become curriculum-integrated, a significant
portion of a student's time can be spent in these
systems. However, teachers cannot monitor and assist every
student at the same time, and they struggle to identify the students
who need help the most. In previous work, we observed that
teachers' assistance was often influenced by factors such as
classroom layout and disruptive behavior rather than learner
proficiency or needs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Furthermore, that work found
that students who "struggled quietly" often went unnoticed.
      </p>
      <p>
        In other work, the authors found that when students
possibly need intervention but do not receive it, they might
get frustrated and give up or replay an easier game instead
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Other research has also shown that teachers can often
unintentionally favor or give assistance to certain types of
students due to differences in perceptions or help-seeking
behaviors [
        <xref ref-type="bibr" rid="ref22 ref4 ref5">4, 22, 5</xref>
        ]. Therefore, providing teachers with
information to help them determine who needs assistance the
most may be crucial to some low-performing students.
      </p>
      <p>Despite the amount of data gathered with each playthrough,
teachers in our system are only provided with a student's
current progress in the curriculum and a feature that
allows a student to "raise" their hand through the system.
However, the raised hand is only visible on the student's screen via a
purple hand indicator and often goes unnoticed. Therefore,
we sought to determine whether there was a way to provide
teachers with knowledge of students' projected progress
as early as possible, so that the teachers can determine who
to help from there. Because machine learning techniques
can process such data and help predict outcomes, we
wanted to find a suitable technique to answer our question.</p>
      <p>
        Machine learning and educational data mining techniques
have been successfully used in educational game research
for many years [
        <xref ref-type="bibr" rid="ref10 ref11 ref18 ref21">11, 21, 10, 18</xref>
        ].
      </p>
      <p>In this paper, we tried to determine the smallest amount of
time needed to predict student outcomes for one gameplay
session by investigating multiple feature selection algorithms
and prediction models on student gameplay data for an
educational game, Spatial Temporal Math (ST Math). We tried
two methods of prediction using data analysis and machine
learning: 1) predicting student outcomes for
one level of a game using gameplay data from only the
previous level, and 2) using the least amount of time of a
student's gameplay data to predict the number of levels they
will pass in the next twenty minutes of gameplay. To
accomplish this, we tried various machine learning and feature
selection methods to find the most significant features needed
to predict student outcomes in this educational game. In
this study, our intention was to give insight to the teachers
of ST Math by indicating our best guess for which students
could most benefit from teacher intervention on a single day,
provided early enough in the gameplay session to allow the
teacher to help as many as possible.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>1.1 Spatial Temporal Math (ST Math)</title>
      <p>
        ST Math is a curriculum-integrated supplemental
mathematics game for 2nd-4th-grade students that uses spatial
puzzles to teach basic math concepts [
        <xref ref-type="bibr" rid="ref12 ref14 ref15 ref19">19, 12, 14, 15</xref>
        ]. The
puzzles do not contain any textual instruction. The games
are grouped at the highest level by objective, which indicates
the broad math concept. Each objective contains a number
of games; the gameplay varies from game to game but
concerns the same content within an objective. The games
usually have between 3 and 5 levels each, and the gameplay across
levels is similar but increases in difficulty. There are usually
between 6 and 8 puzzles per level. The puzzles are either
randomly generated using a template or randomly selected
from pre-designed puzzles, depending on the level. Each
puzzle requires the student to perform the correct action to indicate
their answer. Animated feedback presented after each
puzzle-solving attempt shows the student
whether they are correct or incorrect. For example, in the game
"Fair Sharing" under "Division Concepts", a student is asked
to distribute boxes equally among animals to construct a
straight bridge; after an incorrect answer, the game shows the
bridge blocked off or with gaps that make it impossible to cross.
A level begins with a set number of lives, usually
2, that resets at the beginning of each level. If a student's
response is incorrect, the student loses a life. If the student
loses all of their lives before completing the level, they do
not pass the level and must retry it. To pass a level, the
student must complete all puzzles without losing all their
lives. After a student passes a level, they may move on to
the next level in the game or objective, or backtrack and
play a previously passed level. We refer to this backtracking
as replay. A level attempt includes passing a level, failing
a level, and replay. Each student has the option to "raise
their hand" if they want help from their teacher by clicking
on a hand icon on the screen. The teacher can also
see which objectives and levels a student has passed. See
Figure 1 for a breakdown of ST Math.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. RELATED WORK</title>
      <p>
        Educational games in classrooms are helpful to teachers
because the students can receive individualized attention and
learning from the game while the teacher gives one-on-one
attention to students who need it [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, teachers
have limited time and try to prioritize their attention to the
students who need it most. It has been shown that students
who are given attention by teachers show increased
learning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]; therefore, low-performing students should benefit
when teachers are able to focus their attention on them.
      </p>
      <p>
        Unfortunately, there are many reasons that students who
need help do not receive it. One study found that
middle-class students seek help more directly than working-class
students and end up getting more help as a result [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In
a recent study of classroom observations using ST Math,
researchers found that differences in classroom format have an
influence on who receives help [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Furthermore, this work
found that teachers in free-seating classrooms could not easily see
the raised hand indicator on student screens, so those
help-seeking students went unnoticed. In classrooms that
use rotation-seating, teacher attention was only given if the
ST Math group was being disruptive. In general, the
students who directly asked for help or were obviously off-task
received more teacher intervention than the students who
were less vocal [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        The task of automatically identifying students who need
help has been explored [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Because machine learning
methods can process large amounts of data and make
predictions, many researchers have used these methods in an attempt
to solve the problem. Ahadi et al. explored machine
learning techniques and were able to use the first week of data
in a programming course to predict student performance
with accuracy ranging from 71% to 80%. Additionally, Jiang
et al. employed logistic regression models to predict the
type of certificate a learner received in Massive Open Online
Courses (MOOCs) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In a study at the Open University,
decision tree models were applied to users' current and
previous activity to predict if they were at risk of failing a
module [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Another Open University study explored using
Bayesian models to build real-time predictive models from
student data and found little difference among the types of
models, but found that accuracy increased as more
data became available over the course of the learning module
[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. These studies show that machine learning can be used
to predict and understand student behavior, but the models are not
being used to directly aid student learning.
      </p>
      <p>
        Predictive models are now being integrated into teacher
dashboards or alert systems to enable this aid. In a survey of
K-12 teachers who used intelligent tutoring systems in class,
Holstein et al. found that they were very interested in
having real-time classroom monitoring tools that would help
them decide which students most needed their attention [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
In another study, Holstein et al. developed a teacher alert
system using smart glasses that showed real-time indicators
of student behaviors floating above the students' heads [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Their
early findings suggest that this helped direct the teachers to
the students who needed intervention the most [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. DATA</title>
      <p>The data were collected by MIND Research Institute, which
created ST Math. This study was conducted on data from
3rd grade students who played ST Math during the
2016-2017 school year. The data contain 31 objectives, 154
games, and 669 levels, with 5,186,269 total level
attempts by 8,983 students from 111 schools and 636
teachers. We excluded students who completed objectives not
contained in the 3rd grade objectives, which removed 21,544
level attempts. These are students who might have been
erroneously included from other grades, as the system is
used for grades 3-5. For the purposes of our study, we
filtered out level attempts where the student completed only
one level attempt within our 30 minute gameplay session cutoff.
This removed 101,849 level attempts, leaving us with a
final dataset of 5,062,876 unique level attempts. The initial
data we were provided with had 6 features for each level
attempt: STMathID (unique ID for each student), Level,
Objective Code, Timestamp, Number of Correct Puzzles, and
Total Number of Puzzles. For the purpose of our analysis,
we created the additional features shown in Table 1.</p>
    </sec>
    <sec id="sec-5">
      <title>4. METHOD</title>
      <p>We wanted to explore different ways of providing teachers
with information about student progress, so that they could
intervene and monitor progress according to their own
classroom goals. Therefore, our intention is to predict the
projected progress of a student using the least amount of
information and using only same-day gameplay data. Here, we
compare two methods of data segmentation:
predicting the time spent passing the next level based on the time
spent passing the current level, and using the least amount
of gameplay to predict how many levels would be passed
in the next 20 minutes of gameplay. We are attempting to
see if a student's performance at the beginning of a
gameplay session is a good predictor of their later performance.
A gameplay session is defined as a sequence of level attempts
that are separated by less than 30 minutes. If two level
attempts are separated by 30 minutes or longer, we count the
second attempt as the start of a new gameplay session. We decided on a
30 minute cutoff because a pause in the game of 30 minutes
or more might indicate the student was working on
something else in between, in which case we cannot say that the previously
played level will have any effect on performance in the next
level. However, we still want to include students who may
be truly struggling, receiving help from a teacher, or giving
help to another student during the playthrough of a level.
Only 3.5% of the gaps in the data
were between 30 minutes and 20 hours, while 11.6%
were 20 hours or longer.</p>
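      <p>The session definition above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function name split_sessions and the example timestamps are hypothetical, and the sketch assumes each level attempt carries a single timestamp.</p>
      <preformat><![CDATA[
```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # cutoff described in the text


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group one student's level-attempt timestamps into gameplay sessions.

    A new session starts whenever an attempt is separated from the
    previous attempt by `gap` or longer.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)   # less than 30 minutes: same session
        else:
            sessions.append([ts])     # 30 minutes or longer: new session
    return sessions


# Hypothetical attempts for one student.
attempts = [
    datetime(2016, 10, 3, 9, 0),
    datetime(2016, 10, 3, 9, 4),
    datetime(2016, 10, 3, 9, 33),   # 29 minutes after the previous attempt
    datetime(2016, 10, 3, 10, 3),   # exactly 30 minutes later
]
sessions = split_sessions(attempts)
print([len(s) for s in sessions])
```
]]></preformat>
      <p>In this example the first three attempts fall into one session, while the attempt that follows a gap of exactly 30 minutes opens a second session.</p>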
    </sec>
    <sec id="sec-6">
      <title>5. EXPERIMENT 1: PAIRWISE PREDICTION</title>
      <p>This section details our attempt at using previous level data
to predict student outcomes for the next level attempt.</p>
    </sec>
    <sec id="sec-8">
      <title>5.1 Pairwise Method</title>
      <p>
        We wanted to see if we could predict how well a student will
do on the next level by using only the data from the
previous level. Our aim in this experiment
was to investigate whether the features of a single level for a
student can predict whether the student would need an
intervention on the next level. This would be beneficial to
teachers because it would raise an alert immediately after a
student finishes a level, telling them that the
student might need help on the next level. We used pairwise
prediction, so the data was grouped into Level A and Level
B, with the constraint that Level A and Level B had to be
in the same objective and the same gameplay session. We
considered Level A to be the first attempt on a new level, all
subsequent attempts (retries) until the level was passed, and
any replayed levels that happened before or after the level
was passed. Any gameplay that happened between the first
attempts of two consecutive levels was counted as Level
A data. Level B is the next attempt on a new level after
Level A, and contains the same information as Level A:
number of attempts, retries, and replays before and after passing, up
until the next new level attempt. We expected the
number of attempts needed to pass a level to provide information about
how many attempts a student will need for the next level because
the levels increase in difficulty within objectives and this is
consistent throughout ST Math. Also, research has shown that
replay that happens before passing a level has a
negative effect on performance, while replay that happens after
passing a level has a positive effect [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ];
therefore, we expected this data to also be useful for the
prediction.
      </p>
      <p>Due to time constraints and the complexity of the feature
creation, we used a subset of our total dataset for this
analysis. The dataset includes 830 students and 665 unique
objective-level pairs. Objective-level pairs are level pairs
within an objective. However, some students did not
complete all the objective-levels, resulting in a total of 277,975
unique student objective-level pairs.</p>
      <p>5.1.1 Pairwise Feature Generation</p>
      <p>Since the raw data only included gameplay aspects per
attempt such as time taken, attempt performance, and the kind
of attempt (retries, replay, etc.), we engineered 33 additional
features for every student objective-level pair. There are 7
attempt categories and 5 metrics per category. The
7 attempt categories are as follows: overall level attempts,
total retry and replay attempts, retry attempts, total replay
attempts, total replay attempts before passing the current
level, total replay attempts after passing, replay attempts of
the same level (current) after passing, and replay attempts
of other levels after passing. The 5 metrics for each
category are as follows: whether an attempt category occurred
(except for the overall attempts category), total number of
attempts (except for the overall attempts category), total time,
average time, and average performance.</p>
      <p>
        Next, for each student, we identified the consecutive levels
(Level A, Level B) within each objective that were
completed in the same session. We found a total of 222,258 such
level pairs for 830 students. We explored four different ways
to define an intervention: if the total time was greater than
the 75th percentile (I-TotalTime), if the average time was
greater than the 75th percentile (I-AvgTime), if the average
performance was less than the 25th percentile (I-AvgPerf), and
if the student could not finish a level on the first attempt
(I-FirstAttempt). Each of these intervention types was
intended to capture a different aspect of a student's ability to
complete a level.
      </p>
      <p>5.1.2 Pairwise Feature Selection and Prediction Models</p>
      <p>
        The analysis was carried out in Python. We normalized the
time-related features and then explored three feature
selection techniques in the scikit-learn [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] package. We used a
pipeline of a feature selection wrapper method,
SelectFromModel, with models such as LinearSVC (Linear Support
Vector Classifier with L1 loss), LassoCV (Lasso linear model
with 3-fold Cross Validation), and Logistic Regression. We
used 7 classifiers: KNN (n = 3), LinearSVC (Linear Support
Vector Classifier), Decision Tree (using Gini index),
Random Forest (using Gini index), MLP (Multi-layer
Perceptron classifier with a 2-layer (100,100) neural network using
a learning rate of 0.001 and ReLU activation function),
AdaBoost, and Naive Bayes. We measured the prediction accuracy
using 10-fold Cross Validation.
      </p>
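      <p>A minimal sketch of one such pipeline is given below. It is not the study's code: the data are synthetic, the variable names are hypothetical, the L1-penalized LinearSVC is one reading of the selector described above, all features are standardized for simplicity, and the hyperparameters C, max_iter, and n_estimators are assumptions. The target follows the I-TotalTime definition by flagging Level B total times above the 75th percentile.</p>
      <preformat><![CDATA[
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for Level A features (one row per level pair).
n_pairs, n_features = 400, 8
X_level_a = rng.normal(size=(n_pairs, n_features))

# Synthetic Level B total time (seconds), loosely tied to one Level A feature.
total_time_b = 180 + 40 * X_level_a[:, 0] + rng.normal(scale=40, size=n_pairs)

# I-TotalTime: flag Level B attempts above the 75th percentile of total time.
threshold = np.percentile(total_time_b, 75)
y_intervention = (total_time_b > threshold).astype(int)

pipeline = make_pipeline(
    StandardScaler(),
    # Feature selection: keep features with non-zero L1-penalized weights.
    SelectFromModel(LinearSVC(penalty="l1", dual=False, C=0.5, max_iter=5000)),
    RandomForestClassifier(criterion="gini", n_estimators=50, random_state=0),
)

# 10-fold cross-validated accuracy, as in the evaluation described above.
scores = cross_val_score(pipeline, X_level_a, y_intervention,
                         cv=10, scoring="accuracy")
print(f"mean 10-fold accuracy: {scores.mean():.3f}")
```
]]></preformat>
      <p>Placing the selector inside the pipeline means features are re-selected within each training fold, which avoids leaking information from the held-out fold into the selection step.</p>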
    </sec>
    <sec id="sec-9">
      <title>5.2 Pairwise Results &amp; Discussion</title>
      <p>The feature selection based on LinearSVC with L1 loss and
the Random Forest classifier provided the best
prediction accuracy. Table 3 shows the results of using a
Random Forest classifier for each intervention type. We observed
that the average time spent on Level A was selected for each
intervention-type-based target for Level B. The distribution
parameters of the selected features are shown in Table 2.
We observed that very few features related to replays
were selected. This may be because the consecutive level
pair dataset recorded very few rows with retries or replays
(18.20%) and even fewer when considering only replays (0.12%).
Such a high degree of sparsity in replay made any
replay-related features not significant enough to contribute to
the predictions. Another interesting observation is that the
performance over all the attempts (avgPerfTotal) in Level A
was not a significant predictor of the intervention for Level B
in any of the models, primarily because of the small variance
recorded for this feature. The small variance is due to the
granularity of the data, which records passed level attempts
with failed puzzles as 100% performance. On the other hand,
the average time in Level A (avgTimePerLevel) was a
significant predictor of the intervention for Level B for every
intervention-type-based target. We recorded few replays in
general and a low variance in the performance-related
features, so only the time-related features varied enough
to capture the relationship between Level A and Level B.
The results suggest that the average time spent on an
attempt in a level is the most significant predictor of whether a
student may need assistance in the next level. However, the
classifier models for each intervention type did not perform
significantly better than a baseline classifier that
predicts all the observations to be the Majority Class (the class
containing more students), as shown in Table 3. This
suggests that the relation between the behavior of students in
two consecutive levels may be highly varied and that it is
difficult to generalize whether an intervention is needed in a
level based on only one previous level. It may also suggest
that such a prediction may be dependent on how far along
students are in their academic year. To investigate the first
scenario, we added a feature for a student's previous
performance average, an average over every level attempt so far,
in an attempt to help distinguish low-performing students from
the rest. Previous performance average was selected for each
intervention type prediction but had low feature
importance (0.03%) because of its low variance and, therefore,
did not affect the prediction accuracy. To investigate whether the
time of the academic year had any impact on the
prediction, we added a feature to indicate the month in which
the sessions occurred. Similar to the previous performance
average, this feature was selected but had a low feature
importance (0.05%), leading to an insignificant difference in the
prediction accuracy.</p>
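      <p>The Majority Class baseline mentioned above is simple to compute, and it explains why percentile-based targets are hard to beat: a target that flags roughly the top quarter of observations gives a baseline accuracy near 75% by construction. The sketch below uses made-up labels, and the function name is hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Made-up labels: 1 = intervention flagged, 0 = not flagged.
# A 75th-percentile target flags about one quarter of the observations.
labels = [1] * 25 + [0] * 75
baseline = majority_class_accuracy(labels)
print(baseline)
```
]]></preformat>
      <p>A classifier therefore has to exceed this figure, not 50%, before it adds information for such a target.</p>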
      <p>Since only time-related features varied enough to
capture the variance in student behavior across two consecutive
levels, we explored ways other than feature generation
to perform the pairwise prediction. We sliced the data based
on aspects such as replay type or month of the year but,
again, obtained similar prediction accuracies; however, the
replay-related features did get selected and had high
importance for the prediction in the data sliced by replay type.
To investigate whether the variance in the content of the
objectives may be affecting the prediction results, we performed
the prediction for each intervention type within a single
objective and observed that the prediction accuracy decreased
slightly. This suggests that even within one objective, the
behavior of a student in one level, as captured at its
current granularity, may not accurately predict if they need an
intervention in the next level.</p>
      <p>The pairwise prediction models may not have generated
desirable results because we may need more than just one
previous level's data to predict if an intervention is needed.
There is not sufficient data about each level in this dataset to
accurately represent the student's performance and to
create good predictions. Therefore, having more fine-grained
details about level attempts, including knowing more about
how the levels compare to each other within each objective,
may improve the prediction accuracy.</p>
    </sec>
    <sec id="sec-10">
      <title>6. EXPERIMENT 2: LEVELS COMPLETED IN 20 MINUTES</title>
      <p>
        We chose to determine if we could predict the number of
levels completed in 20 minutes using only information from
the current session. Using only information from the
current session will allow an easier integration with the current
system with minimal changes needed. Different schools and
classrooms have unique ways of using ST Math [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. As a
result, there is a variety of session times ranging from very
short (less than 5 min) to sessions lasting over an hour, with
an average of 23 minutes spent in a session. Therefore, we
decided to predict how many levels a student would
complete in a 20 minute session. This information could be
used by teachers to identify students who will not be able
to complete the number of levels the teacher expects for
that session and the teacher can intervene to assist or
encourage. With this prediction, the system could provide a
teacher with each student's predictions and order the
students by the lowest predicted number of levels to complete
in the next 20 minutes. Then, the teacher can easily look at
the slowest students and make the judgment, based on their
knowledge of each student and what goals the teacher has for
that lesson, and determine who they need to assist. Studies
have shown that teachers may focus assistance on students
with better help-seeking behaviors because they are often
more persistent or better in requesting help[
        <xref ref-type="bibr" rid="ref20 ref4">4, 20</xref>
        ].
Providing this information early in the session could be crucial
for low-performing students who are not asking for help
or not doing so effectively.
      </p>
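      <p>The ordering step described above amounts to sorting students by their predicted number of levels, lowest first. The sketch below is illustrative only: the student identifiers, the predicted values, and the function name are all hypothetical.</p>
      <preformat><![CDATA[
```python
def rank_for_attention(predictions, top_n=None):
    """Order students from lowest to highest predicted levels completed.

    `predictions` maps a student identifier to the predicted number of
    levels completed in the next 20 minutes.
    """
    ranked = sorted(predictions.items(), key=lambda item: item[1])
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical model output for four students.
predicted_levels = {
    "student_a": 6.4,
    "student_b": 2.1,
    "student_c": 4.0,
    "student_d": 1.3,
}
watch_list = rank_for_attention(predicted_levels, top_n=2)
print(watch_list)
```
]]></preformat>
      <p>The teacher still makes the final call; the list only suggests where to look first.</p>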
    </sec>
    <sec id="sec-12">
      <title>6.1 Levels Completed Prediction: Methods</title>
      <p>
        The data we used consisted of 787,949 session observations
from 8,978 unique students, 111 schools, and 636 teachers. A
session represents a period of time that the student spends
working on ST Math without taking a break of 30 minutes or
longer (see Section 4 for the full definition). For accurate
predictions, we chose to use the first 6 minutes of gameplay because
the average level attempt takes approximately 3 minutes.
We refer to this segment of data used for prediction as a
time "slice". Since our goal was to use the least amount of
information to do the prediction, we wanted this time to be
as short as possible. We initially attempted to use shorter
time slices, but because a level attempt takes on average
3 minutes, this did not provide a sufficient amount of data
to represent the students' gameplay behavior and, in some
cases, eliminated slower students' data for that time slice.
Next, we removed sessions under 10 minutes (242,750 obs.,
169,863 of them under 6 min) and sessions over 75 minutes
(10,257 obs.). We chose these cutoffs to eliminate short
sessions where predictions would not be useful and long
sessions, in some cases over 4 hours, that were likely
anomalous.
      </p>
      <p>6.1.1 Levels Completed Prediction: Feature Generation</p>
      <p>
        From the level data, we extracted the first 6
minute time slice of each session for prediction. Features were aggregated
from this 6 minute time slice to capture what each student
was able to do, such as complete a level, fail a level, retry a
level, or engage in replay. The generated features are based
on performance, time, level attempts and replay, and the
objective the student was in within that slice. The
performance features were: average performance (numeric),
percentage of levels passed out of all attempts (numeric),
and percentage of levels completed out of all attempts
(numeric). The time features from the slice were: the total
time (numeric), the average level time (numeric), number
of passed levels per time (numeric, scaled), number of
completed levels per time (numeric, scaled), and the month of
the session. The level attempt features were: total
number of replays (numeric), total number of levels failed
(numeric), total number of levels passed (numeric), total
number of levels completed (numeric), total puzzles attempted
(numeric), total puzzles completed (numeric), whether they
engaged in replay (binary), and whether they re-attempted a
level (binary). Finally, there were 31 binary features
representing the objective the student was playing when the
session started.
      </p>
      <p>6.1.2 Levels Completed Prediction: Model Selection and Feature Selection</p>
      <p>
        For this prediction, we tried a variety of models, tuning of
these models, and alterations of the target variable. Model
and feature selection was accomplished using scikit-learn [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
For the models, we tried both classification and regression.
For classification, we tried multiple groupings of the number of
completed levels in 20 minutes, chosen to give balanced classes.
The best accuracy (77% using a 2-layer neural network)
came from a split that determines whether a student could complete at least
an average number of levels in 20 minutes. However, we
decided that regression, which provides finer-grained predictions,
would give more useful information to the teachers and
allow them more autonomy in deciding which
students need help. For regression, we tried to predict how
many levels a student could complete in 20 minutes, which
was derived by taking the total number of levels completed
and the total time of the session and scaling.
      </p>
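      <p>As a rough illustration of the slice features and the scaled regression target, consider the sketch below. It rests on several assumptions that the text does not confirm: each attempt is represented by the minute at which it ended and whether it was passed, and the target is obtained by linearly rescaling the session's completed levels to a 20 minute rate. All function and variable names are hypothetical.</p>
      <preformat><![CDATA[
```python
SLICE_MINUTES = 6.0    # prediction window described in the text
TARGET_MINUTES = 20.0  # horizon of the regression target


def slice_features(attempts, slice_minutes=SLICE_MINUTES):
    """Aggregate a few features from attempts that ended inside the slice.

    `attempts` is a list of (end_minute, passed) tuples, where `end_minute`
    is minutes since the session started (an assumed representation).
    """
    in_slice = [(t, passed) for t, passed in attempts if t <= slice_minutes]
    n_attempts = len(in_slice)
    n_passed = sum(1 for _, passed in in_slice if passed)
    return {
        "attempts": n_attempts,
        "levels_passed": n_passed,
        "levels_failed": n_attempts - n_passed,
        "pass_rate": n_passed / n_attempts if n_attempts else 0.0,
    }


def levels_per_target(levels_completed, session_minutes, target=TARGET_MINUTES):
    """Linearly rescale a session's completed levels to a 20 minute rate."""
    if session_minutes <= 0:
        raise ValueError("session_minutes must be positive")
    return levels_completed * target / session_minutes


# Made-up session: five attempts over 25 minutes, four of them passed.
session = [(2.5, True), (5.5, False), (9.0, True), (14.0, True), (25.0, True)]
features = slice_features(session)
target = levels_per_target(levels_completed=4, session_minutes=25.0)
print(features, target)
```
]]></preformat>
      <p>Here only the first two attempts fall inside the 6 minute slice, and four levels in 25 minutes rescale to 3.2 levels per 20 minutes.</p>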
      <p>We tried multiple models, including decision trees, neural
networks, and random forests, after normalizing the features.
Interpretable machine learning methods are preferable here
because knowing which features are more influential in
predicting performance can give insight into how students learn
in games. In the results, we show the two best models
compared to a baseline. The baseline model always
predicts the mean number of levels completed per 20 minutes.
The best models were found by testing multiple models
and fine-tuning their parameters. Both are
built with scikit-learn: a 3-layer (50, 30, 20) neural
network (MLPRegressor) using a learning rate of 0.001 and
the ReLU activation function, and a random forest
(RandomForestRegressor) using mean squared error as the criterion
function and a minimum of 20 samples required for a
split.</p>
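      <p>A configuration along these lines might look like the sketch below. It runs on synthetic stand-in data, not the ST Math logs. Several details are assumptions: that (50, 30, 20) refers to three hidden layers, that standardization is the normalization used, and the values of max_iter and random_state, which are not stated in the text. The criterion argument is left at its default, which is squared error for RandomForestRegressor.</p>

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the ST Math logs): 300 sessions, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 4.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=300)

models = {
    # Baseline: always predict the mean of the training targets.
    "baseline": DummyRegressor(strategy="mean"),
    # Three hidden layers of 50, 30, and 20 units, ReLU, learning rate 0.001.
    "neural_net": make_pipeline(
        StandardScaler(),  # assumption: the normalization method is not named
        MLPRegressor(hidden_layer_sizes=(50, 30, 20), activation="relu",
                     learning_rate_init=0.001, max_iter=500, random_state=0),
    ),
    # The default split criterion of RandomForestRegressor is squared error.
    "random_forest": RandomForestRegressor(min_samples_split=20, random_state=0),
}

# Mean absolute error averaged over 10 cross-validation folds.
mae = {
    name: -cross_val_score(model, X, y, cv=10,
                           scoring="neg_mean_absolute_error").mean()
    for name, model in models.items()
}
for name, value in mae.items():
    print(f"{name}: MAE = {value:.2f}")
```

      <p>Placing the scaler inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into the normalization.</p>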
      <p>To evaluate each model, we used the following metrics: Mean
Absolute Error (MAE), Root Mean Squared Error (RMSE),
adjusted R-squared score, and explained variance (EV). We
chose these metrics to evaluate how accurate each prediction
was on average and to determine whether the error is small
enough to still provide a good estimate of the students'
projected progress. Both R2 and EV were used to evaluate the
variance of these errors and to check for biases within the
models. All models were evaluated with 10-fold cross-validation.
We attempted feature selection using scikit-learn filter
methods, such as feature importance from tree regression, and a
wrapper method (SelectFromModel) with each model.
Feature selection did not improve any model and resulted in
significantly worse predictions in most cases. This is most
likely due to the limited number of features available.
Because our data are not fine-grained, we have a limited amount
of information about each student for a level attempt, which
suggests that each feature could be providing key
information about the student's current progress. Therefore, the best
models use the whole feature set (see Feature
Generation).</p>
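      <p>For reference, the four metrics can be computed as in the sketch below. It uses the standard textbook definitions and only the Python standard library. The function name, the toy values, and the adjusted R-squared formula (the usual correction for the number of features) are assumptions for illustration; the text does not state how the scores were computed.</p>

```python
import math
from statistics import fmean, pvariance


def regression_report(actual, predicted, n_features):
    """Return MAE, RMSE, R2, adjusted R2, and explained variance."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = fmean(abs(e) for e in errors)
    rmse = math.sqrt(fmean(e * e for e in errors))
    var_y = pvariance(actual)
    r2 = 1.0 - fmean(e * e for e in errors) / var_y
    # Adjusted R2 penalizes the score for the number of features used.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    # Explained variance ignores any constant offset (bias) in the errors.
    ev = 1.0 - pvariance(errors) / var_y
    return {"mae": mae, "rmse": rmse, "r2": r2, "adj_r2": adj_r2, "ev": ev}


# Toy values: every prediction is exactly half a level too high.
report = regression_report([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5], n_features=1)
print(report)
```

      <p>In this toy case the constant half-level bias lowers R2 but leaves EV at 1, which is why comparing the two scores is a simple check for systematic over- or under-prediction.</p>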
    </sec>
    <sec id="sec-14">
      <title>6.2 Levels Completed Prediction: Results &amp; Discussion</title>
      <p>This section discusses the results of the Experiment 2
regressions.</p>
      <p>
        Table 5 shows the results of the evaluation metrics for the
two best regression models compared to the baseline model.
The neural network and the random forest perform similarly,
and both outperform the baseline model. Although the difference
in MAE is not large, the RMSE is much lower.
This indicates that the variance of the errors is markedly
smaller for our predictive models. The MAE of 1.2 for our
predictive models means that, on average, the prediction will
be only around 1 level off for a specific student, which still
provides a good estimate for the teacher to use. The
adjusted R-squared and explained variance are almost identical
for both models, which happens when the mean of the errors
approaches zero. Although these scores are not perfect,
in the context of educational data from a system used with
multiple teaching styles, this is a highly meaningful result [
        <xref ref-type="bibr" rid="ref1 ref17">1,
17</xref>
        ].
Table 6 shows the top five most important features for
decisions in the random forest. All of these top features focus
on the number of completed levels, the total time, or a
combination of the two. This is not surprising, because the
number of levels a student can complete in the first 6
minutes should be a good indication of how they will perform
over the whole session. However, this assumes that the
students remain seated and keep playing the game in the same
manner.
      </p>
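      <p>The link between the two scores and the mean error can be made explicit. Writing e for the prediction errors, with mean \bar{e}, and using population variances, the standard definitions give the relation below. This is stated for the unadjusted R-squared; the adjusted score differs only by a correction factor that is close to 1 when the number of sessions is large relative to the number of features.</p>

```latex
\[
\mathrm{EV} = 1 - \frac{\operatorname{Var}(e)}{\operatorname{Var}(y)},
\qquad
R^2 = 1 - \frac{\operatorname{Var}(e) + \bar{e}^{\,2}}{\operatorname{Var}(y)}
\quad\Longrightarrow\quad
R^2 = \mathrm{EV} - \frac{\bar{e}^{\,2}}{\operatorname{Var}(y)}
\]
```

      <p>The two scores therefore coincide exactly when the mean error is zero, and a gap between them signals systematic over- or under-prediction.</p>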
      <p>Figure 2 shows a density plot of the predicted values versus
the actual values, the yellow/lightest color being the highest
density. Both figures show that the highest-density areas occur
close to the actual values. The neural network appears
to have a higher density closer to the line, and its points
appear to be more compact, although both models show
similar predictions. Both figures are zoomed in to focus on
the lower level-number predictions, although few points have
values higher than 10. Note that both models fit the higher
values less well and tend to predict around 10 once the
actual value is 10 or more. However, we are mostly concerned with
students who are completing very few levels. If a student
falls into the 10+ range of levels completed, the actual value
becomes less important because it is so far above the
average. A teacher will still be able to use this information to
identify over-performing students and ensure they don't get
too far ahead of the class.</p>
      <p>
        Given the variability in how this system is used, the model
evaluations are a positive result. For example, during field
observations of the system, we found that many teachers asked
students who were ahead in the curriculum to help students next to
them during a gameplay session. Therefore, a student may
spend part of the session working as normal; then, after the
teacher has identified a struggling student, the teacher may
ask the student next to them to help. This could result in
much higher predicted values than what the student
actually completes. Furthermore, we observed students in some
classrooms initially talking and working at a slower pace in
the first few minutes of a session as they settled in, then
being asked to focus shortly afterward. This could result in much
lower predictions of the number of levels that
student could complete. Since the data do not include only
the sessions where students quietly work by themselves for a
continuous period of time, accurate predictions are difficult.
      </p>
      <p>
Furthermore, the data we used were limited to same-day
gameplay data and contained no information about
how a student had previously performed in other sessions.
This decision was made to limit the changes required to
implement this system in the game. However, including prior
information may improve predictions. One possible way to
control for the effect of the different teaching styles is to
include teacher or school information in the model. However,
this would create very sparse features due to the large
number of teachers and schools that use the system. A future
attempt could identify and categorize the teachers or schools
based on similar styles and add those features to the models.
      </p>
      <p>
This prediction can be used in two main ways: identifying
the lowest-performing students who may need assistance
but may not be requesting help, and identifying students
who may be working too fast and getting ahead in the
curriculum. The second case may not seem like an issue, but
a large knowledge gap between students may make
a classroom harder to manage and teach. Teachers using
ST Math seek to avoid this problem, and they have
addressed it by asking those students to help others or by allowing
them to play games while others catch up [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. For ease of
use, these predictions could be provided in a simple list with
each student's name and the predicted number of levels they
will complete in 20 minutes. Furthermore, the 5 lowest
and 5 highest predictions could be presented at the top of the
interface so teachers could quickly get an idea of who is
struggling and who may need to be slowed down. Because
teachers already have access to where each student is in the
curriculum, they can use their expertise and
knowledge of the students to make judgment calls on what to do
from there. A mock interface of how this could be presented
can be seen in Figure 3.
      </p>
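      <p>The ranked list described above could be produced with very little logic. The following sketch is illustrative only: the function name, the student names, and the predicted values are hypothetical, and it assumes the per-student predictions are already available as a mapping from name to predicted levels per 20 minutes.</p>

```python
def rank_students(predictions, k=5):
    """Sort students by predicted levels and pick the k lowest and k highest."""
    if not k > 0:
        raise ValueError("k must be a positive integer")
    ranked = sorted(predictions.items(), key=lambda item: item[1])
    lowest = ranked[:k]           # may need help
    highest = ranked[-k:][::-1]   # may be getting ahead; highest first
    return ranked, lowest, highest


# Hypothetical predicted levels per 20 minutes for a small class
predicted = {"Ana": 2.1, "Ben": 7.4, "Cy": 4.0, "Dee": 11.2, "Eli": 3.3, "Fay": 5.8}
ranked, lowest, highest = rank_students(predicted, k=2)
print([name for name, _ in lowest])   # ['Ana', 'Eli']
print([name for name, _ in highest])  # ['Dee', 'Ben']
```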
    </sec>
    <sec id="sec-15">
      <title>7. OVERALL DISCUSSION</title>
      <p>The results for the levels completed experiment were more
promising than those for the pairwise experiment. For the pairwise
prediction, the lack of fine-grained puzzle-level data made
it difficult to predict whether a student may need
intervention based only on their previous level's data. We believe
the results for this method of pairwise prediction might
improve with more data about how the objectives, games, and
levels relate to each other. On the other hand, the
prediction model from the levels completed experiment had decent
results, with the MAE and RMSE indicating that the
predictions are generally within 1-2 levels of the actual completed
levels for the 20-minute time period. Having additional
information, including finer-grained puzzle-level data, should
also improve this prediction.</p>
      <p>Predicting the number of levels each student will complete
allows us to give the teachers a list of the
students ranked by that predicted number.
This allows the teachers to use their expertise to
distinguish the higher- and lower-performing students
during that game session, and, importantly, the teachers keep
the ability to make judgments about interventions
at their discretion. Currently, the teachers only have
information on student progress in the overall game
curriculum (which objectives each student has finished and how
many levels have been completed). Additionally, the only
method currently used to support students in seeking help
is the raised hand indicator, which has been shown to not
always get the teachers' attention due to its location on the
students' screens. We believe that incorporating this
prediction into the system will be a valuable tool for teachers:
it will suggest which students are struggling and allow the teachers
to decide whether those students need intervention. Giving teachers these
suggestions after only 6 minutes of gameplay time means
that the teachers will have more control over classroom
progress. They will have more time to help students
get back on track, rather than leaving them behind for the entire
session, and will be able to slow down students who are getting too
far ahead of the class.</p>
    </sec>
    <sec id="sec-16">
      <title>7.1 Limitations</title>
      <p>To reduce the amount of time spent processing the data, we used a
representative subset for the pairwise prediction. However,
we compared multiple numerical and categorical features
between this subset and the entire dataset and determined that
it contained almost identical distributions of data points.
We created histograms for the distributions of performance,
level play time, levels in session, time of session, and session
performance, and compared the number of schools and teachers
represented in the subset to the totals. We were only
missing 6 out of 111 schools, and we had students from almost
half of the teachers (291 out of 636) included in our subset.
We do not have fine-grained interaction data, which means
we cannot tell exactly how many puzzles a student gets
wrong. This lack of information causes our data to be skewed
toward performance scores of 100%, without
capturing the full gameplay. However, there are other features
that we can use to tease out this information, such as level time:
students who pass a level while also getting puzzles wrong
will most likely take longer because they are doing more
problems. We have finer-grained puzzle-level data, but it
does not match up accurately with our level data. This
means that while we can do studies on these datasets
separately, we cannot combine them to get the full picture of
what a student is doing during the level: which puzzles they
see, whether any puzzles are repeated during a level, how many
puzzles they get right and wrong, and the time spent on each
individual puzzle in a level. These finer granularities could offer
valuable information on what a student is doing during a
level and on their performance compared to the whole student
set.</p>
    </sec>
    <sec id="sec-17">
      <title>8. CONCLUSION</title>
      <p>This study aimed to use the least amount of student
gameplay data possible to predict which students would benefit
from teacher intervention during the remainder of the
gameplay session. We tried two granularities of prediction for
our analysis. We hypothesized that we could use one level's
data (an average of 3.5 minutes of gameplay) to predict the
next level's outcomes, as this controls for content and
difficulty, but this hypothesis was not confirmed. The lack of
fine-grained level attempt data may be what prevents
a good prediction. Our second hypothesis was that we could
use the first 6 minutes of gameplay (about 2 levels) to
predict how many levels the student could complete in the next
20 minutes. This had a reasonable outcome, with an MAE
of 1.2 and an RMSE of 1.6, meaning that, on average,
the prediction is only off by 1-2 levels, which is a good
estimate of how many levels a student will complete. We
believe this can provide a valuable resource for the teachers
who use ST Math in their classrooms, helping them
concentrate their time and energy on the students who need it
the most. Furthermore, this method leaves the teachers
a degree of judgment about who needs
assistance, which is imperative in a system that is used in
multiple styles. Future work could investigate how giving
this information to teachers affects the students'
performance.</p>
    </sec>
    <sec id="sec-18">
      <title>9. ACKNOWLEDGMENTS</title>
      <p>This research is made possible by support of the National
Science Foundation under Grant No. 1544273.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Abelson</surname>
            ,
            <given-names>R.P.:</given-names>
          </string-name>
          <article-title>A variance explanation paradox: when a little is a lot</article-title>
          .
          <source>Psychological bulletin 97(1)</source>
          ,
          <volume>129</volume>
          (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Ahadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lister</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haapala</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vihavainen</surname>
            ,
            <given-names>A.:</given-names>
          </string-name>
          <article-title>Exploring machine learning methods to automatically identify students in need of assistance</article-title>
          .
          <source>In: Proceedings of the eleventh annual International Conference on International Computing Education Research</source>
          . pp.
          <volume>121</volume>
          –
          <fpage>130</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Backlund</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendrix</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Educational games - are they worth the effort? A literature survey of the effectiveness of serious games</article-title>
          . In:
          <year>2013</year>
          <article-title>5th international conference on games and virtual worlds for serious applications (VS-GAMES)</article-title>
          . pp.
          <volume>1</volume>
          –
          <issue>8</issue>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Calarco</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>"I need help!" Social class and children's help-seeking in elementary school</article-title>
          .
          <source>American Sociological Review</source>
          <volume>76</volume>
          (
          <issue>6</issue>
          ),
          <volume>862</volume>
          –
          <fpage>882</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Good</surname>
            ,
            <given-names>T.L.</given-names>
          </string-name>
          :
          <article-title>Which pupils do teachers call on?</article-title>
          <source>The Elementary School Journal</source>
          <volume>70</volume>
          (
          <issue>4</issue>
          ),
          <volume>190</volume>
          –
          <fpage>198</fpage>
          (
          <year>1970</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Holstein</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tegene</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McLaren</surname>
            ,
            <given-names>B.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aleven</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The classroom as a dashboard: co-designing wearable cognitive augmentation for k-12 teachers</article-title>
          .
          <source>In: Proceedings of the 8th International Conference on Learning Analytics and Knowledge</source>
          . pp.
          <volume>79</volume>
          –
          <fpage>88</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Holstein</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McLaren</surname>
            ,
            <given-names>B.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aleven</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Intelligent tutors as teachers' aides: exploring teacher needs for real-time analytics in blended classrooms</article-title>
          .
          <source>In: Proceedings of the seventh international learning analytics &amp; knowledge conference</source>
          . pp.
          <volume>257</volume>
          –
          <fpage>266</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schenke</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warschauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Dowd</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Predicting mooc performance with week 1 behavior</article-title>
          . In:
          <article-title>Educational data mining</article-title>
          <year>2014</year>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Karumbaiah</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shute</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Predicting quitting in students playing a learning game</article-title>
          .
          <source>In: EDM</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popovic</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Learning individual behavior in an educational game: a data-driven approach</article-title>
          .
          <source>In: Educational Data Mining</source>
          <year>2014</year>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandel</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butler</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andersen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Rourke</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brunskill</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popovic</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Predicting player moves in an educational game: A hybrid approach</article-title>
          . In: EDM. pp.
          <volume>106</volume>
          –
          <fpage>113</fpage>
          .
          <string-name>
            <surname>Citeseer</surname>
          </string-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cody</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnes</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lynch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rutherford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The antecedents of and associations with elective replay in an educational game: Is replay worth it</article-title>
          ? In: EDM (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Peddycord-Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Catete</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnes</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lynch</surname>
            ,
            <given-names>C.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rutherford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A field study of teachers using a curriculum-integrated digital game</article-title>
          .
          <source>In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems</source>
          . p.
          <fpage>428</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Peddycord-Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cody</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kessler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnes</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lynch</surname>
            ,
            <given-names>C.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rutherford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Using serious game analytics to inform digital curricular sequencing: What math objective should students play next?</article-title>
          <source>In: Proceedings of the Annual Symposium on Computer-Human Interaction in Play</source>
          . pp.
          <volume>195</volume>
          –
          <fpage>204</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Peddycord-Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harred</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karamarkovich</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnes</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lynch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rutherford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Learning curve analysis in a large-scale, drill-and-practice serious math game: Where is learning support needed?</article-title>
          <source>In: International Conference on Artificial Intelligence in Education</source>
          . pp.
          <volume>436</volume>
          –
          <fpage>449</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , et al.:
          <article-title>Scikit-learn: Machine learning in python</article-title>
          .
          <source>Journal of machine learning research 12(Oct)</source>
          ,
          <volume>2825</volume>
          –
          <fpage>2830</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Prentice</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          :
          <article-title>When small effects are impressive</article-title>
          .
          <source>Psychological bulletin 112(1)</source>
          ,
          <volume>160</volume>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ventura</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Educational data mining: a review of the state of the art</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</source>
          <volume>40</volume>
          (
          <issue>6</issue>
          ),
          <fpage>601</fpage>
          –
          <lpage>618</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Rutherford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farkas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duncan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burchinal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kibrick</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graham</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richland</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duran</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , et al.:
          <article-title>A randomized trial of an elementary school mathematics software intervention: Spatial-temporal math</article-title>
          .
          <source>Journal of Research on Educational Effectiveness</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ),
          <fpage>358</fpage>
          –
          <lpage>383</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Ryan</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gheen</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Midgley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Why do some students avoid asking for help? An examination of the interplay among students' academic efficacy, teachers' social–emotional role, and the classroom goal structure</article-title>
          .
          <source>Journal of Educational Psychology</source>
          <volume>90</volume>
          (
          <issue>3</issue>
          ),
          <fpage>528</fpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Sabourin</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shores</surname>
            ,
            <given-names>L.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mott</surname>
            ,
            <given-names>B.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lester</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          :
          <article-title>Understanding and predicting student self-regulated learning strategies in game-based learning environments</article-title>
          .
          <source>International Journal of Artificial Intelligence in Education</source>
          <volume>23</volume>
          (
          <issue>1-4</issue>
          ),
          <fpage>94</fpage>
          –
          <lpage>114</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Skinner</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belmont</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Motivation in the classroom: Reciprocal effects of teacher behavior and student engagement across the school year</article-title>
          .
          <source>Journal of Educational Psychology</source>
          <volume>85</volume>
          (
          <issue>4</issue>
          ),
          <fpage>571</fpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Wolff</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zdrahal</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herrmannova</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuzilek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hlosta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Developing predictive models for early detection of at-risk students on distance learning modules</article-title>
          .
          <source>In: Machine Learning and Learning Analytics Workshop at The 4th International Conference on Learning Analytics and Knowledge (LAK14)</source>
          . pp.
          <fpage>24</fpage>
          –
          <lpage>28</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Wolff</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zdrahal</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pantucek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment</article-title>
          .
          <source>In: Proceedings of the third international conference on learning analytics and knowledge</source>
          . pp.
          <fpage>145</fpage>
          –
          <lpage>149</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>