<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting Challenge Outcomes for Students in a Digital Game for Learning Genetics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ziwei Wu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bradford Mott</string-name>
          <email>bwmott@ncsu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wookhee Min</string-name>
          <email>wmin@ncsu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Taylor</string-name>
          <email>rgtaylor@ncsu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danielle Boulden</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Trudi Lord</string-name>
          <email>tlord@concord.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frieda Reichsman</string-name>
          <email>freichsman@concord.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chad Dorsey</string-name>
          <email>cdorsey@concord.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Wiebe</string-name>
          <email>wiebe@ncsu.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Lester</string-name>
          <email>lester@ncsu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Concord Consortium</institution>
          ,
          <addr-line>Concord, MA, 01742</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, North Carolina State University</institution>
          ,
          <addr-line>Raleigh, NC, 27695</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of STEM Education, North Carolina State University</institution>
          ,
          <addr-line>Raleigh, NC, 27695</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, digital games for learning have shown significant potential for creating engaging and effective student learning experiences. A common gameplay design used by many digital games for learning is providing students with a series of challenges featuring varying levels of difficulty. Identifying whether students will struggle on certain challenges is a key task in these environments because it could support adaptively adjusting difficulty levels and providing immediate assistance to students. In this paper, we present a data-driven approach to modeling students' gameplay behaviors with challenges in an open-ended learning environment for introductory genetics. Challenge outcome prediction models utilize students' observed gameplay behaviors with previous challenges to classify students' performance on the next challenge into two categories: quit or complete. We build machine learning models for predicting students' gameplay performance by taking advantage of a corpus of 633 students' in-game behaviors from Geniventure, a digital game for learning genetics. We compare the accuracy of the models to gain insights into which models perform best for this prediction task. Results show that support vector machine (SVM) models produce the overall best performance in predicting gameplay outcomes for challenges.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Recent years have seen growing interest in digital games for
learning because of their potential for creating engaging and
effective student learning experiences [7][
        <xref ref-type="bibr" rid="ref28">27</xref>
        ]. Researchers have
investigated digital games for learning in a wide array of
educational domains, including mathematics [
        <xref ref-type="bibr" rid="ref29">28</xref>
        ][
        <xref ref-type="bibr" rid="ref19">18</xref>
        ],
computational thinking [4][
        <xref ref-type="bibr" rid="ref15">14</xref>
        ], and science [
        <xref ref-type="bibr" rid="ref31">30</xref>
        ][
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref13">12</xref>
        ].
Well-designed digital games for learning must carefully balance students’
engagement in the gameplay experience with a focus on the overall
learning objective [
        <xref ref-type="bibr" rid="ref13">12</xref>
        ]. A common gameplay design used by many
digital games for learning is providing students with a series of
challenges featuring varying levels of difficulty.
      </p>
      <p>
        Successful game-based learning experiences that simultaneously
promote engagement while improving learning outcomes are
possible, but they must be carefully designed [
        <xref ref-type="bibr" rid="ref30">29</xref>
        ]. Researchers
suggest that well-designed educational games should provide
students with just-in-time support while focusing their gameplay
at the edge of their abilities, ensuring students remain challenged
throughout the learning experience [9]. To achieve this goal, digital
games for learning should have the ability to detect when students
are struggling and take action to tailor their learning experience to
provide appropriate levels of challenge [
        <xref ref-type="bibr" rid="ref33">31</xref>
        ]. With recent advances
in machine learning techniques, data-driven approaches using
students’ in-game behaviors have enabled the automatic
assessment of students’ evolving competence [
        <xref ref-type="bibr" rid="ref26">25</xref>
        ][
        <xref ref-type="bibr" rid="ref37">35</xref>
        ] and the
modeling of important learning phenomena, including mind
wandering [6][
        <xref ref-type="bibr" rid="ref16">15</xref>
        ] and wheel spinning [3]. One interesting avenue
of research in student modeling is to examine students’ quitting
behaviors associated with negative learning outcomes [
        <xref ref-type="bibr" rid="ref25">24</xref>
        ][
        <xref ref-type="bibr" rid="ref18">17</xref>
        ]. It
is particularly important to design robust predictive models for
students’ quitting behaviors, since a digital game with this
functionality can, in advance, steer students away from undertaking a
challenge that is beyond their capabilities at that moment.
The goal of this paper is to detect whether a student is likely to quit
an upcoming challenge in a digital game for learning. In this work,
we present a data-driven approach to modeling students’ quitting
behaviors in an open-ended digital game for learning genetics,
Geniventure [
        <xref ref-type="bibr" rid="ref24">23</xref>
        ]. In Geniventure, students learn genetics by
engaging in challenges of varying difficulty levels. Within the
game, although students are encouraged to solve challenges in a
linear manner, they can autonomously choose which challenge to
play as well as leave a challenge prior to completing it. During
gameplay, students’ gameplay trajectory and their detailed in-game
actions were recorded as trace data logs. Fine-grained and
descriptive features were engineered to effectively capture salient
learning trajectories that are useful for predicting quitting behaviors
per challenge. To incorporate students’ gameplay trajectory
information into the model, n-gram features, each a contiguous
subsequence of n actions from a longer action sequence,
were employed. A suite of machine-learned predictive models was
trained using the extracted features to better understand which
approach offers the best predictive performance. We compare the
performance of the different machine learning algorithms under
two different feature sets in order to gain insights into which
learning algorithms and features perform best for this task.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        Engagement is a key component of successful learning. Detecting
disengagement has been of great interest to educational researchers
[6][
        <xref ref-type="bibr" rid="ref16">15</xref>
        ][
        <xref ref-type="bibr" rid="ref25">24</xref>
        ][
        <xref ref-type="bibr" rid="ref18">17</xref>
        ]. Engagement is often viewed as encompassing
three primary components: emotional, behavioral, and cognitive [8].
Disengagement detectors usually target elements of these three
components, by making inferences about students’ emotional or
cognitive state based on their behaviors. In contrast to models of
detecting disengagement whose ground truth labels are collected by
humans in various manners such as field observations, self-reports,
and retrospective judgement, students’ quitting behaviors can be
directly identified from learning environment trace data [2].
There is considerable research on predicting students’ quitting
behaviors in the context of massive open online courses (MOOCs).
In MOOCs, students’ quitting behavior is usually referred to as
dropout, which indicates situations where a student registers for a
course and makes initial effort on the course activities but
eventually quits before completing the course [
        <xref ref-type="bibr" rid="ref22">21</xref>
        ]. Much of the
work on dropout prediction in MOOCs focuses on developing
features from students’ behaviors and engagement patterns to help
improve prediction [
        <xref ref-type="bibr" rid="ref14">13</xref>
        ]. For instance, Kloft et al. predict dropout
from only click-stream data using a support vector machine (SVM)
[
        <xref ref-type="bibr" rid="ref21">20</xref>
        ]. Halawa et al. study early dropout prediction using student
activity features capturing lack of ability or interest [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ]. Taylor et
al. employed crowd-sourced feature engineering from raw trace
data collected from thousands of students to predict dropout using
logistic regression [
        <xref ref-type="bibr" rid="ref35">33</xref>
        ]. In more recent studies, deep learning
techniques have been employed, achieving high performance on
predicting dropout by taking advantage of large amounts of student
data. Yuntao et al. proposed a composite model to infer students’
dropout behaviors based on a historical log of their learning
activities, including interaction with video lectures, participation in
discussion forums, and performance on assignments. The authors
employed a stacked sparse autoencoder model combined with a
recurrent neural network model to learn high-level representations
of input features and implemented an SVM for final classification
of dropout [
        <xref ref-type="bibr" rid="ref22">21</xref>
        ].
      </p>
      <p>
        Beyond dropout prediction in MOOCs, previous work has also
explored various data-driven approaches to predicting students’
quitting behavior in learning environments. Mills et al. developed
detectors to predict students’ behavioral disengagement through
their quitting behaviors while reading instructional texts.
Supervised machine learning algorithms were used to predict
whether students would quit reading an upcoming text based on
features extracted from reading behaviors on previous texts [
        <xref ref-type="bibr" rid="ref25">24</xref>
        ].
Karumbaiah et al. presented a quitting prediction model for
students playing an educational game called Physics Playground.
Gradient boosting classifiers were trained using a set of engineered
features from students’ interaction data. The features were of
different levels of granularity and used to train both level-specific
models and level-agnostic models to predict students’ quitting
behavior on levels within the game. Level-agnostic models were
found to provide better predictive performance [
        <xref ref-type="bibr" rid="ref18">17</xref>
        ]. Similar to the
digital learning environment in [
        <xref ref-type="bibr" rid="ref18">17</xref>
        ], our digital learning
environment also contains different challenge levels. In our work,
we build a single integrated model to predict students’ quitting
behaviors for all challenges. Different from the level-agnostic
model in [
        <xref ref-type="bibr" rid="ref18">17</xref>
        ], our model uses n-gram features to incorporate
students’ historical gameplay information, while they calculated
accumulated features to summarize historical data. Our work also
investigates the performance of different machine learning
algorithms for this task.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. EXPERIMENTAL SETUP</title>
      <p>We investigate our approach of modeling students’ outcomes on
challenges with data collected from high school and middle school
students. In this section, we describe the digital game for learning,
its problem-solving challenges, and the dataset generated from
students’ interactions with the learning environment.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1 Geniventure</title>
      <p>
        Geniventure is a digital game environment developed for middle
school and high school students (11 ~ 18 years old) to learn genetics.
The design of the game was guided by core ideas in genetics and
science practices aligned with the Next Generation Science
Standards [
        <xref ref-type="bibr" rid="ref34">32</xref>
        ], a set of science education standards developed in
the United States. In the game, students learn concepts in genetics
by completing problem-solving challenges centered around
breeding dragons [
        <xref ref-type="bibr" rid="ref24">23</xref>
        ]. The game consists of 6 levels and over 60
problem-solving challenges of varying levels of difficulty. Each
challenge is designed around one or more genetic concepts with the
same concept potentially appearing across multiple challenges.
Problem-solving challenges within the game appear in a variety of
types. When students launch the game, they have the option to
decide which challenge to begin with and they are free to quit a
challenge at any time during the game. If students finish a challenge,
the game rewards them with a colored crystal based on their
efficiency in solving the challenge. Students can then decide
whether to try the same challenge again or move on to another
challenge.
      </p>
      <p>The goal of this work is to build models that can accurately predict
students’ outcomes on a problem-solving challenge before they
begin the challenge. We focus on challenges from the first two
levels in Geniventure, which test four fundamental concepts in
genetics: simple dominance, recessive traits, sex determination,
and genotype-to-phenotype mapping. These four concepts are
critical for students to understand more complex genetic
phenomena covered in later challenges. In this work, four distinct
challenges are denoted as Challenge A, Challenge B, Challenge C,
and Challenge D, respectively (Figure 1). Each of these challenges
covers all four fundamental concepts. Because of different
task settings, the challenges differ with respect to the
difficulty of solving them. We observed that Challenge A and
Challenge B are relatively easier than Challenge C and Challenge
D based on students’ success rate of completing the challenges
(Table 1).</p>
      <p>In Challenge A and Challenge B (Figure 1, Top), students are
shown a target dragon with certain traits on the right side of the
screen. On the left side of the screen, the game provides students
with options to manipulate the alleles of the dragon they are
creating. Students have options to set alleles to a dominant gene or
recessive gene that determine the traits of their dragon. The goal of
these two challenges is to create a dragon with the same traits as the
target dragon. Both Challenge A and Challenge B follow this
mechanic, but the visibility of the dragon being created varies
between the two. In Challenge A, students immediately see the
changes to the dragon they are creating as the alleles are
manipulated. However, in Challenge B, the dragon they are
creating is hidden until students have selected the alleles and
request the dragon to be hatched. To successfully complete these
problem-solving challenges, students must understand several
genetic concepts and be able to infer the phenotype of their dragon
from its genotype. At the start of the challenge, the game randomly
generates an initial set of alleles that require the student to make at
least one selection for each allele to achieve the target trait. The
“Moves Left” indicator in the lower right corner of the game’s
display is initialized with the minimum number of allele changes
needed to generate the target dragon from the initial configuration
of alleles. The indicator will decrement each time a student makes
a change to the alleles. Once students feel they have the correct
genotype, they click the “Check” or “Hatch” button to submit their
answer. If the dragon they create matches the target dragon, the
challenge is successfully completed. Otherwise, the game provides
the student with feedback and allows them to continue to make
further changes to the alleles until they quit or successfully
complete the challenge. In Challenge C and Challenge D, students
must sort eggs into the correct basket based on their traits. Students
can receive information about the genotype of each egg using the
scope on the right side of the screen.</p>
      <sec id="sec-4-1">
        <title>Challenge C</title>
      </sec>
      <sec id="sec-4-2">
        <title>Challenge D</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3.2 Dataset</title>
      <p>In this work, we analyzed data from 654 students (299 female, 305
male, and 50 unreported) from seven high schools (six public
schools, one private school) and one public middle school located
along the middle to northern Atlantic coast of the United States.
Among the students, 100 of them reported being in 6th to 8th grade,
544 reported being in 9th to 12th grade, and 10 students did not
report their grade level. This data was collected during a
teacher-led classroom implementation of Geniventure where students
played the game during class over the course of several days.
Before playing the game, students took a pre-test consisting of 24
questions related to the genetic concepts covered in the game. Five
of these questions assessed the genetic concepts in the four types of
challenges described earlier. Once gameplay concluded, students
took a post-test which was identical to the pre-test. Both the
pre-test and post-test were online surveys accessible through the same
online portal as the game. We focus on students’ performance on
the five questions aligned to the four previously identified concepts
being examined. Results from a paired t-test on students’
knowledge pre-test (M=2.971, SD=1.52) and post-test (M=3.878,
SD=1.40) revealed a significant improvement from pre-test to
post-test (t(653) = 15.85, p &lt; 0.001, Cohen’s d = 0.621).</p>
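      <p>For readers who wish to reproduce this kind of comparison, a minimal sketch of a paired t-test and a paired-samples Cohen’s d (one common convention: mean difference divided by the standard deviation of the differences) is shown below; the score arrays are hypothetical placeholders, not the study data.</p>
      <preformat>
# Sketch of the pre/post comparison (hypothetical per-student score arrays).
import numpy as np
from scipy import stats

pre = np.array([2, 3, 1, 4, 3, 2])    # pre-test scores (illustrative only)
post = np.array([3, 4, 2, 5, 4, 3])   # post-test scores (illustrative only)

t_stat, p_value = stats.ttest_rel(post, pre)   # paired t-test
diff = post - pre
cohens_d = diff.mean() / diff.std(ddof=1)      # paired-samples effect size
print(t_stat, p_value, cohens_d)
      </preformat>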
    </sec>
    <sec id="sec-6">
      <title>4. METHODS</title>
      <p>We first describe students’ quitting behaviors that occur in the
game and then discuss our feature engineering process.</p>
    </sec>
    <sec id="sec-7">
      <title>4.1 Students’ Quitting Behaviors</title>
      <p>We define “quitting a challenge” as any time a student leaves
a challenge without successfully completing it. In Geniventure, a
challenge is considered successfully completed only when the
crystal awarding screen appears. For Challenge A and Challenge B,
students will be directed to the crystal awarding screen only after
they reach a correct answer. For Challenge C and Challenge D, it is after
they sort each of the 8 eggs into a basket, regardless of whether each
basket-egg match is correct. We identified three types of
quitting behaviors: (1) A student starts a challenge but leaves the
challenge before making any moves; (2) A student starts a
challenge, makes a few moves, but leaves the challenge prior to
submitting an answer; (3) A student starts a challenge, makes some
moves, and submits at least one wrong answer before leaving the
challenge.</p>
      <p>Out of 654 students, we removed 21 students who played fewer than
two challenges for our analysis, since they would not provide
sufficient details to infer students’ quitting behaviors. In the dataset,
the challenges were played 6,568 times by 633 students in total (M
= 10.38). Among all these attempts, Challenge A was played 1,983
times, Challenge B was played 2,795 times, Challenge C was
played 1,116 times, and Challenge D was played 674 times. The
overall class label distribution in the dataset is highly imbalanced,
having 17.1% quit and 82.9% completed. Table 1 shows the
summary statistics for challenges.</p>
      <p>We examined students’ trace data logs, and observed that it is
common for a student to play the same challenge repeatedly or
revisit an easier challenge after a few unsuccessful attempts on a
more difficult challenge. To represent a student’s trajectory of
playing a sequence of challenges, we use $C_i = \{c_1, c_2, \ldots, c_N\}$,
where N denotes the number of challenge attempts, and each
$c_k$ ($1 \leq k \leq N$) in the trajectory $C_i$ is the k-th challenge student
i played. Note that $c_k \in \{A_k, B_k, C_k, D_k\}$, where A, B, C, and D are
the four types of challenges and the subscript k denotes the order in which
the challenges were played. The length N of the trajectory varies
between students. Likewise, there is a series of outcome labels
$O_i = \{o_1, o_2, \ldots, o_N\}$ that corresponds to the trajectory of student i,
where $o_k \in \{quit, completed\}$ denotes whether the student quit
or completed challenge $c_k$. Our goal is to induce predictive models
that can dynamically predict students’ quitting or completing
behavior for the next challenge they will play, utilizing
their observed previous gameplay trajectory (e.g., predicting $o_4$
utilizing $c_1, c_2, c_3, o_1, o_2, o_3$). In other words, as soon as the
student finishes $c_k$, our model performs a prediction of whether the
student will quit or complete challenge $c_{k+1}$.</p>
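      <p>To make the prediction setting concrete, the following minimal sketch (with a hypothetical data layout) enumerates, for one student, the observed history available after each challenge and the next-challenge label the model must predict.</p>
      <preformat>
# Sketch of the prediction framing (hypothetical data layout).
trajectory = {"student_1": ["A", "B", "B", "C"]}
outcomes = {"student_1": ["completed", "completed", "quit", "completed"]}

def prediction_targets(challenges, labels):
    """Yield (observed history, next-challenge label) pairs for one student."""
    for k in range(len(challenges) - 1):
        history = list(zip(challenges[:k + 1], labels[:k + 1]))
        yield history, labels[k + 1]

for history, target in prediction_targets(trajectory["student_1"],
                                          outcomes["student_1"]):
    print(history, "predicts", target)
      </preformat>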
    </sec>
    <sec id="sec-8">
      <title>4.2 Feature Engineering</title>
      <p>Feature engineering is a critical step in building models to predict
students’ quitting behaviors. Feature engineering converts students’
raw, low-level interaction data and pre-learning measures into a
trainable format. In our predictive model, each student’s
problem-solving trajectory for a challenge $c_k$ is represented with a set of M
features $F^{c_k} = \{f_1^{c_k}, f_2^{c_k}, \ldots, f_M^{c_k}\}$ that captures (1) the challenge
type, (2) the student’s pre-test score on the concept knowledge, and
(3) a sequence of actions the student took while interacting with the
challenge. All of these features are designed to be generalizable to
other digital games for learning.</p>
      <p>First, one feature is created to represent the challenge, which in this
work is one of the four challenge types. Each challenge is
characterized by a different task objective, available actions, and
difficulty. The challenge type feature plays a pivotal role in
interpreting other game interaction-related features accordingly.
For example, suppose a student spent two minutes to finish a
challenge. Two minutes may suggest poor performance in
completing Challenge A, since it is a relatively easy task compared
to other challenge types; meanwhile, two minutes may indicate
good performance for Challenge C, because it often requires more
actions and thus takes more time to finish. We use a
four-dimensional one-hot encoded feature vector to represent the four
challenge types.</p>
      <p>
        Second, the students’ pre-knowledge feature is designed as a
measure of students’ prior knowledge about the relevant genetic
concepts. Previous work has shown that students’ pre-knowledge
can serve as a significant predictive feature for student modeling
[
        <xref ref-type="bibr" rid="ref36">34</xref>
        ]. Students with a better understanding of the concepts covered
in the game are expected to score higher on this measure. We use
five questions extracted from the pre-test
questionnaire, which are highly related to the genetic concepts in
those challenges we focus on in this work. For each student, we use
the ratio of correct answers as a predictive feature to represent
students’ initial knowledge.
      </p>
      <p>
        Third, we design four game performance features associated with
each challenge. Previous research suggested that demotivation and
quitting are related to students’ self-efficacy [
        <xref ref-type="bibr" rid="ref23">22</xref>
        ]. Since students’
prior poor performance may negatively impact their self-efficacy
related to genetic achievement, this poor performance could be
predictive of future quitting behaviors. Therefore, students’
in-game performance features could play an important role in
modeling students’ quitting behaviors. The four game performance
features are formulated as follows:
      </p>
      <p>Current challenge outcome: This feature represents whether
students quit the current challenge or not. We use a
two-dimensional binary vector to represent it.</p>
      <p>Time spent on the current challenge: This feature captures
the duration from when students start the challenge to when
they leave it, where leaving is either quitting or finishing the
challenge. Instead of using the absolute time, we use the
Z-scores of the times calculated per challenge type, which
enables the capture of challenge type-specific time
information.</p>
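      <p>As an illustration of this normalization, the sketch below z-scores durations within each challenge type using pandas; the column names and values are hypothetical.</p>
      <preformat>
# Sketch: z-scoring challenge durations within each challenge type.
import pandas as pd

df = pd.DataFrame({
    "challenge_type": ["A", "A", "C", "C", "C"],
    "duration_sec": [95, 130, 240, 310, 180],        # illustrative values
})
df["duration_z"] = (df.groupby("challenge_type")["duration_sec"]
                      .transform(lambda s: (s - s.mean()) / s.std(ddof=0)))
print(df)
      </preformat>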
      <sec id="sec-8-1">
        <title>Ratio of wrong submission counts to total submission counts</title>
        <p>This feature measures students’ overall performance
on the challenge. In contrast to Challenge A and Challenge B,
where students are allowed to make multiple submissions, in
Challenge C and Challenge D, we consider the action of
“putting an egg into a basket” as a submission (i.e., an implicit
way of submitting an answer). We calculate the feature value
based on the number of wrong egg-basket matchings divided
by the total number of egg-basket matching attempts. It is
worth noting that for those students who quit the challenge
without making any submissions (i.e., 0 divided by 0 cases),
we set the value of the feature to 1, which means maximum
error rate.</p>
        <p>Efficiency: This feature is used to measure students’
efficiency at completing the challenge. As mentioned in
Section 3.1, there is a “goal move” for each challenge which
indicates the minimum number of actions needed to
successfully solve the challenge. For Challenge A and
Challenge B, the minimum number of moves for a challenge
depends on the generated problem. For
Challenge C and Challenge D, the minimum number of moves is fixed at
8, since 8 sorting actions are required per challenge. The
efficiency feature is calculated as the ratio of the “goal action”
number divided by the number of students’ actions actually
taken in cases where students completed a challenge (i.e.,
higher is better). On the other hand, when students quit a
challenge, the negative of this ratio is used. This ensures that
larger ratio values are better for this efficiency score, even for
students who quit the challenge (i.e., giving a higher penalty
to students who quit the challenge in an earlier phase than to
students who quit in a later phase).</p>
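        <p>Putting these pieces together, one plausible encoding (a sketch, not necessarily the exact implementation used in this work) of a single challenge attempt is a ten-dimensional vector: a four-dimensional one-hot challenge type, the pre-test ratio, a two-dimensional outcome indicator, the z-scored duration, the wrong-submission ratio (set to 1 when no submission was made), and the signed efficiency. Function and variable names below are hypothetical.</p>
        <preformat>
import numpy as np

CHALLENGES = ["A", "B", "C", "D"]

def encode_attempt(challenge_type, pretest_ratio, did_quit, duration_z,
                   wrong_subs, total_subs, goal_moves, moves_taken):
    """Encode one challenge attempt as a 10-dim Base Feature Set vector."""
    one_hot = [1.0 if challenge_type == c else 0.0 for c in CHALLENGES]
    outcome = [1.0, 0.0] if did_quit else [0.0, 1.0]    # quit vs. completed
    wrong_ratio = 1.0 if total_subs == 0 else wrong_subs / total_subs
    efficiency = goal_moves / max(moves_taken, 1)
    if did_quit:
        efficiency = -efficiency                        # penalize quitting
    return np.array(one_hot + [pretest_ratio] + outcome +
                    [duration_z, wrong_ratio, efficiency])

x = encode_attempt("A", pretest_ratio=0.8, did_quit=False, duration_z=-0.4,
                   wrong_subs=1, total_subs=3, goal_moves=4, moves_taken=6)
print(x.shape)   # (10,)
        </preformat>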
        <p>In addition to these six features, we create one additional
explanatory variable representing the type of challenge the student
will play next. It is hypothesized that students’ quitting behaviors
will not be homogeneous across all challenge types due to
differences in the task objective, available actions, and difficulty.
As a result, it can be observed from Table 1 that the percentages of
the quitting class vary among challenges. In a runtime implementation
of our framework, since the next challenge type is unobservable
until the student chooses one, the predictive model can make
inferences on quitting across all available challenge types, and then
the framework can make scaffolding decisions based on
probabilities of quitting across all challenge types. In our offline
evaluation, our dataset allows us to get the next challenge type and
use it as an additional explanatory variable. We investigate whether
knowing the challenge type students will play next might serve as
a strong predictor by exploring two different feature sets: one
feature set enhanced with this next challenge information and the
other feature set without it. This variable is encoded
as a four-dimensional one-hot vector representing the four
types of challenge.</p>
        <p>Moreover, to take advantage of students’ historical gameplay
information (Figure 2), we use n-gram models with a varying value
of n. N-gram concatenates information from the observed sequence
of n consecutive challenges (including the current and n-1 previous
challenges) students played as the final set of features for a
prediction. As discussed, for the unigram model, we use the set of
M features $F^{c_k} = \{f_1^{c_k}, f_2^{c_k}, \ldots, f_M^{c_k}\}$ based on the current challenge
$c_k$ to predict the student’s challenge outcome for the next challenge,
$c_{k+1}$. For the bigram model, a variant of the n-gram when n=2, 2M
predictive features ($F^{c_{k-1}}$ and $F^{c_k}$) are used from the two
challenges ($c_{k-1}$ and $c_k$) for predicting whether students will quit
on $c_{k+1}$. The bigram-based concatenated features can be
represented as follows:</p>
        <p>$[F^{c_{k-1}}, F^{c_k}] = \{f_1^{c_{k-1}}, f_2^{c_{k-1}}, \ldots, f_M^{c_{k-1}}, f_1^{c_k}, f_2^{c_k}, \ldots, f_M^{c_k}\}$.
More generally, for the n-gram setting, $M \times n$ features are created
based on the previous n challenges, which can be represented as
$[F^{c_{k-n+1}}, \ldots, F^{c_k}]$. If fewer than n challenges have been observed in
the trajectory, we pad zeros for the features to represent the missing
challenges. In this study, we explore n from 1 to 3 for the n-gram
model to investigate if leveraging temporal information captured
from a time-series challenge interaction is useful in predicting
students’ quitting behaviors.</p>
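        <p>A minimal sketch of this concatenation and zero-padding, assuming per-challenge feature vectors like those above (M = 10) and hypothetical names, is shown below.</p>
        <preformat>
import numpy as np

def ngram_features(attempt_vectors, n, m=10):
    """Concatenate the last n per-challenge vectors, zero-padding short histories."""
    recent = attempt_vectors[-n:]                 # up to n most recent attempts
    pad = [np.zeros(m)] * (n - len(recent))       # zeros for missing challenges
    return np.concatenate(pad + recent)           # shape: (n * m,)

history = [np.ones(10), 2 * np.ones(10)]          # two observed attempts (illustrative)
print(ngram_features(history, n=3).shape)         # (30,), with one zero-padded slot
        </preformat>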
      </sec>
    </sec>
    <sec id="sec-9">
      <title>5. EXPERIMENTS</title>
    </sec>
    <sec id="sec-10">
      <title>5.1 Machine Learning Models</title>
      <p>
        We explore a suite of machine learning algorithms for our
prediction task [
        <xref ref-type="bibr" rid="ref35">33</xref>
        ][
        <xref ref-type="bibr" rid="ref18">17</xref>
        ][
        <xref ref-type="bibr" rid="ref26">25</xref>
        ][
        <xref ref-type="bibr" rid="ref21">20</xref>
        ]. We implement our predictive
models using six machine learning algorithms, including logistic
regression (LR), decision tree (DT), naive Bayes (NB), support
vector machine (SVM), random forest (RF), and feed-forward
neural network (FFNN). To develop our models for the first five,
we use Python 3.6 with scikit-learn, a Python machine learning
library [
        <xref ref-type="bibr" rid="ref27">26</xref>
        ]. For the FFNN model, we use Keras with the
Tensorflow backend [5].
      </p>
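      <p>For reference, a sketch of instantiating these classifiers with scikit-learn, at or near the default settings described below, is given here; the exact configurations used in the study may differ.</p>
      <preformat>
# Sketch: the scikit-learn classifiers compared in this work.
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

models = {
    "LR": LogisticRegression(),                        # L2 regularization, C=1.0
    "DT": DecisionTreeClassifier(criterion="gini"),    # CART-style tree
    "NB": GaussianNB(),                                # Gaussian likelihoods
    "SVM_linear": SVC(kernel="linear", C=1.0),
    "SVM_poly": SVC(kernel="poly", C=1.0),
    "SVM_rbf": SVC(kernel="rbf", C=1.0, gamma="auto"), # gamma = 1 / n_features
    "RF": RandomForestClassifier(n_estimators=10),
}
      </preformat>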
      <p>
        Hyperparameter values for the first five machine learning
algorithms are set to the default values specified by scikit-learn,
which are briefly described below. For logistic regression, we used
L2 norm regularization with a weight of 1.0, while the optimization
method used is stochastic gradient descent. For Naive Bayes, we
assume the likelihood of our features to follow a Gaussian
distribution. For SVM models, we investigate linear SVMs as well as
SVMs with a radial basis function (RBF) kernel and a polynomial kernel. For the
RBF-based SVMs, the regularization parameter C is set to 1. The
gamma value changes with the feature set used and is calculated as
1 divided by the number of features. For example, since the
dimension of the unigram BFS is 10, gamma is set to 0.1 in this case.
Similarly, the dimension of the bigram BFS is 20, so gamma is set to
0.05. For the decision tree models, the scikit-learn library uses an
optimized version of the CART algorithm. The splitting criterion
used is the GINI index. For the random forest models, we explored
one hyperparameter to search the optimal number of trees to be
generated in the model from {10, 50, 100}. Each decision tree in
the random forest adopts the same settings used in the independent
decision tree model. The number of features considered for splitting
a node of a tree is set to √n, where n is the number of all
features in the original dataset. For example, for the unigram BFS
feature set, n = 10 (the dimension of the unigram BFS feature set is
10), so the number of features considered for splitting in each tree is
3 (√10 ≈ 3). Neural network hyperparameters are often
empirically determined. There are several categories of
hyperparameters to consider, including optimization (e.g.,
optimizer, learning rate), model structure (e.g., the number of
hidden units, initialized weights), and training criterion (e.g.,
regularization terms, loss function) [
        <xref ref-type="bibr" rid="ref27">26</xref>
        ]. When implementing the
FFNN models, we adopted grid-search on structure-based
hyperparameters, the number of hidden layers and the number of hidden
units, which have a significant influence on predictive performance.
We tried the number of hidden layers from 1 to 5 and the number
of hidden units in each hidden layer from {32, 64, 128}. For other
hyperparameters, we utilize categorical cross entropy for the loss
function and Adam stochastic optimizer [
        <xref ref-type="bibr" rid="ref20">19</xref>
        ]. We use the Glorot
uniform initializer to generate initial weights [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ]. The learning rate
of training is set to 0.01 and the dropout rate is chosen from {0.25, 0.5,
0.75}. We adopt mini-batch gradient descent with a mini-batch
size of 128 when training. We trained all of these models on our
two feature sets, one without the next challenge information
and one with it, under different
n-gram settings. Another common model used for comparison is the
majority class-based method. In our dataset, the majority class is
“completed”, which accounts for 82.90%. This majority model
predicts all students’ outcomes for the next challenge as
“completed”.
      </p>
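      <p>As an illustration, the sketch below builds one configuration from the grid described above (for example, 5 hidden layers of 128 units with a 0.25 dropout rate) using the Keras API; tensorflow.keras and the input dimension are assumptions for illustration, not necessarily the exact setup used.</p>
      <preformat>
# Sketch of an FFNN like the one described above (tensorflow.keras assumed).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

def build_ffnn(input_dim, hidden_layers=5, hidden_units=128, dropout=0.25):
    model = Sequential()
    model.add(Dense(hidden_units, activation="relu", input_dim=input_dim))
    model.add(Dropout(dropout))
    for _ in range(hidden_layers - 1):
        model.add(Dense(hidden_units, activation="relu"))  # Glorot uniform init by default
        model.add(Dropout(dropout))
    model.add(Dense(2, activation="softmax"))               # quit vs. completed
    model.compile(optimizer=Adam(learning_rate=0.01),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_ffnn(input_dim=10)   # e.g., the unigram BFS
# model.fit(X_train, y_train_onehot, batch_size=128, ...)
      </preformat>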
      <p>
        We conduct student-level five-fold cross-validation to evaluate
models’ performance. In our problem, the quitting class is the
category that we are most interested in, since our goal is to identify
students at risk (i.e., students who might quit the challenge) and
support their learning. Therefore, we evaluate the model
performance with respect to recall rates of predicting the quitting
class. Recall is calculated by True Positive / (True Positive + False
Negative), which refers to the percentage of the relevant class being
correctly classified [
        <xref ref-type="bibr" rid="ref17">16</xref>
        ]. For algorithms for which we conducted
grid-search on some hyperparameters, we chose the
hyperparameter values that achieved the highest
performance. For the random forest model, the number of trees is set
to 10. For the FFNN model, we chose a 5-layer architecture with
128 hidden units and a 0.25 dropout rate on each hidden layer.
Since the overall class distribution in our dataset is highly
imbalanced, predictions by machine learning models trained with
the dataset will likely be inclined towards the majority group [
        <xref ref-type="bibr" rid="ref17">16</xref>
        ].
In other words, models trained with the dataset would achieve a
high predictive accuracy by predicting the majority class label (i.e.,
completed) for most of the data instances, but would suffer from
inferring the incorrect labels for instances that belong to the
minority class label (i.e., quit). To accurately recognize the ‘quit’
class, we conduct oversampling of the instances with the minority
class label only for the training set, while the recall is measured for
the intact, original test set. In the next section, we compare the
results of models trained with the original dataset without
oversampling and the dataset with oversampling applied.
      </p>
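      <p>A minimal sketch of this evaluation protocol is given below: student-level folds via grouped cross-validation, random oversampling of the minority class applied to the training fold only, and recall on the quit class measured on the untouched test fold. The arrays X, y, and student_ids are assumed to be prepared elsewhere, and the classifier choice here is only illustrative.</p>
      <preformat>
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.metrics import recall_score
from sklearn.svm import SVC

def oversample_minority(X, y, rng):
    """Randomly duplicate minority-class rows until the two classes are balanced."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    idx = np.where(y == minority)[0]
    extra = rng.choice(idx, size=counts.max() - counts.min(), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

# X (features), y (labels "quit"/"completed"), student_ids: assumed prepared arrays.
rng = np.random.default_rng(0)
recalls = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=student_ids):
    X_tr, y_tr = oversample_minority(X[train_idx], y[train_idx], rng)
    clf = SVC(kernel="poly").fit(X_tr, y_tr)                # illustrative model
    pred = clf.predict(X[test_idx])
    recalls.append(recall_score(y[test_idx], pred, pos_label="quit"))
print(np.mean(recalls))
      </preformat>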
    </sec>
    <sec id="sec-11">
      <title>5.2 Results</title>
      <p>Table 2 shows the recall rates of the models induced with two
variants of the feature set without conducting oversampling. As
clarified in the previous section, we call the feature set which
contains six features as Base Feature Set (BFS). As an alternative,
we call the feature set that includes the next challenge (NC) type
feature BFS + NC. Since the majority-class model classifies all
instances into the “completed” class, the accuracy of this model is
82.90%. However, the recall rate for quitting of the majority-class
model is 0.00%. A high accuracy value in our problem does not
reflect a model’s ability to recognize the quitting class, which is our
main interest in this work. Other models are able to correctly
predict some quitting instances, but the recall rates of quitting for most
of the models explored in this work are low. It is not
surprising because machine learning models are optimized to
minimize the loss defined in an objective function and achieve a
high predictive accuracy during training. In the experiment with the
imbalanced data, naive Bayes models show the highest recall rates,
outperforming the other baseline models. The predictive accuracy of
the best naive Bayes model is 73.63%, which is still less than the
accuracy of the majority-class method. In addition, we find that the
models’ predictive performance for the quitting class does not
generally improve when using n-gram representations (n &gt; 1)
which include students’ historical gameplay information.
Table 3 reports the recall rates for predicting quitting of the next
challenge, after we conducted oversampling on the dataset. It
should be noted that all the experimental settings are identical,
except that the results reported in Table 3 are obtained from a
training set oversampled to have an equal distribution between the
‘quit’ class and the ‘completed’ class, while the same test set is used
in the two experimental settings. Comparing without-oversampling
to with-oversampling, nearly all models demonstrate significant
improvements with respect to the recall rates of the quitting class.
Linear SVM models show an average of 57.30% improvement in
the recall rates for all the conditions in pairwise comparisons.
SVMs with the polynomial kernel and SVMs with the RBF kernel
exhibit improvements of 60.23% and 60.64% on average,
respectively. LRs, RFs and FFNNs demonstrate improvements on
average of 12.65%, 48.84% and 50.47% respectively. The only
exception is DT, which does not show improvements when
utilizing bigram and trigram features.</p>
      <p>Since most of the predictive models show improved performance
after oversampling, further discussions are made based on the
results of the models induced with the oversampled training dataset
(Table 3). The best models for predicting quitting are SVMs with
the polynomial kernel using the unigram feature set with the next
challenge (NC) information. These models achieve a 76.91%
recall rate for quitting and an accuracy of 67.97%. SVMs with the
RBF kernel show the highest recall rates with bigram and trigram features.
We compare the average recall rates for each feature set under the
unigram, bigram and trigram settings. Overall, SVMs with the
polynomial kernel achieve the highest recall rates (74.31% on
average). SVMs with the RBF kernel also show relatively high
performance, achieving 74.05%. Thus, we conclude that
SVM models are the most robust machine learning method for our
task. One distinct advantage of SVMs is the kernel trick,
a technique that projects the original data into another feature
space offering enhanced capacity for models to classify data
instances. Although deep feed-forward neural networks also learn
salient features from the dataset through multi-level non-linear
transformation, the models usually require a large amount of
training data to successfully extract meaningful features. Our
dataset is not large enough for deep learning models to effectively
learn the intermediate features and achieve high performance.
Moreover, the results support our hypothesis that the next
challenge type feature improves model performance: students’ quitting
behaviors are highly
influenced by the challenge type that they will choose next, rather
than being generally predictable regardless of the challenge type.
In addition, we find that students’ historical gameplay information
obtained from the previous one or two challenges they interacted
with does not seem to improve model performance, as
demonstrated by the fact that models using the unigram features achieve the
highest recall rates. The results suggest that students’ historical
gameplay information on challenges before the current one
introduces more noise rather than adding predictive power. This
is an interesting area that requires further investigation in
future work.</p>
    </sec>
    <sec id="sec-12">
      <title>6. CONCLUSION AND FUTURE WORK</title>
      <p>In this work we present a data-driven approach to modeling
students’ performance on challenges within an open-ended learning
environment for genetics. We build an integrated model for all
challenge levels, which can dynamically predict students’ quitting
behaviors on future challenges in their gameplay trajectory. In
practice, the learning environment could use the predicted results
from these models to decide on specific interventions to take. For
instance, if the learning environment recognizes that a student is
likely to quit a challenge before starting it, it could suggest another
challenge to smooth their learning experience. To implement the
predictive models, we engineered fine-grained features to describe
student gameplay actions from their interaction log data and
investigated the performance of different machine learning
algorithms. The results show that the SVM machine learning algorithm
achieves the highest recall rate with respect to predicting students’
quitting behaviors for our problem. We also find that using the next
challenge information offers improved predictive capabilities for
the models.</p>
      <p>During our analysis of quitting behaviors in Geniventure, we
identified two key situations. One type of quitting occurs
immediately after students open a challenge, while the second
occurs after extended struggle on the challenge. For the second
type, it likely occurs because the difficulty of the challenge in the
game does not match the students’ current abilities. In this case,
students’ mastery of knowledge and their problem-solving skills in
the game provide good evidence for predicting whether students
will quit or not. However, for the first type, the reason for quitting
is harder to ascertain. It could be caused by many factors that are
not easily observable during the learning process. It could be the case
that students are gaming the system, or they might just want to take
a look at a challenge before deciding which one to play. The
occurrence of this type of quitting behavior is less related to their
in-game performance. Thus, differentiating these two types of
quitting behaviors may help improve models’ predictive
performance and their abilities to support effective interventions.
In the future, we may need to build models for more fine-grained
types of quitting behaviors. Moreover, we may need to investigate
more features that could reflect students’ affective and cognitive
states and dynamic progress of their mastery of content knowledge.
In addition, we are also interested in investigating how students’
learning gains are affected by interventions driven by our predictive
models.</p>
    </sec>
    <sec id="sec-13">
      <title>7. ACKNOWLEDGMENTS</title>
      <p>This research was supported by the National Science Foundation
under Grant DRL-1503311. Any opinions, findings, and
conclusions expressed in this material are those of the authors and
do not necessarily reflect the views of the National Science
Foundation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Asbell-Clarke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rowe</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sylvan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Working through impulse: assessment of emergent learning in a physics game</article-title>
          .
          <source>Games+ Learning+ Society 9.0.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R. S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>L. M.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Assessing the disengaged behaviors of learners</article-title>
          .
          <source>Design Recommendations for Intelligent Tutoring Systems</source>
          .
          <volume>1</volume>
          ,
          <fpage>153</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>J. E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Wheel-spinning: Students who fail to master a skill</article-title>
          .
          <source>In Proceedings of the International Conference on Artificial Intelligence in Education</source>
          . Springer, pp.
          <fpage>431</fpage>
          -
          <lpage>440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Buffum</surname>
            ,
            <given-names>P. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frankosky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyer</surname>
            ,
            <given-names>K. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiebe</surname>
            ,
            <given-names>E. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mott</surname>
            ,
            <given-names>B. W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lester</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Collaboration and gender equity in game-based learning for middle school computer science</article-title>
          .
          <source>Computing in Science &amp; Engineering</source>
          <volume>18</volume>
          (
          <issue>2</issue>
          ),
          <fpage>18</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Chollet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2015</year>
          . Keras. https://github.com/fchollet/keras
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>D'Mello</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mills</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bixler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Zone out no more: Mitigating mind wandering during computerized reading</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Educational Data Mining</source>
          . pp.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Eck</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Van</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Digital game-based learning: It's not just the digital natives who are restless</article-title>
          .
          <source>EDUCAUSE review 41 (2)</source>
          ,
          <fpage>16</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Fredricks</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blumenfeld</surname>
            ,
            <given-names>P. C.</given-names>
          </string-name>
          , and Paris,
          <string-name>
            <surname>A. H.</surname>
          </string-name>
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>School engagement: Potential of the concept, state of the evidence</article-title>
          .
          <source>Review of Educational Research</source>
          <volume>74</volume>
          (
          <issue>1</issue>
          ),
          <fpage>59</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Gee</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>What video games have to teach us about learning and literacy</article-title>
          .
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Glorot</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Understanding the difficulty of training deep feedforward neural networks</article-title>
          .
          <source>In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics</source>
          . pp.
          <fpage>249</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Halawa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greene</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and Mitchell,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2014</year>
          .
          <article-title>Dropout prediction in MOOCs using learner activity features</article-title>
          .
          <source>In Proceedings of the Second European MOOC Stakeholder Summit</source>
          <volume>37</volume>
          (
          <issue>1</issue>
          ),
          <fpage>58</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Hamari</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shernoff</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rowe</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>AsbellClarke</surname>
          </string-name>
          , J., and
          <string-name>
            <surname>Edwards</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Challenging games help students learn: An empirical study on engagement, flow and immersion in game-based learning</article-title>
          .
          <source>Computers in Human Behavior</source>
          <volume>54</volume>
          ,
          <fpage>170</fpage>
          -
          <lpage>179</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <surname>He</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bailey</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubinstein</surname>
            ,
            <given-names>B. I.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Identifying at-risk students in massive open online courses</article-title>
          .
          <source>In Proceedings of the 29th AAAI Conference on Artificial Intelligence</source>
          . pp.
          <fpage>1749</fpage>
          -
          <lpage>1755</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Hicks</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peddycord</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Barnes</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Building games to learn from their players: Generating hints in a serious game</article-title>
          .
          <source>In Proceedings of the 12th International Conference on Intelligent Tutoring Systems</source>
          . Springer, pp.
          <fpage>312</fpage>
          -
          <lpage>317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Hutt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hardey</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bixler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stewart</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Risko</surname>
            ,
            <given-names>E. F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>D'Mello</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Gaze-based detection of mind wandering during lecture viewing</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Educational Data Mining</source>
          . pp.
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Jeni</surname>
            ,
            <given-names>L. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>J. F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>De la Torre</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Facing imbalanced data--recommendations for the use of performance metrics</article-title>
          .
          <source>In Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction</source>
          . pp.
          <fpage>245</fpage>
          -
          <lpage>251</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Karumbaiah</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R. S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shute</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Predicting quitting in students playing a learning game</article-title>
          .
          <source>In Proceedings of the 11th International Conference on Educational Data Mining</source>
          . pp.
          <fpage>167</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Kiili</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perttula</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuomi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lindstedt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Using video games to combine learning and assessment in mathematics education</article-title>
          .
          <source>International Journal of Serious Games</source>
          <volume>2</volume>
          (
          <issue>4</issue>
          ),
          <fpage>37</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Kloft</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stiehler</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Pinkwart</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Predicting MOOC dropout over weeks using machine learning methods</article-title>
          .
          <source>In Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs</source>
          . pp.
          <fpage>60</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>When and who at risk? Call back at these critical points</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Educational Data Mining</source>
          . pp.
          <fpage>168</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Margolis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>McCabe</surname>
            ,
            <given-names>P. P.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>Self-efficacy: A key to improving the motivation of struggling learners</article-title>
          .
          <source>The Clearing House: A Journal of Educational Strategies, Issues and Ideas</source>
          <volume>77</volume>
          (
          <issue>6</issue>
          ),
          <fpage>241</fpage>
          -
          <lpage>249</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [23]
          <string-name>
            <surname>McElroy-Brown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Reichsman</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Genetics with Dragons: Using an online learning environment to help students achieve a multilevel understanding of genetics</article-title>
          . Retrieved from http://concord.org/.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Mills</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graesser</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>D'Mello</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>To quit or not to quit: predicting future behavioral disengagement from reading patterns</article-title>
          .
          <source>In Proceedings of the 12th International Conference on Intelligent Tutoring Systems</source>
          . Springer, pp.
          <fpage>19</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Min</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frankosky</surname>
            ,
            <given-names>M. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mott</surname>
            ,
            <given-names>B. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiebe</surname>
            ,
            <given-names>E. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyer</surname>
            ,
            <given-names>K. E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lester</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Inducing stealth assessors from game interaction data</article-title>
          .
          <source>In Proceedings of the 9th International Conference on Artificial Intelligence in Education</source>
          . Springer, pp.
          <fpage>212</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , et al.
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          .
          <volume>12</volume>
          (Oct),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Prensky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>Digital game-based learning</article-title>
          .
          <source>Computers in Entertainment (CIE)</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>21</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Ritter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koedinger</surname>
            ,
            <given-names>K. R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Corbett</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Cognitive Tutor: Applied research in mathematics education</article-title>
          .
          <source>Psychonomic Bulletin &amp; Review</source>
          <volume>14</volume>
          (
          <issue>2</issue>
          ),
          <fpage>249</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Rowe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mott</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McQuiggan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robison</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lester</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Crystal Island: A narrative-centered learning environment for eighth grade microbiology</article-title>
          .
          <source>In Workshop on Intelligent Educational Games at the 14th International Conference on Artificial Intelligence in Education</source>
          . pp.
          <fpage>11</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Rowe</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shores</surname>
            ,
            <given-names>L. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mott</surname>
            ,
            <given-names>B. W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lester</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Integrating learning, problem solving, and engagement in narrative-centered learning environments</article-title>
          .
          <source>International Journal of Artificial Intelligence in Education</source>
          <volume>21</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>115</fpage>
          -
          <lpage>133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Shute</surname>
            ,
            <given-names>V. J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ke</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Games, learning, and assessment</article-title>
          .
          <source>Assessment in Game-Based Learning</source>
          . Springer, pp.
          <fpage>43</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [32]
          <string-name>NGSS Lead States</string-name>
          <year>2013</year>
          .
          <article-title>Next Generation Science Standards</article-title>
          . Washington.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veeramachaneni</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>O'Reilly</surname>
            ,
            <given-names>U.-M.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Likely to stop? Predicting stopout in Massive Open Online Courses</article-title>
          .
          <source>arXiv preprint arXiv:1408.3382</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>J. B.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Considering the influence of prerequisite performance on wheel spinning</article-title>
          .
          <source>In Proceedings of the 8th International Conference on Educational Data Mining</source>
          . pp.
          <fpage>125</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rowe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Min</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mott</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lester</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Simulating player behavior for data-driven interactive narrative personalization</article-title>
          .
          <source>In Proceedings of the Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference</source>
          . pp.
          <fpage>255</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>