<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Investigating Elements of Student Persistence in an Introductory Computer Science Course</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan D. Pinto</string-name>
          <email>jdpinto2@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yingbin Zhang</string-name>
          <email>yingbin2@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luc Paquette</string-name>
          <email>lpaq@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aysa Xuemo Fan</string-name>
          <email>xuemof2@illinois.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Illinois at Urbana-Champaign</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We explore how different elements of student persistence on computer programming problems may be related to learning outcomes and inform us about which elements may distinguish between productive and unproductive persistence. We collected data from an introductory computer science course at a large midwestern university in the U.S. hosted on an open-source, problem-driven learning system. We defined a set of features quantifying various aspects of persistence during problem solving and used a predictive modeling approach to predict student scores on subsequent and related quiz questions. We focused on careful feature engineering and model interpretation to shed light on the intricacies of both productive and unproductive persistence. Feature importance was analyzed using SHapley Additive exPlanations (SHAP) values. We found that the most impactful features were persisting until solving the problem, rapid guessing, and taking a break, while those with the strongest correlation between their values and their impact on prediction were the number of submissions, total time, and (again) taking a break. This suggests that the former are important features for accurate prediction, while the latter are indicative of the differences between productive persistence and wheel spinning in a computer science context.</p>
      </abstract>
      <kwd-group>
        <kwd>Student modeling</kwd>
        <kwd>persistence</kwd>
        <kwd>modeling</kwd>
        <kwd>behavior detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Research on student modeling has identified various behaviors and
patterns related to learning outcomes and student success. One
construct both has a history of research outside of Educational Data
Mining (EDM) and is receiving renewed attention in the EDM
community. Known by the diverse names of grit [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], perseverance
[
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], academic tenacity [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and persistence, this construct has been studied with a focus
on measuring the trait, identifying when students are exhibiting it,
and quantifying its effects on various aspects of student learning.
More traditional efforts on this front have focused on measuring
persistence using questionnaires and testing its effect based on
grades and test scores [
        <xref ref-type="bibr" rid="ref17 ref41 ref8">8, 17, 41</xref>
        ]. Efforts to identify persistence in
log data of game-based learning systems [
        <xref ref-type="bibr" rid="ref27 ref34 ref7">7, 27, 34</xref>
        ] or intelligent
tutoring systems (ITS) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] have shown great promise. Many of
these efforts have specifically focused on improving persistence
detectors for on-the-fly student feedback systems or interventions.
One aspect of persistence that has gained interest in the EDM
community in particular is the distinction between productive and
unproductive persistence. Persistence is typically characterized by
a determination to stick with a problem for long durations despite
facing obstacles, and it has often been portrayed as a positive trait.
However, researchers have come to question this simplistic stance,
noting that there seem to be two related but opposing sides to
persistence. On one hand, persistence may produce productive
results when it leads to consistent, long-term effort [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or when
students relish the opportunity to overcome challenges [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. On the
other hand, students who are "stuck" may be better off going back
to learning more about the subject rather than continuing to spend
time working on a problem they don't yet fully understand [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In
such cases, the student’s persistence might be characterized as
unproductive.
      </p>
      <p>
        Given the opposing academic outcomes associated with the two sides
of this dichotomy, understanding what differentiates productive from
unproductive persistence is of critical importance. The latter has been termed
wheel spinning in the literature and has been defined as "a student
who spends too much time struggling to learn a topic without
achieving mastery" [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Recent research has specifically focused on
creating and improving automatic detectors of wheel spinning in
ITSs [
        <xref ref-type="bibr" rid="ref11 ref15 ref24 ref39 ref42">11, 15, 24, 39, 42</xref>
        ] and game-based learning systems [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
In the context of computer science education, [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] have suggested
that fostering grit can lead to higher retention among CS students.
Other research has identified a weak correlation between grit and
measures of academic success [
        <xref ref-type="bibr" rid="ref17 ref25 ref41">17, 25, 41</xref>
        ], especially when
focusing on one of the two main components of grit—perseverance
of effort—which most closely aligns with definitions of persistence
[
        <xref ref-type="bibr" rid="ref35">35</xref>
        ].
      </p>
      <p>In this paper, we add to the existing literature by exploring how
different elements of persistence on computer programming
problems may contribute to learning outcomes. We defined a set of
features quantifying various aspects of persistence during problem
solving and used predictive modeling approaches to predict student
scores on subsequent and related quiz questions. We focus on
careful feature engineering and model interpretation to shed light
on the intricacies of both productive and unproductive persistence.
By investigating these constructs within a computer science course,
our study also aims to better understand their application in this
context.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Modeling Productive Persistence vs. Wheel Spinning</title>
      <p>
        The EDM community’s interest in persistence was sparked by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
who found that students who struggle to master a skill within a
certain timeframe are unlikely to do so at all. Besides identifying
wheel spinning and describing how it differs from productive
persistence, the same study found a clear correlation between wheel
spinning and other negative behaviors such as gaming the system
and disengagement.
      </p>
      <p>
        Subsequent studies have devised variations in criteria for
differentiating between productive persistence and wheel spinning
[
        <xref ref-type="bibr" rid="ref42">42</xref>
        ], with many models defining mastery based on the number of
correct submissions in a row and others relying heavily on the
stability of Bayesian knowledge tracing (BKT) student model
predictions [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Despite differences in operationalization,
however, predictive machine learning models have been found to
serve as successful wheel-spinning detectors. Some of the
algorithms that have been used include linear regression [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
logistic regression [
        <xref ref-type="bibr" rid="ref11 ref42">11, 42</xref>
        ], decision trees [
        <xref ref-type="bibr" rid="ref15 ref27 ref39">15, 27, 39</xref>
        ], random
forest [
        <xref ref-type="bibr" rid="ref27 ref42">27, 42</xref>
        ], and neural networks [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Most of these studies
calculated productive persistence or wheel spinning labels based
solely on the data gathered rather than relying on human observers
or coders. Two notable exceptions are [
        <xref ref-type="bibr" rid="ref24 ref27">24, 27</xref>
        ].
      </p>
      <p>
        The goal of the most recent studies has been to identify wheel
spinning in ITSs as early as possible. [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] compared different
criteria and feature sets and showed that it is possible to make
predictions with acceptable accuracy as early as step four of a
problem. They were also surprised to find that a logistic regression
model trained on only one feature (“correct response percentage”)
resulted in prediction performance that was close to their best
models. Relying on hint requests, submission correctness, and time
per skill, [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ] concluded that models can detect students who will
wheel spin after only three questions.
      </p>
      <p>The studies mentioned thus far have focused almost exclusively on
ITSs, which are most commonly used to teach math. Detecting and
studying persistence on computer programming problems requires
first understanding how data from these tasks has been analyzed in
past studies.</p>
    </sec>
    <sec id="sec-5">
      <title>2.2 Using Action Logs to Study Programming Behaviors</title>
      <p>
        There is growing interest in leveraging data analytic methods to
study students’ action logs produced during programming activities
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], including to better understand the students’ programming
processes, behaviors and strategies. Log data have been used to
generate visualizations of student behaviors that can be manually
inspected to better understand their programming approach [
        <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
        ],
explore how students progress through homework assignments [
        <xref ref-type="bibr" rid="ref30 ref6">6, 30</xref>
        ], understand the learning pathways of novice programmers [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
and analyze problem-solving behavior in a debugging game [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
Generally, two broad categories of features have been used: 1)
frequencies of behaviors and 2) similarity/distance between
programs. The first category provides aggregated information
related to the quantity of actions performed by the student. This
includes the number of blocks used in a Scratch program [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], how
often a program was compiled and how many characters it included
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the number of actions and logic primitives used [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and the
number of lines added, deleted, and modified [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] leveraged
expert judgments to identify meaningful behaviors, such as massive
deletion and replacing loops with repetitive code.
      </p>
      <p>
        Studies have also developed features to evaluate how similar or
different two computer programs are. [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] used a combination of
the differences in bag of words, abstract syntax tree (AST) edits and
similarity in calls to the application programming interface (API) to
identify similar program states. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], in addition to using this same
method, considered the frequency of changes in a student’s
program and the magnitude of those changes.
      </p>
      <p>As our goal was to focus on behaviors related to how students
approach solving a problem, rather than investigating the content
of the submitted solution, we used an approach in line with the first
category to investigate elements of student persistence in a series
of computer programming problems. This allowed us to focus
specifically on the productive and unproductive behaviors of
persistent students.</p>
    </sec>
    <sec id="sec-7">
      <title>3. METHODS</title>
    </sec>
    <sec id="sec-8">
      <title>3.1 Data Collection and Label Generation</title>
      <p>
        We collected data from an introductory computer science course at
a large midwestern university in the U.S. hosted on PrairieLearn,
an open-source, web-based problem-driven learning system [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ].
Throughout the semester, 733 students used PrairieLearn to submit
almost-daily programming homework problems, take weekly
quizzes, and complete cumulative exams. In addition, students
were free to practice past problems and questions as much as they
desired. As our work aims to investigate the relationship between
persistence during homework and subsequent assessment, we
filtered the data to focus on attempts submitted towards solving a
homework problem or a quiz question. After removing practice
submissions and other non-credit assignments, our resulting data
set consisted of 290,703 individual homework problem attempts
and 313,097 quiz question attempts.
      </p>
      <p>All homework assignments were programming problems with
checkstyle, compiler, and problem-specific tests that students’ code
had to pass to receive full credit. Students had one day to
successfully complete each homework problem. They were
allowed to submit solution attempts as often as required until they
successfully passed all the tests. After each submission, the system
ran tests to check the correctness of the solution and provided
feedback indicating mistakes. First, the system tested whether the
solution had any checkstyle or compiler errors. If such errors
existed, the system showed feedback about these errors and
stopped. If there were no checkstyle or compiler errors, the system
then used several problem-specific tests to examine whether the
solution fulfilled the requirements. For example, given some random
input, would the solution generate the correct output? If not, the
system would return feedback about the problem-specific test error.
Otherwise, the solution was regarded as correct.</p>
      <p>We aggregated our dataset at the student-problem level using a
series of features specifically related to persistence. While
persistence can be studied at various grain sizes, we chose this level
due to our interest in how students tackle difficulties within a
particular programming problem. Similarly, we only kept instances
that demonstrated struggling, as defined in section 3.2.1, since
these were the cases that could elicit persistence from students.
      </p>
      <p>Quizzes were conducted weekly as part of regular class activity to
assess learning and consisted of both multiple-choice questions and
programming tasks. Quizzes were made available at the end of the
week and were designed to provide early assessment related to the
content of the homework problems assigned earlier that week. We
aligned the content of each homework problem to corresponding
multiple-choice quiz questions to directly investigate the
relationship between persistence in specific homework problems
and outcome on related assessment questions. Once we had these
alignments, we calculated for each student-problem instance the
total number of points obtained on the relevant quiz questions and
the maximum possible points. Using these values, we then
calculated the point percentage as the indicator of learning. Only
quiz questions that students attempted were considered for these
calculations. After these changes and calculations, our aggregated
dataset consisted of 7,673 instances of student-problem pairs,
submitted by a total of 710 students.</p>
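      <p>As a rough illustration of the outcome variable, the sketch below computes the point percentage for one student-problem instance from its aligned quiz questions, leaving out questions the student did not attempt. The record layout (tuples of points earned, maximum points, and an attempted flag) and the function name are assumptions made for illustration and do not reflect the actual PrairieLearn data schema.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Iterable, Optional, Tuple

# Hypothetical record: (points_earned, max_points, attempted)
QuizRecord = Tuple[float, float, bool]


def point_percentage(aligned_questions: Iterable[QuizRecord]) -> Optional[float]:
    """Return points earned / points possible over attempted questions only.

    Returns None when none of the aligned questions were attempted,
    because no score can be computed for that instance.
    """
    earned = 0.0
    possible = 0.0
    for points, max_points, attempted in aligned_questions:
        if not attempted:
            continue  # unattempted questions are left out of both totals
        earned += points
        possible += max_points
    if possible == 0:
        return None
    return earned / possible


# Two attempted questions (one answered on the second try for half credit)
# and one unattempted question.
example = [(1.0, 1.0, True), (0.5, 1.0, True), (0.0, 1.0, False)]
print(point_percentage(example))  # 0.75
```
]]></preformat>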
      <p>The resulting distribution of the score outcome variable had a
strong negative skew, with most instances accumulated at higher
scores, as shown in Figure 1. This is because students often
managed to obtain a perfect score on their aligned quiz questions.
Students were typically given two chances to select the right
answer, the second time for half credit.</p>
    </sec>
    <sec id="sec-9">
      <title>3.2 Feature Engineering</title>
      <p>
        Given our goal to study how specific behaviors might be related to
persistence, our feature engineering efforts focused on developing
features based on an underlying rationale about their relationship to
productive or unproductive persistence. Following the Carnegie
Foundation for the Advancement of Teaching’s definition of
productive persistence—“tenacity plus the use of good strategies”
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]—we sought to identify good learning strategies and habits
based on the available data. Other features were based on more
generalized applications of the aspects of unproductive persistence
that have been identified in the wheel-spinning literature. This
process resulted in a total of 12 base features. We also standardized
most of these at the problem level (by subtracting the problem’s
mean and dividing by the problem’s standard deviation) to create
an additional 10 features. The rest of this section describes each
feature and our rationale behind it.
      </p>
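      <p>The problem-level standardization mentioned above (subtracting the problem's mean and dividing by the problem's standard deviation) might look like the following minimal sketch. The row layout and function name are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical row: (student_id, problem_id, raw_feature_value)
Row = Tuple[str, str, float]


def standardize_by_problem(rows: List[Row]) -> Dict[Tuple[str, str], float]:
    """Z-score one feature within each problem: (value - mean) / std."""
    by_problem: Dict[str, List[float]] = defaultdict(list)
    for _, problem, value in rows:
        by_problem[problem].append(value)

    # Assumption: population standard deviation (pstdev).
    stats = {p: (mean(v), pstdev(v)) for p, v in by_problem.items()}

    z_scores = {}
    for student, problem, value in rows:
        mu, sigma = stats[problem]
        # Guard against a feature that is constant within a problem.
        z_scores[(student, problem)] = 0.0 if sigma == 0 else (value - mu) / sigma
    return z_scores


rows = [("s1", "p1", 2.0), ("s2", "p1", 4.0), ("s3", "p1", 6.0), ("s1", "p2", 5.0)]
z = standardize_by_problem(rows)
print(z[("s2", "p1")])  # 0.0, since 4.0 is the mean for problem p1
```
]]></preformat>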
      <sec id="sec-9-1">
        <title>3.2.1 Struggling threshold features</title>
        <sec id="sec-9-1-1">
          <title>Whether the student went beyond a problem’s corresponding time or attempt threshold.</title>
          <p>We defined students as struggling if they worked on a programming
problem for a long time or if they submitted a high number of
solutions to a problem. We assumed that students could only
show persistence in the context of problems with which they
struggled.</p>
          <p>This operationalization of struggling depends on identifying both a
time threshold and an attempt threshold, each calculated specifically for a given
homework problem. Thus, once we calculated the thresholds for
each problem, we created two binary struggling threshold features:
beyond time threshold and beyond attempt threshold. We only kept
instances of students that satisfied at least one of these two criteria.
We also created two numerical features that measured a student’s
deviation from each of these thresholds. Because the thresholds
were already calculated at the problem level, standardizing these
deviation features would result in perfectly collinear features, so we
did not standardize them.</p>
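          <p>The per-problem thresholds and the four struggling features described in this section can be sketched as follows. The sketch reads the threshold rule as taking the larger of a problem's 75th quantile and an absolute value (15 minutes or 9 attempts), so that the absolute value applies whenever the quantile falls below it. The function names, the quantile interpolation method, and the strict "greater than" comparison are assumptions made for illustration.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from statistics import quantiles
from typing import Dict, List

TIME_FLOOR_SECONDS = 15 * 60  # absolute time value from the text
ATTEMPT_FLOOR = 9             # absolute attempt value from the text


def third_quartile(values: List[float]) -> float:
    """75th quantile; the interpolation method is an assumption."""
    return quantiles(values, n=4, method="inclusive")[2]


def problem_thresholds(times: List[float], attempts: List[float]) -> Dict[str, float]:
    """Per-problem thresholds: the larger of the 75th quantile and the floor."""
    return {
        "time": max(third_quartile(times), TIME_FLOOR_SECONDS),
        "attempts": max(third_quartile(attempts), ATTEMPT_FLOOR),
    }


def struggling_features(total_time: float, n_attempts: int,
                        thresholds: Dict[str, float]) -> dict:
    """Two binary flags plus two numeric deviations from the thresholds."""
    time_dev = total_time - thresholds["time"]
    attempt_dev = n_attempts - thresholds["attempts"]
    return {
        "beyond_time_threshold": time_dev > 0,        # assumption: strict
        "beyond_attempt_threshold": attempt_dev > 0,  # assumption: strict
        "time_deviation": time_dev,
        "attempt_deviation": attempt_dev,
    }


# Made-up per-student totals for one problem (seconds, attempt counts).
thresholds = problem_thresholds(
    times=[300, 600, 1200, 2400, 3000],
    attempts=[2, 3, 4, 6, 20],
)
features = struggling_features(total_time=1000, n_attempts=12, thresholds=thresholds)
# An instance is kept only if at least one flag is set.
keep = features["beyond_time_threshold"] or features["beyond_attempt_threshold"]
```
]]></preformat>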
          <p>For the time threshold, we used the larger of two values: the
75th quantile of students’ total time on the problem and 15
minutes. We combined the 75th quantile with an absolute value of 15
minutes for several reasons. First, given
that the course is an introductory CS course, it is reasonable
to expect that one fourth of students struggled with difficult programming
problems. Second, the proportion of students who struggled with
unchallenging problems would be smaller, so an absolute
threshold is better suited to these cases. Third, we used 15
minutes as the absolute threshold because 57.56% of problems had
a 75th quantile of total time smaller than 15 minutes, and it seems
reasonable to regard close to half of the problems as unchallenging.</p>
          <p>Given that the number of attempts is an important indicator of
persistence, many attempts on a problem might also be indicative
of struggling, even when the total time spent on the problem falls
under the time threshold. Analogous to the time threshold,
we used the larger of the 75th quantile of the
number of attempts on a problem and 9 attempts as the
attempt threshold. That is, if the 75th quantile of the number of attempts on
a problem was smaller than 9, the latter became the attempt
threshold. We used 9 attempts as the absolute threshold because
56.06% of problems had a 75th quantile of the number of attempts
no greater than 9. This number was close to 57.56%, the
proportion of problems with a 75th quantile of total time smaller
than 15 minutes.</p>
        </sec>
      </sec>
      <sec>
        <title>3.2.2 Solved</title>
        <sec id="sec-9-1-2">
          <title>Whether the student successfully solved the programming problem before the deadline.</title>
          <p>
            This is directly related to wheel spinning as defined by [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]:
"problem solving without making progress towards mastery."
While PrairieLearn is not suited for measuring mastery the way
[
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] did with the Cognitive Algebra Tutor and ASSISTments ITSs
(three consecutive, correct responses within a specific skill),
persistence while struggling that does not lead to an eventual
correct solution can be considered a form of wheel spinning or
unproductive persistence. Based on this, we hypothesized that
solving a challenging problem (productive persistence) would lead
to a higher quiz-question score than not solving the problem (wheel
spinning).
          </p>
        </sec>
      </sec>
      <sec id="sec-9-2">
        <title>3.2.3 Number of submissions</title>
        <sec id="sec-9-2-1">
          <title>The count of how many times the student submitted an attempted solution for the problem.</title>
          <p>
            This is a typical measure used in the persistence literature [
            <xref ref-type="bibr" rid="ref15 ref39 ref42">15, 39, 42</xref>
            ]. Since submissions on PrairieLearn typically end when a
student successfully solves a problem, this feature is a count of the
number of failed attempts + 1. In essence, this is one way of
measuring the level of persistence demonstrated. We reasoned that
more unsuccessful attempts would indicate more wheel spinning,
resulting in lower quiz scores.
          </p>
        </sec>
      </sec>
      <sec id="sec-9-3">
        <title>3.2.4 Total time on problem</title>
        <sec id="sec-9-3-1">
          <title>The total amount of time (in seconds) spent solving the problem.</title>
          <p>As with the number of submissions, the time that students spend on
a challenging problem might indicate the amount of persistence
being demonstrated. We again reasoned that more time (and thus
more wheel spinning) may be predictive of more struggling and
lower scores on the quiz questions.</p>
          <p>Our platform only allowed us to measure the time between
submissions, so we had no way of knowing with certainty how
much time was spent working on a problem. If the time difference
between a student’s two consecutive submissions was beyond 15
minutes, we regarded this student as being away from this problem
during that interval (see the feature taking a break below for the
choice of 15 minutes as a threshold). In these cases, we replaced
this time difference with the student’s mean time difference
between other consecutive submissions on this problem so that we
could estimate the student’s total time on the problem more
accurately.</p>
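          <p>The time estimate described above can be sketched as follows: gaps between consecutive submissions are summed, and any gap longer than 15 minutes is replaced by the mean of the student's remaining gaps on the same problem. The function name is hypothetical, and treating the case where every gap is an away gap as contributing zero is an assumption, since the text does not cover it. Time spent before the first submission is not observable and is therefore not included.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean
from typing import List

AWAY_GAP_SECONDS = 15 * 60  # gaps longer than this count as time away


def estimate_total_time(timestamps: List[float]) -> float:
    """Sum gaps between consecutive submissions, replacing each away gap
    with the mean of the student's other gaps on the same problem."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    working = [g for g in gaps if g <= AWAY_GAP_SECONDS]
    # Assumption: if every gap is an away gap there is nothing to average,
    # so those gaps contribute zero.
    fill = mean(working) if working else 0.0
    return sum(g if g <= AWAY_GAP_SECONDS else fill for g in gaps)


# Submissions at 0 s, 2 min, 6 min, then a 40-minute absence, then 3 more min.
stamps = [0.0, 120.0, 360.0, 2760.0, 2940.0]
print(estimate_total_time(stamps))  # 720.0
```
]]></preformat>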
        </sec>
      </sec>
      <sec id="sec-9-4">
        <title>3.2.5 Taking a break</title>
        <sec id="sec-9-4-1">
          <title>Whether the student spent time away from the problem after passing one of the struggling thresholds.</title>
          <p>We defined taking a break as a struggling student being away from
the problem at least once. When the time between two consecutive
submissions on the same problem went beyond 15 minutes, we
regarded the student as away from the task. As discussed above, 15
minutes might be sufficient for solving unchallenging problems if
students did not struggle. Moreover, 81.57% of pairs of consecutive
submissions had a time difference less than 15 minutes. This
proportion only increased slightly to 83.77% when increasing this
threshold from 15 minutes to 1 hour. Thus, it is reasonable to use
15 minutes as the threshold for being away from the problem. Note
that if a student attempted other homework problems between two
consecutive submissions on the same problem, we regarded this
student as interleaving rather than taking a break.</p>
          <p>
            Our rationale for measuring break taking is based on the idea that a
wheel-spinning state may be overcome by time away from task.
Some of the cognitive benefits of breaks have been documented [
            <xref ref-type="bibr" rid="ref1 ref19 ref26 ref36">1, 19, 26, 36</xref>
            ] and seem to be especially impactful for intensive and
prolonged tasks. The term wheel spinning itself was coined in
reference to the imagery of a car spinning its wheels but not going
anywhere, suggesting that the indiscriminate tactic of subsequent
attempts may not always be productive. In their article defining this
new construct, [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] suggest devising ways to break up fruitless
attempts at solving problems. Our feature tries to capture students
who independently choose to break up their homework in this way.
          </p>
        </sec>
      </sec>
      <sec id="sec-9-5">
        <title>3.2.6 Interleaving</title>
        <sec id="sec-9-5-1">
          <title>Whether the student switches to a different problem for a time and then comes back to continue attempting the original problem.</title>
          <p>
            Interleaved practice, as opposed to blocked practice, refers to a
learning technique that mixes up the order of topics, lessons, or
problems presented. Studies have shown that this practice usually
improves learning outcomes [
            <xref ref-type="bibr" rid="ref32 ref38">32, 38</xref>
            ], though—to the best of our
knowledge—this has not been explored in a CS context. For the
purposes of our study, we measured interleaving as a student
attempting a problem without solving it, attempting a different
problem, and then returning to continue working on the original
problem. We reasoned that such a practice could potentially serve
to break up the monotony and potential frustration associated with
wheel spinning and thereby lead to better learning. We considered
this an alternative to taking a break and did not double count such
instances in the features.
          </p>
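          <p>One way to separate taking a break from interleaving, under the definitions above, is sketched below. It walks through a student's time-ordered submissions across all homework problems. If other problems were attempted between two consecutive submissions to the target problem, the pair counts as interleaving. Otherwise, a gap of more than 15 minutes counts as a break. The log layout and function name are hypothetical, and the sketch omits the additional conditions that the break follows a struggling threshold and that the problem was still unsolved.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Tuple

AWAY_GAP_SECONDS = 15 * 60

# Hypothetical log entry: (timestamp_seconds, problem_id)
Submission = Tuple[float, str]


def break_and_interleave(log: List[Submission], problem: str) -> Tuple[bool, bool]:
    """Return (took_break, interleaved) for one student on one problem."""
    ordered = sorted(log)
    took_break = False
    interleaved = False
    previous_index = None  # position of the last submission to `problem`
    for index, (timestamp, problem_id) in enumerate(ordered):
        if problem_id != problem:
            continue
        if previous_index is not None:
            if index - previous_index > 1:
                # Other problems were attempted in between: interleaving,
                # which is not also counted as a break.
                interleaved = True
            elif timestamp - ordered[previous_index][0] > AWAY_GAP_SECONDS:
                took_break = True
        previous_index = index
    return took_break, interleaved


log = [(0.0, "hw1"), (300.0, "hw1"), (2000.0, "hw1"), (2100.0, "hw2"), (2300.0, "hw1")]
print(break_and_interleave(log, "hw1"))  # (True, True)
```
]]></preformat>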
        </sec>
      </sec>
      <sec id="sec-9-6">
        <title>3.2.7 Rapid guessing</title>
        <sec id="sec-9-6-1">
          <title>Whether the student submitted at least three quick submissions in a row.</title>
          <p>
            Quick, consecutive submissions may indicate guessing or uncritical
attempts to fix problems without much reflection. This behavior has
been associated with students trying to game the system [
            <xref ref-type="bibr" rid="ref2 ref28">2, 28</xref>
            ] and
with wheel spinning [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. Given the nature of programming tasks as
opposed to attempts in an ITS, we defined a quick submission as a
gap between attempts of less than 15 seconds. If a student’s
submission stream to a problem contains three or more consecutive
quick submissions, we labeled this student as performing rapid
guessing on the problem. We hypothesized that rapid guessing
would be associated with a lower score on related quiz questions.
          </p>
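          <p>A minimal sketch of the rapid guessing flag follows. It assumes that a quick submission is one arriving less than 15 seconds after the previous submission, and that three such submissions in a row (that is, three consecutive short gaps) trigger the flag. This reading of "three or more consecutive quick submissions" and the function name are assumptions.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from typing import List

QUICK_GAP_SECONDS = 15  # a gap shorter than this makes a submission "quick"
MIN_QUICK_RUN = 3       # number of quick submissions in a row that is flagged


def is_rapid_guessing(timestamps: List[float]) -> bool:
    """True if the stream contains MIN_QUICK_RUN consecutive quick submissions."""
    ordered = sorted(timestamps)
    run = 0
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier < QUICK_GAP_SECONDS:
            run += 1
            if run >= MIN_QUICK_RUN:
                return True
        else:
            run = 0  # a slower submission interrupts the streak
    return False


print(is_rapid_guessing([0, 10, 18, 25, 400]))    # True: gaps of 10, 8, 7 s
print(is_rapid_guessing([0, 10, 18, 100, 105]))   # False: streak is broken
```
]]></preformat>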
        </sec>
      </sec>
      <sec id="sec-9-7">
        <title>3.2.8 Time interval between consecutive submissions</title>
        <sec id="sec-9-7-1">
          <title>The student’s mean and standard deviation of time intervals between consecutive attempts on the problem.</title>
          <p>
            Shorter time between submissions may indicate more unproductive
attempts to push through to an answer without stopping to
think/work carefully or take breaks [
            <xref ref-type="bibr" rid="ref39">39</xref>
            ]. This is also similar to the
common practice of cramming, as opposed to the more effective
practice of spaced repetition. [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ] found the mean and standard
deviation of time differences to be about equally predictive of
wheel spinning. We nevertheless chose to include both features in
our initial model to test this claim. We did not count cases of break
taking (intervals longer than 15 minutes) towards these features.
Because shorter intervals are associated with wheel spinning, we
hypothesized that we would find a positive correlation between these
features and score.
          </p>
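          <p>The two interval features can be sketched as below, with gaps longer than 15 minutes left out as described. The function name is hypothetical, and the use of the sample standard deviation (with 0.0 returned when only one gap remains) is an assumption.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean, stdev
from typing import List, Optional, Tuple

AWAY_GAP_SECONDS = 15 * 60  # longer gaps are treated as breaks and excluded


def interval_features(timestamps: List[float]) -> Tuple[Optional[float], Optional[float]]:
    """Mean and standard deviation of gaps between consecutive submissions."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:]) if b - a <= AWAY_GAP_SECONDS]
    if not gaps:
        return None, None  # no usable interval for this instance
    # Assumption: sample standard deviation; undefined for a single gap.
    spread = stdev(gaps) if len(gaps) > 1 else 0.0
    return mean(gaps), spread


# Gaps are 60, 120, 2820 (excluded as a break), and 120 seconds.
avg_gap, sd_gap = interval_features([0.0, 60.0, 180.0, 3000.0, 3120.0])
print(avg_gap)  # 100.0
```
]]></preformat>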
        </sec>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>3.3 Machine Learning and Interpretation</title>
      <p>To test the importance of our various features, we created a random
forest model using a shuffled 70/30 validation/testing split grouped
by student with 5,396 and 2,277 instances respectively. We
conducted 500 iterations of Bayesian hyperparameter optimization
on the validation set using 10-fold cross validation grouped by
student. This hyperparameter tuning was set to optimize the R<sup>2</sup>
score.</p>
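      <p>Grouping the split by student means that all instances from a given student fall on the same side of the split, which avoids leaking student-specific behavior into the held-out set. The pure-Python sketch below illustrates the idea. The function name and seed are hypothetical, and because whole students are assigned to each side, the resulting instance ratio is only approximately 70/30. In practice, scikit-learn's GroupShuffleSplit and GroupKFold provide this behavior, though the text does not state which utilities were used.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from typing import List, Sequence, Tuple


def grouped_split(student_ids: Sequence[str], test_fraction: float = 0.3,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split row indices so that no student appears on both sides."""
    students = sorted(set(student_ids))
    random.Random(seed).shuffle(students)
    n_test = round(len(students) * test_fraction)
    test_students = set(students[:n_test])
    train_idx = [i for i, s in enumerate(student_ids) if s not in test_students]
    test_idx = [i for i, s in enumerate(student_ids) if s in test_students]
    return train_idx, test_idx


# Ten made-up students with two student-problem instances each.
ids = [f"s{i}" for i in range(10) for _ in range(2)]
train_idx, test_idx = grouped_split(ids)
```
]]></preformat>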
      <p>
        We originally tested a wide array of models, including various
linear, tree-based, and ensemble algorithms, and we further tuned
some of the most promising ones. We found that variations of
gradient boosting models performed best. However, we chose to
focus on random forest for our feature interpretation for two
reasons: (1) the performance gained by using the best models over
random forest was negligible, and (2) random forest models have
been shown to be useful for predictions related to persistence in
other EDM research [
        <xref ref-type="bibr" rid="ref27 ref42">27, 42</xref>
        ].
      </p>
      <p>
        Once we had constructed our final model, we re-trained it on the
entire dataset in preparation for feature interpretation. For the task
of interpreting feature importance, we analyzed SHapley Additive
exPlanations (SHAP) values. SHAP is a game-theoretic approach
that calculates the effect that each value in the feature matrix has
on that instance’s prediction, relative to the mean prediction [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
That is, we can output a matrix with the same dimensions as the
features data set, each value serving as an explanation of that
feature’s effect on the prediction made for that particular instance.
These SHAP values are in the same unit as the target label (percentage
score in our case), which further lends them to interpretation. Though
SHAP values are very resource-intensive to calculate fully and
accurately in general, the structure of tree-based models
makes it possible to optimize the process significantly [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
It is important to note that the mean of the SHAP values for any
feature will be approximately zero. This is because SHAP values are
calculated as each feature’s contribution to the difference between an
instance’s prediction and the mean predicted score. However, by finding
the mean absolute value of the SHAP values for each feature, we can
identify which features have the strongest average impact on prediction.
      </p>
      <p>
While mean decrease in Gini impurity has been used to interpret features in the
persistence literature [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ], and permutation feature importance is
commonly used as well, numerous studies have identified potential
issues with these approaches that can lead to misleading
interpretations [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This is especially true when using highly
correlated features [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], which is the case with our data.
Furthermore, SHAP allows for investigation into the interplay
between features beyond what these other methods can do.
      </p>
      <p>
We conducted all our work using open-source Python packages
built on top of scikit-learn [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. We tested and tuned a variety of
models using PyCaret [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ], performed Bayesian optimization [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
with scikit-optimize [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], and investigated feature importance
using SHAP [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
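      <p>To make the definition above concrete, the following sketch computes exact Shapley values by brute force for a tiny stand-in model and then ranks features by mean absolute value. It is an illustration only: the model function, the four-row data set, and all names are hypothetical, and the enumeration is exponential in the number of features, which is why tree-specific algorithms such as the one used in this study are needed in practice.</p>
      <preformat>
```python
from itertools import combinations
from math import factorial
from statistics import fmean


def model(x):
    """Hypothetical stand-in for a fitted model; includes an interaction term."""
    return 0.5 + 0.2 * x[0] - 0.1 * x[1] + 0.05 * x[0] * x[2]


def coalition_value(x, subset, background):
    """Mean prediction when features in `subset` keep x's values and the
    remaining features take the values of each background row in turn."""
    preds = []
    for z in background:
        mixed = [x[j] if j in subset else z[j] for j in range(len(x))]
        preds.append(model(mixed))
    return fmean(preds)


def shapley_values(x, background):
    """Exact Shapley values, enumerating every coalition of the other features."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(x, set(subset) | {i}, background)
                without_i = coalition_value(x, set(subset), background)
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


data = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 3.0],
]

# One row of SHAP values per instance: same shape as the feature matrix.
shap_matrix = [shapley_values(row, data) for row in data]
mean_prediction = fmean(model(row) for row in data)

# Global importance: mean absolute SHAP value per feature (per column).
mean_abs_shap = [fmean(abs(row[j]) for row in shap_matrix) for j in range(3)]
```
      </preformat>
      <p>For every instance, the values in its row of shap_matrix sum to that instance's prediction minus mean_prediction, which is what places SHAP values in the same unit as the target.</p>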
    </sec>
    <sec id="sec-11">
      <title>4. RESULTS AND DISCUSSION</title>
    </sec>
    <sec id="sec-12">
      <title>4.1 Model Results and Preliminary Analysis</title>
      <p>Our tuned random forest model attained an average cross-validated
R<sup>2</sup> of 0.133 and an average RMSE of 0.129 on the validation set.
On the held-out testing set, the resulting R<sup>2</sup> was 0.145 and the
RMSE was 0.130. Our persistence features accounted for roughly
14% of the variation in related quiz scores.</p>
      <p>A preliminary analysis of our model uncovered certain important
patterns. For one, our least impactful features were all binary
measures (such as whether interleaving, rapid guessing, or
break taking were observed), whereas our top features were the
standardized versions of those binary features. Figure 2 shows the
entire set of feature rankings based on mean absolute SHAP values.
A detailed exploration of these features revealed what appears to be
an opposing impact between some binary features and their
standardized counterparts. For example, the feature solved has a
negative correlation between its values and its SHAP values
(r = -0.217, p &lt; 0.0001), whereas its standardized version,
solved_std, has a positive correlation (r = 0.33, p &lt; 0.0001).
Measuring this correlation between feature and SHAP values
allows us to better understand how the model is using the feature.
Higher correlation, and thus a stronger linear relationship, suggests
a more straightforward interpretation of the feature’s role in the
model. While the impact of solved is very small in the overall model
(ranked 17th, mean absolute SHAP = 0.00003), solved_std is our
top feature in terms of overall impact on the predicted score (mean
absolute SHAP = 0.02129). We found this same inverted
relationship between many other impactful standardized features
and their original, binary, far less impactful counterparts.
      </p>
      <p>Because we standardized features at the problem level, the
correlation between each unstandardized and corresponding
standardized feature is never quite perfect, but some do come close.
Random forest models typically do not suffer from collinear
features the way more traditional statistical regression methods do.
This is largely because of the way features are randomly sampled
for each tree. Even when both collinear features are part of the
feature subset, a decision tree will typically ignore one in favor of
the other. We suspect that much of our model’s preference for the
standardized features over the unstandardized ones is due to the added
problem-level information they contain, which could be interpreted
as information regarding the difficulty of the problem.
      </p>
      <p>However, while the predictive power of a random forest is not
affected by collinear features, model interpretability suffers, as we
found through our preliminary analysis. Given our goal of better
understanding the different aspects of persistence and their
relationships, we decided to remove the original non-standardized
features. We also removed time_threshold_deviation and
attempt_threshold_deviation, which were very highly correlated
with total_time_std and num_submissions_std respectively. We
then re-trained and re-tested our model.</p>
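      <p>The following sketch illustrates why a feature standardized at the problem level carries information that its binary original lacks. It assumes a z-score computed within each problem, which may differ from the exact standardization defined earlier in the paper; the function name, problem labels, and values are hypothetical.</p>
      <preformat>
```python
from collections import defaultdict
from statistics import fmean, pstdev


def standardize_within_problem(problem_ids, values):
    """Z-score each value against the other instances of the same problem."""
    by_problem = defaultdict(list)
    for problem, value in zip(problem_ids, values):
        by_problem[problem].append(value)

    stats = {p: (fmean(v), pstdev(v)) for p, v in by_problem.items()}

    standardized = []
    for problem, value in zip(problem_ids, values):
        mean, sd = stats[problem]
        # If every instance of a problem shares one value, there is no spread.
        standardized.append(0.0 if sd == 0 else (value - mean) / sd)
    return standardized


# Hypothetical data: 1 = solved, 0 = not solved.
problem_ids = ["easy"] * 5 + ["hard"] * 5
solved = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
solved_std = standardize_within_problem(problem_ids, solved)
```
      </preformat>
      <p>In this toy data, solving the problem that few others solved yields a much larger standardized value than solving the problem that most others solved, even though the binary feature is 1 in both cases. This is the kind of difficulty-related information the standardized features may add.</p>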
      <p>After removing these features, we found that our model’s average
cross-validated R<sup>2</sup> on the validation set increased slightly, from
0.133 to 0.134, while RMSE remained constant. On the held-out
testing set, its R<sup>2</sup> also increased, from 0.145 to 0.147, while RMSE
remained constant. We then re-trained our model on the entire
dataset in preparation for our in-depth feature analysis.</p>
    </sec>
    <sec id="sec-13">
      <title>4.2 Feature Importance and Interpretation</title>
      <sec id="sec-13-1">
        <title>4.2.1 Feature rankings</title>
        <p>Our analysis using SHAP values found that the solved_std and
rapid_guessing_std features had the biggest effect, accounting for
an average impact of 0.0215 and 0.0172 on the predicted score
respectively. The third most important feature, taking_break_std,
had an average impact less than half as strong at 0.0076. Together,
these three features account for 75% of all features’ total impact on
the predicted score. Figure 3 shows the feature rankings based on
mean absolute SHAP values, while Table 1 allows for comparison
with other methods such as Gini-impurity-based importance and
permutation importance. Rankings based on these three different
approaches yielded almost identical results with only minor
variations, strengthening the reliability of our findings.
        </p>
        <p>Besides ranking the features by impact on the predicted score,
SHAP values allow us to explore the nature of that impact more
deeply, as well as the interactions between features. Figure 4 is a
beeswarm plot of SHAP values by feature with color indicating the
value of each individual instance.
        </p>
        <p>To further aid our interpretation, we also explored which features
had the highest absolute correlation between their values and their
corresponding SHAP values. In essence, this correlation is a
measure of just how linear each feature’s effect is on the predicted
score. We calculated Pearson’s r for all features (see Table 1) and
found that all p values were below 0.0001, except for sd_time_diff_std.
Throughout this analysis, we point out when a feature’s correlation
is indicative of a linear relationship.
        </p>
      </sec>
      <sec>
        <title>4.2.2 Solved</title>
        <p>
We can see (Figure 4) that the bulk of solved_std is composed of
high values (red color), indicating that most students managed to
solve most homework problems. The long positive skew suggests
that small, positive variations in this feature could potentially push
the predicted quiz score up by about 0.1. The few lower values in
this feature (blue color) are found on the left side of the plot,
suggesting that not solving the problem tended to pull the predicted
score down. Indeed, we found a moderate positive linear
relationship between solved_std and its SHAP values (r = 0.34),
further confirming our initial analysis.</p>
        <p>This supports our hypothesis. It suggests that solving a challenging
problem (productive persistence) may be related to a better
understanding of the underlying concepts, whereas not solving the
problem (wheel spinning) suggests a lack of understanding.</p>
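        <p>The correlation between a feature's values and its SHAP values, used throughout this analysis as a measure of how linear the feature's effect is, can be sketched as follows. The helper pearson_r and the toy numbers are hypothetical and are not taken from the study's data.</p>
        <preformat>
```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values of one standardized feature and its SHAP column.
feature_values = [-2.0, -0.5, 0.0, 0.5, 0.5, 1.0]
shap_values = [-0.08, -0.01, 0.00, 0.01, 0.02, 0.03]

r = pearson_r(feature_values, shap_values)
```
        </preformat>
        <p>A value of r near 1 or -1 indicates that the model's use of the feature is close to linear; a value near 0 indicates that the feature's effect, if any, depends on other features or is non-monotonic.</p>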
      </sec>
      <sec id="sec-13-2">
        <title>4.2.3 Rapid guessing</title>
        <p>
          Our model’s second most impactful feature, rapid_guessing_std, is
in many ways the opposite. Most students did not engage in rapid
guessing. Those who did, particularly on homework problems
where few others did (identified by high rapid_guessing_std, or
red color in the beeswarm plot in Figure 4), generally had their
predicted score lowered by this feature.
This effect can more clearly be seen when plotting the SHAP values
for the feature against the values of the feature itself (Figure 5).
This view allows us to get a better sense of how most instances with
a higher rapid_guessing_std value impact the predicted score
negatively. This aligns with our hypothesis: rapid guessing, with its
potential implications of wheel spinning [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and gaming the system
[
          <xref ref-type="bibr" rid="ref2 ref28">2, 28</xref>
          ], is indicative of lower learning outcomes.
By adding the values of our top impactful feature, solved_std, as
the color of the plot illustrated in Figure 5, we can also see an
interesting interaction between the two features. It appears that the
impact of the high rapid_guessing_std values is at least partly
dependent on solved_std: instances where the student failed to
solve the problem (in blue) were less negatively impacted by
rapid_guessing_std (as indicated by their mostly positive SHAP
values). One explanation may be that students who rely on rapid
guessing and still manage to solve the problem come away with
more misguided confidence in their mastery of the material than
those who fail to solve it, and are thus less likely to
consider reviewing before a quiz. However, we did not investigate
this hypothesis further.
        </p>
      </sec>
      <sec id="sec-13-3">
        <title>4.2.4 Taking a break</title>
        <p>
          Our model’s third most impactful feature, taking_break_std, has a
very clear pattern that is easily observable in Figure 4. Lower
feature values generally lead to a positive impact on predicted
score, whereas taking a break is more likely to have a negative
impact on score. We found a negative linear relationship between
taking_break_std and its SHAP values (Figure 6), with Pearson’s r
of -0.608. The distribution of SHAP values for this feature indicates
a potential negative impact about three times as large as the positive
one.
        </p>
        <p>
          This result is the opposite of what we hypothesized. Since taking
breaks during a difficult task has been shown to improve cognition
[
          <xref ref-type="bibr" rid="ref36">36</xref>
          ], we hypothesized that students who took a break while
struggling would ultimately be more productive. We specifically
marked a student as taking a break only if there was a large gap
between submissions (longer than 15 minutes) after they had passed one of the
two struggling thresholds.
        </p>
        <p>One possible explanation is that students who took a break did, in
fact, perform better than they would have otherwise. Since our
method does not directly test causation, our model may be using
this feature as a proxy for students who struggled more than others.
Another possibility is that this feature is not solely capturing
intentional break-taking, but also interruptions to students’ work,
which may serve as distractions and are certainly not an ideal learning
situation. We did not calculate how many times students took a
break, only whether there was at least one 15-minute gap between
submissions when struggling. Finally, because homework
problems were due at midnight on the day they became available,
students may simply not have had sufficient time for effective break
taking. Without additional information about the learning context or
additional features, we have no way of knowing which
of these explanations, if any, is the most likely.</p>
      </sec>
      <sec id="sec-13-4">
        <title>4.2.5 Struggling threshold features</title>
        <p>For beyond_time_threshold_std, we can see in Figure 4 that lower
values generally lead to increases in the predicted score and vice
versa. This is indicative of the underlying attribute this feature
attempts to capture: going beyond the time threshold yields
smaller (generally negative) SHAP values, whereas not going
beyond the time threshold yields larger (generally positive) values,
with the exact value heavily affected by how many other students
crossed the threshold on the same problem. For students who take
longer than the norm, this generally has a negative effect on their
predicted score. The relationship here is moderately linear, with an r of -0.349.
The fifth top feature that we identified,
beyond_attempt_threshold_std, does not have such a clear pattern.
The SHAP values seem to be widely spread irrespective of the
feature’s values. The feature’s distribution is bimodal, as is the case
with most of the features that standardize a binary variable, and we
did find a small distinction in the SHAP values between the two
modes (Figure 7). While the mean for each mode is essentially zero,
higher values of beyond_attempt_threshold_std, which
correspond to student-problem instances that went beyond that
problem’s attempt threshold, have a moderate negative correlation
with their SHAP values (r = -0.409, p &lt; 0.0001), whereas lower
values have a positive correlation about
equally strong (r = 0.382, p &lt; 0.0001). This suggests that the
impact of this feature on predicted score is highly dependent on
how much one’s status on the underlying binary variable
(beyond_attempt_threshold) varies from the norm for that given
homework problem.</p>
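        <p>The per-mode comparison described above can be sketched by computing the feature-to-SHAP correlation separately for each value of the underlying binary variable. All names and numbers below are hypothetical and chosen only to mimic the reported pattern of opposite signs in the two modes.</p>
        <preformat>
```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_mode(binary_flags, feature_values, shap_values):
    """Correlate feature and SHAP values separately within each binary mode."""
    result = {}
    for flag in sorted(set(binary_flags)):
        xs = [x for b, x in zip(binary_flags, feature_values) if b == flag]
        ys = [y for b, y in zip(binary_flags, shap_values) if b == flag]
        result[flag] = pearson_r(xs, ys)
    return result


# Hypothetical data: flag 1 = instance went beyond the attempt threshold.
flags = [0, 0, 0, 0, 1, 1, 1, 1]
feature = [-0.9, -0.6, -0.4, -0.2, 0.8, 1.2, 1.9, 2.6]
shap = [-0.004, -0.001, 0.001, 0.003, 0.005, 0.002, -0.001, -0.004]

by_mode = correlation_by_mode(flags, feature, shap)
```
        </preformat>
        <p>Splitting by mode in this way can reveal opposite within-mode trends that a single correlation over all instances would obscure.</p>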
      </sec>
      <sec id="sec-13-5">
        <title>4.2.6 Number of submissions</title>
        <p>We found that num_submissions_std, our model’s sixth top feature
in terms of impact, has the strongest correlation between its feature
values and SHAP values (r = -0.833). This fits with our hypothesis.
The more attempts that students submit, the more likely they are to
be struggling, and the less likely they are to perform well when
tested on the same skills during their weekly quiz.</p>
      </sec>
      <sec id="sec-13-6">
        <title>4.2.7 Time features</title>
        <p>We found that our three time-related features (not including
beyond_time_threshold_std, which is of a very different nature
since its non-standardized version is a binary feature) had some of
the weakest predictive power in our model. total_time_std had a
still moderate mean absolute SHAP at 0.00257 and a very strong
correlation between its feature and SHAP values with r = -0.773.
avg_time_diff_std and sd_time_diff_std, by comparison, had a
much lower mean absolute SHAP (respectively 0.00099 and
0.00098) and no correlation.</p>
        <p>The strong, negative correlation between total_time_std and its
SHAP values means that the model is interpreting longer time on a
problem as being related to lower learning outcomes, or at the very
least as a student struggling enough with a problem to lead to a
lower score on the weekly quiz. This latter possibility is in line with
our hypothesis and with what we found for
beyond_time_threshold_std. Interestingly, this pattern is far more
pronounced for instances that went beyond the time threshold (red
points in Figure 8), whereas the relationship is seemingly reversed
for cases where students did not go beyond the time threshold
(blue/purple points in Figure 8).
        </p>
        <p>As for the two features that specifically look at time between
submissions (avg_time_diff_std and sd_time_diff_std), their
weakness both in predictive impact and in correlation with SHAP
values suggests at face value that this factor has little value in
predicting learning success (or lack thereof) when students struggle
with a problem. These features’ impact may also have been affected
by the high correlation between them (r = 0.76). Similar
information may have also been captured by a combination of
beyond_time_threshold_std and beyond_attempt_threshold_std.</p>
      </sec>
      <sec id="sec-13-7">
        <title>4.2.8 Interleaving</title>
        <p>
          Finally, our model’s least impactful feature, interleaving_std, had
by far the lowest mean absolute SHAP value (0.00017) and a low
correlation between its feature values and its SHAP values (r = 0.126).
We originally hypothesized that this feature would play a bigger
role in predicting students’ scores, considering that the practice of
interleaving when struggling is generally considered a good
learning practice [
          <xref ref-type="bibr" rid="ref32 ref38">32, 38</xref>
          ]. However, its low impact in our model is
likely because we had so few instances of interleaving—only nine
out of 7,673 instances. Most of these nine did lead to an increase in
predicted score, but without more examples of the practice, we are
unable to make any sound conclusions regarding its role.
        </p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>4.3 Limitations</title>
      <p>Our study suffers from limitations primarily related to the
aligned-quiz-question scores we calculated for each student-problem
instance. For one, the score distribution was heavily skewed due to
the abundance of almost perfect quiz scores. Additionally, while
the PrairieLearn platform allowed us to use the course’s quizzes
without requiring students to take an additional posttest, the scores
did not take into account students’ prior knowledge and skills. This
made it difficult to measure the impact of students’ productive vs.
unproductive persistence directly.</p>
      <p>These factors likely led to our model’s limited predictive
performance (R<sup>2</sup> = 0.147 on the held-out test set). While we believe
that our final model’s performance was sufficient for our purposes
of interpreting the relationship between elements of persistence and
learning outcomes, it should be possible to create a more accurate
model without severely sacrificing interpretability.</p>
    </sec>
    <sec id="sec-15">
      <title>5. CONCLUSION</title>
      <p>The most impactful features were those related to solving the
problem, rapid guessing, and taking a break. Those with the most
straightforward linear effect were the number of submissions, total
time, and (again) taking a break. All three of the latter had a strong
negative correlation between their feature values and their impact
on prediction. In other words, more attempts, taking a longer time,
and taking a break are all correlated with lower scores on related
quiz questions. Solving the problem—our most impactful feature—
had a moderate positive correlation, highlighting the positive nature
of the relationship between successfully completing homework
problems and score on subsequent related quiz questions.
This all suggests that solving the problem and rapid guessing are
important features for accurate prediction, while the number of
submissions and total time are indicative of the differences between
productive persistence and wheel spinning in a computer science
context. Taking a break fits into both of these categories.
Perhaps most important, we were able to identify features that are
directly related to learning strategies. Our findings suggest that
students should avoid rapidly submitting subsequent programming
attempts without actively trying to address problems in their code
(rapid guessing). Taking a break may also be unproductive
behavior, though this finding may be an artifact of the specific
context in which students were able to submit homework in this
course, as well as the particular way in which we calculated this
feature. As for interleaving, its predictive strength in our model was
low, but its effects nevertheless suggest that a future investigation
should study whether it can be an effective practice when struggling
on a problem.</p>
      <p>In order to address the limitations of our study, we suggest that
future research focus on devising a more robust measure of learning
that takes into account students’ individual starting points.
Additionally, for the CS context of this study, a valid measure of
programming proficiency that considers the problem-solving
process would be superior to the quiz scores we used as proxy.</p>
    </sec>
    <sec id="sec-16">
      <title>6. ACKNOWLEDGMENTS</title>
      <p>We would like to acknowledge NSF grant #DRL-1942962 for
making this work possible.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ariga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lleras</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Brief and rare mental “breaks” keep you focused: Deactivation and reactivation of task goals preempt vigilance decrements</article-title>
          .
          <source>Cognition</source>
          .
          <volume>118</volume>
          ,
          <issue>3</issue>
          (Mar.
          <year>2011</year>
          ),
          <fpage>439</fpage>
          -
          <lpage>443</lpage>
          . DOI:https://doi.org/10/b8qg78.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corbett</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koedinger</surname>
            ,
            <given-names>K.R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wagner</surname>
            ,
            <given-names>A.Z.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>Off-task behavior in the cognitive tutor classroom: When students “game the system”</article-title>
          .
          <source>CHI '04: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          (
          <year>2004</year>
          ),
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Wheel-spinning: Students who fail to master a skill</article-title>
          .
          <source>Artificial Intelligence in Education</source>
          (Berlin, Heidelberg,
          <year>2013</year>
          ),
          <fpage>431</fpage>
          -
          <lpage>440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Berland</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benton</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrick Smith</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Using learning analytics to understand the learning pathways of novice programmers</article-title>
          .
          <source>Journal of the Learning Sciences</source>
          .
          <volume>22</volume>
          ,
          <issue>4</issue>
          (Oct.
          <year>2013</year>
          ),
          <fpage>564</fpage>
          -
          <lpage>599</lpage>
          . DOI:https://doi.org/10/gg7fkh.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Blikstein</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Using learning analytics to assess students' behavior in open-ended programming tasks</article-title>
          .
          <source>Proceedings of the 1st International Conference on Learning Analytics and Knowledge</source>
          (Banff, Alberta, Canada, Feb.
          <year>2011</year>
          ),
          <fpage>110</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Blikstein</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Worsley</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piech</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahami</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cooper</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Programming pluralism: Using learning analytics to detect patterns in the learning of computer programming</article-title>
          .
          <source>Journal of the Learning Sciences</source>
          .
          <volume>23</volume>
          ,
          <issue>4</issue>
          (Oct.
          <year>2014</year>
          ),
          <fpage>561</fpage>
          -
          <lpage>599</lpage>
          . DOI:https://doi.org/10.1080/10508406.2014.954750.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>DiCerbo</surname>
            ,
            <given-names>K.E.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Game-based assessment of persistence</article-title>
          .
          <source>Journal of Educational Technology &amp; Society</source>
          .
          <volume>17</volume>
          ,
          <issue>1</issue>
          (
          <year>2014</year>
          ),
          <fpage>17</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Duckworth</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peterson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matthews</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Grit: Perseverance and passion for long-term goals</article-title>
          .
          <source>Journal of Personality and Social Psychology</source>
          .
          <volume>92</volume>
          ,
          <issue>6</issue>
          (
          <year>2007</year>
          ),
          <fpage>1087</fpage>
          -
          <lpage>1101</lpage>
          . DOI:https://doi.org/10.1037/0022-3514.92.6.1087.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Dweck</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walton</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>G.L.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Academic tenacity: Mindsets and skills that promote long-term learning</article-title>
          .
          <source>Bill &amp; Melinda Gates Foundation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Fields</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quirke</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amely</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Maughan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Combining big data and thick data analyses for understanding youth learning trajectories in a summer coding camp</article-title>
          .
          <source>Proceedings of the 47th ACM Technical Symposium on Computing Science Education</source>
          (New York, NY, USA, Feb.
          <year>2016</year>
          ),
          <fpage>150</fpage>
          -
          <lpage>155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Towards detecting wheel-spinning: Future failure in mastery learning</article-title>
          .
          <source>Proceedings of the Second (2015) ACM Conference on Learning @ Scale</source>
          (Vancouver BC Canada, Mar.
          <year>2015</year>
          ),
          <fpage>67</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Hooker</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Mentch</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Please stop permuting features: An explanation and alternatives</article-title>
          . preprint arXiv:1905.03151. (May
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Ihantola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          et al.
          <year>2015</year>
          .
          <article-title>Educational data mining and learning analytics in programming: Literature review and case studies</article-title>
          .
          <source>Proceedings of the 2015 ITiCSE on Working Group Reports</source>
          (Vilnius, Lithuania, Jul.
          <year>2015</year>
          ),
          <fpage>41</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Joy</surname>
            ,
            <given-names>T.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rana</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Venkatesh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Hyperparameter tuning for big data using Bayesian optimisation</article-title>
          .
          <source>2016 23rd International Conference on Pattern Recognition (ICPR)</source>
          (Dec.
          <year>2016</year>
          ),
          <fpage>2574</fpage>
          -
          <lpage>2579</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Kai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almeda</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heffernan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Heffernan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Decision tree modeling of wheelspinning and productive persistence in skill builders</article-title>
          .
          <source>JEDM | Journal of Educational Data Mining</source>
          .
          <volume>10</volume>
          ,
          <issue>1</issue>
          (Jun.
          <year>2018</year>
          ),
          <fpage>36</fpage>
          -
          <lpage>71</lpage>
          . DOI:https://doi.org/10.5281/zenodo.3344810.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Käser</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klingler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gross</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>When to stop? Towards universal instructional policies</article-title>
          .
          <source>Proceedings of the Sixth International Conference on Learning Analytics &amp; Knowledge - LAK '16</source>
          (Edinburgh, United Kingdom,
          <year>2016</year>
          ),
          <fpage>289</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Kench</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hazelhurst</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Otulaja</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Grit and growth mindset among high school students in a computer programming project: A mixed methods study</article-title>
          .
          <source>ICT Education</source>
          (Cham,
          <year>2016</year>
          ),
          <fpage>187</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Krumm</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beattie</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takahashi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>D'Angelo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Practical measurement and productive persistence: Strategies for using digital learning system data to drive improvement</article-title>
          .
          <source>Journal of Learning Analytics</source>
          .
          <volume>3</volume>
          ,
          <issue>2</issue>
          (Sep.
          <year>2016</year>
          ),
          <fpage>116</fpage>
          -
          <lpage>138</lpage>
          . DOI:https://doi.org/10/ggxwxt.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Kühnel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zacher</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Bloom</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bledow</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Take a break! Benefits of sleep and short breaks for daily work engagement</article-title>
          .
          <source>European Journal of Work and Organizational Psychology</source>
          .
          <volume>26</volume>
          ,
          <issue>4</issue>
          (Jul.
          <year>2017</year>
          ),
          <fpage>481</fpage>
          -
          <lpage>491</lpage>
          . DOI:https://doi.org/10/gfzk8b.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hicks</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Barnes</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Understanding problem solving behavior of 6-8 graders in a debugging game</article-title>
          .
          <source>Computer Science Education</source>
          .
          <volume>27</volume>
          ,
          <issue>1</issue>
          (Jan.
          <year>2017</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          . DOI:https://doi.org/10/gftxxk.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Lundberg</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erion</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeGrave</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prutkin</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Himmelfarb</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.-I.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>From local explanations to global understanding with explainable AI for trees</article-title>
          .
          <source>Nature Machine Intelligence</source>
          .
          <volume>2</volume>
          ,
          <issue>1</issue>
          (Jan.
          <year>2020</year>
          ),
          <fpage>56</fpage>
          -
          <lpage>67</lpage>
          . DOI:https://doi.org/10/ggjtp4.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Lundberg</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.-I.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A unified approach to interpreting model predictions</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          .
          <volume>30</volume>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Mahatanankoon</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sikolia</surname>
            ,
            <given-names>D.W.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Intention to remain in a computing program: Exploring the role of passion and grit</article-title>
          .
          <source>Twenty-third Americas Conference on Information Systems</source>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Matsuda</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Stamper</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>How quickly can wheel spinning be detected?</article-title>
          <source>Proceedings of The 9th International Conference on Educational Data Mining (EDM 2016)</source>
          (
          <year>2016</year>
          ),
          <fpage>607</fpage>
          -
          <lpage>608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>McDermott</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daniels</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cajander</surname>
            ,
            <given-names>Å.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Perseverance measures and attainment in first year computing science students</article-title>
          .
          <source>Proceedings of the 2015 ACM Conference on Innovation and Technology in Computer Science Education</source>
          (Vilnius, Lithuania, Jun.
          <year>2015</year>
          ),
          <fpage>302</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>McGinley</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Test performance and study breaks</article-title>
          .
          <source>Fort Hays State University</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Owen</surname>
            ,
            <given-names>V.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>M.-H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thai</surname>
            ,
            <given-names>K.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burnett</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacobs</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keylor</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Detecting wheel-spinning and productive persistence in educational games</article-title>
          .
          <source>Proceedings of The 12th International Conference on Educational Data Mining (EDM 2019)</source>
          (Jul.
          <year>2019</year>
          ),
          <fpage>378</fpage>
          -
          <lpage>383</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Paquette</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Carvalho</surname>
            ,
            <given-names>A.M.J.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Towards Understanding Expert Coding of Student Disengagement in Online Learning</article-title>
          .
          <source>Proceedings of the 36th Annual Cognitive Science Conference</source>
          (
          <year>2014</year>
          ),
          <fpage>1126</fpage>
          -
          <lpage>1131</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          .
          <volume>12</volume>
          , (
          <year>2011</year>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Piech</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahami</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cooper</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Blikstein</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>Modeling how students learn to program</article-title>
          .
          <source>Proceedings of the 43rd ACM technical symposium on Computer Science Education</source>
          (New York, NY, USA, Feb.
          <year>2012</year>
          ),
          <fpage>153</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <article-title>PyCaret: An open source, low-code machine learning library in Python</article-title>
          :
          <year>2020</year>
          . https://www.pycaret.org.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Rohrer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dedrick</surname>
            ,
            <given-names>R.F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Stershic</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Interleaved practice improves mathematics learning</article-title>
          .
          <source>Journal of Educational Psychology</source>
          .
          <volume>107</volume>
          ,
          <issue>3</issue>
          (
          <year>2015</year>
          ),
          <fpage>900</fpage>
          -
          <lpage>908</lpage>
          . DOI:https://doi.org/10/gf7dfp.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <article-title>Scikit-optimize: Sequential model-based optimization in Python</article-title>
          :
          <year>2020</year>
          . https://scikit-optimize.github.io/.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Shute</surname>
            ,
            <given-names>V.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>D'Mello</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ocumpaugh</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ventura</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Almeda</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Modeling how incoming knowledge, persistence, affective states, and in-game progress influence student learning from an educational game</article-title>
          .
          <source>Computers &amp; Education</source>
          .
          <volume>86</volume>
          , (Aug.
          <year>2015</year>
          ),
          <fpage>224</fpage>
          -
          <lpage>235</lpage>
          . DOI:https://doi.org/10.1016/j.compedu.2015.08.001.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Sigurdson</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Petersen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>An exploration of grit in a CS1 context</article-title>
          .
          <source>Proceedings of the 18th Koli Calling International Conference on Computing Education Research</source>
          (Koli, Finland, Nov.
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <surname>Steinborn</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Huestegge</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>A walk down the lane gives wings to your brain: Restorative benefits of rest breaks on cognition and self-control</article-title>
          .
          <source>Applied Cognitive Psychology</source>
          .
          <volume>30</volume>
          ,
          <issue>5</issue>
          (
          <year>2016</year>
          ),
          <fpage>795</fpage>
          -
          <lpage>805</lpage>
          . DOI:https://doi.org/10/ghcrj3.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <surname>Strobl</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulesteix</surname>
            ,
            <given-names>A.-L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kneib</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Augustin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zeileis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Conditional variable importance for random forests</article-title>
          .
          <source>BMC Bioinformatics</source>
          .
          <volume>9</volume>
          ,
          <issue>1</issue>
          (Dec.
          <year>2008</year>
          ). DOI:https://doi.org/10/d7p3rw.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rohrer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>The effects of interleaved practice</article-title>
          .
          <source>Applied Cognitive Psychology</source>
          .
          <volume>24</volume>
          ,
          <issue>6</issue>
          (
          <year>2010</year>
          ),
          <fpage>837</fpage>
          -
          <lpage>848</lpage>
          . DOI:https://doi.org/10/fkm7mp.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          <year>2020</year>
          .
          <article-title>Early detection of wheel-spinning in ASSISTments</article-title>
          .
          <source>Artificial Intelligence in Education</source>
          .
          <string-name>
            <given-names>I.I.</given-names>
            <surname>Bittencourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cukurova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Muldner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Luckin</surname>
          </string-name>
          , and E. Millán, eds. Springer International Publishing.
          <fpage>574</fpage>
          -
          <lpage>585</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <surname>West</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herman</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zilles</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>PrairieLearn: Mastery-based online problem solving with adaptive scoring and recommendations driven by machine learning</article-title>
          .
          <source>2015 ASEE Annual Conference and Exposition Proceedings</source>
          (Seattle, Washington, Jun.
          <year>2015</year>
          ),
          <fpage>26.1238.1</fpage>
          -
          <lpage>26.1238.14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>The role of grit in predicting student performance in introductory programming courses: An exploratory study</article-title>
          .
          <source>SAIS 2015 Proceedings. 21</source>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fancsali</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holstein</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Aleven</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Early detection of wheel spinning: Comparison across tutors, models, features, and operationalizations</article-title>
          .
          <source>Proceedings of The 12th International Conference on Educational Data Mining (EDM 2019)</source>
          (
          <year>2019</year>
          ),
          <fpage>468</fpage>
          -
          <lpage>473</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>