<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A New Interpretation of Knowledge Tracing Models' Predictive Performance in Terms of the Cold Start Problem</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rohini Das</string-name>
          <email>rohinidas604@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiayi Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ryan S. Baker</string-name>
          <email>rybaker@upenn.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Richard Scruggs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Pennsylvania</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Previous studies on the accuracy of knowledge tracing models have typically evaluated performance across all student actions. However, this practice ignores the difference between students' initial and later attempts on the same skill. To be effective for uses such as mastery learning, a knowledge tracing model should be able to infer student knowledge and performance on a skill after the student has practiced that skill a few times. However, a model's initial performance prediction - on the first attempt at a new skill - has a different meaning. It indicates how successful a model is at inferring student performance on a skill from the student's performance on other skills and from the difficulty and other properties of the first item the student encounters. As such, it may be relevant to differentiate prediction in these two contexts when evaluating a knowledge tracing model. In this paper, we describe model performance at a more granular level and examine how consistent model performance is across the number of opportunities a student has had to practice a given skill. Our results show that much of the difference in performance between classic algorithms such as BKT (Bayesian Knowledge Tracing) and PFA (Performance Factors Analysis) and a modern algorithm such as DKVMN (Dynamic Key-Value Memory Networks) comes down to the first attempts at a skill. Model performance is much more comparable by the time the student reaches their third attempt at a skill. Thus, while there are many benefits to using contemporary knowledge tracing algorithms, they may not be as different as previously thought in terms of mastery learning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Knowledge Tracing (KT), which attempts to measure student
knowledge through performance during learning, is a critical
component in modern intelligent tutoring systems and adaptive
learning systems [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. These models use students’ previous
performance to predict their proficiency on latent knowledge and
infer their likelihood of success in future attempts within the
learning system.
      </p>
      <p>
        For well over a decade, Bayesian Knowledge Tracing (BKT; [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ])
was the dominant algorithm in research on knowledge tracing – it
remains the dominant algorithm in use in systems used at scale by
students today. Later on, two waves of competing algorithms
emerged – a first wave around 2010, including many
psychometrically-influenced algorithms such as Performance
Factors Analysis (PFA; [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]) and a second wave in the mid-to-late
2010s based on neural networks, including Deep Knowledge
Tracing (DKT; [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]) and Dynamic Key-Value Memory Networks
(DKVMN; [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]). Work over the last decade has shown that variants
of BKT and PFA that take individual differences and timing into
account perform better [
        <xref ref-type="bibr" rid="ref15 ref25 ref9">9, 15, 25</xref>
        ]. The current wave of algorithms
based on neural networks, such as DKT and DKVMN, has
reported further improvements to model fit [
        <xref ref-type="bibr" rid="ref12 ref26">12, 26</xref>
        ].
      </p>
      <p>
        The comparisons between these algorithms have generally focused
on metrics of overall success at predicting later items
within the learning system, applied to held-out students. In these
comparisons, multiple large data sets are typically used, but
performance is considered evenly across the data set. However,
there are some reasons to think this practice may be problematic.
For one thing, even though the data sets used are typically large,
these papers generally do not report whether samples are large for all skills.
Coetzee [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] notes that BKT parameter estimation is more precise
for larger data sets than smaller data sets. Furthermore, Gervet et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
concluded that algorithms based on logistic regression, such as PFA,
tend to underfit large datasets, while deep learning-based
algorithms, like DKT, tend to overfit larger datasets.
      </p>
      <p>
        More concerningly, many data sets used in student modeling have
skills which have only been encountered once or twice by many
students, either due to stop-out [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or rarely-tagged secondary skills.
Slater and Baker [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] suggest that BKT models cannot be reliably
fit unless there is a sufficiently large pool of students who have at
least three opportunities to practice each skill. As such, large
proportions of existing data sets may effectively reflect a special case.
Indeed, accurate prediction on these items likely reflects something
different than accurate prediction after a student has had more
practice. When a student has not yet worked on a skill, predicting
their performance at this point represents what is referred to as a
“cold start problem” – needing to perform well before having
sufficient data for the current student [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. It is possible that some
more recent algorithms may perform better in these situations than
earlier algorithms, either by using information from the student’s
performance on other skills or information on the difficulty or other
properties of specific items. However, this better performance may
reflect something different than the student’s knowledge of the
current skill being studied. As such, it may be meaningful to
separate out cold start situations (for a given student and skill) from
situations where the model has sufficient data to estimate the
current skill by itself, when comparing KT algorithms.
      </p>
      <p>
        In this paper, we study how the performance of three KT algorithms
changes, depending on how much data the algorithms have on the
current student’s performance on the current skill. We compare the
classic algorithms BKT and PFA to a more recent neural
network-based algorithm, DKVMN, using the ASSISTments 2009-2010
Skill Builder data [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Within each model, the predictive
performance, measured by AUC ROC (Area Under the Receiver
Operating Characteristic Curve) and RMSE (Root Mean Square
Error), was analyzed at students’ first through eighth encounters with
a skill, reflecting the changes in model performance as students
practice a skill more. We conclude with a discussion of the
implications of our findings for both the evaluation and use of
knowledge tracing models.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. METHODS 2.1 Data</title>
      <p>
        In this study, we used the ASSISTments Skill Builder
2009-2010 dataset [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], using the updated version which represents an
item requiring multiple skills as a single data point [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. This
specific dataset was chosen because it has clearly defined skills and
because it has frequently been used to compare KT
models in previous research [
        <xref ref-type="bibr" rid="ref11 ref13 ref14 ref23 ref27">11, 13, 14, 23, 27</xref>
        ].
      </p>
      <p>
        In the data preprocessing stage, we removed items not linked to any
skill. Each student attempt was annotated with how many
opportunities to practice the relevant skill(s) the student had
encountered so far – i.e., the first instance means the learner is
encountering a skill for the first time, the eighth instance indicates
that the learner is encountering the skill for the eighth time. The
resultant data set consisted of 4,151 students who attempted 16,891
unique problems on 101 skills, resulting in 274,590 responses.
While all the skills were included in model training, only the four
most common skills are discussed below (see Table 1).
      </p>
      <p>
        While using the ASSISTments platform, students have to
correctly answer n problems in a row to achieve mastery of a skill
(where n is set by the teacher but is usually three) and can only
then move on to another skill. Given the design of the platform’s
three-in-a-row mastery learning approach, there is a drop in
sample size as the number of instances increases (a common
pattern in adaptive learning systems). There is also attrition due to
stop-out, where students stop working on a problem set without
mastering it [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Table 1 shows that across all four skills, the
number of students encountering a specific skill n times decreased
as n increased. Across the four skills, average attrition rates of 20%
and 45% are observed on the third and eighth instances,
respectively.
      </p>
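      <p>The opportunity annotation described above can be sketched as follows. This is only an illustration under stated assumptions, not the preprocessing code used in the study: the field names student, skill and correct are hypothetical, and the log is assumed to be sorted chronologically.</p>
      <preformat>
```python
from collections import defaultdict


def annotate_opportunities(attempts):
    """Add a 1-based 'opportunity' count per (student, skill) pair.

    `attempts` is a chronologically ordered list of dicts with the
    hypothetical keys 'student', 'skill' and 'correct'.
    """
    seen = defaultdict(int)  # (student, skill) -> attempts so far
    annotated = []
    for attempt in attempts:
        key = (attempt["student"], attempt["skill"])
        seen[key] += 1
        annotated.append({**attempt, "opportunity": seen[key]})
    return annotated


# Toy log, invented for illustration only.
log = [
    {"student": "s1", "skill": "fractions", "correct": 0},
    {"student": "s1", "skill": "fractions", "correct": 1},
    {"student": "s2", "skill": "fractions", "correct": 1},
    {"student": "s1", "skill": "decimals", "correct": 1},
    {"student": "s1", "skill": "fractions", "correct": 1},
]
annotated = annotate_opportunities(log)
opportunities = [row["opportunity"] for row in annotated]
```
      </preformat>
      <p>In this sketch, an opportunity value of 1 marks the cold start case for that student and skill.</p>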
    </sec>
    <sec id="sec-3">
      <title>2.2 Model Construction</title>
      <p>We constructed the following three knowledge tracing models
with the preprocessed ASSISTments 2009 dataset: BKT, PFA and
DKVMN. Each model was evaluated with 5-fold student-level
cross-validation: the dataset was split into five folds at the
student level, four folds were used to train the model, and the
trained model predicted students’ performance in the remaining
fold, so that each fold acted as the test set once. Predictions in
the test sets were combined and used to compute AUC and RMSE
for each opportunity to practice, within each skill. For
comparability, the original skills were used to calculate
opportunities to practice rather than the new skills derived by
DKVMN. The folds were kept the same across models, reducing
the likelihood of randomly favoring one algorithm over another.
The metrics were averaged across the four skills at each instance
for each model.</p>
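      <p>A minimal sketch of a student-level fold assignment is given below, assuming hypothetical helper names and a fixed random seed so that the same folds can be reused for every model. It illustrates the idea of splitting by student rather than by response and is not the study's own code.</p>
      <preformat>
```python
import random


def student_level_folds(student_ids, n_folds=5, seed=0):
    """Map each student to exactly one fold (hypothetical helper)."""
    unique_ids = sorted(set(student_ids))
    rng = random.Random(seed)  # fixed seed: identical folds for all models
    rng.shuffle(unique_ids)
    return {sid: i % n_folds for i, sid in enumerate(unique_ids)}


def split_rows(rows, fold_of, test_fold):
    """Return (train, test) so that no student appears in both."""
    train = [r for r in rows if fold_of[r["student"]] != test_fold]
    test = [r for r in rows if fold_of[r["student"]] == test_fold]
    return train, test


# Toy data: 10 invented students with 4 responses each.
rows = [{"student": f"s{i % 10}", "correct": i % 2} for i in range(40)]
fold_of = student_level_folds([r["student"] for r in rows])
test_sizes = [len(split_rows(rows, fold_of, k)[1]) for k in range(5)]
```
      </preformat>
      <p>Because every response from a student lands in the same fold, predictions on the test fold are always made for students the model has not seen during training.</p>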
      <p>
        BKT and PFA predict students’ success at each attempt based on
their previous performance on the skill. When predicting a
student’s success on the first attempt of a new skill, without
having any prior data, the initial prediction made by BKT and
PFA reflects the overall student performance across the entire
(training) data set on that skill, instead of the individual student’s
knowledge level on the skill. By contrast, the deep learning model
DKVMN utilizes all of a student’s historical data and exploits the
underlying relationships between concepts. This transferability of
prediction across skills can be expected to give the algorithm an
advantage in making the initial predictions on a newly
encountered skill. In fact, [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] studied the effect of interaction
among skills in DKT, a closely-related deep learning model, and
compared it to BKT. By comparing different approaches to
leveraging skill data, they concluded that DKT’s better performance
may be largely due to its use of a student’s performance on one
skill to predict performance on another skill, whereas skills are
strictly separated in BKT. PFA occupies a middle ground, as skills
do not directly influence each other, but their combinations in the
training set may influence the model parameters found during
fitting.
      </p>
      <p>The two widely studied deep learning algorithms DKT and
DKVMN use neural networks to discover underlying
relationships among skills and items when predicting student
performance. Because of this, both algorithms have shown
significant improvements in model fit compared to traditional
algorithms. However, DKT maps the relationships at the item level,
while DKVMN fits a skill model from scratch by considering the
relationships among skills and items. Given that the purpose of this
study is to understand whether transferring information between
skills influences a model’s accuracy during the first few
opportunities, DKVMN is a closer comparison to BKT and PFA
within the class of deep learning-based KT algorithms.</p>
      <sec id="sec-3-1">
        <title>2.2.1 Bayesian Knowledge Tracing</title>
        <p>
          Bayesian Knowledge Tracing (BKT; [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]) inputs performance into
a simple Markov model that is also a Bayesian Network [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. To fit
BKT, we applied BKT-Brute Force [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] to the data set with a floor
of 0.01 for all probabilities and a ceiling of 0.3 for guess and slip to
avoid model degeneracy [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The algorithm produced estimates
of the guessing, slipping, initial knowledge, and learning transition
probabilities for each of the skills, which were then used to predict
the probability of success for each student on each opportunity to
practice each skill.
        </p>
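        <p>To make the prediction step concrete, the following sketch applies the standard BKT update equations to one student's responses on one skill. The parameter values are invented for illustration and are not the fitted values from this study; the function name is hypothetical.</p>
        <preformat>
```python
def bkt_predictions(responses, p_init, p_learn, p_guess, p_slip):
    """Return P(correct) before each response for one student and skill."""
    p_known = p_init
    predictions = []
    for correct in responses:
        # Predicted success: known and no slip, or unknown and a lucky guess.
        p_correct = p_known * (1 - p_slip) + (1 - p_known) * p_guess
        predictions.append(p_correct)
        # Bayesian update of P(known) given the observed response.
        if correct:
            posterior = p_known * (1 - p_slip) / p_correct
        else:
            posterior = p_known * p_slip / (1 - p_correct)
        # Chance of learning the skill at this opportunity.
        p_known = posterior + (1 - posterior) * p_learn
    return predictions


# Illustrative parameters only (guess and slip kept under the 0.3 ceiling).
preds = bkt_predictions([0, 1, 1], p_init=0.4, p_learn=0.2,
                        p_guess=0.2, p_slip=0.1)
```
        </preformat>
        <p>Note that the first element of the output depends only on the skill-level parameters, so in this sketch every student receives the same prediction on their first attempt at a skill.</p>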
      </sec>
      <sec id="sec-3-2">
        <title>2.2.2 Performance Factors Analysis</title>
        <p>
          Performance Factors Analysis (PFA; [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]) is a model that predicts
learner performance using a logistic function that models changes
in performance through learners’ successes and failures within a skill.
In this study, following the formulas in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], the basin hopping
algorithm was used to fit the model to obtain the optimal parameters.
A set of parameters for success, failure and skill difficulty was
derived for each skill, which were then used to compute the
probability P(m) that the student would perform correctly, for each
student at each opportunity to practice each skill.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>2.2.3 Dynamic Key-Value Memory Networks</title>
        <p>
Developed based on neural networks, Dynamic Key-Value
Memory Networks (DKVMN; [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]) employs two matrices that
capture states and the relationships between skill and student
mastery to predict performance on items and estimate mastery on a
set of automatically-derived skills. We utilized code from Zhang et
al. [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] to implement the DKVMN model and used the set of
parameters that produced the optimal outcome for the
ASSISTments 2009 dataset in that study. The model outputs a
probability of success for each student on each problem.
        </p>
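        <p>For comparison with the classic models, the PFA prediction described in Section 2.2.2 can be sketched as a logistic function of prior successes and failures on each skill. The parameter names beta (skill difficulty), gamma (success weight) and rho (failure weight) and all numeric values are assumptions made for illustration; the fitting step (basin hopping) is not shown.</p>
        <preformat>
```python
import math


def pfa_probability(skill_counts, params):
    """P(m) for one attempt under a PFA-style logistic model.

    skill_counts: {skill: (prior_successes, prior_failures)}
    params:       {skill: (beta, gamma, rho)}
    """
    m = 0.0
    for skill, (successes, failures) in skill_counts.items():
        beta, gamma, rho = params[skill]
        m += beta + gamma * successes + rho * failures
    return 1.0 / (1.0 + math.exp(-m))


params = {"fractions": (-0.2, 0.3, 0.1)}  # illustrative values only
first_attempt = pfa_probability({"fractions": (0, 0)}, params)
later_attempt = pfa_probability({"fractions": (3, 1)}, params)
```
        </preformat>
        <p>With no prior successes or failures, the prediction in this sketch reduces to a function of the skill parameter alone, which mirrors the cold start behavior discussed for BKT and PFA.</p>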
      </sec>
    </sec>
    <sec id="sec-4">
      <title>RESULTS</title>
    </sec>
    <sec id="sec-5">
      <title>3.1 AUC Results</title>
      <p>Table 2 summarizes the average AUC results for each of the eight
opportunities to practice each skill and the combined AUC for
opportunities three through eight in the BKT, PFA, and DKVMN
models. The overall AUC across the first eight
opportunities is also reported for the four skills. Note that the
overall AUC only includes the targeted four skills in the first eight
attempts and therefore should not be considered to be the overall
AUC of the algorithm across the entire data set.</p>
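      <p>One way to compute AUC separately for each opportunity is sketched below, using a plain pairwise definition of AUC. The row fields opportunity, correct and predicted are hypothetical, the toy data are invented, and the quadratic pairwise loop is chosen for readability rather than speed.</p>
      <preformat>
```python
from collections import defaultdict


def auc(labels, scores):
    """Probability that a random correct response is scored above a
    random incorrect one, counting ties as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None  # undefined when only one class is present
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_by_opportunity(rows, max_opportunity=8):
    """Group predictions by opportunity number and score each group."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        if max_opportunity >= row["opportunity"]:
            labels, scores = groups[row["opportunity"]]
            labels.append(row["correct"])
            scores.append(row["predicted"])
    return {k: auc(*groups[k]) for k in sorted(groups)}


rows = [
    {"opportunity": 1, "correct": 1, "predicted": 0.6},
    {"opportunity": 1, "correct": 0, "predicted": 0.6},
    {"opportunity": 2, "correct": 1, "predicted": 0.8},
    {"opportunity": 2, "correct": 0, "predicted": 0.3},
    {"opportunity": 2, "correct": 1, "predicted": 0.4},
    {"opportunity": 2, "correct": 0, "predicted": 0.5},
]
result = auc_by_opportunity(rows)
```
      </preformat>
      <p>In the toy data, both first-opportunity predictions are identical, so the AUC at opportunity 1 is exactly 0.5: a model that gives every student the same prediction cannot rank students, whatever the value of that prediction.</p>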
      <p>For the first eight instances, a general upward trend is observed in
AUC for all three models. At the first instance, the AUC
value for BKT is 0.49, PFA is 0.52, and DKVMN is 0.65. At this
point, the AUC value for the DKVMN model is much greater than
that of the other two models, by approximately 0.15. Compared to BKT
and PFA, DKVMN is better at making the initial prediction on the
very first time a student sees a skill. In fact, at this point, both BKT
and PFA are performing close to chance (an AUC of 0.5).</p>
      <p>In the following instances, the AUC values of BKT and PFA moved
closer to those of DKVMN. In fact, by the fourth
instance, the models’ AUC values were fairly similar, with a
range of 0.65-0.70. From the fourth opportunity to the eighth, the
AUC values increased by 0.02 to 0.06 across skills. Performance
stayed similar between algorithms at this point, but DKVMN still
tended to achieve slightly higher performance. Across the 3rd-8th
opportunities, DKVMN averaged an AUC 0.02-0.05 higher than the
other two algorithms (0.70 versus 0.68 for BKT and 0.65 for PFA).
These trends can be seen in Figures 1-3.</p>
    </sec>
    <sec id="sec-6">
      <title>3.2 RMSE Results</title>
      <p>Table 3 summarizes the average RMSE results for each opportunity
to practice the skills and the combined RMSE for the 3rd-8th
opportunities and the 1st-8th opportunities in the BKT, PFA, and
DKVMN models. Again, the RMSE reported in the table only
considers the targeted four skills in the first eight opportunities.
The RMSE demonstrates a downward trend across the first eight
opportunities in all three models. As RMSE measures the
difference between actual and predicted values, lower RMSE
values indicate more accurate predictions. In the first instance, the
RMSE value for BKT is 0.49, PFA is 0.51, and DKVMN is 0.47.
As with AUC, the RMSE value for DKVMN is better than that of BKT and
PFA, indicating that DKVMN is better able to predict
student performance at the first attempt (0.02 better than BKT and
0.04 better than PFA).</p>
      <p>
        In the following instances, the RMSE values of BKT and PFA moved
closer to those of DKVMN. In fact, by the fourth
instance, the models’ RMSE values were fairly similar, with a
range of 0.43-0.46. From the fourth opportunity to the eighth, the
RMSE values in all three models remained roughly the same.
      </p>
      <p>
In this study, we examined the performance of three KT models,
BKT, PFA, and DKVMN, across students’ history of work on
specific skills, and compared how the three models differ in
predictive accuracy during the earliest and later opportunities to
practice each skill. With all eight opportunities considered together,
DKVMN outperformed BKT and PFA in both AUC and RMSE.
However, DKVMN’s better performance appears to be largely due
to its initial prediction on the first attempt on a skill, in which
DKVMN’s AUC was 0.16 higher than BKT’s and 0.13 higher than
PFA’s, and its RMSE was 0.02-0.04 better. After the first attempt, BKT
and PFA’s predictive performance improved substantially, and
model performance became closer across the three algorithms after
the third attempt, though DKVMN remained slightly better.
      </p>
      <p>
        The results suggest that much of the difference in performance
between these algorithms is due to DKVMN’s ability to make more
accurate initial predictions by using factors other than mastery of
the current skill, such as past performance on other skills and other
students’ performance on the same item. In other words, a
substantial amount of the difference between algorithms appears to
be due to factors other than estimating mastery of the current skill
the student is working on, from their performance on that skill. This
may be especially true in datasets where students stop out on
specific skills [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], or where the skill model is added to or modified
after the system is built. In these cases, many student/skill
combinations may only occur once or twice, and relatively
higher performance on the first attempt will make overall AUC and
RMSE values look better for models such as DKVMN. This raises the
question of which applications benefit from better prediction
the first time a student sees a new skill. This type of
improvement in prediction may be useful to systems that decide
which skill a student should work on next (e.g., [
        <xref ref-type="bibr" rid="ref28 ref6">6, 28</xref>
        ]) but less
useful in systems that have a predefined order of skills for the
student to work on (e.g., [
        <xref ref-type="bibr" rid="ref5 ref8">5, 8</xref>
        ]), where the student does not move on
until they have demonstrated mastery of the current skill.
      </p>
      <p>
        Given the difference in predictive performance between situations,
it may be appropriate to separate cold start situations (for a given
student and skill) from situations where the model has sufficient
data to estimate the current skill by itself when comparing KT
algorithms. Specifically, we propose that the calculation of
predictive metrics should separate the predictions on the initial two
opportunities to practice each skill from the rest. Adopting this
approach will increase our ability to interpret the difference
between algorithms and understand how much better a specific
algorithm will be for specific use cases.
      </p>
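      <p>The proposal to report the first two opportunities separately from the rest could be implemented roughly as follows. This is a sketch under assumptions: the field names are hypothetical, the cutoff of two opportunities is exposed as a parameter, and RMSE is used only as an example metric on invented data.</p>
      <preformat>
```python
import math


def split_cold_start(rows, cold_start_cutoff=2):
    """Partition predictions into cold start and later opportunities."""
    cold = [r for r in rows if cold_start_cutoff >= r["opportunity"]]
    later = [r for r in rows if r["opportunity"] > cold_start_cutoff]
    return cold, later


def rmse(rows):
    """Root mean square error between outcomes and predictions."""
    squared = [(r["correct"] - r["predicted"]) ** 2 for r in rows]
    return math.sqrt(sum(squared) / len(squared))


rows = [
    {"opportunity": 1, "correct": 1, "predicted": 0.5},
    {"opportunity": 2, "correct": 0, "predicted": 0.5},
    {"opportunity": 3, "correct": 1, "predicted": 0.8},
    {"opportunity": 4, "correct": 1, "predicted": 0.6},
]
cold, later = split_cold_start(rows)
report = {"cold_start": rmse(cold), "later": rmse(later)}
```
      </preformat>
      <p>Reporting the two values side by side, rather than a single pooled figure, makes it visible whether a model's advantage is concentrated in the cold start predictions.</p>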
      <p>
        Two limitations to the current analyses can be addressed in future
work. First, our recommendations may not be meaningful for all
learning systems where contemporary KT is used. Specifically, some
systems may not have skill models at all, and may never intend to
make inferences at the level of interpretable skills. Although these
systems typically use an entirely different family of KT models (e.g.,
[
        <xref ref-type="bibr" rid="ref16 ref21">16, 21</xref>
        ]), our recommendations would not be relevant in these
cases. Second, we have only investigated these issues in the context
of a single system and a set of skills for which there is extensive
data, and for three algorithms; the generalizability of the findings
presented here should be further investigated, using data from other
learning systems where, for instance, the granularity of the skills
differs. However, only a limited effort is needed to separate practice
on early learning opportunities from later learning opportunities
when calculating model AUC/RMSE. Therefore, it may be
warranted to adopt this approach and see whether practical
differences are found for other contexts and algorithms as well.
      </p>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION AND DISCUSSION</title>
      <p>In the last few years, there has been an explosion of interest in new
variants of knowledge tracing that achieve higher predictive
performance using neural networks. However, this work has
generally not yet explored where and when these algorithms
perform better, and what the implications are for using these models
in practice. More specifically, previous practices have averaged
predictions across students’ entire learning history, ignoring the
difference between the earliest work and later work on a skill.
Overall, we find initial evidence that one key factor leading to
better performance for DKVMN compared to earlier algorithms is
its performance in situations before a student has had a significant
opportunity to work on a skill. This result leads to
recommendations on how to better evaluate KT algorithms and
suggests that the benefits of this algorithm may be greater for some
applications (deciding which skill a student should work on next)
than others (deciding if a student has reached mastery in the current
skill they are working on). Based on the results of this study, future
research involving KT models may find it useful
to calculate performance separately for a student’s initial
performance and their later performance on a skill; this would
provide researchers with more information on how their models are
working, and where their greatest benefits and potential are.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.S.J.d.</given-names>
          </string-name>
          et al.
          <year>2010</year>
          .
          <article-title>Contextual Slip and Prediction of Student Performance after Use of an Intelligent Tutor</article-title>
          .
          <source>User Modeling, Adaptation, and Personalization</source>
          (Berlin, Heidelberg,
          <year>2010</year>
          ),
          <fpage>52</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.S.J.D.</given-names>
          </string-name>
          et al.
          <year>2008</year>
          .
          <article-title>More Accurate Student Modeling through Contextual Estimation of Slip and Guess Probabilities in Bayesian Knowledge Tracing</article-title>
          .
          <source>Intelligent Tutoring Systems</source>
          (Berlin, Heidelberg,
          <year>2008</year>
          ),
          <fpage>406</fpage>
          -
          <lpage>415</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Botelho</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          et al.
          <year>2019</year>
          .
          <article-title>Refusing to try: Characterizing early stopout on student assignments</article-title>
          .
          <source>Proceedings of the 9th International Conference on Learning Analytics &amp; Knowledge</source>
          (New York, NY, USA, Mar.
          <year>2019</year>
          ),
          <fpage>391</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Coetzee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Choosing sample size for knowledge tracing models</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          (
          <year>2014</year>
          ),
          <fpage>117</fpage>
          -
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Corbett</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge</article-title>
          .
          <source>User Modeling and User-Adapted Interaction</source>
          . (
          <year>1995</year>
          ),
          <fpage>253</fpage>
          -
          <lpage>278</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Craig</surname>
            ,
            <given-names>S.D.</given-names>
          </string-name>
          et al.
          <year>2013</year>
          .
          <article-title>The impact of a technology-based mathematics after-school program using ALEKS on student's knowledge and behaviors</article-title>
          .
          <source>Computers and Education</source>
          .
          <volume>68</volume>
          , (
          <year>2013</year>
          ),
          <fpage>495</fpage>
          -
          <lpage>504</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.
          <year>2009</year>
          .
          <article-title>Addressing the assessment challenge with an online system that tutors as it assesses</article-title>
          .
          <source>User Modeling and User-Adapted Interaction</source>
          .
          <volume>19</volume>
          ,
          <issue>3</issue>
          (
          <year>2009</year>
          ),
          <fpage>243</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.
          <year>2009</year>
          .
          <article-title>Addressing the assessment challenge with an online system that tutors as it assesses</article-title>
          .
          <source>User Modeling and User-Adapted Interaction</source>
          .
          <volume>19</volume>
          ,
          <issue>3</issue>
          (Aug.
          <year>2009</year>
          ),
          <fpage>243</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Galyardt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Goldin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Move your lamp post: Recent data reflects learner knowledge better than older data</article-title>
          .
          <source>Journal of Educational Data Mining</source>
          .
          <volume>7</volume>
          ,
          <issue>2</issue>
          (
          <year>2015</year>
          ),
          <fpage>83</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Gervet</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          et al.
          <year>2020</year>
          .
          <article-title>When is Deep Learning the Best Approach to Knowledge Tracing?</article-title>
          <source>Journal of Educational Data Mining</source>
          .
          <volume>12</volume>
          ,
          <issue>3</issue>
          (
          <year>2020</year>
          ),
          <fpage>31</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          et al.
          <year>2010</year>
          .
          <article-title>Comparing Knowledge Tracing and Performance Factor Analysis by Using Multiple Model Fitting Procedures</article-title>
          .
          <source>Intelligent Tutoring Systems</source>
          (Berlin, Heidelberg,
          <year>2010</year>
          ),
          <fpage>35</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Khajah</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.
          <year>2016</year>
          .
          <article-title>How deep is knowledge tracing?</article-title>
          <source>Proceedings of the 9th International Conference on Educational Data Mining, EDM 2016</source>
          (
          <year>2016</year>
          ),
          <fpage>94</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Minn</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          et al.
          <year>2018</year>
          .
          <article-title>Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing</article-title>
          .
          <source>2018 IEEE International Conference on Data Mining (ICDM)</source>
          (Singapore, Nov.
          <year>2018</year>
          ),
          <fpage>1182</fpage>
          -
          <lpage>1187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Montero</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          et al.
          <year>2018</year>
          .
          <article-title>Does deep knowledge tracing model interactions among skills?</article-title>
          <source>Proceedings of the 11th International Conference on Educational Data Mining</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Pardos</surname>
            ,
            <given-names>Z.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Heffernan</surname>
            ,
            <given-names>N.T.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Modeling Individualization in a Bayesian Networks Implementation of Knowledge Tracing</article-title>
          .
          <source>International Conference on User Modeling, Adaptation, and Personalization</source>
          (
          <year>2010</year>
          ),
          <fpage>255</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Pavlik</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          et al.
          <year>2008</year>
          .
          <article-title>Using Optimally Selected Drill Practice to Train Basic Facts</article-title>
          .
          <source>Intelligent Tutoring Systems</source>
          (Berlin, Heidelberg,
          <year>2008</year>
          ),
          <fpage>593</fpage>
          -
          <lpage>602</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Pavlik</surname>
            ,
            <given-names>P.I.</given-names>
          </string-name>
          et al.
          <year>2009</year>
          .
          <article-title>Performance Factors Analysis - A New Alternative to Knowledge Tracing</article-title>
          .
          <source>Proceedings of the 14th International Conference on Artificial Intelligence in Education</source>
          (Brighton, England,
          <year>2009</year>
          ),
          <fpage>531</fpage>
          -
          <lpage>538</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Pelánek</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Bayesian knowledge tracing, logistic models, and beyond: an overview of learner modeling techniques</article-title>
          .
          <source>User Modeling and User-Adapted Interaction</source>
          .
          <volume>27</volume>
          ,
          <issue>3-5</issue>
          (
          <year>2017</year>
          ),
          <fpage>313</fpage>
          -
          <lpage>350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Piech</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          et al.
          <year>2015</year>
          .
          <article-title>Deep knowledge tracing</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          . 2015-Janua, (
          <year>2015</year>
          ),
          <fpage>505</fpage>
          -
          <lpage>513</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Reye</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>Student Modelling based on Belief Networks</article-title>
          .
          <source>International Journal of Artificial Intelligence in Education</source>
          .
          <volume>14</volume>
          ,
          <issue>1</issue>
          (
          <year>2004</year>
          ),
          <fpage>63</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Settles</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Meeder</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>A trainable spaced repetition model for language learning</article-title>
          .
          <source>54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers</source>
          .
          <volume>4</volume>
          (
          <year>2016</year>
          ),
          <fpage>1848</fpage>
          -
          <lpage>1858</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Slater</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Degree of error in Bayesian knowledge tracing estimates from differences in sample sizes</article-title>
          .
          <source>Behaviormetrika</source>
          .
          <volume>45</volume>
          ,
          <issue>2</issue>
          (Oct.
          <year>2018</year>
          ),
          <fpage>475</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          et al.
          <year>2016</year>
          .
          <article-title>Going Deeper with Deep Knowledge Tracing</article-title>
          .
          <source>Proceedings of the 9th International Conference on Educational Data Mining</source>
          (
          <year>2016</year>
          ),
          <fpage>545</fpage>
          -
          <lpage>550</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>T.-Y.</given-names>
          </string-name>
          et al.
          <year>2019</year>
          .
          <article-title>Active Learning for Student Affect Detection</article-title>
          .
          <source>Proceedings of The 12th International Conference on Educational Data Mining</source>
          (
          <year>2019</year>
          ),
          <fpage>208</fpage>
          -
          <lpage>217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Yudelson</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          et al.
          <year>2013</year>
          .
          <article-title>Individualized Bayesian knowledge tracing models</article-title>
          .
          <source>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</source>
          .
          <volume>7926 LNAI</volume>
          (
          <year>2013</year>
          ),
          <fpage>171</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.
          <year>2017</year>
          .
          <article-title>Dynamic key-value memory networks for knowledge tracing</article-title>
          .
          <source>26th International World Wide Web Conference, WWW 2017</source>
          (
          <year>2017</year>
          ),
          <fpage>765</fpage>
          -
          <lpage>774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          et al.
          <year>2017</year>
          .
          <article-title>Dynamic Key-Value Memory Networks for Knowledge Tracing</article-title>
          .
          <source>Proceedings of the 26th International Conference on World Wide Web</source>
          (Perth, Australia, Apr.
          <year>2017</year>
          ),
          <fpage>765</fpage>
          -
          <lpage>774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          et al.
          <year>2019</year>
          .
          <article-title>Towards Helping Teachers Select Optimal Content for Students</article-title>
          .
          <source>International Conference on Artificial Intelligence in Education</source>
          (Cham,
          <year>2019</year>
          ),
          <fpage>413</fpage>
          -
          <lpage>417</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>