<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Effect of Variations of Prior on Knowledge Tracing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matti Nelimarkka</string-name>
          <email>matti.nelimarkka@hiit.fi</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Madeeha Ghori</string-name>
          <email>madeeha.ghori@berkeley.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electrical Engineering and Computer Sciences, UC Berkeley</institution>
          ,
          <addr-line>387 Soda Hall, Berkeley, California 94720-1776</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information, UC Berkeley</institution>
          ,
          <addr-line>102 South Hall, Berkeley, California 94720-4600</addr-line>
          ,
          <institution>Helsinki Institute for Information Technology HIIT, Aalto University</institution>
          ,
          <addr-line>PO Box 15600, Aalto, Finland 00076</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge tracing is a method that approximates a student's knowledge state using a Bayesian network. As the applications of this method increase, it is vital to understand the limits of this approximation. We are interested in how well knowledge tracing performs when students' prior knowledge of the topic is extremely high or low. Our results indicate that the estimates become more erroneous when prior knowledge is extremely high (prior = 0.90).</p>
      </abstract>
      <kwd-group>
        <kwd>bayesian knowledge tracing</kwd>
        <kwd>personalization</kwd>
        <kwd>prior</kwd>
        <kwd>parameter estimation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The Bayesian Knowledge-Tracing (BKT) algorithm was
developed in 1995 in an effort to model students’ changing
knowledge state during skill acquisition [
        <xref ref-type="bibr" rid="ref6">5</xref>
        ]. The idea is to
interpret students’ knowledge – a hidden variable – based
on observed answers to a set of questions. The algorithm
tracks the change in this probability distribution over time
using a simple Bayes’ net. The model is often presented as
four parameters: prior, learn, guess and slip (see Figure 1).
Prior is the probability that the student already knows the
material before attempting any exercise; learn is the probability
that a student who did not have the skill acquires it by doing an
exercise; guess is the probability of answering correctly without
knowing the skill; and slip is the probability of answering
incorrectly despite knowing it.
      </p>
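      <p>To illustrate how the four parameters interact, the standard BKT update after a single observed answer can be sketched as follows. This is a minimal sketch rather than code used in this study; the function names <monospace>bkt_update</monospace> and <monospace>predict_correct</monospace> and their argument names are illustrative assumptions, and the model is assumed to have no forgetting.</p>
      <preformat><![CDATA[
```python
def bkt_update(p_know, correct, learn, guess, slip):
    """Return P(known) after one observed answer and one learning opportunity.

    Illustrative helper (hypothetical name): standard BKT without forgetting.
    """
    if correct:
        # A correct answer comes either from knowing (and not slipping) or from guessing.
        evidence = p_know * (1.0 - slip)
        posterior = evidence / (evidence + (1.0 - p_know) * guess)
    else:
        # A wrong answer comes either from slipping or from not knowing (and not guessing).
        evidence = p_know * slip
        posterior = evidence / (evidence + (1.0 - p_know) * (1.0 - guess))
    # Learning transition: an unknown skill becomes known with probability `learn`.
    return posterior + (1.0 - posterior) * learn


def predict_correct(p_know, guess, slip):
    """Probability that the next answer is correct given the current P(known)."""
    return p_know * (1.0 - slip) + (1.0 - p_know) * guess
```
]]></preformat>
      <p>Starting from <monospace>p_know</monospace> equal to the prior and applying <monospace>bkt_update</monospace> once per answer traces the knowledge estimate over a student's answer sequence.</p>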
      <p>
        Knowledge tracing is the most prominent method used to
model student knowledge acquisition and is used in most
intelligent learning systems. These systems have been said to
be outperforming humans since 2001 [
        <xref ref-type="bibr" rid="ref4">3</xref>
        ] and have been used
in the real world to tutor students [
        <xref ref-type="bibr" rid="ref5">4</xref>
        ]. For these reasons it is
important to fully understand the strengths and limitations
of knowledge tracing before applying it more widely in the
classroom. As the parameters of the model are not known in
advance, they must be estimated from the given
data. Previous research has demonstrated that the accuracy
of parameter estimation – and therefore knowledge tracing
– can be improved by applying different heuristics [
        <xref ref-type="bibr" rid="ref14 ref18">17, 13</xref>
        ]
or methods [
        <xref ref-type="bibr" rid="ref17 ref19">16, 18</xref>
        ] including personalizing the model for
each user [
        <xref ref-type="bibr" rid="ref21 ref9">20, 8</xref>
        ] or by extending the data used for analysis
[
        <xref ref-type="bibr" rid="ref1 ref16 ref7">15, 6, 1</xref>
        ].
      </p>
      <p>Our work starts from a different premise: how robust is the
BKT approach to variation in the parameter space? Our
special interest is in the prior parameter, which corresponds to
a student’s knowledge of the topic before answering a
question. In any classroom, MOOC or otherwise, some students
will arrive with a better understanding of the material
than others. It is therefore important to study how well
knowledge tracing estimates its parameters when the
prior is extremely high or low.</p>
      <p>If knowledge tracing models are inaccurate for
students with a certain prior, then smart tutors and
other systems designed to help those students learn will be
less effective. This is especially problematic if the students modelled
inaccurately are those doing poorly in the class, as
smart tutors exist to help them the most.</p>
    </sec>
    <sec id="sec-2">
      <title>2. PREVIOUS WORK</title>
      <p>For the purposes of this work, we briefly summarize
three methods previously applied to improve the prediction
capabilities of BKT models. However, these methods are
insufficient to address the practical problem described above,
resulting in a need for our own experiment.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Individualization</title>
      <p>
        Yudelson et al. [
        <xref ref-type="bibr" rid="ref21">20</xref>
        ] experimented with individualization by
bringing student-specific parameters into the BKT algorithm
on a larger scale. They split the usual skill-specific BKT
parameters into two components: one skill-specific and one
student-specific. They then built several individualized BKT
models and added student-specific parameters in batches,
examining the effect each addition had on the model’s
performance. They found that student-specific prior
parameters did not provide a vast improvement. However,
student-specific learning provided a significant improvement to the
model’s prediction accuracy.
      </p>
      <p>
        Pardos and Heffernan furthered the experiment by
developing a method of formulating the individualization within the
Bayes’ Net framework [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ]. Especially interesting in terms
of our work are the different prior values and methods
suggested for this individualization. Pardos and Heffernan observe that
models using student-specific priors based on students’ prior
knowledge clearly outperform the traditional knowledge tracing
approach. This contrasts with Yudelson et al.’s findings [
        <xref ref-type="bibr" rid="ref21">20</xref>
        ]
but it still underscores the importance of individualization
in the BKT algorithm.
      </p>
      <p>
        Related to individualization per user, there has been
discussion of using different parameter values per resource. It can be
argued that different exercises teach different topics [
        <xref ref-type="bibr" rid="ref15 ref8">7, 14</xref>
        ].
This can be further used to individualize the model for
different topics, an approach which has gained initial support
in empirical studies [
        <xref ref-type="bibr" rid="ref15">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Enhancing the data</title>
      <p>
        The second approach to improving these methods is
enhancing the data used for prediction. In its simplest
form, this can be done by adding further relevant
data, such as data from past years, to the analysis [
        <xref ref-type="bibr" rid="ref16">15</xref>
        ].
Others have explored the possibility of adding
general domain-related knowledge to the models, and
suggest that this indeed improves the estimates [
        <xref ref-type="bibr" rid="ref7">6</xref>
        ].
However, the current direction in enhanced data relates to
information available on user interaction – especially in MOOC
environments where it is possible to access this kind of data.
To illustrate, Baker, Corbett, and Aleven [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] explore
interactions with the learning system and other non-exercise-related
data, such as time spent answering and requests for help, to
distinguish slips from guesses.
We applaud these efforts and acknowledge that data other
than student responses alone may indeed help to detect both
the cases where initial knowledge (prior) is high and those where
it is low, instead of tweaking the EM algorithm further.
      </p>
    </sec>
    <sec id="sec-5">
      <title>2.3 Improving the methods</title>
      <p>
        There are several heuristics currently used to enhance the
BKT algorithm. One such heuristic involves expecting the
sum of slip and guess to be less than or equal to 1 [
        <xref ref-type="bibr" rid="ref18">17</xref>
        ]. Other
work determined that the initial parameter values
can affect where the algorithm converges. To
improve the accuracy of the convergence, it was suggested
that starting parameters be drawn from a Dirichlet
distribution derived from the data set [
        <xref ref-type="bibr" rid="ref14 ref3">2, 13</xref>
        ].
      </p>
      <p>
        There have also been efforts to explore other machine
learning methods on educational data. Initial trials born in the
KDDCup competition use a medley of random forests and
other machine learning algorithms but these methods have
proven largely unsuccessful [
        <xref ref-type="bibr" rid="ref17 ref19">16, 18</xref>
        ].
      </p>
      <p>
        The knowledge tracing community, while accepting the
validity of some of these heuristics [
        <xref ref-type="bibr" rid="ref10 ref13">9, 12</xref>
        ], has criticized their
inability to provide any insight into the student learning
model. Individualization, however, has the potential to
improve the BKT algorithm while also providing a pedagogical
explanation for said improvements.
      </p>
    </sec>
    <sec id="sec-6">
      <title>3. METHODOLOGY</title>
      <p>We began by generating data sets with specific known
initial parameters in order to simulate groups of students at
different knowledge levels. We then ran expectation
maximization (EM) on these data sets and allowed knowledge
tracing to calculate its own estimated parameters. Finally, we
compared these estimated parameters to the original ones
used for generation to determine whether the accuracy of the
parameter estimation depends on the initial parameters.</p>
    </sec>
    <sec id="sec-7">
      <title>3.1 Generating the Data</title>
      <p>
        As our goal was to determine how the prior ground truth
affects parameter estimation, we varied the prior used to
synthesize the data sets. We used six different priors (0.15, 0.30, ...,
0.75, 0.90) and two values each for learn, slip and guess<sup>1</sup>
(see Table 1), for a total of 48 parameter combinations.
Each data set consists of 10,000 students and 20
observations per student. To increase the variation, we
generated 6 data sets per combination, giving 288 data sets in total. This kind of simulated
approach has been previously used to evaluate the success of
Bayesian machine learning methods [
        <xref ref-type="bibr" rid="ref9">8</xref>
        ].
      </p>
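      <p>The generation procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the generator used in the study: the six priors are assumed to be evenly spaced between 0.15 and 0.90, the learn, guess and slip values are those given in the footnote, and the function names, the seeding scheme and the data layout (one list of 0/1 answers per student) are hypothetical.</p>
      <preformat><![CDATA[
```python
import itertools
import random

# Ground-truth grids; the evenly spaced priors are an assumption.
PRIORS = [0.15, 0.30, 0.45, 0.60, 0.75, 0.90]
LEARNS = [0.10, 0.20]
GUESSES = [0.10, 0.20]
SLIPS = [0.05, 0.10]

# All 6 * 2 * 2 * 2 parameter combinations.
CONDITIONS = list(itertools.product(PRIORS, LEARNS, GUESSES, SLIPS))


def simulate_student(prior, learn, guess, slip, n_obs, rng):
    """Sample one answer sequence (1 = correct, 0 = wrong) from a BKT model."""
    known = rng.random() < prior
    answers = []
    for _ in range(n_obs):
        p_correct = (1.0 - slip) if known else guess
        answers.append(1 if rng.random() < p_correct else 0)
        # No forgetting: once the skill is known it stays known.
        if not known and rng.random() < learn:
            known = True
    return answers


def simulate_dataset(prior, learn, guess, slip, n_students=10000, n_obs=20, seed=0):
    """Sample a full data set; `seed` makes each replicate reproducible."""
    rng = random.Random(seed)
    return [simulate_student(prior, learn, guess, slip, n_obs, rng)
            for _ in range(n_students)]
```
]]></preformat>
      <p>Calling <monospace>simulate_dataset</monospace> with six different seeds for each entry of <monospace>CONDITIONS</monospace> would yield six replicates per parameter combination.</p>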
    </sec>
    <sec id="sec-8">
      <title>3.2 Analysis Procedure</title>
      <p>
        For each data set, we estimated the parameters using the
expectation maximization (EM) algorithm as implemented in
fastHMM [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ]. The parameter estimation
was conducted using a grid search with ten parameters, and
the best fitting model was selected using the log likelihood.
Using our 288 data sets, we can compare the estimates with the
ground truths for each parameter and analyze the accuracy
of the estimates. We apply the standard
root-mean-square error (RMSE) measure and visualizations in
our analysis. Using RMSE, we can see whether certain
ground truths lend themselves to more accurate estimates.
      </p>
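      <p>The comparison between estimated and ground-truth parameters can be sketched as follows. This is a minimal sketch under stated assumptions: the text above does not spell out how the RMSE is aggregated, so the example averages the squared error over the four parameters of a single data set. The dictionary layout and the names <monospace>rmse</monospace> and <monospace>select_best</monospace> are hypothetical and are not part of fastHMM.</p>
      <preformat><![CDATA[
```python
import math

PARAMS = ("prior", "learn", "guess", "slip")


def rmse(truth, estimate):
    """Root-mean-square error over the four BKT parameters of one data set."""
    squared = [(estimate[name] - truth[name]) ** 2 for name in PARAMS]
    return math.sqrt(sum(squared) / len(squared))


def select_best(fits):
    """Pick the fit with the highest log likelihood among several EM runs.

    Each fit is assumed to be a dict with a "loglik" entry (hypothetical layout).
    """
    return max(fits, key=lambda fit: fit["loglik"])
```
]]></preformat>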
    </sec>
    <sec id="sec-9">
      <title>4. RESULTS</title>
      <p>First, let us explore the parameter estimation in detail. The
average RMSE in the data (Table 2) indicates
that the prediction quality decreases as the prior increases;
the variance of the RMSE also increases. This
indicates that the predictions with higher priors are, first, more
erroneous and, second, converge over a larger area,
resulting in higher variance. To confirm our observations, we conducted
a Wilcoxon-Mann-Whitney test to explore whether the computed
RMSEs differed in a statistically significant manner. As
shown in Table 3, the RMSEs computed from the data
sets with priors 0.15 and 0.90 both differ significantly
from those of the other data sets (p &lt; 0.05). Therefore we conclude
that the EM algorithm performs badly when the prior is high.</p>
      <p>To further understand this phenomenon, we explore the
estimates per parameter. The errors per parameter are shown
in Figure 3. The mean errors stay
close to zero, though a higher prior does affect the variance.
As the ground truth prior increases, the variance of guess and
learn increases while the variance of prior decreases. In
theory, a lower variance in the prior prediction should imply
a more accurate prior estimate. However, as we saw in
Table 2, this is not actually the case: the prior estimate gets
less accurate as the value of the ground truth prior increases.
In Figure 3 we can again see some of the results from
Table 2: the prediction accuracy decreases when the prior is 0.6
and continues to decrease as the prior increases.
<fn><label>1</label><p>Variations were 0.10 and 0.20 for learn and guess, and 0.05 and
0.10 for slip.</p></fn></p>
      <p>Figure 4 shows the log likelihood for each of the
parameter combinations we analyzed. We see a slight but
non-significant increase in the log likelihoods, suggesting that
the model is performing better, even while our RMSE
error indicator demonstrates otherwise. It is also noteworthy
that when slip is 0.10, all log likelihoods
range between -65500 and -65250, but when slip is 0.05, all
log likelihoods range between -40000 and -35750,
indicating that the slip value had a dramatic effect on the model
estimation accuracy.</p>
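      <p>For readers unfamiliar with the quantity plotted, the log likelihood of a data set under a given parameter set can be sketched as follows. This is a minimal sketch of the forward recursion for a two-state BKT model without forgetting, not the fastHMM computation itself; the function names are hypothetical, and answers are assumed to be coded as 1 (correct) and 0 (wrong).</p>
      <preformat><![CDATA[
```python
import math


def sequence_log_likelihood(answers, prior, learn, guess, slip):
    """Log likelihood of one student's answers under a BKT parameter set."""
    p_know = prior
    total = 0.0
    for correct in answers:
        # Predictive probability of a correct answer given the history so far.
        p_correct = p_know * (1.0 - slip) + (1.0 - p_know) * guess
        total += math.log(p_correct if correct else 1.0 - p_correct)
        # Condition on the observed answer, then apply the learning transition.
        if correct:
            posterior = p_know * (1.0 - slip) / p_correct
        else:
            posterior = p_know * slip / (1.0 - p_correct)
        p_know = posterior + (1.0 - posterior) * learn
    return total


def dataset_log_likelihood(dataset, prior, learn, guess, slip):
    """Sum of per-student log likelihoods; students are assumed independent."""
    return sum(sequence_log_likelihood(answers, prior, learn, guess, slip)
               for answers in dataset)
```
]]></preformat>
      <p>Because the total is a sum over every observed answer, its magnitude depends strongly on how predictable the answers are, which is one reason the values are not directly comparable across different slip settings.</p>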
    </sec>
    <sec id="sec-10">
      <title>5. IMPLICATIONS</title>
      <p>
        Our findings indicate that there are higher errors in the
parameter estimations when prior is high (0.90). This is
probably due to the lack of evidence available for the HMM
to attribute to the learn and guess parameters. One
approach to examine the impact of these errors is to examine
the students’ subjective experience in different conditions
[
        <xref ref-type="bibr" rid="ref20">19</xref>
        ]. As our data is synthetic, we cannot measure the time
students lose due to errors, as examined by
Yudelson &amp; Koedinger [
        <xref ref-type="bibr" rid="ref20">19</xref>
        ]. Instead we explore the difference in
the number of questions students need to answer to achieve
mastery learning, which for our purposes means a probability of knowledge above 95%,
assuming that the students answer each question
correctly.
      </p>
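      <p>The mastery criterion described above can be sketched as follows. This is a minimal sketch under the assumptions stated in the text (every answer is correct, mastery means a probability of knowledge above 0.95) and assumes the standard BKT update without forgetting; the function name <monospace>answers_to_mastery</monospace> and the <monospace>max_answers</monospace> cut-off are hypothetical.</p>
      <preformat><![CDATA[
```python
def answers_to_mastery(prior, learn, guess, slip, threshold=0.95, max_answers=100):
    """Count the consecutive correct answers needed before P(known) > threshold.

    Returns None if mastery is not reached within `max_answers` answers.
    """
    p_know = prior
    for n_answers in range(max_answers + 1):
        if p_know > threshold:
            return n_answers
        # Bayesian update for a correct answer.
        evidence = p_know * (1.0 - slip)
        posterior = evidence / (evidence + (1.0 - p_know) * guess)
        # Learning transition after the practice opportunity.
        p_know = posterior + (1.0 - posterior) * learn
    return None
```
]]></preformat>
      <p>Calling the function once with the ground-truth parameters and once with the estimated parameters, and comparing the two counts, gives the kind of difference examined here.</p>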
      <p>Examining the case of high prior knowledge, when the
true learn value was 0.1, we observed that the majority of students
needed to answer more than 5 questions to achieve mastery (of
the 168 predicted value sets available, only 24 achieved
mastery within 5 responses), and for the high learn value (0.2) the situation was not
significantly better: there, 56 value sets achieved mastery within 5
responses. This indicates that the estimation errors had a
substantial impact on students’ learning and highlights
the importance of this study.</p>
    </sec>
    <sec id="sec-11">
      <title>6. CONCLUSIONS</title>
      <p>We started this study with the motivation to explore how
well the knowledge tracing method performs when the prior
is high or low; this performance has practical implications
when applying the approach in a heterogeneous classroom
where students arrive with highly different knowledge of the
domain. We studied this empirically by generating 288
different synthetic data sets and exploring the difference between
the estimated parameters and the parameters used to
generate each data set.</p>
      <p>Our results indicated a slight increase in the estimation
error when the prior was 0.90, which we mostly attribute to
higher error in the learn and guess parameters. This observation
was statistically significant and is most likely due to the fact
that students with higher priors produce less information
for the HMM to use when estimating the guess and learn
parameters.</p>
      <p>We explored the influence these errors had on the
probability of knowledge and observed that they significantly
reduced the speed at which students achieved mastery learning. This
result therefore implies that more work needs to be done to
detect those with high prior knowledge and to cater to their
learning needs.</p>
    </sec>
    <sec id="sec-12">
      <title>Acknowledgments</title>
      <p>This work was conducted during the UC Berkeley School of
Information class “INFO290: Machine learning in education”
instructed by Zach Pardos. We thank the
course staff and our peers for their support and feedback on the presentation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>Ryan S.J.d.</given-names> <surname>Baker</surname></string-name>,
          <string-name><given-names>Albert T.</given-names> <surname>Corbett</surname></string-name>, and
          <string-name><given-names>Vincent</given-names> <surname>Aleven</surname></string-name>.
          <article-title>More accurate student modeling through contextual estimation of slip and guess probabilities in bayesian knowledge tracing</article-title>.
          In Beverley P. Woolf, Esma Aïmeur, Roger Nkambou, and Susanne Lajoie, editors,
          <source>Intelligent Tutoring Systems</source>, volume
          <volume>5091</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>406</fpage>-<lpage>415</lpage>.
          Springer Berlin Heidelberg,
          <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name><given-names>Joseph E</given-names> <surname>Beck</surname></string-name>
          and
          <string-name><given-names>Kai-min</given-names> <surname>Chang</surname></string-name>.
          <article-title>Identifiability: A Fundamental Problem of Student Modeling</article-title>.
          pages
          <fpage>137</fpage>-<lpage>146</lpage>,
          <year>2007</year>.
          doi:
          <pub-id pub-id-type="doi">10.1007/978-3-540-73078-1_17</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Albert</given-names>
            <surname>Corbett</surname>
          </string-name>
          .
          <article-title>Cognitive computer tutors: Solving the two-sigma problem</article-title>
          .
          <source>In User Modeling</source>
          <year>2001</year>
          , volume
          <volume>2109</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>137</fpage>
          -
          <lpage>147</lpage>
          . Springer Berlin Heidelberg,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name><given-names>Albert</given-names> <surname>Corbett</surname></string-name>,
          <string-name><given-names>Megan</given-names> <surname>McLaughlin</surname></string-name>, and
          <string-name><given-names>K Christine</given-names> <surname>Scarpinatto</surname></string-name>.
          <article-title>Modeling student knowledge: Cognitive tutors in high school and college</article-title>.
          <source>User modeling and user-adapted interaction</source>,
          <volume>10</volume>(<issue>2-3</issue>):<fpage>81</fpage>-<lpage>108</lpage>,
          <year>2000</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name><given-names>Albert T</given-names> <surname>Corbett</surname></string-name>
          and
          <string-name><given-names>John R</given-names> <surname>Anderson</surname></string-name>.
          <article-title>Knowledge tracing: Modeling the acquisition of procedural knowledge</article-title>.
          <source>User modeling and user-adapted interaction</source>,
          <volume>4</volume>(<issue>4</issue>):<fpage>253</fpage>-<lpage>278</lpage>,
          <year>1994</year>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name><given-names>Albert T</given-names> <surname>Corbett</surname></string-name>
          and
          <string-name><given-names>Akshat</given-names> <surname>Bhatnagar</surname></string-name>.
          <article-title>Student modeling in the ACT programming tutor: Adjusting a procedural learning model with declarative knowledge</article-title>.
          <source>Courses and Lectures - International Centre for Mechanical Sciences</source>,
          pages
          <fpage>243</fpage>-<lpage>254</lpage>,
          <year>1997</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name><given-names>Tanja</given-names> <surname>Käser</surname></string-name>,
          <string-name><given-names>Severin</given-names> <surname>Klingler</surname></string-name>,
          <string-name><given-names>Alexander Gerhard</given-names> <surname>Schwing</surname></string-name>, and
          <string-name><given-names>Markus</given-names> <surname>Gross</surname></string-name>.
          <article-title>Beyond knowledge tracing: Modeling skill topologies with bayesian networks</article-title>.
          In Stefan Trausan-Matu, Kristy Elizabeth Boyer, Martha Crosby, and Kitty Panourgia, editors,
          <source>Intelligent Tutoring Systems</source>, volume
          <volume>8474</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>188</fpage>-<lpage>198</lpage>.
          Springer International Publishing,
          <year>2014</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z. A.</given-names>
            <surname>Pardos</surname>
          </string-name>
          and
          <string-name>
            <given-names>N. T.</given-names>
            <surname>Heffernan</surname>
          </string-name>
          .
          <article-title>Navigating the parameter space of Bayesian Knowledge Tracing models Visualizations of the convergence of the Expectation Maximization algorithm</article-title>
          .
          <source>In Proceedings of the 3rd International Conference on Educational Data Mining</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name><given-names>Z. A.</given-names> <surname>Pardos</surname></string-name>
          and
          <string-name><given-names>N. T.</given-names> <surname>Heffernan</surname></string-name>.
          <article-title>Using HMMs and bagged decision trees to leverage rich features of user and skill from an intelligent tutoring system dataset</article-title>.
          <source>Journal of Machine Learning Research W &amp; CP</source>,
          <year>2010</year>.
          URL http://people.csail.mit.edu/zp/papers/pardos_JMLR_in_press.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.A.</given-names>
            <surname>Pardos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , and et al.
          <article-title>Scaling cognitive modeling to massive open environments</article-title>
          . TOCHI Special Issue on Learning at Scale, (in preparation).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name><given-names>Zachary A.</given-names> <surname>Pardos</surname></string-name>
          and
          <string-name><given-names>Neil T.</given-names> <surname>Heffernan</surname></string-name>.
          <article-title>Modeling individualization in a bayesian networks implementation of knowledge tracing</article-title>.
          In Paul Bra, Alfred Kobsa, and David Chin, editors,
          <source>User Modeling, Adaptation, and Personalization</source>, volume
          <volume>6075</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>255</fpage>-<lpage>266</lpage>.
          Springer Berlin Heidelberg,
          <year>2010</year>.
          ISBN <isbn>978-3-642-13469-2</isbn>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name><given-names>Zachary A</given-names> <surname>Pardos</surname></string-name>,
          <string-name><given-names>Sujith M.</given-names> <surname>Gowda</surname></string-name>,
          <string-name><given-names>Ryan S.J.d.</given-names> <surname>Baker</surname></string-name>, and
          <string-name><given-names>Neil T.</given-names> <surname>Heffernan</surname></string-name>.
          <article-title>The sum is greater than the parts</article-title>.
          <source>ACM SIGKDD Explorations Newsletter</source>,
          <volume>13</volume>(<issue>2</issue>):<fpage>37</fpage>,
          May
          <year>2012</year>.
          ISSN 19310145. doi:
          <pub-id pub-id-type="doi">10.1145/2207243.2207249</pub-id>.
          URL http://dl.acm.org/citation.cfm?id=2207249 http://dl.acm.org/citation.cfm?doid=2207243.2207249.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Dovan</surname>
            <given-names>Rai</given-names>
          </string-name>
          , Yue Gong, and
          <string-name>
            <given-names>Joseph E</given-names>
            <surname>Beck</surname>
          </string-name>
          .
          <article-title>Using dirichlet priors to improve model parameter plausibility</article-title>
          .
          <source>International Working Group on Educational Data Mining</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name><given-names>Leena</given-names> <surname>Razzaq</surname></string-name>,
          <string-name><given-names>Neil T</given-names> <surname>Heffernan</surname></string-name>,
          <string-name><given-names>Mingyu</given-names> <surname>Feng</surname></string-name>, and
          <string-name><given-names>Zachary A</given-names> <surname>Pardos</surname></string-name>.
          <article-title>Developing Fine-Grained Transfer Models in the ASSISTment System</article-title>.
          <source>Technology, Instruction, Cognition &amp; Learning</source>,
          <volume>5</volume>(<issue>3</issue>):<fpage>1</fpage>-<lpage>16</lpage>,
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Steven</surname>
            <given-names>Ritter</given-names>
          </string-name>
          , Thomas K Harris, Tristan Nixon, Daniel Dickison,
          <string-name>
            <given-names>R Charles</given-names>
            <surname>Murray</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Brendon</given-names>
            <surname>Towle</surname>
          </string-name>
          .
          <article-title>Reducing the knowledge tracing space</article-title>
          .
          <source>International Working Group on Educational Data Mining</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A</given-names>
            <surname>Toscher</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Jahrer</surname>
          </string-name>
          .
          <article-title>Collaborative filtering applied to educational data mining</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Brett van De Sande</surname>
          </string-name>
          .
          <article-title>Properties of the Bayesian Knowledge Tracing Model</article-title>
          .
          <source>Journal of Educational Data Mining</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Hsiang-Fu</surname>
            <given-names>Yu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hung-Yi</surname>
            <given-names>Lo</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsun-Ping</surname>
            <given-names>Hsieh</given-names>
          </string-name>
          , JingKai Lou, Todd G McKenzie,
          <string-name>
            <surname>Jung-Wei</surname>
            <given-names>Chou</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Po-Han</surname>
            <given-names>Chung</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chia-Hua</surname>
            <given-names>Ho</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chun-Fu</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin-Hsuan Wei</surname>
          </string-name>
          , et al.
          <article-title>Feature engineering and classifier ensemble for kdd cup 2010</article-title>
          . JMLR: Workshop and Conference Proceedings,
          <volume>1</volume>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name><given-names>Michael V</given-names> <surname>Yudelson</surname></string-name>
          and
          <string-name><given-names>Kenneth R</given-names> <surname>Koedinger</surname></string-name>.
          <article-title>Estimating the benefits of student model improvements on a substantive scale</article-title>.
          In
          <source>Proceedings of the 6th International Conference on Educational Data Mining</source>,
          <year>2013</year>.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name><given-names>Michael V</given-names> <surname>Yudelson</surname></string-name>,
          <string-name><given-names>Kenneth R</given-names> <surname>Koedinger</surname></string-name>, and
          <string-name><given-names>Geoffrey J</given-names> <surname>Gordon</surname></string-name>.
          <article-title>Individualized bayesian knowledge tracing models</article-title>.
          In
          <source>Artificial Intelligence in Education</source>,
          pages
          <fpage>171</fpage>-<lpage>180</lpage>.
          Springer,
          <year>2013</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>