Using Similarity to the Previous Problem to Improve Bayesian Knowledge Tracing

William J. Hawkins, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, whawkins90@gmail.com
Neil T. Heffernan, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, nth@wpi.edu



ABSTRACT
Bayesian Knowledge Tracing (BKT) is a popular student model used extensively in educational research and in intelligent tutoring systems. Typically, a separate BKT model is fit per skill, but the accuracy of such models is dependent upon the skill model, or mapping between problems and skills. It could be the case that the skill model used is too coarse-grained, causing multiple skills to all be considered the same skill. Additionally, even if the skill model is appropriate, having problems that exercise the same skill but look different can have effects on student performance. Therefore, this work introduces a student model based on BKT that takes into account the similarity between the problem the student is currently working on and the one they worked on just prior to it. By doing this, the model can capture the effect of problem similarity on performance, and moderately improve accuracy on skills with many dissimilar problems.

Keywords
Student modeling, Bayesian Knowledge Tracing, Problem Similarity

1. INTRODUCTION
Bayesian Knowledge Tracing (BKT) [3] is a popular student model used both in research and in actual intelligent tutoring systems. As a model that infers student knowledge, BKT has helped researchers answer questions about the effectiveness of help within a tutor [1], the impact of “gaming the system” on learning [5], and the relationship between student knowledge and affect [9], among others. Additionally, it has been used in the Cognitive Tutors [6] to determine which questions should be presented to a student, and when a student no longer needs practice on a given skill.

However, BKT models are dependent upon the underlying skill model of the system, as a separate BKT model is typically fit per skill. If a skill model is too coarse-grained or too fine-grained, it can make it more difficult for a BKT model to accurately infer student knowledge [8].

Additionally, even when a skill model is tagged at the appropriate level, seeing similar problems consecutively as opposed to seeing dissimilar problems may have effects on guessing and slipping, two important components of BKT models. For example, if a student does not understand the skill they are working on, seeing a certain type of question twice or more consecutively may improve their chances of “guessing” the answer using a suboptimal procedure that would not work on other questions from the same skill. Whether the skill model is not at the appropriate level or seeing consecutive similar questions helps students succeed without fully learning a skill, it may be important to take problem similarity into account in student models. In this work, we introduce the Bayesian Knowledge Tracing – Same Template (BKT-ST) model, a modification of BKT that considers problem similarity. Specifically, using data from the ASSISTments system [4], the model takes into account whether the problem the student is currently working on was generated from the same template as the previous problem.

The next section describes the ASSISTments system, its template system and the data used for this paper. Section 3 describes BKT and BKT-ST in more detail, and describes the analyses we performed on these models. The results are reported in Section 4, followed by discussion and possible directions for future work in Section 5.

2. TUTORING SYSTEM AND DATA

2.1 ASSISTments
ASSISTments [4] is a freely available web-based tutoring system used primarily for middle and high school mathematics. In addition to providing a way for teachers to assess their students, ASSISTments also assists the students in a few different ways: through the use of series of on-demand hint messages that typically end in the answer to the question (the “bottom-out hint”), “buggy” or feedback messages that appear when the student gives a common wrong answer, and “scaffolding” questions that break the original question into smaller questions that are easier to answer.

While teachers are free to author their own content, ASSISTments provides a library of approved content, which includes problem sets called skill-builders, which are meant to help students practice a particular skill. While most problem sets contain a fixed number of problems that must all be completed for a student to finish, a skill-builder is a special type of problem set that assigns questions in a random order and that is considered complete once a student answers three consecutive questions correctly on the same day.
While requiring students to answer three consecutive questions correctly on the same day to complete a skill-builder ensures that they have some level of knowledge of the particular skill being exercised, it takes some students many problems to achieve this, meaning they may see the same problem more than once if the skill-builder does not contain enough unique problems.

To ensure this does not happen (or at least make it highly unlikely), ASSISTments has a templating system that facilitates creating large numbers of similar problems quickly. The content creator creates a question as normal, but specifies that it is a template and uses variables in the problem statement and answer rather than specific values. Then, they are able to generate 10 unique problems at a time from that template, where each problem is randomly populated with specific values as prescribed by the template. This is especially useful for skill-builders, whose problems should theoretically all exercise the same skill. Figure 1 shows an example of a template (a) and a problem generated from it (b).

Figure 1. A template (top image) and a problem generated from it (bottom). The variables ‘b’ and ‘c’ in the template are replaced by ‘8’ and ‘23’ in the generated problem.
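To illustrate the template mechanism just described, the following minimal sketch shows how a templated question might be instantiated with randomly chosen values. It is not ASSISTments code; the variable names, value ranges and data structure are illustrative assumptions.

```python
import random

# Illustrative sketch only: a template holds a problem statement with named
# variables and a rule that computes the answer from the sampled values.
template = {
    "statement": "What is {b} + {c}?",
    "variables": {"b": (1, 20), "c": (1, 50)},   # assumed value ranges
    "answer": lambda values: values["b"] + values["c"],
}

def generate_problems(template, n=10):
    """Generate n problems from one template by sampling values for each variable."""
    problems = []
    for _ in range(n):
        values = {name: random.randint(lo, hi)
                  for name, (lo, hi) in template["variables"].items()}
        problems.append({
            "statement": template["statement"].format(**values),
            "answer": template["answer"](values),
            "template_id": id(template),  # problems from the same template share this id
        })
    return problems

if __name__ == "__main__":
    for p in generate_problems(template, n=3):
        print(p["statement"], "->", p["answer"])
```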

2.2 Data
In this work, we used ASSISTments skill-builder data from the 2009-2010 school year. This data set consists of 61,522 problem attempts by 1,579 students, spread across 67 different skill-builders. A (student, skill-builder) pair was only included if the student attempted three or more problems on that particular skill-builder, and a skill-builder was included if it was used by at least 10 students and at least one of them completed it.

3. METHODS
In this section, we begin by describing Bayesian Knowledge Tracing, and then move on to our modification of it, called Bayesian Knowledge Tracing – Same Template. Finally, we describe the analyses we performed using these two models.

3.1 Bayesian Knowledge Tracing
Bayesian Knowledge Tracing (BKT) [3] is a popular student model that uses a dynamic Bayesian network to infer student knowledge using only a student’s history of correct and incorrect responses to questions that exercise a given knowledge component (or “skill”).

Typically, a separate BKT model is fit for each skill. BKT models assume that there are only two states a student can be in for a given skill: the known state or the unknown state. Using a student’s performance history on a given skill, a BKT model infers the probability that the student is in the known state on question t, P(Kt).

Fitting a BKT model involves estimating four probabilities:
1. Prior Knowledge – P(L0): the probability the student knew the skill before answering the first question
2. Learn Rate – P(T): the probability the student will know the skill on the next question, given that they do not know the skill on the current question
3. Guess Rate – P(G): the probability the student will answer the current question correctly despite not knowing the skill
4. Slip Rate – P(S): the probability the student will answer the current question incorrectly despite knowing the skill

Note that forgetting is typically not modeled in BKT: it is assumed that once a student learns a skill, they do not forget it. An example of a BKT model, represented as a static unrolled Bayesian network, is shown in Figure 2.

Figure 2. Static unrolled representation of Bayesian Knowledge Tracing. The Kt nodes along the top represent latent knowledge, while the Ct nodes represent performance.
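The paper does not write out the inference equations, but the forward update implied by the four parameters above is the standard Corbett and Anderson formulation [3]. The sketch below is our illustration in Python, not code from this work: it predicts the probability of a correct response and then updates P(Kt) after observing the response, with no forgetting.

```python
def predict_correct(p_know, p_guess, p_slip):
    """P(correct on the current question) under standard BKT."""
    return p_know * (1.0 - p_slip) + (1.0 - p_know) * p_guess

def bkt_update(p_know, correct, p_guess, p_slip, p_learn):
    """One step of the standard BKT forward update.

    p_know: P(K_t), the probability the student knows the skill before question t
    correct: whether the student answered question t correctly
    Returns P(K_{t+1}) after conditioning on the response and applying the learn rate.
    """
    if correct:
        # P(K_t | correct response)
        numer = p_know * (1.0 - p_slip)
        denom = numer + (1.0 - p_know) * p_guess
    else:
        # P(K_t | incorrect response)
        numer = p_know * p_slip
        denom = numer + (1.0 - p_know) * (1.0 - p_guess)
    p_know_given_obs = numer / denom
    # No forgetting: knowledge can only increase, via the learn rate P(T)
    return p_know_given_obs + (1.0 - p_know_given_obs) * p_learn
```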
3.2 Bayesian Knowledge Tracing – Same Template
The Bayesian Knowledge Tracing – Same Template (BKT-ST) model differs from the regular BKT model in one way: it takes into account whether the problem it’s about to predict was generated from the same template as the previous problem the student worked on. This is modeled as a binary observed variable that influences performance.

This results in six parameters to be learned per skill: the initial knowledge rate, the learn rate, and two sets of guess and slip rates: one set for when the previous problem and current problem were generated from the same template (P(G|Same) and P(S|Same)), and one for when they aren’t (P(G|Different) and P(S|Different)). The model is shown in Figure 3.

Figure 3. Static unrolled representation of Bayesian Knowledge Tracing – Same Template. The only difference from BKT is the presence of the Dt nodes, which represent whether the previous question was generated by the same template as the current one.
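To make the modification concrete, the sketch below (our illustration, reusing predict_correct and bkt_update from the BKT sketch in Section 3.1) selects the guess and slip rates according to whether the current problem was generated from the same template as the previous one. The parameter names are ours, not those used in the actual BNT implementation.

```python
def bkt_st_step(p_know, correct, same_template, params):
    """One BKT-ST step: choose guess/slip by template similarity, then update as in BKT.

    params is a dict with keys 'L0', 'T', 'G_same', 'S_same', 'G_diff', 'S_diff'
    (our naming for the six per-skill parameters).
    """
    p_guess = params["G_same"] if same_template else params["G_diff"]
    p_slip = params["S_same"] if same_template else params["S_diff"]
    p_correct = predict_correct(p_know, p_guess, p_slip)            # prediction for this question
    p_know_next = bkt_update(p_know, correct, p_guess, p_slip, params["T"])
    return p_correct, p_know_next

def trace_student(responses, same_template_flags, params):
    """Run BKT-ST over one student's response sequence for a single skill."""
    p_know = params["L0"]
    predictions = []
    for correct, same in zip(responses, same_template_flags):
        p_correct, p_know = bkt_st_step(p_know, correct, same, params)
        predictions.append(p_correct)
    return predictions
```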

3.3 Analyses
The first analysis in this work simply considers how well the two models fit the data compared to each other overall. This is determined by fitting separate BKT and BKT-ST models for each skill and then predicting unseen student data using five-fold student-level cross-validation. Then, we evaluate each model’s ability to predict next question correctness by computing the mean absolute error (MAE), root mean squared error (RMSE) and area under the curve (AUC) for each student and then averaging across students for each type of model. Finally, two-tailed paired t-tests are used to determine the significance of the differences in the metrics.

The second analysis considers what the metrics look like for each model based on how many templates were used for each skill-builder problem set. This is done by splitting the predictions made in the first analysis by how many templates were used in the corresponding skill-builder. We did this to see when it would be worth using BKT-ST over BKT.

Finally, we consider the parameter values learned for the BKT-ST model to determine any effects that seeing problems generated by the same template consecutively has on guessing and slipping.

The BKT and BKT-ST models used in these analyses are fit using the Expectation-Maximization (EM) algorithm in the Bayes Net Toolbox for Matlab (BNT) [7]. The initial values given to EM for BKT were 0.5 for P(L0) and 0.1 for the other three parameters. This was also true for BKT-ST, except the slip rate was set to 0.2 when the current and previous problems were generated from the same template.
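A rough sketch of the per-student evaluation described above is shown below. It assumes the cross-validated predictions and the true correctness labels have already been collected for each student, and it is an illustrative reconstruction rather than the authors’ Matlab/BNT code.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, roc_auc_score

def per_student_metrics(students):
    """students: dict mapping student id -> (labels, predicted probabilities)
    for that student's held-out, cross-validated responses under one model.
    Returns MAE, RMSE and AUC computed per student and averaged across students."""
    maes, rmses, aucs = [], [], []
    for labels, preds in students.values():
        labels = np.asarray(labels, dtype=float)
        preds = np.asarray(preds, dtype=float)
        maes.append(mean_absolute_error(labels, preds))
        rmses.append(np.sqrt(mean_squared_error(labels, preds)))
        # AUC is undefined when a student's held-out responses are all correct
        # or all incorrect, so such students are skipped for that metric
        if len(set(labels)) > 1:
            aucs.append(roc_auc_score(labels, preds))
    return np.mean(maes), np.mean(rmses), np.mean(aucs)
```

Skipping students with constant labels when computing AUC may explain why the AUC comparison in Section 4.1 is reported with fewer degrees of freedom than the MAE and RMSE comparisons.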
4. RESULTS
In this section, we first present the overall comparison of BKT and BKT-ST, then show how they compare to each other based on the number of templates used in each skill-builder. Finally, we examine the learned parameters for the BKT-ST model.

4.1 Overall
The overall results comparing BKT to BKT-ST are shown in Table 1.

Table 1. Overall results of fitting BKT and BKT-ST models.

          MAE      RMSE     AUC
BKT       0.3830   0.4240   0.5909
BKT-ST    0.3751   0.4205   0.6314

According to these results, BKT-ST outperforms BKT in all three metrics. Statistical tests confirmed that these results were reliable (MAE: p < .0001, t(1578) = 9.939; RMSE: p < .0001, t(1578) = 4.825; AUC: p < .0001, t(1314) = -11.095), though according to the values in the table, the only noticeable gain was in AUC.

4.2 By Number of Templates
Next, we considered how well each model did based on the number of templates a skill-builder contained. The results are shown in Figure 4.

Figure 4. Graph of MAE, RMSE and AUC for the BKT and BKT-ST models, plotted against the number of unique templates per skill.

Interestingly, both BKT and BKT-ST decline rapidly in terms of model goodness as the number of templates per skill-builder increases. This is likely the case because those with more templates are more likely to have more than one skill being tested within them. Interestingly, although both models decline similarly in terms of MAE and RMSE, BKT-ST declines at a slower rate than BKT does in terms of AUC. In fact, BKT-ST outperforms BKT in terms of AUC for every group of skills with more than one template. When grouping the skills by the number of templates they had, BKT-ST achieved an AUC of at least 0.0236 better than BKT for each group that had more than one template, and achieved AUC values that were 0.1086 and 0.0980 better than BKT for skills with five and 10 templates, respectively. Additionally, while BKT performs worse than chance (AUC < 0.5) on skills with eight or more templates, BKT-ST never performs worse than chance.

4.3 Parameter Values
To analyze the parameters learned by BKT-ST, for each skill, we took the average value of each of the six parameters learned across the five folds from the overall analysis.

First, we computed the average value of each parameter across all 67 skills. These are shown in Table 2.

Table 2. Means and standard deviations of BKT-ST parameter values learned across 67 skill-builders

Parameter        Mean     SD
P(L0)            0.6030   0.2617
P(T)             0.2966   0.2500
P(G|Different)   0.1880   0.1655
P(S|Different)   0.2941   0.1737
P(G|Same)        0.3337   0.2495
P(S|Same)        0.1514   0.0848
From the results in Table 2, it appears that on average, seeing consecutive questions generated from the same template both increases the guess rate (p < .0001, t(66) = -4.516) and decreases the slip rate (p < .0001, t(66) = 7.186).
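These are two-tailed paired t-tests across the 67 skill-builders, pairing each skill’s Same-template parameter with its Different-template counterpart. A minimal sketch of that comparison follows; the data here are random placeholders standing in for the per-skill averaged parameter values, not the values actually learned in this work.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
# Placeholder arrays standing in for the 67 per-skill averaged parameter values;
# in the paper these come from the cross-validated BKT-ST fits.
guess_same = rng.uniform(0.0, 0.6, size=67)   # P(G|Same) per skill-builder
guess_diff = rng.uniform(0.0, 0.4, size=67)   # P(G|Different) per skill-builder

# Two-tailed paired t-test across skills, df = 66
t_stat, p_value = ttest_rel(guess_diff, guess_same)
print(f"t(66) = {t_stat:.3f}, p = {p_value:.4g}")
```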
Next, we examined how these parameters changed with respect to the number of templates used per skill-builder. The average values of the performance parameters (guess and slip rates for same and different templates) are shown in the graph in Figure 5. The results for skills with one template are omitted since the P(G|Different) and P(S|Different) parameters are meaningless in such cases.

Figure 5. Average value of each performance parameter for the number of templates used per skill-builder.

Although there is no clear pattern for any of the four performance parameters shown in the graph, the average value of P(G|Same) is always higher than that of P(G|Different), and that of P(S|Same) is always lower than that of P(S|Different), with respect to the number of templates used per skill. This appears to reinforce the notion that seeing consecutive problems generated from the same template makes the latter easier to solve, whether this is due to the skill model being too coarse-grained or familiarity with a certain type of problem within a skill inflating performance.

5. DISCUSSION AND FUTURE WORK
From the results in this work, it appears that modifying Bayesian Knowledge Tracing to take similarity between consecutive problems into account moderately improves cross-validated predictive performance, especially in terms of AUC. Additionally, this work showed that seeing consecutive similar problems improves student performance by both increasing the guess rate – the probability of answering a question correctly despite not knowing the skill – and decreasing the slip rate – the probability of answering a question incorrectly despite knowing the skill. Regardless of the underlying reason for this, whether it is because the skill model is too coarse-grained or simply that familiarity with a type of problem within a skill improves performance, it appears important for student models to take the similarity of the problems students encounter into account when trying to model student knowledge.

One direction for future work would be to try going back further in the problem sequence to see how the similarity of problems earlier in a student’s history affects their ability to answer the current problem. Additionally, it would be interesting to determine whether the effect changes in certain situations. For example, what is the effect of seeing two similar problems in a row, followed by one that is different from both?

Another area of interest would be to use a model that takes problem similarity into account when trying to predict a longer-term outcome, such as wheel-spinning [2], retention and transfer, as opposed to simply predicting next question correctness.

Finally, applying this model and others like it to other learning environments and skill models of various grain sizes would be helpful for understanding when it is useful. Presumably, if a skill model is at the appropriate grain size, the difference in predictive performance between BKT and BKT-ST would be reduced. The same would be true of systems that fall to one of two extremes: those whose problem sets are highly repetitive, and those whose problem sets have a rich variety of problems.

6. ACKNOWLEDGMENTS
We acknowledge funding from NSF (#1316736, 1252297, 1109483, 1031398, 0742503), ONR's 'STEM Grand Challenges' and IES (# R305A120125 & R305C100024).

7. REFERENCES
[1] Beck, J.E., Chang, K., Mostow, J., Corbett, A. Does help help? Introducing the Bayesian Evaluation and Assessment methodology. Intelligent Tutoring Systems, Springer Berlin Heidelberg, 2008, 383-394.
[2] Beck, J.E. and Gong, Y. Wheel-Spinning: Students Who Fail to Master a Skill. In Artificial Intelligence in Education, pp. 431-440. Springer Berlin Heidelberg, 2013.
[3] Corbett, A. and Anderson, J. Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Modeling and User-Adapted Interaction, 4(4), 253-278.
[4] Feng, M., Heffernan, N.T., Koedinger, K.R. Addressing the assessment challenge in an Intelligent Tutoring System that tutors as it assesses. User Modeling and User-Adapted Interaction, 19(3), 243-266.
[5] Gong, Y., Beck, J., Heffernan, N., Forbes-Summers, E. The impact of gaming (?) on learning at the fine-grained level. In Proceedings of the 10th International Conference on Intelligent Tutoring Systems, (Pittsburgh, PA, 2010), Springer, 194-203.
[6] Koedinger, K.R., Anderson, J.R., Hadley, W.H., Mark, M.A. (1997). Intelligent Tutoring Goes To School in the Big City. International Journal of Artificial Intelligence in Education, 8(1), 30-43.
[7] Murphy, K. The Bayes Net Toolbox for Matlab. Computing Science and Statistics, 33(2), 1024-1034.
[8] Pardos, Z.A., Heffernan, N.T., Anderson, B., & Heffernan, C.L. Using Fine-Grained Skill Models to Fit Student Performance with Bayesian Networks. Proceedings of the Workshop in Educational Data Mining held at the 8th International Conference on Intelligent Tutoring Systems. (Taiwan, 2006).
[9] San Pedro, M., Baker, R.S.J.d., Gowda, S.M., Heffernan, N.T. Towards an Understanding of Affect and Knowledge from Student Interaction with an Intelligent Tutoring System. In Lane, H.C., Yacef, K., Mostow, J., Pavlik, P. (Eds.) AIED 2013. LNCS, vol. 7926/2013, pp. 41-50. Springer-Verlag, Berlin Heidelberg.