=Paper=
{{Paper
|id=Vol-1183/bkt20y_paper04
|storemode=property
|title= Using Similarity to the Previous Problem to Improve Bayesian Knowledge Tracing
|pdfUrl=https://ceur-ws.org/Vol-1183/bkt20y_paper04.pdf
|volume=Vol-1183
|dblpUrl=https://dblp.org/rec/conf/edm/HawkinsH14
}}
== Using Similarity to the Previous Problem to Improve Bayesian Knowledge Tracing ==
William J. Hawkins and Neil T. Heffernan
Worcester Polytechnic Institute
100 Institute Road, Worcester, MA 01609
whawkins90@gmail.com, nth@wpi.edu

ABSTRACT

Bayesian Knowledge Tracing (BKT) is a popular student model used extensively in educational research and in intelligent tutoring systems. Typically, a separate BKT model is fit per skill, but the accuracy of such models is dependent upon the skill model, or mapping between problems and skills. It could be the case that the skill model used is too coarse-grained, causing multiple skills to all be considered the same skill. Additionally, even if the skill model is appropriate, having problems that exercise the same skill but look different can have effects on student performance. Therefore, this work introduces a student model based on BKT that takes into account the similarity between the problem the student is currently working on and the one they worked on just prior to it. By doing this, the model can capture the effect of problem similarity on performance, and moderately improve accuracy on skills with many dissimilar problems.

Keywords

Student modeling, Bayesian Knowledge Tracing, Problem Similarity

1. INTRODUCTION

Bayesian Knowledge Tracing (BKT) [3] is a popular student model used both in research and in actual intelligent tutoring systems. As a model that infers student knowledge, BKT has helped researchers answer questions about the effectiveness of help within a tutor [1], the impact of “gaming the system” on learning [5], and the relationship between student knowledge and affect [9], among others. Additionally, it has been used in the Cognitive Tutors [6] to determine which questions should be presented to a student, and when a student no longer needs practice on a given skill.

However, BKT models are dependent upon the underlying skill model of the system, as a separate BKT model is typically fit per skill. If a skill model is too coarse-grained or too fine-grained, it can make it more difficult for a BKT model to accurately infer student knowledge [8].

Additionally, even when a skill model is tagged at the appropriate level, seeing similar problems consecutively as opposed to seeing dissimilar problems may have effects on guessing and slipping, two important components of BKT models. For example, if a student does not understand the skill they are working on, seeing a certain type of question twice or more consecutively may improve their chances of “guessing” the answer using a suboptimal procedure that would not work on other questions from the same skill. Whether the skill model is not at the appropriate level or seeing consecutive similar questions helps students succeed without fully learning a skill, it may be important to take problem similarity into account in student models. In this work, we introduce the Bayesian Knowledge Tracing – Same Template (BKT-ST) model, a modification of BKT that considers problem similarity. Specifically, using data from the ASSISTments system [4], the model takes into account whether the problem the student is currently working on was generated from the same template as the previous problem.

The next section describes the ASSISTments system, its template system and the data used for this paper. Section 3 describes BKT and BKT-ST in more detail, and describes the analyses we performed on these models. The results are reported in Section 4, followed by discussion and possible directions for future work in Section 5.

2. TUTORING SYSTEM AND DATA

2.1 ASSISTments

ASSISTments [4] is a freely available web-based tutoring system used primarily for middle and high school mathematics. In addition to providing a way for teachers to assess their students, ASSISTments also assists students in a few different ways: through a series of on-demand hint messages that typically end in the answer to the question (the “bottom-out hint”), “buggy” feedback messages that appear when the student gives a common wrong answer, and “scaffolding” questions that break the original question into smaller questions that are easier to answer.

While teachers are free to author their own content, ASSISTments provides a library of approved content, which includes problem sets called skill-builders, which are meant to help students practice a particular skill. While most problem sets contain a fixed number of problems that must all be completed for a student to finish, a skill-builder is a special type of problem set that assigns questions in a random order and that is considered complete once a student answers three consecutive questions correctly on the same day. While this completion requirement ensures that students have some level of knowledge of the particular skill being exercised, it takes some students many problems to achieve, meaning they may see the same problem more than once if the skill-builder does not contain enough unique problems.

To ensure this does not happen (or at least make it highly unlikely), ASSISTments has a templating system that facilitates creating large numbers of similar problems quickly. The content creator creates a question as normal, but specifies that it is a template and uses variables in the problem statement and answer rather than specific values. Then, they are able to generate 10 unique problems at a time from that template, where each problem is randomly populated with specific values as prescribed by the template. This is especially useful for skill-builders, whose problems should theoretically all exercise the same skill. Figure 1 shows an example of a template and a problem generated from it.

Figure 1. A template (top) and a problem generated from it (bottom). The variables ‘b’ and ‘c’ in the template are replaced by ‘8’ and ‘23’ in the generated problem.
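As a rough illustration of how such a templating system can work, the sketch below instantiates problems from a parameterized problem statement. The template syntax, function names, and variable ranges here are hypothetical illustrations in the spirit of Figure 1, not ASSISTments’ actual implementation.

```python
import random

def generate_problems(template, var_ranges, n=10, seed=0):
    """Instantiate `n` problems from a template by randomly filling
    each variable with a value from its allowed range.
    (Hypothetical sketch; not the ASSISTments template engine.)"""
    rng = random.Random(seed)
    problems = []
    for _ in range(n):
        values = {var: rng.choice(list(choices))
                  for var, choices in var_ranges.items()}
        problems.append(template.format(**values))
    return problems

# 'b' and 'c' are template variables, replaced by concrete numbers
# in each generated problem (cf. Figure 1).
problems = generate_problems("Solve for x: x - {b} = {c}",
                             {"b": range(1, 10), "c": range(10, 30)})
```

Because every generated problem shares one surface form, problems drawn from the same template look far more alike to a student than problems drawn from different templates, which is the distinction BKT-ST exploits.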
2.2 Data

In this work, we used ASSISTments skill-builder data from the 2009-2010 school year. This data set consists of 61,522 problem attempts by 1,579 students, spread across 67 different skill-builders. A (student, skill-builder) pair was only included if the student attempted three or more problems on that particular skill-builder, and a skill-builder was only included if it was used by at least 10 students and at least one of them completed it.

3. METHODS

In this section, we begin by describing Bayesian Knowledge Tracing, and then move on to our modification of it, called Bayesian Knowledge Tracing – Same Template. Finally, we describe the analyses we performed using these two models.

3.1 Bayesian Knowledge Tracing

Bayesian Knowledge Tracing (BKT) [3] is a popular student model that uses a dynamic Bayesian network to infer student knowledge using only a student’s history of correct and incorrect responses to questions that exercise a given knowledge component (or “skill”).

Typically, a separate BKT model is fit for each skill. BKT models assume that there are only two states a student can be in for a given skill: the known state or the unknown state. Using a student’s performance history on a given skill, a BKT model infers the probability that the student is in the known state on question t, P(Kt).

Fitting a BKT model involves estimating four probabilities:

1. Prior Knowledge – P(L0): the probability the student knew the skill before answering the first question
2. Learn Rate – P(T): the probability the student will know the skill on the next question, given that they do not know the skill on the current question
3. Guess Rate – P(G): the probability the student will answer the current question correctly despite not knowing the skill
4. Slip Rate – P(S): the probability the student will answer the current question incorrectly despite knowing the skill

Note that forgetting is typically not modeled in BKT: it is assumed that once a student learns a skill, they do not forget it. An example of a BKT model, represented as a static unrolled Bayesian network, is shown in Figure 2.

Figure 2. Static unrolled representation of Bayesian Knowledge Tracing. The Kt nodes along the top represent latent knowledge, while the Ct nodes represent performance.

3.2 Bayesian Knowledge Tracing – Same Template

The Bayesian Knowledge Tracing – Same Template (BKT-ST) model differs from the regular BKT model in one way: it takes into account whether the problem it is about to predict was generated from the same template as the previous problem the student worked on. This is modeled as a binary observed variable that influences performance.

This results in six parameters to be learned per skill: the initial knowledge rate, the learn rate, and two sets of guess and slip rates: one set for when the previous problem and the current problem were generated from the same template (P(G|Same) and P(S|Same)), and one for when they were not (P(G|Different) and P(S|Different)). The model is shown in Figure 3.
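To make the difference from standard BKT concrete, the sketch below implements the forward (prediction) pass of BKT-ST using the standard knowledge-tracing update equations [3], with the guess and slip rates selected by the same-template indicator. The function and variable names are ours and the parameter values in the example are illustrative; this is a sketch of the model, not the BNT/EM implementation used in the paper.

```python
def bkt_st_predict(responses, same_template, p_L0, p_T, guess, slip):
    """Forward pass of BKT-ST for one student on one skill.

    responses:     list of 0/1 correctness values
    same_template: list of booleans; same_template[t] is True when
                   problem t came from the same template as problem
                   t-1 (False for the first problem)
    guess, slip:   dicts keyed by True/False giving P(G|Same) /
                   P(G|Different) and P(S|Same) / P(S|Different)
    Returns the predicted P(correct) for each question.
    """
    p_know = p_L0
    predictions = []
    for correct, same in zip(responses, same_template):
        g, s = guess[same], slip[same]
        # Predicted probability of a correct answer on this question
        predictions.append(p_know * (1 - s) + (1 - p_know) * g)
        # Posterior P(K) given the observed response (Bayes' rule)
        if correct:
            posterior = (p_know * (1 - s)
                         / (p_know * (1 - s) + (1 - p_know) * g))
        else:
            posterior = (p_know * s
                         / (p_know * s + (1 - p_know) * (1 - g)))
        # Apply the learning transition; forgetting is not modeled
        p_know = posterior + (1 - posterior) * p_T
    return predictions

# Illustrative parameter values (close to the Table 2 means):
preds = bkt_st_predict(
    responses=[1, 0, 1],
    same_template=[False, True, False],
    p_L0=0.6, p_T=0.3,
    guess={True: 0.33, False: 0.19},
    slip={True: 0.15, False: 0.29},
)
```

Setting `guess[True] == guess[False]` and `slip[True] == slip[False]` recovers ordinary BKT, which is why BKT-ST can only add expressive power at the cost of two extra parameters per skill.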
Figure 3. Static unrolled representation of Bayesian Knowledge Tracing – Same Template. The only difference from BKT is the presence of the Dt nodes, which represent whether the previous question was generated by the same template as the current one.

3.3 Analyses

The first analysis in this work simply considers how well the two models fit the data compared to each other overall. This is determined by fitting separate BKT and BKT-ST models for each skill and then predicting unseen student data using five-fold student-level cross-validation. We then evaluate each model’s ability to predict next-question correctness by computing the mean absolute error (MAE), root mean squared error (RMSE) and area under the curve (AUC) for each student and then averaging across students for each type of model. Finally, two-tailed paired t-tests are used to determine the significance of the differences in the metrics.

The second analysis considers what the metrics look like for each model based on how many templates were used for each skill-builder problem set. This is done by splitting the predictions made in the first analysis by how many templates were used in the corresponding skill-builder. We did this to see when it would be worth using BKT-ST over BKT.

Finally, we consider the parameter values learned by the BKT-ST model to determine any effects that seeing problems generated from the same template consecutively has on guessing and slipping.

The BKT and BKT-ST models used in these analyses are fit using the Expectation-Maximization (EM) algorithm in the Bayes Net Toolbox for Matlab (BNT) [7]. The initial values given to EM for BKT were 0.5 for P(L0) and 0.1 for the other three parameters. This was also true for BKT-ST, except the slip rate was set to 0.2 when the current and previous problems were generated from the same template.

4. RESULTS

In this section, we first present the overall comparison of BKT and BKT-ST, then show how the models compare to each other based on the number of templates used in each skill-builder. Finally, we examine the learned parameters for the BKT-ST model.

4.1 Overall

The overall results comparing BKT to BKT-ST are shown in Table 1.

Table 1. Overall results of fitting BKT and BKT-ST models.

          MAE      RMSE     AUC
  BKT     0.3830   0.4240   0.5909
  BKT-ST  0.3751   0.4205   0.6314

According to these results, BKT-ST outperforms BKT on all three metrics. Statistical tests confirmed that these results were reliable (MAE: p < .0001, t(1578) = 9.939; RMSE: p < .0001, t(1578) = 4.825; AUC: p < .0001, t(1314) = -11.095), though according to the values in the table, the only noticeable gain was in AUC.

4.2 By Number of Templates

Next, we considered how well each model did based on the number of templates a skill-builder contained. The results are shown in Figure 4.

Figure 4. Graph of MAE, RMSE and AUC for the BKT and BKT-ST models, plotted against the number of unique templates per skill.

Interestingly, both BKT and BKT-ST decline rapidly in terms of model goodness as the number of templates per skill-builder increases. This is likely because skill-builders with more templates are more likely to have more than one skill being tested within them.

Although both models decline similarly in terms of MAE and RMSE, BKT-ST declines at a slower rate than BKT does in terms of AUC. In fact, BKT-ST outperforms BKT in terms of AUC for every group of skills with more than one template. When grouping the skills by the number of templates they had, BKT-ST achieved an AUC at least 0.0236 better than BKT for each group that had more than one template, and achieved AUC values that were 0.1086 and 0.0980 better than BKT for skills with five and 10 templates, respectively. Additionally, while BKT performs worse than chance (AUC < 0.5) on skills with eight or more templates, BKT-ST never performs worse than chance.

4.3 Parameter Values

To analyze the parameters learned by BKT-ST, for each skill we took the average value of each of the six parameters learned across the five folds from the overall analysis.

First, we computed the average value of each parameter across all 67 skills. These are shown in Table 2.

Table 2. Means and standard deviations of BKT-ST parameter values learned across 67 skill-builders.

  Parameter        Mean     SD
  P(L0)            0.6030   0.2617
  P(T)             0.2966   0.2500
  P(G|Different)   0.1880   0.1655
  P(S|Different)   0.2941   0.1737
  P(G|Same)        0.3337   0.2495
  P(S|Same)        0.1514   0.0848

From the results in Table 2, it appears that on average, seeing consecutive questions generated from the same template both increases the guess rate (p < .0001, t(66) = -4.516) and decreases the slip rate (p < .0001, t(66) = 7.186).

Next, we examined how these parameters changed with respect to the number of templates used per skill-builder. The average values of the performance parameters (guess and slip rates for same and different templates) are shown in Figure 5. The results for skills with one template are omitted, since the P(G|Different) and P(S|Different) parameters are meaningless in such cases.

Figure 5. Average value of each performance parameter by the number of templates used per skill-builder.

Although there is no clear pattern for any of the four performance parameters shown in the graph, the average value of P(G|Same) is always higher than that of P(G|Different), and that of P(S|Same) is always lower than that of P(S|Different), regardless of the number of templates used per skill. This appears to reinforce the notion that seeing consecutive problems generated from the same template makes the latter easier to solve, whether this is due to the skill model being too coarse-grained or familiarity with a certain type of problem within a skill inflating performance.
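The per-student evaluation used throughout Section 4 (MAE, RMSE and AUC computed per student and then macro-averaged across students, as described in Section 3.3) can be sketched as follows. This is our own illustration rather than the paper’s original code, and the AUC here uses the Mann-Whitney formulation.

```python
import math

def auc(labels, scores):
    """AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen correct response is scored above a randomly
    chosen incorrect one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:  # AUC is undefined for one-class students
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_metrics(per_student):
    """per_student: iterable of (labels, predictions) pairs, one pair
    per student. Returns (MAE, RMSE, AUC), each averaged across
    students; students with undefined AUC are dropped from the AUC
    average only."""
    maes, rmses, aucs = [], [], []
    for labels, preds in per_student:
        n = len(labels)
        maes.append(sum(abs(y - p) for y, p in zip(labels, preds)) / n)
        rmses.append(math.sqrt(
            sum((y - p) ** 2 for y, p in zip(labels, preds)) / n))
        a = auc(labels, preds)
        if not math.isnan(a):
            aucs.append(a)
    return (sum(maes) / len(maes),
            sum(rmses) / len(rmses),
            sum(aucs) / len(aucs))
```

Dropping students whose AUC is undefined (all responses correct or all incorrect) would also be consistent with the AUC t-test in Section 4.1 having fewer degrees of freedom (1314) than the MAE and RMSE tests (1578), though the paper does not state this explicitly.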
5. DISCUSSION AND FUTURE WORK

From the results in this work, it appears that modifying Bayesian Knowledge Tracing to take similarity between consecutive problems into account moderately improves cross-validated predictive performance, especially in terms of AUC. Additionally, this work showed that seeing consecutive similar problems improves student performance by both increasing the guess rate (the probability of answering a question correctly despite not knowing the skill) and decreasing the slip rate (the probability of answering a question incorrectly despite knowing the skill). Regardless of the underlying reason for this, whether it is because the skill model is too coarse-grained or simply that familiarity with a type of problem within a skill improves performance, it appears important for student models to take the similarity of the problems students encounter into account when trying to model student knowledge.

One direction for future work would be to go back further in the problem sequence to see how the similarity of problems earlier in a student’s history affects their ability to answer the current problem. Additionally, it would be interesting to determine whether the effect changes in certain situations. For example, what is the effect of seeing two similar problems in a row, followed by one that is different from both?

Another area of interest would be to use a model that takes problem similarity into account when trying to predict a longer-term outcome, such as wheel-spinning [2] or retention and transfer, as opposed to simply predicting next-question correctness.

Finally, applying this model and others like it to other learning environments and skill models of various grain sizes would be helpful for understanding when it is useful. Presumably, if a skill model is at the appropriate grain size, the difference in predictive performance between BKT and BKT-ST would be reduced. The same would be true of systems that fall into one of two extremes: those whose problem sets are highly repetitive, and those whose problem sets have a rich variety of problems.

6. ACKNOWLEDGMENTS

We acknowledge funding from NSF (#1316736, 1252297, 1109483, 1031398, 0742503), ONR's 'STEM Grand Challenges' and IES (#R305A120125 & #R305C100024).

7. REFERENCES

[1] Beck, J.E., Chang, K., Mostow, J., Corbett, A. Does help help? Introducing the Bayesian Evaluation and Assessment methodology. In Intelligent Tutoring Systems, Springer Berlin Heidelberg, 2008, 383-394.

[2] Beck, J.E., Gong, Y. Wheel-spinning: students who fail to master a skill. In Artificial Intelligence in Education, Springer Berlin Heidelberg, 2013, 431-440.

[3] Corbett, A., Anderson, J. Knowledge tracing: modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253-278.

[4] Feng, M., Heffernan, N.T., Koedinger, K.R. Addressing the assessment challenge in an intelligent tutoring system that tutors as it assesses. User Modeling and User-Adapted Interaction, 19(3), 243-266.

[5] Gong, Y., Beck, J., Heffernan, N., Forbes-Summers, E. The impact of gaming (?) on learning at the fine-grained level. In Proceedings of the 10th International Conference on Intelligent Tutoring Systems (Pittsburgh, PA, 2010), Springer, 194-203.

[6] Koedinger, K.R., Anderson, J.R., Hadley, W.H., Mark, M.A. Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education, 8(1), 1997, 30-43.

[7] Murphy, K. The Bayes Net Toolbox for Matlab. Computing Science and Statistics, 33(2), 1024-1034.

[8] Pardos, Z.A., Heffernan, N.T., Anderson, B., Heffernan, C.L. Using fine-grained skill models to fit student performance with Bayesian networks. In Proceedings of the Workshop on Educational Data Mining held at the 8th International Conference on Intelligent Tutoring Systems (Taiwan, 2006).

[9] San Pedro, M., Baker, R.S.J.d., Gowda, S.M., Heffernan, N.T. Towards an understanding of affect and knowledge from student interaction with an intelligent tutoring system. In Lane, H.C., Yacef, K., Mostow, J., Pavlik, P. (Eds.) AIED 2013, LNCS vol. 7926, Springer-Verlag, Berlin Heidelberg, 41-50.