=Paper=
{{Paper
|id=Vol-1183/bkt20y_paper04
|storemode=property
|title= Using Similarity to the Previous Problem to Improve Bayesian Knowledge Tracing
|pdfUrl=https://ceur-ws.org/Vol-1183/bkt20y_paper04.pdf
|volume=Vol-1183
|dblpUrl=https://dblp.org/rec/conf/edm/HawkinsH14
}}
== Using Similarity to the Previous Problem to Improve Bayesian Knowledge Tracing ==
William J. Hawkins and Neil T. Heffernan
Worcester Polytechnic Institute
100 Institute Road, Worcester, MA 01609
whawkins90@gmail.com, nth@wpi.edu

ABSTRACT

Bayesian Knowledge Tracing (BKT) is a popular student model used extensively in educational research and in intelligent tutoring systems. Typically, a separate BKT model is fit per skill, but the accuracy of such models is dependent upon the skill model, or mapping between problems and skills. It could be the case that the skill model used is too coarse-grained, causing multiple skills to all be considered the same skill. Additionally, even if the skill model is appropriate, having problems that exercise the same skill but look different can have effects on student performance. Therefore, this work introduces a student model based on BKT that takes into account the similarity between the problem the student is currently working on and the one they worked on just prior to it. By doing this, the model can capture the effect of problem similarity on performance, and moderately improve accuracy on skills with many dissimilar problems.

Keywords

Student modeling, Bayesian Knowledge Tracing, Problem Similarity

1. INTRODUCTION

Bayesian Knowledge Tracing (BKT) [3] is a popular student model used both in research and in actual intelligent tutoring systems. As a model that infers student knowledge, BKT has helped researchers answer questions about the effectiveness of help within a tutor [1], the impact of “gaming the system” on learning [5], and the relationship between student knowledge and affect [9], among others. Additionally, it has been used in the Cognitive Tutors [6] to determine which questions should be presented to a student, and when a student no longer needs practice on a given skill.

However, BKT models are dependent upon the underlying skill model of the system, as a separate BKT model is typically fit per skill. If a skill model is too coarse-grained or too fine-grained, it can make it more difficult for a BKT model to accurately infer student knowledge [8].

Additionally, even when a skill model is tagged at the appropriate level, seeing similar problems consecutively as opposed to seeing dissimilar problems may have effects on guessing and slipping, two important components of BKT models. For example, if a student does not understand the skill they are working on, seeing a certain type of question twice or more consecutively may improve their chances of “guessing” the answer using a suboptimal procedure that would not work on other questions from the same skill. Whether the skill model is not at the appropriate level or seeing consecutive similar questions helps students succeed without fully learning a skill, it may be important to take problem similarity into account in student models. In this work, we introduce the Bayesian Knowledge Tracing – Same Template (BKT-ST) model, a modification of BKT that considers problem similarity. Specifically, using data from the ASSISTments system [4], the model takes into account whether the problem the student is currently working on was generated from the same template as the previous problem.

The next section describes the ASSISTments system, its template system and the data used for this paper. Section 3 describes BKT and BKT-ST in more detail, and describes the analyses we performed on these models. The results are reported in Section 4, followed by discussion and possible directions for future work in Section 5.

2. TUTORING SYSTEM AND DATA

2.1 ASSISTments

ASSISTments [4] is a freely available web-based tutoring system used primarily for middle and high school mathematics. In addition to providing a way for teachers to assess their students, ASSISTments also assists students in a few different ways: through a series of on-demand hint messages that typically end in the answer to the question (the “bottom-out hint”), “buggy” feedback messages that appear when the student gives a common wrong answer, and “scaffolding” questions that break the original question into smaller questions that are easier to answer.

While teachers are free to author their own content, ASSISTments provides a library of approved content, which includes problem sets called skill-builders, which are meant to help students practice a particular skill. While most problem sets contain a fixed number of problems that must all be completed for a student to finish, a skill-builder is a special type of problem set that assigns questions in a random order and that is considered complete once a student answers three consecutive questions correctly on the same day. While this completion requirement ensures that students have some level of knowledge of the particular skill being exercised, it takes some students many problems to achieve, meaning they may see the same problem more than once if the skill-builder does not contain enough unique problems.

To ensure this does not happen (or at least make it highly unlikely), ASSISTments has a templating system that facilitates creating large numbers of similar problems quickly. The content creator creates a question as normal, but specifies that it is a template and uses variables in the problem statement and answer rather than specific values. Then, they are able to generate 10 unique problems at a time from that template, where each problem is randomly populated with specific values as prescribed by the template. This is especially useful for skill-builders, whose problems should theoretically all exercise the same skill. Figure 1 shows an example of a template and a problem generated from it.

Figure 1. A template (top) and a problem generated from it (bottom). The variables ‘b’ and ‘c’ in the template are replaced by ‘8’ and ‘23’ in the generated problem.
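As a rough illustration of how such a templating system can work, the sketch below instantiates problems from a parameterized problem statement. The template syntax, function names, and variable ranges here are hypothetical illustrations in the spirit of Figure 1, not ASSISTments’ actual implementation.

```python
import random

def generate_problems(template, var_ranges, n=10, seed=0):
    """Instantiate `n` problems from a template by randomly filling
    each variable with a value from its allowed range.
    (Hypothetical sketch; not the ASSISTments template engine.)"""
    rng = random.Random(seed)
    problems = []
    for _ in range(n):
        values = {var: rng.choice(list(choices))
                  for var, choices in var_ranges.items()}
        problems.append(template.format(**values))
    return problems

# 'b' and 'c' are template variables, replaced by concrete numbers
# in each generated problem (cf. Figure 1).
problems = generate_problems("Solve for x: x - {b} = {c}",
                             {"b": range(1, 10), "c": range(10, 30)})
```

Because every generated problem shares one surface form, problems drawn from the same template look far more alike to a student than problems drawn from different templates, which is the distinction BKT-ST exploits.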
2.2 Data

In this work, we used ASSISTments skill-builder data from the 2009-2010 school year. This data set consists of 61,522 problem attempts by 1,579 students, spread across 67 different skill-builders. A (student, skill-builder) pair was only included if the student attempted three or more problems on that particular skill-builder, and a skill-builder was only included if it was used by at least 10 students and at least one of them completed it.

3. METHODS

In this section, we begin by describing Bayesian Knowledge Tracing, and then move on to our modification of it, called Bayesian Knowledge Tracing – Same Template. Finally, we describe the analyses we performed using these two models.

3.1 Bayesian Knowledge Tracing

Bayesian Knowledge Tracing (BKT) [3] is a popular student model that uses a dynamic Bayesian network to infer student knowledge using only a student’s history of correct and incorrect responses to questions that exercise a given knowledge component (or “skill”).

Typically, a separate BKT model is fit for each skill. BKT models assume that there are only two states a student can be in for a given skill: the known state or the unknown state. Using a student’s performance history on a given skill, a BKT model infers the probability that the student is in the known state on question t, P(Kt).

Fitting a BKT model involves estimating four probabilities:

1. Prior Knowledge – P(L0): the probability the student knew the skill before answering the first question
2. Learn Rate – P(T): the probability the student will know the skill on the next question, given that they do not know the skill on the current question
3. Guess Rate – P(G): the probability the student will answer the current question correctly despite not knowing the skill
4. Slip Rate – P(S): the probability the student will answer the current question incorrectly despite knowing the skill

Note that forgetting is typically not modeled in BKT: it is assumed that once a student learns a skill, they do not forget it. An example of a BKT model, represented as a static unrolled Bayesian network, is shown in Figure 2.

Figure 2. Static unrolled representation of Bayesian Knowledge Tracing. The Kt nodes along the top represent latent knowledge, while the Ct nodes represent performance.

3.2 Bayesian Knowledge Tracing – Same Template

The Bayesian Knowledge Tracing – Same Template (BKT-ST) model differs from the regular BKT model in one way: it takes into account whether the problem it is about to predict was generated from the same template as the previous problem the student worked on. This is modeled as a binary observed variable that influences performance.

This results in six parameters to be learned per skill: the initial knowledge rate, the learn rate, and two sets of guess and slip rates: one set for when the previous problem and the current problem were generated from the same template (P(G|Same) and P(S|Same)), and one for when they were not (P(G|Different) and P(S|Different)). The model is shown in Figure 3.
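To make the difference from standard BKT concrete, the sketch below implements the forward (prediction) pass of BKT-ST using the standard knowledge-tracing update equations [3], with the guess and slip rates selected by the same-template indicator. The function and variable names are ours and the parameter values in the example are illustrative; this is a sketch of the model, not the BNT/EM implementation used in the paper.

```python
def bkt_st_predict(responses, same_template, p_L0, p_T, guess, slip):
    """Forward pass of BKT-ST for one student on one skill.

    responses:     list of 0/1 correctness values
    same_template: list of booleans; same_template[t] is True when
                   problem t came from the same template as problem
                   t-1 (False for the first problem)
    guess, slip:   dicts keyed by True/False giving P(G|Same) /
                   P(G|Different) and P(S|Same) / P(S|Different)
    Returns the predicted P(correct) for each question.
    """
    p_know = p_L0
    predictions = []
    for correct, same in zip(responses, same_template):
        g, s = guess[same], slip[same]
        # Predicted probability of a correct answer on this question
        predictions.append(p_know * (1 - s) + (1 - p_know) * g)
        # Posterior P(K) given the observed response (Bayes' rule)
        if correct:
            posterior = (p_know * (1 - s)
                         / (p_know * (1 - s) + (1 - p_know) * g))
        else:
            posterior = (p_know * s
                         / (p_know * s + (1 - p_know) * (1 - g)))
        # Apply the learning transition; forgetting is not modeled
        p_know = posterior + (1 - posterior) * p_T
    return predictions

# Illustrative parameter values (close to the Table 2 means):
preds = bkt_st_predict(
    responses=[1, 0, 1],
    same_template=[False, True, False],
    p_L0=0.6, p_T=0.3,
    guess={True: 0.33, False: 0.19},
    slip={True: 0.15, False: 0.29},
)
```

Setting `guess[True] == guess[False]` and `slip[True] == slip[False]` recovers ordinary BKT, which is why BKT-ST can only add expressive power at the cost of two extra parameters per skill.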
Figure 3. Static unrolled representation of Bayesian Knowledge Tracing – Same Template. The only difference from BKT is the presence of the Dt nodes, which represent whether the previous question was generated by the same template as the current one.

3.3 Analyses

The first analysis in this work simply considers how well the two models fit the data compared to each other overall. This is determined by fitting separate BKT and BKT-ST models for each skill and then predicting unseen student data using five-fold student-level cross-validation. We then evaluate each model’s ability to predict next-question correctness by computing the mean absolute error (MAE), root mean squared error (RMSE) and area under the curve (AUC) for each student and then averaging across students for each type of model. Finally, two-tailed paired t-tests are used to determine the significance of the differences in the metrics.

The second analysis considers what the metrics look like for each model based on how many templates were used for each skill-builder problem set. This is done by splitting the predictions made in the first analysis by how many templates were used in the corresponding skill-builder. We did this to see when it would be worth using BKT-ST over BKT.

Finally, we consider the parameter values learned by the BKT-ST model to determine any effects that seeing problems generated from the same template consecutively has on guessing and slipping.

The BKT and BKT-ST models used in these analyses are fit using the Expectation-Maximization (EM) algorithm in the Bayes Net Toolbox for Matlab (BNT) [7]. The initial values given to EM for BKT were 0.5 for P(L0) and 0.1 for the other three parameters. This was also true for BKT-ST, except the slip rate was set to 0.2 when the current and previous problems were generated from the same template.

4. RESULTS

In this section, we first present the overall comparison of BKT and BKT-ST, then show how the models compare to each other based on the number of templates used in each skill-builder. Finally, we examine the learned parameters for the BKT-ST model.

4.1 Overall

The overall results comparing BKT to BKT-ST are shown in Table 1.

Table 1. Overall results of fitting BKT and BKT-ST models.

          MAE      RMSE     AUC
  BKT     0.3830   0.4240   0.5909
  BKT-ST  0.3751   0.4205   0.6314

According to these results, BKT-ST outperforms BKT on all three metrics. Statistical tests confirmed that these results were reliable (MAE: p < .0001, t(1578) = 9.939; RMSE: p < .0001, t(1578) = 4.825; AUC: p < .0001, t(1314) = -11.095), though according to the values in the table, the only noticeable gain was in AUC.

4.2 By Number of Templates

Next, we considered how well each model did based on the number of templates a skill-builder contained. The results are shown in Figure 4.

Figure 4. Graph of MAE, RMSE and AUC for the BKT and BKT-ST models, plotted against the number of unique templates per skill.

Interestingly, both BKT and BKT-ST decline rapidly in terms of model goodness as the number of templates per skill-builder increases. This is likely because skill-builders with more templates are more likely to have more than one skill being tested within them.

Although both models decline similarly in terms of MAE and RMSE, BKT-ST declines at a slower rate than BKT does in terms of AUC. In fact, BKT-ST outperforms BKT in terms of AUC for every group of skills with more than one template. When grouping the skills by the number of templates they had, BKT-ST achieved an AUC at least 0.0236 better than BKT for each group that had more than one template, and achieved AUC values that were 0.1086 and 0.0980 better than BKT for skills with five and 10 templates, respectively. Additionally, while BKT performs worse than chance (AUC < 0.5) on skills with eight or more templates, BKT-ST never performs worse than chance.

4.3 Parameter Values

To analyze the parameters learned by BKT-ST, for each skill we took the average value of each of the six parameters learned across the five folds from the overall analysis.

First, we computed the average value of each parameter across all 67 skills. These are shown in Table 2.

Table 2. Means and standard deviations of BKT-ST parameter values learned across 67 skill-builders.

  Parameter        Mean     SD
  P(L0)            0.6030   0.2617
  P(T)             0.2966   0.2500
  P(G|Different)   0.1880   0.1655
  P(S|Different)   0.2941   0.1737
  P(G|Same)        0.3337   0.2495
  P(S|Same)        0.1514   0.0848

From the results in Table 2, it appears that on average, seeing consecutive questions generated from the same template both increases the guess rate (p < .0001, t(66) = -4.516) and decreases the slip rate (p < .0001, t(66) = 7.186).

Next, we examined how these parameters changed with respect to the number of templates used per skill-builder. The average values of the performance parameters (guess and slip rates for same and different templates) are shown in Figure 5. The results for skills with one template are omitted, since the P(G|Different) and P(S|Different) parameters are meaningless in such cases.

Figure 5. Average value of each performance parameter by the number of templates used per skill-builder.

Although there is no clear pattern for any of the four performance parameters shown in the graph, the average value of P(G|Same) is always higher than that of P(G|Different), and that of P(S|Same) is always lower than that of P(S|Different), regardless of the number of templates used per skill. This appears to reinforce the notion that seeing consecutive problems generated from the same template makes the latter easier to solve, whether this is due to the skill model being too coarse-grained or familiarity with a certain type of problem within a skill inflating performance.
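The per-student evaluation used throughout Section 4 (MAE, RMSE and AUC computed per student and then macro-averaged across students, as described in Section 3.3) can be sketched as follows. This is our own illustration rather than the paper’s original code, and the AUC here uses the Mann-Whitney formulation.

```python
import math

def auc(labels, scores):
    """AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen correct response is scored above a randomly
    chosen incorrect one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:  # AUC is undefined for one-class students
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_metrics(per_student):
    """per_student: iterable of (labels, predictions) pairs, one pair
    per student. Returns (MAE, RMSE, AUC), each averaged across
    students; students with undefined AUC are dropped from the AUC
    average only."""
    maes, rmses, aucs = [], [], []
    for labels, preds in per_student:
        n = len(labels)
        maes.append(sum(abs(y - p) for y, p in zip(labels, preds)) / n)
        rmses.append(math.sqrt(
            sum((y - p) ** 2 for y, p in zip(labels, preds)) / n))
        a = auc(labels, preds)
        if not math.isnan(a):
            aucs.append(a)
    return (sum(maes) / len(maes),
            sum(rmses) / len(rmses),
            sum(aucs) / len(aucs))
```

Dropping students whose AUC is undefined (all responses correct or all incorrect) would also be consistent with the AUC t-test in Section 4.1 having fewer degrees of freedom (1314) than the MAE and RMSE tests (1578), though the paper does not state this explicitly.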
5. DISCUSSION AND FUTURE WORK

From the results in this work, it appears that modifying Bayesian Knowledge Tracing to take similarity between consecutive problems into account moderately improves cross-validated predictive performance, especially in terms of AUC. Additionally, this work showed that seeing consecutive similar problems improves student performance by both increasing the guess rate (the probability of answering a question correctly despite not knowing the skill) and decreasing the slip rate (the probability of answering a question incorrectly despite knowing the skill). Regardless of the underlying reason for this, whether it is because the skill model is too coarse-grained or simply that familiarity with a type of problem within a skill improves performance, it appears important for student models to take the similarity of the problems students encounter into account when trying to model student knowledge.

One direction for future work would be to go back further in the problem sequence to see how the similarity of problems earlier in a student’s history affects their ability to answer the current problem. Additionally, it would be interesting to determine whether the effect changes in certain situations. For example, what is the effect of seeing two similar problems in a row, followed by one that is different from both?

Another area of interest would be to use a model that takes problem similarity into account when trying to predict a longer-term outcome, such as wheel-spinning [2] or retention and transfer, as opposed to simply predicting next-question correctness.

Finally, applying this model and others like it to other learning environments and skill models of various grain sizes would be helpful for understanding when it is useful. Presumably, if a skill model is at the appropriate grain size, the difference in predictive performance between BKT and BKT-ST would be reduced. The same would be true of systems that fall into one of two extremes: those whose problem sets are highly repetitive, and those whose problem sets have a rich variety of problems.

6. ACKNOWLEDGMENTS

We acknowledge funding from NSF (#1316736, 1252297, 1109483, 1031398, 0742503), ONR's 'STEM Grand Challenges' and IES (#R305A120125 & #R305C100024).

7. REFERENCES

[1] Beck, J.E., Chang, K., Mostow, J., Corbett, A. Does help help? Introducing the Bayesian Evaluation and Assessment methodology. In Intelligent Tutoring Systems, Springer Berlin Heidelberg, 2008, 383-394.

[2] Beck, J.E., Gong, Y. Wheel-spinning: students who fail to master a skill. In Artificial Intelligence in Education, Springer Berlin Heidelberg, 2013, 431-440.

[3] Corbett, A., Anderson, J. Knowledge tracing: modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253-278.

[4] Feng, M., Heffernan, N.T., Koedinger, K.R. Addressing the assessment challenge in an intelligent tutoring system that tutors as it assesses. User Modeling and User-Adapted Interaction, 19(3), 243-266.

[5] Gong, Y., Beck, J., Heffernan, N., Forbes-Summers, E. The impact of gaming (?) on learning at the fine-grained level. In Proceedings of the 10th International Conference on Intelligent Tutoring Systems (Pittsburgh, PA, 2010), Springer, 194-203.

[6] Koedinger, K.R., Anderson, J.R., Hadley, W.H., Mark, M.A. Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education, 8(1), 1997, 30-43.

[7] Murphy, K. The Bayes Net Toolbox for Matlab. Computing Science and Statistics, 33(2), 1024-1034.

[8] Pardos, Z.A., Heffernan, N.T., Anderson, B., Heffernan, C.L. Using fine-grained skill models to fit student performance with Bayesian networks. In Proceedings of the Workshop on Educational Data Mining held at the 8th International Conference on Intelligent Tutoring Systems (Taiwan, 2006).

[9] San Pedro, M., Baker, R.S.J.d., Gowda, S.M., Heffernan, N.T. Towards an understanding of affect and knowledge from student interaction with an intelligent tutoring system. In Lane, H.C., Yacef, K., Mostow, J., Pavlik, P. (Eds.) AIED 2013, LNCS vol. 7926, Springer-Verlag, Berlin Heidelberg, 41-50.