=Paper=
{{Paper
|id=Vol-1633/ws3-paper1
|storemode=property
|title=Investigating the Impact of Slipping Parameters on Additive Factors Model Parameter Estimates
|pdfUrl=https://ceur-ws.org/Vol-1633/ws3-paper1.pdf
|volume=Vol-1633
|authors=Christopher J. MacLellan
|dblpUrl=https://dblp.org/rec/conf/edm/MacLellan16
}}
==Investigating the Impact of Slipping Parameters on Additive Factors Model Parameter Estimates==
Christopher J. MacLellan
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213
cmaclell@cs.cmu.edu

ABSTRACT

The Additive Factors Model (AFM), a widely used model of student learning, estimates students' prior knowledge, the difficulty of tutored skills, and the rates at which these skills are learned. In contrast to Bayesian Knowledge Tracing (BKT), another widely used model of student learning, AFM does not have parameters for the slipping rates of learned skills; i.e., it does not explicitly model situations where students know a skill but still apply it incorrectly. Thus, AFM assumes that as students get more practice their probability of correctly applying a skill converges to 100%, whereas BKT allows convergence to lower probabilities. This restriction constrains the range of values that AFM parameters can take. In particular, when the asymptotic performance of a skill is less than 100%, AFM will estimate the learning rate to be lower than if slipping were taken into account. To investigate this phenomenon, I created a LearnSphere workflow component that implements AFM and a variant of AFM with explicit slipping parameters (AFM+S). Using this component, I analyze multiple DataShop datasets to determine (1) whether the model with slipping parameters better fits the data and (2) how the addition of slipping parameters impacts the parameter estimates returned by AFM. I show that, in general, AFM+S fits the data better than AFM. Additionally, I show that AFM+S estimates higher skill intercepts and learning rates than AFM, whereas AFM estimates higher student intercepts than AFM+S.

Keywords

Cognitive Modeling, Statistical Models of Learning, Additive Factors Model, Knowledge Tracing.

1. INTRODUCTION

The Additive Factors Model [1], or AFM, is a statistical model of student learning that can be fit to educational data in order to estimate students' prior knowledge, the difficulty of tutored skills, and the rates at which these skills are learned. Unlike Bayesian Knowledge Tracing [2], an alternative statistical model of student learning, AFM does not have explicit parameters to model the rate at which students incorrectly apply learned skills (i.e., slipping parameters).

This lack of slipping parameters has an impact on both the model fit and the parameter estimates. If slipping is occurring, then model fits should improve by taking these parameters into account. Further, in situations where slipping is occurring, AFM will underestimate learning rates so that it can fit the higher error rates in the tail of the learning curve [3]. There is some evidence that the learning rates estimated by BKT, an approach that takes slipping into account, tend to be higher than those estimated by AFM [4]. However, a thorough investigation of how slipping rates impact learning estimates has not been done.

In order to investigate the impact of slipping parameters on AFM's model fit and parameter estimates, I created a LearnSphere workflow component that implements both AFM and the extension of AFM that includes slipping parameters, which I refer to as AFM+S [3]. Using this component I fit both AFM and AFM+S models to five datasets from DataShop. I analyzed the output to determine which model best fits the data, whether slipping was occurring in the datasets, and how the slipping parameters affect the learning rate estimates.

Previous work has shown that AFM+S fits the data better than AFM and BKT on five different datasets [3]. I replicated this analysis to show that the same results hold with the new workflow component. Further, in my analysis I took additional precautions to prevent Type I errors (i.e., identifying a significant difference when none exists). As a preliminary test of whether the learning rates estimated by the AFM+S model are higher than those estimated by the AFM model, I fit both models to the Geometry Area 1996-1997 dataset accessed via DataShop [5] and compared their learning rate estimates. I found that the mean learning rate for the AFM model was 0.18 logits, whereas the mean learning rate for the AFM+S model was 0.42, a significant difference (V=0, p < 0.01 via a paired Wilcoxon signed-rank test). These preliminary results suggested that adding slipping parameters to the model causes the estimated learning rates to be higher.

However, I wanted to analyze the other four datasets to identify whether this was a systematic trend. In this paper I present the results of this analysis. In particular, I show that AFM+S fits the five datasets better than AFM on unstratified and stratified cross validation, and that the skill intercepts and slopes (i.e., learning rates) estimated by the AFM+S model are higher than those estimated by the AFM model. Further, I show that the AFM model estimates the student intercepts to be higher than the AFM+S model does.

In addition to exploring these ideas, this paper showcases the new LearnSphere workflow component. Researchers can use this component in situations where they want to use AFM but suspect slipping is occurring. BKT is one possible alternative, but it is not a panacea. For example, BKT does not support multiple skill labels per step, whereas AFM+S does. Further, there is evidence that AFM+S fits many datasets better than traditional BKT [3]. A workflow component for AFM+S is thus a contribution to the ecosystem of learning analytics models available to researchers.

2. WORKFLOW COMPONENT

2.1 Data Inputs

The AFM+S workflow component that I created accepts the standard PSLC DataShop student-step rollup format. From these files the AFM+S model requires information about the student labels, the knowledge component labels, and the knowledge component opportunity counts. Depending on whether item cross-validation is to be performed, the model also needs the item labels.
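For illustration, here is a minimal sketch of pulling these fields out of a student-step export with pandas. The file name is hypothetical and the column names are assumptions based on common DataShop export conventions (the KC and Opportunity column names vary with the knowledge component model):

 # Sketch: extract the fields AFM/AFM+S needs from a DataShop
 # student-step rollup. Column names are assumed, not verified.
 import pandas as pd

 steps = pd.read_csv("student_step_rollup.txt", sep="\t")  # hypothetical file

 students = steps["Anon Student Id"]             # student labels
 skills = steps["KC (Default)"]                  # knowledge component labels
 opportunities = steps["Opportunity (Default)"]  # KC opportunity counts
 items = steps["Problem Name"]                   # item labels, for item CV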
2.2 Workflow Model

The code for the AFM+S workflow component is implemented in Python and is publicly available on GitHub: https://github.com/cmaclell/pyAFM. This code implements a standard Logistic Regression classifier that accepts box constraints (so learning rates can be constrained to be positive) and L2 regularization parameters (so student intercepts can be pulled towards 0). It also implements Bounded Logistic Regression, so that slipping parameters can be taken into account. Using these classifiers, the code provides implementations of both AFM and AFM+S as described in prior work [3].
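To make the two models concrete, here is a sketch in my own notation for a step by student i exercising a single skill k (the general multiple-skill form follows [3]). AFM predicts the probability of a correct first attempt as

 p_{ij} = \sigma(\theta_i + \beta_k + \gamma_k t_{ik}),

where \sigma is the logistic function, \theta_i the student intercept, \beta_k the skill intercept, \gamma_k \geq 0 the skill slope (learning rate), and t_{ik} the number of prior opportunities student i has had with skill k. AFM+S bounds the upper asymptote with a per-skill slip rate s_k \in [0, 1]:

 p_{ij} = (1 - s_k)\,\sigma(\theta_i + \beta_k + \gamma_k t_{ik}),

so that performance converges to 1 - s_k rather than to 1 as practice increases, and the model reduces to AFM when s_k = 0. The fitting itself can be box-constrained; the following self-contained sketch (not pyAFM's actual API) fits the single-skill AFM+S likelihood with scipy, constraining the slope to be non-negative and the slip rate to [0, 1]:

 # Sketch: box-constrained maximum likelihood for single-skill AFM+S.
 # w = [beta, gamma, slip]; this is an illustration, not pyAFM's API.
 import numpy as np
 from scipy.optimize import minimize

 def neg_log_likelihood(w, t, y):
     beta, gamma, slip = w
     p = (1.0 - slip) / (1.0 + np.exp(-(beta + gamma * t)))
     p = np.clip(p, 1e-10, 1.0 - 1e-10)  # guard against log(0)
     return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

 t = np.array([0, 1, 2, 3, 4, 5, 6, 7])  # prior opportunity counts
 y = np.array([0, 0, 1, 1, 1, 0, 1, 1])  # first-attempt correctness
 result = minimize(neg_log_likelihood, x0=[0.0, 0.1, 0.1], args=(t, y),
                   method="L-BFGS-B",
                   bounds=[(None, None), (0.0, None), (0.0, 1.0)])
 beta_hat, gamma_hat, slip_hat = result.x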
2.3 Workflow Outputs

The AFM+S workflow component has three possible outputs. First, it outputs metrics for assessing the fit of the model to the data. In particular, it outputs unstratified, stratified, student, and item cross-validated root-mean-square error. Second, the model outputs predicted first-attempt performance for each student step, so that the resulting learning curve can be plotted and compared to alternative models. Finally, the model outputs student intercept parameter estimates, skill difficulty and learning rate parameter estimates, and skill slipping parameter estimates.

The model fit statistics and parameter estimate outputs take the form of tables or comma-separated value output files. The model predictions output takes the form of either a comma-separated value output file or learning curve plots. These learning curve plots are similar to those currently available on DataShop.

3. METHOD

In order to investigate the impact of slipping parameters on AFM skill slopes, I used the new workflow component to fit both the AFM and AFM+S models to five datasets downloaded from DataShop: Geometry [5], Equation Solving [6, 7], Number Line Estimation [8], Writing 1 [9], and Writing 2 [10].

Before analyzing parameter differences, I assessed which model better fit the data using cross validation. For each model and dataset, I performed 5 runs of 2-fold stratified and unstratified cross validation and 1 run of 2-fold student and item cross validation (i.e., where students and items are divided across the folds). I then used a paired Wilcoxon signed-rank test to compare the model fits across the datasets, runs, and folds, as sketched below. I did not conduct more runs or folds because there is evidence that doing so increases the risk of Type I error due to the correlation in model fits between folds that share training data [11]. For student and item cross validation, I conducted only 1 run of 2-fold cross validation because splitting students and items between folds while balancing the number of training points between folds is non-random, and repeated runs would likewise increase the likelihood of Type I error.

After assessing overall model fits, I fit each model (AFM and AFM+S) to each of the datasets using all of the available data and recorded the parameter estimates from both models. I plotted the slipping parameter values to determine which datasets are most affected by the slipping parameters (Figure 1). In situations where there is little slipping, AFM+S should be identical to AFM. I then compared each of the parameter types (skill intercepts, skill slopes, and student intercepts) between models using a paired Wilcoxon signed-rank test to determine whether there were systematic differences in the parameter estimates produced by the models across the five datasets.
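As a minimal sketch of such a paired comparison, the following pairs fold-level RMSEs from the two models and applies scipy's paired Wilcoxon signed-rank test; the RMSE values are illustrative placeholders, not results from this study:

 # Sketch: paired Wilcoxon signed-rank test over matched CV folds.
 # Element i of each array is assumed to come from the same dataset,
 # run, and fold, so the pairing is meaningful.
 import numpy as np
 from scipy.stats import wilcoxon

 rmse_afm = np.array([0.412, 0.398, 0.405, 0.421, 0.390])    # illustrative
 rmse_afm_s = np.array([0.401, 0.395, 0.399, 0.410, 0.388])  # illustrative

 statistic, p_value = wilcoxon(rmse_afm, rmse_afm_s)
 print(f"V = {statistic}, p = {p_value:.3f}")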
4. RESULTS

Overall, the AFM+S model better fits the data across the five datasets and four cross-validation types (unstratified, stratified, student, and item), via a Wilcoxon signed-rank test paired by cross-validation type, dataset, run, and fold (V=1350.5, p < 0.01). When dividing the data by cross-validation type, AFM+S better fits the data across the five datasets for unstratified (V=213, p < 0.01) and stratified (V=222, p < 0.01), but not student (V=8, p = 1) or item (V=26, p > 0.7) cross validation. When dividing the data by dataset, AFM+S better fits the data on Geometry (V=212, p < 0.01) and Equation Solving (V=181, p < 0.02), but not Number Line (V=3, p = 1), Writing 1 (V=2, p = 1), or Writing 2 (V=32, p > 0.6).

[Figure 1. The slipping rates of skills across the five datasets.]

Figure 1 shows the skill slipping rates across the five datasets. The slipping rates of skills on the Number Line, Writing 1, and Writing 2 datasets are effectively zero (the maximum slip rate for any skill in these datasets is 9 × 10^-9 percent), which explains why there is no significant difference in model fit for these datasets; i.e., AFM+S is practically identical to AFM for them. Further, it is likely that there was no difference on student and item cross validation because there was not enough statistical power to detect one; i.e., I performed only 1 run of 2-fold cross validation, and only two of the five datasets had skills with non-zero slipping rates.

Across all five datasets, AFM+S estimates higher skill intercepts (V=257.5, p < 0.01) and slopes (V=117, p < 0.01) than AFM, whereas AFM estimates higher student intercepts (V=9226, p < 0.01) than AFM+S (via a Wilcoxon signed-rank test paired by skill and dataset). Note that these results are primarily driven by the Geometry and Equation Solving datasets, because AFM and AFM+S are practically identical on the Number Line, Writing 1, and Writing 2 datasets.

5. DISCUSSION

In general, my results show that AFM+S better fits the data than the AFM model and that there are significant differences in the parameters estimated by the two models. In particular, the skill intercept and learning rate estimates from the AFM+S model are higher than those returned by the AFM model. Further, the student intercept estimates from AFM+S are lower than those produced by AFM. These findings suggest that the AFM model might be compensating for skill slipping by adjusting its other parameters. The implication is that researchers interpreting parameter estimates returned by AFM should be cautious in situations where skill slipping appears to be occurring.

These results also suggest that, at least for these five datasets, the AFM+S model is generally preferable to the AFM model. In situations where no slipping is occurring, AFM+S reduces to the AFM model and returns statistically identical model fits. However, when slipping occurs, model fit improves with AFM+S.

In conclusion, I have introduced a LearnSphere workflow component and shown how this component can be used to investigate the differences in model fits and parameter estimates of the AFM and AFM+S models. My analysis shows that AFM+S better fits the data than AFM on datasets where slipping occurs, and that there are significant differences between the parameter estimates returned by the two models. These results suggest that researchers using the AFM model should consider transitioning to the AFM+S model when they suspect slipping to be occurring. These results also showcase the capabilities of the new LearnSphere AFM+S workflow component.
6. ACKNOWLEDGMENTS

We thank Erik Harpstead, Michael Yudelson, and Rony Patel for their thoughts and comments when developing this work. This work was supported in part by the Department of Education (#R305B090023 and #R305B110003) and by the National Science Foundation (#SBE-0836012). Finally, we thank Carnegie Learning and all other data providers for making their data available on DataShop.

7. REFERENCES

[1] Hao Cen, Kenneth R. Koedinger, and Brian Junker. 2006. Learning Factors Analysis – A General Method for Cognitive Model Evaluation and Improvement. 164–175.
[2] Albert T. Corbett and John Robert Anderson. 1995. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction 4, 4: 253–278.
[3] Christopher J. MacLellan, Ran Liu, and Kenneth R. Koedinger. 2015. Accounting for Slipping and Other False Negatives in Logistic Models of Student Learning.
[4] Ran Liu. 2016. Personal Communication.
[5] Kenneth R. Koedinger. Geometry Area 1996-1997. Dataset 76 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=76.
[6] Ritter, S., Anderson, J.R., Koedinger, K.R., and Corbett, A. 2007. The Cognitive Tutor: Applied research in mathematics education. Psychonomic Bulletin & Review 14, 2: 249–255.
[7] Booth, J., and Ritter, S. 2009. Self Explanation sch_a3329ee9 Winter 2008 (CL). Dataset 293 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=293.
[8] Derek Lomas. Digital Games for Improving Number Sense – Study 1. Dataset 445 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=445.
[9] Ruth Wylie. IWT Self-Explanation Study 1 (Spring 2009) (tutors only). Dataset 313 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=313.
[10] Ruth Wylie. IWT Self-Explanation Study 2 (Spring 2009) (tutors only). Dataset 372 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=372.
[11] Dietterich, T. G. 1998. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10, 7: 1895–1923. http://doi.org/10.1162/089976698300017197.