          Investigating the Impact of Slipping Parameters on
             Additive Factors Model Parameter Estimates
                                                       Christopher J. MacLellan
                                                         Carnegie Mellon University
                                                           5000 Forbes Avenue
                                                           Pittsburgh, PA 15213
                                                         cmaclell@cs.cmu.edu



ABSTRACT
The Additive Factors Model (AFM), a widely used model of student learning, estimates students' prior knowledge, the difficulty of tutored skills, and the rates at which these skills are learned. In contrast to Bayesian Knowledge Tracing (BKT), another widely used model of student learning, AFM does not have parameters for the slipping rates of learned skills; i.e., it does not explicitly model situations where students know a skill but still apply it incorrectly. Thus, AFM assumes that as students get more practice their probability of correctly applying a skill converges to 100%, whereas BKT allows convergence to lower probabilities. This restriction constrains the range of values that AFM parameters can take. In particular, when the asymptotic performance of a skill is less than 100%, AFM will estimate the learning rate to be lower than if slipping were taken into account. To investigate this phenomenon, I created a LearnSphere workflow component that implements AFM and a variant of AFM with explicit slipping parameters (AFM+S). Using this component, I analyze multiple DataShop datasets to determine (1) whether the model with slipping parameters better fits the data and (2) how the addition of slipping parameters impacts the parameter estimates returned by AFM. I show that, in general, AFM+S fits the data better than AFM. Additionally, I show that AFM+S estimates higher skill intercepts and learning rates than AFM, whereas AFM estimates higher student intercepts than AFM+S.

Keywords
Cognitive Modeling, Statistical Models of Learning, Additive Factors Model, Knowledge Tracing.

1. INTRODUCTION
The Additive Factors Model [1], or AFM, is a statistical model of student learning that can be fit to educational data in order to estimate students' prior knowledge, the difficulty of tutored skills, and the rates at which these skills are learned. Unlike Bayesian Knowledge Tracing [2], an alternative statistical model of student learning, AFM does not have explicit parameters to model the rate at which students incorrectly apply learned skills (i.e., slipping parameters).

This lack of slipping parameters affects both the model fit and the parameter estimates. If slipping is occurring, then model fits should improve by taking these parameters into account. Further, in situations where slipping is occurring, AFM will underestimate learning rates so that it can fit the higher error rates in the tail of the learning curve [3]. There is some evidence that the learning rates estimated by BKT, an approach that takes slipping into account, tend to be higher than those estimated by AFM [4]. However, a thorough investigation of how slipping rates impact learning estimates has not been done.

In order to investigate the impact of slipping parameters on AFM's model fit and parameter estimates, I created a LearnSphere workflow component that implements both AFM and the extension of AFM that includes slipping parameters. I refer to this extension as AFM+S [3]. Using this component, I fit both AFM and AFM+S models to five datasets from DataShop. I analyzed the output to determine which model best fits the data, whether slipping was occurring in the datasets, and to compare the parameter estimates of the two models to determine how the slipping parameters affect the learning rate estimates.

Previous work has shown that AFM+S fits the data better than both AFM and BKT on five different datasets [3]. I replicated this analysis to show that the same results hold with the new workflow component. Further, in my analysis I took additional precautions to prevent Type I errors (i.e., identifying a significant difference when none exists). As a preliminary test of whether the learning rates estimated by the AFM+S model are higher than those estimated by the AFM model, I fit both models to the Geometry Area 1996-1997 dataset accessed via DataShop [5] and compared their learning rate estimates. I found that the mean learning rate for the AFM model was 0.18 logits, whereas the mean learning rate for the AFM+S model was 0.42 logits, a significant difference (V=0, p < 0.01 via a paired Wilcoxon signed-rank test). These preliminary results suggested that adding slipping parameters to the model causes the estimated learning rates to be higher. However, I wanted to analyze the other four datasets to identify whether this was a systematic trend. In this paper, I present the results of this analysis. In particular, I show that AFM+S fits the five datasets better than AFM under unstratified and stratified cross validation and that the skill intercepts and slopes (i.e., learning rates) estimated by the AFM+S model are higher than those estimated by the AFM model. Further, I show that the AFM model estimates the student intercepts to be higher than the AFM+S model does.

In addition to exploring these ideas, this paper showcases the new LearnSphere workflow component. Researchers can use this component in situations where they want to use AFM but suspect slipping is occurring. BKT is one possible alternative, but it is not a panacea. For example, BKT does not support multiple skill labels per step, but AFM+S does. Further, there is evidence that AFM+S fits many datasets better than traditional BKT [3]. A workflow component for AFM+S is thus a contribution to the ecosystem of learning analytics models that researchers might like to use.
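To make the contrast concrete, the following is a sketch of the two model forms for student i on step j, where theta_i is the student intercept (prior knowledge), beta_k and gamma_k are the intercept (difficulty) and slope (learning rate) of skill k, T_ik is the count of prior opportunities, and q_jk maps steps to skills. The single-skill AFM+S form, in which a slipping rate s_k caps the asymptote at 1 - s_k, is a simplification for illustration; the exact parameterization appears in [3].

    % AFM: standard logistic form
    \log \frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_k q_{jk} (\beta_k + \gamma_k T_{ik})

    % AFM+S: bounded logistic form, shown for a step exercising a single skill k
    p_{ij} = (1 - s_k) \, \sigma(\theta_i + \beta_k + \gamma_k T_{ik})

When s_k = 0, the second form reduces to the first, which is why the two models coincide on datasets where no slipping occurs.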
2. WORKFLOW COMPONENT
2.1 Data Inputs
The AFM+S workflow component that I created accepts the standard PSLC DataShop student-step rollup format. From these files the AFM+S model requires information about the student labels, the knowledge component labels, and the knowledge component opportunity counts. If item cross-validation is to be performed, the model also needs the item labels.
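As a sketch, such a rollup file might be loaded in Python as follows; the column names reflect typical DataShop exports (where the knowledge component columns embed the KC model name), the filename is a placeholder, and both should be checked against a particular dataset.

    import pandas as pd

    # Read a tab-delimited DataShop student-step rollup export.
    steps = pd.read_csv('geometry_student_step.txt', sep='\t')

    # Fields required by the AFM+S component:
    students = steps['Anon Student Id']             # student labels
    skills = steps['KC (Default)']                  # knowledge component labels
    opportunities = steps['Opportunity (Default)']  # KC opportunity counts

    # Binary first-attempt outcome used to fit the models.
    y = (steps['First Attempt'] == 'correct').astype(int)

    # Item labels, needed only when item cross-validation is requested.
    items = steps['Problem Name'] + ' / ' + steps['Step Name']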

2.2 Workflow Model
The code for the AFM+S workflow component is implemented in Python and is publicly available on GitHub: https://github.com/cmaclell/pyAFM. This code implements a standard logistic regression classifier that accepts box constraints (so learning rates can be constrained to be positive) and L2 regularization parameters (so student intercepts can be pulled towards 0). It also implements bounded logistic regression, so that slipping parameters can be taken into account. Using these classifiers, the code provides implementations of both AFM and AFM+S as described in prior work [3].
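To illustrate the idea, here is a minimal sketch of bounded logistic regression with box constraints and an L2 penalty, using scipy's L-BFGS-B optimizer. It is not pyAFM's actual code: for simplicity it fits a single global slip rate, whereas AFM+S as described in [3] estimates per-skill slipping parameters.

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(params, X, y, student_cols, l2):
        # Last parameter is the slip rate s; predictions are
        # bounded above by 1 - s (bounded logistic regression).
        w, s = params[:-1], params[-1]
        p = (1.0 - s) / (1.0 + np.exp(-X.dot(w)))
        p = np.clip(p, 1e-10, 1.0 - 1e-10)
        ll = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
        # L2 penalty pulls the student intercepts toward zero.
        return -(ll - l2 * np.sum(w[student_cols] ** 2))

    def fit_bounded(X, y, student_cols, slope_cols, l2=1.0):
        n_coef = X.shape[1]
        x0 = np.zeros(n_coef + 1)
        bounds = [(None, None)] * n_coef + [(0.0, 0.95)]  # 0 <= s < 1
        for j in slope_cols:
            bounds[j] = (0.0, None)  # box constraint: learning rates >= 0
        result = minimize(neg_log_likelihood, x0,
                          args=(X, y, student_cols, l2),
                          method='L-BFGS-B', bounds=bounds)
        return result.x[:-1], result.x[-1]  # coefficients, slip rate

Here X is the usual AFM design matrix (student indicators, skill indicators, and skill-opportunity counts), and student_cols and slope_cols index the student-intercept and learning-rate columns respectively.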
2.3 Workflow Outputs
The AFM+S workflow component has three possible outputs. First, it outputs metrics for assessing the fit of the model to the data. In particular, it outputs unstratified, stratified, student, and item cross-validated root-mean-square error. Second, the model outputs predicted first-attempt performance for each student step, so that the resulting learning curve can be plotted and compared to alternative models. Finally, the model outputs student intercept parameter estimates, skill difficulty and learning rate parameter estimates, and skill slipping parameter estimates.

The model fit statistics and parameter estimate outputs take the form of tables or comma-separated value output files. The model predictions output takes the form of either a comma-separated value output file or learning curve plots. These learning curve plots are similar to those currently available on DataShop.
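As a sketch of how the cross-validated RMSE metrics can be computed, consider the following; fit and predict stand in for the component's model-fitting routines and are assumptions for illustration, not pyAFM's actual API.

    import numpy as np
    from sklearn.model_selection import KFold, StratifiedKFold

    def cv_rmse(X, y, fit, predict, splitter):
        # Mean root-mean-square error over cross-validation folds.
        errors = []
        for train, test in splitter.split(X, y):
            model = fit(X[train], y[train])
            preds = predict(model, X[test])
            errors.append(np.sqrt(np.mean((y[test] - preds) ** 2)))
        return float(np.mean(errors))

    # 2-fold unstratified and stratified cross validation, as used in
    # the Method section below.
    unstratified = cv_rmse(X, y, fit, predict, KFold(2, shuffle=True))
    stratified = cv_rmse(X, y, fit, predict, StratifiedKFold(2, shuffle=True))

Student and item cross validation follow the same pattern but split on student or item labels rather than on rows.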
3. METHOD
In order to investigate the impact of slipping parameters on AFM skill slopes, I used the new workflow component to fit both the AFM and AFM+S models to five datasets downloaded from DataShop: Geometry [5], Equation Solving [6,7], Number Line Estimation [8], Writing 1 [9], and Writing 2 [10].

Before analyzing parameter differences, I assessed which model better fit the data using cross validation. For each model and dataset, I performed 5 runs of 2-fold stratified and unstratified cross validation and 1 run of 2-fold student and item cross validation (i.e., where students and items are divided across the folds). I then used a paired Wilcoxon signed-rank test to compare the model fits across the datasets, runs, and folds. I did not conduct more runs or folds because there is evidence that doing so increases the risk of Type I error due to the correlation in model fits between folds that share training data [11]. For student and item cross validation, I conducted only 1 run of 2-fold cross validation because splitting students and items between folds while balancing the number of training points across folds is not fully random, so repeated runs would also increase the likelihood of Type I error.

[Figure 1. The slipping rates of skills across the five datasets.]

After assessing overall model fits, I fit each model (AFM and AFM+S) to each of the datasets using all of the available data and recorded the parameter estimates from both models. I plotted the slipping parameter values to determine which datasets are most affected by the slipping parameters (Figure 1). In situations where there is little slipping, AFM+S should be identical to AFM. I then compared each of the parameter types (skill intercepts, skill slopes, and student intercepts) between models using a paired Wilcoxon signed-rank test to determine whether there were systematic differences in the parameter estimates produced by the models across the five datasets.
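For example, such a paired comparison can be run with scipy; the RMSE values below are hypothetical stand-ins for the per-fold model fits output by the component.

    from scipy.stats import wilcoxon

    # Hypothetical cross-validated RMSEs for the two models, paired
    # by dataset, run, and fold (same ordering in both lists).
    afm_rmse = [0.412, 0.398, 0.405, 0.377, 0.390, 0.401]
    afms_rmse = [0.401, 0.392, 0.399, 0.375, 0.388, 0.396]

    # Paired Wilcoxon signed-rank test on the model-fit differences.
    statistic, p_value = wilcoxon(afm_rmse, afms_rmse)
    print(f'V={statistic}, p={p_value:.3f}')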
4. RESULTS
Overall, the AFM+S model better fits the data across the five datasets and four cross-validation types (unstratified, stratified, student, and item), via a Wilcoxon signed-rank test paired by cross-validation type, dataset, run, and fold (V=1350.5, p < 0.01). When dividing the data by cross-validation type, AFM+S better fits the data across the five datasets for unstratified (V=213, p < 0.01) and stratified (V=222, p < 0.01), but not student (V=8, p = 1) or item (V=26, p > 0.7) cross validation. When dividing the data by dataset, AFM+S better fits the data on Geometry (V=212, p < 0.01) and Equation Solving (V=181, p < 0.02), but not Number Line (V=3, p = 1), Writing 1 (V=2, p = 1), or Writing 2 (V=32, p > 0.6).

Figure 1 shows the skill slipping rates across the five datasets. The slipping rates of skills on the Number Line, Writing 1, and Writing 2 datasets are effectively zero (the maximum slip rate for any skill in these datasets is 9 × 10⁻⁹ percent), which explains why there is no significant difference in model fit for these datasets; i.e., AFM+S is practically identical to AFM for these datasets. Further, it is likely that there was no difference on student and item cross validation because there was not enough statistical power to detect a difference; i.e., I performed only 1 run of 2-fold cross validation and only two of the five datasets had skills with non-zero slipping rates.

Across all five datasets, AFM+S estimates higher skill intercepts (V=257.5, p < 0.01) and slopes (V=117, p < 0.01) than AFM, whereas AFM estimates higher student intercepts (V=9226, p < 0.01) than AFM+S (via a Wilcoxon signed-rank test paired by skill and dataset). Note that these results are primarily driven by the Geometry and Equation Solving datasets because AFM and AFM+S are practically identical on the Number Line, Writing 1, and Writing 2 datasets.
5. DISCUSSION
In general, my results show that AFM+S better fits the data than the AFM model and that there are significant differences in the parameters estimated by the two models. In particular, the skill intercept and learning rate estimates from the AFM+S model are higher than those returned by the AFM model. Further, the student intercept estimates from AFM+S are lower than those produced by AFM. These findings suggest that the AFM model might be compensating for skill slipping by adjusting the other parameters. The implication of this finding is that researchers interpreting parameter estimates returned by AFM should be cautious in situations where skill slipping appears to be occurring.

These results also suggest that, at least for these five datasets, the AFM+S model is generally preferable to the AFM model. In situations where no slipping is occurring, AFM+S reduces to the AFM model and returns statistically identical model fits. However, when slipping occurs, model fit improves with AFM+S.

In conclusion, I have introduced a LearnSphere workflow component and shown how this component can be used to investigate the differences in model fits and parameter estimates of the AFM and AFM+S models. My analysis shows that AFM+S better fits the data than AFM on datasets where slipping occurs and that there are significant differences between the parameter estimates returned by the two models. These results suggest that researchers using the AFM model should consider transitioning to the AFM+S model when they suspect slipping is occurring. These results also showcase the capabilities of the new LearnSphere AFM+S workflow component.

6. ACKNOWLEDGMENTS
We thank Erik Harpstead, Michael Yudelson, and Rony Patel for their thoughts and comments when developing this work. This work was supported in part by the Department of Education (#R305B090023 and #R305B110003) and by the National Science Foundation (#SBE-0836012). Finally, we thank Carnegie Learning and all other data providers for making their data available on DataShop.

7. REFERENCES
[1] Hao Cen, Kenneth R. Koedinger, and Brian Junker. 2006. Learning Factors Analysis – A General Method for Cognitive Model Evaluation and Improvement. 164–175.

[2] Albert T. Corbett and John R. Anderson. 1995. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction 4, 4: 253–278.

[3] Christopher J. MacLellan, Ran Liu, and Kenneth R. Koedinger. 2015. Accounting for Slipping and Other False Negatives in Logistic Models of Student Learning.

[4] Ran Liu. 2016. Personal Communication.

[5] Kenneth R. Koedinger. Geometry Area 1996-1997. Dataset 76 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=76.

[6] Steven Ritter, John R. Anderson, Kenneth R. Koedinger, and Albert Corbett. 2007. Cognitive Tutor: Applied research in mathematics education. Psychonomic Bulletin & Review 14, 2: 249–255.

[7] Julie Booth and Steven Ritter. 2009. Self Explanation sch_a3329ee9 Winter 2008 (CL). Dataset 293 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=293.

[8] Derek Lomas. Digital Games for Improving Number Sense – Study 1. Dataset 445 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=445.

[9] Ruth Wylie. IWT Self-Explanation Study 1 (Spring 2009) (tutors only). Dataset 313 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=313.

[10] Ruth Wylie. IWT Self-Explanation Study 2 (Spring 2009) (tutors only). Dataset 372 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=372.

[11] Thomas G. Dietterich. 1998. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10, 7: 1895–1923. http://doi.org/10.1162/089976698300017197.