          Investigating the Impact of Slipping Parameters on
             Additive Factors Model Parameter Estimates
                                                       Christopher J. MacLellan
                                                         Carnegie Mellon University
                                                           5000 Forbes Avenue
                                                           Pittsburgh, PA 15213
                                                         cmaclell@cs.cmu.edu



ABSTRACT
The Additive Factors Model (AFM), a widely used model of student learning, estimates students' prior knowledge, the difficulty of tutored skills, and the rates at which these skills are learned. In contrast to Bayesian Knowledge Tracing (BKT), another widely used model of student learning, AFM does not have parameters for the slipping rates of learned skills; i.e., it does not explicitly model situations where students know a skill but still apply it incorrectly. Thus, AFM assumes that as students get more practice their probability of correctly applying a skill converges to 100%, whereas BKT allows convergence to lower probabilities. This restriction constrains the range of values that AFM parameters can take. In particular, when the asymptotic performance of a skill is less than 100%, AFM will estimate the learning rate to be lower than if slipping were taken into account. To investigate this phenomenon, I created a LearnSphere workflow component that implements AFM and a variant of AFM with explicit slipping parameters (AFM+S). Using this component, I analyze multiple DataShop datasets to determine (1) whether the model with slipping parameters better fits the data and (2) how the addition of slipping parameters impacts the parameter estimates returned by AFM. I show that, in general, AFM+S fits the data better than AFM. Additionally, I show that AFM+S estimates higher skill intercepts and learning rates than AFM, whereas AFM estimates higher student intercepts than AFM+S.

Keywords
Cognitive Modeling, Statistical Models of Learning, Additive Factors Model, Knowledge Tracing.

1. INTRODUCTION
The Additive Factors Model [1], or AFM, is a statistical model of student learning that can be fit to educational data in order to estimate students' prior knowledge, the difficulty of tutored skills, and the rates at which these skills are learned. Unlike Bayesian Knowledge Tracing [2], an alternative statistical model of student learning, AFM does not have explicit parameters to model the rate at which students incorrectly apply learned skills (i.e., slipping parameters).

This lack of slipping parameters affects both the model fit and the parameter estimates. If slipping is occurring, then model fits should improve by taking these parameters into account. Further, in situations where slipping is occurring, AFM will underestimate learning rates so that it can fit the higher error rates in the tail of the learning curve [3]. There is some evidence that the learning rates estimated by BKT, an approach that takes slipping into account, tend to be higher than those estimated by AFM [4]. However, a thorough investigation of how slipping rates impact learning estimates has not been done.

In order to investigate the impact of slipping parameters on AFM's model fit and parameter estimates, I created a LearnSphere workflow component that implements both AFM and the extension of AFM that includes slipping parameters. I refer to this extension as AFM+S [3]. Using this component, I fit both AFM and AFM+S models to five datasets from DataShop. I analyzed the output to determine which model best fits the data, whether slipping was occurring in the datasets, and to compare the parameter estimates of the two models to determine how the slipping parameters affect the learning rate estimates.

Previous work has shown that AFM+S fits the data better than both AFM and BKT on five different datasets [3]. I replicated this analysis to show that the same results hold with the new workflow component. Further, in my analysis I took additional precautions to prevent Type I errors (i.e., identifying a significant difference when none exists). As a preliminary test of whether the learning rates estimated by the AFM+S model are higher than those estimated by the AFM model, I fit both models to the Geometry Area 1996-1997 dataset accessed via DataShop [5] and compared their learning rate estimates. I found that the mean learning rate for the AFM model was 0.18 logits, whereas the mean learning rate for the AFM+S model was 0.42 logits, a significant difference (V=0, p < 0.01 via a paired Wilcoxon signed-rank test). These preliminary results suggested that adding slipping parameters to the model causes the estimated learning rates to be higher. However, I wanted to analyze the other four datasets to identify whether this was a systematic trend. In this paper, I present the results of this analysis. In particular, I show that AFM+S fits the five datasets better than AFM under unstratified and stratified cross validation and that the skill intercepts and slopes (i.e., learning rates) estimated by the AFM+S model are higher than those estimated by the AFM model. Further, I show that the AFM model estimates the student intercepts to be higher than the AFM+S model does.

In addition to exploring these ideas, this paper showcases the new LearnSphere workflow component. Researchers can use this component in situations where they want to use AFM but suspect slipping is occurring. BKT is one possible alternative, but it is not a panacea. For example, BKT does not support multiple skill labels per step, but AFM+S does. Further, there is evidence that AFM+S fits many datasets better than traditional BKT [3]. A workflow component for AFM+S is thus a contribution to the ecosystem of learning analytics models that researchers might like to use.
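To make the contrast concrete, the following is a sketch of the two model forms for student i on step j, where theta_i is the student intercept (prior knowledge), beta_k and gamma_k are the intercept (difficulty) and slope (learning rate) of skill k, T_ik is the count of prior opportunities, and q_jk maps steps to skills. The single-skill AFM+S form, in which a slipping rate s_k caps the asymptote at 1 - s_k, is a simplification for illustration; the exact parameterization appears in [3].

    % AFM: standard logistic form
    \log \frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_k q_{jk} (\beta_k + \gamma_k T_{ik})

    % AFM+S: bounded logistic form, shown for a step exercising a single skill k
    p_{ij} = (1 - s_k) \, \sigma(\theta_i + \beta_k + \gamma_k T_{ik})

When s_k = 0, the second form reduces to the first, which is why the two models coincide on datasets where no slipping occurs.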
2. WORKFLOW COMPONENT
2.1 Data Inputs
The AFM+S workflow component that I created accepts the standard PSLC DataShop student-step rollup format. From these files the AFM+S model requires information about the student labels, the knowledge component labels, and the knowledge component opportunity counts. If item cross-validation is to be performed, the model also needs the item labels.
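As a sketch, such a rollup file might be loaded in Python as follows; the column names reflect typical DataShop exports (where the knowledge component columns embed the KC model name), the filename is a placeholder, and both should be checked against a particular dataset.

    import pandas as pd

    # Read a tab-delimited DataShop student-step rollup export.
    steps = pd.read_csv('geometry_student_step.txt', sep='\t')

    # Fields required by the AFM+S component:
    students = steps['Anon Student Id']             # student labels
    skills = steps['KC (Default)']                  # knowledge component labels
    opportunities = steps['Opportunity (Default)']  # KC opportunity counts

    # Binary first-attempt outcome used to fit the models.
    y = (steps['First Attempt'] == 'correct').astype(int)

    # Item labels, needed only when item cross-validation is requested.
    items = steps['Problem Name'] + ' / ' + steps['Step Name']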

2.2 Workflow Model
The code for the AFM+S workflow component is implemented in Python and is publicly available on GitHub: https://github.com/cmaclell/pyAFM. This code implements a standard logistic regression classifier that accepts box constraints (so learning rates can be constrained to be positive) and L2 regularization parameters (so student intercepts can be pulled towards 0). It also implements bounded logistic regression, so that slipping parameters can be taken into account. Using these classifiers, the code provides implementations of both AFM and AFM+S as described in prior work [3].
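To illustrate the idea, here is a minimal sketch of bounded logistic regression with box constraints and an L2 penalty, using scipy's L-BFGS-B optimizer. It is not pyAFM's actual code: for simplicity it fits a single global slip rate, whereas AFM+S as described in [3] estimates per-skill slipping parameters.

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(params, X, y, student_cols, l2):
        # Last parameter is the slip rate s; predictions are
        # bounded above by 1 - s (bounded logistic regression).
        w, s = params[:-1], params[-1]
        p = (1.0 - s) / (1.0 + np.exp(-X.dot(w)))
        p = np.clip(p, 1e-10, 1.0 - 1e-10)
        ll = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
        # L2 penalty pulls the student intercepts toward zero.
        return -(ll - l2 * np.sum(w[student_cols] ** 2))

    def fit_bounded(X, y, student_cols, slope_cols, l2=1.0):
        n_coef = X.shape[1]
        x0 = np.zeros(n_coef + 1)
        bounds = [(None, None)] * n_coef + [(0.0, 0.95)]  # 0 <= s < 1
        for j in slope_cols:
            bounds[j] = (0.0, None)  # box constraint: learning rates >= 0
        result = minimize(neg_log_likelihood, x0,
                          args=(X, y, student_cols, l2),
                          method='L-BFGS-B', bounds=bounds)
        return result.x[:-1], result.x[-1]  # coefficients, slip rate

Here X is the usual AFM design matrix (student indicators, skill indicators, and skill-opportunity counts), and student_cols and slope_cols index the student-intercept and learning-rate columns respectively.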
2.3 Workflow Outputs
The AFM+S workflow component has three possible outputs. First, it outputs metrics for assessing the fit of the model to the data. In particular, it outputs unstratified, stratified, student, and item cross-validated root-mean-square error. Second, the model outputs predicted first-attempt performance for each student step, so that the resulting learning curve can be plotted and compared to alternative models. Finally, the model outputs student intercept parameter estimates, skill difficulty and learning rate parameter estimates, and skill slipping parameter estimates.

The model fit statistics and parameter estimate outputs take the form of tables or comma-separated value output files. The model predictions output takes the form of either a comma-separated value output file or learning curve plots. These learning curve plots are similar to those currently available on DataShop.
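As a sketch of how the cross-validated RMSE metrics can be computed, consider the following; fit and predict stand in for the component's model-fitting routines and are assumptions for illustration, not pyAFM's actual API.

    import numpy as np
    from sklearn.model_selection import KFold, StratifiedKFold

    def cv_rmse(X, y, fit, predict, splitter):
        # Mean root-mean-square error over cross-validation folds.
        errors = []
        for train, test in splitter.split(X, y):
            model = fit(X[train], y[train])
            preds = predict(model, X[test])
            errors.append(np.sqrt(np.mean((y[test] - preds) ** 2)))
        return float(np.mean(errors))

    # 2-fold unstratified and stratified cross validation, as used in
    # the Method section below.
    unstratified = cv_rmse(X, y, fit, predict, KFold(2, shuffle=True))
    stratified = cv_rmse(X, y, fit, predict, StratifiedKFold(2, shuffle=True))

Student and item cross validation follow the same pattern but split on student or item labels rather than on rows.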
3. METHOD
In order to investigate the impact of slipping parameters on AFM skill slopes, I used the new workflow component to fit both the AFM and AFM+S models to five datasets downloaded from DataShop: Geometry [5], Equation Solving [6,7], Number Line Estimation [8], Writing 1 [9], and Writing 2 [10].

Before analyzing parameter differences, I assessed which model better fit the data using cross validation. For each model and dataset, I performed 5 runs of 2-fold stratified and unstratified cross validation and 1 run of 2-fold student and item cross validation (i.e., where students and items are divided across the folds). I then used a paired Wilcoxon signed-rank test to compare the model fits across the datasets, runs, and folds. I did not conduct more runs or folds because there is evidence that doing so increases the risk of Type I error due to the correlation in model fits between folds that share training data [11]. For student and item cross validation, I conducted only 1 run of 2-fold cross validation because splitting students and items between folds while balancing the number of training points across folds is not fully random, so repeated runs would also increase the likelihood of Type I error.

[Figure 1. The slipping rates of skills across the five datasets.]

After assessing overall model fits, I fit each model (AFM and AFM+S) to each of the datasets using all of the available data and recorded the parameter estimates from both models. I plotted the slipping parameter values to determine which datasets are most affected by the slipping parameters (Figure 1). In situations where there is little slipping, AFM+S should be identical to AFM. I then compared each of the parameter types (skill intercepts, skill slopes, and student intercepts) between models using a paired Wilcoxon signed-rank test to determine whether there were systematic differences in the parameter estimates produced by the models across the five datasets.
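For example, such a paired comparison can be run with scipy; the RMSE values below are hypothetical stand-ins for the per-fold model fits output by the component.

    from scipy.stats import wilcoxon

    # Hypothetical cross-validated RMSEs for the two models, paired
    # by dataset, run, and fold (same ordering in both lists).
    afm_rmse = [0.412, 0.398, 0.405, 0.377, 0.390, 0.401]
    afms_rmse = [0.401, 0.392, 0.399, 0.375, 0.388, 0.396]

    # Paired Wilcoxon signed-rank test on the model-fit differences.
    statistic, p_value = wilcoxon(afm_rmse, afms_rmse)
    print(f'V={statistic}, p={p_value:.3f}')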
4. RESULTS
Overall, the AFM+S model better fits the data across the five datasets and four cross-validation types (unstratified, stratified, student, and item), via a Wilcoxon signed-rank test paired by cross-validation type, dataset, run, and fold (V=1350.5, p < 0.01). When dividing the data by cross-validation type, AFM+S better fits the data across the five datasets for unstratified (V=213, p < 0.01) and stratified (V=222, p < 0.01), but not student (V=8, p = 1) or item (V=26, p > 0.7) cross validation. When dividing the data by dataset, AFM+S better fits the data on Geometry (V=212, p < 0.01) and Equation Solving (V=181, p < 0.02), but not Number Line (V=3, p = 1), Writing 1 (V=2, p = 1), or Writing 2 (V=32, p > 0.6).

Figure 1 shows the skill slipping rates across the five datasets. The slipping rates of skills on the Number Line, Writing 1, and Writing 2 datasets are effectively zero (the maximum slip rate for any skill in these datasets is 9 × 10⁻⁹ percent), which explains why there is no significant difference in model fit for these datasets; i.e., AFM+S is practically identical to AFM for these datasets. Further, it is likely that there was no difference on student and item cross validation because there was not enough statistical power to detect a difference; i.e., I performed only 1 run of 2-fold cross validation and only two of the five datasets had skills with non-zero slipping rates.

Across all five datasets, AFM+S estimates higher skill intercepts (V=257.5, p < 0.01) and slopes (V=117, p < 0.01) than AFM, whereas AFM estimates higher student intercepts (V=9226, p < 0.01) than AFM+S (via a Wilcoxon signed-rank test paired by skill and dataset). Note that these results are primarily driven by the Geometry and Equation Solving datasets because AFM and AFM+S are practically identical on the Number Line, Writing 1, and Writing 2 datasets.
5. DISCUSSION
In general, my results show that AFM+S better fits the data than the AFM model and that there are significant differences in the parameters estimated by the two models. In particular, the skill intercept and learning rate estimates from the AFM+S model are higher than those returned by the AFM model. Further, the student intercept estimates from AFM+S are lower than those produced by AFM. These findings suggest that the AFM model might be compensating for skill slipping by adjusting the other parameters. The implication of this finding is that researchers interpreting parameter estimates returned by AFM should be cautious in situations where skill slipping appears to be occurring.

These results also suggest that, at least for these five datasets, the AFM+S model is generally preferable to the AFM model. In situations where no slipping is occurring, AFM+S reduces to the AFM model and returns statistically identical model fits. However, when slipping occurs, model fit improves with AFM+S.

In conclusion, I have introduced a LearnSphere workflow component and shown how this component can be used to investigate the differences in model fits and parameter estimates of the AFM and AFM+S models. My analysis shows that AFM+S better fits the data than AFM on datasets where slipping occurs and that there are significant differences between the parameter estimates returned by the two models. These results suggest that researchers using the AFM model should consider transitioning to the AFM+S model when they suspect slipping is occurring. These results also showcase the capabilities of the new LearnSphere AFM+S workflow component.

6. ACKNOWLEDGMENTS
We thank Erik Harpstead, Michael Yudelson, and Rony Patel for their thoughts and comments when developing this work. This work was supported in part by the Department of Education (#R305B090023 and #R305B110003) and by the National Science Foundation (#SBE-0836012). Finally, we thank Carnegie Learning and all other data providers for making their data available on DataShop.

7. REFERENCES
[1] Hao Cen, Kenneth R. Koedinger, and Brian Junker. 2006. Learning Factors Analysis – A General Method for Cognitive Model Evaluation and Improvement. 164–175.

[2] Albert T. Corbett and John R. Anderson. 1995. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction 4, 4: 253–278.

[3] Christopher J. MacLellan, Ran Liu, and Kenneth R. Koedinger. 2015. Accounting for Slipping and Other False Negatives in Logistic Models of Student Learning.

[4] Ran Liu. 2016. Personal Communication.

[5] Kenneth R. Koedinger. Geometry Area 1996-1997. Dataset 76 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=76.

[6] Steven Ritter, John R. Anderson, Kenneth R. Koedinger, and Albert Corbett. 2007. Cognitive Tutor: Applied research in mathematics education. Psychonomic Bulletin & Review 14, 2: 249–255.

[7] Julie Booth and Steven Ritter. 2009. Self Explanation sch_a3329ee9 Winter 2008 (CL). Dataset 293 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=293.

[8] Derek Lomas. Digital Games for Improving Number Sense – Study 1. Dataset 445 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=445.

[9] Ruth Wylie. IWT Self-Explanation Study 1 (Spring 2009) (tutors only). Dataset 313 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=313.

[10] Ruth Wylie. IWT Self-Explanation Study 2 (Spring 2009) (tutors only). Dataset 372 in DataShop. Retrieved from pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=372.

[11] Thomas G. Dietterich. 1998. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10, 7: 1895–1923. http://doi.org/10.1162/089976698300017197.