Transferring an existing gaming detection model to different system using semi-supervised approach

Vedant Bahel
G H Raisoni College of Engineering, Nagpur 442001, India
vbahel@ieee.org

Seth A. Adjei
Northern Kentucky University, KY 41099, USA
adjeis1@nku.edu

Ryan S. Baker
University of Pennsylvania, PA 19104, USA
rybaker@upenn.edu

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

ABSTRACT
Many researchers in Educational Data Mining and Learning Analytics have worked on models for the detection of students who “game the system”, a behavior in which students misuse intelligent tutors or other online learning environments to complete problems or otherwise advance without learning. Such detectors are mostly specific to the learning system they were built for. Researchers commonly use a knowledge engineering or machine learning approach to design gaming detection models. In this paper, we try to transfer knowledge from an existing detector made for a specific learning system to another, using a semi-supervised, clustering-based machine learning approach. The goal is to check whether the existing detector can be generalized across multiple learning systems. Specifically, we evaluate how well a gaming detector previously created for Cognitive Tutor Algebra adapts to a new learning system, ASSISTments. The results obtained were not very satisfactory and are discussed in this paper.

Keywords
Gaming the system, Transfer Learning, Clustering, Semi-supervised learning, ASSISTments, Cognitive Tutor.

1. INTRODUCTION
In recent years, there has been considerable progress towards designing methods to detect “gaming the system”. Gaming has been defined as a behaviour where students try to succeed by exploiting the functionalities of a learning environment instead of learning the material [1,8]. Research in multiple learning environments [9] has linked gaming to poor learning outcomes [10], increased boredom [14] and lower long-term levels of academic attainment [10]. Many researchers have worked on gaming detection methods for specific systems. Both machine learning [1,5,14] and knowledge engineering [2,3,5,13] approaches have been used for this purpose. Using knowledge engineering (KE), researchers develop models that are designed to reproduce the knowledge we have about a specific learning behaviour. This is often achieved by designing a set of rules that matches a general common-sense definition of the behaviour [3] or by explicitly eliciting knowledge from an expert about how they determine whether a student is exhibiting a specific behaviour. Most knowledge engineering models of gaming try to identify two main gaming types: help abuse [12] and systematic guessing [11]. Help abuse has mainly been modelled using behaviours that include copying the answer from a hint and repeated help requests. Systematic guessing has been defined operationally as the behaviour of quickly answering questions after an error [2,4,13,15] and making successive errors [5]. A primary advantage of knowledge engineering is that, unlike machine learning, it does not require a large amount of coded data providing examples of students’ behaviours, since the knowledge is acquired directly from experts. However, KE models often focus on only 1-2 patterns of gaming [3,5], and it is reasonable to question whether such a complex and ill-defined construct can be fully described by 2-3 simple rules [19]. Paquette et al. worked to develop a knowledge engineered model by identifying certain pattern features of student action that relate directly to gaming behaviour as observed by human experts.

On the other hand, machine learning approaches attempt to resolve the challenge of implicit expertise by leveraging data-driven algorithms to discover models from positive and negative examples of a student's behaviour. Using this approach, a large amount of data is automatically inspected to find relationships between the students’ fine-grained actions and higher-level
behaviours, avoiding the need to explicitly elicit knowledge about the behaviour [16]. In [20], Baker et al. discuss machine learning approaches to detect gaming the system. Specifically, the research discusses two primary methods for detecting gaming in Cognitive Tutor: a latent response model and a J48 decision tree. In [21], Baker et al. also use step regression for detecting gaming in the SQL-Tutor system.

Several researchers have attempted to apply transfer learning to the problem of gaming detection across systems. In this context, Torrey and Shavlik define transfer learning “as the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned” [17]. Transfer learning has been shown to improve the performance of machine learning models where there is limited data [18]. The approach aims to recognize knowledge in the source model and transfer it to the target model. In this research, the source model used for gaming detection is the knowledge engineered gaming detector of Paquette et al. [4], built on the Cognitive Tutor Algebra (CTA) learning system, and the target model is built for the ASSISTments system using a clustering-based semi-supervised approach.

In [8], Paquette et al. successfully generalized the gaming detector cognitive model to other learning systems (Cognitive Tutor Middle School and ASSISTments) with a KE approach. Generalization is important because the cost of building detectors is high and there are hundreds of systems that could benefit from including detectors of this type. Generalization of detectors would make them widely useful across systems. In this paper we attempt to answer the following questions: How well does Paquette’s transfer learning apply to a new dataset? Could the labelling be recovered if we applied an unsupervised learning technique like clustering? Answering these questions will indicate whether:
     1. Paquette’s gaming detection algorithm is truly transferable across systems (ASSISTments & Scatter Plot lesson of Cognitive Tutor for Middle School Math), and
     2. The characteristics of student gaming actions can be detected, even with unsupervised techniques, and are truly system agnostic.

2. DATASET & BACKGROUND
For this research, we used data collected from two systems: Cognitive Tutor and ASSISTments. In this section, we describe each of the systems and provide a description of the datasets that were used.

2.1 Cognitive Tutor Algebra
The source model used in this paper for knowledge transfer is Paquette’s knowledge engineered model for gaming detection [4]. This model is based on data from the Cognitive Tutor Algebra (CTA) system [7]. The CTA system examines students on advanced mathematical problems and records multiple parameters of the student's learning and question-answer process. Cognitive Tutors are a type of interactive learning environment which uses cognitive modelling and artificial intelligence to adapt to individual differences in student knowledge and learning. The Cognitive Tutor environment breaks down each mathematics problem into the steps of the process used to solve the problem, making the student’s thinking visible. If a student is struggling, he or she can also request a hint. When the student requests a hint, the system first gives a conceptual hint. The student can request further hints, which become more and more specific until the student is given the answer (see Figure 1). Paquette’s model is knowledge engineered on the data obtained from 59 students who used CTA as a part of their regular mathematical curriculum. Data from 12 tutor lessons was obtained and segmented into sequences of 5 actions, called clips, illustrating the student's behaviour. A total of 10,397 clips from this dataset were randomly selected; the chance of a clip being selected was weighted for each lesson according to the total number of clips in that lesson. Those clips were previously coded by an expert to develop machine-learned gaming models and contain 708 examples of gaming the system and 9,689 examples of behaviours that were not coded as gaming.

Figure 1. A student who requested multiple hints in the Cognitive Tutor Algebra system is finally prompted with the correct answer.

Figure 2. A screen showing a student getting “tutoring” to help the student figure out how to solve a question in the ASSISTments system.

2.2 ASSISTments
The second dataset that we used was collected from the ASSISTments learning system [6], an online system for teachers
to assign math homework to students and review student performance as they complete the assignments. This system is similar in many ways to CTA. The ASSISTments dataset contains data collected from 1,367 students’ interactions with the system. This dataset was used to test the generalizability of the gaming model created from the CTA system. The data include a total of 822,233 problem solving actions, which were segmented into 240,450 clips (series of actions). Unlike in CTA, however, when students in ASSISTments are presented with an “original” problem, they only need to provide its final answer. Individual steps are not required of students who solve the problem on the first attempt. However, students who do not provide the correct answer may be required to correctly answer scaffolding questions to successfully complete the problem. Thus, ASSISTments provides scaffolding and hints as options to students, and an ASSISTments problem can be solved in one step if the student’s first attempt is correct. As such, a specific clip in this system could have an arbitrarily large number of actions. All the clips with more than 25 actions were removed, since those constituted 0.7% of the data and could have caused serious bias towards a different gaming pattern that was being identified by the expert. Thus, the resulting dataset consisted of 1,060 clips labelled by the human expert, comprising 64 gaming clips (6.04%) and 996 non-gaming clips (93.96%) [1, 6, 8].

2.3 Paquette’s cognitive model (IBKE)
The cognitive gaming detection model by Paquette et al. [4] is a knowledge engineered model based on how a human expert evaluates gaming behaviours exhibited by a student in a clip. The model was developed using data collected from Cognitive Tutor Algebra (described earlier in this paper) and interviews to analyse how an expert observes gaming behaviour. Results indicated that the expert’s coding method could be classified into two cognitive processes: interpreting the student’s individual actions and identifying patterns of gaming across those actions. Although the expert executes these in parallel, the resulting cognitive model executes them as consecutive steps without changing the fundamental reasoning process. As a result, 13 patterns of action were found to be associated with gaming behaviour, each matching a predefined set of gaming constituents identified in [1]. Finally, the model labelled any clip containing actions that match any of those 13 patterns as gaming. This model is referred to as “Interview-Based Knowledge Engineering” (IBKE) throughout this paper. It must be noted that we labelled Paquette’s model as such.

3. METHOD
We implemented a clustering-based semi-supervised approach to extract the patterns identified by IBKE in CTA and transfer them to the ASSISTments dataset. In this approach, the gaming construct was first transferred between systems as-is. Clustering was then used to refine the gaming construct, re-centering it after moving it between data sets. We used the k-means clustering algorithm. k-means is a popular clustering algorithm in which k clusters are created with random initial centroids. The algorithm is based on the nearest-distance method: every data point in the dataset is allocated to the cluster whose centroid is closest to it. Once all the points have been assigned to clusters, the mean value of the features is re-calculated for each cluster and this mean becomes the new centroid. This is repeated until no centroid changes its value after re-calculation. Thus, the centroids partition the data space into segments like the cells of a Voronoi diagram.

3.1 Seeding clusters
Though clustering is an unsupervised machine learning method, we seeded one of the clusters, making it a semi-supervised approach. In traditional k-means clustering, a random set of centroids is chosen and then refined over several iterations of the k-means algorithm. In this paper we assign initial centroids based on our prior knowledge of the gaming labels in the dataset, a process we call cluster seeding. Seeding with calculated parameters adds latent knowledge to the unsupervised approach, thereby making it semi-supervised.

3.2 Implementation
For the overall goal of transfer learning, we first ran IBKE (originally developed for the Cognitive Tutor) on the ASSISTments dataset and obtained the IBKE labels for that dataset. The next goal was to use clustering with the IBKE labels as seeds for the ASSISTments dataset. To do so, the average values of the features were calculated separately for the data points that IBKE labelled as gaming and as non-gaming. K-means clustering was used to determine the naturally occurring groupings in the dataset, using IBKE’s labels to seed the cluster generation algorithm. In doing so, we experimented with values of k ranging from 2 through 9. This range of values was chosen due to the small size of the dataset. In each case, one cluster was seeded as a gaming cluster and the other clusters were seeded as non-gaming. In other words, for each value of k, all the student actions which IBKE labelled as gaming were initially assigned to a single cluster, and the k-1 non-gaming clusters were created by randomly dividing the IBKE non-gaming data points into k-1 groups. We then ran the k-means algorithm to determine whether the gaming actions would end up within the same cluster after k-means converged.

Each clustering was evaluated using recall and precision, based on the cluster a point was assigned to and the actual gaming label from the coder. These metrics were chosen because k-means clustering naturally generates a categorical classification rather than a probability.

The code repository can be found at https://github.com/vedantbahel/clustering-gaming-detection-edm.

4. RESULT & DISCUSSION
The results were obtained by comparing the labels produced by clustering in ASSISTments with the original (ground truth) labels by a human expert, as in [1]. The results of the k-means clustering schemes are shown in Table 1, encoded as K#, where # represents the number of clusters.

Table 1. Performance of the various models across the clustering schemes

    Clustering Scheme      Recall    Precision
    IBKE (ASSISTments)     0.484     0.234
    K2                     0.406     0.0704
    K3                     0.343     0.0721
    K4                     0.343     0.0698
    K5                     0.328     0.0766
    K6                     0.281     0.0810
    K7                     0.312     0.0738
    K8                     0.140     0.0638
    K9                     0.156     0.3703

For comparison, we also display the result of IBKE, obtained by comparing IBKE labels to the ground-truth labels [4].

As can be seen in Table 1, recall generally decreased as the number of clusters increased, while precision stayed below 0.09 for K2 through K8 and rose to 0.3703 only for K9. IBKE generally performed substantially better before clustering was used to shift the concept, suggesting that our approach was unsuccessful.

5. CONCLUSION & FUTURE SCOPE
In this paper, we discussed our semi-supervised clustering-based approach to evaluate how well an existing gaming detector designed for the Cognitive Tutor Algebra (CTA) system adapts to ASSISTments. We considered Paquette et al.’s gaming detector [4] (initially designed for CTA) as the source model for our transfer. Our approach was to use knowledge from the previous system as a seed for clustering models.

In conclusion, none of the clustering schemes was able to truly outperform IBKE; thus, seeding did not truly help with transferring the knowledge. Some of the possible reasons for poor performance might be:

(i) imbalanced data points in each category, i.e., 64 gaming and 996 non-gaming data points.
(ii) the nature of the clustering algorithm and how well it fits with the data.

The current findings have not been very conclusive. This suggests that further work needs to be carried out to comprehensively answer the research questions we posed. For next steps, we plan to follow up with other parametric and nonparametric clustering algorithms. Although we did try Expectation-Maximization (EM) based Gaussian mixture clustering, it was unsuccessful and showed poorer results. We plan to try other parametric (like DENCLUE, DBSCAN, etc.) and nonparametric techniques (like hierarchical, density-based clustering techniques) and to look more closely into the k-means clustering method to understand how clusters shift in k-means and why it is failing in the current approach. We plan to study the data points which are now being identified as gaming to see what characterizes the false positives. Another reason for the poor results could be class imbalance, as discussed earlier. Some data pre-processing could potentially give a solution to that problem.

6. ACKNOWLEDGMENTS
We would like to thank Luc Paquette for his support during this research.

7. REFERENCES
[1] Baker, R. S., Corbett, A. T., Roll, I., & Koedinger, K. R. (2008). Developing a generalizable detector of when students game the system. User Modeling and User-Adapted Interaction, 18(3), 287-314.
[2] Beal, C. R., Qu, L., & Lee, H. (2006, June). Classifying learner engagement through integration of multiple data sources. In AAAI (pp. 151-156).
[3] Muldner, K., Burleson, W., Van de Sande, B., VanLehn, K. (2011). An Analysis of Students’ Gaming Behaviors in an Intelligent Tutoring System: Predictors and Impact. User Modeling and User Adapted Interactions, 21, pp. 99–135.
[4] Paquette, L., de Carvalho, A. M., & Baker, R. S. (2014, July). Towards Understanding Expert Coding of Student Disengagement in Online Learning. In CogSci.
[5] Walonoski, J. A., & Heffernan, N. T. (2006, June). Detection and analysis of off-task gaming behavior in intelligent tutoring systems. In International Conference on Intelligent Tutoring Systems (pp. 382-391). Springer, Berlin, Heidelberg.
[6] Heffernan, N. T., & Heffernan, C. L. (2014). The ASSISTments ecosystem: Building a platform that brings scientists and teachers together for minimally invasive research on human learning and teaching. International Journal of Artificial Intelligence in Education, 24(4), 470-497.
[7] Ritter, S., Anderson, J. R., Koedinger, K. R., & Corbett, A. (2007). Cognitive Tutor: Applied research in mathematics education. Psychonomic bulletin & review, 14(2), 249-255.
[8] Paquette, L., Baker, R. S., de Carvalho, A., & Ocumpaugh, J. (2015, June). Cross-system transfer of machine learned and knowledge engineered models of gaming the system. In International Conference on User Modeling, Adaptation, and Personalization (pp. 183-194). Springer, Cham.
[9] Cocea, M., Hershkovitz, A., Baker, R. S. J. d. (2009). The Impact of Off-Task and Gaming Behaviors on Learning: Immediate or Aggregate? Proc of AIED 2009, 507-514.
[10] San Pedro, M. O. Z., Baker, R. S. J. d., Bowers, A. J., Heffernan, N. T. (2013). Predicting College Enrolment from Student Interaction with an Intelligent Tutoring System in Middle School. Proc of EDM 2013, 177-184.
[11] Baker, R. S. J. d., de Carvalho, A. M. J. A. (2008). Labeling Student Behavior Faster and More Precisely with Text Replays. Proc of EDM 2008, 38-47.
[12] Aleven, V., McLaren, B. M., Roll, I., Koedinger, K. R. (2006). Toward Meta-Cognitive Tutoring: A Model of Help Seeking with a Cognitive Tutor. Int'l Journal of Artificial Intelligence in Education, 16, 101-130.
[13] Johns, J., & Woolf, B. (2006, January). A dynamic mixture model to detect student motivation and proficiency. In AAAI (pp. 163-168).
[14] Baker, R. S., D'Mello, S. K., Rodrigo, M. M. T., & Graesser, A. C. (2010). Better to be frustrated than bored: The incidence, persistence, and impact of learners’ cognitive–affective states during interactions with three different computer-based learning environments. International Journal of Human-Computer Studies, 68(4), 223-241.
[15] Gong, Y., Beck, J., Heffernan, N. T., Forbes-Summers, E. (2010). The Fine-Grained Impact of Gaming (?) on Learning. Proc of ITS 2010, 194-203.
[16] Paquette, L., & Baker, R. S. (2019). Comparing machine learning to knowledge engineering for student behavior modeling: a case study in gaming the system. Interactive Learning Environments, 27(5-6), 585-597.
[17] Torrey, L., & Shavlik, J. (2010). Transfer learning. In Handbook of research on machine learning applications and trends: algorithms, methods, and techniques (pp. 242-264). IGI Global.
[18] Tsiakmaki, M., Kostopoulos, G., Kotsiantis, S., & Ragos, O. (2020). Transfer learning from deep neural networks for predicting student performance. Applied Sciences, 10(6), 2145.
[19] Shih, B., Koedinger, K. R., & Scheines, R. (2011). A response time model for bottom-out hints as worked examples. Handbook of educational data mining, 201-212.
[20] Baker, R.S.J.d., Corbett, A.T., Roll, I., Koedinger, K.R., Aleven, V., Cocea, M., Hershkovitz, A., de Carvalho, A.M.J.B., Mitrovic, A., Mathews, M. (2013) Modeling and Studying Gaming the System with Educational Data Mining. In Azevedo, R., & Aleven, V. (Eds.) International Handbook of Metacognition and Learning Technologies. pp. 97-116. New York, NY: Springer.
[21] Baker, R. S., Mitrović, A., & Mathews, M. (2010, June). Detecting gaming the system in constraint-based tutors. In International Conference on User Modeling, Adaptation, and Personalization (pp. 267-278). Springer, Berlin, Heidelberg.
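
As a supplementary illustration of the seeding and evaluation procedure described in Section 3.2, the following is a minimal sketch in plain Python. It is not the code from the repository linked in the paper. All function names are hypothetical, Euclidean distance is assumed, and the sketch assumes that the cluster seeded from the IBKE gaming labels (index 0) is still treated as the gaming cluster after convergence. It also assumes at least one IBKE gaming point and at least k-1 IBKE non-gaming points.

```python
import random
from statistics import fmean


def mean_vector(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*points)]


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_centroids(points, ibke_labels, k, rng):
    """Seed cluster 0 from IBKE-gaming points; split the IBKE non-gaming
    points randomly into k-1 groups to seed the remaining clusters."""
    gaming = [p for p, g in zip(points, ibke_labels) if g]
    other = [p for p, g in zip(points, ibke_labels) if not g]
    rng.shuffle(other)
    groups = [other[i::k - 1] for i in range(k - 1)]
    return [mean_vector(gaming)] + [mean_vector(g) for g in groups]


def seeded_kmeans(points, ibke_labels, k, max_iter=100, seed=0):
    """Run k-means from seeded centroids; return (assignment, centroids)."""
    rng = random.Random(seed)
    centroids = seed_centroids(points, ibke_labels, k, rng)
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean_vector(members)
    return assignment, centroids


def precision_recall(predicted, truth):
    """Precision and recall of boolean predictions against boolean truth."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy feature vectors (made up for illustration, not from the paper).
    toy_points = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
                  [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]]
    toy_ibke = [True, True, False, False, False, True]    # imperfect seeds
    toy_truth = [True, True, True, False, False, False]   # expert labels
    clusters, _ = seeded_kmeans(toy_points, toy_ibke, k=2)
    predicted_gaming = [c == 0 for c in clusters]
    print(precision_recall(predicted_gaming, toy_truth))
```

On the toy data the seeds are deliberately imperfect, so the example shows how the centroids can move away from the seeded labels during refinement. On real data, as Table 1 indicates, that refinement does not necessarily move the gaming cluster towards the expert's labels.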