Extending ACT-R to Tackle Deceptive Overgeneralization in Intelligent Tutoring Systems

Marshall An, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Abstract
This research extends the ACT-R cognitive architecture to tackle deceptive overgeneralization within Intelligent Tutoring Systems (ITS). Existing adaptive learning technologies, while effective, rely on learning data that may not fully capture the nuances of learner understanding, particularly in cases of deceptive overgeneralization. This phenomenon occurs when learners exhibit correct actions during monitored learning sessions, yet these actions are grounded in an incomplete understanding of the necessary conditions. Due to the reliance on observed correctness, ITS may falsely assess mastery, potentially ceasing to provide further necessary practice opportunities that could aid in the refinement of understanding. This study aims to identify ITS designs that may inadvertently foster such misconceptions and to develop methods for their detection, diagnosis, and correction. Utilizing experimental designs, think-aloud protocols, and educational data mining, the research seeks to refine the adaptivity of ITS and enable more accurate assessments of true skill mastery. This work contributes to Technology-Enhanced Learning (TEL) by enhancing the precision of automated assessments and supporting more reliable adaptive learning experiences.

Keywords
Adaptive Learning, Intelligent Tutoring Systems, Instructional Design, Feedback, Educational Data Mining, Bayesian Knowledge Tracing

1. Introduction
Adaptive learning technologies, powered by learning data and dynamically adjusting to individual learner needs, have proven effective across various educational settings [1]. However, by definition, any type of adaptivity relies on data reflecting student learning [1, p.523]. The accuracy and completeness of learning data are therefore critical. There are instances, however, where the learning data may fall short, particularly in cases of deceptive overgeneralization.

Deceptive overgeneralization describes an undesired learning state wherein a learner acquires a relevant but incomplete subset of the conditions necessary for a skill, yet manages to perform the correct actions. Such overgeneralization is "deceptive", as it can lead to seemingly satisfactory performance during scrutinized learning sessions, as the learner's observable actions align with those of individuals who have accurately mastered the skill. However, these actions are based on a flawed understanding of the underlying conditions.

Deceptive overgeneralization poses a significant challenge, leading to false evaluations of mastery, which drives adaptivity. This can mislead learners, instructors, and researchers into becoming prematurely convinced that a skill has been mastered. Many Technology-Enhanced Learning (TEL) environments, especially those utilizing Intelligent Tutoring Systems (ITS) with adaptive capabilities that dynamically select practice problems based on estimated skill mastery, might amplify the issue of deceptive overgeneralizations. Such environments may prematurely cease providing further necessary practice opportunities that aid in the refinement of understandings, leaving these inaccuracies unaddressed. The consequences of failing to detect and address deceptive overgeneralizations can extend beyond academic performance, potentially affecting long-term educational pathways, career trajectories, and in some cases, leading to dire consequences.

My doctoral research aims to investigate the mechanisms of deceptive overgeneralization by applying and extending the well-established cognitive architecture, Adaptive Control of Thought – Rational (ACT-R) [2, 3]. This study aims to uncover how certain designs of ITS might overlook subtle instances of deceptive overgeneralization and to investigate design principles that can detect and remedy them. Ultimately, my research seeks to contribute to the advancement of adaptive learning technologies, enhancing their effectiveness as educational solutions.

Proceedings of the Doctoral Consortium of the 19th European Conference on Technology Enhanced Learning, 16th September 2024, Krems an der Donau, Austria
haokanga@andrew.cmu.edu (M. An); ORCID 0009-0005-5165-640X (M. An)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

2. Literature Review

2.1. Adaptive Control of Thought – Rational (ACT-R)

ACT-R, a cognitive architecture for understanding and modeling human cognitive processes, posits that cognitive behaviors are orchestrated by productions [4, 3]. A production can be represented as a condition-action pair [2, p.5], where the condition part specifies the circumstances under which the production can apply, and the action part specifies what should be done when the production applies [5, p.3]. ACT-R has significantly influenced the development of ITS, which deliver personalized tutoring by adapting to the unique learning needs of each learner. Empirical studies underpinning ACT-R have led to a proliferation of ITS that successfully enhance learning outcomes across diverse educational settings [6, 7, 8]. These systems, particularly cognitive tutors, require the development and integration of domain-specific cognitive models that adhere to the ACT-R framework, to capture various learner strategies and potential misconceptions.

2.2. Adaptivity of Intelligent Tutoring Systems

Adaptive learning, fundamental to ITS efficacy, is supported by various theoretical perspectives, such as Vygotsky's zone of proximal development [9], the cognitive apprenticeship model [10], the expertise reversal effect [11], and the assistance dilemma [12]. The efficacy of ITS in improving learning outcomes is largely attributable to their adaptivity, which allows for personalized learning based on individual learner progress and needs.

Adaptivity is not a binary property, but rather "a matter of degree" [1, p.523]. ITS distinguish themselves by adapting across all three major time scales defined by the Adaptivity Grid: step, task, and design [1, p.525].

Within the step-loop, ITS provide timely and targeted feedback at each problem-solving step. Indeed, timely feedback is critical to enable learners to continuously monitor their learning and evaluate their problem-solving strategies and their current understanding [13]. The positive effects of feedback are well supported by the wealth of evidence in the literature review by Shute [14]. Feedback is most effective when it clearly highlights discrepancies between a learner's current performance and the desired outcome, while offering actionable guidance to help learners meet specific target criteria [15, p.139]. ITS embody these best practices of feedback by detecting and diagnosing observable discrepancies between expected and actual actions at each step. With a developed cognitive model, a cognitive tutor employs model tracing to compare learner actions at each problem-solving step against the possible actions generated by the cognitive model, in order to provide individualized, just-in-time learning support tailored to the learner's specific approach to a problem [16, p.142].

In the task-loop, ITS employ knowledge tracing algorithms such as Bayesian Knowledge Tracing (BKT) [17] to dynamically adjust problem sequences based on real-time assessments of learner mastery. Each time a learner attempts a step in a practice problem, the system updates its estimate of the learner's mastery of the relevant production rule based on the correctness of the learner's action [16, p.143]. This ongoing assessment allows ITS to dynamically tailor the sequence of problems, ensuring that each practice opportunity aligns with the learner's current skill level and learning trajectory. When the system reaches a high degree of certainty about a student's mastery of a skill through repeated observations of correct actions, typically exceeding a predefined threshold (e.g., 95%) [16, p.144], it ceases presenting tasks related to that skill. This automated stopping rule optimizes the balance between learning time and effort, preventing overpractice and maximizing educational efficiency.

Furthermore, design-loop adaptivity involves data-driven instructional (re)design, before and between iterations of ITS development, informed by learning data [1, p.526].

However, the adaptivity of ITS is not without limitations. One key challenge lies in addressing deceptive overgeneralization, where learners perform correct actions based on a flawed understanding of underlying conditions. This phenomenon challenges the assessment models of ITS, which typically rely on differentiating between correct versus incorrect actions to gauge mastery. As such, deceptive overgeneralization presents an intriguing area for further research.

3. Deceptive Overgeneralization as a Possible Learning State

Learning is typically characterized as a gradual and continuous process rather than sudden transformative insights [18]. The Knowledge-Learning-Instruction (KLI) framework views learning as the acquisition of Knowledge Components (KCs), which are acquired units of cognitive functions or structures [19]. The KLI framework identifies induction and refinement as one primary type of learning process, particularly for acquiring KCs associated with variable conditions: for KCs with conditions that can vary in form or value, learners must induce and refine KCs so that the acquired KCs are "accurate, appropriately general, and discriminating" [19]. As we consider the induction and subsequent refinement of a KC as a continuous learning progression, learners may initially acquire an inaccurately generalized version of the target KC. This initial misunderstanding may either be refined into an accurate KC through further practice, or it may persist as inaccurate due to a lack of practice opportunities that support the refinement process.

Table 1: Examples of Correct and Inaccurate Generalization in Knowledge Components Across Various Disciplines

Math / Geometry
  Correct KC: IF the triangle is isosceles AND two angles are at the base of the triangle THEN the two angles are equal
  Referenced Inaccurate KC: IF the triangle is isosceles AND two angles THEN the two angles are equal [20]

Language / English Articles
  Correct KC: IF single mountain name THEN zero article
  Referenced Inaccurate KC: IF mountain name THEN zero article [21]

Statistics / Data Visualization
  Correct KC: IF categorical data THEN choose pie chart
  Referenced Inaccurate KC: IF demographic data THEN choose pie chart [22]

3.1. Modeling of Deceptive Overgeneralization

A KC connects features of a problem to a corresponding response. A learner has acquired a KC that is considered accurate, or "with high feature validity", when all of the features are relevant to making the response and none of them are irrelevant [23]; otherwise, a KC is inaccurate and requires further refinement. Inaccurate generalization could be overgeneralization, undergeneralization, or an even more nuanced mix of them. Indeed, inaccurate generalization is a common phenomenon observed in learning sciences research across various disciplines. Table 1 presents examples of inaccurate generalization, along with their corresponding accurate KCs, drawn from the research literature. Among these, deceptive overgeneralization is particularly intriguing to investigate.

In ITS, specifically those developed using Cognitive Tutor Authoring Tools (CTAT) [24, 25], each production's condition-action pair is structured as an IF-THEN statement [26]: IF <condition> THEN <action>. Overgeneralization occurs when a learner acquires production rules whose IF part is overly broad compared to the correct IF part. In computational or logical terms, overgeneralization can happen due to the omission of logical AND operators in the IF part. Consider a target KC requiring multiple conditions for its activation, represented as IF A AND B THEN <action>. Overgeneralization might arise when a learner acquires a KC that omits part of the conditions, resulting in IF A THEN <action>.

It is crucial to distinguish the phenomenon of deceptive overgeneralization from the broader concept of "misconceptions." Consider a simple algebra problem: Anderson describes an observation that a student incorrectly solves the equation 2x = 6 by subtracting 2 from both sides, erroneously resulting in x = 4 instead of x = 3 [18]. Such misconceptions lead to actions that are clearly incorrect, allowing for immediate observation, feedback provision, and tailored subsequent training. In contrast, deceptive overgeneralization involves learners who, during closely monitored learning sessions, apply correct actions that are based on an incomplete understanding of the necessary conditions. These learners may later inappropriately apply these actions under unsuitable circumstances, often beyond the scrutiny of the initial learning. This highlights why deceptive overgeneralization is particularly "deceptive": learners are still observed to take correct actions, despite their misconceptions.

Furthermore, my research differs from prior studies that have primarily focused on distinguishing between superficial and deep features in learning. Superficial features, also known as shallow or surface features, are those that do not contribute to correct solution pathways [22, 27, 28]. For example, a learner chose to use a pie chart because the data is demographic (superficial) rather than categorical (deep) [22]. In contrast, my research investigates scenarios in which learners take correct actions based on a relevant yet incomplete set of features. Importantly, unlike superficial features, all these features belong to the correct solution pathways, thereby making the learners' understanding appear deceptively correct.

3.2. Stickiness of Deceptive Overgeneralization

The KLI framework delineates a relationship between observable and unobservable events: instructional events, learning events, and assessment events [19]. Instructional events cause learning events, which are unobservable processes that result in changes in KCs, such as the acquisition of new KCs or the refinement of existing KCs. The changes in KCs, in turn, cause learner performances that are observable during assessment events. Given that learning events are central yet unobservable, assessments are expected to be designed with the quality to accurately reflect the true nature of learning events. However, in cases of overgeneralization, certain designs may fall short. Using set theory, overgeneralization can be visualized as an inclusion relation, and we can identify a specific type of potential design flaw, as depicted in Figure 1.

Figure 1: Overgeneralization occurs when a learner acquires production rules whose IF part is a superset of the correct rule's IF part, covering an overly extended range. This relationship can be expressed as OvergeneralizedIF ⊇ CorrectIF. Cross marks within CorrectIF represent practice activities that cannot test for overgeneralization. If all practice activities fall within CorrectIF, focusing solely on correct actions, the instructional design will fail to identify whether learners have acquired the correct rule or an overgeneralization.

Many TEL environments, particularly those involving ITS, leverage automated evaluation and feedback mechanisms to deliver learning at scale. The reliance on these automated mechanisms can pose challenges for all stakeholders regarding deceptive overgeneralization. TEL tools might mistakenly provide positive feedback to learners who perform correct actions based on an inaccurate understanding of conditions, inadvertently reinforcing misconceptions. Instructors and researchers employing learning analytics or educational data mining are similarly at risk of being misled by seemingly satisfactory learning data, potentially missing opportunities for intervention and correction that address learners' incorrect understandings. Moreover, ITS, with their adaptive capabilities that dynamically select practice problems and assess mastery, might amplify these issues. The reliance on observed correctness by knowledge tracing algorithms can lead to premature conclusions about learner mastery, halting further necessary practice that aids genuine skill development and refinement, leaving those misconceptions unaddressed. As what is captured and reported by TEL tools appears correct, encouraging, and satisfactory, deceptive overgeneralization may be particularly "sticky" and resistant to detection and change.

3.3. Both Novices and Experts Could be Prone to Deceptive Overgeneralization

If "practice makes perfect" were true to the extent that well-developed expertise guaranteed refined and accurate skills, then deceptive overgeneralization could be effectively addressed by providing ample practice opportunities in favorable learning conditions. However, I argue that even experts are not immune to deceptive overgeneralization, despite their considerable mastery of skills. Ambrose et al. [15, p.97] modeled mastery and its development into four stages, as illustrated in Figure 2. As this model suggests, while competence develops in a more-or-less linear fashion, consciousness initially increases and then decreases, as both novices (in Stage 1) and experts (in Stage 4) operate in states of relative unconsciousness, though for vastly different reasons [15, p.97]. I contend that deceptive overgeneralization may occur during any stage transition, including transitions towards Stage 4. Experts, as they develop their proficiency and automaticity, may also be prone to forming inaccurate heuristics and cognitive shortcuts to enable fast task completion.

Figure 2: The Four Stages of Mastery: (1) Unconscious Incompetence (do not know what they do not know); (2) Conscious Incompetence (recognize what they do not know and need to learn); (3) Conscious Competence (act deliberately with considerable competence); (4) Unconscious Competence (act automatically and instinctively). This model illustrates the progression from novice to expert, highlighting the development of competence and the shifting levels of consciousness.

An example demonstrating that experts can form deceptive overgeneralization, and that deceptive overgeneralization can lead to severe consequences, is the Crossair Flight 498 crash. The official incident investigation report identified one human-factor probable cause as follows: "when interpreting the attitude display instruments under stress, the commander resorted to a reaction pattern (heuristics) which he had learned earlier" [29, p.10].

As demonstrated in Figure 3, a Soviet attitude display indicates a left roll of the airplane with a counter-clockwise rotation. The appropriate response, detailed in Algorithm 1, is to stabilize the airplane by rotating it right. This rule acts as a cognitive shortcut that simplifies decision-making by minimizing the cognitive load needed to interpret the display. However, errors can arise if this shortcut is overgeneralized, omitting the condition that it should only apply to Soviet displays, leading to incorrect responses with other types of attitude displays.

Algorithm 1: Correct Production Rule for Interpreting a (Soviet) Attitude Display to Stabilize an Airplane
  if the goal is to stabilize an airplane and the attitude display rotates counter-clockwise and it is a Soviet display then
    rotate the airplane right
  end if

Figure 3: A simplified depiction of a Soviet attitude display, (a) before and (b) after a counter-clockwise rotation. The display reflects a "third-person view", where the horizon stays fixed, and the airplane's position is shown relative to the horizon. A counter-clockwise rotation (of the airplane relative to the horizon) indicates that the airplane is rolling left.

Figure 4: A simplified depiction of a Western attitude display, before and after a counter-clockwise rotation. The display reflects a "first-person view", where the airplane stays fixed and the horizon rotates relative to the airplane. A counter-clockwise rotation (of the horizon relative to the airplane) indicates that the airplane is rolling right.

For the first 20 years of his flying career, the commander received training that was "in theory comprehensive," exclusively at a flying school in the former Soviet Union [29, p.18]. However, upon transitioning to aircraft equipped with Western systems, no special differential training was provided to highlight the differences between Eastern and Western systems, nor did the commander undergo any unusual attitude training [29, p.19]. Therefore, the commander "had no opportunity to be trained in any other pattern of behavior" [29, p.96], meaning no opportunities to ever detect and correct the acquired deceptive overgeneralization. As the commander resorted to the overgeneralization in the scenario illustrated in Figure 4, the commander kept rotating the airplane right (further) when the airplane was already rolling right, eventually resulting in a loss of control.

The acquisition of shortcuts can be modeled using a process called knowledge compilation in ACT-R theory, which serves to eliminate multiple production firings and the need for retrieval from declarative memory [4, p.169]. A primary compilation process, known as composition, takes sequences of productions that follow each other in solving a particular problem and collapses them into a single "macro-production" that has the effect of the sequence [2, p.235]. For example, Algorithm 1 could be compiled as shown in Algorithm 2. These production rules are intentionally represented in pseudo code, mimicking the implementation style of cognitive tutors developed with CTAT [24]. This representation serves to highlight several benefits of composition: fewer conditions and actions, fewer variables to track, and the elimination of redundant subgoals. These optimizations enhance the efficiency of the macro-production compared to the original series of separate productions [5, p.35].

Algorithm 2: Knowledge Compilation for Interpreting an Attitude Display to Stabilize an Airplane
  Rule P1:
    Condition: goal == stabilizeAirplane AND rollDirection == unknown
    Action: subgoal = identifyRollDirection
  Rule P2:
    Condition: subgoal == identifyRollDirection AND displayRotation == counterClockwise AND displayType == Soviet
    Action: rollDirection = left
  Rule P3:
    Condition: goal == stabilizeAirplane AND rollDirection != unknown
    Action: subgoal = recoverAttitude
  Rule P4:
    Condition: subgoal == recoverAttitude AND rollDirection == left
    Action: rotateAirplane(right)
  Composed Rule P1&P2&P3&P4:
    Condition: goal == stabilizeAirplane AND displayRotation == counterClockwise AND displayType == Soviet
    Action: rotateAirplane(right)
  Efficiency Gain: 2 subgoals, 4 conditions, 3 intermediate cognitive actions, and 2 variables are eliminated by composition

However, it is possible that even experts who have mastered accurate basic production rules may develop inaccurate "macro-productions" during the process of building proficiency and automaticity if errors enter into the compilation process. Although composition increases overall efficiency by pruning redundant conditions and actions, these composed macro-productions tend to grow larger, particularly with an increase in the size of the condition sides [2, p.239]. With an increasingly complex and composite condition side, it becomes more likely that some conditions will be overlooked, potentially leading to overgeneralization. While human compilation is gradual (in contrast to computer compilation), which may provide some protection against errors of omitting conditional tests from entering compilation, this protection is not infallible and can only reduce, but not eliminate, the possibility of condition omission [5, p.46].

Knowledge compilation in ACT-R theory suggests that new productions generated through knowledge compilation do not replace, but rather coexist with, old ones [2, p.237]. A process known as conflict resolution then determines which productions to apply [2, p.132]. This raises the question of why the commander chose the overgeneralized shortcut over the basic alternative productions. The ACT-R strengthening mechanism might provide an explanation [2, p.250]. Production strength reflects the frequency of successful past applications [2, p.133]. Over the years, while flying Soviet aircraft, this shortcut, despite being overgeneralized, consistently led to correct actions within the context of Soviet attitude displays. This increased production strength may have made the shortcut the preferred choice during conflict resolution. Another contributing factor to the commander's selection of the overgeneralization could be medication effects, which potentially limited the commander's cognitive capacity [29, p.107]. The improved efficiency of the composed shortcut may have prompted the commander to favor the overgeneralized macro-production over a sequence of basic productions, especially under stress requiring immediate action, and possibly while multitasking. Such demanding and stressful scenarios are common, particularly in fields where individuals are considered experts and carry critical responsibilities. Moreover, situations involving limited cognitive capacity can occur to anyone. The ability to perform under conditions of stress, sleep deprivation, or fatigue is crucial, as is the capability to effectively manage simultaneous secondary tasks [30]. This indicates that overgeneralized shortcuts may be widespread, which highlights the importance of understanding their mechanisms through research.

The commander's extensive experience, amounting to over 8,000 hours [29, p.15], categorizes him within Stage 4 of the mastery model illustrated in Figure 2, where individuals are capable of acting automatically and instinctively. However, this incident starkly demonstrates that such automatic actions performed by experts, when based on deceptive overgeneralization, can lead to dire consequences. A similar case that exemplifies the dangers of overgeneralization in aviation training is the American Airlines Flight 587 crash, where poorly designed training led to deceptive overgeneralization, resulting in actions deemed correct during training but inappropriate for actual conditions, ultimately leading to catastrophic outcomes. Specifically, the American Airlines Advanced Aircraft Maneuvering Program included an excessive bank angle simulator exercise intended to prepare pilots for extreme wake turbulence. This equipped trainees with aggressive roll upset recovery techniques. Unfortunately, the scenario used in training was overly extreme and not representative of the actual aircraft type involved. This inappropriate training "enabled" the first officer to mistakenly apply these excessive techniques during a moderate wake turbulence encounter, leading to the in-flight separation of the vertical stabilizer and culminating in a fatal plane nosedive [31]. It can be argued that had the pilot not been trained to perform such aggressive maneuvers, the disaster could have been entirely avoided.

In summary, acquiring a production rule that pairs correct actions with incorrect conditions is an undesirable learning outcome, which at best might later be rectified without severe repercussions, and at worst could result in catastrophic outcomes.

3.4. Summary

This section presented the problem identification and examination of the phenomenon of deceptive overgeneralization through literature review and case studies, yielding several key characteristics of deceptive overgeneralization that underscore the need for further investigation:
Deceptive overgeneralization is prevalent across to favor the overgeneralized macro-production over a various domains. sequence of basic productions, especially under stress 2. Deceptive overgeneralization can be “sticky”, dif- requiring immediate action, and possibly while multitask- ficult to detect and resistant to change. ing. Such demanding and stressful scenarios are common, 3. In certain cases, deceptive overgeneralization can particularly in fields where individuals are considered be worse learning outcomes than if the skill had experts and carry critical responsibilities. Moreover, sit- not been learned at all. uations involving limited cognitive capacity can occur 4. Both novices and experts could be prone to de- to anyone. The ability to perform under conditions of ceptive overgeneralization. stress, sleep deprivation, or fatigue is crucial, as is the capability to effectively manage simultaneous secondary tasks [30]. This indicates that overgeneralized shortcuts Table 2 Summary of Methodologies for Each Research Question Research Question Methodology RQ1: Formation Experiments followed by Think-Aloud Studies; RCTs. RQ2: Detection and Diagnosis RCTs RQ3: Remediation RCTs RQ4: Retrospective Discovery EDM techniques using both synthetic and authentic datasets 4. Research Questions task-loop adaptivity. However, these systems are not specifically designed to prevent deceptive overgeneral- My doctoral research aims to investigate the mechanisms ization. My experimental design draws inspiration from of deceptive overgeneralization using the context of ITS studies on the Einstellung effect, which describes how and develop effective strategies for addressing deceptive practice with a fixed method can bias individuals toward overgeneralization. The proposed research questions applying this method even when better alternatives ex- are structured to methodically examine the formation, ist [33]. 
detection, remediation, and retrospective discovery of deceptive overgeneralization:

RQ1: Formation of Deceptive Overgeneralization. What types of production rules are most susceptible to deceptive overgeneralization? Under what conditions do ITS risk promoting deceptive overgeneralization?

RQ2: Detection and Diagnosis of Deceptive Overgeneralization. What features can be integrated into ITS to detect and diagnose deceptive overgeneralization?

RQ3: Remediation of Deceptive Overgeneralization. What instructional strategies are effective at correcting deceptive overgeneralization?

RQ4: Retrospective Discovery of Past Deceptive Overgeneralization. Can Educational Data Mining (EDM) techniques discover previously undetected deceptive overgeneralization from existing education datasets?

5. Methodology

This section outlines the research methodologies corresponding to each of the research questions guiding my doctoral study. To rigorously investigate the phenomenon of deceptive overgeneralization, a diverse methodological approach will be employed, ranging from experiments and think-aloud studies to EDM techniques, as summarized in Table 2.

RQ1: Formation of Deceptive Overgeneralization. The initial step in my research is to evaluate the hypothesized design flaw illustrated in Figure 1. This hypothesis suggests that when a series of practice activities evaluates only whether learners have performed the expected actions, such instructional designs may not adequately determine whether learners have internalized the correct rule or an overgeneralization. My research strategy includes conducting experiments with ITS that adhere to best practices in ITS design, such as cognitive model development through Cognitive Task Analysis (CTA) [32] and tailored hints and feedback. In my experiments, learners will practice with an ITS until they have achieved mastery as judged by the ITS. Subsequently, these learners will face tasks where the actions they have learned are no longer suitable. As my research contends that ITS may have limitations in accurately assessing true skill mastery, the research plan will also incorporate qualitative data collected through think-aloud studies [34]. Specifically, "graduated novices"—learners who have completed training and are judged by the ITS to have mastered the content—will verbalize their understanding of the relevant conditions during these sessions, in order to identify instances of deceptive overgeneralization. Next, to ascertain under what conditions ITS may inadvertently promote deceptive overgeneralization, and to identify which features of instructional design are most susceptible to fostering these errors, my research plan includes conducting randomized controlled trials (RCTs) that compare different ITS interface designs and problem sequencing.

RQ2: Detection and Diagnosis of Deceptive Overgeneralization. To investigate features that can be integrated into ITS for effectively detecting and diagnosing deceptive overgeneralization, RCTs will be conducted to compare different ITS interface designs and problem sequencing. Traditionally, ITS interfaces are designed to guide learners toward correct actions, and they tend to omit interface elements representing incorrect actions that learners should avoid, since such elements do not belong to the prescribed solution pathway. Consequently, learners might attempt to perform incorrect actions but find themselves unable to do so, leaving those mistakes undetected, uncorrected, and unlogged. One hypothesized effective design is to provide practice opportunities where "lack of action" is the correct response. Although detecting non-actions poses more challenges than evaluating actions, an ITS design that incorporates interface elements learners should avoid interacting with can make "lack of action" observable and can test whether learners appropriately refrain from acting when the conditions do not warrant it. This approach is similar to including distractor options in multiple-choice questions (MCQs), where learners must correctly identify such options and decide against choosing them. Of course, the expertise reversal effect [11] suggests that such distractor interface elements should only be introduced once learners have reached a certain level of skill mastery, to ensure that cognitive workload remains manageable.
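As an illustration of the design idea of making "lack of action" observable, the sketch below models a tutoring step that exposes distractor elements and logs interactions with them, so that an attempted overgeneralized action becomes a detectable, correctable event rather than a silently impossible one. The class, element names, and event labels are hypothetical illustrations, not part of CTAT's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class DistractorAwareStep:
    """A tutoring step that exposes both the correct control and distractor
    controls that learners should refrain from using. Interactions with
    distractors are logged instead of being impossible and thus invisible."""
    correct_element: str
    distractor_elements: set
    log: list = field(default_factory=list)

    def interact(self, element):
        """Record the interaction and return whether it was correct."""
        if element == self.correct_element:
            self.log.append(("correct_action", element))
            return True
        if element in self.distractor_elements:
            # The mistake is now detected, correctable, and logged.
            self.log.append(("overgeneralized_action", element))
            return False
        self.log.append(("other_action", element))
        return False

# Hypothetical step: submitting an unsimplified fraction is the distractor.
step = DistractorAwareStep("submit_simplified_fraction",
                           {"submit_unsimplified_fraction"})
step.interact("submit_unsimplified_fraction")  # attempted overgeneralized action
step.interact("submit_simplified_fraction")
```

With a conventional interface the first interaction could not occur at all, and the learner's misconception would leave no trace in the log.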
RQ3: Remediation of Deceptive Overgeneralization. Similar to RQ2, RCTs comparing different ITS interface designs and problem sequencing will be conducted. One instructional design hypothesized to be effective involves providing side-by-side comparisons between scenarios that do and do not warrant certain actions. This approach requires learners to identify differences in problem features, facilitating a deeper understanding of when specific actions are appropriate.

Incorporating both RQ2 and RQ3, the problem sequencing design pattern illustrated in Algorithm 3 is hypothesized to aid both initial induction and subsequent refinement, and to detect, diagnose, and remedy deceptive overgeneralization. The checkSAI() function, as in CTAT, represents the automated evaluation by which an ITS compares learner actions with reference actions [24].

Algorithm 3: Problem Sequencing Design Hypothesized to Aid in Initial Induction and Subsequent Refinement

  Target Knowledge Component (KC):
    if A AND B then … end if
  Potential Overgeneralization:
    if A then … end if
  Problem Type 1: Designed for Induction
    if A AND B then checkSAI() end if
  Problem Type 2: Designed for Refinement
    Problem Subtype 2.1: Unsuitable Context
      if A AND NOT B then checkSAI(NO) end if
    Problem Subtype 2.2: Insufficient Information
      if A AND Missing Info about B then checkSAI("Not Enough Info") end if
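The sequencing pattern of Algorithm 3 can be sketched in a few lines of code. This is an illustrative reduction, not CTAT's implementation: checkSAI() actually compares a learner's full selection–action–input against a reference, which is simplified here to matching a single response string, and the feature and response names are my own.

```python
# A minimal sketch of the problem-sequencing pattern in Algorithm 3:
# one induction problem plus two refinement problems, evaluated by a
# simplified stand-in for CTAT's checkSAI().

def check_sai(learner_response, reference_response):
    """Simplified automated evaluation: exact match against the reference."""
    return learner_response == reference_response

def correct_rule(a, b):
    """Target knowledge component: act only when both A and B hold."""
    if b is None:
        return "not enough info"
    return "act" if (a and b) else "no"

def overgeneralized_rule(a, b):
    """Deceptive overgeneralization: acts whenever A holds, ignoring B."""
    return "act" if a else "no"

# One problem of each type: (features, reference response).
problems = [
    ({"a": True, "b": True},  "act"),              # Type 1: induction
    ({"a": True, "b": False}, "no"),               # Subtype 2.1: unsuitable context
    ({"a": True, "b": None},  "not enough info"),  # Subtype 2.2: insufficient info
]

outcome = {
    rule.__name__: [check_sai(rule(f["a"], f["b"]), ref) for f, ref in problems]
    for rule in (correct_rule, overgeneralized_rule)
}
```

Both rules pass the induction problem, so a tutor that assigns only Type 1 problems would judge both learners as having mastered the skill; only the refinement problems separate the correct rule from the overgeneralized one.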
RQ4: Retrospective Discovery of Past Deceptive Overgeneralization. In addition to designing and conducting experiments specifically to investigate deceptive overgeneralization, my research could better contribute to the TEL community if there is evidence that the research findings can also generate actionable insights from existing datasets. Therefore, the last research question focuses on retrospective analysis to discover past deceptive overgeneralizations, using learning datasets already collected through standard procedures. My research plans to employ learning curve analysis facilitated by DataShop [35], which graphically represents changes in learner performance, visualizing any improvement or stagnation as learners engage in repeated practice opportunities [36]. ITS developed with CTAT typically store their learning logs in DataShop, making them ready candidates for retrospective analysis.

To effectively visualize and demonstrate learning curves that may indicate overgeneralization, I will start with synthetic data. Synthetic data, artificially generated by computer algorithms rather than derived from real-world events, mimics authentic datasets. The ethical generation and application of synthetic data is a widely accepted practice in the learning sciences, particularly within Educational Data Mining (EDM), as evidenced by its use in numerous EDM research studies [37, 38, 39, 40, 41]. Synthetic data addresses the complexities of authentic learner data, aiding in the validation of models for skill mastery assessment, and can faithfully reflect reality when properly modeled [41].

To examine how deceptive overgeneralization affects learning trajectories, BKT was used to simulate performance under the problem sequencing illustrated in Algorithm 3, with the following parameters: p_initial = 0.5, p_transition = 0.2, p_slip = 0.1, and p_guess = 0.2. The learning process is modeled with a single KC with three possible states: Unlearned, Overgeneralized, and Learned. This approach adheres to the BKT framework by treating learning progression as transitions between states. As learners in the Unlearned state receive repeated practice opportunities, they may remain in the Unlearned state, transition to the Overgeneralized state, or move directly to the Learned state. Learners in the Overgeneralized state can progress to the Learned state only through problems designed for refinement. Another core assumption of the simulation is the probability of a correct response given the knowledge state and the problem phase, as illustrated in Table 3. Problems designed for induction can be answered correctly (unless a slip occurs) using either the correct generalization or an overgeneralization. On problems designed for refinement, learners who remain in the Unlearned state or who have adopted the overgeneralized rule are expected to answer incorrectly most of the time. However, rather than guessing like learners in the Unlearned state, learners in the Overgeneralized state answer incorrectly unless a slip occurs, which reflects how learners with a deceptive overgeneralization "confidently" make mistakes when the conditions do not actually warrant the actions.
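The three-state simulation described above can be sketched as follows. This is a minimal re-implementation under the stated parameters, not the study's actual code; in particular, the text does not specify how the initial probability mass and the transition probability split between the Overgeneralized and Learned states, so an even split is assumed here, and the transition is applied after each practice opportunity.

```python
import random

# Parameters as stated in the text.
P_INIT, P_TRANSIT, P_SLIP, P_GUESS = 0.5, 0.2, 0.1, 0.2

UNLEARNED, OVERGENERALIZED, LEARNED = "Unlearned", "Overgeneralized", "Learned"

def answer_correctly(state, phase, rng):
    """Response model from Table 3: induction follows original BKT; the
    refinement phase uses a reverse-slip model for the Overgeneralized state."""
    if state == LEARNED:
        return rng.random() < 1 - P_SLIP
    if state == UNLEARNED:
        return rng.random() < P_GUESS
    # Overgeneralized: looks mastered during induction, but during refinement
    # answers correctly only when a slip occurs.
    p = (1 - P_SLIP) if phase == "induction" else P_SLIP
    return rng.random() < p

def transition(state, phase, rng):
    """State update after one opportunity. ASSUMPTION: the transition mass
    out of Unlearned is split evenly between the two knowing states."""
    if state == UNLEARNED and rng.random() < P_TRANSIT:
        return rng.choice([OVERGENERALIZED, LEARNED])
    # Overgeneralized learners are remediated only by refinement problems.
    if state == OVERGENERALIZED and phase == "refinement" and rng.random() < P_TRANSIT:
        return LEARNED
    return state

def simulate(n_students=10000, n_induction=10, n_refinement=10, seed=0):
    """Return the fraction of correct responses at each practice opportunity."""
    rng = random.Random(seed)
    # ASSUMPTION: P_INIT is taken as P(not Unlearned) at the start, split
    # evenly between the Overgeneralized and Learned states.
    students = [rng.choice([OVERGENERALIZED, LEARNED])
                if rng.random() < P_INIT else UNLEARNED
                for _ in range(n_students)]
    fractions = []
    for step in range(n_induction + n_refinement):
        phase = "induction" if step < n_induction else "refinement"
        correct = 0
        for i, state in enumerate(students):
            correct += answer_correctly(state, phase, rng)
            students[i] = transition(state, phase, rng)
        fractions.append(correct / n_students)
    return fractions

curve = simulate()
```

Under these assumptions the aggregate curve climbs during induction, drops sharply at the switch to refinement problems (driven by the overgeneralized subpopulation), and then recovers as refinement remediates the overgeneralization.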
Table 3: Probability of Correct Responses Based on Knowledge State and Problem Phase

  State           | Induction Phase (Same as Original BKT) | Refinement Phase (Reverse Slip Model)
  Unlearned       | P_GUESS                                | P_GUESS
  Overgeneralized | 1 - P_SLIP                             | P_SLIP
  Learned         | 1 - P_SLIP                             | 1 - P_SLIP

Figure 5: Simulated Performance Trends

Figure 5 visualizes the simulated performance trends of learners under the above assumptions. First, during the induction phase, performance is not distinguishable between the "Ever Overgeneralized" group and the "Directly Learned" group. Second, the "Ever Overgeneralized" group (red line), comprising learners who at any point acquired the Overgeneralized state, exhibits a significant and sudden performance drop when transitioning to the refinement phase, which corresponds to the problems designed to detect overgeneralization. This drop starkly contrasts with the stable performance growth of the "Directly Learned" group (blue line), comprising learners who transitioned directly from the Unlearned to the Learned state. The performance recovery of the "Ever Overgeneralized" group after the drop demonstrates the remediation of overgeneralization.

My future research plan is to transition from synthetic to authentic datasets by collaborating with other researchers to perform retrospective analysis on existing datasets.
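As a simple illustration of the learning curve analysis used for retrospective discovery, the error rate at each practice opportunity can be aggregated from logged first-attempt correctness. The log format below is a made-up stand-in, not DataShop's actual schema.

```python
from collections import defaultdict

# Hypothetical log rows: (student_id, kc, opportunity_number, first_attempt_correct).
log = [
    ("s1", "simplify", 1, False), ("s1", "simplify", 2, True),  ("s1", "simplify", 3, True),
    ("s2", "simplify", 1, False), ("s2", "simplify", 2, False), ("s2", "simplify", 3, True),
    ("s3", "simplify", 1, True),  ("s3", "simplify", 2, True),  ("s3", "simplify", 3, True),
]

def learning_curve(rows, kc):
    """Error rate per practice opportunity for one knowledge component,
    in the spirit of DataShop's learning-curve reports."""
    totals, errors = defaultdict(int), defaultdict(int)
    for _, row_kc, opportunity, correct in rows:
        if row_kc == kc:
            totals[opportunity] += 1
            errors[opportunity] += (not correct)
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}

kc_curve = learning_curve(log, "simplify")  # error rate falls across opportunities
```

A smoothly falling curve suggests ordinary learning; a curve that falls during induction-style problems and then spikes when refinement-style problems appear would be the retrospective signature of deceptive overgeneralization hypothesized above.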
6. Contribution to TEL

In my doctoral research, I plan to extend the ACT-R cognitive architecture to tackle deceptive overgeneralization. My research seeks to refine the adaptivity of ITS and enable more accurate assessments of true skill mastery. This work contributes to Technology-Enhanced Learning (TEL) by enhancing the precision of automated assessments and supporting more reliable adaptive learning experiences.

References

[1] V. Aleven, E. A. McLaughlin, R. A. Glenn, K. R. Koedinger, Instruction based on adaptive learning technologies, Handbook of Research on Learning and Instruction 2 (2016) 522–560.
[2] J. R. Anderson, The Architecture of Cognition, Harvard University Press, USA, 1983.
[3] J. R. Anderson, ACT: A simple theory of complex cognition, American Psychologist 51 (1996) 355.
[4] J. R. Anderson, Automaticity and the ACT theory, The American Journal of Psychology (1992) 165–180.
[5] J. R. Anderson, Acquisition of cognitive skill, Psychological Review 89 (1982) 369.
[6] V. Aleven, B. M. McLaren, J. Sewall, M. Van Velsen, O. Popescu, S. Demi, M. Ringenberg, K. R. Koedinger, Example-tracing tutors: Intelligent tutor development for non-programmers, International Journal of Artificial Intelligence in Education 26 (2016) 224–269.
[7] K. VanLehn, The behavior of tutoring systems, International Journal of Artificial Intelligence in Education 16 (2006) 227–265.
[8] B. P. Woolf, Building Intelligent Interactive Tutors: Student-Centered Strategies for Revolutionizing E-Learning, Morgan Kaufmann, 2010.
[9] L. S. Vygotsky, M. Cole, Mind in Society: Development of Higher Psychological Processes, Harvard University Press, 1978.
[10] A. Collins, J. S. Brown, A. Holum, et al., Cognitive apprenticeship: Making thinking visible, American Educator 15 (1991) 6–11.
[11] S. Kalyuga, The expertise reversal effect, in: Managing Cognitive Load in Adaptive Multimedia Learning, IGI Global, 2009, pp. 58–80.
[12] K. R. Koedinger, V. Aleven, Exploring the assistance dilemma in experiments with cognitive tutors, Educational Psychology Review 19 (2007) 239–264.
[13] National Research Council, How People Learn: Brain, Mind, Experience, and School: Expanded Edition, The National Academies Press, Washington, DC, 2000. doi:10.17226/9853.
[14] V. J. Shute, Focus on formative feedback, Review of Educational Research 78 (2008) 153–189.
[15] S. A. Ambrose, M. W. Bridges, M. DiPietro, M. C. Lovett, M. K. Norman, How Learning Works: Seven Research-Based Principles for Smart Teaching, John Wiley & Sons, 2010.
[16] K. R. Koedinger, A. Corbett, et al., Cognitive tutors: Technology bringing learning sciences to the classroom, 2006.
[17] A. T. Corbett, J. R. Anderson, Knowledge tracing: Modeling the acquisition of procedural knowledge, User Modeling and User-Adapted Interaction 4 (1994) 253–278.
[18] J. R. Anderson, C. D. Schunn, Implications of the ACT-R learning theory: No magic bullets, in: Advances in Instructional Psychology, Volume 5, Routledge, 2013, pp. 1–33.
[19] K. R. Koedinger, A. T. Corbett, C. Perfetti, The knowledge-learning-instruction framework: Bridging the science-practice chasm to enhance robust student learning, Cognitive Science 36 (2012) 757–798.
[20] V. A. Aleven, K. R. Koedinger, An effective metacognitive strategy: Learning by doing and explaining with a computer-based cognitive tutor, Cognitive Science 26 (2002) 147–179.
[21] H. Zhao, K. Koedinger, J. Kowalski, Knowledge tracing and cue contrast: Second language English grammar instruction, in: Proceedings of the Annual Meeting of the Cognitive Science Society, volume 35, 2013.
[22] N. M. Chang, Learning to Discriminate and Generalize through Problem Comparisons, Ph.D. thesis, Carnegie Mellon University, 2006.
[23] LearnLab, Feature validity, 2011. URL: https://learnlab.org/wiki/index.php?title=Feature_validity. [Online; accessed 29-May-2024].
[24] V. Aleven, B. McLaren, J. Sewall, K. R. Koedinger, Example-tracing tutors: A new paradigm for intelligent tutoring systems (2009).
[25] V. Aleven, B. M. McLaren, J. Sewall, K. R. Koedinger, The cognitive tutor authoring tools (CTAT): Preliminary evaluation of efficiency gains, in: Intelligent Tutoring Systems: 8th International Conference, ITS 2006, Jhongli, Taiwan, June 26–30, 2006, Proceedings 8, Springer, 2006, pp. 61–70.
[26] K. R. Koedinger, J. R. Anderson, W. H. Hadley, M. A. Mark, Intelligent tutoring goes to school in the big city, International Journal of Artificial Intelligence in Education 8 (1997) 30–43.
[27] M. T. Chi, P. J. Feltovich, R. Glaser, Categorization and representation of physics problems by experts and novices, Cognitive Science 5 (1981) 121–152.
[28] B. H. Ross, Remindings and their effects in learning a cognitive skill, Cognitive Psychology 16 (1984) 371–416.
[29] Aircraft Accident Investigation Bureau, Final report of the Aircraft Accident Investigation Bureau on the accident to the Saab 340B aircraft, registration HB-AKK, of Crossair flight CRX 498 on 10 January 2000 near Nassenwil/ZH, 2002.
[30] R. A. Schmidt, R. A. Bjork, New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training, Psychological Science 3 (1992) 207–218.
[31] National Transportation Safety Board, In-flight separation of vertical stabilizer, American Airlines flight 587, Airbus Industrie A300-605R, N14053, Belle Harbor, New York, November 12, 2001, National Transportation Safety Board 490 (2001).
[32] J. M. Schraagen, S. F. Chipman, V. L. Shalin, Cognitive Task Analysis, Psychology Press, 2000.
[33] A. S. Luchins, Mechanization in problem solving: The effect of Einstellung, Psychological Monographs 54 (1942) i.
[34] K. A. Ericsson, H. A. Simon, How to study thinking in everyday life: Contrasting think-aloud protocols with descriptions and explanations of thinking, Mind, Culture, and Activity 5 (1998) 178–186.
[35] K. R. Koedinger, R. S. Baker, K. Cunningham, A. Skogsholm, B. Leber, J. Stamper, A data repository for the EDM community: The PSLC DataShop, Handbook of Educational Data Mining 43 (2010) 43–56.
[36] K. R. Koedinger, R. S. Baker, K. Cunningham, A. Skogsholm, B. Leber, J. Stamper, A data repository for the EDM community: The PSLC DataShop, Handbook of Educational Data Mining 43 (2010) 43–56.
[37] M. M. Rahman, Y. Watanobe, T. Matsumoto, R. U. Kiran, K. Nakamura, Educational data mining to support programming learning using problem-solving data, IEEE Access 10 (2022) 26186–26202.
[38] N. Ndou, R. Ajoodha, A. Jadhav, Educational data-mining to determine student success at higher education institutions, in: 2020 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), IEEE, 2020, pp. 1–8.
[39] J. M. Patil, S. R. Gupta, Extracting knowledge in large synthetic datasets using educational data mining and machine learning models, in: Soft Computing for Intelligent Systems: Proceedings of ICSCIS 2020, Springer, 2021, pp. 167–175.
[40] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, J. Sohl-Dickstein, Deep knowledge tracing, Advances in Neural Information Processing Systems 28 (2015).
[41] M. C. Desmarais, I. Pelczer, On the faithfulness of simulated student performance data, in: Educational Data Mining 2010, 2010.