Extending ACT-R to Tackle Deceptive Overgeneralization in Intelligent Tutoring Systems

Marshall An, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Abstract
This research extends the ACT-R cognitive architecture to tackle deceptive overgeneralization within Intelligent Tutoring Systems (ITS). Existing adaptive learning technologies, while effective, rely on learning data that may not fully capture the nuances of learner understanding, particularly in cases of deceptive overgeneralization. This phenomenon occurs when learners exhibit correct actions during monitored learning sessions, yet these actions are grounded in an incomplete understanding of the necessary conditions. Due to the reliance on observed correctness, ITS may falsely assess mastery, potentially ceasing to provide further necessary practice opportunities that could aid in the refinement of understanding. This study aims to identify ITS designs that may inadvertently foster such misconceptions and to develop methods for their detection, diagnosis, and correction. Utilizing experimental designs, think-aloud protocols, and educational data mining, the research seeks to refine the adaptivity of ITS and enable more accurate assessments of true skill mastery. This work contributes to Technology-Enhanced Learning (TEL) by enhancing the precision of automated assessments and supporting more reliable adaptive learning experiences.

Keywords
Adaptive Learning, Intelligent Tutoring Systems, Instructional Design, Feedback, Educational Data Mining, Bayesian Knowledge Tracing

1. Introduction
Adaptive learning technologies, powered by learning data and dynamically adjusting to individual learner needs, have proven effective across various educational settings [1]. However, by definition, any type of adaptivity relies on data reflecting student learning [1, p.523]. The accuracy and completeness of learning data are therefore critical. There are instances, however, where the learning data may fall short, particularly in cases of deceptive overgeneralization.

Deceptive overgeneralization describes an undesired learning state wherein a learner acquires a relevant but incomplete subset of the conditions necessary for a skill, yet manages to perform the correct actions. Such overgeneralization is "deceptive", as it can lead to seemingly satisfactory performance during scrutinized learning sessions, as the learner's observable actions align with those of individuals who have accurately mastered the skill. However, these actions are based on a flawed understanding of the underlying conditions.

Deceptive overgeneralization poses a significant challenge, leading to false evaluations of mastery, which drives adaptivity. This can mislead learners, instructors, and researchers into becoming prematurely convinced that a skill has been mastered. Many Technology-Enhanced Learning (TEL) environments, especially those utilizing Intelligent Tutoring Systems (ITS) with adaptive capabilities that dynamically select practice problems based on estimated skill mastery, might amplify the issue of deceptive overgeneralizations. Such environments may prematurely cease providing further necessary practice opportunities that aid in the refinement of understandings, leaving these inaccuracies unaddressed. The consequences of failing to detect and address deceptive overgeneralizations can extend beyond academic performance, potentially affecting long-term educational pathways, career trajectories, and in some cases, leading to dire consequences.

My doctoral research aims to investigate the mechanisms of deceptive overgeneralization by applying and extending the well-established cognitive architecture, Adaptive Control of Thought – Rational (ACT-R) [2, 3]. This study aims to uncover how certain designs of ITS might overlook subtle instances of deceptive overgeneralization and to investigate design principles that can detect and remedy them. Ultimately, my research seeks to contribute to the advancement of adaptive learning technologies, enhancing their effectiveness as educational solutions.

Proceedings of the Doctoral Consortium of the 19th European Conference on Technology Enhanced Learning, 16th September 2024, Krems an der Donau, Austria
haokanga@andrew.cmu.edu (M. An); ORCID 0009-0005-5165-640X (M. An)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

2. Literature Review

2.1. Adaptive Control of Thought – Rational (ACT-R)

ACT-R, a cognitive architecture for understanding and modeling human cognitive processes, posits that cognitive behaviors are orchestrated by productions [4, 3]. A production can be represented as a condition-action pair [2, p.5], where the condition part specifies the circumstances under which the production can apply, and the action part specifies what should be done when the production applies [5, p.3]. ACT-R has significantly influenced the development of ITS, which deliver personalized tutoring by adapting to the unique learning needs of each learner. Empirical studies underpinning ACT-R have led to a proliferation of ITS that successfully enhance learning outcomes across diverse educational settings [6, 7, 8]. These systems, particularly cognitive tutors, require the development and integration of domain-specific cognitive models that adhere to the ACT-R framework, to capture various learner strategies and potential misconceptions.

2.2. Adaptivity of Intelligent Tutoring Systems

Adaptive learning, fundamental to ITS efficacy, is supported by various theoretical perspectives, such as Vygotsky's zone of proximal development [9], the cognitive apprenticeship model [10], the expertise reversal effect [11], and the assistance dilemma [12]. The efficacy of ITS in improving learning outcomes is largely attributable to their adaptivity, which allows for personalized learning based on individual learner progress and needs.

Adaptivity is not a binary property, but rather "a matter of degree" [1, p.523]. ITS distinguish themselves by adapting across all three major time scales defined by the Adaptivity Grid: step, task, and design [1, p.525].

Within the step-loop, ITS provide timely and targeted feedback at each problem-solving step. Indeed, timely feedback is critical to enable learners to continuously monitor their learning and evaluate their problem-solving strategies and their current understanding [13]. The positive effects of feedback are well supported by the wealth of evidence in the literature review by Shute [14]. Feedback is most effective when it clearly highlights discrepancies between a learner's current performance and the desired outcome, while offering actionable guidance to help learners meet specific target criteria [15, p.139]. ITS embody these best practices of feedback by detecting and diagnosing observable discrepancies between expected and actual actions at each step. With a developed cognitive model, a cognitive tutor employs model tracing to compare learner actions at each problem-solving step against the possible actions generated by the cognitive model, in order to provide individualized, just-in-time learning support tailored to the learner's specific approach to a problem [16, p.142].

In the task-loop, ITS employ knowledge tracing algorithms such as Bayesian Knowledge Tracing (BKT) [17] to dynamically adjust problem sequences based on real-time assessments of learner mastery. Each time a learner attempts a step in a practice problem, the system updates its estimate of the learner's mastery of the relevant production rule based on the correctness of the learner's action [16, p.143]. This ongoing assessment allows ITS to dynamically tailor the sequence of problems, ensuring that each practice opportunity aligns with the learner's current skill level and learning trajectory. When the system reaches a high degree of certainty about a student's mastery of a skill through repeated observations of correct actions, typically exceeding a predefined threshold (e.g., 95%) [16, p.144], it ceases presenting tasks related to that skill. This automated stopping rule optimizes the balance between learning time and effort, preventing overpractice and maximizing educational efficiency.

Furthermore, design-loop adaptivity involves data-driven instructional (re)design, before and between iterations of ITS development, informed by learning data [1, p.526].

However, the adaptivity of ITS is not without limitations. One key challenge lies in addressing deceptive overgeneralization, where learners perform correct actions based on a flawed understanding of underlying conditions. This phenomenon challenges the assessment models of ITS, which typically rely on differentiating between correct versus incorrect actions to gauge mastery. As such, deceptive overgeneralization presents an intriguing area for further research.

3. Deceptive Overgeneralization as a Possible Learning State

Learning is typically characterized as a gradual and continuous process rather than sudden transformative insights [18]. The Knowledge-Learning-Instruction (KLI) framework views learning as the acquisition of Knowledge Components (KCs), which are acquired units of cognitive functions or structures [19]. The KLI framework identifies induction and refinement as one primary type of learning process, particularly for acquiring KCs associated with variable conditions: for KCs with conditions that can vary in form or value, learners must induce and refine KCs so that the acquired KCs are "accurate, appropriately general, and discriminating" [19]. As we consider the induction and subsequent refinement of a KC as a continuous learning progression, learners may initially acquire an inaccurately generalized version of the target KC. This initial misunderstanding may either be refined into an accurate KC through further practice, or it may persist as inaccurate due to a lack of practice opportunities that support the refinement process.

Table 1: Examples of Correct and Inaccurate Generalization in Knowledge Components Across Various Disciplines

Math / Geometry
  Correct KC: IF the triangle is isosceles AND two angles are at the base of the triangle THEN the two angles are equal
  Referenced Inaccurate KC: IF the triangle is isosceles AND two angles THEN the two angles are equal [20]

Language / English Articles
  Correct KC: IF single mountain name THEN zero article
  Referenced Inaccurate KC: IF mountain name THEN zero article [21]

Statistics / Data Visualization
  Correct KC: IF categorical data THEN choose pie chart
  Referenced Inaccurate KC: IF demographic data THEN choose pie chart [22]

3.1. Modeling of Deceptive Overgeneralization

A KC connects features of a problem to a corresponding response. A learner has acquired a KC that is considered accurate, or "with high feature validity", when all of the features are relevant to making the response and none of them are irrelevant [23]; otherwise, a KC is inaccurate and requires further refinement. Inaccurate generalization could be overgeneralization, undergeneralization, or an even more nuanced mix of them. Indeed, inaccurate generalization is a common phenomenon observed in learning sciences research across various disciplines. Table 1 presents examples of inaccurate generalization, along with their corresponding accurate KCs, drawn from the research literature. Among these, deceptive overgeneralization is particularly intriguing to investigate.

In ITS, specifically those developed using Cognitive Tutor Authoring Tools (CTAT) [24, 25], each production's condition-action pair is structured as an IF-THEN statement [26]: IF <condition> THEN <action>. Overgeneralization occurs when a learner acquires production rules whose IF part is overly broad compared to the correct IF part. In computational or logical terms, overgeneralization can happen due to the omission of logical AND operators in the IF part. Consider a target KC requiring multiple conditions for its activation, represented as IF A AND B THEN <action>. Overgeneralization might arise when a learner acquires a KC that omits part of the conditions, resulting in IF A THEN <action>.

It is crucial to distinguish the phenomenon of deceptive overgeneralization from the broader concept of "misconceptions." Consider a simple algebra problem: Anderson describes an observation that a student incorrectly solves the equation 2x = 6 by subtracting 2 from both sides, erroneously resulting in x = 4 instead of x = 3 [18]. Such misconceptions lead to actions that are clearly incorrect, allowing for immediate observation, feedback provision, and tailored subsequent training. In contrast, deceptive overgeneralization involves learners who, during closely monitored learning sessions, apply correct actions that are based on an incomplete understanding of the necessary conditions. These learners may later inappropriately apply these actions under unsuitable circumstances, often beyond the scrutiny of the initial learning. This highlights why deceptive overgeneralization is particularly "deceptive": learners are still observed to take correct actions, despite their misconceptions.

Furthermore, my research differs from prior studies that have primarily focused on distinguishing between superficial and deep features in learning. Superficial features, also known as shallow or surface features, are those that do not contribute to correct solution pathways [22, 27, 28]. For example, a learner chose to use a pie chart because the data is demographic (superficial) rather than categorical (deep) [22]. In contrast, my research investigates scenarios in which learners take correct actions based on a relevant yet incomplete set of features. Importantly, unlike superficial features, all these features belong to the correct solution pathways, thereby making the learners' understanding appear deceptively correct.

3.2. Stickiness of Deceptive Overgeneralization

The KLI framework delineates a relationship between observable and unobservable events: instructional events, learning events, and assessment events [19]. Instructional events cause learning events, which are unobservable processes that result in changes in KCs, such as the acquisition of new KCs or the refinement of existing KCs. The changes in KCs, in turn, cause learner performances that are observable during assessment events. Given that learning events are central yet unobservable, assessments are expected to be designed with the quality to accurately reflect the true nature of learning events. However, in cases of overgeneralization, certain designs may fall short. Using set theory, overgeneralization can be visualized as an inclusion relation, and we can identify a specific type of potential design flaw, as depicted in Figure 1.

Figure 1: Overgeneralization occurs when a learner acquires production rules whose IF part is a superset of the correct rule's IF part, covering an overly extended range. This relationship can be expressed as OvergeneralizedIF ⊇ CorrectIF. Cross marks within CorrectIF represent practice activities that cannot test for overgeneralization. If all practice activities fall within CorrectIF, focusing solely on correct actions, the instructional design will fail to identify whether learners have acquired the correct rule or an overgeneralization.

Many TEL environments, particularly those involving ITS, leverage automated evaluation and feedback mechanisms to deliver learning at scale. The reliance on these automated mechanisms can pose challenges for all stakeholders regarding deceptive overgeneralization. TEL tools might mistakenly provide positive feedback to learners who perform correct actions based on an inaccurate understanding of conditions, inadvertently reinforcing misconceptions. Instructors and researchers employing learning analytics or educational data mining are similarly at risk of being misled by seemingly satisfactory learning data, potentially missing opportunities for intervention and correction that address learners' incorrect understandings. Moreover, ITS, with their adaptive capabilities that dynamically select practice problems and assess mastery, might amplify these issues. The reliance on observed correctness by knowledge tracing algorithms can lead to premature conclusions about learner mastery, halting further necessary practice that aids genuine skill development and refinement, leaving those misconceptions unaddressed. As what is captured and reported by TEL tools appears correct, encouraging, and satisfactory, deceptive overgeneralization may be particularly "sticky" and resistant to detection and change.

3.3. Both Novices and Experts Could be Prone to Deceptive Overgeneralization

If "practice makes perfect" were true to the extent that well-developed expertise guaranteed refined and accurate skills, then deceptive overgeneralization could be effectively addressed by providing ample practice opportunities in favorable learning conditions. However, I argue that even experts are not immune to deceptive overgeneralization, despite their considerable mastery of skills. Ambrose et al. [15, p.97] modeled mastery and its development into four stages, as illustrated in Figure 2. As this model suggests, while competence develops in a more-or-less linear fashion, consciousness initially increases and then decreases, as both novices (in Stage 1) and experts (in Stage 4) operate in states of relative unconsciousness, though for vastly different reasons [15, p.97]. I contend that deceptive overgeneralization may occur during any stage transition, including transitions towards Stage 4. Experts, as they develop their proficiency and automaticity, may also be prone to forming inaccurate heuristics and cognitive shortcuts to enable fast task completion.

Figure 2: The Four Stages of Mastery: (1) Unconscious Incompetence (do not know what they do not know); (2) Conscious Incompetence (recognize what they do not know and need to learn); (3) Conscious Competence (act deliberately with considerable competence); (4) Unconscious Competence (act automatically and instinctively). This model illustrates the progression from novice to expert, highlighting the development of competence and the shifting levels of consciousness.

An example demonstrating that experts can form deceptive overgeneralization, and that deceptive overgeneralization can lead to severe consequences, is the Crossair Flight 498 crash. The official incident investigation report identified one human-factor probable cause as follows: "when interpreting the attitude display instruments under stress, the commander resorted to a reaction pattern (heuristics) which he had learned earlier" [29, p.10].

As demonstrated in Figure 3, a Soviet attitude display indicates a left roll of the airplane with a counter-clockwise rotation. The appropriate response, detailed in Algorithm 1, is to stabilize the airplane by rotating it right. This rule acts as a cognitive shortcut that simplifies decision-making by minimizing the cognitive load needed to interpret the display. However, errors can arise if this shortcut is overgeneralized, omitting the condition that it should only apply to Soviet displays, leading to incorrect responses with other types of attitude displays.

Algorithm 1: Correct Production Rule for Interpreting a (Soviet) Attitude Display to Stabilize an Airplane
  if the goal is to stabilize an airplane and the attitude display rotates counter-clockwise and it is a Soviet display then
    rotate the airplane right
  end if

Figure 3: A simplified depiction of a Soviet attitude display, (a) before and (b) after a counter-clockwise rotation. The display reflects a "third-person view", where the horizon stays fixed, and the airplane's position is shown relative to the horizon. A counter-clockwise rotation (of the airplane relative to the horizon) indicates that the airplane is rolling left.

Figure 4: A simplified depiction of a Western attitude display, before and after a counter-clockwise rotation. The display reflects a "first-person view", where the airplane stays fixed and the horizon rotates relative to the airplane. A counter-clockwise rotation (of the horizon relative to the airplane) indicates that the airplane is rolling right.

For the first 20 years of his flying career, the commander received training that was "in theory comprehensive," exclusively at a flying school in the former Soviet Union [29, p.18]. However, upon transitioning to aircraft equipped with Western systems, no special differential training was provided to highlight the differences between Eastern and Western systems, nor did the commander undergo any unusual attitude training [29, p.19]. Therefore, the commander "had no opportunity to be trained in any other pattern of behavior" [29, p.96], meaning no opportunities to ever detect and correct the acquired deceptive overgeneralization. As the commander resorted to the overgeneralization in the scenario illustrated in Figure 4, the commander kept rotating the airplane right (further) when the airplane was already rolling right, eventually resulting in a loss of control.

The acquisition of shortcuts can be modeled using a process called knowledge compilation in ACT-R theory, which serves to eliminate multiple production firings and the need for retrieval from declarative memory [4, p.169]. A primary compilation process, known as composition, takes sequences of productions that follow each other in solving a particular problem and collapses them into a single "macro-production" that has the effect of the sequence [2, p.235]. For example, Algorithm 1 could be compiled as shown in Algorithm 2. These production rules are intentionally represented in pseudo code, mimicking the implementation style of cognitive tutors developed with CTAT [24]. This representation serves to highlight several benefits of composition: fewer conditions and actions, fewer variables to track, and the elimination of redundant subgoals. These optimizations enhance the efficiency of the macro-production compared to the original series of separate productions [5, p.35].

Algorithm 2: Knowledge Compilation for Interpreting an Attitude Display to Stabilize an Airplane
  Rule P1:
    Condition: goal == stabilizeAirplane AND rollDirection == unknown
    Action: subgoal = identifyRollDirection
  Rule P2:
    Condition: subgoal == identifyRollDirection AND displayRotation == counterClockwise AND displayType == Soviet
    Action: rollDirection = left
  Rule P3:
    Condition: goal == stabilizeAirplane AND rollDirection != unknown
    Action: subgoal = recoverAttitude
  Rule P4:
    Condition: subgoal == recoverAttitude AND rollDirection == left
    Action: rotateAirplane(right)
  Composed Rule P1&P2&P3&P4:
    Condition: goal == stabilizeAirplane AND displayRotation == counterClockwise AND displayType == Soviet
    Action: rotateAirplane(right)
  Efficiency Gain: 2 subgoals, 4 conditions, 3 intermediate cognitive actions, and 2 variables are eliminated by composition

However, it is possible that even experts who have mastered accurate basic production rules may develop inaccurate "macro-productions" during the process of building proficiency and automaticity if errors enter into the compilation process. Although composition increases overall efficiency by pruning redundant conditions and actions, these composed macro-productions tend to grow larger, particularly with an increase in the size of the condition sides [2, p.239]. With an increasingly complex and composite condition side, it becomes more likely that some conditions will be overlooked, potentially leading to overgeneralization. While human compilation is gradual (in contrast to computer compilation), which may provide some protection against errors of omitting conditional tests from entering compilation, this protection is not infallible and can only reduce, but not eliminate, the possibility of condition omission [5, p.46].

Knowledge compilation in ACT-R theory suggests that new productions generated through knowledge compilation do not replace, but rather coexist with, old ones [2, p.237]. A process known as conflict resolution then determines which productions to apply [2, p.132]. This raises the question of why the commander chose the overgeneralized shortcut over the basic alternative productions. The ACT-R strengthening mechanism might provide an explanation [2, p.250]. Production strength reflects the frequency of successful past applications [2, p.133]. Over the years, while flying Soviet aircraft, this shortcut, despite being overgeneralized, consistently led to correct actions within the context of Soviet attitude displays. This increased production strength may have made the shortcut the preferred choice during conflict resolution. Another contributing factor to the commander's selection of the overgeneralization could be medication effects, which potentially limited the commander's cognitive capacity [29, p.107]. The improved efficiency of the composed shortcut may have prompted the commander to favor the overgeneralized macro-production over a sequence of basic productions, especially under stress requiring immediate action, and possibly while multitasking. Such demanding and stressful scenarios are common, particularly in fields where individuals are considered experts and carry critical responsibilities. Moreover, situations involving limited cognitive capacity can occur to anyone. The ability to perform under conditions of stress, sleep deprivation, or fatigue is crucial, as is the capability to effectively manage simultaneous secondary tasks [30]. This indicates that overgeneralized shortcuts may be widespread, which highlights the importance of understanding their mechanisms through research.

The commander's extensive experience, amounting to over 8,000 hours [29, p.15], categorizes him within Stage 4 of the mastery model illustrated in Figure 2, where individuals are capable of acting automatically and instinctively. However, this incident starkly demonstrates that such automatic actions performed by experts, when based on deceptive overgeneralization, can lead to dire consequences. A similar case that exemplifies the dangers of overgeneralization in aviation training is the American Airlines Flight 587 crash, where poorly designed training led to deceptive overgeneralization, resulting in actions deemed correct during training but inappropriate for actual conditions, ultimately leading to catastrophic outcomes. Specifically, the American Airlines Advanced Aircraft Maneuvering Program included an excessive bank angle simulator exercise intended to prepare pilots for extreme wake turbulence. This equipped trainees with aggressive roll upset recovery techniques. Unfortunately, the scenario used in training was overly extreme and not representative of the actual aircraft type involved. This inappropriate training "enabled" the first officer to mistakenly apply these excessive techniques during a moderate wake turbulence encounter, leading to the in-flight separation of the vertical stabilizer and culminating in a fatal plane nosedive [31]. It can be argued that had the pilot not been trained to perform such aggressive maneuvers, the disaster could have been entirely avoided.

In summary, acquiring a production rule that pairs correct actions with incorrect conditions is an undesirable learning outcome, which at best might later be rectified without severe repercussions, and at worst could result in catastrophic outcomes.

3.4. Summary

This section presented the problem identification and examination of the phenomenon of deceptive overgeneralization through literature review and case studies, yielding several key characteristics of deceptive overgeneralization that underscore the need for further investigation:
Deceptive overgeneralization is prevalent across to favor the overgeneralized macro-production over a various domains. sequence of basic productions, especially under stress 2. Deceptive overgeneralization can be “sticky”, dif- requiring immediate action, and possibly while multitask- ficult to detect and resistant to change. ing. Such demanding and stressful scenarios are common, 3. In certain cases, deceptive overgeneralization can particularly in fields where individuals are considered be worse learning outcomes than if the skill had experts and carry critical responsibilities. Moreover, sit- not been learned at all. uations involving limited cognitive capacity can occur 4. Both novices and experts could be prone to de- to anyone. The ability to perform under conditions of ceptive overgeneralization. stress, sleep deprivation, or fatigue is crucial, as is the capability to effectively manage simultaneous secondary tasks [30]. This indicates that overgeneralized shortcuts Table 2 Summary of Methodologies for Each Research Question Research Question Methodology RQ1: Formation Experiments followed by Think-Aloud Studies; RCTs. RQ2: Detection and Diagnosis RCTs RQ3: Remediation RCTs RQ4: Retrospective Discovery EDM techniques using both synthetic and authentic datasets 4. Research Questions task-loop adaptivity. However, these systems are not specifically designed to prevent deceptive overgeneral- My doctoral research aims to investigate the mechanisms ization. My experimental design draws inspiration from of deceptive overgeneralization using the context of ITS studies on the Einstellung effect, which describes how and develop effective strategies for addressing deceptive practice with a fixed method can bias individuals toward overgeneralization. The proposed research questions applying this method even when better alternatives ex- are structured to methodically examine the formation, ist [33]. 
detection, remediation, and retrospective discovery of deceptive overgeneralization:

RQ1: Formation of Deceptive Overgeneralization. What types of production rules are most susceptible to deceptive overgeneralization? Under what conditions do ITS risk promoting deceptive overgeneralization?

RQ2: Detection and Diagnosis of Deceptive Overgeneralization. What features can be integrated into ITS to detect and diagnose deceptive overgeneralization?

RQ3: Remediation of Deceptive Overgeneralization. What instructional strategies are effective at correcting deceptive overgeneralization?

RQ4: Retrospective Discovery of Past Deceptive Overgeneralization. Can Educational Data Mining (EDM) techniques discover previously undetected deceptive overgeneralization from existing education datasets?

5. Methodology

This section outlines the research methodologies corresponding to each of the research questions guiding my doctoral study. To rigorously investigate the phenomenon of deceptive overgeneralization, a diverse methodological approach will be employed, ranging from experiments and think-aloud studies to EDM techniques, as summarized in Table 2.

RQ1: Formation of Deceptive Overgeneralization. The initial step in my research is to evaluate the hypothesized design flaw illustrated in Figure 1. This hypothesis suggests that when a series of practice activities evaluates only whether learners have performed the expected actions, such instructional designs may not adequately determine whether learners have internalized the correct rule or an overgeneralization. My research strategy includes conducting experiments with ITS that adhere to best practices in ITS design, such as cognitive model development through Cognitive Task Analysis (CTA) [32] and tailored hints and feedback. In my experiments, learners will practice with an ITS until they have achieved mastery as judged by the ITS. Subsequently, these learners will face tasks where the actions they have learned are no longer suitable. As my research contends that ITS may have limitations in accurately assessing true skill mastery, the research plan will also incorporate qualitative data collected through think-aloud studies [34]. Specifically, "graduated novices"—learners who have completed training and are judged by the ITS to have mastered the content—will verbalize their understanding of the relevant conditions during these sessions, in order to identify instances of deceptive overgeneralization. Next, to ascertain under what conditions ITS may inadvertently promote deceptive overgeneralization, and to identify which features of instructional design are most susceptible to fostering these errors, my research plan includes conducting randomized controlled trials (RCTs) that compare different ITS interface designs and problem sequencing.

RQ2: Detection and Diagnosis of Deceptive Overgeneralization. To investigate features that can be integrated into ITS for effectively detecting and diagnosing deceptive overgeneralization, RCTs will be conducted to compare different ITS interface designs and problem sequencing. Traditionally, ITS interfaces are designed to guide learners toward correct actions, and they tend to omit interface elements representing incorrect actions that learners should avoid, since such elements do not belong to the prescribed solution pathway. Consequently, learners might attempt to perform incorrect actions but find themselves unable to do so, leaving those mistakes undetected, uncorrected, and unlogged. One hypothesized effective design is to provide practice opportunities where "lack of action" is the correct response. Although detecting non-actions poses more challenges than evaluating actions, an ITS design that incorporates interface elements learners should avoid interacting with can make "lack of action" observable and can test whether learners appropriately refrain from acting when the conditions do not warrant it. This approach is similar to including distractor options in multiple-choice questions (MCQs), where learners must correctly identify such options and decide against choosing them. Of course, the expertise reversal effect [11] suggests that such distractor interface elements should only be introduced once learners have reached a certain level of skill mastery, to ensure that cognitive workload remains manageable.
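As an illustration of the design idea of making "lack of action" observable, the sketch below models a tutoring step that exposes distractor elements and logs interactions with them, so that an attempted overgeneralized action becomes a detectable, correctable event rather than a silently impossible one. The class, element names, and event labels are hypothetical illustrations, not part of CTAT's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class DistractorAwareStep:
    """A tutoring step that exposes both the correct control and distractor
    controls that learners should refrain from using. Interactions with
    distractors are logged instead of being impossible and thus invisible."""
    correct_element: str
    distractor_elements: set
    log: list = field(default_factory=list)

    def interact(self, element):
        """Record the interaction and return whether it was correct."""
        if element == self.correct_element:
            self.log.append(("correct_action", element))
            return True
        if element in self.distractor_elements:
            # The mistake is now detected, correctable, and logged.
            self.log.append(("overgeneralized_action", element))
            return False
        self.log.append(("other_action", element))
        return False

# Hypothetical step: submitting an unsimplified fraction is the distractor.
step = DistractorAwareStep("submit_simplified_fraction",
                           {"submit_unsimplified_fraction"})
step.interact("submit_unsimplified_fraction")  # attempted overgeneralized action
step.interact("submit_simplified_fraction")
```

With a conventional interface the first interaction could not occur at all, and the learner's misconception would leave no trace in the log.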
RQ3: Remediation of Deceptive Overgeneralization. Similar to RQ2, RCTs comparing different ITS interface designs and problem sequencing will be conducted. One instructional design hypothesized to be effective involves providing side-by-side comparisons between scenarios that do and do not warrant certain actions. This approach requires learners to identify differences in problem features, facilitating a deeper understanding of when specific actions are appropriate.

Incorporating both RQ2 and RQ3, the problem sequencing design pattern illustrated in Algorithm 3 is hypothesized to aid both initial induction and subsequent refinement, and to detect, diagnose, and remedy deceptive overgeneralization. The checkSAI() function, as in CTAT, represents the automated evaluation by which an ITS compares learner actions with reference actions [24].

Algorithm 3: Problem Sequencing Design Hypothesized to Aid in Initial Induction and Subsequent Refinement

  Target Knowledge Component (KC):
    if A AND B then … end if
  Potential Overgeneralization:
    if A then … end if
  Problem Type 1: Designed for Induction
    if A AND B then checkSAI() end if
  Problem Type 2: Designed for Refinement
    Problem Subtype 2.1: Unsuitable Context
      if A AND NOT B then checkSAI(NO) end if
    Problem Subtype 2.2: Insufficient Information
      if A AND Missing Info about B then checkSAI("Not Enough Info") end if
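The sequencing pattern of Algorithm 3 can be sketched in a few lines of code. This is an illustrative reduction, not CTAT's implementation: checkSAI() actually compares a learner's full selection–action–input against a reference, which is simplified here to matching a single response string, and the feature and response names are my own.

```python
# A minimal sketch of the problem-sequencing pattern in Algorithm 3:
# one induction problem plus two refinement problems, evaluated by a
# simplified stand-in for CTAT's checkSAI().

def check_sai(learner_response, reference_response):
    """Simplified automated evaluation: exact match against the reference."""
    return learner_response == reference_response

def correct_rule(a, b):
    """Target knowledge component: act only when both A and B hold."""
    if b is None:
        return "not enough info"
    return "act" if (a and b) else "no"

def overgeneralized_rule(a, b):
    """Deceptive overgeneralization: acts whenever A holds, ignoring B."""
    return "act" if a else "no"

# One problem of each type: (features, reference response).
problems = [
    ({"a": True, "b": True},  "act"),              # Type 1: induction
    ({"a": True, "b": False}, "no"),               # Subtype 2.1: unsuitable context
    ({"a": True, "b": None},  "not enough info"),  # Subtype 2.2: insufficient info
]

outcome = {
    rule.__name__: [check_sai(rule(f["a"], f["b"]), ref) for f, ref in problems]
    for rule in (correct_rule, overgeneralized_rule)
}
```

Both rules pass the induction problem, so a tutor that assigns only Type 1 problems would judge both learners as having mastered the skill; only the refinement problems separate the correct rule from the overgeneralized one.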
RQ4: Retrospective Discovery of Past Deceptive Overgeneralization. In addition to designing and conducting experiments specifically to investigate deceptive overgeneralization, my research could better contribute to the TEL community if there is evidence that the research findings can also generate actionable insights from existing datasets. Therefore, the last research question focuses on retrospective analysis to discover past deceptive overgeneralizations, using learning datasets already collected through standard procedures. My research plans to employ learning curve analysis facilitated by DataShop [35], which graphically represents changes in learner performance, visualizing any improvement or stagnation as learners engage in repeated practice opportunities [36]. ITS developed with CTAT typically store their learning logs in DataShop, making them ready candidates for retrospective analysis.

To effectively visualize and demonstrate learning curves that may indicate overgeneralization, I will start with synthetic data. Synthetic data, artificially generated by computer algorithms rather than derived from real-world events, mimics authentic datasets. The ethical generation and application of synthetic data is a widely accepted practice in the learning sciences, particularly within Educational Data Mining (EDM), as evidenced by its use in numerous EDM research studies [37, 38, 39, 40, 41]. Synthetic data addresses the complexities of authentic learner data, aiding in the validation of models for skill mastery assessment, and can faithfully reflect reality when properly modeled [41].

To examine how deceptive overgeneralization affects learning trajectories, BKT was used to simulate performance under the problem sequencing illustrated in Algorithm 3, with the following parameters: p_initial = 0.5, p_transition = 0.2, p_slip = 0.1, and p_guess = 0.2. The learning process is modeled with a single KC with three possible states: Unlearned, Overgeneralized, and Learned. This approach adheres to the BKT framework by treating learning progression as transitions between states. As learners in the Unlearned state receive repeated practice opportunities, they may remain in the Unlearned state, transition to the Overgeneralized state, or move directly to the Learned state. Learners in the Overgeneralized state can progress to the Learned state only through problems designed for refinement. Another core assumption of the simulation is the probability of a correct response given the knowledge state and the problem phase, as illustrated in Table 3. Problems designed for induction can be answered correctly (unless a slip occurs) using either the correct generalization or an overgeneralization. On problems designed for refinement, learners who remain in the Unlearned state or who have adopted the overgeneralized rule are expected to answer incorrectly most of the time. However, rather than guessing like learners in the Unlearned state, learners in the Overgeneralized state answer incorrectly unless a slip occurs, which reflects how learners with a deceptive overgeneralization "confidently" make mistakes when the conditions do not actually warrant the actions.
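The three-state simulation described above can be sketched as follows. This is a minimal re-implementation under the stated parameters, not the study's actual code; in particular, the text does not specify how the initial probability mass and the transition probability split between the Overgeneralized and Learned states, so an even split is assumed here, and the transition is applied after each practice opportunity.

```python
import random

# Parameters as stated in the text.
P_INIT, P_TRANSIT, P_SLIP, P_GUESS = 0.5, 0.2, 0.1, 0.2

UNLEARNED, OVERGENERALIZED, LEARNED = "Unlearned", "Overgeneralized", "Learned"

def answer_correctly(state, phase, rng):
    """Response model from Table 3: induction follows original BKT; the
    refinement phase uses a reverse-slip model for the Overgeneralized state."""
    if state == LEARNED:
        return rng.random() < 1 - P_SLIP
    if state == UNLEARNED:
        return rng.random() < P_GUESS
    # Overgeneralized: looks mastered during induction, but during refinement
    # answers correctly only when a slip occurs.
    p = (1 - P_SLIP) if phase == "induction" else P_SLIP
    return rng.random() < p

def transition(state, phase, rng):
    """State update after one opportunity. ASSUMPTION: the transition mass
    out of Unlearned is split evenly between the two knowing states."""
    if state == UNLEARNED and rng.random() < P_TRANSIT:
        return rng.choice([OVERGENERALIZED, LEARNED])
    # Overgeneralized learners are remediated only by refinement problems.
    if state == OVERGENERALIZED and phase == "refinement" and rng.random() < P_TRANSIT:
        return LEARNED
    return state

def simulate(n_students=10000, n_induction=10, n_refinement=10, seed=0):
    """Return the fraction of correct responses at each practice opportunity."""
    rng = random.Random(seed)
    # ASSUMPTION: P_INIT is taken as P(not Unlearned) at the start, split
    # evenly between the Overgeneralized and Learned states.
    students = [rng.choice([OVERGENERALIZED, LEARNED])
                if rng.random() < P_INIT else UNLEARNED
                for _ in range(n_students)]
    fractions = []
    for step in range(n_induction + n_refinement):
        phase = "induction" if step < n_induction else "refinement"
        correct = 0
        for i, state in enumerate(students):
            correct += answer_correctly(state, phase, rng)
            students[i] = transition(state, phase, rng)
        fractions.append(correct / n_students)
    return fractions

curve = simulate()
```

Under these assumptions the aggregate curve climbs during induction, drops sharply at the switch to refinement problems (driven by the overgeneralized subpopulation), and then recovers as refinement remediates the overgeneralization.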
Table 3: Probability of Correct Responses Based on Knowledge State and Problem Phase

  State           | Induction Phase (Same as Original BKT) | Refinement Phase (Reverse Slip Model)
  Unlearned       | P_GUESS                                | P_GUESS
  Overgeneralized | 1 - P_SLIP                             | P_SLIP
  Learned         | 1 - P_SLIP                             | 1 - P_SLIP

Figure 5: Simulated Performance Trends

Figure 5 visualizes the simulated performance trends of learners under the above assumptions. First, during the induction phase, performance is not distinguishable between the "Ever Overgeneralized" group and the "Directly Learned" group. Second, the "Ever Overgeneralized" group (red line), comprising learners who at any point acquired the Overgeneralized state, exhibits a significant and sudden performance drop when transitioning to the refinement phase, which corresponds to the problems designed to detect overgeneralization. This drop starkly contrasts with the stable performance growth of the "Directly Learned" group (blue line), comprising learners who transitioned directly from the Unlearned to the Learned state. The performance recovery of the "Ever Overgeneralized" group after the drop demonstrates the remediation of overgeneralization.

My future research plan is to transition from synthetic to authentic datasets by collaborating with other researchers to perform retrospective analysis on existing datasets.
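As a simple illustration of the learning curve analysis used for retrospective discovery, the error rate at each practice opportunity can be aggregated from logged first-attempt correctness. The log format below is a made-up stand-in, not DataShop's actual schema.

```python
from collections import defaultdict

# Hypothetical log rows: (student_id, kc, opportunity_number, first_attempt_correct).
log = [
    ("s1", "simplify", 1, False), ("s1", "simplify", 2, True),  ("s1", "simplify", 3, True),
    ("s2", "simplify", 1, False), ("s2", "simplify", 2, False), ("s2", "simplify", 3, True),
    ("s3", "simplify", 1, True),  ("s3", "simplify", 2, True),  ("s3", "simplify", 3, True),
]

def learning_curve(rows, kc):
    """Error rate per practice opportunity for one knowledge component,
    in the spirit of DataShop's learning-curve reports."""
    totals, errors = defaultdict(int), defaultdict(int)
    for _, row_kc, opportunity, correct in rows:
        if row_kc == kc:
            totals[opportunity] += 1
            errors[opportunity] += (not correct)
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}

kc_curve = learning_curve(log, "simplify")  # error rate falls across opportunities
```

A smoothly falling curve suggests ordinary learning; a curve that falls during induction-style problems and then spikes when refinement-style problems appear would be the retrospective signature of deceptive overgeneralization hypothesized above.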
6. Contribution to TEL

In my doctoral research, I plan to extend the ACT-R cognitive architecture to tackle deceptive overgeneralization. My research seeks to refine the adaptivity of ITS and enable more accurate assessments of true skill mastery. This work contributes to Technology-Enhanced Learning (TEL) by enhancing the precision of automated assessments and supporting more reliable adaptive learning experiences.

References

[1] V. Aleven, E. A. McLaughlin, R. A. Glenn, K. R. Koedinger, Instruction based on adaptive learning technologies, Handbook of Research on Learning and Instruction 2 (2016) 522–560.
[2] J. R. Anderson, The Architecture of Cognition, Harvard University Press, USA, 1983.
[3] J. R. Anderson, ACT: A simple theory of complex cognition, American Psychologist 51 (1996) 355.
[4] J. R. Anderson, Automaticity and the ACT theory, The American Journal of Psychology (1992) 165–180.
[5] J. R. Anderson, Acquisition of cognitive skill, Psychological Review 89 (1982) 369.
[6] V. Aleven, B. M. McLaren, J. Sewall, M. Van Velsen, O. Popescu, S. Demi, M. Ringenberg, K. R. Koedinger, Example-tracing tutors: Intelligent tutor development for non-programmers, International Journal of Artificial Intelligence in Education 26 (2016) 224–269.
[7] K. VanLehn, The behavior of tutoring systems, International Journal of Artificial Intelligence in Education 16 (2006) 227–265.
[8] B. P. Woolf, Building Intelligent Interactive Tutors: Student-Centered Strategies for Revolutionizing E-Learning, Morgan Kaufmann, 2010.
[9] L. S. Vygotsky, M. Cole, Mind in Society: Development of Higher Psychological Processes, Harvard University Press, 1978.
[10] A. Collins, J. S. Brown, A. Holum, et al., Cognitive apprenticeship: Making thinking visible, American Educator 15 (1991) 6–11.
[11] S. Kalyuga, The expertise reversal effect, in: Managing Cognitive Load in Adaptive Multimedia Learning, IGI Global, 2009, pp. 58–80.
[12] K. R. Koedinger, V. Aleven, Exploring the assistance dilemma in experiments with cognitive tutors, Educational Psychology Review 19 (2007) 239–264.
[13] National Research Council, How People Learn: Brain, Mind, Experience, and School: Expanded Edition, The National Academies Press, Washington, DC, 2000. doi:10.17226/9853.
[14] V. J. Shute, Focus on formative feedback, Review of Educational Research 78 (2008) 153–189.
[15] S. A. Ambrose, M. W. Bridges, M. DiPietro, M. C. Lovett, M. K. Norman, How Learning Works: Seven Research-Based Principles for Smart Teaching, John Wiley & Sons, 2010.
[16] K. R. Koedinger, A. Corbett, et al., Cognitive tutors: Technology bringing learning sciences to the classroom, 2006.
[17] A. T. Corbett, J. R. Anderson, Knowledge tracing: Modeling the acquisition of procedural knowledge, User Modeling and User-Adapted Interaction 4 (1994) 253–278.
[18] J. R. Anderson, C. D. Schunn, Implications of the ACT-R learning theory: No magic bullets, in: Advances in Instructional Psychology, Volume 5, Routledge, 2013, pp. 1–33.
[19] K. R. Koedinger, A. T. Corbett, C. Perfetti, The knowledge-learning-instruction framework: Bridging the science-practice chasm to enhance robust student learning, Cognitive Science 36 (2012) 757–798.
[20] V. A. Aleven, K. R. Koedinger, An effective metacognitive strategy: Learning by doing and explaining with a computer-based cognitive tutor, Cognitive Science 26 (2002) 147–179.
[21] H. Zhao, K. Koedinger, J. Kowalski, Knowledge tracing and cue contrast: Second language English grammar instruction, in: Proceedings of the Annual Meeting of the Cognitive Science Society, volume 35, 2013.
[22] N. M. Chang, Learning to Discriminate and Generalize through Problem Comparisons, Ph.D. thesis, Carnegie Mellon University, 2006.
[23] LearnLab, Feature validity, 2011. URL: https://learnlab.org/wiki/index.php?title=Feature_validity. [Online; accessed 29-May-2024].
[24] V. Aleven, B. McLaren, J. Sewall, K. R. Koedinger, Example-tracing tutors: A new paradigm for intelligent tutoring systems (2009).
[25] V. Aleven, B. M. McLaren, J. Sewall, K. R. Koedinger, The cognitive tutor authoring tools (CTAT): Preliminary evaluation of efficiency gains, in: Intelligent Tutoring Systems: 8th International Conference, ITS 2006, Jhongli, Taiwan, June 26–30, 2006, Proceedings 8, Springer, 2006, pp. 61–70.
[26] K. R. Koedinger, J. R. Anderson, W. H. Hadley, M. A. Mark, Intelligent tutoring goes to school in the big city, International Journal of Artificial Intelligence in Education 8 (1997) 30–43.
[27] M. T. Chi, P. J. Feltovich, R. Glaser, Categorization and representation of physics problems by experts and novices, Cognitive Science 5 (1981) 121–152.
[28] B. H. Ross, Remindings and their effects in learning a cognitive skill, Cognitive Psychology 16 (1984) 371–416.
[29] Aircraft Accident Investigation Bureau, Final report of the Aircraft Accident Investigation Bureau on the accident to the Saab 340B aircraft, registration HB-AKK, of Crossair flight CRX 498 on 10 January 2000 near Nassenwil/ZH, 2002.
[30] R. A. Schmidt, R. A. Bjork, New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training, Psychological Science 3 (1992) 207–218.
[31] National Transportation Safety Board, In-flight separation of vertical stabilizer, American Airlines flight 587, Airbus Industrie A300-605R, N14053, Belle Harbor, New York, November 12, 2001, National Transportation Safety Board 490 (2001).
[32] J. M. Schraagen, S. F. Chipman, V. L. Shalin, Cognitive Task Analysis, Psychology Press, 2000.
[33] A. S. Luchins, Mechanization in problem solving: The effect of Einstellung, Psychological Monographs 54 (1942) i.
[34] K. A. Ericsson, H. A. Simon, How to study thinking in everyday life: Contrasting think-aloud protocols with descriptions and explanations of thinking, Mind, Culture, and Activity 5 (1998) 178–186.
[35] K. R. Koedinger, R. S. Baker, K. Cunningham, A. Skogsholm, B. Leber, J. Stamper, A data repository for the EDM community: The PSLC DataShop, Handbook of Educational Data Mining 43 (2010) 43–56.
[36] K. R. Koedinger, R. S. Baker, K. Cunningham, A. Skogsholm, B. Leber, J. Stamper, A data repository for the EDM community: The PSLC DataShop, Handbook of Educational Data Mining 43 (2010) 43–56.
[37] M. M. Rahman, Y. Watanobe, T. Matsumoto, R. U. Kiran, K. Nakamura, Educational data mining to support programming learning using problem-solving data, IEEE Access 10 (2022) 26186–26202.
[38] N. Ndou, R. Ajoodha, A. Jadhav, Educational data-mining to determine student success at higher education institutions, in: 2020 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), IEEE, 2020, pp. 1–8.
[39] J. M. Patil, S. R. Gupta, Extracting knowledge in large synthetic datasets using educational data mining and machine learning models, in: Soft Computing for Intelligent Systems: Proceedings of ICSCIS 2020, Springer, 2021, pp. 167–175.
[40] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, J. Sohl-Dickstein, Deep knowledge tracing, Advances in Neural Information Processing Systems 28 (2015).
[41] M. C. Desmarais, I. Pelczer, On the faithfulness of simulated student performance data, in: Educational Data Mining 2010, 2010.