<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging Large Language Models to Promote AI-Infused STEM Problem-Solving for Middle School Students</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ananya Rao</string-name>
          <email>arrao3@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krish Piryani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shiyan Jiang</string-name>
          <email>jiangshiyan2013@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiffany Barnes</string-name>
          <email>tmbarnes@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jennifer Albert</string-name>
          <email>jalbert@citadel.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marnie Hill</string-name>
          <email>mehill6@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bita Akram</string-name>
          <email>bakram@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>North Carolina State University</institution>
          ,
          <addr-line>NC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Citadel - The Military College of South Carolina</institution>
          ,
          <addr-line>SC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Pennsylvania</institution>
          ,
          <addr-line>PA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>AI-infused STEM problem-solving is becoming an increasingly important skill for the future STEM workforce, requiring innovative systems to support student learning. Integrating AI into STEM problem-solving requires building a strong foundation in students' computational thinking skills, which can be supported through carefully designed, technically advanced systems. In this paper, we propose augmenting a block-based programming environment specialized for AI-infused STEM problem-solving with large language model (LLM) capabilities to reinforce key computational thinking skills. Through our prior experimentation with students, we have identified three major computational thinking skills essential for mastery learning and for transferring knowledge between contexts: abstraction, algorithmic thinking, and generalization. To this end, we enhance an LLM with knowledge about the high-level concept of breadth-first search (abstraction) and the ability to situate students' steps within the required steps of the BFS algorithm (algorithmic thinking). We then evaluate the LLM's ability to provide adaptive feedback as students implement BFS in various STEM contexts (generalization). We present a proof-of-concept evaluation demonstrating how an LLM trained with general BFS knowledge can provide adaptive, contextualized feedback in three different scientific scenarios: pathfinding (mathematics), contact-tracing (biology), and the time-to-live (TTL) package algorithm (networks). This functionality allows our environment to support students in developing abstraction, algorithmic thinking, and generalization skills when applying BFS to scientific problem-solving.</p>
      </abstract>
      <kwd-group>
        <kwd>K-12 AI Education</kwd>
        <kwd>Computational Thinking for AI</kwd>
        <kwd>Personalized Feedback</kwd>
        <kwd>LLMs for AI Education</kwd>
        <kwd>AI-Infused STEM Problem Solving</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial intelligence (AI) in science, technology, engineering, and mathematics (STEM) fields is applied
across a wide range of areas, from predictive modeling in climate science to automated data analysis in
genomics and intelligent robotics in engineering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These advancements underscore the importance of
equipping learners with AI literacy — a foundational understanding of AI concepts and their integration
into real-world problem solving [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For many years, research communities such as the AAAI task
force, EAAI, and those in education and computer science education have recognized the critical need
to broaden AI literacy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, the advent of large language models (LLMs) has amplified this
need, as these technologies have seamlessly integrated into everyday life and work, making it vital for
all disciplines to engage with AI concepts effectively.
      </p>
      <p>
        A cornerstone of AI literacy is computational thinking (CT), a set of problem-solving skills rooted
in computer science that enables learners to approach complex challenges systematically [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
Despite its significance, many educational curricula fail to emphasize CT skills as a critical foundation
for AI-integrated problem solving. This gap is particularly notable in STEM education, where AI’s
transformative potential demands a deep understanding of its underlying principles and computational
approaches. Addressing this gap requires not only teaching AI concepts but also providing ample,
carefully crafted scaffolding to support learners in building robust CT skills.
      </p>
      <p>The emergence of LLMs offers unprecedented opportunities to reimagine educational practices in AI
and CT. These advanced models can enable scalable, engaging, and effective scaffolding for learners,
making it possible to design innovative curricula that enhance CT skills within the context of AI-infused
problem solving. By leveraging LLMs, educators and researchers can create a new generation of tools
and strategies that empower students to navigate the evolving landscape of STEM and AI-driven
innovation.</p>
      <p>
        In this paper, we draw on AI education literature and our experience integrating AI into biological
sciences for middle school students to highlight the role of three essential computational thinking
skills—abstraction, algorithmic thinking, and generalization—in empowering students to independently
apply AI in STEM problem-solving [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We first introduce our learning environment, I-SAIL, a visual
programming platform designed specifically for AI-infused STEM problem-solving [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. I-SAIL is
accompanied by an integrated curriculum that teaches the BFS algorithm and its application to a
variety of STEM problems. Our curriculum follows a use-modify-create approach: during the use phase,
students learn the fundamentals of AI; during the modify and create phases, they apply and adapt their
knowledge to solve STEM-focused problems [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>Finally, we describe our plans to design a tailored and fine-tuned LLM that will augment I-SAIL with
narratives to support abstraction, algorithmic thinking, and generalization. First, the LLM will provide
dynamic, context-sensitive narrations to reinforce students’ abstract understanding of the algorithm.
Then, the LLM will deliver feedback to guide students as they apply algorithmic thinking to implement
BFS using a visual coding scheme. Finally, the LLM will support students with targeted feedback and
narration to help them generalize their understanding of BFS to a novel STEM context. We discuss the
steps we take to train an LLM for dynamic generation of the narration and how we augment it with
tools to process students’ problem-solving steps to provide adaptive scafolding.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. AI-Infused STEM Problem-Solving for K-12</title>
        <p>
          AI-infused STEM curricula aim to integrate AI concepts and tools into traditional K-12 STEM learning
environments, empowering students to engage with real-world problems through computational and
data-driven approaches [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. These curricula emphasize interdisciplinary thinking by embedding AI
principles such as machine learning, data analysis, and algorithm design into STEM lessons, preparing
students to understand AI’s transformative role in scientific and technical fields [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. However, existing
efforts often overemphasize machine learning while neglecting the broader landscape of AI
methodologies and their impact on scientific inquiry and problem-solving [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Additionally, little attention has
been given to how AI reshapes STEM practices, influencing hypothesis generation, experimentation,
and the ethical dimensions of computational models. Addressing these gaps is essential for fostering a
more comprehensive understanding of AI and its implications within and beyond STEM disciplines.
        </p>
        <p>
          Learning technologies provide students with tangible tools to explore AI concepts, yet many fail to
bridge the gap between AI algorithms and STEM problem-solving. Block-based platforms like MIT’s
Scratch, now enhanced with AI extensions [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], offer an intuitive introduction, while tools like Google’s
Teachable Machine [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] allow students to train models without delving into algorithmic mechanisms.
Data science platforms such as Orange support deeper engagement by allowing students to visualize
and analyze datasets, yet they often lack guidance on selecting appropriate algorithms for specific STEM
challenges [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. For example, while neural networks may be ideal for image classification in biology,
decision trees or linear regression may be more effective for interpretable predictions in chemistry
or environmental science. Without explicit support for understanding these trade-offs, students may
struggle to critically evaluate, adapt, and apply AI models in meaningful STEM contexts. To fully
realize the potential of AI-infused STEM education, learning technologies must go beyond black-box
implementations, fostering students’ ability to connect algorithmic reasoning with domain-specific
problem-solving.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Scaffolding</title>
        <p>
          Adaptive scaffolding in visual programming environments has been widely studied as a means of
supporting K-12 learners in developing computational thinking and programming skills. These systems
leverage various techniques, such as real-time feedback, personalized hints, and task-specific guidance,
to address the diverse needs of learners [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. For example, adaptive systems like ProgSnap2 [14]
provide context-aware feedback and debugging support, allowing students to identify and correct
errors while building problem-solving strategies. This body of research highlights the importance of
balancing guidance with promoting independence, as overly directive scaffolding can hinder learners’
ability to develop autonomy and critical thinking skills. Additionally, studies [15] emphasize the role of
adaptive scaffolding in fostering engagement and persistence, particularly for students with limited prior
programming experience. While these systems have demonstrated significant promise in enhancing
learning outcomes, they often rely on pre-programmed rules or static models of learner behavior,
limiting their ability to offer the dynamic, nuanced support that LLMs can provide [16].
        </p>
        <p>
          Recent advancements in LLMs have opened opportunities to scaffold learning in visual programming
environments by providing AI-powered assistance tailored to the needs of learners in K-12 education.
For instance, systems like ChatScratch [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] integrate LLMs to empower children aged 6-12 to engage
in autonomous programming by generating suggestions, explanations, and solutions based on their
programming goals. Similarly, Scratch Copilot [17] explores the potential of AI to support creative
coding for families, emphasizing collaborative learning scenarios. This approach demonstrates how
AI can scaffold the programming process by suggesting next steps, debugging errors, and providing
explanations to enhance understanding, particularly in multi-user learning environments. Both systems
illustrate how LLMs can serve as co-creators and tutors, fostering creative exploration and computational
skill-building. Building on these studies, the potential of LLMs for scaffolding visual programming can
be further explored to support AI-infused STEM problem-solving.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. I-SAIL Integrated Learning Environment for AI-Infused STEM Problem Solving</title>
      <sec id="sec-4-1">
        <title>3.1. I-SAIL Learning Environment</title>
        <p>The I-SAIL learning environment builds on the strengths of block-based programming (BBP) to lower
entry barriers for novice learners while enabling AI-powered problem-solving in scientific contexts.
Extending Snap! [18], I-SAIL incorporates pre-programmed AI infrastructures, customizable scientific
models, and adaptive learning pathways to engage students with real-world scientific challenges.
Similar to existing block-based platforms such as BlockPy [19] for data analysis and NetsBlox [20] for
networking, I-SAIL specializes BBP to integrate AI-specific functionalities and make AI concepts more
accessible and applicable to interdisciplinary STEM problems. A key innovation of I-SAIL lies in its
dual approach: it introduces custom AI blocks that simplify complex programming without sacrificing
conceptual depth and integrates dynamic simulations and AI-driven modeling scenarios to support
AI learning within STEM contexts. By bridging AI and STEM education, I-SAIL aligns with AI4K12
progression standards [21], fostering an inclusive and interdisciplinary learning environment that
prepares students for AI-driven scientific problem-solving.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. I-SAIL Pedagogy</title>
        <p>Our curriculum integrates the Next Generation Science Standards (NGSS) and the AI4K12 framework
[21] to ensure alignment with disciplinary best practices while fostering engagement at the
intersection of science, engineering, and artificial intelligence. Research shows that integrating AI and
STEM activities enhances student learning by reinforcing interdisciplinary connections and deepening
conceptual understanding [22]. To make AI-infused STEM education more accessible, we emphasize
contextualizing learning within culturally and personally relevant problem spaces, allowing students to
see AI as a practical tool for solving real-world challenges. This approach not only enhances motivation
and engagement, particularly among underrepresented groups in computing, but also reinforces AI’s
applicability in domains such as climate science, health, and engineering [23]. By framing AI as both
a scientific tool and a subject of study, our curriculum highlights the reciprocal relationship between
scientific inquiry and computational modeling, enabling students to develop authentic problem-solving
skills while critically examining AI’s role in advancing scientific discovery. Embedding AI learning
within familiar STEM contexts fosters a sense of belonging in computationally rich fields, bridging gaps
between traditional STEM education and emerging AI-driven technologies.</p>
        <sec id="sec-4-2-1">
          <title>3.2.1. Activity Design</title>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Path Finding</title>
        <p>The activities in the Use section are designed to help students develop a deep and practical understanding
of the targeted AI algorithm. For instance, in the case of Breadth-First Search (BFS), students engage
with a series of interactive demonstrations that illustrate the algorithm’s mechanics. These activities
guide students in exploring how BFS operates by analyzing how it identifies the shortest path between
two cities on a map. Pathfinding is chosen as a learning context because it provides a relatable, concrete,
and visually intuitive example, allowing students to understand the step-by-step execution of BFS and
its real-world applications.</p>
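        <p>As a minimal sketch of the pathfinding activity (the city network here is illustrative, not the exact I-SAIL map), BFS finds a shortest route by expanding cities level by level from the start:</p>
        <preformat>
```python
# Sketch of the pathfinding activity: cities are nodes, roads are edges,
# and BFS returns a shortest route. The map below is illustrative only.
from collections import deque

roads = {
    "NOL": ["MOB"],
    "MOB": ["NOL", "MON", "TAL"],
    "MON": ["MOB", "BIR"],
    "TAL": ["MOB"],
    "BIR": ["MON", "RAL"],
    "RAL": ["BIR"],
}

def bfs_shortest_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unreachable."""
    queue = deque([[start]])   # each queue entry is a partial path
    visited = {start}
    while queue:
        path = queue.popleft()
        city = path[-1]
        if city == goal:
            return path
        for neighbor in graph[city]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

print(bfs_shortest_path(roads, "NOL", "RAL"))
```
        </preformat>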
      </sec>
      <sec id="sec-4-4">
        <title>3.4. Contact Tracing App</title>
        <p>Beyond pathfinding, we integrate scientific and computing-rich contexts to further enhance students’
understanding of BFS. One such example is contact tracing in public health, where students design an
app to notify individuals exposed to an infectious disease. The app employs a graph-based representation
of a social network and utilizes BFS to determine the shortest path from an infected individual to others
in the network, thereby identifying those at highest risk and needing self-quarantine.</p>
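        <p>The contact-tracing use of BFS can be sketched as a level-order traversal that assigns each person an exposure level, i.e., their distance in hops from the infected individual (names and links below are made up for illustration):</p>
        <preformat>
```python
# Sketch of the contact-tracing app: BFS over a social network assigns
# each reachable person an exposure level. The network is illustrative.
from collections import deque

contacts = {
    "Ada": ["Ben", "Cam"],
    "Ben": ["Ada", "Dee"],
    "Cam": ["Ada", "Dee"],
    "Dee": ["Ben", "Cam", "Eli"],
    "Eli": ["Dee"],
}

def exposure_levels(network, infected):
    """Map each reachable person to their hop distance from the infected person."""
    levels = {infected: 0}
    queue = deque([infected])
    while queue:
        person = queue.popleft()
        for contact in network[person]:
            if contact not in levels:
                levels[contact] = levels[person] + 1
                queue.append(contact)
    return levels

print(exposure_levels(contacts, "Ada"))
```
        </preformat>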
      </sec>
      <sec id="sec-4-5">
        <title>3.5. Package Time-to-Live</title>
        <p>Additionally, we introduce Time to Live (TTL) as a computing-centric context for learning BFS. TTL is
widely used in network routing to prevent data packets from looping indefinitely by setting a limit on
the number of hops a packet can take. By examining how BFS is applied in TTL, students gain insights
into its relevance in network protocols, data transmission, and system efficiency.</p>
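        <p>A minimal sketch of the TTL idea: each hop decrements a counter, so a packet flooded breadth-first can only reach nodes within the TTL bound (the network below is illustrative):</p>
        <preformat>
```python
# Sketch of TTL-bounded flooding: BFS stops after a fixed number of hops,
# so only nodes within the TTL are reached. Topology is illustrative.
from collections import deque

links = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 4],
    4: [2, 3, 5],
    5: [4],
}

def reachable_within_ttl(network, source, ttl):
    """Nodes a packet can reach when each hop decrements the TTL."""
    reached = {source}
    queue = deque([(source, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if hops_left == 0:
            continue          # TTL exhausted: stop forwarding from here
        for neighbor in network[node]:
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return reached

print(sorted(reachable_within_ttl(links, 1, 2)))
```
        </preformat>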
      </sec>
      <sec id="sec-4-6">
        <title>3.6. LLM Integration</title>
        <p>In recent years, LLMs have demonstrated remarkable adaptability across various domains, making them
promising tools for delivering contextualized and adaptive feedback to students. However, integrating
LLMs into K-12 classrooms presents unique challenges, requiring careful training and guidance to
ensure they provide meaningful support. In this work, we focus on training an LLM to interpret student
problem-solving steps within the broader context of the Breadth-First Search (BFS) algorithm. Our
approach involves equipping the model with an understanding of the problem-solving activities in the
I-SAIL environment and guiding it to generate individualized feedback that is effective, relevant, and
aligned with computational thinking principles.</p>
      </sec>
      <sec id="sec-4-7">
        <title>3.7. Contextualization</title>
        <p>We trained a large language model (LLM) to provide context-aware, adaptive feedback by providing
detailed information about search algorithms and structured problem-solving in the I-SAIL learning
environment. The first stage of training involved introducing the LLM to search as a fundamental
concept in artificial intelligence, emphasizing its role in systematically exploring problem spaces to find
optimal solutions. This foundation allowed the model to recognize structured search strategies and
differentiate between correct and incorrect problem-solving approaches.</p>
        <p>Next, we refined the LLM’s ability to interpret student interactions in the I-SAIL system. Since
students interact with a graph-based simulation, making decisions on which elements to explore and
in what order, the LLM needed to evaluate these decisions in real-time. Training prompts defined the
mechanics of student engagement, such as adding elements to a search queue and processing them step
by step, ensuring the model could assess correctness, detect errors, and provide relevant feedback.</p>
        <p>To further improve accuracy, we simulated a range of student interactions, incorporating both correct
and incorrect search sequences. This enabled the LLM to recognize common mistakes (e.g., skipping
steps, exploring elements out of order, or revisiting processed elements) and adjust its responses
accordingly. The model was also refined to ensure feedback was pedagogically effective—guiding
students without directly providing solutions, thereby reinforcing structured reasoning and independent
problem-solving.</p>
      </sec>
      <sec id="sec-4-8">
        <title>3.8. Synthetic Data Generation</title>
        <p>We generated synthetic data comprising graphs with varying BFS problem complexities and simulated
student problem-solving sequences in the I-SAIL environment. These datasets included both correct
and incorrect solution paths, enabling the LLM to learn structured problem-solving patterns, identify
errors, and refine its feedback mechanisms for real student interactions.</p>
        <sec id="sec-4-8-1">
          <title>3.8.1. Graph Generation</title>
          <p>To provide students with multiple opportunities to practice BFS, we prompted the LLM to generate
new graphs of varying difficulty. Specifically, we provided the LLM with multiple adjacency matrices
from previously used BFS exercises to establish a baseline understanding of graph structures relevant
to the learning environment. Based on these examples, the LLM generated two new graphs: one with a
simpler structure, which was later adapted into the contact tracing activity (see Figure 2), and another
more complex graph, which formed the basis for the Time-to-Live (TTL) package activity (see Figure 3).
In the future, this mechanism can be used to automatically generate diverse problem-solving scenarios,
enabling students to achieve mastery learning.</p>
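          <p>Since the generated graphs are exchanged as adjacency matrices, a small helper (sketched here with a made-up matrix, not one of the actual exercises) can convert a matrix into the adjacency-list form BFS operates on, checking symmetry along the way:</p>
          <preformat>
```python
# Sketch: convert an adjacency matrix (the format the LLM was prompted
# with) into an adjacency list, verifying the graph is undirected.
# The matrix below is a made-up example, not an I-SAIL exercise.
adjacency_matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

def to_adjacency_list(matrix):
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            # an undirected graph must have a symmetric matrix
            assert matrix[i][j] == matrix[j][i], "matrix is not symmetric"
    return {i: [j for j in range(n) if matrix[i][j] == 1] for i in range(n)}

print(to_adjacency_list(adjacency_matrix))
```
          </preformat>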
        </sec>
        <sec id="sec-4-8-2">
          <title>3.8.2. Problem-Solving Scenario Generation</title>
          <p>To generate realistic student problem-solving scenarios, we employed prompt engineering to guide
the LLM in producing both correct and incorrect BFS-based solutions. The prompt design focused on
simulating step-by-step student interactions, incorporating common mistakes, and ensuring adherence
to BFS principles while introducing controlled errors for feedback generation.</p>
          <p>The initial prompts instructed the LLM to generate ten problem-solving sequences per scenario,
ensuring a balance between correct and incorrect solutions. Each sequence simulated student
decision-making while applying BFS, capturing variations in how students might approach the problem. Incorrect
solutions were structured to reflect common errors, such as selecting an unreachable node, skipping
neighbors, and mismanaging the BFS fringe. A key aspect of the prompt design was to maintain BFS
constraints while allowing for realistic errors that students commonly make.</p>
          <p>Refinements were necessary to improve the accuracy and diversity of the generated sequences.
Early outputs sometimes violated BFS ordering, requiring adjustments to emphasize strict level-order
traversal. Additionally, incorrect sequences initially overrepresented certain mistake types, leading
to prompt modifications that ensured a broader distribution of errors across the dataset. Another
refinement involved preventing logically impossible outputs, as some mistakes generated by the LLM
were inconsistent with the given adjacency matrix. These refinements improved the model’s ability
to produce both valid and pedagogically meaningful scenarios. After scenario generation, manual
evaluations were conducted to verify correctness and instructional relevance.</p>
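          <p>The generation idea can be sketched as deriving the correct BFS action trace (G = add to the fringe, F = explore) for a graph and then injecting one controlled error, such as dropping a G action to mimic a skipped neighbor. The helper names and graph here are our own illustration, not the actual generation pipeline:</p>
          <preformat>
```python
# Sketch of synthetic-sequence generation: build the correct BFS trace
# of (node, action) pairs, then drop one G action to simulate the
# common "skipped neighbor" mistake. Illustrative helpers only.
from collections import deque
import random

def correct_trace(graph, start):
    trace = [(start, "G")]
    queue = deque([start])
    seen = {start}
    while queue:
        node = queue.popleft()
        trace.append((node, "F"))          # explore: dequeue and expand
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
                trace.append((neighbor, "G"))  # add neighbor to the fringe
    return trace

def inject_skip_error(trace, rng):
    # drop one non-initial G action: a student "forgot" to add a neighbor
    g_positions = [i for i, (_, act) in enumerate(trace) if act == "G"][1:]
    i = rng.choice(g_positions)
    return trace[:i] + trace[i + 1:]

graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
good = correct_trace(graph, "A")
bad = inject_skip_error(good, random.Random(0))
print(good)
print(bad)
```
          </preformat>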
        </sec>
        <sec id="sec-4-8-3">
          <title>3.8.3. Individualized Contextualized Feedback Generation</title>
          <p>Large language models can be leveraged to dynamically adapt narratives based on students’ interactions
with the learning environment, serving as co-authors of high-level explanations that guide them through
problem-solving. While LLMs demonstrate a strong capacity for explaining scientific concepts, their
effectiveness depends on ensuring that (1) explanations match the student’s level, (2) they encourage
abstract thinking, and (3) they provide sufficient information to help students translate abstractions
into algorithmic implementations. To achieve this, the feedback system was designed to provide
contextualized and individualized guidance. By evaluating each student action in real time, the system
identifies deviations from BFS principles and generates feedback that aligns with the specific problem
scenario, reinforcing both conceptual understanding and problem-solving skills.</p>
          <p>A key aspect of the design was ensuring that feedback is context-aware, meaning that it adapts to the
domain of the problem. For example, in a pathfinding activity, feedback references cities and roads, while
in a contact tracing scenario, it refers to people and social connections. This contextualization helps
students relate algorithmic concepts to real-world applications, reinforcing computational thinking and
problem-solving skills.</p>
          <p>Additionally, the feedback system is designed to be non-directive, providing strategic hints rather
than explicitly stating errors. The intent is to encourage students to reflect on their mistakes and refine
their approach, promoting active learning. To maintain logical consistency, feedback is structured to
assume that all previous steps were correct, preventing confusion and allowing students to focus on
resolving the immediate issue.</p>
        </sec>
        <sec id="sec-4-8-4">
          <title>3.8.4. An Exemplary Scenario and Feedback</title>
          <p>In this section, we discuss one of the LLM-generated problem-solving scenarios along with its
corresponding feedback. We further show examples of feedback generated in other domains for a similar
mistake.</p>
          <p>In one pathfinding scenario, a student applies BFS to find the shortest route from New Orleans (NOL) to
Raleigh but incorrectly fails to add all neighboring cities before exploring (dequeuing) another, violating
BFS principles. The student’s actions follow this sequence:</p>
          <p>NOL-G (Add New Orleans to the queue), NOL-F (Explore New Orleans), MOB-G (Add Mobile to the
queue), MOB-F (Explore Mobile), MON-G (Add Montgomery to the queue), TAL-G (Add Tallahassee late),
MON-F (Explore Montgomery), BIR-G (Add Birmingham to the queue), BIR-F (Explore Birmingham).</p>
          <p>Here, -G indicates adding a city to the queue (fringe), while -F represents fully exploring a city by
dequeuing it and expanding its neighbors. The mistake occurs when Mobile is explored before all of
New Orleans’ direct neighbors (like Tallahassee) are added to the queue, disrupting BFS’s level-order
traversal. The generated feedback states:</p>
          <p>“A city was not added before dequeuing another. BFS requires that all children of a city be added to
the fringe before moving on. Every path with the same distance from the start should have a chance to
be explored before moving further.”</p>
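          <p>The rule this feedback targets can be sketched as a simple trace checker: after an F action, every not-yet-added neighbor of the explored node must receive a G action before the next F. The graph and trace below are simplified versions of the scenario above, not the exact exercise:</p>
          <preformat>
```python
# Sketch of checking a student trace against the BFS level-order rule:
# after exploring (F) a node, all of its not-yet-added neighbors must be
# added (G) before the next F action. Simplified graph and trace.
def first_violation(graph, trace):
    added = set()
    pending = []          # neighbors still owed a G action
    for node, action in trace:
        if action == "G":
            added.add(node)
            if node in pending:
                pending.remove(node)
        else:  # "F": explore a node
            if pending:
                return (node, "explored before adding: " + ", ".join(pending))
            pending = [n for n in graph[node] if n not in added]
    return None

roads = {"NOL": ["MOB", "TAL"], "MOB": ["NOL", "MON"],
         "MON": ["MOB"], "TAL": ["NOL"]}
student_trace = [("NOL", "G"), ("NOL", "F"), ("MOB", "G"),
                 ("MOB", "F"), ("TAL", "G")]
print(first_violation(roads, student_trace))
```
          </preformat>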
          <p>A similar mistake occurs in contact tracing, where a student fails to add all direct contacts before
investigating the next exposed individual, prompting the feedback:</p>
          <p>“An individual was skipped before confirming their exposure risk. BFS guarantees that every contact
at the same level is processed before moving to the next level. Ensure that all individuals at the same
exposure level are traced before continuing.”</p>
          <p>In network TTL, this mistake translates to forwarding a packet before processing all nodes at the current
TTL level, leading to the feedback:</p>
          <p>“Before processing node 5 (F-action), ensure that all its required neighbors have been added to the
TTL queue. BFS requires all directly connected nodes to be added before continuing to process another
node. Skipping an addition step before expanding node 5 could result in inefficient routing.”</p>
          <p>These examples illustrate how BFS principles apply across domains, with contextualized feedback
ensuring students correctly implement BFS in different problem-solving contexts.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Evaluation</title>
      <p>Evaluating LLM-generated feedback for students developing computational thinking (CT)
skills—particularly abstraction, algorithmic thinking, and generalization—requires a structured rubric to
assess its conceptual accuracy, depth, actionability, generalization, and clarity. Effective feedback should
not only be conceptually sound but also specific, actionable, and transferable to various computational
problems (Grover &amp; Pea, 2013). The rubric (see table) evaluates feedback along five key dimensions: (1)
conceptual accuracy, ensuring explanations of BFS are correct; (2) depth and specificity, determining whether
feedback provides detailed, context-specific insights; (3) actionability, assessing whether students receive
clear next steps; (4) contextualization &amp; transferability, measuring how well the feedback connects BFS
concepts to broader STEM applications; and (5) coherence &amp; clarity, ensuring the feedback is logically
structured and easy to understand.</p>
      <p>To conduct the evaluation, two experts independently graded 30% of the scenarios, followed by a
discussion to resolve discrepancies. After achieving consensus, one expert proceeded to evaluate all 30
problem-solving scenarios.</p>
      <p>The overall evaluation results indicate that the feedback system performed most effectively in
Actionability (2.83/3.0), Depth and Specificity (2.73/3.0), and Contextualization (2.70/3.0). The system
successfully provided clear, structured feedback, offering actionable next steps that aligned with students’
problem-solving processes. However, Conceptual Accuracy (2.47/3.0) was slightly lower, suggesting
occasional misinterpretations of BFS principles or inconsistencies in algorithmic explanations. Similarly,
Coherence (2.70/3.0) was generally strong but showed some instances where feedback could have been
more logically structured or explicitly linked to prior student actions.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Discussion</title>
      <p>This study explores the integration of large language models (LLMs) into AI-infused STEM education
to enhance students’ learning with regard to computational thinking skills, including abstraction,
algorithmic thinking, and generalization. Our approach involved synthetic data generation to model
student problem-solving behaviors, leveraging an LLM to generate contextualized and individualized
feedback tailored to student interactions. By embedding these capabilities into our learning environment,
we aimed to support students in understanding and applying BFS-based problem-solving strategies
across diverse STEM contexts.</p>
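The paper does not reproduce its prompts verbatim. Purely as an illustration of the general approach it describes, contextualized feedback generation of this kind typically assembles the scenario, the student's logged actions, and the targeted skill into a single LLM prompt. Every name and the template wording below are hypothetical, not the authors' actual implementation:

```python
def build_feedback_prompt(scenario: str, student_actions: list[str], skill: str) -> str:
    """Assemble an LLM prompt asking for hint-style (not answer-revealing) feedback.

    `scenario`, `student_actions`, and `skill` stand in for whatever the learning
    environment logs; the template wording is illustrative only.
    """
    actions = "\n".join(f"- {a}" for a in student_actions)
    return (
        "You are a tutor helping a middle school student apply breadth-first "
        f"search (BFS) in a STEM context.\nScenario: {scenario}\n"
        f"Student actions so far:\n{actions}\n"
        f"Give feedback targeting the student's {skill} skills. Offer a "
        "strategic hint and a concrete next step, but do not reveal the answer."
    )
```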
      <p>In evaluating LLM-generated feedback, we found that the system effectively provided actionable,
structured guidance, particularly in reinforcing algorithmic thinking and generalization. The feedback
was well-aligned with student actions, offering strategic hints rather than direct answers, which
encouraged deeper engagement with the problem-solving process. However, conceptual accuracy varied,
with some responses misinterpreting BFS principles or oversimplifying explanations. Additionally,
balancing specificity with generalization proved challenging, as overly detailed feedback sometimes
constrained students’ ability to transfer knowledge across contexts. Refining prompt design to improve
the clarity, depth, and adaptability of explanations is a key area for future work, ensuring that feedback
remains pedagogically sound and tailored to different learning needs.</p>
      <p>Ultimately, this work demonstrates the potential for LLM-enhanced learning environments to provide
scalable, personalized support in computational thinking education. By continuously refining synthetic
data generation, prompt engineering, and feedback mechanisms, we aim to create a system that
empowers students to engage in AI-driven problem-solving while fostering transferable computational
skills applicable across STEM disciplines.</p>
      <table-wrap id="tbl-rubric">
        <caption>
          <p>Rubric for evaluating LLM-generated feedback.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Criteria</th>
              <th>3 (Strong)</th>
              <th>2 (Moderate)</th>
              <th>1 (Weak)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Conceptual Accuracy</td>
              <td>Feedback provides accurate and precise explanations of BFS.</td>
              <td>Feedback is partially correct but includes minor inaccuracies or lacks precision.</td>
              <td>Feedback contains major conceptual errors or misrepresents BFS concepts.</td>
            </tr>
            <tr>
              <td>Depth and Specificity</td>
              <td>Feedback is detailed and context-specific, clearly addressing the student’s computational thinking skill gaps.</td>
              <td>Feedback is somewhat specific, but lacks depth or relevant details.</td>
              <td>Feedback is too vague or generic, failing to guide learning.</td>
            </tr>
            <tr>
              <td>Actionability</td>
              <td>Feedback includes clear, actionable steps for students to improve their abstraction, algorithmic thinking, or generalization skills.</td>
              <td>Feedback suggests improvements, but may be too general or require interpretation.</td>
              <td>Feedback lacks actionable suggestions, making it unclear how to improve.</td>
            </tr>
            <tr>
              <td>Contextualization &amp; Transferability</td>
              <td>Feedback clearly connects BFS to diverse STEM contexts and problem-solving scenarios.</td>
              <td>Feedback partially supports contextualization but may lack tangible connections.</td>
              <td>Feedback fails to put the problem-solving scenario in the scientific context.</td>
            </tr>
            <tr>
              <td>Coherence &amp; Clarity</td>
              <td>Feedback is logically structured and easy to understand.</td>
              <td>Feedback is somewhat clear, but may contain minor ambiguities.</td>
              <td>Feedback is confusing, disorganized, or difficult to interpret.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion and Future Work</title>
      <p>This study explored the use of large language models (LLMs) to enhance AI-infused STEM
problem-solving, using breadth-first search (BFS) as a case study. We developed a structured approach that
leverages synthetic data generation to create problem-solving scenarios and LLM-generated feedback
to provide targeted, context-aware guidance. Our evaluation demonstrated that the system effectively
produced actionable feedback, aligning with structured problem-solving strategies. However,
variability in conceptual accuracy and the ability to capture nuanced student reasoning remain areas for
improvement.</p>
      <p>Future work will focus on refining synthetic data generation by incorporating real student interactions
to enhance authenticity and diversity in problem-solving scenarios. Additionally, we aim to optimize
LLM-generated feedback by fine-tuning prompt engineering techniques to improve explanation clarity
and conceptual accuracy. Through these advancements, we aim to create a scalable and intelligent
support system that enhances AI-driven problem-solving in STEM education for a diverse range of
students.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This research was supported by the National Science Foundation (NSF) under Grant DUE-2405862.
Any opinions, findings, and conclusions expressed in this material are those of the authors and do not
necessarily reflect the views of the NSF.</p>
    </sec>
    <sec id="sec-9">
      <title>7. Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT and Grammarly for grammar and
spelling checks and for paraphrasing and rewording. After using these tools, the author(s) reviewed and
edited the content as needed and take(s) full responsibility for the publication’s content.</p>
      <p>programming: Design and evaluation, IEEE Transactions on Learning Technologies 15 (2022)
406–420.
[14] T. W. Price, D. Hovemeyer, K. Rivers, G. Gao, A. C. Bart, A. M. Kazerouni, B. A. Becker, A. Petersen,
L. Gusukuma, S. H. Edwards, et al., Progsnap2: A flexible format for programming process data,
in: Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science
Education, 2020, pp. 356–362.
[15] W. Min, B. Mott, J. Lester, Adaptive scaffolding in an intelligent game-based learning environment
for computer science, in: Proceedings of the workshop on AI-supported education for computer
science (AIEDCS) at the 12th international conference on intelligent tutoring systems, 2014, pp.
41–50.
[16] M. A. Razafinirina, W. G. Dimbisoa, T. Mahatody, Pedagogical alignment of large language models
(llm) for personalized learning: a survey, trends and challenges, Journal of Intelligent Learning
Systems and Applications 16 (2024) 448–480.
[17] S. Druga, N. Otero, Scratch copilot evaluation: Assessing ai-assisted creative coding for families,
arXiv preprint arXiv:2305.10417 (2023).
[18] B. Harvey, D. D. Garcia, T. Barnes, N. Titterton, D. Armendariz, L. Segars, E. Lemon, S. Morris,
J. Paley, Snap! (build your own blocks), in: Proceedings of the 44th ACM technical symposium on
Computer science education, 2013, pp. 759–759.
[19] A. C. Bart, J. Tibau, E. Tilevich, C. A. Shaffer, D. Kafura, Blockpy: An open access data-science
environment for introductory programmers, Computer 50 (2017) 18–26.
[20] B. Broll, A. Ledeczi, Distributed programming with netsblox is a snap!, in: Proceedings of the
2017 ACM SIGCSE Technical Symposium on Computer Science Education, 2017, pp. 640–640.
[21] D. Touretzky, C. Gardner-McCune, D. Seehorn, Machine learning and the five big ideas in ai,
International Journal of Artificial Intelligence in Education 33 (2023) 233–266.
[22] E. Hestness, R. C. McDonald, W. Breslyn, J. R. McGinnis, C. Mouza, Science teacher professional
development in climate change education informed by the next generation science standards,
Journal of Geoscience Education 62 (2014) 319–329.
[23] A. Eguchi, H. Okada, Y. Muto, Contextualizing ai education for k-12 students to enhance their
learning of ai literacy through culturally responsive approaches, KI-Künstliche Intelligenz 35
(2021) 153–161.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , K. DesPortes, Y. Bergner,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Moore</surname>
          </string-name>
          , Y. Cheng,
          <string-name>
            <given-names>B.</given-names>
            <surname>Perret</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Walsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guggenheim</surname>
          </string-name>
          , et al.,
          <article-title>Agents, models, and ethics: Importance of interdisciplinary explorations in ai education</article-title>
          ,
          <source>in: Proceedings of the 16th International Conference of the Learning Sciences-ICLS</source>
          <year>2022</year>
          , pp.
          <fpage>1763</fpage>
          -
          <lpage>1770</lpage>
          , International Society of the Learning Sciences,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Akram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yoder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tatar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Boorugu</surname>
          </string-name>
          , I. Aderemi,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>Towards an ai-infused interdisciplinary curriculum for middle-grade classrooms</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>36</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>12681</fpage>
          -
          <lpage>12688</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Touretzky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seehorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breazeal</surname>
          </string-name>
          , T. Posner,
          <article-title>Special session: Ai for k-12 guidelines initiative</article-title>
          ,
          <source>in: Proceedings of the 50th ACM technical symposium on computer science education</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>492</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cateté</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lytle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Boulden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Akram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Houchins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wiebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mott</surname>
          </string-name>
          , et al.,
          <article-title>Infusing computational thinking into middle grade science classrooms: lessons learned</article-title>
          ,
          <source>in: Proceedings of the 13th workshop in primary and secondary computing education</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Lytle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cateté</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Boulden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Akram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Houchins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          , E. Wiebe,
          <article-title>Ceo: A triangulated evaluation of a modeling-based ct-infused cs activity for non-cs middle grade students</article-title>
          ,
          <source>in: Proceedings of the ACM Conference on Global Computing Education</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yoder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tatar</surname>
          </string-name>
          , I. Aderemi,
          <string-name>
            <given-names>S.</given-names>
            <surname>Boorugu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Akram</surname>
          </string-name>
          ,
          <article-title>Gaining insight into effective teaching of ai problem-solving through csedm: A case study</article-title>
          ,
          <source>in: 5th Workshop on computer science educational data mining</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Houchins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Boulden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Boyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. N.</given-names>
            <surname>Wiebe</surname>
          </string-name>
          ,
          <article-title>How use-modify-create brings middle grades students to computational thinking</article-title>
          ,
          <source>International Journal of Designs for Learning</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , D. DiPaola, C. Breazeal,
          <article-title>Developing middle school students' ai literacy</article-title>
          ,
          <source>in: Proceedings of the 52nd ACM technical symposium on computer science education</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jantaraweragul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Glazewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Scribner</surname>
          </string-name>
          , A. Ottenbreit-Leftwich,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Hmelo-Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lester</surname>
          </string-name>
          ,
          <article-title>Investigating a visual interface for elementary students to formulate ai planning tasks</article-title>
          ,
          <source>Journal of Computer Languages</source>
          <volume>73</volume>
          (
          <year>2022</year>
          )
          <fpage>101157</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wu</surname>
          </string-name>
          , L. Sun,
          <article-title>Chatscratch: An ai-augmented system toward autonomous visual programming learning for children aged 6-12</article-title>
          , in:
          <source>Proceedings of the CHI Conference on Human Factors in Computing Systems</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Carney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Webster</surname>
          </string-name>
          , I. Alvarado,
          <string-name>
            <given-names>K.</given-names>
            <surname>Phillips</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Howell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Griffith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jongejan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pitaru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Teachable machine: Approachable web-based tool for exploring machine learning classification</article-title>
          ,
          <source>in: Extended abstracts of the 2020 CHI conference on human factors in computing systems</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vaishnav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <article-title>Comparison of machine learning algorithms and fruit classification using orange data mining tool</article-title>
          ,
          <source>in: 2018 3rd international conference on inventive computation technologies (ICICT)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>603</fpage>
          -
          <lpage>607</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Akram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. W.</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <article-title>Adaptive immediate feedback for block-based</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>