<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Educational Computing Research</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1177/07356331221136888</article-id>
      <title-group>
        <article-title>Towards Learning Analytics for Interdisciplinary Learning: Leveraging Knowledge-empowered Fine-Tuned GPT Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tianlong Zhong</string-name>
          <email>TIANLONG001@e.ntu.edu.sg</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gaoxia Zhu</string-name>
          <email>gaoxia.zhu@nie.edu.sg</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Swee Chiat Low</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Siyuan Liu</string-name>
          <email>syliu@ntu.edu.sg</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computing &amp; Data Science, Nanyang Technological University</institution>
          ,
          <addr-line>50 Nanyang Ave</addr-line>
          ,
          <country country="SG">Singapore</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Energy Research Institute @ NTU, Graduate College, Nanyang Technological University</institution>
          ,
          <addr-line>50 Nanyang Ave</addr-line>
          ,
          <country country="SG">Singapore</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Institute of Education, Nanyang Technological University</institution>
          ,
          <addr-line>1 Nanyang Walk</addr-line>
          ,
          <country country="SG">Singapore</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>61</volume>
      <issue>5</issue>
      <fpage>334</fpage>
      <lpage>353</lpage>
      <abstract>
        <p>GPT models' ability to automatically score students' writing makes them promising for assessing students' interdisciplinary learning quality, a significant but unaddressed gap. While standard GPT models have difficulty understanding contextual knowledge, previous research suggests that knowledge-empowered fine-tuned (KEFT) GPT models can overcome this limitation. This study examined 1) whether KEFT GPT models can accurately label interdisciplinary learning quality based on learning process and outcome data, and 2) how to implement these models within a learning analytics (LA) platform, in three major steps. First, to establish a ground truth dataset, two pairs of researchers independently coded and discussed the interdisciplinary learning quality of 400 online posts and 190 sections from 16 essays based on an interdisciplinary learning quality codebook. Second, we employed KEFT GPT models to evaluate interdisciplinary learning quality. Results indicated that the models achieved accuracy comparable to human researchers. Third, the models were integrated into an LA platform, TopicWise, which automates evaluation and provides tailored feedback. This study showcased the feasibility of applying KEFT GPT models in LA to analyse student learning processes and outcomes. Next, we will conduct user studies to examine TopicWise's impact on students' interdisciplinary learning and identify areas for improvement.</p>
      </abstract>
      <kwd-group>
        <kwd>GPT</kwd>
        <kwd>Prompt engineering</kwd>
        <kwd>Fine-tuning</kwd>
        <kwd>Interdisciplinary learning</kwd>
        <kwd>Learning analytics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Interdisciplinary learning combines perspectives, methods, and strategies from various disciplines to
address complex issues that cannot be fully understood within a single field [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This
approach can engage students with real-world challenges, foster critical thinking, creativity, and
critical problem-solving skills, and enhance their career readiness [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A significant challenge in
this domain is assessing the quality of interdisciplinary learning based on both the learning process
and outcome data, as it often requires post-hoc, labour-intensive qualitative analysis of textual
data from multiple perspectives, such as diversity, cognitive advancement, disciplinary grounding,
and integration [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This challenge limits the possibility of effectively providing students with
timely feedback.
      </p>
      <p>
        ChatGPT, a chatbot powered by foundation large language models (LLMs) like GPT-3.5 and
GPT-4o, developed by OpenAI [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], has shown promise in addressing the issue of effectively analysing
student text and providing feedback. For instance, Lee et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] applied chain-of-thought in automatic
essay scoring with accuracy above 60%. Latif and Zhai [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used fine-tuned GPT models for
automatic scoring in science education and achieved an average accuracy of 83.8%. The authors of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] utilised GPT-3 and
GPT-4 to provide feedback on student essays and found that the GPT models can provide more readable
and consistent feedback than human teachers in data science courses. These studies show promising
results in applying LLMs to automatically evaluate students' learning processes and outcomes and
provide feedback, an important research topic of LA [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        LA is a research area that focuses on gathering, analysing, and reporting data about learners and
their environments to gain insights and improve both the learning experience and the conditions
that support it [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. However, developing effective LA to provide tailored feedback to users
requires the backend models to “acquire” task-specific knowledge, which standard GPT models lack.
Zhong et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] adopted "knowledge-empowered approaches" to integrate domain knowledge and
codebook rules into prompts to enhance LLM performance. They found that such approaches could
enhance the GPT-3.5 model's performance in evaluating students' interdisciplinary learning quality
on short posts (learning process data), but the effects of knowledge-empowered GPT models in essay
(learning outcome data) evaluation remain unclear. Furthermore, even though the importance of
interdisciplinary learning is well recognised, there is a lack of interdisciplinary LA that can provide
auto-assessment and real-time feedback. To address these research gaps, this ongoing work takes
initial steps to develop interdisciplinary LA and explores the following two research questions (RQs):
      </p>
      <p>RQ1: Can knowledge-empowered approaches increase GPT models' accuracy in analysing
interdisciplinary learning quality?</p>
      <p>RQ2: How can a prototype LA platform be designed to leverage GPT models for providing
automated feedback on students' interdisciplinary learning quality?</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <sec id="sec-2-1">
        <title>2.1. Interdisciplinary Learning and LA</title>
        <p>
          Interdisciplinary learning refers to the process of incorporating knowledge and perspectives from
multiple disciplines to solve problems or explain phenomena beyond the boundary of a singular
discipline [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. However, this domain faces challenges in analysing and reporting students'
interdisciplinary learning quality, which calls for more rigorous and robust methods [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Qualitative
analysis, such as essay evaluation, is commonly used for interdisciplinary learning assessments. For
instance, Boix-Mansilla et al. [19] introduced a rubric for interdisciplinary writing, encompassing
four key dimensions: purposefulness, disciplinary grounding, integration, and critical awareness.
Kidron and Kali [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] expanded the integration dimension into the following sub-dimensions:
integrative lens, idea connection, disciplinary analysis through an integrative lens, and synthesis,
and used the updated rubric to assess students' essays. However, these assessments are post-hoc and
occur after data collection is done. There remains a gap in analysing learning process data in real
time and providing just-in-time feedback to guide and enhance students' interdisciplinary learning.
        </p>
        <p>
          LA can effectively analyse real-time learning processes by collecting process data from digital
tools like learning management systems (LMS), automatically analysing data with algorithms, and
providing visualised dashboards and personalised feedback [20]. Various applications of LA have
been utilised in interdisciplinary learning. For instance, Lee et al. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] used machine learning
methods to analyse STEM learning behaviours, categorising them as passive, active, constructive, or
interactive. Iku-Silan et al. [21] created a chatbot powered by natural language processing (NLP)
technology to support interdisciplinary learning. This chatbot provided students with personalised
advice and resources sourced from an interdisciplinary knowledge database. Tang et al. [22] designed
a platform aimed at enhancing K-12 STEM education by integrating machine learning into scientific
lesson plans. For instance, their platform used machine learning to analyse data related to heart
disease risk factors, enabling students to engage in scientific discovery more interactively. Yet, LA
tools that can analyse the interdisciplinary learning quality of student-generated data are lacking.
        </p>
        <p>
          A few techniques for enhancing GPT performance have been explored in the literature. Prompt
engineering is an important strategy for improving a model's performance by designing and
optimising model input [23]. Studies have shown that designing prompts can enhance the
performance of GPT for various tasks, including classification and reasoning [23], [24]. Moreover,
for some complex tasks, chain-of-thought (CoT) prompting is regarded as a useful technique of
prompt engineering [25]. CoT induces the model to solve a problem step-by-step, thus mimicking a
chain of thought and improving the model's reasoning ability [26].
        </p>
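<p>To make the CoT structure concrete, the following is a minimal, illustrative sketch (the helper name and strings are ours, not the study's) of assembling a prompt from the three parts described in the literature: a task clarification, a breakdown into sub-questions, and an explicit statement of the logical link between them.</p>

```python
# Illustrative sketch of chain-of-thought prompt assembly: clarify the task,
# break it into numbered sub-questions, and state the logic connecting them.
def build_cot_prompt(clarification, questions, logic):
    """Join the three CoT components into a single prompt string."""
    numbered = [f"Question {i}: {q}" for i, q in enumerate(questions, start=1)]
    return "\n".join([clarification, *numbered, logic])

prompt = build_cot_prompt(
    clarification="Treat all the information as a single paragraph and answer "
                  "the questions below about cognitive advancement. Return yes or no.",
    questions=[
        "Does the paragraph offer basic explanations, causalities, or examples?",
        "Does the paragraph provide detailed reasoning and specific examples?",
    ],
    logic="Question 2 is an extension of Question 1.",
)
print(prompt)
```

The step-by-step ordering, rather than any single wording, is what induces the chain-of-thought behaviour described above.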
        <p>However, GPT's expertise in specific domains may be limited, which can result in nonsensical or
inappropriate responses to specific prompts [27]. Fine-tuning is a technique that can help mitigate
this limitation and improve GPT performance on specific tasks. Fine-tuning refers to the additional
training of pre-trained models to customise them for specific tasks or datasets [28]. One benefit of
this approach is the ability to tailor models to enhance their performance in specific tasks, which
requires only 50 to 100 examples for training [29]. The fine-tuned GPT models have been shown to
be effective in several studies. Chae and Davidson [30] suggested that fine-tuning is an optimal
solution for researchers due to its relatively high accuracy and low cost.</p>
        <p>
          However, fine-tuning methods rely heavily on the pre-training data [29], which may limit their
ability to handle tasks requiring knowledge not included in their initial training set. The
knowledgeempowered method, which incorporates external knowledge into the model, may further improve
GPT performance on specific tasks by expanding the model's understanding beyond the pre-training
dataset [31]. The basic premise of this technique is that by integrating additional information, models
can enhance their comprehension of content and generate better output [32]. Hu et al. [33] combined
domain knowledge (geo-knowledge) with GPT and showed that external knowledge is indispensable
for guiding the behaviour of GPT models. Similarly, Yang et al. [34] conducted a study to use external
knowledge bases to enhance pre-trained language models for machine reading comprehension. They
found that incorporating structured knowledge from knowledge bases significantly improved
models' accuracy on benchmarks like ReCoRD and SQuAD1.1. Overall, these studies have shown
that by integrating external knowledge into models, the performance of models in specific tasks
significantly improved [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>The promising results highlight the potential of using knowledge-empowered strategies to
analyse the interdisciplinary learning quality of students' work. In a recent study, we employed
knowledge-empowered approaches—such as dictionary-based knowledge to address terminology
that GPT models struggle to understand, and rule-based knowledge to capture implicit mechanisms
outlined in codebooks—to fine-tune GPT models. The findings revealed that these strategies
significantly enhanced the performance of GPT-3.5 in evaluating interdisciplinary learning process
data (e.g., online posts). However, this approach has yet to be applied to GPT-4 models or to learning
outcome data such as final essays. Building on these strategies, this study seeks to develop an
interdisciplinary LA platform capable of processing real-time data. The platform will provide
students with timely feedback on the quality of their interdisciplinary learning and offer actionable
suggestions to foster improvement.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The prototype learning analytics platform</title>
      <p>The following sections will present a prototype interdisciplinary LA platform, TopicWise, by giving
an overview of the platform, detailing how Knowledge-empowered fine-tuned GPT models have
been trained and their performance, and showing the user interface design.</p>
      <sec id="sec-3-1">
        <title>3.1. Overview of the LA platform</title>
        <p>
          This platform is designed to evaluate students’ interdisciplinary learning and provide actionable
feedback for improvement. As Figure 1 shows, students can upload files (e.g., essays) or short texts
(e.g., posts, discussions) through the user interface. After that, the text will be delivered to
knowledge-empowered fine-tuned models for processing to generate relevant feedback, including
comments on the text and suggestions for improvement from four dimensions of interdisciplinary
learning quality: diversity, cognitive advancement, disciplinary grounding, and integration [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The
feedback will be shown on the user interface, and the text data and feedback will be saved in the
Mongo database. Students can access their feedback in real-time and review them anytime, which
can potentially help them better understand the strengths and weaknesses of their writing and help
them improve.
        </p>
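<p>The evaluate-then-store flow described above can be sketched as follows; the function names are ours, and the KEFT model call and the Mongo collection are stubbed, since both are external services.</p>

```python
# Hypothetical sketch of the platform flow: text in, per-dimension scores and
# feedback out, with the record persisted for later review.
DIMENSIONS = ["diversity", "cognitive advancement",
              "disciplinary grounding", "integration"]

def evaluate_text(text, model=lambda text, dim: 0):
    """Score one submission on each dimension and bundle the feedback."""
    scores = {dim: model(text, dim) for dim in DIMENSIONS}
    feedback = {dim: f"Level {s} on {dim}; see suggestions for improvement."
                for dim, s in scores.items()}
    return {"text": text, "scores": scores, "feedback": feedback}

store = []  # stand-in for the Mongo collection
record = evaluate_text("An essay combining law and economics ...")
store.append(record)
```

In the real platform the stubbed `model` argument would be a call to the knowledge-empowered fine-tuned GPT models, and `store` would be the Mongo database.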
        <sec id="sec-3-1-1">
          <title>End User Subsystem</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>UI Controller</title>
        </sec>
        <sec id="sec-3-1-3">
          <title>TopicWise Software Subsystem</title>
        </sec>
        <sec id="sec-3-1-4">
          <title>Software Controller</title>
        </sec>
        <sec id="sec-3-1-5">
          <title>KEFT GPT models</title>
        </sec>
        <sec id="sec-3-1-6">
          <title>User Interface</title>
        </sec>
        <sec id="sec-3-1-7">
          <title>Routes</title>
        </sec>
        <sec id="sec-3-1-8">
          <title>Services</title>
        </sec>
        <sec id="sec-3-1-9">
          <title>Data Access Object</title>
        </sec>
        <sec id="sec-3-1-10">
          <title>Mongo DB</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Knowledge-empowered fine-tuned GPT models</title>
      </sec>
      <sec id="sec-3-3">
        <title>3.2.1. The Dataset</title>
        <p>To prepare ground truth data for model training and testing, we manually analysed 400 posts
collected from the Miro platform and 16 essays. The existing literature on manual content analysis
of essays indicates that smaller units, rather than entire texts, are more appropriate for studying the
general standard of essays [35]. Therefore, to retain the consistency and integrity of ideas, we
analysed students' essays at their most granular level, focusing on the smallest sections, which
typically consist of several paragraphs and represent the deepest layer of content organisation. This
approach divided the 16 essays into 190 data points.</p>
        <p>Thereafter, two human coders independently labelled students' posts. Another two coders
labelled essays, all using the codebook of interdisciplinary learning quality, which consists of
diversity (the number of disciplines represented in the text), cognitive advancement (the depth and
clarity of the articulated viewpoints), disciplinary grounding (the extent to which the text applies
disciplinary knowledge), and integration (the degree to which perspectives from multiple disciplines
are synthesised). We used Cohen's Kappa score, as shown in Table 1, to evaluate the inter-rater
reliability between human raters on each dimension of interdisciplinary learning quality. Human
coders subsequently discussed and settled their differences, reaching an agreement on each item,
which was regarded as the ground truth of interdisciplinary learning quality.</p>
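<p>For reference, Cohen's Kappa can be computed from two raters' label sequences as below; this is a from-scratch sketch with made-up labels, and the study does not specify which implementation it used.</p>

```python
from collections import Counter

# Cohen's Kappa: observed agreement corrected for agreement expected by chance.
def cohens_kappa(r1, r2):
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n       # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

rater_a = [0, 1, 2, 1, 0, 2, 1, 1]
rater_b = [0, 1, 2, 0, 0, 2, 1, 1]
print(round(cohens_kappa(rater_a, rater_b), 2))  # prints 0.81
```

Values above roughly 0.6 are conventionally read as substantial agreement, which is why the human scores in Table 1 serve as a credible benchmark.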
        <p>
          For both the post and the essay dataset, 80% of the data were randomly selected as training data,
which were applied to fine-tune the GPT-3.5 and GPT-4o-mini models, while the remaining 20% of
the data were used for testing, which is explained in detail in Section 3.2.2. The frequency of each code for
each dimension in the training dataset and the testing dataset is displayed in Table 2. The dimensions
refer to the elements of the interdisciplinary learning quality [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]; each of these dimensions is further
divided into three levels.
        </p>
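<p>The random 80/20 split can be sketched as follows; the seed and item list are illustrative, and the study does not report its exact splitting procedure.</p>

```python
import random

# Sketch of an 80/20 random train/test split over coded data points.
def train_test_split(items, train_frac=0.8, seed=42):
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

posts = list(range(400))            # the 400 coded posts
train, test = train_test_split(posts)
print(len(train), len(test))        # prints 320 80
```

Fixing the seed makes the split reproducible, so the fine-tuned models in the four test modes are always evaluated on the same held-out 20%.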
        <table-wrap id="tbl1">
          <label>Table 1</label>
          <caption>
            <p>Inter-rater reliability (Cohen’s Kappa) between human raters on notes and essays</p>
          </caption>
          <table>
            <thead>
              <tr><th>Data</th><th>Diversity</th><th>Cognitive advancement</th><th>Disciplinary grounding</th><th>Integration</th><th>Overall</th></tr>
            </thead>
            <tbody>
              <tr><td>Post</td><td>0.83</td><td>0.82</td><td>0.83</td><td>0.73</td><td>0.83</td></tr>
              <tr><td>Essay</td><td>0.84</td><td>0.71</td><td>0.67</td><td>0.57</td><td>0.72</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-4">
        <title>3.2.2. Knowledge-empowered Fine-tuning Strategy</title>
        <p>We systematically crafted the prompts for GPT models following the stages in Figure 2. We first
adopted the interdisciplinary learning codebook, drawing upon educational theories. Following that,
we created a template to translate the natural language of the codebook into a structure that GPT
could process. For instance, we used a standardised format like a conditional statement (i.e., if-else)
to represent the rules in the codebook. Ultimately, each tailored prompt was generated using the
template and contained the following components (see Table 3): (1) A system message defining GPT’s
persona; (2) A tailored task instruction outlining the task and its needs; and (3) A rule derived from
the codebook, offering guidelines and examples relevant to different levels of a specific dimension.</p>
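<p>A tailored prompt built from those three components might be assembled as in the sketch below; the helper name and wording are ours, intended only to make the template idea concrete.</p>

```python
# Sketch: compose a tailored prompt from a system message (persona), a task
# instruction, and a rule derived from the codebook.
def tailored_prompt(dimension, rule):
    system = ("You are an encyclopaedia that can precisely evaluate "
              f"the {dimension} reflected in the following notes.")
    instruction = (f"Evaluate the {dimension} level of the text. "
                   "Return only numerical values 0, 1, and 2.")
    return {"system": system, "user": f"{instruction}\n{rule}"}

p = tailored_prompt(
    dimension="cognitive advancement",
    rule="Return 2 if the content provides detailed reasoning and specific examples.",
)
```

Because the template is parameterised by dimension and rule, one codebook can generate a distinct tailored prompt for each dimension of interdisciplinary learning quality.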
        <p>We also utilised CoT to effectively instruct GPT models with step-by-step tasks. Guided by CoT,
there are three main steps (see Table 3) in the prompts: Firstly, task clarification provides essential
details such as the requirements and desired output. Secondly, in the task breakdown, the tasks are
divided into smaller, manageable parts. Lastly, the logical sequence instruction guides GPT in
understanding the relations and mechanisms among these breakdown tasks. Through these three
steps, we created a structured framework designed to address complex tasks with CoT methods.</p>
        <table-wrap id="tbl3">
          <label>Table 3</label>
          <caption>
            <p>Example prompt components for tailored prompts and chain-of-thought prompting</p>
          </caption>
          <table>
            <tbody>
              <tr><td rowspan="3">Tailored Prompt</td><td>System messages</td><td>"You are an encyclopaedia that can precisely evaluate the disciplines reflected in the following notes."</td></tr>
              <tr><td>Rules sourced from the Codebook</td><td>"Please see all the information as a single paragraph and evaluate the cognitive advancement level of students' essays. Return only numerical values 0, 1, and 2."</td></tr>
              <tr><td>Tailored task instructions</td><td>"Return 2 if the content provides detailed reasoning and specific examples to demonstrate a deep understanding of the topic."</td></tr>
              <tr><td rowspan="3">Chain-of-Thought Prompting</td><td>Task clarification</td><td>"Please see all the information as a single paragraph and answer the below two questions about the cognitive advancement of essays. Please return yes or no."</td></tr>
              <tr><td>Task breakdown</td><td>"Question 1: Does the paragraph have basic explanations or causalities or examples or mechanisms or elaborations of phenomena?"; "Question 2: Does the paragraph provide detailed reasoning and specific examples to demonstrate a deep understanding of the topic?"</td></tr>
              <tr><td>Logical sequence instructions</td><td>"Question 2 is an extended one based on Question 1."</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>In this study, two types of knowledge, dictionary-based and rule-based knowledge, were
integrated into prompts to enhance the models' performance.</p>
        <p>Dictionary-based knowledge includes specific words with predefined categories [35]. For
example, in analysing the diversity dimension, a discipline dictionary was provided. The dictionary
included eleven disciplines: 'Arts and humanities'; 'Business and economics'; 'Clinical, pre-clinical
and health'; 'Computer science'; 'Education'; 'Engineering and technology'; 'Law'; 'Life sciences';
'Physical sciences'; 'Psychology'; and 'Social sciences'. If a student mentions terms like "copyright,"
GPT might struggle to classify them correctly. To address this, prompts were structured as: "If
students mention terms such as WORD (copyright), it reflects content related to the LABEL 'Law.'"
This method was applied across other dimensions of interdisciplinary learning quality to improve
accuracy. This study applied dictionary-based knowledge to the Diversity and Cognitive advancement
dimensions because GPT models lack contextual knowledge about these two dimensions and
thereby need specific examples. Dictionary-based knowledge can also be applied in other
circumstances when LLMs cannot understand specific instances.</p>
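<p>Generating the quoted "If students mention terms such as ..." prompt lines from a discipline dictionary can be sketched as follows; the term list and helper name are illustrative, not the study's actual dictionary.</p>

```python
# Sketch: expand a small discipline dictionary into dictionary-based knowledge
# lines of the form quoted in the text.
discipline_terms = {
    "Law": ["copyright", "liability"],
    "Life sciences": ["photosynthesis"],
}

def dictionary_rules(term_map):
    lines = []
    for label, words in term_map.items():
        for w in words:
            lines.append(f'If students mention terms such as "{w}", '
                         f"it reflects content related to the label '{label}'.")
    return "\n".join(lines)

rules = dictionary_rules(discipline_terms)
print(rules)
```

The resulting block of lines is simply appended to the prompt, giving the model explicit term-to-discipline mappings it would otherwise miss.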
        <p>Rule-based knowledge, on the other hand, uses task-specific logic derived from relationships
outlined in codebooks. For instance, if no disciplines are mentioned in a student's text (Diversity =
0), disciplinary grounding should also be 0, as no disciplinary knowledge is present. Similarly, if
fewer than two disciplines are mentioned (Diversity &lt; 2), Integration is likely 0, as interdisciplinary
synthesis is absent. These rules were embedded into prompts using structures like: "IF DIMENSION
A (Diversity) is 0, THEN DIMENSION B (Disciplinary Grounding) is likely to be 0." By encoding such
logic, rule-based knowledge ensures the model considers the interplay between dimensions,
enhancing its ability to perform deductive coding effectively. This study applied rule-based
knowledge to the Disciplinary grounding and Integration dimensions because these two dimensions
depend on the outcome of Diversity. Rule-based knowledge can also be applied in other
circumstances where LLMs cannot grasp the implicit rules of a task, especially tasks with
interdependent dimensions.</p>
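<p>The two codebook rules quoted above can equivalently be expressed as deductive constraints over the dimension scores, as in this sketch (the function and key names are ours, not the paper's):</p>

```python
# Sketch of the codebook's cross-dimension rules applied as constraints:
# no disciplines -> no grounding; fewer than two disciplines -> no integration.
def apply_codebook_rules(scores):
    """Enforce the dependencies between dimensions from the codebook."""
    fixed = dict(scores)
    if fixed["diversity"] == 0:
        fixed["disciplinary_grounding"] = 0  # no disciplinary knowledge present
    if fixed["diversity"] < 2:
        fixed["integration"] = 0             # synthesis needs >= 2 disciplines
    return fixed

raw = {"diversity": 1, "disciplinary_grounding": 1, "integration": 2}
constrained = apply_codebook_rules(raw)
print(constrained)
```

In the study these rules were embedded in the prompt text rather than applied as post-processing, but the logic the model is asked to follow is the same.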
      </sec>
      <sec id="sec-3-5">
        <title>3.2.3. Model performance</title>
        <p>Experiments were operated on GPT-3.5 and GPT-4o-mini models. To answer RQ1, we tested the
models in four modes: prompts (directly use prompts), fine-tuning (apply fine-tuning),
knowledge-empowered prompts (embed knowledge in prompts), and knowledge-empowered fine-tuning (apply
knowledge-empowered prompts and fine-tuning). Cohen's Kappa scores, presented in Tables 4 and
5, were used to measure the agreement between GPT-generated labels and human-coded ground
truth for both posts and essays, with human inter-rater reliability serving as the benchmark.</p>
        <p>The results indicated that knowledge-empowered approaches enhance both prompt-based and
fine-tuning methods. Knowledge-empowered methods demonstrated clear improvements for posts
(learning process data), validating their effectiveness. For essays (learning outcome data), these
methods enhanced performance on Diversity and Disciplinary grounding for knowledge-empowered
prompts and augmented accuracy on most dimensions, except for Diversity (0.91 vs. 0.78) in
fine-tuned models.</p>
        <p>Overall, integrating knowledge-empowered strategies with fine-tuning significantly increased
GPT models' agreement with human coders, achieving or surpassing expert-level proficiency in
analysing interdisciplinary learning quality. Table 4 reports the Cohen’s Kappa scores for student
posts.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.3. User interface</title>
        <p>After training the models, we implemented them into an LA platform we are developing: TopicWise
(https://a-ori-topic-wise.vercel.app/).</p>
        <p>Figure 3 presents a screenshot of TopicWise for providing scores and feedback for essays. In the
"Scores Comparison", students can view their scores across the four interdisciplinary learning
dimensions—diversity, cognitive advancement, disciplinary grounding, and integration—and
compare them with the average scores from the database. This comparative feature intends to help
students understand their performance in the context of their peers and highlight areas for
improvement. The "Paragraph Annotations" section provides a more granular analysis, offering
scores for each specific paragraph in the essay. Additionally, the platform explains the reasoning
behind each score and provides targeted feedback for improvement. This detailed breakdown
identifies strengths and weaknesses in students’ writing, aiming to foster a deeper understanding of
interdisciplinary learning principles and guiding their revisions. The platform also supports
real-time feedback for online discussion posts, as shown in Figure 4. When students upload their posts to
TopicWise, the system quickly analyses the content, assigns scores for interdisciplinary dimensions,
and delivers immediate feedback. This instant evaluation enables students to refine their posts during
discussions, promoting more effective engagement with interdisciplinary concepts and improving
learning outcomes over time.</p>
        <p>TopicWise’s ability to deliver timely feedback gives it the potential to support students in
interdisciplinary learning through reflective essay writing and interactive online discussion. It has
the potential to provide students with accessible, actionable insights into their performance,
empowering them to make meaningful progress in their interdisciplinary learning.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Conclusion</title>
      <p>The study found that knowledge-empowered approaches can enhance the performance of GPT-3.5
and GPT-4o-mini models, achieving accuracy comparable to human experts. The trained models
were subsequently integrated into a prototype LA platform and were able to offer automated scoring
and feedback on students’ interdisciplinary learning quality. By leveraging these advancements,
students can gain insights into their performance in real time. The use of knowledge-empowered
fine-tuning highlights its potential as a robust method for enhancing LA, making it a promising approach
in educational contexts.</p>
      <p>Knowledge-empowered approaches amplified both prompt-based and fine-tuned model
performance. By incorporating small-scale domain-specific dictionaries and rule-based logic, the
study extended findings on external knowledge integration in prompt engineering in automated
essay analysis [35]. This method also addresses the issue of relying on large knowledge graphs [36]
or extensive knowledge bases [34], emphasising the efficacy of tailored knowledge enhancements
during fine-tuning. Interestingly, this study found that knowledge-empowered fine-tuning was more
effective for evaluating posts than essays. Posts typically require less complex reasoning than essays;
essays demand deeper critical thinking and subject-specific expertise, making them more
challenging for GPT models to analyse [37]. Although CoT prompting was used to assist with
reasoning tasks, essays’ intricate structure and nuanced content posed greater difficulties for the
models. This highlights a gap in the capabilities of GPT models when handling more cognitively
demanding tasks, suggesting the need for further refinement to support evaluations requiring
advanced reasoning skills.</p>
      <p>The study also introduces TopicWise, a prototype interdisciplinary LA platform that automates
the evaluation process and provides tailored feedback to students. This platform not only aims to
provide automated scoring but also aims to deliver tailored feedback to students, helping them
understand and improve their performance. By enabling dynamic feedback and continuous
monitoring, the platform has the potential to enhance students’ interdisciplinary learning practices.
User studies need to be conducted next to evaluate and refine the tool.</p>
      <p>However, this study acknowledges several limitations. First, the models are trained based on a
relatively small dataset of online posts and essays from a specific cohort of undergraduate students.
Further research is needed to determine whether these methods generalise
to broader and more diverse datasets. Second, the study only tested GPT-3.5 and GPT-4o-mini,
leaving unexplored the potential of other language models, such as LLaMA and Gemini, which could
offer different perspectives or improved capabilities. Third, the LA platform has not been tested by
users like instructors and students. We plan to conduct user studies after further refining the tool.</p>
      <p>Despite these limitations, the study highlights the potential of combining fine-tuning with
knowledge-empowered strategies for evaluating both learning process data (e.g., online posts) and
outcome data (e.g., essays). The integration of these trained models into an LA platform further
enhances the approach by providing immediate, data-driven feedback. The platform has the potential
to support educators in fostering interdisciplinary skills while optimising the assessment process.
Future work will focus on expanding the dataset, testing additional models, and conducting user
studies to ensure that the platform meets the needs of educators and students.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>The research is conducted with the support of the Energy Research Institute @ NTU,
Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools in the writing of this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] <string-name><given-names>V.</given-names> <surname>Boix-Mansilla</surname></string-name>, “<article-title>Learning to Synthesize: The Development of Interdisciplinary Understanding</article-title>,” in <source>The Oxford Handbook of Interdisciplinarity</source>, <string-name><given-names>R.</given-names> <surname>Frodeman</surname></string-name>, <string-name><given-names>J. T.</given-names> <surname>Klein</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Mitcham</surname></string-name>, and <string-name><given-names>J. B.</given-names> <surname>Holbrook</surname></string-name>, Eds., Oxford University Press, <year>2010</year>, pp. <fpage>288</fpage>-<lpage>306</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] <string-name><given-names>R. W.</given-names> <surname>Bybee</surname></string-name>, <source>The case for STEM education: challenges and opportunities</source>. Arlington, VA: National Science Teachers Association, <year>2013</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] <string-name><given-names>L.</given-names> <surname>Ivanitskaya</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Clark</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Montgomery</surname></string-name>, and <string-name><given-names>R.</given-names> <surname>Primeau</surname></string-name>, “<article-title>Interdisciplinary Learning: Process and Outcomes</article-title>,” <source>Innovative Higher Education</source>, vol. <volume>27</volume>, no. <issue>2</issue>, pp. <fpage>95</fpage>-<lpage>111</lpage>, Dec. <year>2002</year>, doi: 10.1023/A:1021105309984.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] <string-name><given-names>M.</given-names> <surname>Brassler</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Dettmers</surname></string-name>, “<article-title>How to Enhance Interdisciplinary Competence-Interdisciplinary Problem-Based Learning versus Interdisciplinary Project-Based Learning</article-title>,” <source>Interdisciplinary Journal of Problem-Based Learning</source>, vol. <volume>11</volume>, no. <issue>2</issue>, Art. no. 2, Jul. <year>2017</year>, doi: 10.7771/1541-5015.1686.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] <string-name><given-names>M. E.</given-names> <surname>Madden</surname></string-name> et al., “<article-title>Rethinking STEM Education: An Interdisciplinary STEAM Curriculum</article-title>,” <source>Procedia Computer Science</source>, vol. <volume>20</volume>, pp. <fpage>541</fpage>-<lpage>546</lpage>, Jan. <year>2013</year>, doi: 10.1016/j.procs.2013.09.316.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] <string-name><given-names>I. E. F.</given-names> <surname>Gvili</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Weissburg</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yen</surname></string-name>, <string-name><given-names>M. E.</given-names> <surname>Helms</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>Tovey</surname></string-name>, “<article-title>Development of scoring rubric for evaluating integrated understanding in an undergraduate biologically-inspired design course</article-title>,” <source>International Journal of Engineering Education</source>, <year>2016</year>. Accessed: Jul. 25, <year>2023</year>. [Online]. Available: https://www.semanticscholar.org/paper/Development-of-scoring-rubric-forevaluating-in-an-Gvili-Weissburg/53fb00b8bf56209192de2da3528aa31adafc5f66
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] <string-name><given-names>T.</given-names> <surname>Zhong</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Hou</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Fan</surname></string-name>, “<article-title>The influences of ChatGPT on undergraduate students’ demonstrated and perceived interdisciplinary learning</article-title>,” <source>Educ Inf Technol</source>, May <year>2024</year>, doi: 10.1007/s10639-024-12787-9.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] OpenAI, “<article-title>GPT-4 Technical Report</article-title>,” Mar. 27, <year>2023</year>, arXiv: arXiv:2303.08774. Accessed: Apr. 03, <year>2023</year>. [Online]. Available: http://arxiv.org/abs/2303.08774
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] <string-name><given-names>G.-G.</given-names> <surname>Lee</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Latif</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Liu</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Zhai</surname></string-name>, “<article-title>Applying large language models and chain-of-thought for automatic scoring</article-title>,” <source>Computers and Education: Artificial Intelligence</source>, vol. <volume>6</volume>, p. <fpage>100213</fpage>, Jun. <year>2024</year>, doi: 10.1016/j.caeai.2024.100213.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] <string-name><given-names>E.</given-names> <surname>Latif</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Zhai</surname></string-name>, “<article-title>Fine-tuning ChatGPT for automatic scoring</article-title>,” <source>Computers and Education: Artificial Intelligence</source>, vol. <volume>6</volume>, p. <fpage>100210</fpage>, Jun. <year>2024</year>, doi: 10.1016/j.caeai.2024.100210.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] <string-name><given-names>W.</given-names> <surname>Dai</surname></string-name> et al., “<article-title>Assessing the proficiency of large language models in automatic feedback generation: An evaluation study</article-title>,” <source>Computers and Education: Artificial Intelligence</source>, vol. <volume>7</volume>, p. <fpage>100299</fpage>, Dec. <year>2024</year>, doi: 10.1016/j.caeai.2024.100299.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] <string-name><given-names>K.</given-names> <surname>Alalawi</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Athauda</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Chiong</surname></string-name>, and <string-name><given-names>I.</given-names> <surname>Renner</surname></string-name>, “<article-title>Evaluating the student performance prediction and action framework through a learning analytics intervention study</article-title>,” <source>Educ Inf Technol</source>, Aug. <year>2024</year>, doi: 10.1007/s10639-024-12923-5.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] <string-name><given-names>F.</given-names> <surname>Ouyang</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>P.</given-names> <surname>Jiao</surname></string-name>, “<article-title>Integration of artificial intelligence performance prediction and learning analytics to improve student learning in online engineering course</article-title>,” <source>International Journal of Educational Technology in Higher Education</source>, vol. <volume>20</volume>, no. <issue>1</issue>, p. <fpage>4</fpage>, Jan. <year>2023</year>, doi: 10.1186/s41239-022-00372-4.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] <string-name><given-names>C.</given-names> <surname>Lang</surname></string-name> et al., Eds., <source>Handbook of Learning Analytics</source>, First. Society for Learning Analytics Research (SoLAR), <year>2017</year>, doi: 10.18608/hla17.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] <string-name><given-names>G.</given-names> <surname>Siemens</surname></string-name>, “<article-title>Learning Analytics: The Emergence of a Discipline</article-title>,” <source>American Behavioral Scientist</source>, vol. <volume>57</volume>, no. <issue>10</issue>, pp. <fpage>1380</fpage>-<lpage>1400</lpage>, Oct. <year>2013</year>, doi: 10.1177/0002764213498851.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] <string-name><given-names>T.</given-names> <surname>Zhong</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Cai</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Zhu</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Ma</surname></string-name>, “<article-title>Enhancing the Analysis of Interdisciplinary Learning Quality with GPT Models: Fine-Tuning and Knowledge-Empowered Approaches</article-title>,” in <source>Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky</source>, <string-name><given-names>A. M.</given-names> <surname>Olney</surname></string-name>, <string-name><given-names>I.-A.</given-names> <surname>Chounta</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>O. C.</given-names> <surname>Santos</surname></string-name>, and <string-name><given-names>I. I.</given-names> <surname>Bittencourt</surname></string-name>, Eds., Cham: Springer Nature Switzerland, <year>2024</year>, pp. <fpage>157</fpage>-<lpage>165</lpage>, doi: 10.1007/978-3-031-64312-5_19.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] <string-name><given-names>A.</given-names> <surname>Kidron</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Kali</surname></string-name>, “<article-title>Promoting interdisciplinary understanding in asynchronous online higher education courses: a learning communities approach</article-title>,” <source>Instr Sci</source>, pp. <fpage>1</fpage>-<lpage>31</lpage>, Jun. <year>2023</year>, doi: 10.1007/s11251-023-09635-7.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18] <string-name><given-names>H.-Y.</given-names> <surname>Lee</surname></string-name>, <string-name><given-names>Y.-P.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>W.-S.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>C.-J.</given-names> <surname>Lin</surname></string-name>, and <string-name><given-names>Y.-M.</given-names> <surname>Huang</surname></string-name>, “<article-title>Exploring the Learning Process and Effectiveness of STEM Education via Learning Behavior Analysis and the Interactive-Constructive-Active-Passive Framework</article-title>,” <source>Journal of Educational Computing Research</source>, doi: 10.1177/07356331221136888.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>