<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A model for improving the accuracy of educational content created by generative AI</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleh V. Talaver</string-name>
          <email>olegtalaver@gmail.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tetiana A. Vakaliuk</string-name>
          <email>tetianavakaliuk@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Academy of Cognitive and Natural Sciences</institution>
          ,
          <addr-line>54 Universytetskyi Ave., Kryvyi Rih, 50086</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Digitalisation of Education of the NAES of Ukraine</institution>
          ,
          <addr-line>9 M. Berlynskoho Str., Kyiv, 04060</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Kryvyi Rih State Pedagogical University</institution>
          ,
          <addr-line>54 Universytetskyi Ave., Kryvyi Rih, 50086</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Zhytomyr Polytechnic State University</institution>
          ,
          <addr-line>103 Chudnivsyka Str., Zhytomyr, 10005</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>149</fpage>
      <lpage>158</lpage>
      <abstract>
        <p>Advancements in artificial intelligence (AI) are reshaping education, enabling personalized and adaptive learning experiences, yet ensuring the reliability of AI-generated content remains a critical challenge. This study addresses this gap by developing a system for text processing and factual claims verification, focusing on extracting factual claims, retrieving evidence from authoritative sources, verifying content, and rewriting it to ensure accuracy while maintaining pedagogical effectiveness. The system is designed to complement manual peer review processes by providing detailed annotations and evidence-based notes related to facts retrieved from different sources. The proposed system employs a multi-layered approach to content verification. At its core, it focuses on three key processes: (1) extracting and refining factual claims from text using sophisticated natural language processing techniques, (2) retrieving and analyzing evidence from multiple authoritative sources, and (3) verifying and rewriting content that will be proposed as a hint to ensure accuracy while maintaining pedagogical effectiveness. It integrates prompt engineering, multi-stage evidence analysis, information retrieval from different sources, and vector-based embeddings to retrieve and classify evidence as supporting, contradicting, or neutral, using weighted credibility scoring and a proposed decision certainty level for each verification outcome. By employing a multi-layered approach, the system offers a practical and scalable solution for enhancing AI-generated content verification, paving the way for reliable applications in education and organizational knowledge management, and enabling educators and content creators to make informed decisions about content revision.</p>
      </abstract>
      <kwd-group>
        <kwd>generative AI</kwd>
        <kwd>education content creation</kwd>
        <kwd>prompt engineering</kwd>
        <kwd>factual validation</kwd>
        <kwd>model hallucinations</kwd>
        <kwd>verify-and-edit frameworks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial intelligence (AI) advancements are reshaping every facet of life, with education standing at
the forefront of this transformation [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. By integrating subject matter experts (SMEs) directly into
the process, AI can potentially enhance the accessibility, quality, and relevance of learning materials in
transformative ways. Large language models (LLMs) may ensure tailored and up-to-date educational
materials by generating high-quality content that evolves with feedback [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. AI further personalises
learning by adapting to individual needs, such as dynamically adjusting programming exercises’
complexity and providing immediate feedback [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. However, alongside this promise comes a critical
challenge: ensuring the depth, reliability, and correctness of AI-generated content.
      </p>
      <p>
        This study approaches education from the perspective of organisational needs, where agility and
alignment with evolving trends are paramount. Unlike traditional academic institutions, which often
retain historical materials to support foundational knowledge [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], organisations must rapidly produce
and update training materials to remain competitive in fast-paced industries. However, the established
model for content development—where learning and development (L&amp;D) professionals serve as
intermediaries between SMEs and learners—often struggles to meet these demands. While ensuring
pedagogical rigour, this process is plagued by lengthy development timelines, high costs, and outdated
materials due to iterative feedback loops and disconnected workflows. Recently, many organisations
have shifted to a model where SMEs create learning materials directly, bypassing the need for L&amp;D
intermediaries. While this approach allows for faster development and greater alignment with real-time
organisational needs, it introduces significant challenges. SMEs, though experts in their fields, often
lack instructional design expertise, which can result in unstructured or overly dense content that lacks
pedagogical clarity. Additionally, SME-created materials may fail to account for diverse learner needs,
leading to inconsistencies in quality and engagement. Without the support of L&amp;D professionals, these
materials may fail to provide a cohesive learning experience [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Recent advancements in generative AI offer a powerful alternative to L&amp;D support. These tools
can transform raw SME input into structured, learner-focused content, acting as virtual instructional
designers. Despite these innovations, the risks of bias, inaccuracies, and ethical concerns remain
significant. Addressing these challenges is vital to harnessing the full potential of AI in education
[
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>The primary objective of this research is to develop a system for processing user-provided text
and flagging inaccuracies while minimising biases and ensuring the reliability of content, including
AI-generated content. This objective is critical to addressing the challenges posed by inaccuracies and
inconsistencies and reducing the need for extensive manual reviews.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Theoretical background and related works</title>
      <p>
        Integrating artificial intelligence (AI) in education has reshaped traditional paradigms, introducing
adaptive learning environments, enhanced accessibility, and real-time content personalisation. AI
systems, such as intelligent tutoring tools and automated assessment platforms, have demonstrated the
potential to foster equitable education by addressing diverse learner needs. Chiu et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] emphasise the
transformative impact of AI-driven adaptability in tailoring educational materials to individual learning
contexts. The role of LLMs, including GPT-based systems, has been pivotal in this transformation.
These models can generate dynamic course content, provide personalised instruction, and offer instant
feedback, fulfilling roles such as tutors, mentors, and collaborators [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Despite these advancements,
there remain challenges in aligning AI tools with pedagogical goals, as highlighted by Kasneci et al.
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], who warn of biases and inaccuracies in AI-generated content.
      </p>
      <p>
        SMEs play a crucial role in enriching educational content with domain-specific knowledge, yet their
contributions often lack pedagogical structure when instructional design principles are not applied
[
        <xref ref-type="bibr" rid="ref14 ref6 ref8">6, 8, 14</xref>
        ]. Generative AI offers a solution by acting as a virtual instructional designer, reorganising SME
inputs into structured, learner-centred materials [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Nazar et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] illustrate how tools like CourseGPT
facilitate the learning process by providing real-time, course-specific support to students, ensuring
consistency and alignment with educational goals. Moreover, generative AI enhances personalisation
by adapting content difficulty to learner proficiency and providing instant feedback mechanisms, as
demonstrated in automated programming exercises [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, the absence of oversight can lead
to unstructured and ineffective content, necessitating collaborative frameworks where SMEs and AI
systems work together to achieve pedagogical soundness, content richness and accuracy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Adopting generative AI in education has risks, including biases in training data, content inaccuracies,
and model hallucinations. These concerns are compounded by the tendency of AI systems to produce
stylistically convincing but factually erroneous outputs [
        <xref ref-type="bibr" rid="ref16 ref17 ref9">9, 16, 17</xref>
        ].
      </p>
      <p>
        Tonmoy et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] have developed a comprehensive taxonomy of hallucination mitigation techniques,
categorising methods into two primary domains: prompt engineering and model development. Prompt
engineering encompasses methods such as retrieval-augmented generation (RAG), which integrates
external knowledge sources before, during, and after generation to enhance output grounding. Specific
subcategories like the Decompose-and-Query Framework and EVER Framework employ iterative
validation during generation to resolve complex reasoning challenges, while post-generation tools
such as RARR retrofit outputs for improved factual accuracy. Model development strategies introduce
architectural innovations, such as Context-Aware Decoding and faithfulness-based loss functions, to
align AI outputs with verified truths. Additionally, supervised fine-tuning techniques incorporate
counterfactual datasets and knowledge graphs to strengthen model fidelity.
      </p>
      <p>
        Similarly, Huang et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] identify core issues such as insufficient domain-specific data and
overreliance on incomplete training corpora as major contributors to AI inaccuracies. These limitations
often result in hallucinations that can be categorised into factuality and faithfulness types. Factuality
hallucinations occur when outputs conflict with known truths or fabricate unverifiable information,
while faithfulness hallucinations involve deviations from input instructions, contextual misalignments,
or logical errors. Detection approaches for factuality and faithfulness employ fact-checking,
uncertainty estimation, and QA-based systems, supported by benchmarks like TruthfulQA and
HaluEval, to rigorously evaluate AI outputs. Mitigation strategies span data-level interventions, such as
filtering misinformation during training, to advanced inference-level techniques, including constrained
sampling and logical consistency checks, thereby addressing these multifaceted challenges. Broader
societal implications are explored in “Promises and challenges of generative artificial intelligence for
human learning” [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which underscores equity and ethical transparency as critical factors in the
deployment of AI in education.
      </p>
      <p>
        Considering these risks, ensuring the reliability of AI-generated materials necessitates robust,
multilayered validation frameworks. Current gaps in these processes highlight the importance of SME
involvement in verifying content accuracy and contextual relevance [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Complementing these efforts,
Shamsujjoha et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] advocate for a layered runtime guardrail approach inspired by the Swiss Cheese
Model, wherein distinct defensive layers work in tandem to intercept and correct errors at multiple stages. This
model emphasises the importance of redundancy and multifaceted defences, ensuring that vulnerabilities
in one layer are mitigated by the following.
      </p>
      <p>
        Dong et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] extensively review existing guardrail solutions, highlighting tools like Llama
Guard, Nvidia NeMo, and Guardrails AI. Llama Guard demonstrates adaptability by fine-tuning specific
taxonomies but depends on LLM understanding for predictive accuracy. Nvidia NeMo incorporates
vector-based embeddings and KNN for content moderation and hallucination prevention. At the same
time, Guardrails AI uniquely employs XML-based RAIL specifications to define output quality guarantees,
including automated error correction. Despite their utility, these solutions share common weaknesses,
such as lacking generalisation and holistic multi-requirement designs, which underscores the need
for more robust methodologies. Dong also identifies technical challenges, particularly unintended
responses, which are detected through adversarial prompt testing and mitigated via adversarial training,
safety reinforcement, and input/output rephrasing. Fairness remains a critical issue, with biases
categorised into gender, cultural, and dataset origins. Protective measures, including fine-tuning and
fairness-driven prompt engineering, are essential but require continuous learning and diversification of
datasets. Privacy concerns, such as PII leakage, are addressed through differential privacy methods and
personalised watermarking to ensure ownership and confidentiality. Hallucinations pose significant
risks; detection strategies like self-consistency checks and external validations are complemented by
protective measures such as retrieval-augmented generation and Verify-and-Edit frameworks [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
These approaches balance robustness and adaptability, integrating rigorous software development life
cycles and Pareto optimisation [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] for practical guardrail implementations.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>
        The system was developed to process user-provided text, perform factual verification, and generate
revisions where inconsistencies were identified. Building on established methodologies [
        <xref ref-type="bibr" rid="ref15 ref17 ref20">15, 17, 20</xref>
        ],
the workflow incorporated multi-stage processing, structured claim extraction, evidence classification,
and revision strategies. Prompt engineering techniques were employed, including prompt tuning with
prompt chaining and multiple stages of results refinement. Most importantly, verification occurs after
generation, meaning that the system receives and analyses any text. Decisions are made based on the
retrieved information from various sources, including vectorised documents. The design emphasised
adherence to factual accuracy while minimising stylistic alterations.
      </p>
      <p>Testing was conducted using articles that sometimes included altered factual information to evaluate
the system’s ability to detect inaccuracies. Purely opinion-driven texts were excluded to ensure the
dataset’s alignment with the system’s verification scope.</p>
      <p>The input text is segmented into paragraphs and sentences. Next, claims are extracted from each
sentence and then refined at the whole-paragraph level to remove duplicates and improve the context
for already-created claims.</p>
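      <p>The segmentation step can be sketched as follows; the splitting heuristics are illustrative assumptions, since the text does not specify the exact tokenisation method used:</p>

```python
import re


def segment(text: str) -> list[list[str]]:
    """Split input text into paragraphs, then into sentences.

    The blank-line paragraph split and regex sentence boundary are
    simplified stand-ins; the actual system may rely on an NLP library
    or the LLM itself for segmentation.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Naive sentence boundary: ., ?, or ! followed by whitespace.
    return [re.split(r"(?<=[.!?])\s+", p) for p in paragraphs]
```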
      <p>Claims extraction prompt:
Extract all factual claims from the given text. A factual claim is any
statement that asserts something as a fact and can be verified.
Guidelines:
1. The ’claim_text’ should distill the factual assertion while preserving
its meaning. Remove unrelated phrases or context.
2. The ’original_text’ must exactly match the corresponding part of the
input text from which the claim was derived.
3. Each claim must contain only one factual assertion and enough context to
be unambiguous and easy to search, including contextual information such as
specific dates, names, and people.
4. The claim must address a single fact, not multiple facts, so that it is
easy to search, but must have enough context to be unambiguous.
5. Select a particular part of the original text when referencing it, not
the entire sentence.
6. Do not include opinions, speculative statements, or rhetorical questions
as claims.
7. If there is ambiguity in the claim, provide the most concise and accurate
interpretation.
8. Ensure claims are extracted accurately, even if they are embedded in
longer sentences.</p>
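      <p>The prompt above requests structured output with two fields, ’claim_text’ and ’original_text’. A minimal schema mirroring those fields might look as follows; the class and method names are illustrative assumptions, not the system’s actual schema:</p>

```python
from dataclasses import dataclass


@dataclass
class ExtractedClaim:
    # Distilled factual assertion with enough context to be unambiguous.
    claim_text: str
    # Exact substring of the input text the claim was derived from.
    original_text: str

    def is_grounded_in(self, source: str) -> bool:
        """Guideline 2: original_text must match the input verbatim."""
        return self.original_text in source
```

Validating `original_text` against the source this way makes it possible to highlight the exact span of the input that each claim covers.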
      <p>Then, for each claim, the system retrieves evidence from external sources using Google Custom Search, Wikipedia,
and a vector store. Evidence snippets are individually classified as Supporting,
Contradicting, or Neutral based on relevance, which the model itself determines. The model is also instructed
to output a credibility score based on the website on which the material was found. Irrelevant or
low-credibility snippets are filtered out. Classification utilises chain-of-thought (CoT) reasoning, wherein the LLM
processes claims and evidence incrementally to facilitate nuanced judgments.</p>
      <p>Evidence analysis prompt:
Evaluate the relationship between the given claim and the provided evidence.
For each claim, determine whether the evidence supports, contradicts, or is
neutral to the claim.</p>
      <p>Criteria for Decision:
- {DecisionClaimEnum.supporting}: Evidence explicitly affirms the claim
with no ambiguity.
- {DecisionClaimEnum.contradicting}: Evidence explicitly denies or disputes
the claim with no ambiguity.
- {DecisionClaimEnum.neutral}: Evidence is unrelated, ambiguous, or
insufficient to affirm or deny the claim.</p>
      <p>Additionally, assess the credibility of the source using the following
scale:
- 1.0: Peer-reviewed resources (e.g., white papers, academic journals)
- 0.8: Crowd-sourced resources with robust moderation (e.g., Wikipedia)
- 0.6: Reputable news outlets and well-known publications
- Below 0.6: Sources with questionable credibility or lack of verification
The result should include:
1. Decision: ’{DecisionClaimEnum.supporting}’, ’{DecisionClaimEnum.
contradicting}’, or ’{DecisionClaimEnum.neutral}’
2. Source Credibility: A float value between 0 and 1, as defined above
3. Extract key phrases from the evidence that support the decision.
4. Provide a brief justification for the credibility score.</p>
      <p>Additional Step:
- Relevance: Assess whether the evidence is contextually relevant to the
claim before determining the relationship.
- If evidence is irrelevant, set ’relevant’ field to false and the decision
should default to ’{DecisionClaimEnum.neutral}’</p>
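      <p>The classification labels and the filtering of irrelevant or low-credibility evidence described above can be sketched as follows. The enum values come from the prompt; the field names and the 0.6 credibility cut-off are assumptions aligned with the prompt’s credibility scale:</p>

```python
from dataclasses import dataclass
from enum import Enum


class DecisionClaimEnum(str, Enum):
    supporting = "supporting"
    contradicting = "contradicting"
    neutral = "neutral"


@dataclass
class EvidenceVerdict:
    decision: DecisionClaimEnum
    source_credibility: float  # 0.0-1.0, per the scale in the prompt
    relevant: bool


def usable(verdicts: list[EvidenceVerdict],
           min_credibility: float = 0.6) -> list[EvidenceVerdict]:
    """Drop irrelevant or low-credibility evidence before aggregation.

    Irrelevant evidence defaults to neutral per the prompt's
    'Additional Step', so it never counts toward a verdict.
    """
    return [v for v in verdicts
            if v.relevant and v.source_credibility >= min_credibility]
```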
      <p>Classifications for each claim were aggregated by counting instances of supporting, contradicting,
and neutral evidence. A weighted ratio method was employed to assess the certainty of each claim’s
correctness or incorrectness based on the credibility of evidence classified as supporting or contradicting.
Each piece of evidence is assigned a credibility score c_i within a range of 0 to 1, reflecting the
source’s reliability (see evidence analysis prompt above). Certainty calculations consider the aggregated
credibility and the total number of supporting (n_s) and contradicting (n_c) evidence items. The certainty
that a claim is incorrect is calculated using the formula:
CertaintyIncorrect = W_c / (W_s + W_c),
where W_s = ∑ c_i over the n_s supporting evidence items is the aggregated credibility of all supporting evidence,
and W_c = ∑ c_j over the n_c contradicting evidence items is the aggregated credibility of all contradicting evidence.</p>
      <p>This ratio reflects the proportion of contradicting evidence relative to the total weighted evidence
(supporting and contradicting). The score is compared against a predefined threshold (0.6 in this case)
to determine whether we are certain enough that the claim is incorrect and it should be flagged for revision.</p>
      <p>When no supporting or contradicting evidence is provided (n_s = 0, n_c = 0), CertaintyIncorrect is set
to 0, as the absence of evidence does not permit any judgment on correctness. Claims are not flagged
for revision in this case.</p>
      <p>When only supporting evidence is provided (n_s &gt; 0, n_c = 0), certainty is determined by the average
credibility of the supporting evidence:
CertaintyCorrect = W_s / n_s</p>
      <p>If the average supporting credibility exceeds a threshold, the claim is considered correct and left
unchanged; otherwise, it is considered uncertain. This case does not trigger revisions because there is no
contradicting evidence, which serves as the source material for a revised version of the paragraph.</p>
      <p>When only contradicting evidence is provided (n_s = 0, n_c &gt; 0), the certainty that the claim is
incorrect is determined by the average credibility of the contradicting evidence, similar to the previous
case:
CertaintyIncorrect = W_c / n_c</p>
      <p>However, this time, we also check whether the threshold (0.6) has been exceeded; if it has, we are confident
enough to revise the paragraph text based on the contradicting evidence (using it as the source of truth).</p>
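      <p>As a minimal sketch, the evidence scenarios described above (no evidence, supporting only, contradicting only, and mixed) can be combined into a single certainty function; the function and variable names are illustrative assumptions, while the 0.6 revision threshold comes from the text:</p>

```python
def certainty_incorrect(supporting: list[float],
                        contradicting: list[float]) -> float:
    """Certainty that a claim is incorrect, given the credibility
    scores of its supporting and contradicting evidence items."""
    if not supporting and not contradicting:
        return 0.0  # no evidence: no judgment is possible
    if not contradicting:
        return 0.0  # only supporting evidence: never flagged incorrect
    w_c = sum(contradicting)
    if not supporting:
        return w_c / len(contradicting)  # average contradicting credibility
    w_s = sum(supporting)
    return w_c / (w_s + w_c)  # weighted ratio for mixed evidence


def needs_revision(supporting: list[float], contradicting: list[float],
                   threshold: float = 0.6) -> bool:
    """Flag a claim for revision once certainty exceeds the threshold."""
    return certainty_incorrect(supporting, contradicting) > threshold
```

For example, one fully credible supporting snippet against contradicting snippets with credibilities 1.0 and 0.5 yields 1.5 / 2.5 = 0.6, which does not exceed the threshold, so the claim is not revised.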
      <p>It must be noted that this methodology filters out neutral evidence, which may lead to the loss of
nuanced information. Incorporating a weighting factor for neutral evidence could improve robustness.
Also, evidence with extremely high or low credibility scores (e.g., 1.0 or near 0) can disproportionately
skew the results. Implementing a credibility cap or floor may mitigate this effect. Finally, the method
switches from ratio-based certainty calculations (for mixed evidence) to average-based certainty in
edge cases (only supporting or only contradicting evidence), which can lead to perceived inconsistency in certainty
interpretation.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>A simple web UI was developed to support better visualisation of results (the full listing is available
here: https://github.com/Vidzhel/ai-fact-checker). The page has a field where the provided text is placed.
Upon clicking “Verify”, the whole text is sent for verification, which takes approximately 1 minute for
up to 5 paragraphs of text (figure 1).</p>
      <p>The processed text contains the same paragraphs with highlighted parts representing the analysed parts
of the text – claims. The colour of highlighting varies depending on the certainty of the decision (regardless
of whether the paragraph was changed), where yellow indicates that the provided evidence seems to
support the claim, and green indicates that the evidence is very likely to support the claim (essentially
a gradient based on the certainty level, ranging from 0.5 to 1). If the certainty level is below 0.5, the highlighting
is red, indicating uncertainty. The highlighted parts can have either a black or blue border colour,
indicating whether the piece was or was not edited, respectively.</p>
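      <p>The colour mapping used in the UI can be expressed as a small function; the exact shape of the yellow-to-green gradient is an assumption, since only its end points (yellow near 0.5, green near 1.0, red below 0.5) are specified:</p>

```python
def highlight_colour(certainty: float) -> str:
    """Map a decision certainty to a highlight colour for the web UI.

    Below 0.5 the decision is uncertain (red); from 0.5 to 1.0 the
    highlight shades from yellow towards green. The 0.85 cut-over is
    an assumed discretisation of the gradient described in the text.
    """
    if certainty < 0.5:
        return "red"
    return "green" if certainty >= 0.85 else "yellow"
```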
      <p>
        Upon hovering over a highlighted part, a popup shows all the details related to the claim. Revised
paragraphs start with the difference between the original and revised paragraphs, and an explanation of
the changes is summarised below. Then, there are details related to the certainty of the decision – the
score and aggregated evidence counts. After that, a list of evidence is specified, each having the link,
credibility level (with an explanation of the decision), a snippet of text used, a summary of findings
related to the claim and classification (supporting, neutral, contradicting). This helps the reviewer make
an informed decision [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>Shortened example of details shown upon hovering over the highlighted part that was rewritten:
Updated paragraph: The financial crisis commonly referred to as the Global
Financial Crisis began in 2007 when home values in Europe and the United
States started to decline, intensifying during 2008. Experts largely attribute
the panic in international stock markets during the Global Financial Crisis
to a mix of factors, including the collapse of the housing market and the
resulting turmoil in financial markets, rather than a sudden burst of the
technology bubble. While subprime mortgages have been mentioned in some
reports, they accounted for a significant portion of loans, with estimates
suggesting figures around 13-15% of total mortgages at their peak, which
indicates their overall impact was substantial. Consequently, the meltdown
lasted several years, with many banks reporting significant losses throughout
2007-2009 before recovering to report profits in subsequent years.
Explanation: The claim that the Global Financial Crisis began in 2009 was
corrected to state it began in 2007, as evidenced by the chronology of
mortgage-related financial strains leading to a recession starting in December 2007. The
assertion attributing stock market panic primarily to a technology bubble was
revised to reflect the documented role of the housing market collapse and broader
financial crisis. The claim about subprime mortgages being less than 1% was
adjusted to acknowledge their substantial impact, aligning with evidence that
they represented a significant portion of lending. Lastly, the characterization
of the crisis as short-lived was corrected to acknowledge its prolonged effects,
lasting from 2007 into 2009.</p>
      <p>Certainty: 0.87 - Based on gathered data - decision seems to be very likely
correct (whether claim was altered or not);
Supporting: 6 - Contradicting: 0 - Neutral: 1
Source: https://www.federalreservehistory.org/essays/subprime-mortgage-crisis
Credibility: 1 - The source is a reputable, peer-reviewed resource from the
Federal Reserve, which provides credible historical context and analysis.
Snippet: The expansion of mortgages to high-risk borrowers, coupled with
rising house prices, contributed to a period of turmoil in financial
markets that lasted from ...</p>
      <p>Evidence: expansion of mortgages to high-risk borrowers...contributed to
a period of turmoil in financial markets
Classification: Supporting
Source: https://en.wikipedia.org/wiki/2007%E2%80%932008_financial_crisis
Credibility: 0.8 - The source is Wikipedia, a crowd-sourced resource with
robust moderation, making it a credible source for general information.
Snippet: The 2007–2008 financial crisis, or the global financial crisis
(GFC), was the most severe worldwide economic crisis since the 1929 Wall
Street crash that began the Great Depression. Causes of the crisis
included predatory lending in the form of subprime mortgages to low-income
homebuyers and a resulting ... in the 2010s European debt crisis.
Evidence: Causes of the crisis included predatory lending in the form of
subprime mortgages to low-income homebuyers
Classification: Supporting</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The outlined method offers a clear and practical framework for extracting, verifying, and refining claims,
prioritising simplicity and effectiveness. Combining prompt engineering with a multi-stage evidence
analysis process, the system consistently identifies claims within the text, retrieves relevant evidence,
and refines results, making the review process more straightforward. Using prompt-chaining techniques,
the system divides verification into smaller, manageable steps, such as clarifying ambiguous references
to entities, dates, or locations before retrieving evidence. This iterative approach incrementally improves
accuracy and minimises errors caused by incomplete context.</p>
      <p>While the system performs well in detecting factual inaccuracies, it faces some notable challenges.
These include occasionally missing specific claims within a passage and failing to provide adequate
context for specific claims. For instance, a statement like “the crisis significantly impacted the current
economic situation” may lack explicit reference to timeframes or events, preventing effective retrieval
of evidence. Resolving vagueness or insufficient detail issues is crucial for improving the system’s
reliability.</p>
      <p>Addressing these shortcomings could involve fine-tuning the model with more varied and
representative training examples. This would enhance its ability to recognise complex claims, define contextual
boundaries, and retrieve evidence more precisely. Additionally, using vector embeddings, such as
those from the text-embedding-3-small model, significantly boosts the system’s capability for accurate
evidence retrieval, particularly in specialised fields requiring alignment with specific terminology and
context.</p>
      <p>Another strength of the approach is its cost efficiency. By using a combination of GPT-4o-mini for
generating completions and text-embedding-3-small for retrieval, the system processes approximately
3,000 pages (two million tokens) for just $0.50. Including a structured API further ensures robust
performance by delivering machine-readable outputs validated against schemas. This consistency
simplifies subsequent verification and refinement while reducing the risk of errors in output formatting.</p>
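      <p>As a rough sanity check on the quoted figure, the cost can be back-calculated; the tokens-per-page and per-token rate below are assumptions derived from the text’s own numbers (3,000 pages, two million tokens, about $0.50), not official pricing:</p>

```python
def processing_cost(pages: int, tokens_per_page: int = 667,
                    usd_per_million_tokens: float = 0.25) -> float:
    """Estimate the cost (USD) of verifying a batch of pages.

    Defaults are back-calculated from the figures in the text
    (3,000 pages ~= 2 million tokens for about $0.50) and are
    illustrative assumptions, not vendor pricing.
    """
    tokens = pages * tokens_per_page
    return tokens / 1_000_000 * usd_per_million_tokens
```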
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This study presents an efficient approach for processing and verifying user-provided text, focusing on
fact-checking the provided text with evidence extracted from various sources. The system balances simplicity
and performance by integrating embeddings for domain-specific retrieval and prompt-engineered
reasoning techniques, making it a promising tool for real-world applications. The proposed system
effectively extracts claims, retrieves relevant evidence, and classifies evidence to refine text based on
credibility-weighted thresholds, allowing it to either confirm fact correctness or mark and suggest an
update for inaccurate text parts.</p>
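      <p>A minimal sketch of how such a credibility-weighted decision might look is given below; the stance encoding, weights, and threshold are illustrative assumptions, not the exact values used by the system.</p>

```python
# Hypothetical credibility-weighted verdict: each piece of evidence carries
# a stance (+1 supports the claim, -1 refutes it) and a source-credibility
# weight in [0, 1]; the weighted mean stance is compared to a threshold.
def verdict(evidence, threshold=0.5):
    """evidence: list of (stance, credibility) pairs.

    Returns "supported", "refuted", or "uncertain".
    """
    total = sum(cred for _, cred in evidence)
    if total == 0:
        return "uncertain"
    score = sum(stance * cred for stance, cred in evidence) / total
    if score >= threshold:
        return "supported"
    if score <= -threshold:
        return "refuted"
    return "uncertain"
```

Weighting by credibility lets two high-quality supporting sources outvote a single low-quality refutation, which is the behaviour the refinement step relies on.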
      <p>Despite these strengths, limitations remain. The system occasionally struggles with ambiguous
claims or claims lacking sufficient context, which complicates evidence retrieval and verification. Addressing
these issues through fine-tuning could significantly enhance the system’s contextual understanding and
precision. Evaluating the system on a larger and more diverse test set would also be beneficial.</p>
      <p>Compared to similar tools, this system stands out for its emphasis on combining affordability with a
structured approach to ensure consistent results. Unlike generic models prioritising breadth over domain
specificity, the embedding-based retrieval method employed here enhances precision, particularly in
specialised fields.</p>
      <p>Overall, this study highlights the potential of combining simple yet powerful techniques to address
text verification and content refinement challenges. The findings demonstrate that a structured, scalable
approach can provide reliable results while opening avenues for further refinement and adoption in
diverse applications. Researchers and developers are encouraged to explore and refine this system in
real-world settings, ensuring its full potential is realised across domains.</p>
      <p>Declaration on Generative AI: During the preparation of this work, the authors used GPT-4o for literature
review generation, abstract drafting, and content enhancement. After using this service, the authors reviewed and
edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] M. V. Marienko, S. O. Semerikov, O. M. Markova, Artificial intelligence literacy in secondary education: methodological approaches and challenges, CEUR Workshop Proceedings 3679 (2024) 87-97.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] I. Mintii, S. Semerikov, Optimizing Teacher Training and Retraining for the Age of AI-Powered Personalized Learning: A Bibliometric Analysis, in: E. Faure, Y. Tryus, T. Vartiainen, O. Danchenko, M. Bondarenko, C. Bazilo, G. Zaspa (Eds.), Information Technology for Education, Science, and Technics, volume 222 of Lecture Notes on Data Engineering and Communications Technologies, Springer Nature Switzerland, Cham, 2024, pp. 339-357. doi:10.1007/978-3-031-71804-5_23.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] D. Leiker, S. Finnigan, A. R. Gyllen, M. Cukurova, Prototyping the use of Large Language Models (LLMs) for Adult Learning Content Creation at Scale, in: S. Moore, J. C. Stamper, R. J. Tong, C. Cao, Z. Liu, X. Hu, Y. Lu, J. Liang, H. Khosravi, P. Denny, A. Singh, C. Brooks (Eds.), Proceedings of the Workshop on Empowering Education with LLMs - the Next-Gen Interface and Content Generation 2023 co-located with 24th International Conference on Artificial Intelligence in Education (AIED 2023), Tokyo, Japan, July 7, 2023, volume 3487 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 3-7. URL: https://ceur-ws.org/Vol-3487/short1.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] Y. Bauer, J. P. Leal, R. Queirós, Authoring Programming Exercises for Automated Assessment Assisted by Generative AI, in: A. L. Santos, M. Pinto-Albuquerque (Eds.), 5th International Computer Programming Education Conference (ICPEC 2024), volume 122 of Open Access Series in Informatics (OASIcs), Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2024, pp. 21:1-21:8. doi:10.4230/OASIcs.ICPEC.2024.21.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] E. Dickey, A. Bejarano, C. Garg, AI-Lab: A Framework for Introducing Generative Artificial Intelligence Tools in Computer Programming Courses, SN Computer Science 5 (2024) 720. doi:10.1007/s42979-024-03074-y.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] C. Zuo, Research on the Training of College Teachers' Practical Teaching Ability under the Background of the Application-Oriented Transformation, in: Proceedings of the 2017 7th International Conference on Education and Management (ICEM 2017), Atlantis Press, 2018, pp. 275-278. doi:10.2991/icem-17.2018.58.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] H. Wang, Research on the Development of College Teachers' Practical Ability from the Perspective of Teachers' Professional Development, in: Proceedings of the 2016 International Conference on Management Science and Innovative Education, Atlantis Press, 2016, pp. 544-549. doi:10.2991/msie-16.2016.120.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] K. Spiro, V. Bhamidi, Employee-Generated Learning: How to Develop Training That Drives Performance, Kogan Page, 2024. URL: https://books.google.com.ua/books?id=dwChzwEACAAJ.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, T. Liu, A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, ACM Trans. Inf. Syst. 43 (2025) 42. doi:10.1145/3703155.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] S. Ayyamperumal, L. Ge, Current state of LLM Risks and AI Guardrails, 2024. doi:10.48550/arXiv.2406.12934.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] T. K. F. Chiu, Q. Xia, X. Zhou, C. S. Chai, M. Cheng, Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education, Computers and Education: Artificial Intelligence 4 (2023) 100118. doi:10.1016/j.caeai.2022.100118.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] E. Mollick, L. Mollick, Assigning AI: Seven Approaches for Students, with Prompts, 2023. doi:10.2139/ssrn.4475995.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] E. Kasneci, K. Sessler, S. Küchemann, M. Bannert, D. Dementieva, F. Fischer, U. Gasser, G. Groh, S. Günnemann, E. Hüllermeier, S. Krusche, G. Kutyniok, T. Michaeli, C. Nerdel, J. Pfeffer, O. Poquet, M. Sailer, A. Schmidt, T. Seidel, M. Stadler, J. Weller, J. Kuhn, G. Kasneci, ChatGPT for good? On opportunities and challenges of large language models for education, Learning and Individual Differences 103 (2023) 102274. doi:10.1016/j.lindif.2023.102274.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] S. Wang, F. Wang, Z. Zhu, J. Wang, T. Tran, Z. Du, Artificial intelligence in education: A systematic literature review, Expert Systems with Applications 252 (2024) 124167. doi:10.1016/j.eswa.2024.124167.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] A. M. Nazar, M. Y. Selim, A. Gaffar, S. Ahmed, Revolutionizing Undergraduate Learning: CourseGPT and Its Generative AI Advancements, 2024. URL: https://arxiv.org/abs/2407.18310. arXiv:2407.18310.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] L. Yan, S. Greiff, Z. Teuber, D. Gasevic, Promises and challenges of generative artificial intelligence for human learning, Nature Human Behaviour 8 (2024) 1839-1850. doi:10.1038/s41562-024-02004-5.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] S. M. T. I. Tonmoy, S. M. M. Zaman, V. Jain, A. Rani, V. Rawte, A. Chadha, A. Das, A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models, 2024. URL: https://arxiv.org/abs/2401.01313. arXiv:2401.01313.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18] M. Shamsujjoha, Q. Lu, D. Zhao, L. Zhu, Designing Multi-layered Runtime Guardrails for Foundation Model Based Agents: Swiss Cheese Model for AI Safety by Design, 2024. URL: https://arxiv.org/abs/2408.02205. arXiv:2408.02205.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19] Y. Dong, R. Mu, G. Jin, Y. Qi, J. Hu, X. Zhao, J. Meng, W. Ruan, X. Huang, Position: building guardrails for large language models requires systematic design, in: Proceedings of the 41st International Conference on Machine Learning, ICML'24, JMLR.org, 2024, p. 451. URL: https://dl.acm.org/doi/10.5555/3692070.3692521.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20] R. Zhao, X. Li, S. Joty, C. Qin, L. Bing, Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 5823-5840. doi:10.18653/v1/2023.acl-long.320.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21] N. Ishizu, W. L. Yeoh, H. Okumura, O. Fukuda, The Effect of Communicating AI Confidence on Human Decision Making When Performing a Binary Decision Task, Applied Sciences 14 (2024) 7192. doi:10.3390/app14167192.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>