<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>QuerIA: Contextual Learning-Driven Questionnaire Generation and Assessment based on Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paul Eyzaguirre</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Badenes-Olmedo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Sistemas Informáticos, ETSI, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents QuerIA, a system based on large language models (LLMs) and contextual learning, to automate the generation and evaluation of educational questionnaires. Central to QuerIA is its integration of Bloom's Taxonomy into the knowledge base of LLMs, enabling the transfer of structured educational objectives to dynamically generate questions that vary in cognitive difficulty. This approach facilitates nuanced customization of assessments that align with individual learning needs and cognitive levels. Using semantic segmentation and in-context learning techniques, QuerIA not only streamlines the creation of questionnaires, but also ensures the relevance and semantic integrity of the generated questions. Both the source code and the online service of QuerIA are publicly available. Our application of the Rasch model to evaluate the system confirms its capability to precisely adapt Bloom's hierarchical framework within the outputs of the LLM, thus achieving adequate control over the difficulty of questions.</p>
      </abstract>
      <kwd-group>
        <kwd>Assessment System</kwd>
        <kwd>Questionnaire Automation</kwd>
        <kwd>Adaptive Learning</kwd>
        <kwd>Bloom's Taxonomy</kwd>
        <kwd>Semantic segmentation</kwd>
        <kwd>Multiple Choice questions (MCQ)</kwd>
        <kwd>Open Ended Questions (OEQ)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Questionnaires are an essential educational tool for assessing student comprehension and promoting
active engagement in the learning process. Decades of research have demonstrated their effectiveness
in improving learning outcomes [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. Feedback from quizzes allows students to gauge their own
understanding and revisit unclear content. However, creating high-quality questionnaires and delivering
timely feedback is labor-intensive and time-consuming. The quality and difficulty of questions are often
subjectively determined, and in automated settings, traditional methods like Bloom’s Taxonomy [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are
employed to manually set question difficulty.
      </p>
      <p>
        Recent advancements in question generation research have predominantly leveraged
Transformer-based large language models (LLMs) [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], which have significantly outperformed earlier rule-based and
supervised systems [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. However, real-world applications of these technologies are scarce due to the
disconnect between academic research objectives and the practical needs of educators [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. For example,
existing systems such as the rule-based system of Van Campenhout et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and the GPT-based system
of Elkins et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have focused on basic question formats and utilized empirical strategies to improve
question diversity and reduce redundancy.
      </p>
      <p>
        Despite the existence of automated question generation systems based on natural language processing
(NLP), their integration into classrooms has been limited due to domain specificity, language restrictions,
and limitations in the types and difficulty levels of the questions generated [
        <xref ref-type="bibr" rid="ref12 ref5 ref9">5, 12, 9</xref>
        ]. Commercial
question generation services like WebExperimenter [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and AnswerQuest [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] offer limited types of
questions, often restricted by language and lacking customizable difficulty settings. To address these
challenges, we have developed a bilingual framework that not only assists educators and students in
creating and assessing high-quality questionnaires in English and Spanish, but also incorporates an
approach that transfers the structured knowledge of Bloom’s Taxonomy into large language models
(LLMs) through contextual adjustments. This system offers customizable difficulty levels and question
types, along with automated feedback and grading for open-ended questions, ensuring an adaptive
learning experience. To validate the effectiveness of our transfer-learning approach, we conducted
surveys with 20 Spanish university students, assessing the alignment of Bloom’s taxonomy in estimating
question difficulty and confirming the importance of precise instructional context segmentation when
using language models to generate high-quality questions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Adaptive Learning for Bloom’s Taxonomy Alignment</title>
      <p>
        Our framework utilizes “in-context learning”, a technique where language models generate outputs
based on examples and instructions provided within the input context, allowing for task adaptation
without additional fine-tuning. We utilized Llama 3-8B [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], a large language model (LLM) trained
on extensive text data, to generate multiple-choice and open-ended questions from a given input
document. Our approach employs a semantic chunking strategy, segmenting the document into
sequential blocks of text, each serving as the basis for generating a question. To address three different
levels of difficulty, we developed a new taxonomy by grouping Bloom’s dimension levels [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] into
three categories. Question generation and automated grading are achieved through a combination of
instructional prompts based on our taxonomy and in-context learning techniques, such as few-shot
learning. The following subsections will delve into the specifics of the semantic chunking strategy, our
proposed taxonomy, and the automated grading method.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Semantic Chunking</title>
        <p>The proposed chunking strategy, which breaks down long-sequence inputs into manageable parts for an
LLM, is a crucial step of the question generation process. These chunks provide the necessary context
to the LLM, enabling it to generate relevant and accurate questions. By ensuring that each segment
maintains a consistent topic or content, the model can effectively understand the context, leading to
the generation of high-quality questions.</p>
        <p>
          Many popular Retrieval-Augmented Generation (RAG) frameworks, such as Langchain [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ],
LlamaIndex [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], Pinecone [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], typically employ empirical or heuristic methods to address this problem.
In contrast, our work adopts a semantic chunking approach inspired by methodologies discussed by
Greg Kamradt [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Here, each chunk serves as the contextual foundation for generating individual
questions. The semantic chunking process comprises three key steps: Sentence Extraction, Embeddings,
and Merging.
        </p>
        <p>Initially, the text of a document is segmented into individual sentences for the sentence extraction
phase. In the embeddings phase, each sentence is grouped with the preceding and following sentence to
form a sentence cluster anchored by the central sentence, providing contextual coherence. The optimal
configuration includes one sentence before and after the central sentence, and embeddings are created
for these clusters. The semantic distances between sequential sentence groups are then compared,
grouping clusters that maintain a low semantic distance, indicating topic consistency, while a higher
distance suggests a topic shift, thus delineating distinct text chunks. In the merging phase, the final
breakpoints for chunking are determined by setting a threshold at the 80th percentile of the semantic
distances, allowing the granularity of the divisions to be adjusted and ensuring an optimal number of
chunks for effective question generation.</p>
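        <p>A minimal sketch of this three-step process is shown below, assuming regex-based sentence splitting and embeddings from the sentence-transformers library; the model name and the cosine-distance computation are illustrative choices, not necessarily those used by QuerIA.</p>
        <preformat>
# Hedged sketch of the three-step semantic chunking strategy described
# above; sentence splitting and the embedding model are assumptions.
import re
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_chunks(text, percentile=80.0):
    # 1. Sentence extraction.
    sentences = [s.strip() for s in re.split(r"(?&lt;=[.!?])\s+", text) if s.strip()]
    # 2. Embeddings: each sentence is embedded together with one preceding
    #    and one following sentence to preserve contextual coherence.
    clusters = [" ".join(sentences[max(0, i - 1):i + 2]) for i in range(len(sentences))]
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(clusters, normalize_embeddings=True)
    # Cosine distance between consecutive clusters; a high value signals a topic shift.
    distances = 1.0 - np.sum(emb[:-1] * emb[1:], axis=1)
    if distances.size == 0:
        return [" ".join(sentences)]
    # 3. Merging: breakpoints are the distances above the chosen percentile.
    threshold = np.percentile(distances, percentile)
    chunks, start = [], 0
    for i, d in enumerate(distances):
        if d &gt; threshold:
            chunks.append(" ".join(sentences[start:i + 1]))
            start = i + 1
    chunks.append(" ".join(sentences[start:]))
    return chunks
</preformat>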
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Difficulty based on Bloom’s Taxonomy</title>
        <p>To effectively categorize question difficulty into three levels, our proposed taxonomy groups the
cognitive dimension (CD) and knowledge dimension (KD) levels of Bloom’s Taxonomy. Our focus
is on specific levels for each dimension. From the CD, we include the levels of Remember, Understand,
Apply, Evaluate, and Analyze; from the KD, we consider Factual, Procedural, and Conceptual knowledge
types. The Create level from the CD and the Metacognitive level from the KD are excluded due to the
nature of multiple-choice questions, which require closed responses and do not provide the flexibility
to effectively assess creativity or self-reflection. The following taxonomy is proposed:</p>
        <sec id="sec-2-2-1">
          <title>1. Easy level: Cognitive level “Remember” and type of knowledge “Factual”.</title>
          <p>2. Intermediate level: Cognitive level “Understand” or “Apply”, and type of knowledge “Procedural”
or “Conceptual”.
3. Dificult level : Cognitive level “Analyze” or “Evaluate” and type of knowledge “Conceptual”.</p>
        <p>
          Furthermore, we incorporated the verbs associated with Bloom’s Taxonomy as identified by Stanny
[
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Previous works [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] have shown that the choice of verbs in each category plays a crucial role
in determining the cognitive level required to answer a question. Question generation is run at
temperature 0.1, resulting in more deterministic and focused responses, whereas a higher
temperature would produce more unpredictable and creative outputs. The input is formatted as:
&lt;taxonomy_description&gt; &lt;few-shot-learning&gt; &lt;instructions&gt; &lt;context&gt;. Table 1 provides an example of
a multiple-choice question generated by our system, further illustrating the application of Bloom’s
Taxonomy in our approach.
        </p>
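        <p>The sketch below illustrates, under stated assumptions, how such an input could be assembled and submitted to the model; the prompt strings and the llama-cpp-python binding are hypothetical stand-ins rather than the exact QuerIA implementation.</p>
        <preformat>
# Illustrative assembly of the question-generation input; prompt texts
# and the llama-cpp-python binding are assumptions, not QuerIA's code.
from llama_cpp import Llama

TAXONOMY_DESCRIPTION = (
    'Difficulty levels: 1. Easy: "Remember" + "Factual". '
    '2. Intermediate: "Understand"/"Apply" + "Procedural"/"Conceptual". '
    '3. Difficult: "Analyze"/"Evaluate" + "Conceptual".'
)
FEW_SHOT = "Example question: ..."  # hypothetical few-shot examples
INSTRUCTIONS = "Generate one multiple-choice question at the requested difficulty level."

def generate_question(llm, context, level):
    # Input format: taxonomy description, few-shot examples, instructions, context.
    prompt = (
        f"{TAXONOMY_DESCRIPTION}\n\n{FEW_SHOT}\n\n"
        f"{INSTRUCTIONS}\nLevel: {level}\n\nContext:\n{context}"
    )
    # Temperature 0.1 yields deterministic, focused outputs.
    out = llm(prompt, max_tokens=512, temperature=0.1)
    return out["choices"][0]["text"]
</preformat>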
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Automated grading</title>
        <p>
          Transitioning from examining question difficulty, the focus shifts to automating the grading of
open-ended questions using both basic and complex methodologies. While simpler answers aligned with lower
levels of Bloom’s taxonomy can be graded on surface-level features [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], responses demanding higher
cognitive skills, such as analysis or evaluation, require advanced syntactic and semantic assessments to
understand conceptual relationships and reasoning coherence. Traditional automatic grading systems,
which predominantly measure lexical or semantic overlaps [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], often fail to accurately score nuanced
answers and show poor alignment with human judgment, suggesting limitations in capturing the depth
of answers. To address these shortcomings, our approach involves using learned metrics [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] that
incorporate the question’s context and specific instructions, allowing pre-trained language models
to better approximate human evaluations. The system uses a three-tier grading scale and includes
instructions to enhance the accuracy of the model’s scoring, demonstrating significant improvements in
the automated grading of complex answers.
        </p>
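        <p>A minimal sketch of this grading step follows, assuming a single prompt that embeds the question’s context, the grading instructions, and the three-tier scale; the wording and the generic llm callable are illustrative assumptions, not the exact QuerIA prompts.</p>
        <preformat>
# Hedged sketch of three-tier automated grading; the instructions and
# model interface are assumptions and may differ from QuerIA's.
GRADING_INSTRUCTIONS = (
    "Grade the student's answer on a three-tier scale "
    "(incorrect / partially correct / correct) and justify the grade "
    "using only the provided context."
)

def grade_answer(llm, context, question, answer):
    prompt = (
        f"{GRADING_INSTRUCTIONS}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Student answer: {answer}\n"
        f"Grade and feedback:"
    )
    # A low temperature keeps grading consistent across runs.
    out = llm(prompt, max_tokens=256, temperature=0.1)
    return out["choices"][0]["text"]
</preformat>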
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. QuerIA</title>
      <p>QuerIA enables users to upload textual documents such as lecture notes or textbooks for assessment.
Users customize their questionnaires by setting the number of questions, choosing between
open-ended and multiple-choice formats, and selecting the difficulty level. Once uploaded, QuerIA processes
the document asynchronously, extracting content to intelligently generate questions and crafting
plausible distractors for multiple-choice questions. These questions are displayed in real time for
immediate review to ensure they meet educational standards. Upon questionnaire completion, users
can answer directly on the platform, where QuerIA provides instant feedback on open-ended responses,
offering corrections, improvements, or confirmations to enhance the learning experience through
active engagement. Additionally, the source code is publicly available on GitHub at
<ext-link ext-link-type="uri" xlink:href="https://github.com/cbadenes/queria">https://github.com/cbadenes/queria</ext-link>, and there is an online service hosted at
<ext-link ext-link-type="uri" xlink:href="https://cbadenes.github.io/queria/">https://cbadenes.github.io/queria/</ext-link>. However, the performance
of the online service may be slower, as it operates on CPUs rather than the more efficient GPUs.</p>
      <table-wrap id="tbl-1">
        <label>Table 1</label>
        <caption>
          <p>Example of a multiple-choice question generated by QuerIA, with its cognitive dimension (CD), knowledge dimension (KD), difficulty level, and Rasch estimate.</p>
        </caption>
        <table>
          <tbody>
            <tr><td>Question 2</td><td>What type of information is used to evaluate negative side effects of vaccines and distinguish them from false alarmists?</td></tr>
            <tr><td>Option 1</td><td>Scientific evidence and statistical data</td></tr>
            <tr><td>Option 2</td><td>Analysis of the chemical composition of vaccines</td></tr>
            <tr><td>Option 3</td><td>Opinions of medical experts</td></tr>
            <tr><td>Option 4</td><td>Rigorously designed studies published in medical journals</td></tr>
            <tr><td>Evidence</td><td>The correct answer refers to the fact that anti-vaccine groups tend to excessively underestimate the complications of infectious diseases that are published in medical articles, while they magnify the side effects of vaccines and offer a very biased view of reality.</td></tr>
            <tr><td>CD</td><td>Evaluate</td></tr>
            <tr><td>KD</td><td>Conceptual</td></tr>
            <tr><td>Level</td><td>Difficult</td></tr>
            <tr><td>Rasch estimation</td><td>0.91</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>QuerIA evaluates user-submitted answers by analyzing the content extracted from the uploaded
educational materials, ensuring that feedback is deeply rooted in the documented evidence. When a user
responds to a question, especially in open-ended formats, the system uses advanced NLP techniques
to assess the accuracy and relevance of the answer relative to the content of the source material. It
then provides a detailed commentary that justifies the answer, highlighting connections to specific
information within the document. For instance, if a response is incorrect or partially correct, QuerIA
offers constructive feedback that references particular sections or concepts from the document, guiding
users on how to improve their answers or understand the material more thoroughly. This process not
only aids in learning but also reinforces the educational content by linking feedback directly to the text,
fostering a more comprehensive understanding and retention of the material.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation and Results</title>
      <p>The efficacy of this framework has been empirically validated through surveys involving both
open-ended and multiple-choice questions, demonstrating its capability to align generated questions with
the intended difficulty levels as confirmed by both perceived difficulty assessments and Rasch analysis.
Furthermore, syntactic evaluations have verified the accurate alignment of language used in questions
with the cognitive and knowledge dimensions of Bloom’s Taxonomy. The innovative automated grading
method employed further underscores the framework’s utility by providing accurate assessments and
feedback, thus enabling effective self-assessment and adaptive learning. Future enhancements will
focus on refining the semantic chunker to include image and table processing capabilities and exploring
further in-context learning techniques for specialized subjects requiring detailed analytical skills.</p>
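      <p>For reference, the Rasch model applied in this evaluation expresses the probability of a correct response as a logistic function of the difference between student ability and item difficulty; the minimal sketch below states the model only, leaving parameter estimation to a dedicated IRT library.</p>
      <preformat>
# The Rasch (one-parameter logistic) model: probability that a student
# with ability theta answers an item of difficulty b correctly.
import math

def rasch_probability(theta, b):
    # P(correct) = exp(theta - b) / (1 + exp(theta - b))
    return 1.0 / (1.0 + math.exp(-(theta - b)))
</preformat>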
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this paper, we introduced a framework that automates the generation and assessment of
questionnaires, transcending domain-specific limitations and supporting multilingual implementation.
Our method integrates into language models, through instruction prompting and few-shot learning, a
taxonomy that groups Bloom’s dimension levels into three difficulty categories (easy, intermediate,
and difficult), effectively creating leveled questions. We also introduced a semantic chunking
methodology that improves question quality by analyzing document semantics, allowing for the
generation of contextually relevant and semantically accurate questions without extensive fine-tuning.
The framework’s effectiveness was affirmed through surveys evaluating both open-ended and
multiple-choice questions, with the results from perceived difficulty assessments and Rasch analysis
confirming the accuracy of question difficulty alignment. Additionally, syntactic evaluations upheld the
alignment of verbs and interrogative adverbs with Bloom’s Taxonomy, and our innovative automated
grading method demonstrated accurate response assessments, facilitating effective self-assessment
and adaptive learning. Future work will focus on improving the semantic chunker to process visual
elements such as images, charts, and tables, and on exploring advanced in-context learning techniques for
specialized disciplines that require structured reasoning, such as mathematics or programming.</p>
      <p>Acknowledgments. We acknowledge the support of the Educational Innovation Project at
Universidad Politécnica de Madrid for facilitating and promoting this research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Ambrose</surname>
          </string-name>
          , et al.,
          <source>How Learning Works: Seven Research-Based Principles for Smart Teaching</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wylie</surname>
          </string-name>
          ,
          <article-title>The icap framework: Linking cognitive engagement to active learning outcomes</article-title>
          ,
          <source>Educational Psychologist</source>
          <volume>49</volume>
          (
          <year>2014</year>
          )
          <fpage>219</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Koedinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Corbett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Perfetti</surname>
          </string-name>
          ,
          <article-title>The knowledge-learning-instruction framework: Bridging the science-practice chasm to enhance robust student learning</article-title>
          ,
          <source>Cognitive Science</source>
          <volume>36</volume>
          (
          <year>2012</year>
          )
          <fpage>757</fpage>
          -
          <lpage>798</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Bloom</surname>
          </string-name>
          ,
          <article-title>The Taxonomy of Educational Objectives, the Classification of Educational Goals</article-title>
          ,
          <source>Volume Handbook I: Cognitive Domain</source>
          ,
          <year>1956</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kurdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , et al.,
          <article-title>A systematic review of automatic question generation for educational purposes</article-title>
          ,
          <source>International Journal of Artificial Intelligence in Education</source>
          <volume>30</volume>
          (
          <year>2020</year>
          )
          <fpage>121</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , et al.,
          <article-title>Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mulla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gharpure</surname>
          </string-name>
          ,
          <article-title>Automatic question generation: a review of methodologies, datasets, evaluation metrics, and applications</article-title>
          ,
          <source>Progress in Artificial Intelligence</source>
          <volume>12</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Steuer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bongard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uhlig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zimmer</surname>
          </string-name>
          ,
          <article-title>On the linguistic and pedagogical quality of automatic question generation via neural machine translation</article-title>
          ,
          <source>in: Technology-Enhanced Learning for a Free, Safe, and Sustainable World</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>289</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Houghton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Towards process-oriented, modular, and versatile question generation that meets educational needs</article-title>
          ,
          <source>arXiv preprint arXiv:2205.00355</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Van Campenhout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hubertz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <article-title>Evaluating AI-generated questions: A mixed-methods analysis using question data and student perceptions</article-title>
          ,
          <source>in: Artificial Intelligence in Education: 23rd International Conference, AIED 2022, Durham, UK, July 27-31, 2022, Proceedings, Part I</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>344</fpage>
          -
          <lpage>353</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Elkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kochmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C. K.</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <article-title>How teachers can use large language models and Bloom's taxonomy to create educational quizzes</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kasneci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Seßler</surname>
          </string-name>
          , et al.,
          <article-title>ChatGPT for good? On opportunities and challenges of large language models for education</article-title>
          ,
          <source>Learning and Individual Differences</source>
          <volume>103</volume>
          (
          <year>2023</year>
          )
          <fpage>102274</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hoshino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nakagawa</surname>
          </string-name>
          ,
          <article-title>WebExperimenter for multiple-choice question generation</article-title>
          ,
          <year>2005</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Roemmele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sidhpura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>DeNeefe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tsou</surname>
          </string-name>
          ,
          <article-title>AnswerQuest: A system for generating question-answer items from multi-paragraph documents</article-title>
          ,
          <year>2021</year>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Meta AI</surname>
          </string-name>
          ,
          <source>Llama 3 8B model</source>
          ,
          <year>2024</year>
          . URL: https://github.com/meta-llama/llama3, accessed: 2024-07-10.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L. W.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Krathwohl</surname>
          </string-name>
          ,
          <source>A Taxonomy for Learning, Teaching and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives</source>
          , Pearson Education,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>LangChain Inc.</surname>
          </string-name>
          ,
          <source>Langchain Documentation on Text Splitters</source>
          ,
          <year>2023</year>
          . URL: https://js.langchain.com/.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwaber-Cohen</surname>
          </string-name>
          ,
          <source>Chunking Strategies for LLM Applications</source>
          ,
          <year>2023</year>
          . URL: https://www.pinecone.io/learn/chunking-strategies/.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kamradt</surname>
          </string-name>
          ,
          <article-title>5 levels of text splitting</article-title>
          , https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/,
          <year>2024</year>
          . Accessed: 2024-07-08.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Stanny</surname>
          </string-name>
          ,
          <article-title>Reevaluating bloom's taxonomy: What measurable verbs can and cannot say about student learning</article-title>
          ,
          <source>Educ. Sci.</source>
          <volume>6</volume>
          (
          <year>2016</year>
          )
          <fpage>37</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>U.</given-names>
            <surname>Padó</surname>
          </string-name>
          ,
          <article-title>Get semantic with me! The usefulness of different feature types for short-answer grading</article-title>
          ,
          <source>in: Proceedings of COLING-2016</source>
          , Osaka, Japan,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kishore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Artzi</surname>
          </string-name>
          ,
          <article-title>BERTScore: Evaluating text generation with BERT</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sellam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <article-title>BLEURT: Learning robust metrics for text generation</article-title>
          ,
          <source>in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>7881</fpage>
          -
          <lpage>7892</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>