<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Textbook Accessibility through AI Simplification: Readability Improvements and Meaning Preservation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benny G. Johnson</string-name>
          <email>benny.johnson@vitalsource.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bill Jerome</string-name>
          <email>bill.jerome@vitalsource.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeffrey S. Dittel</string-name>
          <email>jeff.dittel@vitalsource.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rachel Van Campenhout</string-name>
          <email>rachel.vancampenhout@vitalsource.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>VitalSource Technologies</institution>
          ,
          <addr-line>Raleigh, NC 27601</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Generative artificial intelligence has the potential to tackle many longstanding educational challenges, including helping students comprehend difficult textbook material. Textbooks are considered the gold standard for rigorous and expert-developed educational content, yet still pose challenges to students who struggle with the complexity of textbook language. A strength of large language models (LLMs) is their ability to manipulate text according to specific requirements. An LLM was harnessed to create a “simplifier” tool in a higher education ereader platform, allowing students to select a textbook passage and receive a simplified version of that content. In this study, we analyzed 54,371 simplifier interactions to compare the original textbook content and simplified versions according to estimated readability, lexical and syntactic simplification, and semantic fidelity. Results indicate that the simplifier tool was able to reduce the complexity of the original text while maintaining meaning, laying groundwork for future studies involving student perception and comprehension outcomes. The practical implications of this tool for enhancing textbook accessibility and supporting student comprehension are discussed.</p>
      </abstract>
      <kwd-group>
        <kwd>artificial intelligence</kwd>
        <kwd>large language models</kwd>
        <kwd>simplification</kwd>
        <kwd>readability</kwd>
        <kwd>semantic fidelity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Textbooks are considered the gold standard for content, as they are developed by reputable subject
matter experts and subjected to thorough accuracy reviews. Textbooks are often assigned by faculty
for students to read as part of their coursework; however, it is known through decades of research
that students do not read as expected. A longitudinal study between 1981 and 1997 found student
textbook reading declined in that period [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Several research studies across disciplines found that
only a small percentage of students (16–27%) reported reading before class [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2–5</xref>
        ]. Russell et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
gained further insights into student reading using ebook data, finding that when faculty did not
employ a reading strategy, students read only 14% of the textbook. When asked why they did not
read, in addition to common factors such as time limitations or level of perceived importance, some
students noted they needed scaffolding for readings and were unsure of how to approach the
textbook [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Difficulty and readability were factors for struggling students when interviewed as part of an
internal user experience study. College students, who had previously used learning features as part
of their traditional university courses, volunteered to discuss new learning feature prototypes as well
as their own experiences, motivations, and struggles. Students reported that they often found the
textbook content intimidating, struggled with the language, or felt the material was too complex to
comprehend. This led students to give up on their reading assignments prematurely or avoid reading
altogether. Their experiences provided additional tangible, student-centered motivation to tackle the
central challenge of textbook readability. Although the current study does not assess students’
perceptions of simplified text, it represents an essential first step in validating whether AI-based
simplifications are linguistically simpler and semantically faithful in authentic use.</p>
      <p>
        While student interviews provide valuable contextualization of the ways in which readability can
deter some students, Sheridan-Thomas [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] frames textbook comprehension as an issue of access and
equity:
      </p>
      <p>Students who struggle with extracting important information and making meaning from
textbook reading do not have the same access to course material as competent textbook
readers. Helping all students comprehend textbook reading is an equity issue. For courses in
which textbooks are used, whether as the main source of information or as a secondary
reference, all students need to be able to use the textbook with as much competence and
independence as possible. (p. 267)</p>
      <p>Sheridan-Thomas also discusses strategies teachers can use to support students. However, while
instructors can employ various pedagogical strategies to mitigate comprehension difficulties, providing
personalized reading assistance at scale often exceeds available instructional resources. Consequently,
scalable technological solutions are increasingly appealing.</p>
      <p>
        The significant advancement of large language models (LLMs) has made it possible to address this
challenge in a personalized manner. A strength of LLMs is their ability to manipulate language as
directed, making them promising tools for producing accessible, simplified versions of complex
academic texts. Recent studies have begun to explore LLM-based simplification. For example,
Guidroz et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] conducted a randomized controlled study involving over 4,500 participants,
demonstrating that LLM-generated simplifications significantly improved reading comprehension
and reduced perceived cognitive load, particularly in complex domains such as biomedical articles
and financial texts. Similarly, recent LLM-based tools such as SimplifyMyText have been specifically
designed to create plain-language adaptations aimed at enhancing inclusivity and accessibility [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Additionally, progressive approaches proposed by Fang et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] illustrate that LLMs can handle
complex document-level simplifications by systematically decomposing tasks from discourse-level
down to lexical-level adjustments. However, although these studies collectively demonstrate the
potential of LLMs for simplifying complex texts, empirical evidence from authentic educational
settings remains limited.
      </p>
      <p>To address this gap, the current study investigates student-initiated LLM-based simplifications
generated within real-world textbook environments using an embedded ereader interface. In fall
2024, the VitalSource Bookshelf platform introduced an LLM-powered text simplification tool as a
free enhancement within the ereader interface (available in textbooks from publishers granting
permission for generative AI features). Students can highlight a passage of text and select "Simplify"
(Figure 1) to receive a simplified version displayed in an interactive side panel chat window next to
the textbook content (Figure 2). The primary goal of the simplifier is to reduce lexical and syntactic
complexity to improve readability and comprehension for students. It does not explicitly aim to
summarize or deeply elaborate content beyond clarifying complex sentences.</p>
      <p>After the initial simplification, the student is prompted to attempt to restate the content in their
own words to check their understanding or ask for another simplification; however, the current
study focuses specifically on analyzing the initial simplification event.</p>
      <p>
        The theoretical underpinning for the simplification approach employed here aligns with cognitive
load theory [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which posits that learning is optimized when extraneous cognitive load is
minimized. Complex lexical and syntactic structures in textbooks can represent extraneous load,
hindering students’ ability to engage deeply with instructional content, particularly when combined
with the inherently high intrinsic cognitive load of challenging academic material. By simplifying
these linguistic structures, we aim to reduce unnecessary cognitive effort, enabling students to better
construct coherent mental representations as described by Kintsch's Construction-Integration Model
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This theoretically informed approach highlights the potential for simplified text to support not
just immediate readability, but deeper comprehension and retention of complex academic content.
      </p>
      <p>This study examines the practical effectiveness of simplifications across a substantial dataset of
more than fifty thousand requests, assessing their impacts along key dimensions of readability,
lexical and syntactic simplification, and semantic fidelity. Specifically, the study addresses two core
research questions: (RQ1) To what extent do the LLM-generated simplifications improve the readability
of student-selected textbook passages, and (RQ2) to what extent do these simplifications preserve the
semantic fidelity of the original text?</p>
      <p>By grounding analysis in spontaneous student engagement and systematically assessing
simplifications in authentic educational settings, this study contributes to understanding how
LLM-generated simplifications function in practice and lays the groundwork for evaluating their potential
to support student comprehension and educational equity. Although cognitive load theory motivates
the design of the simplifier, the current study does not directly assess cognitive load reduction.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <sec id="sec-2-1">
        <title>2.1. Simplification Procedure</title>
        <p>
          The textbook simplifications were generated using OpenAI GPT-4o [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The process was carried
out on demand in real time upon student request, using an interface embedded in the ereader
platform (Figures 1 and 2). Parameter settings were temperature = 0, top_p = 1, and max_tokens
= 4095 to ensure consistency and determinism in the simplification outputs.
        </p>
        <p>The simplifier was prompted to act as a helpful college professor, assisting a student who reported
difficulty understanding highlighted sentences from a textbook. The prompt explicitly instructed the
simplifier to reduce sentence complexity, decrease reading level by approximately four grade levels,
substitute specialized terms with simpler and more general vocabulary, and maintain a
conversational, positive tone. However, no explicit instructions were given regarding the target
length of simplified text, nor was guidance provided against summarizing beyond simplification.</p>
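        <p>The request construction can be sketched as follows. This is a minimal illustration assuming the OpenAI chat completions API; the message wording is a paraphrase of the prompt description above, not the production prompt, and the function name is hypothetical. Only the model and parameter values are taken from the text.</p>

```python
# Minimal sketch of assembling the simplification request. The prompt text is
# a paraphrase of the published description, NOT the production prompt; only
# the parameters (temperature=0, top_p=1, max_tokens=4095) and the model
# (gpt-4o) are taken from the text.

def build_simplify_request(selected_text: str, context: str) -> dict:
    system_prompt = (
        "You are a helpful college professor assisting a student who is having "
        "difficulty understanding sentences they highlighted in a textbook. "
        "Reduce sentence complexity, lower the reading level by about four "
        "grade levels, substitute specialized terms with simpler, more general "
        "vocabulary, and keep a conversational, positive tone."
    )
    user_prompt = (
        f"Surrounding textbook context:\n{context}\n\n"
        f"Simplify this highlighted passage:\n{selected_text}"
    )
    return {
        "model": "gpt-4o",
        "temperature": 0,      # deterministic, consistent outputs
        "top_p": 1,
        "max_tokens": 4095,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }
```

        <p>The resulting payload would then be passed to the chat completions endpoint (e.g., client.chat.completions.create(**request) in the official Python client).</p>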
        <p>The LLM was given the student-selected text requiring simplification along with additional
surrounding content from the textbook. Context was determined by including the immediate
paragraph containing the selection and the larger section or subsection enclosing that paragraph.
This approach aimed to provide sufficient relevant context without incorporating excessive portions
of text. However, due to variation in textbook formatting, the extracted context may not always align
precisely with clearly defined chapter subsections. While the LLM may have been pretrained on
similar domain material, providing local textbook context helps ensure the simplification is tailored
to the student’s selected passage. No additional post-simplification filters or fidelity checks were
applied during real-time student interactions.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Collection and Analysis</title>
        <p>
          The dataset consists of student-initiated simplification events recorded between September 1, 2024,
and April 30, 2025. The dataset contains 54,371 events generated by 11,689 students across 2,082
distinct textbooks. This dataset is publicly available in our open-source data repository [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ],
facilitating replication and further research. Almost 95% of usage was from higher education
institutions in the United States and Canada. The ereader platform did not collect any student
demographic characteristics. Using the BISAC major subject heading classification for the textbooks
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], the top subject domains as a percentage of the data were Social Science (29.7%), Political Science
(16.1%), and Psychology (13.8%). The impact of simplification was quantified along four dimensions:
• Readability: the ease with which a text can be read and processed, related to the amount of effort
required by the reader
• Lexical simplification: the replacement of complex words or phrases with simpler alternatives
• Syntactic simplification: the reduction of structural complexity of sentences
• Semantic fidelity: the degree to which the original meaning is maintained after simplification
        </p>
        <p>Each metric was computed on both the original student-selected passage and its LLM-simplified
version. Differences between simplified and original texts (Δ = simplified – selected) were analyzed.
Because the simplification feature addresses both lexical and syntactic aspects of the student-selected
text, additional analyses were performed to gain insight into the relative contribution of each type
of simplification.</p>
        <p>
          Readability improvements (RQ1) were measured using two widely recognized metrics. The
primary measure was the Flesch–Kincaid Grade Level (FKGL), chosen for its direct interpretability
in terms of U.S. educational grade levels [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Complementing FKGL, the Flesch Reading Ease (FRE)
scale was used [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Notably, both FKGL and FRE rely on two core linguistic variables: average
syllables per word, which primarily reflects lexical complexity, and average words per sentence,
which captures syntactic complexity. However, these features are weighted differently by each
metric; in particular, FRE places greater emphasis on word length compared to FKGL and thus serves
as an additional robustness check. Both metrics were computed using the readability module in the
NLTK library [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], which determines sentence boundaries, word counts, and syllable counts
internally.
        </p>
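        <p>For reference, both formulas can be written over the two shared variables. The sketch below shows the standard Flesch arithmetic directly, rather than the NLTK computation used in the study.</p>

```python
# Standard Flesch formulas over the two core variables shared by both metrics.
# The study computed these via NLTK; this sketch only illustrates the
# arithmetic and the differing weights on each variable.

def fkgl(words_per_sentence: float, syllables_per_word: float) -> float:
    """Flesch-Kincaid Grade Level: higher = harder, in U.S. grade levels."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

def fre(words_per_sentence: float, syllables_per_word: float) -> float:
    """Flesch Reading Ease: higher = easier (roughly 0-100 for typical prose)."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

        <p>Note that FRE’s syllable coefficient (84.6) is much larger relative to its sentence-length coefficient (1.015) than FKGL’s (11.8 vs. 0.39), which is why FRE places comparatively greater emphasis on word length.</p>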
        <p>While FKGL and FRE have known limitations, they remain standard proxies for readability in
educational research due to their transparency and alignment with grade-level norms. Their use here
provides a practical and interpretable means of assessing changes in estimated readability across a
large dataset. Future work may explore more advanced metrics, including those based on language
models, to complement these analyses.</p>
        <p>
          Lexical simplification was assessed by examining replacements of less common or more
morphologically complex words with simpler alternatives. The primary lexical measure was the
change in mean corpus log probability (Δ log p, natural log units) of content words (nouns, verbs,
adjectives, and adverbs), calculated using the precomputed probabilities available from spaCy's
en_core_web_lg model (version 3.5.0) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. This model’s word frequency estimates are based on
large-scale web and news corpora, making it suitable for general-purpose lexical analysis. Positive Δ
values indicate substitutions of less frequent words with higher-frequency (more common) words,
generally corresponding to simpler vocabulary [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. As a complementary lexical measure, the change
in average word length in characters was computed, reflecting morphological simplification.
        </p>
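        <p>The Δ log p computation itself is straightforward once per-word log probabilities are available (e.g., from spaCy’s token.prob lookup). The sketch below assumes those values have already been extracted for the content words of each passage; the precomputed-list representation is illustrative.</p>

```python
import math

# Mean corpus log probability of content words (nouns, verbs, adjectives,
# adverbs), with delta = simplified - selected. The log-probability values
# would come from a frequency lookup such as spaCy's token.prob; here they
# are assumed to be precomputed.

def mean_content_logp(content_word_logps: list) -> float:
    return sum(content_word_logps) / len(content_word_logps)

def delta_logp(selected_logps: list, simplified_logps: list) -> float:
    """Positive values indicate rarer words were replaced with more common ones."""
    return mean_content_logp(simplified_logps) - mean_content_logp(selected_logps)

# Because these are natural-log units, a mean shift of +1.02 corresponds to
# about an e**1.02 (roughly 2.8-fold) increase in average word frequency.
```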
        <p>
          Syntactic simplification was assessed by measuring reduction in sentence structural complexity.
The primary syntactic measure was the average change in dependency tree depth (Δ dependency
depth), calculated as the mean distance (in ancestor links) from each word to its sentence root using
spaCy’s dependency parser (en_core_web_lg). A well-established syntactic complexity measure is
dependency length (linear distance between syntactically linked words), which has been shown to
impact processing difficulty. While dependency depth is distinct from dependency length, it similarly
reflects hierarchical structure and has been proposed as a proxy for syntactic complexity. Futrell et
al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] provide large-scale evidence that minimizing syntactic dependencies supports processing
efficiency, motivating the use of structural measures like depth in simplification analysis. To
complement this, the change in average sentence length (Δ words / sentence) was computed,
reflecting the degree to which simplification involved clause splitting.
        </p>
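        <p>Dependency depth amounts to the mean ancestor-link count per token. With spaCy, each token’s depth is the length of its ancestors chain; the sketch below instead assumes a parse represented simply as head indices (an illustrative encoding, with the root pointing to itself).</p>

```python
# Mean dependency tree depth: average number of ancestor links from each
# word to the sentence root. heads[i] is the index of token i's head; the
# root token's head is itself.

def mean_dependency_depth(heads: list) -> float:
    def depth(i: int) -> int:
        d = 0
        while heads[i] != i:  # climb head links until reaching the root
            i = heads[i]
            d += 1
        return d
    return sum(depth(i) for i in range(len(heads))) / len(heads)

# "The cat sat" with 'sat' as root: heads = [1, 2, 2]
# gives depths [2, 1, 0], so the mean depth is 1.0; flatter trees score lower.
```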
        <p>
          Preserving semantic fidelity during simplification (RQ2) was evaluated primarily using cosine
similarity between the selected and simplified text, computed on embeddings obtained via the
all-mpnet-base-v2 model from the sentence-transformers library [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], which has demonstrated
strong performance across various semantic textual similarity tasks. Prior research has demonstrated
that cosine similarities derived from Sentence-BERT embeddings correlate strongly with human
judgments of semantic similarity [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Other metrics commonly used in text simplification research
(e.g., BLEU, ROUGE, SARI) rely on reference-based comparisons and are oriented primarily toward
evaluating lexical overlap or n-gram similarity with human-authored simplifications. Because the
current study involves spontaneous, student-initiated simplifications without curated reference texts
and prioritizes semantic fidelity and readability in authentic educational contexts, these metrics were
not directly applicable.
        </p>
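        <p>The fidelity metric reduces to a cosine over the two passage embeddings. In the study the vectors come from sentence-transformers’ all-mpnet-base-v2; the similarity computation itself is sketched below, with the model-loading step left as a comment since it requires downloading the model.</p>

```python
import math

# Cosine similarity between passage embeddings. The embeddings would be
# produced by, e.g.:
#   model = SentenceTransformer("all-mpnet-base-v2")
#   v1, v2 = model.encode([selected_text, simplified_text])

def cosine_similarity(v1, v2) -> float:
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

# Identical directions give 1.0; orthogonal vectors give 0.0.
```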
        <p>
          To identify a cosine similarity threshold for acceptable semantic fidelity, an empirical validation
procedure was conducted using an established semantic similarity rating scale [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], which ranges
from 0 (different topics) to 5 (completely equivalent). The scale characterizes a rating of 4 as “mostly
equivalent, but some unimportant details differ,” which serves as a conservative minimum standard
for acceptability in the context of assisting college students struggling with textbook readability.
Ratings of 3 or lower indicate more substantial alterations, such as extensive summarization, which
exceed the intended scope of the simplification approach.
        </p>
        <p>
          Cosine similarity scores were partitioned into bands of width .1 (.5–.6, …, .9–1.0). From each band,
40 original-simplified pairs were randomly selected. Each pair was independently rated for semantic
similarity by OpenAI's o3 LLM [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], blinded to cosine similarity values. While the similarity ratings
were generated by an LLM, the authors independently reviewed samples of these ratings and the
rationale provided for each case and found them to be reasonable and well-aligned with human
judgments. Prior work has also demonstrated strong correlation between LLM-based semantic
similarity judgments and human ratings [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. The lowest similarity band for which all randomly
selected pairs received a rating of 4 or better was used to define the threshold. The sample size of 40
was chosen based on a power analysis using the Wilson method for estimating binomial proportion
confidence intervals (CIs). Observing 40 consecutive acceptable ratings provides a 95% CI of 95.6% ±
4.4%, ensuring a lower bound of over 90% for the true proportion of acceptable cases. Adjacent bands
were also evaluated, confirming those below the threshold failed to consistently achieve ratings of 4
or higher, whereas higher bands passed, reinforcing the threshold’s stability.
        </p>
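        <p>The Wilson interval arithmetic behind the n = 40 choice can be reproduced directly; this sketch recovers the 95.6% ± 4.4% figure for 40 of 40 acceptable ratings.</p>

```python
import math

# Wilson score interval for a binomial proportion, used for the power
# analysis: 40/40 acceptable ratings yields roughly 95.6% +/- 4.4%, so the
# lower bound on the true acceptable proportion exceeds 90%.

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Return (center, half_width) of the Wilson score interval."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center, half_width

center, half = wilson_interval(40, 40)   # approximately (0.956, 0.044)
```

        <p>The same function reproduces the group estimates reported in the Results section (e.g., 37 of 40 acceptable gives approximately 88.8% ± 8.6%).</p>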
        <p>Although alternative approaches could arrive at different threshold values, this approach is
justified on several grounds: it is empirically derived, use-case-specific, reproducible, and leverages
the LLM’s broad semantic knowledge, making it likely better suited than a single human rater for
evaluating pairs across numerous diverse textbook domains.</p>
        <p>Because the all-mpnet-base-v2 embedding model has an input limit of 384 tokens (i.e.,
subword units used by language models to process text, approximately 300 words), longer texts
required truncation to meet this constraint. Such longer texts comprised 17.5% of student selections.
Cosine similarities were examined separately for shorter (≤ 300 words) and longer (&gt; 300 words)
selected passages, finding the distributions to be closely aligned. This suggests that embeddings
computed by truncating these longer passages still robustly represented their semantic content.
Therefore, additional chunking or embedding aggregation methods were deemed unnecessary, as
they would likely not substantially alter semantic similarity assessments when passages maintain a
consistent focus throughout.</p>
        <p>
          As a complementary semantic metric, the compression ratio (CR) was computed as the ratio of
the simplified passage length to the original passage length in words. An automated metric in text
simplification evaluation [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], CR serves as an additional diagnostic tool when interpreted jointly
with cosine similarity. Because simplification strategies can vary, it is treated qualitatively rather
than used as a strict threshold. For instance, a low cosine similarity coupled with a CR significantly
below 1 may indicate excessive reduction or summarization, potentially leading to the omission of
critical information. A study by Schwarzer [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] found that lower CRs (indicating greater length
reduction) correlate positively with perceived simplicity but negatively with adequacy. Conversely,
a low cosine similarity with a CR significantly above 1 could suggest elaboration or introduction of
new content not present in the original text. By jointly analyzing cosine similarity and CR, different
types of semantic divergence can be better identified and categorized, facilitating more targeted
qualitative assessments.
        </p>
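        <p>As a sketch, CR and cosine similarity can be read jointly as a rough diagnostic. The function below is illustrative only: the CR cutoffs and category labels are assumptions for exposition, not values used by the study, which treats CR qualitatively rather than as a threshold.</p>

```python
# Compression ratio plus an illustrative joint reading with cosine similarity.
# The CR cutoffs (0.8 and 1.2) and the category labels are hypothetical
# choices for exposition, not thresholds used in the study.

def compression_ratio(selected: str, simplified: str) -> float:
    """Simplified length divided by original length, in words."""
    return len(simplified.split()) / len(selected.split())

def diagnose(cosine_sim: float, cr: float, sim_threshold: float = 0.7) -> str:
    if cosine_sim >= sim_threshold:
        return "acceptable fidelity"
    if cr < 0.8:
        return "possible over-reduction / summarization"
    if cr > 1.2:
        return "possible elaboration / added content"
    return "low similarity, length preserved: inspect qualitatively"
```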
        <p>A scatterplot of compression ratio versus cosine similarity revealed no visually discernible
boundaries or clusters, indicating a continuous rather than categorical relationship between these
metrics and semantic fidelity. Consequently, cosine similarity and CR thresholds were used as
diagnostic guidelines rather than definitive indicators, highlighting the necessity of qualitative
analysis to accurately assess potential meaning loss or elaboration.</p>
        <p>Readability formulas such as FKGL and FRE assume continuous prose. When the selected text
deviates significantly from typical prose, such as glossary entries, answer-key lists, or structured
outlines, these formulas can yield extreme, uninformative values. For example, an extended run-on
“sentence” created by a bulleted list of phrases lacking punctuation can artificially inflate FKGL or
sharply decrease FRE scores, resulting in misleading values unrelated to the simplification tool’s
actual performance. Manual inspection revealed that virtually all passages assessed in the top 1% in
reading difficulty by either metric (FKGL &gt; 44.0 or FRE &lt; -60.9) represented these non-prose formats.
These outliers (n = 634, 1.2% of the dataset) were therefore excluded from analysis. Very low
FKGL/high FRE values, indicating already-readable prose, were not removed because such passages
could not exaggerate estimated readability improvements. Re-running the full dataset without
trimming altered the mean FKGL improvement by ~0.5 grade levels and the mean FRE improvement
by less than 2 points (but considerably reduced standard deviations), with no change to the overall
statistical conclusions.</p>
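        <p>The outlier screen amounts to a simple filter on pre-simplification scores; a minimal sketch using the cutoffs reported above (the example event records are hypothetical):</p>

```python
# Exclude non-prose outliers: passages in the top 1% of estimated reading
# difficulty (FKGL > 44.0 or FRE < -60.9), which manual inspection showed
# were almost always glossaries, answer-key lists, or other non-prose formats.

FKGL_MAX = 44.0
FRE_MIN = -60.9

def is_prose_outlier(fkgl_score: float, fre_score: float) -> bool:
    return fkgl_score > FKGL_MAX or fre_score < FRE_MIN

events = [
    {"fkgl": 16.7, "fre": 32.1},   # typical textbook prose: kept
    {"fkgl": 58.3, "fre": -75.0},  # e.g., a run-on bulleted list: excluded
]
kept = [e for e in events if not is_prose_outlier(e["fkgl"], e["fre"])]
```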
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results and Discussion</title>
      <p>Table 1 presents descriptive statistics (mean, standard deviation, first quartile, median, third quartile)
summarizing the extent of readability improvements, lexical and syntactic simplifications, and
semantic fidelity across all simplification events analyzed. These results address RQ1 by quantifying
improvements in readability. To address RQ2, we then examine semantic fidelity metrics in more
detail, reporting the proportion of simplification events falling below the empirically determined
cosine similarity threshold and analyzing illustrative examples to explore potential sources of
divergence, such as shifts in structure, vocabulary, or emphasis that may affect perceived meaning.</p>
      <sec id="sec-3-1">
        <title>3.1. Readability</title>
        <p>[Table 1: mean, SD, Q1, median, and Q3 for Δ FKGL, Δ FRE, Δ log p, Δ chars / word,
Δ dependency depth, Δ words / sentence, cosine similarity, and compression ratio.]</p>
        <p>Prior to simplification, the mean FKGL of selected textbook passages was 16.65, indicating content
written at a level substantially above typical undergraduate reading expectations. The
average simplification lowered FKGL by 7.37 grade levels to 9.28, bringing the text into a more
accessible range for college-level readers. The interquartile range (Q1 = -9.20, Q3 = -4.51) shows that
simplifications consistently resulted in meaningful readability improvements, with even the
least-improved examples achieving several grade levels of improvement. The mean FRE increase was
approximately 31 points (Q1 = 20.18, Q3 = 40.07), reinforcing that texts were easier to read
post-simplification.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Lexical Simplification</title>
        <p>Lexical simplification was assessed through changes in word familiarity and length. The mean
increase of 1.02 in the Δ log p metric corresponds roughly to a 2.8-fold increase in average content
word frequency. This indicates words were replaced with more common synonyms, increasing
lexical familiarity for readers. The interquartile range (0.66 to 1.33) indicates consistent lexical
simplifications. Simplified texts also showed an average reduction of 0.41 characters per word,
suggesting a preference for shorter, simpler words.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Syntactic Simplification</title>
        <p>Simplified texts also showed clear evidence of structural simplification. Dependency depth, reflecting
syntactic complexity, was reduced on average by 0.98 levels, signaling less complex sentence
structures. With an interquartile range from -1.33 to -0.48, the results indicate readers encounter
fewer deeply embedded modifiers, which lowers working memory load and improves clarity. The
average simplification reduced sentence length by about 14.6 words. These reductions make
sentences easier to parse and understand, supporting the syntactic effectiveness of the simplification
process.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Semantic Fidelity</title>
        <p>It is important to confirm that semantic fidelity (preservation of original meaning) is maintained
alongside reductions in complexity. Mean cosine similarity was .85, suggesting that while the
simplified texts differed structurally and lexically from their originals, the core meanings remained
well-preserved. The narrow interquartile range (.81 to .91) indicates stable semantic fidelity across
the majority of simplifications. Simplified texts on average retained 80% of the original length. The
interquartile range (66% to 92%) shows variability, but consistently high ratios are consistent with
simplifications that reduce extraneous complexity without losing critical information.</p>
        <p>Collectively, these metrics indicate that substantial readability improvements can be achieved
while maintaining high semantic fidelity. This alignment suggests high potential educational value
for college students, as simplified texts effectively reduce cognitive load without compromising
essential meaning.</p>
        <p>To further assess semantic fidelity, an empirical threshold for acceptable cosine similarity was
established at .7 using the procedure described in the Method section. Applying this threshold, 94.5%
of original-simplified pairs demonstrated acceptable fidelity. The remaining 5.5% warrant further
investigation, as potential loss of meaning is indicated. These lower-similarity cases likely reflect
more substantial semantic shifts, such as unintended summarization, elaboration beyond
simplification, or possible inaccuracies. Further analysis can characterize the nature of semantic
divergences and inform subsequent refinement of the simplification process.</p>
        <p>First, however, it is important to recognize that a cosine similarity below .7 does not automatically
indicate an unacceptable simplification. Of the 2,980 pairs with low cosine similarity, the majority
(76.8%) scored at least .6, i.e., only slightly below the threshold. The low-similarity cases were
therefore divided into moderately low (≥ .6, n = 2,289, 4.3% of the dataset) and very low (&lt; .6, n = 691,
1.3% of the dataset) similarity groups for further investigation.</p>
        <p>Applying the previously established sampling method, 95% confidence intervals were calculated
for the proportion of acceptable simplifications in each group. In the moderately low similarity
group, 37 of 40 pairs were rated acceptable, resulting in a 95% CI of 88.8% ± 8.6%. In the very low
group, 31 of 40 pairs were rated acceptable, giving a 95% CI of 75.1% ± 12.6%. These findings suggest
that even at substantially lower cosine similarities, most simplifications remain acceptable in
preserving semantic fidelity.</p>
        <p>3.4.1. Example 1</p>
        <p>The following pair from a Social Science textbook (cosine similarity = .55) illustrates how
simplifications with relatively low cosine similarity can remain semantically acceptable.</p>
        <p>Insofar as corrections remains at the heart of our social policy—rather than as a supplemental
or marginal support as it was throughout most of United States history—it is the Iron State
stealing from the future of the Golden State.</p>
        <p>When we focus too much on prisons as a main part of our social policy, it takes away from
other important areas. In the past, prisons were just a small part of our approach. Now, they
take up a lot of attention and resources. This focus on prisons is like taking away from our
future growth and success.</p>
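        <p>The FKGL values discussed for this pair follow the standard Flesch-Kincaid Grade Level formula [16]. A minimal sketch is given below; the vowel-group syllable counter is a naive stand-in for whatever syllable estimation the study's tooling performs, so exact values will differ.</p>

```python
import re

def count_syllables(word):
    # Naive heuristic: one syllable per vowel group (approximation only).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text):
    # Flesch-Kincaid Grade Level [16]:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)
```

        <p>Longer sentences and more syllables per word both raise the grade level, which is why the short-sentence simplified passage scores far lower than the original.</p>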
        <p>The simplification demonstrates substantial readability improvement, reducing FKGL from 19.0
to 7.2. Several factors appear to contribute to the lower cosine similarity:</p>
        <p>Metaphorical language and named entities were removed. Terms like “Iron State” and
“Golden State” carry substantial semantic weight through metaphor and allusions to the
prison system and an idealized California, respectively. Omitting these terms in the simplified
version removed a dense semantic anchor, thus lowering the cosine similarity despite the
preservation of the core meaning.</p>
        <p>Key domain terms experienced semantic shifts. The term “corrections” was simplified to
“prisons,” “supplemental or marginal support” became “small part of our approach,” and
“stealing from the future” transitioned to “taking away from our future growth.” While these
paraphrases effectively retained the intended meaning, the phrase-level embeddings for each
substitution may occupy different positions in semantic space.</p>
        <p>Rhetorical intensity and evaluative tone were softened in the simplified text. The original
vivid and critical expression “stealing from the future” was rendered in more neutral
economic language.</p>
        <p>The combination of such factors illustrates how low similarity values may occur despite
preservation of the original's essential meaning as perceived by human readers.</p>
        <p>3.4.2. Example 2</p>
        <p>Next, cases with potential substantive meaning loss are considered, specifically those exhibiting very
low compression ratios. We consider cases where the simplified text contains fewer than half the
original number of words (CR &lt; 0.5). Among pairs with low cosine similarity, a strong negative
correlation was observed between the length of original selections and their CR (r = -.72, p &lt; .001).
This highlights that longer original passages were substantially more likely to undergo extensive
summarization in simplification.</p>
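        <p>The compression ratio and correlation statistics above can be reproduced from word counts alone. A minimal sketch, assuming simple whitespace tokenization (the study's exact tokenizer may differ):</p>

```python
import math

def compression_ratio(original, simplified):
    """CR = simplified word count / original word count.
    CR < 0.5 indicates heavy summarization; CR > 1.5 indicates elaboration."""
    return len(simplified.split()) / len(original.split())

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

        <p>Applied to original passage lengths and their CRs, a strongly negative r, as reported above, means longer selections tend toward lower CRs, i.e., heavier condensation.</p>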
        <p>Detailed examination of these pairs revealed notable trends. The primary strategy identified was
summarization and condensation, going further than lexico-syntactic simplification for readability
improvement. This approach led to significant reductions in supporting details, examples, historical
context, nuanced definitions, and sometimes important qualifications originally present. Despite
these substantial reductions, simplified versions generally retained accurate representations of the
original core ideas. Semantic drift, although possible, was typically minimal, and hallucinations were
not observed in these cases.</p>
        <p>The following simplification from a Psychology textbook (CR = 0.36, cosine similarity = .54)
illustrates this pattern.</p>
        <p>Reaction chains are similar to FAPs, but with one major difference—each set of responses in
a reaction chain requires an appropriate stimulus to set it off. Recall that once a fixed-action
pattern (FAP) begins, the animal usually continues the sequence even when the stimuli that
set off the behavior are removed. In the previous squirrel and nuts example, the animal
continues to dig a hole and bury the non-existent nut, even if the nut is removed. In contrast,
a reaction chain requires the presence of a specific stimulus to activate each link in the
sequence of behavior. An organism’s performance produces stimuli that set off the next
series of responses in the chain; these behaviors in turn produce the stimuli followed by
another set of responses. Presenting a stimulus that ordinarily occurs in the middle part of
the sequence activates the chain at that point rather than at the beginning. Also, unlike FAPs,
if the stimuli that activate behavior are removed, the sequence is disrupted. The courtship
ritual of the male and female stickleback fish (Gasterosteus aculeatus) is a reaction chain
(Figure 3.2). Reaction chains often show behavioral flexibility similar to MAPs, but here we
describe an idealized behavioral sequence. During the mating season, the reaction chain is
initiated when a male stickleback sees a female and begins a zigzag dance, displaying his red
underbelly. For a receptive female with eggs, the male’s display functions as a stimulus that
causes the female to arch her back and reveal her swollen abdomen. This initiates swimming
by the male to a previously constructed nest. The female follows and enters the nest with
her tail protruding, causing the male to nibble at the base of her tail and initiate the
egg-laying sequence. Once the female has deposited her eggs, the male ousts her from the nest,
enters the nest, and fertilizes the eggs, thereby completing the reaction chain.</p>
        <p>Reaction chains and fixed-action patterns (FAPs) are alike, but they have a key difference. In
a reaction chain, each step needs a specific trigger to start. Once a FAP starts, the animal
keeps going even if the trigger is gone. For example, a squirrel will keep burying a nut even
if the nut is taken away. But in a reaction chain, each step needs its own trigger to happen.
The animal's actions create new triggers for the next steps. If you give a trigger that usually
happens in the middle, the chain starts there instead of at the beginning. Unlike FAPs, if you
take away the triggers in a reaction chain, the sequence stops.</p>
        <p>In this case, a detailed biological description was substantially condensed, yet the core conceptual
contrast was preserved. The simplification omits technical elaborations, such as the named species
and the full courtship sequence, in favor of a generalized explanation. While contextual richness is
reduced, the central distinction between FAPs and reaction chains is retained, and no hallucinated
content is introduced. The occurrence of extensive summarization was not an explicitly intended
outcome of the simplification method; however, it represents a manageable side effect compared to
problematic issues like hallucination or substantive meaning distortion. Should summarization be
deemed undesirable, adjustments through targeted prompt engineering could minimize this
behavior.</p>
        <p>Pedagogically, extensive summarization may offer advantages, particularly for simplifying very
long or complex passages. By distilling lengthy selections to core ideas, simplified texts might
improve clarity and comprehension, reduce cognitive load, and make complex topics more accessible
to learners. The educational value of this approach depends on instructional goals, context, and
specific learner needs and warrants further investigation.</p>
        <p>3.4.3. Example 3</p>
        <p>Very high compression ratios, where the simplified text contains over 50% more words than the
original (CR &gt; 1.5), were also analyzed. These cases, while less common than those involving
summarization, typically reflect elaboration and explanation, i.e., the expansion of dense or technical
content into more accessible language. This process often includes defining key terms, adding
context, restating implicit ideas explicitly, or providing illustrative examples.</p>
        <p>Although the expanded output introduces additional wording, semantic drift is rare and generally
mild when it occurs. The increased length is usually the result of purposeful elaboration rather than
hallucination or distortion. In these cases, the simplification trades conciseness for clarity, aiming to
enhance comprehension rather than merely reduce complexity.</p>
        <p>The following simplification from a Psychology textbook (CR = 2.03, cosine similarity = .64)
illustrates this pattern.</p>
        <p>Publication bias is a fundamental problem with our method of creating the official record of
scientific inquiry such that studies that report statistically significant findings are more likely
to be published than studies that report no statistically significant findings.</p>
        <p>When scientists do research, they often look for results that show a clear effect or change.
These are called statistically significant findings. Studies with these kinds of results are more
likely to get published in scientific journals. On the other hand, studies that don't find a clear
effect, known as "negative studies," are less likely to be published. This creates a problem
because it means that the official record of scientific research might not include all the
important information.</p>
        <p>In this example, the simplified text is substantially longer primarily because it elaborates on key
concepts to improve clarity, rather than because it introduces new or unrelated information. While
the original text concisely employs technical terms such as “publication bias” and “statistically
significant findings,” the simplified version uses more accessible language and explicitly defines
these concepts. Notably, it introduces the term "negative studies," a concept implied but not explicitly
labeled in the original passage. The expanded length results from explaining technical terms in
simpler vocabulary and restating ideas clearly to ensure reader comprehension. This illustrates that
simplification does not necessarily involve shortening text; rather, it can involve detailed unpacking
and explicit elaboration to enhance reader understanding.</p>
        <p>While not an intended behavior of the system, such elaboration may serve pedagogical goals
when precision and clarity outweigh brevity. Interestingly, although summarization and elaboration
may seem like opposing strategies, both can plausibly emerge in response to different forms of
complexity. Dense, highly technical language may induce the LLM to elaborate and clarify, while
verbose or example-laden passages may lead to condensation. Although neither behavior was
explicitly instructed in the prompt, both reflect the LLM’s responsiveness to the local context and
illustrate the nuanced, context-sensitive nature of simplification. Whether such behavior is beneficial
depends on instructional context and the needs of the student requesting the simplification, a
question that invites future empirical investigation.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This study demonstrated that LLM-generated simplifications of textbook content can substantially
improve readability without sacrificing semantic fidelity. These findings also highlight that textbook
content often contains unnecessary lexical and syntactic complexity, as evidenced by students
actively selecting these passages and requesting simplification, suggesting they posed challenges to
comprehension. Cosine similarity analysis provided empirical evidence that simplified passages
reliably retain essential meanings, even when similarity scores are relatively low. Further analysis
of cases with low cosine similarity and extreme compression ratios revealed additional simplification
strategies, such as summarization and elaborative explanation. Collectively, these results underscore
the practical value of targeted simplification—making textbook content more accessible and
equitable for struggling students. While this suggests potential value for students, we stop short of
claiming comprehension gains, which would require further study.</p>
      <p>While the study does not include direct evaluations from students, its purpose was to establish
foundational evidence that LLM-based simplifications reduce linguistic complexity while preserving
meaning. We view this as a necessary precursor to studies that assess perceived helpfulness or effects
on comprehension. Although our analysis relies on automated metrics, these offer scalable and
objective indicators of meaning preservation and linguistic change. Nevertheless, automated metrics
may not fully capture all nuances of textual understanding, and future work should incorporate
complementary human-centered evaluations. Despite these limitations, the current findings provide
a rigorous foundation for the continued development of educationally aligned simplification
methods.</p>
      <p>Several directions warrant further exploration. One priority is investigating how students
perceive the quality and helpfulness of simplifications, perhaps through lightweight feedback
mechanisms such as the “thumbs up/down” method used in AI-generated question evaluation [27].
These perceptions could provide insights into preferences and trade-offs between simplification and
exact meaning retention. Additional work should also explore whether simplification leads to
improvements in comprehension or downstream learning outcomes—ultimately the most important
goal for educational applications. More broadly, methodological refinements may include identifying
optimal thresholds for semantic fidelity and readability improvement linked to enhanced learning.
In addition, extending analyses to a broader range of textbooks and subject areas would strengthen
generalizability. Future iterations may also explore adaptive supports, including optional inline
glosses or tunable elaboration levels, to better serve diverse learner populations, particularly those
who struggle with reading complex academic texts.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We thank the participating publishers for granting permission to enable generative AI features in their
textbook content and permitting the release of the resulting data for open research. We are grateful to
Lainey Murdock, Margaret Thompson-Schulz, and Mike Tapply for their support in preparing the open
dataset. Finally, we thank the reviewers for their thoughtful and constructive feedback on this paper.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used OpenAI o3 and GPT-4.5 for: refining draft
content; paraphrasing and rewording; grammar and spelling checks. After using these tools, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
      <p>[26] Schwarzer, M. (2018). Crowdsourcing text simplification with sentence fusion [Bachelor's thesis,
Pomona College]. https://cs.pomona.edu/classes/cs190/thesis_examples/Schwarzer.18.pdf</p>
      <p>[27] Johnson, B. G., Dittel, J., &amp; Van Campenhout, R. (2024). Investigating student ratings with
features of automatically generated questions: A large-scale analysis using data from natural
learning contexts. In Proceedings of the 17th International Conference on Educational Data
Mining (pp. 194–202). https://doi.org/10.5281/zenodo.12729796</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Burchfield</surname>
            ,
            <given-names>C. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sappington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Compliance with required reading assignments</article-title>
          .
          <source>Teaching of Psychology</source>
          ,
          <volume>27</volume>
          (
          <issue>1</issue>
          ),
          <fpage>58</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Berry</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cook</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hill</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>An exploratory analysis of textbook usage and study habits: Misperceptions and barriers to success</article-title>
          .
          <source>College Teaching</source>
          ,
          <volume>59</volume>
          (
          <issue>1</issue>
          ),
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          . https://doi.org/10.1080/87567555.2010.509376
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Clump</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauer</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bradley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>The extent to which psychology students read textbooks: A multiple class analysis of reading across the psychology curriculum</article-title>
          .
          <source>Journal of Instructional Psychology</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ),
          <fpage>227</fpage>
          -
          <lpage>232</lpage>
          . https://psycnet.apa.org/record/2004-19597-007
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Connor-Greene</surname>
            ,
            <given-names>P. A.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Assessing and promoting student learning: Blurring the line between teaching and testing</article-title>
          .
          <source>Teaching of Psychology</source>
          ,
          <volume>27</volume>
          (
          <issue>2</issue>
          ),
          <fpage>84</fpage>
          -
          <lpage>88</lpage>
          . https://doi.org/10.1207/S15328023TOP2702_01
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2001</year>
          , May 4).
          <article-title>Can plot improve pedagogy? Novel textbooks give it a try</article-title>
          .
          <source>The Chronicle of Higher Education</source>
          ,
          <volume>47</volume>
          (
          <issue>35</issue>
          ),
          <fpage>A12</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>J.-E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>George</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Damman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Instructional strategies and student eTextbook reading</article-title>
          .
          <source>In Proceedings of the ACM International Conference</source>
          (pp.
          <fpage>613</fpage>
          -
          <lpage>618</lpage>
          ). https://doi.org/10.1145/3576050.3576086
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Sheridan-Thomas</surname>
            ,
            <given-names>H. K.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Assisting struggling readers with textbook comprehension</article-title>
          . In K. A.
          <string-name>
            <surname>Hinchman &amp; H. K.</surname>
          </string-name>
          Sheridan-Thomas (Eds.),
          <article-title>Best practices in adolescent literacy instruction</article-title>
          (pp.
          <fpage>164</fpage>
          -
          <lpage>184</lpage>
          ). Guilford Press.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Guidroz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ardila</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mansour</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jhun</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanchez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kakarmath</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellaiche</surname>
            ,
            <given-names>M. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garrido</surname>
            ,
            <given-names>M. Á.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choudhary</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartford</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serrano Echeverria</surname>
            ,
            <given-names>H. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shaffer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Cao, …
          <string-name>
            <surname>Duong</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>LLM-based text simplification and its effect on user comprehension and cognitive load</article-title>
          . arXiv. https://doi.org/10.48550/arXiv.2505.01980
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Färber</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aghdam</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Im</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tawfelis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ghoshal</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>SimplifyMyText: An LLM-based system for inclusive plain language text simplification</article-title>
          .
          <source>In Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part IV</source>
          (pp.
          <fpage>418</fpage>
          -
          <lpage>424</lpage>
          ). Springer. https://doi.org/10.1007/978-3-031-88717-8_32
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>Progressive document-level text simplification via large language models</article-title>
          . arXiv. https://doi.org/10.48550/arXiv.2501.03857
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Sweller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Merriënboer</surname>
            ,
            <given-names>J. J. G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Paas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Cognitive architecture and instructional design: 20 years later</article-title>
          .
          <source>Educational Psychology Review</source>
          ,
          <volume>31</volume>
          (
          <issue>2</issue>
          ),
          <fpage>261</fpage>
          -
          <lpage>292</lpage>
          . https://doi.org/10.1007/s10648-019-09465-5
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Kintsch</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>Comprehension: A paradigm for cognition</article-title>
          . Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          OpenAI. (
          <year>2024</year>
          , August 8).
          <article-title>GPT-4o system card</article-title>
          . https://openai.com/index/gpt-4o-system-card/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          VitalSource Supplemental Data Repository
          (
          <year>2025</year>
          ). https://github.com/vitalsource/data
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          Book Industry Study Group
          (
          <year>2022</year>
          ).
          <article-title>Complete BISAC subject headings list</article-title>
          . https://www.bisg.org/complete-bisac-subject-headings-list
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Kincaid</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fishburne</surname>
            ,
            <given-names>R. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>R. L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chissom</surname>
            ,
            <given-names>B. S.</given-names>
          </string-name>
          (
          <year>1975</year>
          ).
          <article-title>Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel</article-title>
          (
          <source>Research Branch Report 8-75</source>
          ). Naval Air Station Memphis: Chief of Naval Technical Training.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Flesch</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>1948</year>
          ).
          <article-title>A new readability yardstick</article-title>
          .
          <source>Journal of Applied Psychology</source>
          ,
          <volume>32</volume>
          (
          <issue>3</issue>
          ),
          <fpage>221</fpage>
          -
          <lpage>233</lpage>
          . https://doi.org/10.1037/h0057532
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <source>Natural language processing with Python: Analyzing text with the Natural Language Toolkit</source>
          . O'Reilly Media.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Honnibal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montani</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Landeghem</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Boyd</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>spaCy: Industrial-strength natural language processing in Python</article-title>
          . https://doi.org/10.5281/zenodo.1212303
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Crossley</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>D. B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>McNamara</surname>
            ,
            <given-names>D. S.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Text readability and intuitive simplification: A comparison of readability formulas</article-title>
          .
          <source>Reading in a Foreign Language</source>
          ,
          <volume>23</volume>
          (
          <issue>1</issue>
          ),
          <fpage>84</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Futrell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahowald</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Large-scale evidence of dependency length minimization in 37 languages</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>112</volume>
          (
          <issue>33</issue>
          ),
          <fpage>10336</fpage>
          -
          <lpage>10341</lpage>
          . https://doi.org/10.1073/pnas.1502134112
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Reimers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          (pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          ).
          Association for Computational Linguistics
          . https://doi.org/10.18653/v1/D19-1410
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>SemEval-2012 Task 6: A pilot on semantic textual similarity</article-title>
          .
          <source>In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM 2012)</source>
          (pp.
          <fpage>385</fpage>
          -
          <lpage>393</lpage>
          ). Association for Computational Linguistics
          . https://aclanthology.org/S12-1051/
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          (
          <year>2025</year>
          , April 16).
          <article-title>OpenAI o3 and o4-mini system card</article-title>
          . https://openai.com/index/o3-o4-mini-system-card/
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Alva-Manchego</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scarton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Specia</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>EASSE: Easier automatic sentence simplification evaluation</article-title>
          .
          <source>In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>