<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>S-3 Pipeline for Biomedical Text Simplification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ansh Vora</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tanish Chaudhari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanjeev Hotha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dr. Sheetal Sonawane</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Pune Institute of Computer Technology (PICT)</institution>
          ,
          <addr-line>Pune</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Text simplification is an important field in natural language processing that aims to improve the readability and comprehension of complex text. It makes scientific and technical text more accessible to individuals with limited literacy and comprehension and aids patients in healthcare settings. We present the S-3 Simplification system, a biomedical text simplification system that produces lexically and structurally simplified text that is fluent, semantically accurate and easy to understand. The system integrates semantic simplification using T5 models, AMR (Abstract Meaning Representation)-guided structural simplification, and BERT masked language modelling with a medical thesaurus for context-aware synonym substitution. This approach highlights the effectiveness of a hybridized model for maintaining meaning and fluency while achieving lexical and syntactic simplification.</p>
      </abstract>
      <kwd-group>
        <kwd>Text Simplification</kwd>
        <kwd>Semantic</kwd>
        <kwd>Syntactic</kwd>
        <kwd>Lexical</kwd>
        <kwd>Abstract Meaning Representation</kwd>
        <kwd>BERT-Masking</kwd>
        <kwd>Biomedical</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Increasing biomedical and clinical literature has led to a vast repository containing scientific and medical
information. While this data carries critical information in the studies, its linguistic complexity tends
to make it incomprehensible and inaccessible. These clinical reports, reviews, and other literature
contain complex semantics and domain-specific jargon that are usually at a graduate level or above.
Hence, in the biomedical domain, simplification helps make the literature more accessible and
encourages better health literacy. However, the general simplification of sentences may pose significant
challenges in maintaining biomedical contexts, where subtle semantic deviations can have significant
consequences during interpretation.</p>
      <p>Recent advances in Natural Language Processing (NLP), especially in
transformer-based architectures such as BERT, T5, Pegasus, and SciBERT, have enabled progress in tasks such as
summarization and translation. In this study for SimpleText, we propose a comprehensive and modular
method for biomedical text simplification. The method is designed so that the
lexical, syntactic and semantic dimensions of simplification
are handled in an integrated manner. Lexical simplification leverages BERT-based
masked language modelling and WordNet filtering to replace complex terms with simpler synonyms.
Semantic and syntactic simplification is achieved using pre-trained models such as Pegasus and T5 for
paraphrasing and sentence restructuring. Finally, an intelligent output-merging step integrates the
multiple simplification outputs based on readability, fluency, and semantic consistency, using automated
metrics such as SARI, BLEU, and FKGL. Additionally, lexical simplification is guided by an AMR-based
mechanism: by extracting biomedical entities from AMR graphs and replacing them using WordNet
and masked language modelling with contextual scoring, we ensure that simplifications remain aligned
with the original meaning. The pipeline is further optimized with
frequency-aware synonym selection, context-sensitive scoring using pseudo-log-likelihoods, and
GPU-accelerated evaluation. Through SimpleText, we aim for more accessible biomedical content
that supports not only researchers but everyone involved in scientific and healthcare fields, and helps
the general public engage with medical information more effectively.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Survey</title>
      <p>
        The SimpleText 2025 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] track at CLEF 2025 is focused on simplification of Biomedical Text. Recently,
there have been numerous advances in the domain of biomedical text simplification. Attal et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
introduced the PLABA dataset with 7,600 sentence pairs that are distributed over 750 abstracts and
consist of complex and simplified sentence pairs, and Bakker et al. (2024) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed the Cochrane
dataset that is a large parallel corpus of Cochrane biomedical reviews and corresponding summaries that
are aligned at various textual levels. Basu et al. (2023) also proposed a crowdsourced dataset annotated
with transformation types that can be used for controlled simplification, while Ren et al. (2023) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
extended the Cochrane dataset with MULTICOCHRANE, the first multilingual simplification corpus
in the biomedical domain, although a notable performance gap remained, with English models far
outperforming those for other languages.
      </p>
      <p>
        Text simplification methods were originally procedural but have largely shifted
towards data-driven neural network architectures, as highlighted in Ondov et al. (2022) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The
paper also underlined the challenges in text simplification such as low-resource settings and domain
sensitivity. Li et al. (2024) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduced domain-specific biomedical simplification using several
Large Language Models (LLMs) including BART, T5, SciFive, and GPT-4. This research showcased
how meaning preservation remained a critical challenge despite good results. Knappich et al. (2023)
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] then introduced a novel Llama-2-based architecture that incorporated token-level loss weighting
for improved emphasis on simplification edits, while Flores et al. (2023) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] optimized output by
introducing unlikelihood training and ranked decoding. Phatak et al. (2022) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] used an innovative
approach incorporating reinforcement learning to optimize simplicity in the TESLEA model. Gill et al.
(2023) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] pioneered rule-based simplification using a knowledge-based approach (KITS) to enhance
biomedical relation extraction.
      </p>
      <p>
        T5 models have been widely used in text simplification, whether for T5-based lexical
simplification as demonstrated in Sheang et al. (2022) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] or controllable simplification using T5 in
Saggion et al. (2021) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Lexical simplification methods could also be made context-aware, as shown
in Qiang et al. (2020) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which introduces LSBert. Transformer approaches to text simplification
generally involve embedding-based simplification, as demonstrated in Trucia et al. (2023) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        AMR-based structural subgraph extraction and simplification was explored in Yao et al. (2024) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and
showed better preservation of semantic fidelity compared to LLMs. Zaman et al. (2024) [18] combined
simplification with summarization using the SATS model for conciseness and clarity. Beauchemin et al.
(2023) [19] introduced MeaningBERT, a simplification metric that addresses the limitations of
traditional metrics like BLEU and SARI.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>The proposed S-3 model is a hybrid system designed to simplify biomedical text at the
sentence level. It performs simplification at three distinct and complementary levels: sub-lexical,
structural (syntactic) and semantic. The model uses rule-based resources to simplify the
structure of the sentence for easier understanding and replaces complex phrases at the token level
with simpler alternatives. The pipeline also improves the readability and accessibility of complex texts using a
semantic approach in which paraphrasing is performed while preserving meaning and domain relevance. The
architecture of the pipeline is shown in Fig. 1.</p>
      <p>The pipeline has five main phases: Preprocessing; Lexical Simplification using WordNet and BERT
masking; Syntactic Simplification using a graph-based approach (AMR construction and parsing);
Semantic Simplification using T5 paraphrasing models; and, finally, merging of the outputs from the
lexico-structural and semantic branches. Each stage is modular, and their individual advantages are hybridized
to produce a balanced, simplified output.</p>
      <p>• Sub-Lexical Simplification replaces complex phrases, words and terms with simpler,
more readable alternatives. In the pipeline we use WordNet and the UMLS Medical Thesaurus
to find simpler alternatives to complex medical terms. BERT masking is then used to
select the most appropriate simplified synonym for the
complex term in context.
• Structural Simplification uses Abstract Meaning Representation graphs to manipulate the
conceptual and syntactic structure of sentences. Once the graphs are created, we simplify the technical and
complex terms, prune unnecessary subtrees, and then use graph parsing techniques
to reconstruct the sentence through AMR-to-text generation. This step is tightly hybridized with
lexical simplification: the important terms are discovered in the graph and only those are sent for lexical
substitution.
• Semantic Simplification is performed using pre-trained paraphrasing models (Pegasus and T5) for sentence rewriting. The model
captures the meaning of a sentence and rewrites it in simpler phrases to enhance fluency. These
models introduce semantic transformations that complement the lexico-structural transformations.</p>
      <p>In the final stage we score the candidate outputs using simplification and n-gram overlap
metrics such as FKGL and SARI. The hybridization system merges the phrases and segments with the highest
simplification efficacy, combining fine-grained word-level replacements with structural rearrangement
while high-level paraphrasing ensures fluency and semantic cohesion. The S-3 Pipeline thus strikes a
balanced trade-off between structure, syntax and fluency, yielding improved biomedical simplification.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Preprocessing Stage</title>
      <p>Preprocessing is critical in our pipeline because of the domain-specific vocabulary and complex
sentence structures that characterise scientific text. The document is first split into
sentences. These sentences are then tokenized at the word level and
POS-tagged to facilitate lexical replacement later in the pipeline. Tokenization is performed using a
combination of the SciBERT tokenizer [20] (for domain specificity) and SpaCy (for NLP pipeline integration).
This simplifies the criteria for selecting candidates for substitution and helps identify the
semantic blueprints for the AMR graphs built with the AMRlib parser. Other preprocessing steps include
removal of punctuation and citation markers, and lemmatization.</p>
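      <p>The citation-marker removal and sentence splitting steps can be sketched with the Python standard library alone. This is a simplified illustration, not the pipeline's implementation: the regular expressions and the function name <monospace>preprocess</monospace> are assumptions, and the SciBERT tokenizer, SpaCy tagging and lemmatization are not reproduced here.</p>
      <preformat><![CDATA[
```python
import re

# Assumed pattern: bracketed numeric citations such as [20] or [4, 5].
CITATION = re.compile(r"\s*\[\s*\d+(?:\s*[,-]\s*\d+)*\s*\]")
# Naive boundary: end punctuation, then whitespace, then a capital letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def preprocess(document: str) -> list[str]:
    """Strip citation markers, normalise whitespace and split into sentences."""
    text = CITATION.sub("", document)
    text = re.sub(r"\s+", " ", text).strip()
    return [sentence for sentence in BOUNDARY.split(text) if sentence]


if __name__ == "__main__":
    sample = "Statins lower LDL cholesterol [3]. They may cause myopathy [4, 5]."
    for sentence in preprocess(sample):
        print(sentence)
```
]]></preformat>
      <p>A rule-based splitter of this kind will mis-handle abbreviations such as "et al." followed by a capitalised word, which is one reason a trained sentence segmenter is preferable for scientific text.</p>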
    </sec>
    <sec id="sec-5">
      <title>5. Structural Simplification</title>
      <p>Structural simplification is implemented in the pipeline using Abstract Meaning Representation (AMR),
a graph-based formalism that encodes the semantics of a sentence independently of its syntax.
The focus is on reconfiguring the structure of a sentence by identifying and modelling its conceptual
scaffolding through an AMR graph. This stage uses semantic representations to rewrite sentences
at the conceptual level, making the structure and complex terms simpler while preserving the original
meaning of the sentence.</p>
      <sec id="sec-5-1">
        <title>5.1. AMR Graph Construction</title>
        <p>Each sentence is parsed using the AMRlib pipeline, which uses a Transformer-based encoder-decoder
architecture trained on the LDC AMR data to convert the sentence into a rooted, directed
and acyclic graph that represents its semantic structure. The nodes of the AMR graph
represent concepts, including entities, events and attributes, while the edges represent semantic
roles such as agents, patients and modifiers, as well as temporal relations.</p>
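        <p>To make the node and edge structure concrete, the following sketch stores a small AMR graph in memory and serialises it in PENMAN-style notation. The class <monospace>AMRGraph</monospace> and the example graph are hand-built assumptions for illustration; an actual parser may produce a different graph for the same sentence, and re-entrant nodes are not handled.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class AMRGraph:
    """Rooted directed graph: variables map to concepts, edges carry roles."""
    root: str
    concepts: dict[str, str] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def children(self, var: str) -> list[tuple[str, str]]:
        """Outgoing (role, target) pairs of a variable, in insertion order."""
        return [(role, tgt) for src, role, tgt in self.edges if src == var]

    def to_penman(self, var: str = "") -> str:
        """Serialise the subgraph under `var` (default: root); assumes a tree."""
        var = var or self.root
        parts = [f"({var} / {self.concepts[var]}"]
        for role, tgt in self.children(var):
            parts.append(f"{role} {self.to_penman(tgt)}")
        return " ".join(parts) + ")"


if __name__ == "__main__":
    # Illustrative graph for "The patient suffered a myocardial infarction."
    graph = AMRGraph(
        root="s",
        concepts={"s": "suffer-01", "p": "patient", "m": "myocardial-infarction"},
        edges=[("s", ":ARG0", "p"), ("s", ":ARG1", "m")],
    )
    print(graph.to_penman())
```
]]></preformat>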
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Concept Node Identification</title>
        <p>After creation, the AMR graph is traversed to identify the key concept nodes that are most likely to carry
scientific complexity and therefore require the most simplification. These include multiword scientific
terms such as myocardial infarction, predicates that represent complex relations, and compound nominal
modifiers.</p>
        <p>These nodes are prioritized according to the following principles:
• Graph role: whether the node is a central argument or a peripheral modifier
that contributes little to the meaning of the sentence.
• Domain specificity: whether the term can be generalised or needs thesaurus-based,
domain-specific substitution.
• Lexical features: NLP features such as TF-IDF and relative importance across
documents.</p>
        <p>This approach ensures that the focus remains on the semantically complex part of the sentence that
contributes most to the meaning, rather than simplifying the supporting phrases.</p>
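        <p>One way to combine the three criteria into a single ranking is sketched below. The role weights, the multiplicative combination and the names <monospace>ConceptNode</monospace>, <monospace>priority</monospace> and <monospace>rank_nodes</monospace> are all assumptions made for illustration; the text does not specify how the criteria are weighted.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass

# Assumed weights: core arguments count more than peripheral modifiers.
ROLE_WEIGHT = {":ARG0": 1.0, ":ARG1": 1.0, ":ARG2": 0.8, ":mod": 0.4}


@dataclass
class ConceptNode:
    concept: str        # AMR concept label, e.g. "myocardial-infarction"
    role: str           # incoming edge label, e.g. ":ARG1"
    in_thesaurus: bool  # True if the term needs domain-specific substitution
    tfidf: float        # precomputed TF-IDF weight, assumed to lie in [0, 1]


def priority(node: ConceptNode) -> float:
    """Combine graph role, domain specificity and TF-IDF into one score."""
    role = ROLE_WEIGHT.get(node.role, 0.5)
    domain = 1.0 if node.in_thesaurus else 0.5
    return role * domain * node.tfidf


def rank_nodes(nodes: list[ConceptNode]) -> list[ConceptNode]:
    """Return nodes sorted from highest to lowest simplification priority."""
    return sorted(nodes, key=priority, reverse=True)
```
]]></preformat>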
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Lexical Simplification within AMR Graphs</title>
        <p>We perform lexical substitution only on the AMR graph, so that only the important phrases
in the concept nodes are substituted, which maximizes efficiency. The concept nodes already
store the most significant portions of the input text and thus require the most simplification; by
integrating lexical substitution within AMR parsing, we can easily identify these key phrases.</p>
        <p>For each concept node in the AMR graph, the S-3 Pipeline performs the following steps:
• It identifies candidate substitution phrases using WordNet, a lexical database that
stores words as sets of synonyms called synsets. Only the synsets that match the POS tag of the
candidate phrase are used for the substitution, i.e. nouns are replaced only by nouns, verbs
only by verbs, and so on.
• Synonyms that are overly complex, technical or archaic are filtered out
on the basis of frequency and syllable complexity. The remaining synonyms are lemmatized to match the format of
the candidate phrase in the AMR graph.
• To select the best candidate from this set, we use contextual evaluation with a
BERT masked language model (MLM). The concept node in the original sentence
is replaced with [MASK], and the BERT MLM assesses how well each
synonym fits in place of [MASK], assigning a softmax probability as a score. This score reflects a
combination of contextual fluency and grammatical correctness. Candidates with top-k scores
are prioritized and further analysed on the basis of simplicity heuristics. We calculate the
pseudo-log-likelihood (PLL) score as follows:</p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math><![CDATA[\mathrm{PLL}(S) = -\sum_{i=2}^{n} \log P(w_i \mid w_1, \dots, w_{i-1})]]></tex-math>
        </disp-formula>
        <p>Where:
– <inline-formula><tex-math>S = (w_1, w_2, \dots, w_n)</tex-math></inline-formula> is the tokenized sentence
– <inline-formula><tex-math>P(w_i \mid w_1, \dots, w_{i-1})</tex-math></inline-formula> is the token-level conditional probability from BERT
– PLL is the negative log-likelihood of the sentence under BERT’s masked language model</p>
        <p>• The top replacement candidates are then ranked on the basis of a simplicity-contextuality score that
considers word length based on syllabic complexity, psycholinguistic metrics such as
Age of Acquisition (AoA), familiarity and number of meanings (polysemy), and lastly the BERT
MLM score. The best candidate selection logic follows this formula:</p>
        <disp-formula id="eq-2">
          <label>(2)</label>
          <tex-math><![CDATA[\mathrm{BestCandidate} = \arg\max_{c \in C} \mathrm{PLL}(S[p \to c])]]></tex-math>
        </disp-formula>
        <p>Where:
– <inline-formula><tex-math>C</tex-math></inline-formula> is the set of candidate substitutions (from WordNet/UMLS or any other medical thesaurus
or BERT top-k)
– <inline-formula><tex-math>S[p \to c]</tex-math></inline-formula> is the sentence <inline-formula><tex-math>S</tex-math></inline-formula> with phrase <inline-formula><tex-math>p</tex-math></inline-formula> replaced by candidate <inline-formula><tex-math>c</tex-math></inline-formula>
– <inline-formula><tex-math>\mathrm{PLL}(\cdot)</tex-math></inline-formula> is defined in Equation 1</p>
        <p>• Only the candidate synonyms that pass both the simplicity threshold and the contextual fit
test are retained. If none of the synonyms passes this threshold, the original node remains
unchanged to avoid loss of semantic meaning or fluency. Thus,</p>
        <disp-formula id="eq-3">
          <label>(3)</label>
          <tex-math><![CDATA[\mathrm{Accept}(c) = \begin{cases} \text{True}, & \text{if } \mathrm{PLL}(S[p \to c]) > \mathrm{PLL}(S) \cdot (1 + \delta) \\ \text{False}, & \text{otherwise} \end{cases}]]></tex-math>
        </disp-formula>
        <p>Where:
– <inline-formula><tex-math>\delta</tex-math></inline-formula> is the improvement threshold (set to 0.05)
– <inline-formula><tex-math>S</tex-math></inline-formula> is the original sentence
– <inline-formula><tex-math>S[p \to c]</tex-math></inline-formula> is the new sentence with candidate <inline-formula><tex-math>c</tex-math></inline-formula></p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Graph Update and Node Substitution</title>
        <p>• After a simplified synonym is selected for a concept node, the AMR graph is updated
with the new lexical substitution. Updating the graph ensures that the simplification is embedded
directly within the semantic structure of the graph before sentence reconstruction begins,
preserving the integrity and connectivity of the graph.
• Each concept node in a graph is represented by a surface string and a unique identifier
variable. The pipeline locates the corresponding concept node in the PENMAN representation of the
AMR graph and replaces it with the simplified synonym. Example: node (m / myocardial-infarction) is
simplified to (m / heart-attack).
• Multiword expressions used as replacements are made AMR-compatible using
hyphens and underscores so that the tokens remain a single concept in the graph. The updated
term is lemmatized and normalised to fit AMR conventions. Special characters in the replacements
are reformatted.
• Existing semantic edges attached to the node, such as :ARGx and :mod, are preserved so that
the node's structural role in the graph is unaltered and only the content
of the node changes. This maintains the semantic relations in the data and prevents errors
during sentence regeneration.</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Surface Realization</title>
        <p>• After the nodes of the AMR graph are simplified using node substitution, the final stage of this
step is surface realization, which converts the modified semantic graph back into a natural language
sentence. The reconstructed sentence now contains the lexically simplified phrases.
• Sentence reconstruction is performed using an AMR-to-text generation model based on a
sequence-to-sequence Transformer architecture. The model linearizes the AMR graph into a
string representation using the PENMAN notation and decodes it into well-formed text,
ensuring fluency through word ordering, grammatical correctness and the insertion of
grammatical function words such as determiners and prepositions.
• As only the node concepts were altered and the graph structure remained unchanged, the
regenerated sentence retains the full semantic fidelity of the original sentence that was used to build
the graph. Further, the simplified terms are smoothly integrated, with tense and agreement adjusted
as needed.
• Handling multiword expressions involves adjusting the surrounding structure in
the graph to ensure fluency. For example, for a replacement like “heart attack”, the model contextually
analyses the sentence and adjusts the surrounding text to “suffering from a heart attack”, depending
on the surrounding structure of the graph. The reconstructed sentence serves as the output of this
stage.</p>
      </sec>
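      <p>The candidate selection logic of Section 5.3 (keep a candidate only if the score of the rewritten sentence exceeds the score of the original by a relative margin of 0.05, then take the arg max over the accepted candidates) can be sketched as follows. The scoring function <monospace>pll</monospace> is supplied by the caller and stands in for the BERT-based pseudo-log-likelihood; the function name <monospace>best_candidate</monospace> and the toy score table in the demo are hypothetical. The inequality is applied exactly as written in the text.</p>
      <preformat><![CDATA[
```python
from typing import Callable, Optional


def best_candidate(
    sentence: str,
    phrase: str,
    candidates: list[str],
    pll: Callable[[str], float],
    delta: float = 0.05,
) -> Optional[str]:
    """Return the accepted candidate with the highest score, or None."""
    baseline = pll(sentence)
    accepted = []
    for cand in candidates:
        rewritten = sentence.replace(phrase, cand)
        score = pll(rewritten)
        # Acceptance rule as stated: PLL(S[p->c]) > PLL(S) * (1 + delta).
        if score > baseline * (1 + delta):
            accepted.append((score, cand))
    if not accepted:
        return None  # no candidate passes: the original node stays unchanged
    return max(accepted, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    original = "The patient had a myocardial infarction."
    # Toy scores standing in for a real masked language model (hypothetical).
    toy_scores = {
        original: 10.0,
        "The patient had a heart attack.": 12.0,
        "The patient had a cardiac infarct.": 10.2,
    }
    print(best_candidate(original, "myocardial infarction",
                         ["heart attack", "cardiac infarct"],
                         toy_scores.__getitem__))
```
]]></preformat>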
    </sec>
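    <p>The node substitution step of Section 5.4 can be illustrated on a PENMAN string with a regular expression that rewrites only the concept label of one variable, leaving the variable and its edges untouched. This is a sketch under assumptions: the helper names <monospace>amr_label</monospace> and <monospace>substitute_concept</monospace> are hypothetical, multiword replacements are joined with hyphens only, and lemmatization is omitted.</p>
    <preformat><![CDATA[
```python
import re


def amr_label(phrase: str) -> str:
    """Lower-case a replacement phrase and join its words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", phrase.lower()))


def substitute_concept(penman: str, variable: str, replacement: str) -> str:
    """Swap the concept of `variable`; variables and edges are left as-is."""
    pattern = re.compile(r"\(" + re.escape(variable) + r"\s*/\s*[^\s()]+")
    new_node = f"({variable} / {amr_label(replacement)}"
    # A function replacement avoids any backslash interpretation by re.sub.
    return pattern.sub(lambda _match: new_node, penman, count=1)


if __name__ == "__main__":
    graph = "(s / suffer-01 :ARG0 (p / patient) :ARG1 (m / myocardial-infarction))"
    print(substitute_concept(graph, "m", "heart attack"))
```
]]></preformat>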
    <sec id="sec-6">
      <title>6. Semantic Simplification</title>
      <p>In parallel with the lexico-syntactic processing, the S-3 Pipeline performs semantic-level simplification,
which transforms complex sentences into simpler versions by rephrasing and rewriting them
at a high level of abstraction. We use a Transformer-based encoder-decoder architecture, specifically a
fine-tuned variant of the PEGASUS model, for paraphrasing. In this stage, we work holistically on
the sentence meaning and capture the paraphrastic variation in the sentence to identify its core
meaning. Once the meaning (or intent) of the sentence is captured, the model draws on its learned knowledge
of natural language to find formulations that express the same intent
using more frequent vocabulary.</p>
      <p>Pegasus is integrated into the pipeline as a high-capacity neural simplifier whose
behaviour is guided through generation hyperparameters and post-hoc evaluation heuristics. It enables
contextually natural rewriting of biomedical text. To maintain content fidelity, we employ controlled
decoding strategies: beam search with a width of 5 to ensure a diverse yet focused exploration of
possible rewrites, a maximum length constraint to avoid verbosity and enforce lexical economy, and
early stopping to halt generation once a high-probability sequence is reached.</p>
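      <p>The three decoding controls (beam width, maximum length and early stopping) can be illustrated with a toy beam search over a hand-written next-token table. This is a generic sketch, not the Pegasus decoder: the <monospace>NextProbs</monospace> interface, the <monospace>EOS</monospace> marker, the toy model and the specific early-stopping criterion are assumptions chosen for illustration.</p>
      <preformat><![CDATA[
```python
import math
from typing import Callable

# Hypothetical interface: maps a token prefix to {next_token: probability}.
NextProbs = Callable[[tuple[str, ...]], dict[str, float]]


def beam_search(next_probs: NextProbs, beam_width: int = 5,
                max_length: int = 20, eos: str = "EOS") -> list[str]:
    """Return the most probable token sequence found under a length cap."""
    beams = [(0.0, ())]  # (cumulative log-probability, token tuple)
    finished = []
    for _ in range(max_length):
        expanded = [
            (logp + math.log(p), seq + (tok,))
            for logp, seq in beams
            for tok, p in next_probs(seq).items()
            if p > 0.0
        ]
        if not expanded:
            break
        expanded.sort(key=lambda beam: beam[0], reverse=True)
        beams = []
        for beam in expanded[:beam_width]:
            (finished if beam[1][-1] == eos else beams).append(beam)
        # Early stopping: log-probabilities only decrease, so no open beam
        # can overtake a finished one that already scores at least as high.
        if finished and (not beams or max(f[0] for f in finished) >= beams[0][0]):
            break
    best = max(finished or beams, key=lambda beam: beam[0])
    return [tok for tok in best[1] if tok != eos]


def toy_model(prefix: tuple[str, ...]) -> dict[str, float]:
    """Hand-written next-token distribution used only for the demo."""
    table = {
        (): {"heart": 0.6, "cardiac": 0.4},
        ("heart",): {"attack": 0.9, "EOS": 0.1},
        ("cardiac",): {"arrest": 0.5, "EOS": 0.5},
    }
    return table.get(prefix, {"EOS": 1.0})


if __name__ == "__main__":
    print(beam_search(toy_model, beam_width=5, max_length=10))
```
]]></preformat>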
      <p>While the AMR-guided lexical and structural simplification phases improve precision and
transparency, the semantic phase introduces the holistic fluency that is missing in rule-based substitution
systems. The deeper processing of core concepts is handled by the AMR-guided lexical approach,
while sentence structure and semantic simplification are handled by the Pegasus approach. This avoids
unnecessary rewordings and allows Pegasus to focus on maximizing readability. The final output of
the S-3 pipeline exploits the individual strengths of each branch: the semantic output is used for sentence
framing, while the lexico-graph output is used for phrase replacement.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Hybridization and Merging</title>
      <p>To capture both structural fluency and semantic precision, the S-3 pipeline adopts a
hybrid simplification strategy that merges the outputs of Pegasus-based semantic simplification and
AMR-based structural-lexical rewriting. The Pegasus and AMR-based models offer distinct advantages
that can be used to improve the quality of simplification:
• Pegasus is trained for paraphrasing and abstractive summarization and is adept at capturing
semantic intent, paraphrasing complex constructions, and producing fluent output, but it struggles
with over-generalization and occasionally omits critical domain-specific information. If it is
not properly tuned to the domain, it may also oversimplify the document.
• AMR-based simplification, on the other hand, simplifies through a structural and
rule-based approach, making it structurally consistent and lexically transparent. It excels at
preserving critical terms and decomposing them into simpler phrases, but its output may sound
less fluent due to sentence reconstruction limitations.</p>
      <p>The goal of merging is to combine the best of both approaches: Pegasus for semantic abstraction
and fluency, and AMR-lexical simplification for domain faithfulness and structural precision. In most
cases the lexical candidate provides simpler phrases but leaves the clause structure somewhat rigid,
whereas the semantic candidate can rephrase more freely.</p>
      <p>The fusion model is fed both the lexical and the semantic outputs and must know explicitly which input
each token came from. This is done by marking lexical and semantic tokens with &lt;L&gt; and &lt;S&gt; tags
respectively and building an input sequence of the form &lt;L&gt; lexical tokens &lt;L&gt; &lt;S&gt; semantic tokens &lt;S&gt;. The model’s
encoder builds contextual embeddings that intermix information from the lexical and semantic tokens.
During decoding, when the model generates token <italic>y</italic><sub><italic>t</italic></sub>, its attention heads can attend jointly to the simplifying
synonyms and the fluent restructuring. The merging therefore works in three steps.</p>
      <p>A multi-source sequence-to-sequence generator (e.g., T5) implements the following abstract process:
• Encoding: Each token from either stream is mapped to a vector representation. Because the tokens
appear together in the fusion prompt, the encoder can attend across them.
• Contextualization: The model learns which parts of &lt;L&gt; and &lt;S&gt; should be combined using
stacked self-attention layers. It takes simplified phrases from &lt;L&gt; and structure from &lt;S&gt;; the
model’s attention heads can link the two.
• Decoding: The decoding layer consults the combined encoded representation and decides which
token to emit (e.g., copied tokens, re-interpreted phrases or connectives). Beam
search is used to select the most probable next tokens.</p>
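      <p>The construction of the tagged fusion input can be sketched as below. The exact tag layout is an assumption: the sketch repeats each tag to open and close its stream, and the helper names <monospace>build_fusion_input</monospace> and <monospace>token_sources</monospace> are hypothetical. The second helper only shows that token provenance is recoverable from the tags; it is not part of the model.</p>
      <preformat><![CDATA[
```python
LEX_TAG = "<L>"
SEM_TAG = "<S>"


def build_fusion_input(lexical: str, semantic: str) -> str:
    """Wrap each stream in its source tag and concatenate the two streams."""
    return (f"{LEX_TAG} {lexical.strip()} {LEX_TAG} "
            f"{SEM_TAG} {semantic.strip()} {SEM_TAG}")


def token_sources(fusion_input: str) -> list[tuple[str, str]]:
    """Label every non-tag token with the stream it came from."""
    labelled = []
    current = None
    for tok in fusion_input.split():
        if tok in (LEX_TAG, SEM_TAG):
            # A tag opens its stream when first seen and closes it when repeated.
            current = None if current == tok else tok
        elif current is not None:
            source = "lexical" if current == LEX_TAG else "semantic"
            labelled.append((tok, source))
    return labelled


if __name__ == "__main__":
    prompt = build_fusion_input("heart attack", "He had a heart attack")
    print(prompt)
    print(token_sources(prompt))
```
]]></preformat>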
    </sec>
    <sec id="sec-8">
      <title>8. Evaluation and Selection</title>
      <p>Although the fusion step itself produces a merged candidate M, the system compares M against
the individual outputs of the semantic and lexical stages to cover cases where purely vocabulary-based
or meaning-based replacements might be preferable.</p>
      <sec id="sec-8-1">
        <title>8.1. Metrics Used:</title>
        <p>1. BLEU (Bilingual Evaluation Understudy)</p>
        <p>BLEU measures n-gram overlap between two sentences. It counts the unigrams, bigrams and higher-order
n-grams that occur in both the candidate and the reference. We use BLEU to compare the lexical
and semantic simplifications to estimate consistency and mutual agreement. A high BLEU score
indicates that the simplifications are consistent.</p>
        <p>2. SARI (System output Against References and Input)</p>
        <p>SARI evaluates how well the simplified sentence has added, deleted, or retained content relative
to the complex input sentence as well as reference simplifications. It measures the quality of
additions, keeps, and deletions.</p>
        <p>• When evaluating the lexical output, the Pegasus output is treated as the reference.
• For the semantic output, the AMR output is used as the reference.
• For the final merged output, both Pegasus and AMR outputs are treated as references along
with the original sentence.</p>
        <p>A high SARI score indicates a good balance between information retention and simplification.</p>
        <p>3. FKGL (Flesch–Kincaid Grade Level)</p>
        <p>FKGL is used to measure the readability of the simplified sentence. It indicates how
many years of schooling are required to understand the sentence. It is based on the average number of
words per sentence and the average number of syllables per word. A lower FKGL score indicates lower complexity.</p>
        <p>4. Length Ratio</p>
        <p>This is simply the ratio of the length of the simplified sentence to that of the original complex sentence.
Ideally, a ratio between 0.65 and 0.85 is desirable to maintain a balance between over-simplification
and verbosity.</p>
        <p>5. LM Fluency Score (Language Model Perplexity)</p>
        <p>This measures the fluency of a sentence according to a pretrained language model. It uses
pseudo-perplexity (based on the negative log-likelihood of a token given the previous tokens). A lower perplexity
score indicates a more fluent sentence. We evaluate the fluency of each token to avoid generating
ungrammatical sentences.</p>
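        <p>As a worked illustration of the two simplest metrics, the sketch below computes FKGL with the standard formula 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59 and the word-level length ratio. The syllable counter is a rough vowel-group heuristic and an assumption of this sketch, so absolute scores will differ from dictionary-based implementations; inputs are assumed to be non-empty.</p>
        <preformat><![CDATA[
```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level from words/sentence and syllables/word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)


def length_ratio(simplified: str, original: str) -> float:
    """Word count of the simplified text divided by that of the original."""
    return len(simplified.split()) / len(original.split())


if __name__ == "__main__":
    original = "The patient experienced an acute myocardial infarction."
    simplified = "The patient had a heart attack."
    print(fkgl(original), fkgl(simplified), length_ratio(simplified, original))
```
]]></preformat>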
      </sec>
      <sec id="sec-8-2">
        <title>8.2. Weighted Scoring</title>
        <p>We aggregate these metrics into a single metric called the “Universal Metric”. The candidate with
the lowest universal score is selected. Over several runs we found the hybrid output to be the most
likely (84%) to be selected. A simplification candidate with a high BLEU score, a low FKGL score, a Length
Ratio around 0.75 (to avoid both aggressive cutting and verbosity) and a high SARI score is judged to
be a good simplification.</p>
        <p>Score = 0.20 ⋅ (1 − BLEU) + 0.20 ⋅ (</p>
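        <p>The selection rule (lowest weighted score wins) can be sketched as follows. Only the first term, 0.20 × (1 − BLEU), is available from the formula above; the remaining weights, the penalty forms for SARI, FKGL and length ratio, and the names <monospace>universal_score</monospace> and <monospace>select_candidate</monospace> are placeholders assumed purely for illustration and should not be read as the values used in the pipeline. The LM fluency term is left out of this sketch.</p>
        <preformat><![CDATA[
```python
# Only the BLEU term, 0.20 * (1 - BLEU), is given in the text; every other
# weight and penalty form below is a placeholder chosen for illustration.
EXAMPLE_WEIGHTS = {"bleu": 0.20, "sari": 0.30, "fkgl": 0.30, "length": 0.20}


def universal_score(m: dict[str, float], w: dict[str, float]) -> float:
    """Weighted sum of penalties; a lower score means a better candidate."""
    penalties = {
        "bleu": 1.0 - m["bleu"],                  # BLEU assumed in [0, 1]
        "sari": 1.0 - m["sari"] / 100.0,          # SARI assumed in [0, 100]
        "fkgl": min(m["fkgl"] / 20.0, 1.0),       # grade level, capped at 20
        "length": abs(m["length_ratio"] - 0.75),  # distance from 0.75
    }
    return sum(w[name] * penalties[name] for name in penalties)


def select_candidate(candidates: dict[str, dict[str, float]],
                     weights: dict[str, float]) -> str:
    """Return the name of the candidate with the lowest universal score."""
    return min(candidates,
               key=lambda name: universal_score(candidates[name], weights))


if __name__ == "__main__":
    outputs = {
        "hybrid": {"bleu": 0.6, "sari": 18.0, "fkgl": 9.0, "length_ratio": 0.75},
        "lexical": {"bleu": 0.5, "sari": 15.0, "fkgl": 11.0, "length_ratio": 0.95},
    }
    print(select_candidate(outputs, EXAMPLE_WEIGHTS))
```
]]></preformat>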
      </sec>
    </sec>
    <sec id="sec-9">
      <title>9. Results</title>
      <sec id="sec-9-1">
        <title>9.1. Task 1.1: Sentence-Level Simplification</title>
        <p>We evaluated our S-3 pipeline on the provided test datasets and assessed its performance using the
metrics defined above. We also calculated the change in FKGL (ΔFKGL) to quantify the improvement
in readability achieved by our model, and we report ROUGE-L.</p>
        <p>As shown in Table 1, the system produces consistent and medically accurate simplifications. The
dataset has the highest ΔFKGL, indicating a high level of effective simplification. In biomedical
systems, a SARI of around 15 to 20 is considered desirable, as it preserves meaning during simplification while
maintaining terminological accuracy and avoiding the rephrasing of medical terms. The stable SLR scores indicate
structural and lexical integrity.</p>
      </sec>
      <sec id="sec-9-2">
        <title>9.2. Task 1.2: Paragraph-Level Simplification</title>
        <p>We further evaluated our S-3 model on paragraph-level input under Task 1.2. These results show that
our model maintains strong simplification quality while preserving essential biomedical content over</p>
      </sec>
      <sec id="sec-9-3">
        <title>9.3. Result Analysis</title>
        <p>• FKGL scores (Refer Figure 2) remained consistently around 10. This suggests that the output is
simple enough to be understood by non-domain experts like patients and general readers, while
still preserving the source content. An FKGL around 10 is considered optimal for the biomedical
domain, as excessive simplification could lead to a loss of precision.
• ΔFKGL scores, ranging from 2.5 to 3.1, indicate a substantial improvement in readability across
both subtasks. This increase—especially in the paragraph dataset—demonstrates that the S-3
pipeline is highly efective at simplifying longer and more complex sentences without
compromising their meaning.
• SARI scores of around 17 across both tasks are ideal for the biomedical domain. While modest
compared to general-domain simplification, they are appropriate here because aggressive editing
is discouraged. A SARI range of 16–20 reflects a balanced simplification that maintains semantic
integrity.
• SLR scores of approximately 0.75 show that although lexical simplification is applied, essential
terminology, abbreviations, and structural elements are retained. This minimizes the risk of
factual drift and preserves key biomedical meaning.
• Compression ratio (CR) values mostly fall within the ideal range of 0.65 to 0.85. This ensures
that important information is not lost and the simplified output is not overly verbose. The
moderate compression supports reduced complexity while maintaining content integrity. The
stability of CR across datasets and tasks indicates controlled and consistent simplification.
• MeaningBERT scores (Table 4) are especially high, around 0.8, for paragraph-level
simplification. This suggests the model excels at simplifying large or compound sentences. Combined
with a solid compression ratio, this confirms that the S-3 pipeline performs well in preserving
semantic clarity while reducing linguistic complexity.
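As a rough illustration of how the readability and length metrics discussed above can be computed, the following Python sketch implements the standard Flesch-Kincaid Grade Level formula, ΔFKGL as source FKGL minus output FKGL, and a word-level compression ratio. It is not the evaluation code used for the reported results: the function names, the vowel-group syllable heuristic, the regex tokenization, and the definition of CR as output words divided by source words are all assumptions made for this example.

```python
import re

# Assumed tokenization: alphabetic runs are words, runs of . ! ? end a sentence.
WORD = re.compile(r"[A-Za-z]+")
VOWEL_GROUP = re.compile(r"[aeiouy]+")
SENTENCE_END = re.compile(r"[.!?]+")


def count_syllables(word):
    """Heuristic syllable count: vowel groups, minus a silent trailing 'e'."""
    w = word.lower()
    groups = len(VOWEL_GROUP.findall(w))
    if w.endswith("e") and not w.endswith("le"):
        groups -= 1  # treat a trailing 'e' as silent
    return max(1, groups)


def fkgl(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    words = WORD.findall(text)
    if not words:
        return 0.0
    sentences = max(1, len(SENTENCE_END.findall(text)))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def delta_fkgl(source, simplified):
    """Grade-level reduction; positive values mean the output reads more easily."""
    return fkgl(source) - fkgl(simplified)


def compression_ratio(source, simplified):
    """Assumed definition: number of output words divided by number of source words."""
    return len(WORD.findall(simplified)) / max(1, len(WORD.findall(source)))
```

Under this word-count definition, compression_ratio("one two three four", "one two three") returns 0.75, which would fall inside the 0.65 to 0.85 band described above. Scores from this sketch will differ somewhat from library implementations because the syllable counter is only approximate.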
</p>
      </sec>
    </sec>
    <sec>
      <title>10. Conclusion</title>
      <p>In this paper, we presented the S-3 pipeline, a hybrid system for biomedical text simplification
that integrates semantic, sub-lexical and structural simplification through T5 models,
AMR parsing and pruning, and BERT-guided lexical substitution. We evaluated the results across
multiple metrics, including FKGL, SARI and MeaningBERT, which indicate that the pipeline achieves a strong
balance between information retrieval, simplification, conciseness and readability. The S-3 pipeline is a
promising tool for improving the accessibility and comprehension of biomedical literature.
Future work may explore tighter integration of the structural and lexical modules and
extend the system to multilingual biomedical texts.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools for the research or while writing this paper.
1 is partially generated using Generative-AI tools.</p>
    </sec>
  </body>
</article>