<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>NITK SCaLAR Lab at the CLEF 2025 SimpleText Track: Transformer-Based Models for Biomedical Sentence Simplification (Task 1.1)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arya Adwait Dongre</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ankita Vaadiraaju</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anand Kumar Madasamy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Technology, National Institute of Technology Karnataka Surathkal</institution>
          ,
          <addr-line>Mangalore 575025</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents the participation of the SCaLAR Lab from the National Institute of Technology Karnataka Surathkal (India) in the CLEF 2025 SimpleText Lab. Biomedical texts are often difficult to understand due to complex vocabulary and sentence structures, which limit access to crucial scientific information for non-expert audiences. To make biomedical literature more accessible, we propose two transformer-based simplification pipelines: one combining BioBERT and BioBART with prompts providing definitions, and another using a fine-tuned GPT-2 Medium model for direct simplification. Our dual approach demonstrates effective reduction of lexical and syntactic complexity while preserving medical accuracy, supporting clearer communication and laying the foundation for future work in multilingual and hybrid simplification systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Text Simplification</kwd>
        <kwd>Transformers</kwd>
        <kwd>Biomedical</kwd>
        <kwd>BioBERT</kwd>
        <kwd>BioBART</kwd>
        <kwd>GPT-2</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The CLEF 2025 SimpleText Track aims to make scientific information more accessible by developing
automatic text simplification systems that keep facts correct while reducing linguistic complexity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Scientific literature, especially in the biomedical field, often uses dense vocabulary, complicated sentence
structures, and technical jargon that pose major challenges for non-experts like patients, caregivers, and
healthcare professionals outside specialized areas [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These language barriers hinder the fair spread of
important medical knowledge, increasing the risk of misunderstandings or misinformation that can
ultimately affect healthcare outcomes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        To address these issues, the CLEF 2025 SimpleText Track invites research on natural language
processing (NLP) systems that can transform complex biomedical texts into simpler versions while
preserving crucial information such as dosages, biomarkers, treatment protocols, and statistical results
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Automatic text simplification not only improves readability but also supports informed
decision-making and promotes inclusivity by bridging the gap between experts and the wider public [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The SCaLAR Lab from the National Institute of Technology Karnataka Surathkal (India) participated
in CLEF 2025 SimpleText Task 1.1. Our team focused on leveraging large language models (LLMs) for
the task, mainly exploring three models: BioBERT, BioBART, and GPT-2.</p>
      <p>At the heart of recent advances in text simplification are transformer-based models, especially BERT
(Bidirectional Encoder Representations from Transformers) [7], which excels at capturing contextual
relationships in sentences using self-attention. However, general-purpose models like BERT often
struggle with domain-specific biomedical terminology. This gap is filled by BioBERT, a version of
BERT pre-trained on biomedical texts like PubMed abstracts and clinical notes [8], giving it an edge in
understanding specialized terms while maintaining their meaning.</p>
      <p>For generating simplified text, encoder-decoder architectures like BART (Bidirectional and
AutoRegressive Transformers) [9] have become popular in text-to-text tasks. BioBART, which extends
BART with biomedical pretraining, can rewrite complex sentences by aligning original tokens with
simplified ones through cross-attention, producing clear and accurate simplified texts [10]. Additionally,
generative models like GPT-2 [11], even though pre-trained on general data, provide strong baselines
for direct simplification when fine-tuned on pairs of complex and simplified biomedical sentences.</p>
      <p>In this work, we present two complementary approaches to biomedical text simplification for CLEF
2025 SimpleText Task 1.1. The first combines BioBERT and BioBART with entity recognition from
SciSpacy in a multi-step pipeline: first identifying sentences that need simplification, then detecting
and defining jargon, and finally using BioBART to generate simplified outputs. The second approach
fine-tunes a GPT-2 Medium model on complex-to-simple sentence pairs to enable direct, end-to-end
simplification without intermediate steps. Both methods aim to simplify text while preserving key
medical concepts, avoiding oversimplification that could compromise accuracy or omit essential details,
such as information about control groups in clinical trials [12].</p>
      <p>
        We use the dataset provided in the cited paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for training, validation, and testing. It comprises
11,510 training, 1,512 test, and 1,697 validation samples, each containing complex-simple sentence
pairs annotated with labels such as rephrase, split, merge, ignore, and delete. Columns include complex
(original sentence), simple (simplified sentence), label (transformation type), para_id, sent_id, doc_pos,
doc_quint, and doc_len, capturing sentence positions and document context. Derived from Cochrane
systematic reviews, the dataset covers diverse biomedical topics, providing detailed metadata that
enables models to learn context-sensitive simplification strategies to transform technical medical text
into lay-friendly language. The rest of this paper is organized as follows: we first review relevant
literature on biomedical text simplification and transformer-based NLP models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][8], then describe our
methodology, including data preprocessing, model fine-tuning, and pipeline design. Finally, we present
experimental results, analyze and infer, and conclude with insights and future directions for improving
biomedical text simplification.
      </p>
      <p>Our experimental results show that the fine-tuned GPT-2 Medium model produced more readable
and semantically accurate sentence-level simplifications compared to the BioBERT + BioBART pipeline.
The GPT-2 approach effectively preserved essential biomedical concepts while simplifying vocabulary
and sentence structure for easier understanding by non-experts, highlighting the promise of direct
generative models fine-tuned on biomedical text pairs for accessible and reliable sentence simplification.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>The CLEF 2025 SimpleText Track tackles the important challenge of making scientific texts easier to
understand while keeping them accurate. This section reviews key methods and recent progress in
natural language processing (NLP) for simplifying biomedical texts, with a focus on domain-specific
models, architectures, and evaluation techniques that help ensure clear and precise communication.
Text simplification is a vital NLP task that turns complex, jargon-filled writing into more accessible
language, helping a wider range of people—including patients, caregivers, non-native speakers, and
those without technical backgrounds—better understand important medical and scientific information.</p>
      <p>The development of text simplification methods progressed from rule-based to more sophisticated,
data-driven methods, the latter driven by the emergence of machine and deep learning. Conventional
methods depended on pre-defined linguistic rules for text simplification, but these methods struggled to
scale and generalize over a wide range of text types, languages and domains. In contrast, contemporary
deep learning methods, especially using neural networks and large-scale pre-trained models, have
shown great success by learning intrinsic simplification patterns from large volumes of data. The latest
developments in NLP, specifically with transformer-based models like BERT, GPT, and T5, have created
new opportunities for text simplification. These models allow for more context-dependent, flexible
simplifications beyond basic lexical or syntactic modifications. They are capable of more advanced
transformations and maintaining meaning, while keeping content readable at varying levels [13].</p>
      <p>This literature review traces the evolution of deep learning approaches in text simplification, outlining
key techniques, challenges, and evaluation strategies. It emphasizes the shift from rule-based systems
to neural methods, highlighting recent advances such as large-scale language models, the use of parallel
corpora, and the difficulty of preserving semantic meaning. By reviewing current research, it identifies
emerging trends and unresolved issues in the development of simplification systems [14].</p>
      <p>Transformer-based models like T5 and BioBART are especially effective for simplification. T5
treats it as a sequence-to-sequence task, with the encoder handling complex input and the decoder
generating simpler text. BioBART, pretrained on biomedical texts, excels at managing technical content
and preserving essential details like dosages and biomarkers through cross-attention mechanisms.
Additionally, GPT-2 Medium—a decoder-only model—shows strong fluency when fine-tuned, effectively
rephrasing and simplifying text despite lacking explicit source-target alignment. These models illustrate
how deep learning facilitates simplification that balances readability, coherence, and factual accuracy
[15][16].</p>
      <p>The quality of text simplification is typically evaluated using a combination of automated metrics.
SARI, a widely adopted metric in simplification research, assesses the quality of additions, deletions,
and retentions by comparing system output with human references. It is particularly effective in
evaluating whether unnecessary complexity has been reduced without losing essential content. In
parallel, readability metrics such as the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease
(FRE) are employed to quantify how accessible the output is to a general audience. FKGL estimates the
educational level required to comprehend the text, while FRE provides a score indicating overall ease of
reading.</p>
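      <p>For illustration, these metrics can be computed with off-the-shelf tooling. The snippet below is a
minimal sketch using the textstat and evaluate packages (our choice here for illustration, not part of the
official evaluation setup), with invented example sentences.</p>
      <preformat>
# Minimal sketch: computing FKGL, FRE, and SARI for one system output.
# Assumes the "textstat" and "evaluate" packages are installed; sentences are illustrative.
import textstat
import evaluate

source = "Post-myocardial infarction, the patient underwent angioplasty."
prediction = "After a heart attack, the patient had a procedure to open blocked heart vessels."
references = [["After a heart attack, the patient had a procedure to reopen the blocked artery."]]

fkgl = textstat.flesch_kincaid_grade(prediction)   # estimated U.S. grade level
fre = textstat.flesch_reading_ease(prediction)     # higher means easier to read

sari = evaluate.load("sari")
sari_score = sari.compute(sources=[source], predictions=[prediction], references=references)

print(f"FKGL: {fkgl:.1f}, FRE: {fre:.1f}, SARI: {sari_score['sari']:.1f}")
      </preformat>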
      <p>Biomedical text simplification introduces specific challenges due to the complexity and density of
scientific language. Long-range dependencies, context-sensitive terminology, and the need to preserve
domain-critical information—such as drug mechanisms or statistical qualifiers—demand more than
surface-level simplification. Transformer-based models, while powerful, are limited by token constraints
that may truncate important content. To address these issues, hierarchical encoding strategies and
domain-adaptive pretraining are increasingly used. Additionally, hybrid architectures that integrate
rule-based components with neural models have shown promise in preserving meaning while enhancing
readability [17].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Biomedical simplification is an important NLP task that aims to make complicated scientific information
easier to understand for patients, caregivers, and medical professionals. Biomedical texts often include
lengthy sentences and specialized terms that can be hard to grasp without expert knowledge. To tackle
this problem, we developed two separate methods using transformer models. The first method uses
BioBERT to find complex sentences. It combines this with definitions from SciSpacy and then uses
BioBART to simplify the text while keeping its meaning intact. The second method fine-tunes GPT-2
Medium directly on a biomedical simplification dataset to create clearer sentence-level translations
from prompts. These two methods use both specific and general models to offer flexible and effective
solutions for simplifying biomedical text. The following sections explain the model designs, training
methods, and system integration in detail.</p>
      <sec id="sec-3-1">
        <title>3.1. Classification with BioBERT</title>
        <sec id="sec-3-1-1">
          <title>3.1.1. BioBERT Architecture</title>
          <p>BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) is
a domain-specific adaptation of BERT, a transformer-based model developed by Devlin et al. (2018).
BERT’s architecture is built on the transformer encoder, which uses self-attention mechanisms to
capture bidirectional contextual relationships within text. BioBERT extends this by pre-training on
large-scale biomedical corpora, including PubMed abstracts (approximately 4.5 billion words) and PMC
full-text articles (approximately 13.5 billion words), in addition to the general-domain corpora used for
BERT (e.g., Wikipedia and BooksCorpus). This pre-training equips BioBERT with a deep understanding
of biomedical terminology, syntactic patterns, and semantic nuances, making it ideal for tasks in the
biomedical domain.</p>
          <p>The model architecture consists of several key components. The input embedding layer converts
input tokens into dense vectors by combining token embeddings, positional embeddings (to capture
word order), and segment embeddings (which distinguish different sentences in tasks like question
answering). Since our task uses only single-sentence inputs, segment embeddings are uniform. The
transformer encoder layers in BioBERT (base model) include 12 layers of encoders. Within each
layer, the multi-head self-attention mechanism computes attention scores across all tokens in the
input sequence, allowing the model to weigh the importance of each token relative to others. This
enables capturing long-range dependencies and contextual relationships critical for understanding
complex biomedical sentences. Following the attention mechanism, feed-forward neural networks
apply a position-wise fully connected feed-forward network to each token’s representation, introducing
non-linearity and enhancing feature extraction. Layer normalization and residual connections
are used to stabilize training by normalizing layer outputs and adding skip connections that preserve
information flow across layers. Finally, the output layer for classification tasks takes the final hidden
state of the special [CLS] token (added to the input sequence) and passes it through a fully connected
layer with a softmax activation to produce a probability distribution over the output classes.</p>
          <p>In our pipeline, BioBERT is fine-tuned for binary classification to label complex sentences as either
rephrase or delete. Sentences labeled rephrase contain valuable content requiring simplification, while
delete indicates redundancy or irrelevance. Additional labels—merge, split, none, and ignore—are treated
as rephrase, as they imply restructuring rather than removal.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Classification Process</title>
          <p>The simplification process starts with tokenization, where the complex sentence is broken down using
BioBERT’s WordPiece tokenizer. This tokenizer splits words into smaller subword units—for example,
“cardiovascular” might become “cardio” and “##vascular.” This approach helps ensure compatibility
with BioBERT’s vocabulary and allows the model to handle words it hasn’t seen before. Special tokens,
[CLS] at the beginning (used for classification) and [SEP] at the end (marking sentence boundaries), are
added, resulting in a sequence like: [CLS] token1 token2 ... tokenN [SEP].</p>
          <p>Next comes embedding generation, where the tokenized sequence is transformed into input
embeddings by combining token, positional, and segment embeddings. These embeddings are then fed
into BioBERT’s transformer layers. In the contextual representation step, the transformer processes
the sequence to generate rich, contextualized representations for each token. The final hidden state of
the [CLS] token captures the overall meaning of the entire sentence.</p>
          <p>For the classification stage, this [CLS] representation is passed through a fully connected layer
with two outputs, one for each possible action: “rephrase” or “delete.” A softmax activation produces
probabilities for these classes, and the one with the highest probability becomes the prediction.</p>
          <p>Finally, in the output decision step, if the model predicts “delete,” the system outputs an empty
string, ending processing for that sentence. If it predicts “rephrase,” the sentence moves on to the next
stage for further simplification.</p>
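          <p>A minimal sketch of this classification step is shown below, assuming the publicly available
dmis-lab/biobert-base-cased-v1.1 checkpoint with a two-label head; the label order is illustrative.</p>
          <preformat>
# Minimal sketch: labelling a sentence as "rephrase" or "delete" with BioBERT.
# The checkpoint name and the label order are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

labels = ["rephrase", "delete"]
sentence = "Post-myocardial infarction, the patient underwent angioplasty."

inputs = tokenizer(sentence, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, 2)
probabilities = torch.softmax(logits, dim=-1)  # softmax over the two actions
print(labels[int(probabilities.argmax(dim=-1))])
          </preformat>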
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Fine-Tuning BioBERT</title>
          <p>Fine-tuning BioBERT on our labeled dataset involves adapting the pre-trained model to classify
biomedical sentences as either “rephrase” or “delete.” The process begins with dataset preparation, where the
training set contains pairs of complex sentences and their corresponding labels, while the validation
set is used to monitor performance and adjust hyperparameters. The loss function employed is
cross-entropy loss, which measures the discrepancy between predicted and true labels and guides the
optimization process. For optimization, the AdamW optimizer is used with a learning rate typically set
between 2e-5 and 5e-5; a linear learning rate scheduler with warmup is applied to help stabilize training.
To reduce the risk of overfitting, regularization techniques such as dropout (with a probability of 0.1)
within the transformer layers and weight decay are implemented. The model is trained for 3–5 epochs,
using early stopping based on validation performance to further prevent overfitting. Finally, a batch
size of 16 or 32 is chosen to strike a balance between computational efficiency and stable gradient
updates.</p>
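          <p>The following is a minimal sketch of this fine-tuning setup using the Hugging Face Trainer;
checkpoint, file, and column names are placeholders rather than our exact training script.</p>
          <preformat>
# Minimal sketch: fine-tuning BioBERT for the rephrase/delete decision.
# File names, column names, and the label encoding are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

data = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})

def preprocess(batch):
    enc = tokenizer(batch["complex"], truncation=True, max_length=128)
    enc["labels"] = batch["label"]              # 0 = rephrase, 1 = delete (illustrative)
    return enc

data = data.map(preprocess, batched=True)

args = TrainingArguments(
    output_dir="biobert-rephrase-delete",
    learning_rate=2e-5,                         # within the 2e-5 to 5e-5 range above
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    warmup_ratio=0.1,                           # linear schedule with warmup
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,                # simple stand-in for early stopping
)
Trainer(model=model, args=args, train_dataset=data["train"],
        eval_dataset=data["validation"], tokenizer=tokenizer).train()
          </preformat>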
          <p>Fine-tuning enables BioBERT to learn task-specific patterns, such as identifying sentences with
redundant technical details (e.g., methodological specifics irrelevant to a lay audience) versus those
containing critical information (e.g., treatment outcomes) that should be rephrased.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Rephrasing with BioBART</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. BioBART Architecture</title>
          <p>BioBART is a biomedical adaptation of BART, a sequence-to-sequence transformer model developed by
Lewis et al. (2019). BART combines BERT’s bidirectional encoding with GPT’s autoregressive decoding,
making it ideal for text generation tasks like rephrasing. BioBART is pre-trained on biomedical corpora,
ensuring familiarity with domain-specific language.</p>
          <p>The model architecture consists of an encoder and a decoder. The encoder is a bidirectional
transformer, similar to BERT, that processes the input sequence—the complex sentence—to generate
contextualized representations. It uses 6 layers in the base model, with each layer containing multi-head
self-attention and feed-forward networks. The decoder is an autoregressive transformer that generates
the output sequence—the rephrased sentence—token by token. Like the encoder, the decoder has 6
layers and incorporates cross-attention mechanisms to attend to the encoder’s outputs, ensuring that the
generated text remains conditioned on the input. Embedding layers with shared token embeddings are
used by both the encoder and decoder, complemented by positional embeddings that capture word order;
the vocabulary is specifically tailored to biomedical text. Finally, in the output layer, the decoder’s
final hidden states are passed through a linear layer followed by a softmax activation to predict the
probability distribution over the vocabulary for each token, enabling text generation.</p>
          <p>BioBART’s pretraining involves a denoising objective, reconstructing corrupted text (e.g., with
masked tokens or shuffled sentences), which equips it with robust language modeling capabilities. For
our task, BioBART is fine-tuned to map complex biomedical sentences to their rephrased counterparts.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Rephrasing Process</title>
          <p>The rephrasing process begins with input tokenization, where the complex sentence identified by
BioBERT as needing simplification is broken down into tokens using BioBART’s tokenizer. The resulting
sequence is padded or truncated to a consistent length, such as 128 tokens, to ensure uniform input
size. In the encoding stage, this tokenized sequence is passed through BioBART’s encoder, which
produces contextualized representations that capture both the meaning and structure of the original
sentence. Next, during decoding, the rephrased sentence is generated one token at a time, starting with
a start-of-sequence token. At each step, the decoder uses cross-attention to incorporate information
from the encoder’s output—so the generated text stays faithful to the original—and self-attention to
maintain coherence by referencing the tokens it has already produced. The next token is chosen by
sampling from the probability distribution over the vocabulary, with techniques like beam search used
to improve the quality of the rephrased sentence. Finally, in the output generation step, the sequence
of generated tokens is detokenized to produce the final rephrased sentence.</p>
          <p>This output aims to express the original idea in simpler vocabulary and clearer sentence structures,
while preserving the intended meaning.</p>
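          <p>A minimal sketch of this generation step is shown below; the GanjinZero/biobart-base checkpoint
name is an assumption, and in practice the fine-tuned model and the prompt from Section 3.2.4 are used.</p>
          <preformat>
# Minimal sketch: rephrasing one sentence with BioBART using beam search.
# The checkpoint name is an assumption; a fine-tuned model would be loaded instead.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "GanjinZero/biobart-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

complex_sentence = ("Post-myocardial infarction, the patient underwent "
                    "angioplasty to restore coronary blood flow.")

inputs = tokenizer(complex_sentence, truncation=True, max_length=128,
                   return_tensors="pt")
output_ids = model.generate(**inputs, num_beams=5, max_length=128,
                            early_stopping=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
          </preformat>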
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Fine-Tuning BioBART</title>
          <p>Fine-tuning BioBART involves training the model on pairs of complex and simplified sentences from
the dataset, where the label for each pair is “rephrase.”</p>
          <p>The process begins with dataset preparation, in which the training set provides complex-simple
sentence pairs, using the complex sentence as input and the simple sentence as the target, while the
validation set supports hyperparameter tuning. The loss function used is negative log-likelihood,
which measures the difference between the generated and target sequences. For optimization, the
AdamW optimizer is applied with a learning rate typically between 3e-5 and 1e-4, along with a
linear learning rate scheduler to adjust learning dynamics during training. To improve generalization,
regularization techniques such as dropout (with a rate of 0.1) and label smoothing are included. During
training, teacher forcing is employed by feeding the ground-truth tokens directly to the decoder
to stabilize learning, and during inference, beam search with a beam size of 5 is used to enhance
the quality of generated sentences. The model is trained for 4–6 epochs, using early stopping based
on validation loss to avoid overfitting. Finally, a batch size of 8 or 16 is selected to balance memory
constraints with training stability.</p>
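          <p>A minimal sketch of this sequence-to-sequence fine-tuning configuration is given below, using
the Hugging Face Seq2SeqTrainer; checkpoint, file, and column names are placeholders.</p>
          <preformat>
# Minimal sketch: fine-tuning BioBART on complex-to-simple sentence pairs.
# Checkpoint, file, and column names are placeholders for illustration.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "GanjinZero/biobart-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

data = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})

def preprocess(batch):
    enc = tokenizer(batch["complex"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(text_target=batch["simple"],
                              truncation=True, max_length=128)["input_ids"]
    return enc

data = data.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="biobart-rephrase",
    learning_rate=3e-5,                         # within the 3e-5 to 1e-4 range above
    per_device_train_batch_size=8,
    num_train_epochs=5,
    label_smoothing_factor=0.1,
    predict_with_generate=True,
    generation_num_beams=5,                     # beam search at evaluation time
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,                # simple stand-in for early stopping
)
Seq2SeqTrainer(model=model, args=args,
               train_dataset=data["train"], eval_dataset=data["validation"],
               data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
               tokenizer=tokenizer).train()
          </preformat>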
          <p>Fine-tuning enables BioBART to learn rephrasing strategies, such as replacing technical terms (e.g.,
“myocardial infarction” with “heart attack”), splitting long sentences, and removing unnecessary details.</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. Prompt Engineering for BioBART</title>
          <p>To enhance BioBART’s rephrasing performance, we employ a prompt engineering strategy that
provides structured, context-rich input to guide the model. The prompt is designed to simulate the
role of a medical assistant, ensuring the rephrased output is clear, accurate, and tailored to non-expert
audiences. To investigate how prompt design affects biomedical text simplification, we tested three
different prompting strategies with BioBART: (1) a role-only prompt, where the model was told to act
only as a medical assistant (e.g., “You are a medical assistant. Simplify: sentence”); (2) a role with terms
prompt, which included a list of relevant medical terms along with the role instruction; and (3) a role
with terms and definitions prompt, which also added brief definitions of the key medical terms. The
results showed that using both terms and definitions (strategy 3) produced the best performance, with
improvements of about 8–12% in BLEU and SARI scores compared to the simpler prompt versions.
Human readability assessments confirmed these quantitative improvements. The prompt structure is as
follows:
You are a medical assistant. Your task is to simplify a complex medical
sentence to make it understandable for non-experts.</p>
          <p>Complex sentence: {complex sentence}
Important terms with definitions:
NER1: Definition
NER2: Definition
...</p>
          <p>Simplify the complex sentence using the given information.</p>
          <p>The prompt used to guide BioBART is carefully structured to ensure the generated simplifications
are clear and patient-friendly. It starts with a role specification, using the instruction “You are a
medical assistant” to set the right context and encourage a professional yet empathetic tone. Next, the
task description clearly defines the goal with the directive “simplify a complex medical sentence to
make it understandable for non-experts,” emphasizing the importance of accessibility. The complex
sentence section simply replaces the placeholder {complex sentence} with the actual sentence identified
by BioBERT for rephrasing. In the important terms with definitions part, key named entities—such
as medical jargon or anatomical terms—are listed along with easy-to-understand definitions, helping
BioBART substitute technical language with simpler alternatives. Finally, the simplification
instruction reiterates the task, reminding BioBART to use the provided definitions to produce a clear, rephrased
version of the sentence that preserves the original meaning while making it easier for non-experts to
understand.</p>
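          <p>Assembling this prompt is plain string templating; a minimal sketch with an illustrative helper
function and example terms follows, and the entity extraction that fills the term list is sketched in
Section 3.2.5.</p>
          <preformat>
# Minimal sketch: building the BioBART prompt from a sentence and a dict of
# extracted terms with lay definitions. build_prompt is an illustrative helper.
def build_prompt(complex_sentence, term_definitions):
    lines = [
        "You are a medical assistant. Your task is to simplify a complex medical",
        "sentence to make it understandable for non-experts.",
        "",
        f"Complex sentence: {complex_sentence}",
        "Important terms with definitions:",
    ]
    for term, definition in term_definitions.items():
        lines.append(f"{term}: {definition}")
    lines.append("")
    lines.append("Simplify the complex sentence using the given information.")
    return "\n".join(lines)

prompt = build_prompt(
    "Post-myocardial infarction, the patient underwent angioplasty to restore coronary blood flow.",
    {"Myocardial infarction": "A heart attack, where blood flow to the heart is blocked.",
     "Angioplasty": "A procedure to open blocked blood vessels in the heart."},
)
print(prompt)
          </preformat>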
        </sec>
        <sec id="sec-3-2-5">
          <title>3.2.5. Named Entity Recognition with SciSpacy</title>
          <p>To identify important terms and their definitions, we use SciSpacy, a Python library with pre-trained
biomedical NER and entity linking models. First, SciSpacy’s NER extracts key entities like diseases or
treatments from the complex sentence (e.g., “myocardial infarction,” “angioplasty”). Then, in entity
linking, these entities are connected to UMLS to retrieve standardized definitions, or to simpler lay
definitions from a custom dictionary when needed—for example, “myocardial infarction” becomes “a
heart attack, where blood flow to the heart is blocked,” and “angioplasty” becomes “a procedure to open
blocked heart vessels.” Finally, the extracted entities and their definitions are inserted into the prompt
template, giving BioBART the context it needs for generating clear, simplified rephrasings.</p>
          <p>Example Prompt:
You are a medical assistant. Your task is to simplify a complex medical
sentence to make it understandable for non-experts.</p>
          <p>Complex sentence: Post-myocardial infarction, the patient underwent
angioplasty to restore coronary blood flow.</p>
          <p>Important terms with definitions:
Myocardial infarction: A heart attack, where blood flow to the heart is
blocked, causing heart muscle damage.</p>
          <p>Angioplasty: A procedure to open blocked blood vessels in the heart using a
balloon or stent.</p>
          <p>Simplify the complex sentence using the given information.</p>
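          <p>A minimal sketch of the entity extraction and UMLS lookup is given below, assuming the
en_core_sci_sm SciSpacy model and its UMLS linker are installed; the model choice is illustrative.</p>
          <preformat>
# Minimal sketch: extracting biomedical entities and UMLS definitions with SciSpacy.
# Assumes the "en_core_sci_sm" model and the scispacy UMLS linker are available.
import spacy
from scispacy.linking import EntityLinker  # registers the "scispacy_linker" pipe

nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("scispacy_linker",
             config={"resolve_abbreviations": True, "linker_name": "umls"})
linker = nlp.get_pipe("scispacy_linker")

sentence = ("Post-myocardial infarction, the patient underwent angioplasty "
            "to restore coronary blood flow.")
doc = nlp(sentence)

term_definitions = {}
for ent in doc.ents:
    if ent._.kb_ents:                            # candidate (CUI, score) pairs
        cui, _score = ent._.kb_ents[0]           # best-matching UMLS concept
        entry = linker.kb.cui_to_entity[cui]
        if entry.definition:
            term_definitions[ent.text] = entry.definition

print(term_definitions)                          # feeds the prompt in Section 3.2.4
          </preformat>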
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Integration of BioBERT and BioBART</title>
        <p>The two stages of the pipeline work together in a straightforward and effective way. First, during input
processing, each complex biomedical sentence is analyzed by BioBERT, which decides whether the
sentence should be rephrased or simply removed. At the decision point, if BioBERT predicts “delete,”
the system outputs an empty string right away, skipping any further steps. If it predicts “rephrase,”
the sentence moves on to BioBART for the rephrasing stage, where it’s rewritten in simpler, clearer
language. Finally, in the output delivery step, the pipeline returns either the rephrased sentence or an
empty string, depending on the initial prediction. This design makes the most of BioBERT’s accuracy
in identifying unnecessary content and BioBART’s strength in generating high-quality simplifications,
while keeping the pipeline flexible so each model can be updated on its own.</p>
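        <p>A minimal sketch of how the two stages compose is shown below; classify_sentence and
rephrase_sentence stand for the steps sketched in Sections 3.1 and 3.2 and are placeholders, not
library functions.</p>
        <preformat>
# Minimal sketch: wiring BioBERT's decision to BioBART's rephrasing stage.
# classify_sentence() and rephrase_sentence() are caller-supplied placeholders.
def simplify(sentence, classify_sentence, rephrase_sentence):
    decision = classify_sentence(sentence)       # "rephrase" or "delete"
    if decision == "delete":
        return ""                                # drop redundant or irrelevant content
    return rephrase_sentence(sentence)           # BioBART produces the simpler version

def simplify_document(sentences, classify_sentence, rephrase_sentence):
    outputs = [simplify(s, classify_sentence, rephrase_sentence) for s in sentences]
    return [o for o in outputs if o]             # keep only non-empty simplifications
        </preformat>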
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Simplification with GPT-2 Medium</title>
        <sec id="sec-3-4-1">
          <title>3.4.1. GPT-2 Medium Architecture</title>
          <p>GPT-2 Medium is a generative transformer-based language model developed by OpenAI, containing
24 transformer layers and 345 million parameters. Unlike encoder-decoder models like BART, GPT-2
Medium operates solely as a decoder in an autoregressive setup. It is pre-trained on a diverse corpus of
web text to generate coherent and fluent natural language, making it highly effective for text generation
tasks such as simplification when appropriately fine-tuned. Key components of GPT-2 Medium include
several architectural elements working together for effective text generation. First, self-attention
layers are used within each transformer block, employing masked multi-head self-attention so that the
model only attends to previous tokens during generation, which preserves causality. Since transformers
do not inherently capture word order, positional embeddings are learned and added to the input
tokens to encode their positions in the sequence. After each attention layer, feed-forward networks
process the token representations through position-wise fully connected layers, adding non-linearity
and improving the model’s expressive power. Finally, the output head consists of a linear layer followed
by a softmax activation, which produces a probability distribution over the entire vocabulary at each
time step, enabling the model to generate text one token at a time.</p>
          <p>GPT-2 Medium’s autoregressive nature makes it ideal for producing natural, fluent simplifications
directly from a prompt, without needing a separate encoder.</p>
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Simplification Process</title>
          <p>GPT-2 Medium simplifies biomedical text through a prompt-based, generative approach. The process
begins with prompt construction, where each biomedical sentence is preceded by a handcrafted
prompt such as “Simplify the biomedical sentence:\n{text}\nSimplified:” to clearly
cue the model toward the simplification task. During tokenization, the combined prompt and sentence
are tokenized using GPT-2’s tokenizer, with padding or truncation applied to fit a fixed maximum
length, typically around 300 tokens. In the text generation stage, GPT-2 generates simplified text
using controlled sampling strategies: top-k sampling set to 50, top-p (nucleus sampling) at 0.95 for
diversity, and a temperature of 0.7 to balance creativity with coherence, while limiting the generation
to a maximum of 100 new tokens for conciseness. Finally, during output extraction, the generated
sequence is decoded and the section following “Simplified:” is parsed to produce the final simplified
sentence.</p>
          <p>This approach enables sentence restructuring, jargon reduction, and verbosity control, making
complex biomedical content more accessible.</p>
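          <p>A minimal sketch of this prompt-based generation is shown below; the model directory stands for
the fine-tuned GPT-2 Medium checkpoint described in Section 3.4.3 and is a placeholder.</p>
          <preformat>
# Minimal sketch: prompt-based simplification with a fine-tuned GPT-2 Medium.
# The model directory is a placeholder for the checkpoint produced in Section 3.4.3.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "gpt2-medium-simplification"         # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

text = ("Post-myocardial infarction, the patient underwent angioplasty "
        "to restore coronary blood flow.")
prompt = f"Simplify the biomedical sentence:\n{text}\nSimplified:"

inputs = tokenizer(prompt, truncation=True, max_length=300, return_tensors="pt")
output_ids = model.generate(**inputs, do_sample=True, top_k=50, top_p=0.95,
                            temperature=0.7, max_new_tokens=100,
                            pad_token_id=tokenizer.eos_token_id)
generated = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated.split("Simplified:", 1)[-1].strip())
          </preformat>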
        </sec>
        <sec id="sec-3-4-3">
          <title>3.4.3. Fine-Tuning GPT-2 Medium</title>
          <p>To tailor the GPT-2 Medium model for biomedical text simplification, we fine-tuned it using a
domain-specific dataset composed of sentence-level examples. The aim was to help the model better understand
and generate simplified biomedical content while preserving key information.</p>
          <p>The training process was carefully designed to help GPT-2 learn to simplify complex biomedical text
effectively. Each input example began with the prompt “Simplify the biomedical sentence:” followed
by a complex medical passage, with the corresponding simplified version serving as the target text.
The learning objective used Causal Language Modeling (CLM) loss, training the model to predict
each token based only on the tokens that came before it. This aligns with GPT-2’s autoregressive
nature and supports fluent, context-aware generation. For the training configuration, the model
was trained for three epochs with a learning rate of 5e-5 and a batch size of one per device. Gradient
accumulation over two steps simulated a larger batch size, improving stability on limited hardware.
Mixed-precision training (fp16) was used to speed up computation and reduce memory usage.
Progress was logged every 100 steps, and checkpoints were saved at the end of each epoch. For
data collation and tokenization, DataCollatorForLanguageModeling with masked language
modeling disabled (mlm=False) ensured compatibility with CLM, while the tokenizer maintained
consistent formatting and padding across samples.</p>
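          <p>The corresponding training setup can be sketched as follows; dataset files and column names are
placeholders, while the numeric settings mirror the configuration described above.</p>
          <preformat>
# Minimal sketch: fine-tuning GPT-2 Medium with a causal language modeling objective.
# File and column names are placeholders; hyperparameters follow the text above.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token        # GPT-2 has no dedicated pad token
model = AutoModelForCausalLM.from_pretrained("gpt2-medium")

data = load_dataset("csv", data_files={"train": "train.csv"})

def preprocess(batch):
    texts = [f"Simplify the biomedical sentence:\n{c}\nSimplified:\n{s}"
             for c, s in zip(batch["complex"], batch["simple"])]
    return tokenizer(texts, truncation=True, max_length=300, padding="max_length")

data = data.map(preprocess, batched=True, remove_columns=data["train"].column_names)

args = TrainingArguments(
    output_dir="gpt2-medium-simplification",
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,               # simulates a larger batch size
    fp16=True,                                   # mixed-precision training
    logging_steps=100,
    save_strategy="epoch",
)
Trainer(model=model, args=args, train_dataset=data["train"],
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()
          </preformat>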
          <p>This fine-tuning process helped GPT-2 Medium adapt to the linguistic style and complexity of
biomedical texts, enhancing its ability to produce simplified content that remains faithful to the original
meaning.</p>
        </sec>
        <sec id="sec-3-4-4">
          <title>3.4.4. Prompt Engineering for GPT-2 Medium</title>
          <p>Effective prompt design was key for guiding GPT-2 Medium in simplification tasks. However, prompts
were limited to simple role-based instructions. Practical constraints on computational resources and
time during experimentation kept us from evaluating more detailed prompting strategies in this setting.
The prompt used was:
Simplify the biomedical sentence:
&lt;original_text&gt;
Simplified:
&lt;simplified_text&gt;
This format encouraged the model to treat simplification as a continuation task, predicting
simplified text based on the structure learned during fine-tuning. The inclusion of the explicit
keyword Simplified: provided a clear delimiter, making the generation task more deterministic and
improving the quality of extracted simplifications.</p>
          <p>During inference, decoding was performed using top-k = 50, top-p (nucleus sampling) = 0.95, and
a temperature of 0.7 to balance fluency and accuracy. The use of the same prompt structure during
training and inference ensured consistency and robustness in output generation.</p>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Hyperparameter Tuning</title>
        <p>Hyperparameters for BioBART and GPT-2 Medium were selected by combining limited grid search with
standard defaults. Key parameters like learning rate and dropout were initialized from recommended
values in prior work and the model documentation. A small set of candidate configurations was tested
on a validation set, and final values were chosen based on SARI scores. As extensive tuning was not
feasible within practical constraints, most other settings used standard defaults to ensure stable training
and reasonable performance.</p>
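        <p>A minimal sketch of this selection loop is shown below; train_and_generate is a caller-supplied
placeholder wrapping the fine-tuning and decoding steps of the previous sections, not a library function.</p>
        <preformat>
# Minimal sketch: a small grid search over learning rates scored by SARI on the
# validation set. train_and_generate is a placeholder supplied by the caller.
import evaluate

def select_learning_rate(train_and_generate, val_sources, val_references,
                         candidates=(2e-5, 3e-5, 5e-5)):
    sari = evaluate.load("sari")
    scores = {}
    for lr in candidates:
        predictions = train_and_generate(learning_rate=lr)
        scores[lr] = sari.compute(sources=val_sources,
                                  predictions=predictions,
                                  references=val_references)["sari"]
    best = max(scores, key=scores.get)            # highest validation SARI wins
    return best, scores
        </preformat>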
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <sec id="sec-4-1">
        <title>4.1. Text Simplification Results on Cochrane-auto Test Dataset</title>
        <sec id="sec-4-1-1">
          <title>Task 1.1: Sentence-Level Simplification</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Official CLEF 2025 SimpleText Evaluation Results</title>
        <sec id="sec-4-2-1">
          <title>Task 1.1: Sentence-Level Simplification on Cochrane-auto Abstracts</title>
        </sec>
        <sec id="sec-4-2-2">
          <title>Task 1.2: Document-Level Simplification on Cochrane-auto Abstracts</title>
        </sec>
        <sec id="sec-4-2-3">
          <title>Task 1.2: Document-Level Simplification on Plain Language Summaries</title>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Inference and Analysis</title>
        <p>All models were trained exclusively on the provided sentence-level training data to ensure they aligned
with the language distribution and specific content of the task. To further test how well these models
could generalize, we also evaluated them on paragraph-level texts as an experimental extension, even
though they were never explicitly trained on longer passages.</p>
        <p>One key finding was GPT-2 Medium’s superior readability and SARI scores. As a powerful
autoregressive language model pre-trained on a large, diverse corpus, GPT-2 Medium excelled at generating
fluent and natural text. When fine-tuned for sentence-level simplification, it consistently produced
simpler, clearer sentences while preserving meaning. This was reflected in its strong performance
across readability and simplification metrics, indicating effective vocabulary reduction and improved
sentence structure without losing important content.</p>
        <p>In addition, GPT-2 Medium demonstrated strong semantic preservation. The high semantic similarity
scores showed that even after simplifying the surface complexity, it retained the essential meaning
of biomedical content. This highlights GPT-2 Medium’s large-scale language understanding, which
allowed it to handle the nuances of technical medical text and produce faithful yet easier-to-understand
outputs.</p>
        <p>However, there were clear limitations in the BioBERT + BioBART pipeline. While this combination
leveraged domain-specific knowledge well, it often lagged in readability scores. This can be explained by
BioBERT’s design: it excels at extracting and understanding biomedical information but wasn’t built for
generating fluent, readable text. Although BioBART added generative capabilities, the outputs frequently
retained dense, technical language, making the simplified sentences still difficult for non-expert readers.</p>
        <p>We also observed that high SARI scores don’t always indicate effective simplification. When
simplifying longer sentences or paragraphs, models like BioBERT + BioBART often preserved much of
the original text. Because SARI rewards overlap between the input and output, this led to inflated
scores even when the text remained hard to understand. This highlights a mismatch between automatic
metrics like SARI and actual readability improvements as perceived by human readers.</p>
        <p>A key factor was the domain versus generalization trade-off. The BioBERT + BioBART pipeline
tended to prioritize preserving precise biomedical terminology and sentence structure, ensuring
accuracy but often at the cost of accessibility. In contrast, GPT-2 Medium applied more aggressive
simplification and generalization strategies that improved readability but introduced a potential risk of
oversimplification—though this was largely controlled by careful fine-tuning.</p>
        <p>We also noted the effects of task granularity. Even though GPT-2 Medium was only trained on
sentence-level data, it performed reasonably well on longer sentences, showing some ability to generalize.
However, its effectiveness dropped slightly as sentence length and contextual complexity increased,
which is expected since longer passages naturally pose greater challenges for simplification.</p>
        <p>Finally, the influence of training data was clear. Fine-tuning all models on sentence-level biomedical
data helped them learn domain-specific patterns. However, models pre-trained on general language
corpora, like GPT-2 Medium, were better able to adapt these patterns into simpler, more readable
outputs. Meanwhile, models pre-trained exclusively on biomedical texts, such as BioBERT, tended to
simplify more conservatively, limiting their ability to produce easier-to-read rephrasings.</p>
        <p>The results demonstrate that GPT-2 Medium, despite not being domain-specialized, is highly effective
for biomedical text simplification when properly fine-tuned, especially for improving readability and
ease of understanding. Meanwhile, the BioBERT + BioBART approach is valuable when domain fidelity
is paramount. Together, these approaches highlight the complementary strengths of general-purpose
and domain-specific models, and point toward promising hybrid strategies for future research in
multilingual and user-centered scientific communication.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Implementation Details</title>
        <p>All experiments were conducted in Kaggle notebooks using NVIDIA T4 GPUs.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Scope</title>
      <p>In this study, we explored two ways to make complex scientific and biomedical texts easier to
understand without losing their core meaning. Both approaches were trained on the provided training
data to ensure domain relevance and effectiveness. One approach used BioBERT to identify challenging
sentences, enriched them with helpful definitions from SciSpacy, and then simplified them using
BioBART. The other approach took a more straightforward path, using GPT-2 Medium to directly rewrite
the text in a simplified way. Both methods broke down complicated information into simpler, clearer
sentences by removing unnecessary jargon, trimming excess detail, and improving overall readability.
Our results showed that each approach successfully made the content easier to grasp while still
preserving important medical information. Together, these methods highlight promising steps toward
making scientific knowledge more accessible to everyone. Future research could explore combining
specialized biomedical models with general-purpose language models to create even more effective
simplification tools and extend these methods to support multiple languages for broader global impact.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>We thank the organizers of the CLEF 2025 SimpleText track for designing the evaluation framework,
providing valuable datasets, and offering timely support throughout the competition. We also appreciate
the reviewers’ feedback, which helped improve our methods and analysis. Additionally, we acknowledge
the contributions of colleagues and mentors who provided insights during model development and
experimentation.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly to check grammar
and spelling and to paraphrase and reword text. After using these tools/services, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>[7] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, 2019. URL: https://arxiv.org/abs/1810.04805. arXiv:1810.04805.</p>
      <p>[8] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical
language representation model for biomedical text mining, Bioinformatics 36 (2019) 1234–1240.
URL: https://doi.org/10.1093/bioinformatics/btz682. doi:10.1093/bioinformatics/btz682.</p>
      <p>[9] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer,
BART: Denoising sequence-to-sequence pre-training for natural language generation, translation,
and comprehension, 2019. URL: https://arxiv.org/abs/1910.13461. arXiv:1910.13461.</p>
      <p>[10] H. Yuan, Z. Yuan, R. Gan, J. Zhang, Y. Xie, S. Yu, BioBART: Pretraining and evaluation of a
biomedical generative language model, 2022. URL: https://arxiv.org/abs/2204.03905. arXiv:2204.03905.</p>
      <p>[11] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised
multitask learners, 2019. URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.</p>
      <p>[12] Supriyono, A. P. Wibawa, Suyono, F. Kurniawan, A survey of text summarization: Techniques,
evaluation and challenges, Natural Language Processing Journal 7 (2024) 100070. URL:
https://www.sciencedirect.com/science/article/pii/S2949719124000189. doi:10.1016/j.nlp.2024.100070.</p>
      <p>[13] P. Sahoo, A. K. Singh, S. Saha, V. Jain, S. Mondal, A. Chadha, A systematic survey of prompt
engineering in large language models: Techniques and applications, 2025. URL:
https://arxiv.org/abs/2402.07927. arXiv:2402.07927.</p>
      <p>[14] S. Wubben, A. van den Bosch, E. Krahmer, Sentence simplification by monolingual machine
translation, in: H. Li, C.-Y. Lin, M. Osborne, G. G. Lee, J. C. Park (Eds.), Proceedings of the
50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),
Association for Computational Linguistics, Jeju Island, Korea, 2012, pp. 1015–1024. URL:
https://aclanthology.org/P12-1107/.</p>
      <p>[15] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu,
Exploring the limits of transfer learning with a unified text-to-text transformer, 2023. URL:
https://arxiv.org/abs/1910.10683. arXiv:1910.10683.</p>
      <p>[16] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin,
Attention is all you need, 2023. URL: https://arxiv.org/abs/1706.03762. arXiv:1706.03762.</p>
      <p>[17] K. Omelianchuk, V. Raheja, O. Skurzhanskyi, Text simplification by tagging, 2021. URL:
https://arxiv.org/abs/2103.05070. arXiv:2103.05070.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vendeville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Clef 2025 simpletext track</article-title>
          , in: C.
          <string-name>
            <surname>Hauff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Jannach</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Kazai</surname>
            ,
            <given-names>F. M.</given-names>
          </string-name>
          <string-name>
            <surname>Nardini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Pinelli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Silvestri</surname>
          </string-name>
          , N. Tonellotto (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>425</fpage>
          -
          <lpage>433</lpage>
          . doi:10.1007/978-3-031-88720-8_63.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Phatak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Savage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ohle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mago</surname>
          </string-name>
          ,
          <article-title>Medical text simplification using reinforcement learning (teslea): Deep learning-based text simplification approach</article-title>
          ,
          <source>JMIR Med Inform</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <article-title>e38095</article-title>
          . URL: https://medinform.jmir.org/2022/11/e38095. doi:10.2196/38095.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Coster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kauchak</surname>
          </string-name>
          , Simple English Wikipedia:
          <article-title>A new text simplification task</article-title>
          , in: D.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Matsumoto</surname>
          </string-name>
          , R. Mihalcea (Eds.),
          <article-title>Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</article-title>
          , Portland, Oregon, USA,
          <year>2011</year>
          , pp.
          <fpage>665</fpage>
          -
          <lpage>669</lpage>
          . URL: https://aclanthology.org/P11-2117/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Chandrasekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <article-title>Motivations and methods for text simplification</article-title>
          ,
          <source>in: COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics</source>
          ,
          <year>1996</year>
          . URL: https://aclanthology.org/C96-2183/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nisioi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Štajner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <article-title>Exploring neural text simplification models</article-title>
          , in: R. Barzilay, M.-Y. Kan (Eds.),
          <source>Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume</source>
          <volume>2</volume>
          :
          <string-name>
            <surname>Short</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <source>Association for Computational Linguistics</source>
          , Vancouver, Canada,
          <year>2017</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>91</lpage>
          . URL: https://aclanthology.org/P17-2014/. doi:10.18653/v1/P17-2014.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Cochrane-auto: An aligned dataset for the simplification of biomedical abstracts</article-title>
          , in: M.
          <string-name>
            <surname>Shardlow</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Saggion</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Alva-Manchego</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Zampieri</surname>
          </string-name>
          , K. North, S. Štajner, R. Stodden (Eds.),
          <source>Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR</source>
          <year>2024</year>
          ),
          <article-title>Association for Computational Linguistics</article-title>
          , Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>51</lpage>
          . URL: https://aclanthology.org/2024.tsar-1.5/. doi:10.18653/v1/2024.tsar-1.5.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>