<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>University of Amsterdam at the CLEF 2025 SimpleText Track</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Taiki Papandreou</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Bakker</string-name>
          <email>j.bakker@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaap Kamps</string-name>
          <email>kamps@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper reports on the University of Amsterdam's participation in the CLEF 2025 SimpleText track. We participated in Task 1 for both sentence-level and document-level text simplification. We explored scientific text simplification using BART fine-tuning and jargon-aware prompting with LLaMA 3.1. Our plan-guided BART model achieved the highest SARI score at the sentence level, while our document-level text simplification approaches on longer inputs scored close behind. LLaMA performed competitively without domain-specific training, highlighting the promise of large language models for zero-shot simplification. More generally, document-level coherence and the handling of domain-specific terms remain key challenges for future work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The rise of the internet and social media has granted us access to an extraordinary amount of information,
but it also brings significant risks, particularly the rapid spread of misinformation and disinformation.
Scientific knowledge has long been regarded as the most effective counter to such falsehoods, and the
importance of scientific literacy is widely acknowledged. However, in reality, many non-experts shy
away from scientific sources, often perceiving them as too complex. Therefore, it is crucial to eliminate
barriers that prevent the general public from engaging with scientific texts.</p>
      <p>
        The CLEF 2025 SimpleText track tackles head-on the barriers ordinary citizens face when accessing
scientific literature, by making available corpora and tasks that address different aspects of the
problem. For details on the exact track setup, we refer to the Track Overview paper in the CLEF 2025 LNCS
proceedings [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as well as the detailed task overviews in the CEUR proceedings [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>We conduct an extensive analysis of the three tasks of the track: Task 1 on Text Simplification; Task 2
on Controlled Creativity; and Task 3 on SimpleText 2024 Revisited. We submitted multiple runs for Task 1,
focusing on both sentence- and document-level simplification approaches. We also submitted runs for
Task 2, although they are closely related to Task 1. No runs were submitted for Task 3.</p>
      <p>The rest of this paper is structured as follows. Next, in Section 2, we discuss our experimental setup
and the specific runs submitted. Section 3 discusses the results of our runs and provides a detailed
analysis of the corpus and results for each task. We end in Section 4 by discussing our results and
outlining the lessons learned.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Experimental Setup</title>
      <p>In this section, we will detail our approach for the CLEF 2025 SimpleText track tasks.</p>
      <sec id="sec-2-1">
        <title>2.1. Experimental Data</title>
        <p>
          For details of the exact task setup and results, we refer the reader to the detailed overview of the track
in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Our focus is on text simplification (Tasks 1.1, 1.2, and 2.3). The basic ingredients of the track are:
        </p>
        <p>
          <bold>Corpus</bold> The new CLEF 2025 SimpleText corpus is based on biomedical literature abstracts and lay
summaries from Cochrane systematic reviews, and is called Cochrane-auto [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p><bold>Train data</bold> The specific train data for Task 1 consists of 1,085 documents, 4,171 paragraphs, and 14,719
sentences, with paired content from the abstract and the plain language summary.</p>
        <p><bold>Test data</bold> The primary test data consists of 217 new Cochrane abstracts with paired plain English
summaries, comprising 4,293 source sentences.</p>
        <p>
          <bold>References</bold> There are two sets of references for the new Cochrane data in the test set. First, a subset
of 37 abstracts and 587 sentences, paired with 37 plain language summaries with 388 sentences,
aligned and filtered as in Cochrane-auto [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Second, the full set of 217 abstracts with 4,293 source
sentences, paired with 217 plain language summaries with 3,641 sentences, containing document-level
pairs of the results and conclusions sections [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          We used a number of additional resources for our jargon-aware simplification approaches.
        </p>
        <p>
          <bold>Additional Sources</bold> For Task 1, we used the MedReadMe training set [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for jargon detection, the output of which was used as part of our prompt to simplify text during inference.
        </p>
        <p>
          <bold>Additional Train References</bold> For Task 1, the training set of MedReadMe [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] was used for jargon detection. This dataset contains 2,587 annotated sentences with a total of
5,207 jargon terms, sourced from 15 established medical simplification resources such as Cochrane
Plain Language Summaries, NIH MedlinePlus articles, and clinical guideline adaptations from
professional associations. Annotations were created by undergraduate students without medical training,
simulating layperson comprehension challenges. Each sentence is labeled using a hierarchical
classification scheme distinguishing between:
          <list list-type="bullet">
            <list-item><p>Binary: jargon vs. non-jargon</p></list-item>
            <list-item><p>3-class: medical jargon, general/multisense terms, abbreviations</p></list-item>
            <list-item><p>7-class: including Google-Easy/Hard distinctions</p></list-item>
          </list>
          We used the RoBERTa-large model trained on the binary-labeled training data, since its detection rate
was the highest.
        </p>
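        <p>As a rough illustration of the post-processing behind such a binary token-level jargon classifier, BIO-style token labels can be collected into jargon term spans as sketched below. This is a minimal sketch: the tag names and whitespace tokenization are simplifying assumptions, not the exact RoBERTa/MedReadMe setup.</p>

```python
def bio_to_spans(tokens, labels):
    """Collect contiguous B-/I- tagged tokens into jargon term spans.

    tokens: list of word strings
    labels: parallel list of 'B', 'I', or 'O' tags, as produced by a
            binary jargon classifier (simplified assumption).
    """
    spans, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            # a new jargon term starts; flush any open span first
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            # continuation of the current jargon term
            current.append(token)
        else:
            # outside any jargon term; flush the open span if present
            if current:
                spans.append(" ".join(current))
                current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

        <p>The detected spans can then be handed to the prompt-construction step described in Section 2.2.</p>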
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Official Submissions</title>
        <sec id="sec-2-2-1">
          <p>We created runs for both tasks of the track, which we will discuss in order.</p>
          <p><bold>Task 1</bold> This task asks to simplify scientific text. We submitted seven runs in total, for both the
sentence-level (1.1) and document-level (1.2) tasks, as shown in Table 1.</p>
          <p>
            Five of our runs were created using the trained BART models that we introduced in the Cochrane-auto
paper [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. The baseline was trained on the document pairs in the original Cochrane corpus [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ], while
the other models were trained on the sentence, paragraph, and document pairs in Cochrane-auto. Our
plan-guided system is inspired by the work of Cripwell et al. [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. It consists of a classifier that specifies
how each sentence should be simplified—should it be copied, rephrased, split, merged, or deleted?—and
a BART model that simplifies each sentence conditioned on the predicted simplification action.
          </p>
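        <p>The control flow of this two-stage plan-guided setup can be sketched as follows. This is an illustrative stand-in, not the trained system: the classifier heuristic, the control-token format, and the function names are our assumptions, with the predicted operation prepended to the BART input as in control-token conditioning.</p>

```python
# Sketch of a plan-guided simplification pipeline (illustrative only).
# Both stages are hypothetical stand-ins for the trained classifier and
# the operation-conditioned BART model.

OPERATIONS = ["copy", "rephrase", "split", "merge", "delete"]

def predict_operation(sentence: str) -> str:
    """Stand-in for the sentence-level plan classifier."""
    if len(sentence.split()) > 25:
        return "split"  # long sentences are often split
    return "rephrase"

def simplify_sentence(sentence: str, operation: str) -> str:
    """Stand-in for a BART model conditioned on the predicted operation,
    e.g. by prepending a control token such as '<split>' to the input."""
    if operation == "copy":
        return sentence
    if operation == "delete":
        return ""
    conditioned_input = f"<{operation}> {sentence}"
    # a trained BART model would decode from conditioned_input here
    return conditioned_input

def plan_guided_simplify(document: list[str]) -> list[str]:
    outputs = []
    for sentence in document:
        op = predict_operation(sentence)
        simplified = simplify_sentence(sentence, op)
        if simplified:  # deleted sentences are dropped
            outputs.append(simplified)
    return outputs
```

        <p>Conditioning generation on an explicit per-sentence plan is what distinguishes this system from end-to-end sequence-to-sequence simplification.</p>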
          <p>For the submissions UvA_task11_llama31 and UvA_task12_llama31, we used the trained
RoBERTa-large model to detect the jargon terms present in every sentence of each abstract. A
Llama-3.1-8B-Instruct model (https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
was then prompted to simplify each abstract either sentence by sentence or as a whole, replacing
the detected jargon terms where possible. The prompt was designed to preserve numerical values and
essential terminology, while allowing lexical simplifications where possible, to prevent hallucination.
Hyperparameters included a temperature of 0.3, top-p sampling of 0.95, a repetition penalty of 1.3, and
max new tokens set to 512. For document-level processing, we used NLTK to split abstracts into
sentences, simplified each independently, and reassembled the output. Post-processing of noisy outputs
played a key role in improving clarity and factual consistency.</p>
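          <p>A rough sketch of this document-level loop is given below, with a simple regex splitter standing in for NLTK's sentence tokenizer and a placeholder generate callable in place of the Llama-3.1-8B-Instruct model. The prompt wording and function names are illustrative assumptions, not our exact prompt (which is shown in Table 2).</p>

```python
import re

# Decoding hyperparameters as used in our runs.
GEN_KWARGS = dict(temperature=0.3, top_p=0.95, repetition_penalty=1.3,
                  max_new_tokens=512)

def split_sentences(text: str) -> list[str]:
    # stand-in for nltk.sent_tokenize
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_prompt(sentence: str, jargon_terms: list[str]) -> str:
    """Jargon-aware prompt (wording is an illustrative assumption)."""
    terms = ", ".join(jargon_terms) if jargon_terms else "none detected"
    return (
        "Simplify the following sentence for a general audience. "
        "Keep all numerical values and essential terminology unchanged, "
        f"and replace these jargon terms where possible: {terms}.\n"
        f"Sentence: {sentence}\nSimplified:"
    )

def simplify_abstract(abstract: str, detect_jargon, generate) -> str:
    """Split, simplify each sentence independently, reassemble."""
    outputs = []
    for sentence in split_sentences(abstract):
        prompt = build_prompt(sentence, detect_jargon(sentence))
        outputs.append(generate(prompt, **GEN_KWARGS).strip())
    return " ".join(outputs)
```

          <p>Because each sentence is simplified in isolation, the model has no access to earlier simplifications; this is also the source of the repeated-explanation behavior analyzed in Section 3.1.</p>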
          <p>For both sentence-level and document-level text simplification, we used the prompt in Table 2. We
first detected jargon terms as described above, and then provided the detected terms to the prompt.
The defined instructions helped us to preserve the correct information.</p>
          <p><bold>Task 2</bold> This task asks to identify and avoid hallucination. Indirectly, our submissions to Task 1 above
can also be evaluated in terms of the evaluation measures of Task 2.3. Hence, in a sense, we submitted
the same runs as already shown in Table 1.</p>
          <p><bold>Task 3</bold> This task revisits selected tasks by popular request. We did not run specific experiments for this
task, but the Task 1 test sentences and abstracts include the sources of the CLEF 2024 Simplify Scientific
Text task. Hence, the Task 1 submissions shown in Table 1 can also be evaluated in terms of their
out-of-domain effectiveness against the CLEF 2024 reference simplifications.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <p>In this section, we will present the results of our experiments in three self-contained subsections
following the CLEF 2025 SimpleText Track tasks.</p>
      <sec id="sec-3-1">
        <title>3.1. Task 1: Text Simplification</title>
        <p>We discuss our results for Task 1, asking to simplify scientific text. They are shown in Table 3.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Task 1.1 Sentence-level Simplification</title>
          <p>When computed against the references in the newly created Cochrane-auto test set, our plan-guided
system achieves the highest SARI score, but it is only very slightly higher than that of the baseline.
This indicates that training on Cochrane-auto does not offer a substantial advantage over training on
the Cochrane corpus for sentence-level simplification.</p>
          <p>As for the jargon-aware prompt method, despite not being trained on an in-domain text simplification
dataset, it achieves a SARI score similar to that of the models trained on the Cochrane-auto corpus.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Task 1.2 Document-level Simplification</title>
          <p>The initial results for the BART models trained at the paragraph and document levels further demonstrate
that training on Cochrane-auto rather than the Cochrane corpus does not improve performance and
may even harm the SARI score. The Llama model that simplifies entire abstracts also achieved a lower
SARI score than the Llama model operating at the sentence level.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Analysis</title>
          <p>Table 4 displays the last paragraph of an abstract in the CLEF 2025 SimpleText test set, which is dense
with jargon. It also shows the output of the LLaMA-3.1 model that simplified this abstract
sentence by sentence, conditioned on the jargon terms as detected by the trained RoBERTa classifier. It can
be seen that the model successfully simplifies various jargon terms such as aneurysmal subarachnoid
haemorrhage into easier-to-read alternatives like bleeding in the brain caused by a ruptured blood vessel.
Thus, the meaning of the original paragraph is preserved, while the text is made more accessible to a
general audience. However, thrombolysis is substituted with a treatment called thrombolysis twice, while
it would be sufficient to mention that thrombolysis is a treatment once. This is a result of simplifying
each sentence in isolation.</p>
          <p>Complex paragraph (Table 4): There is some evidence that thrombolysis can probably improve outcomes
after aneurysmal subarachnoid haemorrhage, without increasing the risk of haemorrhagic complications.
Thrombolysis likely reduces the risk of poor functional outcome and cerebral artery vasospasm, and may
reduce the risk of delayed cerebral ischaemia, but it likely makes little to no difference to case fatality
or hydrocephalus, and may make little to no difference to the risk of cerebral infarction. However, the
current evidence is still uncertain. The uncertainty is primarily due to the small total number of
participants and outcome events. Data from further studies are required to confirm the efficacy of
thrombolysis for improving outcomes after aneurysmal subarachnoid haemorrhage.</p>
          <p>Simplification by the jargon-aware Llama model (Table 4): Some studies suggest that a treatment called
thrombolytic therapy may help patients recover better from bleeding in the brain caused by a ruptured
blood vessel, without causing further bleeding problems. This treatment might lower chances of: * Poor
recovery * Blood vessel narrowing in the brain (after a stroke) * Brain damage from lack of blood flow
(later on) It probably won’t affect how many people die after getting this condition. It also doesn’t
seem to greatly change the chance of another blockage happening in the same area of the brain. The facts
aren’t clear yet. There aren’t enough people in the study for a reliable answer. More research is needed
to see if a treatment called thrombolysis can help people recover better from bleeding in their brain
caused by a weak blood vessel bursting.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Task 2: Controlled Creativity</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Task 2.3 Sentence-level Simplification</title>
          <p>We continue with Task 2, asking to identify and avoid hallucination. While we did not submit a special
pair of runs with and without particular grounding-by-design components, our Task 1 submissions did
take special care to avoid overgeneration and other information distortion.</p>
          <p>First, our Cochrane-auto-trained models are conservative and avoid gratuitous changes. This may
not optimize readability as much as some other approaches, but it leads to an accurate rendition of the
content without risk of information distortion. We feel that such a conservative approach is important
in the context of scientific text simplification.</p>
          <p>Second, our jargon-aware runs attempt to address impenetrable terminology by actively promoting the
deletion, rephrasing, or explanation of jargon in the text. This can result in significant content insertion.
However, the models and prompt we used were prone to “noise,” resulting in potential overgeneration.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Results</title>
          <p>We employ a simple alignment of source and prediction sentences, specifically examining overgeneration
or noise at the end of the prediction. Suppose the alignment reaches the end of the source tokens while
the prediction still has another sentence (or additional content after the last sentence). In that case, this
is flagged as “overgeneration.” This approach is more reliable at the sentence level, as the alignment and
spurious content can be detected with relative ease.</p>
          <p>Table 5 shows the results. We observe here that, indeed, the Cochrane-auto runs have marginal
overgeneration and are conservative in their edits. We also see that the jargon-aware LLaMA run has a
significant fraction of spurious content. While part of this may be due to additional explanations of
jargon, and thus helpful, there is also a significant number of cases in which “noise” or LLM commentary
is added. At the document level, the alignment can be tricky due to the length of the abstracts and
extensive sentence deletions. Partly due to the smaller number of cases, errors seem more pronounced for
document-level text simplification. This may be partly due to the more complex alignment of documents,
but also due to complexities in removing “noise” or spurious content in very long predictions. The
relative fraction still serves as a useful indicator of spurious content, and we observe almost twice as
much spurious content in the baseline trained on the (unaligned) Cochrane data. The aligned Cochrane-auto
models fare much better. The LLaMA model again suffers from a relatively high number of cases with
spurious content.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Task 3: SimpleText 2024 Revisited</title>
        <p>We continue with Task 3, asking for selected tasks by popular request. As noted, the CLEF 2024 text
simplification test data were included in the CLEF 2025 test corpus. Hence, the performance of the
exact same models on a different domain can be evaluated.</p>
        <p>We leave this evaluation and analysis for future research, as the track organizers have not yet released
the references and evaluation for the additional abstracts and sentences in the test set.</p>
        <p>
          For reference, similar Cochrane-trained BART models were submitted to the CLEF 2024 SimpleText
Track last year [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. On the scientific abstracts on technology and AI, these models obtained SARI scores
of 26.7 (sentence level), 33.2 (document level), and 35.1 (paragraph level). These scores in another
domain are notably lower than those in the biomedical domain this year, possibly also due to the
sentence-level references.
        </p>
        <p>
          For further reference, similar Cochrane-trained sentence-level BART and contextual BART models
were submitted to the TREC 2024 PLABA Track last year [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. On the Medline abstracts, these models obtained SARI scores of 28.8 (sentence-level BART) and
30.5 (sentence-level BART with the whole abstract as context). These scores are also notably lower
than those observed this year, possibly due to the different nature of Medline abstracts and the choice
to run planner-based models on a one-to-one sentence simplification task.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Conclusions</title>
      <p>This paper detailed the University of Amsterdam’s participation in the CLEF 2025 SimpleText track. We
conducted a range of experiments for the different tasks of the track.</p>
      <p>Our primary focus was on the core Task 1 on Text Simplification, where we evaluated multiple
approaches to scientific text simplification, including BART-based fine-tuning and jargon-aware prompting
with LLaMA 3.1. Our plan-guided BART model achieved the highest SARI score on sentence-level
simplification, indicating that structured simplification actions can slightly improve performance.
However, training on the Cochrane-auto dataset did not significantly outperform the baseline trained on
the original Cochrane corpus, especially at the document level. The LLaMA-based method performed
competitively without domain-specific training, demonstrating the potential of large language models
in zero-shot simplification when guided by jargon detection and structured prompts. These results
suggest that while current models are effective at sentence-level simplification, maintaining coherence
and factual accuracy across longer texts remains a challenge. Future work should focus on better
discourse modeling and more robust handling of domain-specific terminology.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was conducted as part of the final research project for the Master’s in Artificial Intelligence program
at the University of Amsterdam.</p>
      <p>We thank the track and task organizers for their amazing service and effort in making realistic benchmarks for
scientific text simplification available.</p>
      <p>Jan Bakker is partly funded by the Netherlands Organization for Scientific Research (NWO NWA # 1518.22.105).
Jaap Kamps is partly funded by the Netherlands Organization for Scientific Research (NWO CI # CISC.CC.016,
NWO NWA # 1518.22.105), the University of Amsterdam (AI4FinTech program), and ICAI (AI for Open Government
Lab). Views expressed in this paper are not necessarily shared or endorsed by those funding the research.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checking.
After using this tool, the authors reviewed and edited the content as needed and take full responsibility
for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vendeville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 SimpleText track: Simplify scientific texts (and nothing more)</article-title>
          , in: J. Carrillo de Albornoz,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science, Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vendeville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 SimpleText Task 1: Simplify Scientific Text</article-title>
          , in: [10],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Vendeville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 SimpleText Task 2: Identify and Avoid Hallucination</article-title>
          , in: [10],
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Cochrane-auto: An aligned dataset for the simplification of biomedical abstracts</article-title>
          , in: M. Shardlow, H. Saggion, F. Alva-Manchego, M. Zampieri, K. North, S. Štajner, R. Stodden (Eds.),
          <source>Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR</source>
          <year>2024</year>
          ), Association for Computational Linguistics, Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>51</lpage>
          . URL: https://aclanthology.org/2024.tsar-1.5/. doi:10.18653/v1/2024.tsar-1.5.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Devaraj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Marshall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Paragraph-level simplification of medical texts</article-title>
          , in: K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou (Eds.),
          <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>4972</fpage>
          -
          <lpage>4984</lpage>
          . URL: https://aclanthology.org/2021.naacl-main.395/. doi:10.18653/v1/2021.naacl-main.395.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , W. Xu,
          <article-title>MedReadMe: A systematic study for fine-grained sentence readability in medical domain</article-title>
          , in: Y. Al-Onaizan, M. Bansal, Y.-N. Chen (Eds.),
          <source>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>17293</fpage>
          -
          <lpage>17319</lpage>
          . URL: https://aclanthology.org/2024.emnlp-main.958/. doi:10.18653/v1/2024.emnlp-main.958.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cripwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Legrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gardent</surname>
          </string-name>
          ,
          <article-title>Document-level planning for text simplification</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Dubrovnik, Croatia,
          <year>2023</year>
          , pp.
          <fpage>993</fpage>
          -
          <lpage>1006</lpage>
          . URL: https://aclanthology.org/2023.eacl-main.70/. doi:10.18653/v1/2023.eacl-main.70.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yüksel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>University of Amsterdam at the CLEF 2024 SimpleText track</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuscáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G. S.</given-names>
            <surname>de Herrera</surname>
          </string-name>
          (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September 2024</source>
          , volume
          <volume>3740</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          , pp.
          <fpage>3182</fpage>
          -
          <lpage>3194</lpage>
          . URL: https://ceur-ws.org/Vol-3740/paper-310.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Papandreou-Lazos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Biomedical text simplification models trained on aligned abstracts and lay summaries</article-title>
          , in:
          <string-name>
            <given-names>I.</given-names>
            <surname>Soboroff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Awad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ellis</surname>
          </string-name>
          (Eds.),
          <source>The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024)</source>
          , National Institute of Standards and Technology,
          <source>NIST Special Publication</source>
          <volume>1329</volume>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          (Eds.),
          <source>Working Notes of CLEF 2025: Conference and Labs of the Evaluation Forum</source>
          , CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>