<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>University of Avignon at the CLEF 2025 SimpleText Track: Guided Medical Abstract Simplification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ygor Gallina</string-name>
          <email>ygor.gallina@univ-avignon.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tania Jiménez</string-name>
          <email>tania.jimenez@univ-avignon.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stéphane Huet</string-name>
          <email>stephane.huet@univ-avignon.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Avignon Université</institution>
          ,
          <addr-line>LIA</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the participation of the SimpLIA team in the CLEF 2025 SimpleText challenge on Document-level Scientific Text Simplification. The goal of the task is to simplify a scientific abstract in the biomedical domain for lay readers. To achieve this, we tried different approaches such as translation, keyword extraction and summarization. Our best performing approaches used LLM prompting guided by human-defined guidelines. The various approaches employed in this study rely on readily available tools and models.</p>
      </abstract>
      <kwd-group>
        <kwd>text simplification</kwd>
        <kwd>biomedical</kwd>
        <kwd>scientific abstracts</kwd>
        <kwd>prompting</kwd>
        <kwd>large language models (LLM)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Access to high-quality healthcare information, in a language that people can understand, is a major
societal challenge. Lack of clarity around their health information can hinder patients’ ability to take
an active role in their care. Without a solid understanding of their diagnosis, treatment options, and
ongoing needs, patients may struggle to follow treatment plans, prioritize self-care, and make informed
decisions about their health [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This issue is particularly acute for vulnerable groups such as children,
non-native speakers, and those with lower educational backgrounds, exacerbating health disparities.
In order to reduce the persistent gap in health literacy between medical professionals and the general
public, automatic text simplification approaches can be leveraged to present complex health information
in a clear and concise manner, ensuring that essential details are retained while maintaining the accuracy
of the original content.
      </p>
      <p>
        The CLEF 2025 SimpleText lab [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is dedicated to enhancing accessibility to scientific texts for all
users. Unlike conventional text, scientific documents have dense specialized vocabulary and complex
sentence structures; successfully adapting these texts to a non-specialized audience while maintaining
accuracy is still a challenge.
      </p>
      <p>SimpleText2025 considers three tasks:
1. Text Simplification: Simplify scientific text [3]
1.1 Sentence-level Scientific Text Simplification
1.2 Document-level Scientific Text Simplification
2. Controlled Creativity: Identify and Avoid Hallucination [4]
3. SimpleText 2024 Revisited: Selected tasks by popular request</p>
      <p>In order to simplify a text, one must understand the whole document as well as its surrounding
context. We decided to focus only on Subtask 1.2 [5] at the document level, which is more
challenging than the sentence level. In a nutshell, our approaches focus on prompting Large Language
Models (LLMs) because of their capacity to deal with large contexts and thus whole documents.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data description</title>
      <p>[Figure 1: an example cochrane-auto document on glucocorticosteroids for primary sclerosing cholangitis, showing the complex version (the "Main results" section of the Cochrane Review abstract) alongside its simple version (the plain language summary).]</p>
      <sec id="sec-2-3">
        <p>The SimpleText2025 test data gathers a variety of document sources. Table 1 describes the number
of documents, the number of words, and the length ratio (the length of complex documents over that of simple
documents). The majority of documents come from the cochrane-auto dataset [6] (github.com/JanB100/cochrane-auto), which we used as
our development dataset before knowing it was part of the test set. It was constructed from systematic
reviews of biomedical scientific literature produced by Cochrane (cochranelibrary.com), a not-for-profit organization.</p>
        <p>A sample is composed of a complex document (the input) and a simple version (the reference). The
complex version is the "Main results" part of the Cochrane Review abstract and the simple version is
the part of the "Plain language summary" starting from the "What did we find?" paragraph to the end,
excluding titles. Figure 1 showcases a document from cochrane-auto (the original document can be found at pubmed.ncbi.nlm.nih.gov/20091555).</p>
        <p>By looking at the validation samples, we found that the simplification relies on the reformulation
of scientific wording rather than the contextualization of the study and the explanation of specialty
terms. In particular, text describing results and confidence intervals is generally removed in the simple
version. In contrast, we also noticed that specialized terms tend to be left untouched in the simple version,
since the Cochrane lay summary targets an audience interested and educated in the health domain.</p>
        <p>The SimpleText2025 test set contains 4 sources of documents, mostly from the scientific medical
domain: Cochrane (literature reviews), cochrane-auto (literature reviews) and Medline (scientific articles),
and also from the computer science domain: SimpleText2024. The documents from Cochrane, Medline
and SimpleText2024 do not match the average length of the cochrane-auto dataset; Cochrane documents
are longer by ≃200 words, while Medline and SimpleText2024 documents are shorter by ≃150 and ≃200
words respectively.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Approaches</title>
      <sec id="sec-3-1">
        <title>3.1. Naive prompt</title>
        <p>We started by experimenting with the most straightforward prompt we could imagine: "Simplify this text in
English." (cf. Listing 4). This prompt was tested using the R library rollama [7] with several widely
available and established LLMs: LLaMA3.3 (70.6B), LLaMA4 (108.6B), Gemma2 (9B), Gemma3 (4.3B)
and Mistral-Small (22.2B). It was also tested with LLaMA3.2 (3.2B) and qwen (4B), but the scores were too
low to be reported.</p>
      </sec>
      <sec id="sec-3-1b">
        <title>3.2. Cascade: extractive summarization followed by prompting</title>
        <p>We also considered a two-step approach: we first reduce the size of the document using extractive
summarization following [8], and then ask for simplification using the prompt in Listing 4. We also
evaluated the extractive summarization alone, but the results were not satisfactory.</p>
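        <p>The cascade can be sketched as follows. This is a minimal sketch, not our actual implementation: the frequency-based sentence scorer stands in for the summarizer of [8], and ask_llm is a placeholder for the rollama call.</p>

```python
import re
from collections import Counter

def extractive_summary(text, max_sentences=5):
    # Score each sentence by the average corpus frequency of its words
    # and keep the top-scoring sentences in their original order.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)
    top = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
    return " ".join(s for s in sentences if s in top)

def cascade(document, ask_llm, max_sentences=5):
    # Step 1: shrink the document; step 2: prompt for simplification.
    summary = extractive_summary(document, max_sentences)
    return ask_llm("Simplify this text in English.\n\n" + summary)
```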
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Backtranslation</title>
        <p>We hypothesize that by repeatedly translating the same document to a target language, the document’s
wording will gradually simplify and the text is likely to shrink in size. We chose the NLLB [9] model for its
wide range of available languages, and Spanish as the target language.</p>
        <p>We adopted the following pipeline:
• Translate the original (English) document to another language (Spanish).
• Translate the resulting text back to English.
• Repeat these two steps n times.</p>
        <p>Because the NLLB models only allow inputs of 512 tokens, the document is split by sentences, which
are then grouped up to 512 tokens, translated and concatenated to form the full document.</p>
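        <p>The grouping step can be sketched as below; count_tokens and the two translate functions are placeholders (assumptions) for the NLLB tokenizer and model calls, which are not reproduced here.</p>

```python
def chunk_sentences(sentences, count_tokens, budget=512):
    # Group consecutive sentences into chunks that stay within the
    # model's input budget (512 tokens for NLLB); a single sentence
    # longer than the budget becomes its own oversized chunk.
    chunks, current, used = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and used + n > budget:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def backtranslate(document, split_sentences, to_spanish, to_english,
                  count_tokens, rounds=1):
    # One round = English -> Spanish -> English, chunk by chunk.
    text = document
    for _ in range(rounds):
        for translate in (to_spanish, to_english):
            chunks = chunk_sentences(split_sentences(text), count_tokens)
            text = " ".join(translate(c) for c in chunks)
    return text
```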
      </sec>
      <sec id="sec-3-3">
        <title>3.4. Keyword simplification</title>
        <p>One way to simplify a text involves explaining or replacing specialty terms that lay readers are not
familiar with. Following this idea, we identify keywords (used as surrogates for specialty terms) using
the MultipartiteRank [10] algorithm. Then a large language model is prompted (cf. Listing 5) to
generate a simpler version of the term. Once the simpler versions of the words are generated, we search
and replace the keywords with their simpler version. The simplified document is then very similar to
the original.</p>
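        <p>A minimal sketch of the replacement step, assuming the simpler wordings have already been generated by the LLM: longer keywords are substituted first so that overlapping terms do not clash, and matching is case-insensitive with word boundaries.</p>

```python
import re

def replace_keywords(text, simpler):
    # simpler maps each extracted keyword to its simpler wording.
    # Replace longer keywords first so overlapping terms do not clash.
    for term in sorted(simpler, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, simpler[term], text, flags=re.IGNORECASE)
    return text
```

        <p>For example, with the mapping of Listing 6, "AMD" becomes "macular degeneration" and "trials" becomes "studies".</p>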
        <p>To prevent the LLM from deviating from the expected YAML format, the initial prompt was modified
to instruct the generation of a definition of the term before its simplification, which acts like a
chain-of-thought.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.5. Cochrane guidance</title>
        <p>Simplification, as well as summarization, is not performed in the same way depending on the goal
and the intended audience. In the context of Cochrane’s Lay Summaries, the objective is defined in
guidebooks intended to help professionals write a Cochrane Review. The "Lay Summary" aims at
ensuring that "anyone looking for information about the key points of a Cochrane Review can read and
understand them." [11]. As such, the "Template and guidance for writing a Cochrane Plain language
summary"5 details how a Lay Summary should be written and which aspects to focus on.</p>
        <p>We identified different parts of the handbook to help guide the simplification process.</p>
        <p>Guidelines The 5th section of the handbook, namely "General advice on writing in plain language",
gives writing advice regarding language, style and structure. Because the "simple" version of
the dataset does not contain headings, bullet lists or paragraphs, we only considered the language
and style advice. We created a prompt that includes all the extracted advice (Guidelines-all), only the
language advice (Guidelines-lang) and, lastly, just the style advice (Guidelines-styl) (cf. Listing 2).</p>
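        <p>The prompt assembly can be sketched in plain Python, as an equivalent of the Jinja-style template shown in Listing 2 (the guideline sentences are those of Appendix B):</p>

```python
def build_guideline_prompt(abstract, guidelines, word_limit=500):
    # Instruction line, one bullet per guideline sentence, then the
    # abstract, mirroring the template of Listing 2.
    lines = ["Given the following scientific abstract create a lay summary "
             "({} words maximum) using the following guidelines.".format(word_limit)]
    lines += ["- " + g for g in guidelines]
    lines += ["Abstract:", "", abstract]
    return "\n".join(lines)
```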
        <p>The handbook’s advice section is formatted as a multilevel bullet-point list, but we manually rewrote
it as sentences to create clearer guidelines (cf. Appendix B). For example:
• Avoid (or, when this is not possible or desirable, explain):
– long words. For example, use ‘blood thinners’ as an alternative to ‘anticoagulants’.
– research jargon. Use:
∗ ‘study’ rather than ‘trial’;
∗ ‘people with [condition]’, ‘women’, ‘children’ etc. rather than ‘participants’;
∗ [. . . ]
was manually rewritten to:
• Avoid (or explain)6 long words. For example, use ‘blood thinners’ as an alternative to
‘anticoagulants’.
• Avoid (or explain) research jargon. For example, use ‘study’ rather than ‘trial’; ‘people with
[condition]’, ‘women’, ‘children’ etc. rather than ‘participants’; [. . . ]</p>
        <p>Fewshot The handbook’s Appendices 3 and 4 contain two curated examples of lay summaries to help
writers in their work. We hypothesize that these chosen examples can help a language model reproduce
the style of human-written lay summaries. To match the cochrane-auto dataset and to act as few-shot
examples, the two summaries were copied from the "What did we find?" paragraph up to "How up to
date is this review?" (excluded) and paragraph names were removed.</p>
        <p>In order to validate the curation choice of these examples, two other documents were randomly
chosen from the validation set of cochrane-auto7 (Fewshot-rand) to act as few-shot examples.
Fewshot-coch is expected to perform better than Fewshot-rand given the samples’ curation. The prompt used is
defined in Listing 3.</p>
        <p>Prompting lay In order to understand how the guidance affects the lay summaries, we created a
simple prompt (distinct from the "Naive prompt" approach described in Section 3.1) that serves as a
base for the "Guidelines" and "Fewshot" prompts (cf. Listing 1).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Results on the validation set of cochrane-auto are reported in Tables 2 and 3. Scores are computed
using a single run. During development, the approaches were evaluated with BLEU [12], SARI [13],
BERTScore [14] and LENS [15]. Although the "Naive prompt" approach was also assessed with LLaMA4
(109B), these results are not reported in Table 2 due to space constraints; overall, its performance
remains lower than LLaMA3.3 70B, with SARI, LENS and length ratio of 33.9, 71.1 and 0.7 respectively.</p>
      <p>All approaches (except "Keyword") surpass the "Baseline input" (where the input is left untouched, cf.
Table 3) in terms of LENS (41.7), but none in terms of SARI (46.0).</p>
      <p>Footnotes: 5. training.cochrane.org/handbook/current/chapter-iii-s2-supplementary-material.
6. Some parts were removed, such as ", when this is not possible or desirable,".
7. The document ids are CD013028 and CD013733.</p>
      <p>Comments on LLMs With regard to the SARI score, Mistral-Small almost always achieves the highest
scores. With respect to LENS, its superiority is not as striking, but it still obtains the best score most of
the time.</p>
      <p>LLaMA3.3 with 70B parameters gets the best LENS score in 3 out of 4 runs, but its SARI score never
surpasses Mistral-Small. Its execution is very slow due to the large number of parameters, therefore we
did not run this model in all settings. Gemma2 with 9B parameters obtains a lower SARI overall, but is
comparable to Mistral-Small with 24B parameters on the LENS scores.</p>
      <p>Comments on approaches In general, including guidelines inside the prompt, in particular following
the "Guidelines-styl" approach, benefits the LLMs. Indeed, "Prompting lay" is always improved
by adding guidance (except for "Fewshot-coch" using Mistral-Small, cf. Table 2) and the highest scores
are observed with "Guidelines-styl". However, we observed one exception to this rule: Gemma3 obtains
higher LENS scores using the "Cascade" and "Naive prompt" approaches.</p>
      <p>From the "Fewshot" experiments (cf. Table 2), it seems that the examples chosen by Cochrane are
less effective than random ones.</p>
      <p>Document Length Overall, the "Guidelines" approaches produced much longer documents (≃360
words) than the reference (≃195 words). This can be explained by the fixed limit of 500 words stated in
the prompt (cf. Listing 2), which was computed using the longest document of the validation set;
this criterion implicitly encourages LLMs to produce lengths close to this limit.</p>
      <p>Another set of experiments conducted with the "Guidelines" approaches revisited the word limit with a
new value of 850. The results were consistently better with the original value of 500. This implies that
generating smaller documents would result in higher performance; we could, for example, set the
word limit dynamically according to the length ratio and the length of the input document.</p>
      <p>Run submission The LENS score was used as the primary indicator of performance because it takes
into consideration both the input and the reference, while capturing semantics. SARI, for its part, only
relies on edit distance.</p>
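      <p>The dynamic word-limit idea mentioned above could be sketched as follows; the target ratio of 0.55 and the clamping bounds are illustrative assumptions, not values measured in this work.</p>

```python
def dynamic_word_limit(document, target_ratio=0.55, floor=100, cap=500):
    # Scale the limit with the input length (length ratio times the
    # number of input words), clamped to a reasonable range.
    n_words = len(document.split())
    return max(floor, min(cap, round(n_words * target_ratio)))
```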
      <p>We chose to send runs that obtained a LENS score greater than ≃ 70. We thus submitted eight runs
and a baseline. Table 4 summarizes our submissions and reports the scores on the SimpleText25 test set.</p>
      <p>Due to the long inference time of LLaMA3.3, linked to its large number of parameters, we were not able
to submit all approaches on time for the track. Similarly, the "Guidelines-styl" run with Mistral-Small,
while it achieves the best SARI score overall, was not submitted.</p>
      <p>By comparing the evaluations performed on the validation set and the SimpleText25 test set, we
observed the same ranks of our systems w.r.t. SARI. Our prior findings regarding the benefits of
incorporating guidelines, notably with Mistral-Small, are confirmed with the experiments done on these
new data.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Manual analysis</title>
      <p>In an effort to gain more insight into the produced simplified documents, 79 predictions coming from 4
randomly chosen documents8 were analyzed. To minimize bias, the documents were annotated relying
solely on the document ID, the original document and the simplified document, while the approach used to
produce the document was not shown. The analytical process involved an initial labeling stage where
open-label categories were applied to 20 documents, followed by a second phase where labels were revised or
consolidated to better align with the underlying themes and phenomena. After adjusting
the label set, more documents were annotated.</p>
      <p>Low quality predictions All analyzed documents from the "Backtranslation" approach were too
short and only simplified the start of the document. In the same way, documents from the qwen LLM
were undersized and the end of the simplification seemed truncated. On the contrary, simplifications
from the "Keyword" approach were too long.</p>
      <p>Factuality 7 out of the 79 analyzed simplifications contained fake information, the majority of these
spurious data being mixed facts (in the original document A does B and C does D, but the prediction
states that A does D), and one occurrence of wrong information (with "Guidelines-all", Gemma3
generated "A vena cava filter is a small, portable electronic device that counts the number of steps you
take.", whereas it is actually a filter catching blood clots).</p>
      <p>Footnote: 8. Examples were taken from the cochrane-auto validation set: CD006212, CD009242, CD011768, CD013792.</p>
      <p>Interestingly, 3 predictions contained correctly inferred data from the results described in the
complex document. For example, the document CD009242 compares the age at which children can walk
independently when using treadmills or not and reported the study results as "(MD -4.00, 95% CI -6.96 to
-1.04)", which was correctly simplified as "4 months earlier" by the LLM. This shows that the language
model was able to understand the information and rephrase it accordingly.</p>
      <p>In 15 descriptions, contextual details about the experiments were incorporated, mainly a description
of the studied condition and an explanation of the treatments. For example, the original document
CD013792 about treating miscarriages does not give context about the compared medicines (different
kinds of progesterone), but using "Guidelines-lang", Mistral-Small was able to add context to understand
why these medicines are used: "These medicines are types of progestogens, which are hormones that help
maintain pregnancy."</p>
      <p>Writing Style Some conclusions directly address the reader, inviting them to use the information with caution and
to fact-check, such as "If you are a parent or caregiver, you may want to talk to your child’s doctor or
therapist about [treatment]." or "If you’re pregnant and at risk of miscarriage, talk to your doctor about your
options." Of the 13 predictions containing these conclusions destined to users, 50% come from
the "Guidelines-styl" approach.</p>
      <p>Document structure Every LLM organized its response (in markdown syntax) in at least one of
its predictions, depending on the approach. The output structure ranges from a single bullet-point list to
multiple sections with titles and bold or italic text. More precisely, 90% of the generated text coming from
Gemma3 was structured, 50% from Gemma2, and ≃30% from LLaMA3.2, LLaMA3.3 and Mistral-Small.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper presents the University of Avignon’s participation in the SimpleText CLEF 2025 evaluation on
Task 1.2, using LLMs of various sizes. We tried several approaches, experimenting with backtranslation
using a Machine Translation system and with prompting LLMs to guide them toward the task. The
best results were obtained by giving instructions, drawn from the Cochrane guidelines (Section 3.5), on how to
simplify documents. Future work should investigate more closely the size of the produced summaries and
further analyze the outputs to gain insights into explaining the results.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research was funded, in part, by Bpifrance under the PARTAGES project. We thank Avignon
University’s IUT for allowing us to use their local ollama instance. This work was performed using
HPC resources from GENCI-IDRIS (Grant 2025-A0181016171).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-8b">
      <title>References</title>
      <p>[2] … the Sixteenth International Conference of the CLEF Association (CLEF 2025), Lecture Notes in Computer Science, Springer, 2025.</p>
      <p>[3] J. Bakker, B. Vendeville, L. Ermakova, J. Kamps, Overview of the CLEF 2025 SimpleText Task 1: Simplify Scientific Text, in: [16], 2025.</p>
      <p>[4] B. Vendeville, J. Bakker, L. Ermakova, J. Kamps, Overview of the CLEF 2025 SimpleText Task 2: Identify and Avoid Hallucination, in: [16], 2025.</p>
      <p>[5] J. Bakker, L. Ermakova, Overview of the CLEF 2025 SimpleText Task 1: Simplify Scientific Text, Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025) (2025).</p>
      <p>[6] J. Bakker, J. Kamps, Cochrane-auto: An Aligned Dataset for the Simplification of Biomedical Abstracts, in: M. Shardlow, H. Saggion, F. Alva-Manchego, M. Zampieri, K. North, S. Štajner, R. Stodden (Eds.), Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), Association for Computational Linguistics, Miami, Florida, USA, 2024, pp. 41–51. URL: https://aclanthology.org/2024.tsar-1.5/. doi:10.18653/v1/2024.tsar-1.5.</p>
      <p>[7] J. B. Gruber, M. Weber, rollama: An R package for using generative large language models through Ollama, 2024. URL: https://arxiv.org/abs/2404.07654v1.</p>
      <p>[8] D. Majumdar, Reinforcement Learning in NLP, 2025. URL: tutorials/reinfnlp/reinfnlp.html.</p>
      <p>[9] NLLB Team, M. R. Costa-jussà, J. Cross, O. Çelebi, M. Elbayad, K. Heafield, K. Heffernan, E. Kalbassi, J. Lam, D. Licht, J. Maillard, A. Sun, S. Wang, G. Wenzek, A. Youngblood, B. Akula, L. Barrault, G. M. Gonzalez, P. Hansanti, J. Hoffman, S. Jarrett, K. R. Sadagopan, D. Rowe, S. Spruit, C. Tran, P. Andrews, N. F. Ayan, S. Bhosale, S. Edunov, A. Fan, C. Gao, V. Goswami, F. Guzmán, P. Koehn, A. Mourachko, C. Ropers, S. Saleem, H. Schwenk, J. Wang, No language left behind: Scaling human-centered machine translation, 2022. URL: https://arxiv.org/abs/2207.04672. arXiv:2207.04672.</p>
      <p>[10] F. Boudin, Unsupervised Keyphrase Extraction with Multipartite Graphs, in: Proceedings of NAACL-HLT 2018, Association for Computational Linguistics, 2018. URL: http://arxiv.org/abs/1803.08721.</p>
      <p>[11] N. Pitcher, D. Mitchell, C. Hughes, Template and guidance for writing a Cochrane Plain language summary (2022).</p>
      <p>[12] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, BLEU: a method for automatic evaluation of machine translation, in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL ’02), Association for Computational Linguistics, Philadelphia, Pennsylvania, 2002, pp. 311–318. URL: http://portal.acm.org/citation.cfm?doid=1073083.1073135. doi:10.3115/1073083.1073135.</p>
      <p>[13] W. Xu, C. Napoles, E. Pavlick, Q. Chen, C. Callison-Burch, Optimizing Statistical Machine Translation for Text Simplification, Transactions of the Association for Computational Linguistics 4 (2016) 401–415. URL: https://direct.mit.edu/tacl/article/43364. doi:10.1162/tacl_a_00107.</p>
      <p>[14] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, BERTScore: Evaluating Text Generation with BERT, in: International Conference on Learning Representations (ICLR), 2020.</p>
      <p>[15] M. Maddela, Y. Dou, D. Heineman, W. Xu, LENS: A Learnable Evaluation Metric for Text Simplification, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 16383–16408. URL: https://aclanthology.org/2023.acl-long.905/. doi:10.18653/v1/2023.acl-long.905.</p>
      <p>[16] G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025: Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, 2025.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Used Prompts</title>
      <p>Given the following scientific abstract create a lay summary (500 words maximum).
Abstract:
{{ abstract }}
Listing 1: Prompt used for the Prompting lay approach.</p>
      <p>Given the following scientific abstract create a lay summary (500 words maximum) using the following guidelines.
{% for g in guideline %}
- {{ g }}
{% endfor %}
Abstract:
{{ abstract }}
Listing 2: Prompt used for the Guideline approach.</p>
      <p>Given the following scientific abstract create a lay summary (500 words maximum). Here are some examples of lay summaries.
{% for e in examples %}
Example:
{{ e }}
{% endfor %}
Abstract:
{{ abstract }}
Listing 3: Prompt used for the Few-shot approach.</p>
      <p>queryLLama33 &lt;- make_query(
  text = cochraneauto_docs_val$complex,
  prompt = "Simplify this text in english",
  system = "Please do NOT use a first introductory sentence, only the simplified text’",
  prefix = "Here is the text: "
)
resp_llama33_val = query(queryLLama33, screen = FALSE, output = "text")
Listing 4: Prompt used for the Naive prompt approach.</p>
      <p>I’m a highschool student interested in healthcare. I want to understand some difficult
words in a scientific abstract.</p>
      <p>After the --- is an abstract and a list of {{ n }} words I find difficult to understand.
Define the words and give me a simpler version that I can understand (it’s okay if some
meaning is lost). The simpler version should be a direct replacement for the words
in the abstract.</p>
      <p>The answer should be formatted as a yaml list of objects with 3 attributes: the word,
the definition and the simpler version of the word.</p>
      <p>The answer should only contain yaml no other text or comment. The output must be
readable by a yaml parser.
- word: "WORD 1"
definition: "definition of WORD 1"
simple: "simpler WORD 1"
- word: "WORD 2"
definition: "definition of WORD 2"
simple: "simpler WORD 2"
- ...
---
**Abstract** : {{ abstract }}
**Words to explain** :
{% for t in terms %}
- {{ t }}
{% endfor %}
Listing 5: Prompt used for the Keyword Simplification approach.
- word: "AMD"
definition: "Age-related macular degeneration - a condition that affects the central
part of the retina and can cause vision loss."
simple: "macular degeneration"
- word: "Evidence"
definition: "Scientific proof or data that supports a claim or idea."
simple: "proof"
- word: "Fatty Acid Supplements"
definition: "Supplements containing substances like omega-3 that are made of fats."
simple: "fish oil"
- word: "Low Risk"
definition: "A situation where something is unlikely to cause harm or problems."
simple: "safe"
- word: "Omega"
definition: "A chemical element (Omega) but in this context, it refers to a specific
type of fatty acid."
simple: "fish oil"
- word: "People"
definition: "Individuals or participants in a study."
simple: "patients"
- word: "Placebo"
definition: "An inactive substance or treatment given to a control group in a study,
used to see if the treatment itself has an effect."
simple: "dummy pill"
- word: "Progression"
definition: "The process of something developing or worsening over time."
simple: "worsening"
- word: "Trials"
definition: "Research studies, particularly clinical trials."
simple: "studies"
- word: "USA"
definition: "United States of America"
simple: "America"
Listing 6: Simplified keywords from Gemma2 for document CD010015.</p>
    </sec>
    <sec id="sec-10">
      <title>B. Cochrane Writing Advice</title>
      <p>This section compiles the tips available in Cochrane’s "Template and guidance for writing a Cochrane
Plain language summary". The tips were manually edited to form full sentences.</p>
      <p>Language advice
• Use everyday language. For example, refer to ‘people’ instead of ‘study participants’.
• Avoid (or explain) long words. For example, use ‘blood thinners’ as an alternative to
‘anticoagulants’.
• Avoid (or explain) research jargon. For example, use ‘study’ rather than ‘trial’; ‘people with
[condition]’, ‘women’, ‘children’ etc. rather than ‘participants’; the name of the intervention instead
of ‘intervention’; the name of the control or comparison instead of ‘control’ or ‘comparison’; the
name of the outcome instead of ‘outcome’.
• Avoid (or explain) words or phrases with dual or nuanced meanings. For example, use ‘medicines’
instead of ‘drugs’. ‘Significant’ means ‘important’ for a lay reader.
• Explain ‘common’ medical words. For example: ‘acute condition’: a condition or state that
develops suddenly and lasts a short time; ‘chronic condition’: a condition or state that lasts for a
long time.
• Explain technical medical terms. Plain language does not always mean ‘lay language’. Your
reader may know the topic via the technical term – especially if they are a patient or carer, so
it might be best to include the technical term and explain it. For example, to explain the action
of anticoagulants, you could write: ‘Anticoagulants are medicines that stop harmful blood clots
forming. However, these medicines may cause unwanted effects such as bleeding.’ Or you could
write the term in plain language followed by the technical term in brackets. For example, ‘blood
thinners (anticoagulants)’.
• Avoid acronyms and abbreviations. If you cannot avoid them, make sure you define them when
you first mention them. For example, ‘nicotine replacement therapy (NRT)’. Use phrases like
‘for example’, ‘such as’, ‘in other words’, ‘and so on’ instead of ‘e.g.’, ‘i.e.’ or ‘etc.’, as they are not
always understood if you are writing for a wide audience.
• Write for an international audience. Avoid regional words or terms; for example, use ‘hospital
emergency care’ instead of ‘Accident &amp; Emergency (A&amp;E)’ (UK) or ‘Emergency Room (ER)’ (USA).
Style advice
• Keep paragraphs and sentences short, but vary your sentence length occasionally to keep the
readers’ attention. Aim for an average of 20 words in a sentence. Break up longer sentences into
shorter ones. For example, instead of ‘Most people who smoke want to stop, however many find
it difficult to do so, even though they may use medicines that are designed to help them stop’,
you could write ‘Most people who smoke want to stop, but many find it difficult. People who
smoke may use medicines to help them stop.’.
• Use the active voice. For example, write ‘We compared and summarized the results of the studies’
instead of ‘The results of the studies were compared and summarized’.
• Use pronouns. Write in the first-person plural. For example, use ‘we assessed’ instead of ‘the
review authors assessed’. Address your reader using the second-person pronoun ‘you’. For
example, write ‘A pedometer is a small, portable electronic device that counts the number of
steps you take.’.
• Use verbs. For example, say ‘the students investigated’ not ‘the students conducted an
investigation’, or ‘we analyzed the data’ not ‘we carried out an analysis of the data’.
• Write numbers as numerals (1, 2, 3. . . ) rather than words. However, avoid starting a sentence
with a numeral. If necessary, rewrite the sentence. For example, write ‘The studies included 3260
people’ instead of ‘Three-thousand, two-hundred and sixty people took part in the studies’.
• Be concise. A Plain language summary can be up to 850 words long, but you do not have to
fill the word limit. You should aim to keep it as short as possible while still including the most
important information.
• Replace ‘wordy’ phrases with shorter alternatives: use ‘during’ instead of ‘during the course of’;
use ‘often’, instead of ‘it was often the case that’; use ‘some’ or ‘many’, instead of ‘a number of’;
and use ‘because’ instead of ‘due to the fact that’.</p>
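<p>As a minimal sketch of how such advice can guide an LLM, the tips above can be folded into a prompt as a bulleted list. The template wording, the helper name and the example bullets are illustrative assumptions, not the exact prompts used in our runs.</p>

```python
# Illustrative only: the prompt template and the advice selection below are
# our own, not the exact prompts used in the experiments.
STYLE_ADVICE = [
    "Use everyday language; say 'people' instead of 'study participants'.",
    "Avoid or explain long words; use 'blood thinners' for 'anticoagulants'.",
    "Keep sentences short, around 20 words on average, and use the active voice.",
]

def build_prompt(abstract: str, advice: list[str]) -> str:
    """Assemble a simplification prompt from an abstract and guideline bullets."""
    bullets = "\n".join(f"- {tip}" for tip in advice)
    return (
        "Simplify the following scientific abstract for lay readers.\n"
        "Follow these guidelines:\n"
        f"{bullets}\n\n"
        f"Abstract:\n{abstract}"
    )
```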
    </sec>
    <sec id="sec-11">
      <title>C. Hyperparameters</title>
<p>The LLMs were used with Ollama’s default parameters, or the model’s own defaults when provided. Unless stated
otherwise, for all LLMs used in this work the temperature, top-k and top-p are set, respectively, to 0.8, 40
and 0.90. For Gemma3, the same parameters are set to 1.0, 64 and 0.95.</p>
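<p>For reproducibility, these settings can be expressed as the options dictionary passed to Ollama’s generation API. The helper below is our own sketch; only the parameter names (temperature, top_k, top_p) come from Ollama.</p>

```python
def sampling_options(model: str) -> dict:
    """Sampling parameters used in this work, keyed by model family.

    Gemma3 ships its own defaults; everything else uses Ollama's defaults.
    """
    if model.startswith("gemma3"):
        return {"temperature": 1.0, "top_k": 64, "top_p": 0.95}
    return {"temperature": 0.8, "top_k": 40, "top_p": 0.90}

# These options would be sent as the "options" field of a request to
# Ollama's /api/generate endpoint, alongside the model name and prompt.
```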
      <sec id="sec-11-1">
        <title>Guidelines - all, Mistral-Small, SARI 34.8, LENS 47.7, length ratio 4.9</title>
      </sec>
      <sec id="sec-11-2">
        <p>We reviewed two studies that looked at different treatments for people with a condition called primary
sclerosing cholangitis (PSC). PSC is a chronic condition where the bile ducts in the liver become inflamed and
scarred, making it hard for bile to drain properly. This can lead to serious health problems.</p>
        <p>The first study compared two treatments: hydrocortisone and saline. Hydrocortisone is a type of steroid medicine
that reduces inflammation. Saline is a saltwater solution often used as a comparison treatment in studies. The
study involved 17 people with PSC. Unfortunately, the people who received hydrocortisone had more side effects
than those who got saline. These side effects included pancreatitis (inflammation of the pancreas), cholangitis
with septicaemia (infection and inflammation of the bile ducts that can spread to the bloodstream), paranoid
ideas, and fluid retention. The study was stopped early because hydrocortisone did not improve the condition of
the bile ducts and caused these serious side effects.</p>
      </sec>
      <sec id="sec-11-3">
        <p>The second study compared two different steroid medicines: budesonide and prednisone. Both are used to reduce
inflammation but work in slightly different ways. This study included 18 people with PSC. After treatment, those
who received prednisone had higher levels of bilirubin in their blood than those who got budesonide. Bilirubin is
a waste product made by the liver that can build up if the bile ducts are blocked. Higher levels of bilirubin can
indicate that the liver is not working properly.</p>
      </sec>
      <sec id="sec-11-4">
        <p>However, neither study found any other significant effects on how people felt or on other measures of their
health. This means we still do not have clear evidence to support or refute the use of steroid medicines taken
by mouth for people with PSC. The studies also suggest that putting corticosteroids directly into the bile ducts
through a tube in the nose (intrabiliary application) may cause severe side effects.</p>
      </sec>
      <sec id="sec-11-5">
        <p>In summary, more research is needed to find effective and safe treatments for people with primary sclerosing
cholangitis.</p>
      </sec>
      <sec id="sec-11-6">
        <title>Guidelines - lang, Mistral-Small, SARI 37.3, LENS 71.9, length ratio 4.7</title>
      </sec>
      <sec id="sec-11-7">
        <p>Two studies looked at different treatments for people with a condition called primary sclerosing cholangitis
(PSC). This is a chronic liver disease where the bile ducts become inflamed and scarred, affecting how bile flows
from the liver. The first study compared two treatments: hydrocortisone (a type of steroid) mixed with saline
solution versus just saline solution alone. The treatment was given through a process called biliary lavage, which
involves flushing out the bile ducts.</p>
      </sec>
      <sec id="sec-11-8">
        <p>The study included 17 people and found that those who received hydrocortisone had more side effects such as
pancreatitis (inflammation of the pancreas), cholangitis with septicaemia (infection in the bile ducts spreading to
the bloodstream), paranoid ideas, and fluid retention. These side effects were serious enough to stop the study
early. There was no improvement in the condition of the bile ducts seen through imaging tests.</p>
      </sec>
      <sec id="sec-11-9">
        <p>The second study compared two different steroids: budesonide versus prednisone. This study included 18 people.</p>
      </sec>
      <sec id="sec-11-10">
        <p>The results showed that people who took prednisone had higher levels of bilirubin (a waste product made by the
liver) in their blood after treatment compared to those who took budesonide. Bilirubin is a substance produced
by the body and processed by the liver, so high levels can indicate problems with liver function.</p>
      </sec>
      <sec id="sec-11-11">
        <p>However, there were no other significant differences in how well people felt or in other measures of health
between the two groups. This means that neither steroid showed clear benefits over the other for treating PSC.</p>
      </sec>
      <sec id="sec-11-12">
        <p>Overall, these studies do not provide strong evidence to support or reject the use of steroids taken by mouth
(peroral glucocorticosteroids) for people with primary sclerosing cholangitis. The study also suggests that using
corticosteroids directly in the bile ducts through a nasobiliary tube can cause severe side effects.</p>
      </sec>
      <sec id="sec-11-13">
        <p>In summary, more research is needed to find effective and safe treatments for people with PSC.</p>
      </sec>
      <sec id="sec-11-14">
        <title>Guidelines - styl, Mistral-Small, SARI 37.2, LENS 75.1, length ratio 3.1</title>
      </sec>
      <sec id="sec-11-15">
        <p>We reviewed two clinical trials to understand the effects of different treatments on people with a condition
called primary sclerosing cholangitis (PSC). This condition affects the bile ducts in the liver.</p>
      </sec>
      <sec id="sec-11-16">
        <p>The first trial compared hydrocortisone and saline. We found that hydrocortisone increased adverse events, such
as pancreatitis, infections, paranoid ideas, and fluid retention. The study had to stop early because of these
severe side effects. Hydrocortisone did not improve bile duct imaging results either.</p>
      </sec>
      <sec id="sec-11-17">
        <p>The second trial compared budesonide and prednisone. We found that people treated with prednisone had higher
levels of bilirubin in their blood. Bilirubin is a waste product made by the liver. Higher levels can indicate liver
problems. However, we did not find any other significant differences between these two treatments.</p>
      </sec>
      <sec id="sec-11-18">
        <p>Overall, our review does not provide clear evidence to support or refute the use of oral glucocorticosteroids for
people with PSC. We also found that applying corticosteroids directly into the bile ducts via a nasobiliary tube
can cause severe side effects.</p>
      </sec>
      <sec id="sec-11-19">
        <p>You should talk to your healthcare provider about the best treatment options for you if you have PSC. More
research is needed to find effective and safe treatments for this condition.</p>
      </sec>
    </sec>
  </body>
  <back>
  </back>
</article>