<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Zaragoza, Spain
* Corresponding author.
† These authors contributed equally.
lplaza@lsi.uned.es (L. Plaza); lurdes@lsi.uned.es (L. Araujo); flopez@lsi.uned.es (F. López-Ostenero); juaner@lsi.uned.es (J. Martinez-Romo)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Prompting Large Language Models for Spanish Easy-to-Read Text Generation: UNED-INEDA at CLEARS 2025</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Laura Plaza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lourdes Araujo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fernando López-Ostenero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Martinez-Romo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Nacional de Educación a Distancia, Calle Juan del Rosal</institution>
          ,
          <addr-line>16, 28040, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>This paper describes the participation of the UNED-INEDA team in Subtask 2 of the CLEARS shared task at IberLEF 2025, which focuses on the automatic adaptation of texts into the Easy-to-Read (E2R) format in Spanish, following the UNE 153101:2018 EX guidelines. We explore three prompting strategies (zero-shot, random few-shot, and similarity-based few-shot) applied to three instruction-tuned language models: Dolphin Mistral, Starling-LM, and LLaMA 3.2-1B-Instruct. Our official submission employs Dolphin Mistral in a zero-shot setting, which obtained competitive results, ranking fourth in both semantic similarity and readability according to the official evaluation. These results highlight the potential of large language models and prompt-based strategies for controlled text simplification in Spanish. We discuss the trade-offs between semantic fidelity and accessibility, and outline future work aimed at improving rule compliance and linguistic simplification through adaptive prompting and fine-tuning.</p>
      </abstract>
      <kwd-group>
        <kwd>Text simplification</kwd>
        <kwd>Easy-to-read</kwd>
        <kwd>Text Accessibility</kwd>
        <kwd>UNE 153101:2018 EX</kwd>
        <kwd>Prompt Engineering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Ensuring that written content is understandable to all individuals is a key factor of an inclusive
society. Yet, a substantial proportion of everyday texts—ranging from public service announcements to
news articles and administrative documents—remain inaccessible to people with cognitive or reading
difficulties. This challenge has generated interest in developing strategies that aim to improve text
comprehensibility and ensure equitable access to information.</p>
      <p>
        Movements such as Plain Language (PL)[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] have also emerged in response to the growing recognition
that complex and intricate language can exclude large segments of the population from understanding
essential information. The PL movement advocates for the use of clear, concise, and well-structured
language that allows readers to easily find, understand, and use the information they need [2]. Initially
driven by legal and governmental bodies seeking to improve transparency and public trust, PL has since
expanded into fields such as healthcare, education, and finance [3, 4].
      </p>
      <p>Another strategy to enhance the accessibility of written content is Easy-to-Read (E2R). E2R is a
structured approach designed to support readers with conditions such as dyslexia, autism spectrum
disorders, intellectual disabilities, or acquired brain injuries. E2R promotes clear, concise, and
well-organized language, guided by specific formatting and content rules, such as those outlined in the UNE
153101:2018 standard for the Spanish language. These adaptations are intended to enhance readability
without compromising the essential meaning of the original content, thus enabling individuals with
cognitive challenges to fully understand written information.</p>
      <p>While both Plain Language (PL) and Easy-to-Read (E2R) approaches aim to improve the accessibility
of written information, they differ in scope, audience, and degree of simplification. Plain Language
is intended for a broad audience, including people with lower literacy levels or limited language
proficiency, and promotes clarity, conciseness, and well-structured information. In contrast,
Easy-to-Read is designed specifically for individuals with intellectual or cognitive disabilities, such as autism
or acquired brain injuries, and adheres to stricter guidelines concerning sentence length, vocabulary,
syntax, and formatting—such as those defined in the UNE 153101:2018 standard. While all E2R texts
should comply with the principles of Plain Language, the reverse does not hold true: not all PL texts
meet the cognitive accessibility criteria required for E2R. E2R can be seen as a highly specialized and
standardized form of Plain Language, and as such, the automation of E2R poses unique challenges that
go beyond standard text simplification.</p>
      <p>In this paper, we describe our participation in the CLEARS Challenge for Plain Language and
Easy-to-Read Adaptation for Spanish texts; in particular, Subtask 2: Adaptation of texts to E2R. Traditionally,
converting standard texts into E2R versions has relied heavily on human expertise—linguists, educators,
and accessibility specialists—who apply the guidelines manually [5]. However, given the volume of
content generated daily and the growing demand for accessible communication, this manual approach
is increasingly unsustainable. Advances in Natural Language Processing (NLP) and Machine Learning
(ML) offer effective means to automate this transformation, helping save time and costs while making
the process more consistent and scalable [6].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Prior research on accessible communication has focused on defining and applying guidelines for making
texts understandable to people with cognitive or reading difficulties. In the Spanish-speaking context,
the UNE 153101:2018 standard serves as a cornerstone for Easy-to-Read (E2R) adaptation, specifying
rules related to sentence structure, vocabulary, visual formatting, and content organization [7]. These
rules are commonly applied manually by trained experts such as educators, linguists, and accessibility
professionals. Studies have shown that such adaptations can significantly improve comprehension and
usability of written content among users with intellectual disabilities, autism, or acquired brain injuries
[8].</p>
      <p>Early attempts at automating text simplification focused on rule-based systems and syntactic
transformations, such as sentence splitting, clause reordering, and removal of subordinate structures [9].
These methods relied heavily on hand-crafted grammars and often suffered from limited scalability
and generalizability. Lexical simplification, which involves identifying complex words and replacing
them with simpler alternatives, was typically addressed using frequency-based word lists or manually
curated dictionaries [10].</p>
      <p>With the advent of supervised learning, statistical and machine learning methods became increasingly
prominent. Parallel corpora of original and simplified texts (e.g., Simple Wikipedia) enabled the
development of phrase-based machine translation models for simplification tasks [11]. These models
improved fluency and coverage but struggled to preserve meaning.</p>
      <p>More recently, deep learning—particularly encoder-decoder architectures based on recurrent neural
networks and transformers—has revolutionized the field. Models such as T5, BART, and GPT have been
fine-tuned on simplification datasets to generate fluent and coherent simplified texts [12, 13]. Some
approaches also incorporate control tokens or complexity constraints to steer the generation process,
allowing for different levels of simplification.</p>
      <p>Despite these advances, current systems often fall short of meeting the strict structural and stylistic
requirements of E2R guidelines. For instance, sentence independence, the avoidance of metaphors or
idioms, and the use of explicit references are rarely addressed explicitly in existing models. Moreover,
most simplification datasets are not aligned with the cognitive accessibility criteria needed for E2R
users.</p>
      <p>In the Spanish language, the availability of high-quality corpora and NLP tools tailored to E2R remains
limited. Recent initiatives have begun to explore annotation schemes and simplification pipelines that
conform to UNE 153101:2018, but publicly available benchmarks and evaluation methodologies are still
scarce. These gaps hinder the development of robust, trustworthy, and equitable systems for automated
E2R generation. To help address this challenge, the CLEARS shared task [14] at IberLEF 2025 [15] has
been launched to promote research in this area by providing annotated datasets, well-defined evaluation
protocols, and a competitive framework that fosters innovation and comparability across systems.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Definition and Dataset</title>
      <p>Our team participates in Subtask 2, titled “Adaptation of Texts to Easy-to-Read (E2R)”. According to the
task description provided by the organizers, participants are expected to adapt the supplied texts into
the E2R format, following the relevant UNE 153101 EX guidelines and applying accessibility principles
to enhance comprehension for target audiences with cognitive or reading difficulties. To this end,
NLP techniques—including algorithmic and machine learning approaches—may be used to generate
simplified versions that align with E2R standards. While adherence to all UNE rules is encouraged,
full compliance is not mandatory. Evaluation considers the number of UNE-compliant rules effectively
applied in the generated output. An example from the task website is shown below.</p>
      <p>Example of E2R Adaptation
The original text “Del 17 al 23 de febrero, Madrid se convertirá en la capital del atletismo con tres citas
destacadas.”
is adapted into the following E2R version:
Madrid será la ciudad del atletismo
en los próximos días.
3 pruebas importantes de atletismo
se harán en Madrid del 17 al 23 de febrero.</p>
      <p>The dataset used for this task consists of 3,000 news articles from various municipalities in the
province of Alicante (Spain), covering topics such as sports, culture, leisure, and local festivities. It was
developed and validated by a team of domain experts in text accessibility. Each article in the corpus has
two adapted versions: (a) a Plain Language (PL) version, referred to as the “facilitated version,” which
applies basic adaptation criteria but is more flexible in terms of layout and structure (related to Subtask
1); and (b) an Easy-to-Read (E2R) version, which strictly adheres to the UNE 153101:2018 EX guidelines,
including both linguistic and formatting requirements (related to Subtask 2). A detailed description of
the dataset can be found in [16].</p>
      <p>The dataset is split into 70% (2,100 texts) for training and 30% (900 texts) for testing. Spanish was
chosen as the target language because, as noted by Instituto Cervantes, despite being the world’s fourth
most spoken language and the second most widely spoken mother tongue, there is a shortage of corpora
specifically designed for PL and E2R adaptation. Addressing this gap is essential for developing NLP
tools capable of effectively improving textual accessibility and comprehension. The dataset is publicly
available at [17].</p>
    </sec>
    <sec id="sec-4">
      <title>4. Approaches</title>
      <p>We experimented with three prompting strategies, each issued to three different Large Language Models (LLMs).
1. Instruction-based prompting (zero-shot): The prompt includes a description of the task, along
with a summary of the UNE 153101:2018 guidelines for Easy-to-Read (E2R) texts, as described by
the AENOR Spanish agency for standardization.
2. Instruction with randomly selected examples (few-shot): In addition to the task description
and UNE guidelines, the prompt incorporates a set of three randomly chosen training examples.
These examples are distinct from those used in the test set.
3. Instruction with similarity-based examples (few-shot): Similar to the previous strategy, but
instead of selecting the examples randomly, they are retrieved based on their semantic similarity
to the test instance being processed.</p>
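      <p>The three strategies can be sketched as follows. This is a minimal illustration, not the prompts actually used in our runs: the task description and guideline summary are placeholder strings, the prompt layout and all function names are hypothetical, and word_overlap is only a toy stand-in for the semantic similarity measure used to retrieve examples.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (original text, E2R adaptation)

# Placeholder strings (assumption): the real task description and the
# summary of the UNE 153101:2018 EX guidelines are not reproduced here.
TASK_DESCRIPTION = "Adapta el siguiente texto a Lectura Fácil."
GUIDELINES_SUMMARY = "(resumen de las pautas UNE 153101:2018 EX)"


def select_random(train: List[Example], k: int, seed: int = 0) -> List[Example]:
    """Strategy 2: pick k training examples at random."""
    return random.Random(seed).sample(train, k)


def select_similar(
    train: List[Example],
    source: str,
    k: int,
    similarity: Callable[[str, str], float],
) -> List[Example]:
    """Strategy 3: pick the k examples whose original text is closest to the source."""
    ranked = sorted(train, key=lambda ex: similarity(source, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(source: str, examples: Sequence[Example] = ()) -> str:
    """Strategy 1 when examples is empty; strategies 2 and 3 otherwise."""
    parts = [TASK_DESCRIPTION, GUIDELINES_SUMMARY]
    for original, adapted in examples:
        parts.append(f"Texto original: {original}\nTexto adaptado: {adapted}")
    parts.append(f"Texto original: {source}\nTexto adaptado:")
    return "\n\n".join(parts)


def word_overlap(a: str, b: str) -> float:
    """Toy similarity (Jaccard overlap of word sets), standing in for embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0
```
]]></preformat>
      <p>Keeping example selection separate from prompt assembly means the same build_prompt call serves all three strategies, so only the list of examples changes between runs.</p>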
      <p>To evaluate these prompting strategies, we issued them to three large language models, each with
distinct characteristics:
• Dolphin Mistral: A fine-tuned variant from the Mistral model family, optimized for
instruction-following tasks. Dolphin Mistral is designed to generate coherent and contextually appropriate
responses, particularly in tasks requiring structured reasoning and linguistic control.
• Starling-LM: A large language model trained using Reinforcement Learning from Human
Feedback (RLHF). Starling-LM is tailored for open-ended instruction-based tasks, prioritizing
factual accuracy, helpfulness, and safe outputs, which makes it especially suitable for
accessibility-oriented applications.
• LLaMA 3.2-1B-Instruct: A lightweight instruct-tuned model from Meta’s LLaMA 3 family,
comprising approximately 1 billion parameters. It offers an efficient trade-off between performance
and computational cost, making it well suited to scenarios with limited resources.</p>
      <p>All experiments were conducted using the Ollama [18] framework to run the selected
instruction-tuned models locally. Ollama enables streamlined model deployment and inference, providing a practical
solution for executing large language models with minimal setup while ensuring reproducibility and
data privacy.</p>
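      <p>As an illustration of this setup, the sketch below sends a single prompt to a locally running Ollama server through its HTTP generate endpoint. It assumes the default local address and non-streaming responses; the helper names are hypothetical, and the model tag passed in (for example "dolphin-mistral") is an assumption that should be checked against the models actually pulled on the machine.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json
import urllib.request

# Default address of a local Ollama server (assumption: unchanged defaults).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming generation request for the given model tag."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```
]]></preformat>
      <p>Because inference runs on the local machine, the texts to be adapted never leave it, which is the data privacy property mentioned above.</p>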
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation and Results</title>
      <p>This section presents the evaluation methodology and experimental results obtained using both the
training set provided by the task organizers and the official test set evaluated by the competition
organizers. The goal is to assess the performance of the three instruction-tuned language models under
the different prompting strategies described in Section 4.</p>
      <sec id="sec-5-1">
        <title>5.1. Evaluation Metrics</title>
        <p>To quantify system performance, we employ two complementary evaluation metrics. The first is a
semantic similarity measure that captures both surface-level and deeper contextual correspondence
between the automatically simplified texts and the reference texts created by experts. The second metric
focuses on readability, with particular attention to Spanish-language texts, to ensure alignment with
the Easy-to-Read (E2R) guidelines. These metrics are detailed below.</p>
        <p>• Cosine similarity / Cosine similarity using embeddings: This is a composite metric defined
as the average of two cosine similarity measures. The first computes cosine similarity between the
references and the automatically simplified texts based on term frequency vectors, thus reflecting
lexical overlap. The second computes cosine similarity between sentence embeddings obtained
from a pre-trained language model, capturing deeper semantic relationships beyond surface-level
word matches.
• The Fernández Huerta Readability Index: This readability metric, designed specifically for
Spanish texts, is an adaptation of the Flesch Reading Ease formula for English. It scores a text
from its average sentence length (in words) and its average word length (in syllables). A
higher score indicates a text that is easier to understand.</p>
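        <p>Both metrics can be approximated in a few lines. The sketch below is not the organizers' evaluation script: the tokenizer, the vowel-group syllable counter, and all function names are simplifying assumptions, the embeddings are expected to be supplied by whatever sentence encoder is available, and the Fernández Huerta coefficients follow a commonly cited form of the formula (206.84 minus 60 times syllables per word minus 1.02 times words per sentence).</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import re
from collections import Counter
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tokens(text: str) -> List[str]:
    """Lowercased alphabetic tokens (digits and punctuation are dropped)."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def lexical_cosine(reference: str, candidate: str) -> float:
    """Cosine similarity over term-frequency vectors (lexical overlap)."""
    tf_ref, tf_cand = Counter(tokens(reference)), Counter(tokens(candidate))
    vocab = sorted(set(tf_ref) | set(tf_cand))
    return cosine([tf_ref[w] for w in vocab], [tf_cand[w] for w in vocab])


def composite_similarity(
    reference: str,
    candidate: str,
    ref_emb: Sequence[float],
    cand_emb: Sequence[float],
) -> float:
    """Average of the lexical cosine and the embedding cosine."""
    return (lexical_cosine(reference, candidate) + cosine(ref_emb, cand_emb)) / 2


def count_syllables(word: str) -> int:
    """Rough estimate: one syllable per run of vowels (ignores hiatus)."""
    return max(1, len(re.findall(r"[aeiouáéíóúü]+", word.lower())))


def fernandez_huerta(text: str) -> float:
    """Readability score; higher values mean easier text."""
    words = tokens(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 206.84 - 60.0 * syllables_per_word - 1.02 * words_per_sentence
```
]]></preformat>
        <p>Because the index depends only on sentence and word length, splitting long sentences and preferring short words raises the score even when the meaning is unchanged, which is why it is reported alongside the similarity metric rather than on its own.</p>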
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Results</title>
        <p>Tables 1 and 2 present the results obtained using the training set provided by the organizers, under the
three prompting strategies described in Section 4. In the case of the few-shot experiments (Prompts 2
and 3), the examples included in the prompts were excluded from the evaluation to ensure fairness.</p>
        <p>In terms of semantic similarity (see Table 1), measured as the mean of cosine similarity between terms
and embeddings, both Dolphin Mistral and Starling-LM achieve the highest scores (0.80) under Prompt
1 (zero-shot setting). These results remain relatively stable under Prompt 2 (random few-shot), but
show a slight drop with Prompt 3 (similarity-based examples), particularly for Llama-3.2-1B-Instruct,
whose performance decreases more markedly across prompts (from 0.77 to 0.70).</p>
        <p>Regarding the Fernández-Huerta readability index (see Table 2), the results are consistently high
across all models and prompts, indicating that all systems are capable of generating lexically simple
texts. Interestingly, Dolphin Mistral achieves its best score with Prompt 2 (82.33), suggesting that
providing examples may help guide generation in this model. In contrast, Starling-LM shows a slight
drop in Prompt 2, while Llama-3.2-1B-Instruct remains relatively stable but below the other models.</p>
        <p>Overall, these results suggest that zero-shot prompting yields competitive performance, particularly
in terms of semantic fidelity, while the impact of few-shot strategies is model-dependent. Larger or
more instruction-tuned models may benefit more from structured examples, although careful selection
appears to be critical.</p>
        <p>The official evaluation results for Subtask 2 are summarized in Tables 3 and 4, ranking participating
teams according to two complementary criteria: semantic similarity and readability.</p>
        <p>In terms of semantic adequacy, as measured by the average of cosine similarity metrics (lexical and
embedding-based), the system submitted by UNED-INEDA ranks fourth, with a score of 0.68. Our
system used the Dolphin Mistral model with Prompt 1, corresponding to the zero-shot setting. While
the top-performing team, NIL-UCM, achieved a slightly higher score of 0.72, the margin separating
the top four systems is relatively small, suggesting competitive performance across the board. The
results also confirm the capacity of zero-shot prompting to achieve strong semantic alignment between
original and simplified texts without relying on examples.</p>
        <p>Regarding readability, assessed using the Fernández-Huerta index, our system again ranks fourth,
with a score of 72.39. This indicates that the texts generated by our model were moderately more
readable than those of NIL-UCM (69.40), but did not reach the simplicity achieved by Vicomtech (85.44)
or UR (85.12).</p>
        <p>Taken together, the results suggest that our approach offers a solid balance between semantic
preservation and readability, particularly considering the use of a zero-shot strategy. While there is
room for improvement in terms of enhancing readability, the performance of our system confirms the
viability of instruction-based prompting with powerful language models such as Dolphin Mistral in the
context of E2R text generation.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>This paper has presented the participation of the UNED-INEDA team in the CLEARS shared task at
IberLEF 2025, specifically in Subtask 2: the adaptation of Spanish texts to the Easy-to-Read (E2R) format.
Our approach explored three prompting strategies applied to three instruction-tuned language models,
with a focus on evaluating their performance in terms of semantic preservation and readability.</p>
      <p>Among the different strategies, the officially submitted system employed the Dolphin Mistral model
with a zero-shot prompt that provided the model with the definition of the task to accomplish and the
specifications of the UNE 153101:2018 EX standard. This configuration delivered competitive results:
our system ranked fourth in both semantic similarity and the Fernández Huerta index, not far from the
top-ranked system (particularly in the similarity metric).</p>
      <p>Nonetheless, the gap in readability scores between our system and top-performing teams suggests
opportunities for improvement. Future work will focus on integrating hybrid prompting strategies,
leveraging few-shot examples selected for both lexical and syntactic similarity. Additionally, we plan
to explore the use of constraint-based generation and fine-tuning approaches aligned with the UNE
153101:2018 EX standard to enhance compliance with E2R guidelines.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>This work has been partially supported by the 2024 UNED project for the teaching innovation group
GID2017-1 and by the Spanish Ministry of Science, Innovation and Universities (project
FairTransNLP PID2021-124361OB-C32) funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU A
way of making Europe.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT in order to check grammar and spelling.</p>
      <p>[2] R. Petelin, Considering plain language: issues and initiatives, Corporate Communications: An International Journal 15 (2010) 205–216.
[3] S. Stableford, W. Mettger, Plain language: A strategic response to the health literacy challenge, 2007.
[4] A. Rossetti, P. Cadwell, S. O’Brien, “the terms and conditions came back to bite”:, in: HCI International 2020 – Late Breaking Papers: Universal Access and Inclusive Design, 2020, pp. 699–711.
[5] M. Nomura, G. S. Nielsen, B. Tronbacke, Guidelines for Easy-to-Read Materials, IFLA Professional Reports 120, International Federation of Library Associations and Institutions (IFLA), Library Services to People with Special Needs Section, 2010. URL: https://repository.ifla.org/handle/123456789/31, revision on behalf of the IFLA/Library Services to People with Special Needs Section.
[6] Valentina Ospina-Henao, Sebastián López Flórez, V. Juan M. Núñez, Óscar Loureda Lamas, Fernando De la Prieta, Generative AI: Simplifying text for cognitive impairments and non-native speakers, in: Methodologies and Intelligent Systems for Technology Enhanced Learning, 14th International Conference, 2024, pp. 33–44.
[7] UNE 153101:2018 EX. Lectura Fácil. Pautas y recomendaciones para la elaboración de documentos, Norma UNE 153101:2018 EX, Asociación Española de Normalización (UNE), Madrid, España, 2018. URL: https://www.une.org/encuentra-tu-norma/busca-tu-norma/norma/?c=N0060036, norma experimental que especifica las pautas y recomendaciones para la adaptación, creación y validación de documentos en Lectura Fácil.
[8] T. I. Rebekah Joy Sutherland, The evidence for easy-read for people with intellectual disabilities: A systematic literature review, Journal of Policy and Practice in Intellectual Disabilities 13 (2016) 297–310.
[9] R. Chandrasekar, B. Srinivas, Automatic induction of rules for text simplification, Knowledge-Based Systems 10 (1997) 183–190.
[10] S. S. Al-Thanyyan, A. M. Azmi, Automated text simplification: A survey, ACM Comput. Surv. 54 (2021).
[11] Z. Zhu, D. Bernhard, I. Gurevych, A monolingual tree-based translation model for sentence simplification, in: Proceedings of the 23rd International Conference on Computational Linguistics, 2010, pp. 1353–1361.
[12] X. Zhang, M. Lapata, Sentence simplification with deep reinforcement learning, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 584–594.
[13] T. Wang, P. Chen, J. Rochford, J. Qiang, Text simplification using neural machine translation, in: AAAI’16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 4270–4271.
[14] B. Botella-Gil, I. Espinosa-Zaragoza, A. Bonet-Jover, M. Madina, L. Molino Piñar, P. Moreda, I. Gonzalez-Dios, M. T. Martín Valdivia, Ureña, Overview of CLEARS at IberLEF 2025: Challenge for plain language and easy-to-read adaptation for Spanish texts, Procesamiento del Lenguaje Natural 75 (2025).
[15] J. Á. González-Barba, L. Chiruzzo, S. M. Jiménez-Zafra, Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS.org, 2025.
[16] I. Espinosa-Zaragoza, J. Abreu-Salas, P. Moreda, M. Palomar, Automatic text simplification for people with cognitive disabilities: Resource creation within the ClearText project, in: S. Štajner, H. Saggio, M. Shardlow, F. Alva-Manchego (Eds.), Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability, INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria, 2023, pp. 68–77. URL: https://aclanthology.org/2023.tsar-1.7/.
[17] B. Botella-Gil, I. Espinosa-Zaragoza, P. Moreda, M. Palomar, Corpus ClearSim, 2024. URL: http://hdl.handle.net/10045/151688.
[18] Ollama, Ollama: Run large language models locally, 2023. URL: https://ollama.com, accessed: 2025-06-01.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <article-title>The plain language movement</article-title>
          ,
          <source>in: The Oxford Handbook of Language and Law</source>
          , Oxford University Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>