<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sentence-level Scientific Text Simplification With Just a Pinch of Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marvin M. Agüero-Torales</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Rodríguez Abellán</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos A. Castaño Moraga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CoE of Data Intelligence</institution>
          ,
          <addr-line>Fujitsu, Camino Cerro de los Gamos, 1, Pozuelo de Alarcón, 28224, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present our CLEF 2025 SimpleText Task 1.1 submission (Lenguaje-Claro team), demonstrating that competitive sentence simplification of scientific text can be achieved with only a 'pinch' of high-quality data. Our approach uses GPT-3.5-Turbo, o4-mini and T5-Efficient with zero-shot and three-shot prompting over three sample sentence pairs, complemented by a lightweight ensemble and a rule-based simplifier. A unified LLM-based judge then selects or, if necessary, regenerates outputs below a quality threshold. Experiments show that GPT-3.5-Turbo with a three-shot prompt outperforms all other modules, establishing a good baseline in data-scarce settings.</p>
      </abstract>
      <kwd-group>
        <kwd>Text Simplification</kwd>
        <kwd>Plain Language</kwd>
        <kwd>Few-Shot Learning</kwd>
        <kwd>Ensemble Methods</kwd>
        <kwd>LLM-as-a-judge</kwd>
        <kwd>Low-resource NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Our contribution comprises:</p>
      <p>• Zero-shot and three-shot prompting of GPT-3.5-Turbo [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], o4-mini [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and T5-Efficient-small [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] models,
• A rule-based model,
• A lightweight ensemble (choosing the shorter simplified text) and,
• A unified LLM-based judge that selects or, if necessary, regenerates outputs.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        In this section, we describe in more detail our methodology for our participation in the SimpleText
shared task [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] on scientific text simplification at the sentence level [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. First, we describe the data used for the few-shot prompt, then the simplifiers, and finally the
ensembles and LLM-judges over the simplifiers’ results. (The SARI score is the arithmetic mean of the
n-gram precision and recall for the add, keep and delete operations; a higher SARI score indicates
greater simplicity or readability [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].)
      </p>
      <sec id="sec-2-1">
        <title>2.1. Few-Shot Prompting with a Pinch of Data</title>
        <p>
          We curate three representative pairs of complex-simple sentence samples for few-shot prompting,
using the Microsoft Copilot tool [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. These examples were synthetically created to cover three core
phenomena rather than being chosen at random: (i) biomedical terminology, (ii) numerical information,
and (iii) discourse marker splitting. Previous studies [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ] showed that high-quality targeted samples
outperform random sampling (by more than 0.2 SARI in our pilots), demonstrating that a small but
carefully curated ’pinch’ of data can reliably guide LLM simplification.
        </p>
        <p>Each input to GPT-3.5-Turbo and o4-mini is prefixed with the prompt listed in Listing 1.</p>
        <p>You are a helpful plain language assistant for clear, easy, plain,
and simple text simplification.</p>
        <p>Simplify the following scientific-style sentence into a concise,
plain-English sentence that preserves meaning.</p>
        <p>Example 1: Complex: "..." Simplified: "..."
Example 2: Complex: "..." Simplified: "..."
Example 3: Complex: "..." Simplified: "..."
Now simplify:
Complex: "..."
Simplified (limit to max. 45 characters):</p>
        <p>Listing 1: Prompt for GPT-based models.</p>
        <p>On the other hand, for the T5-Efficient-small model we use a simpler prompt (see Listing
2). These minimal settings ensure that the model learns key simplification patterns without extensive
fine-tuning.</p>
        <p>Simplify:
Input: "..." Output: "..."
Input: "..." Output: "..."
Input: "..." Output: "..."
Simplify:
Input: "..."
Output (limit to max. 45 characters):</p>
        <p>Listing 2: T5-Efficient-small prompt.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Modular Simplifiers</title>
        <p>In addition to the aforementioned models, we implement a rule-based simplifier and an ensemble model.
Rule-based simplifier. The rule-based simplifier removes parentheticals and splits sentences at discourse
markers. We used the markers {and, but, or, so, because} for the simple approach, and additionally
{although, however, therefore, moreover, meanwhile, nevertheless, nonetheless, yet, still, then, thus,
consequently} for the complex approach.</p>
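        <p>As a minimal sketch of this component (the marker sets come from the paper, but the concrete splitting heuristic and the function name are our assumptions), the simple approach could look like:</p>

```python
import re

# Markers for the simple approach, as listed in the paper.
SIMPLE_MARKERS = {"and", "but", "or", "so", "because"}

def rule_based_simplify(sentence: str, markers=SIMPLE_MARKERS) -> str:
    """Remove parentheticals, then keep the clause before the first discourse marker."""
    # Drop parenthetical asides such as "(in vitro)".
    text = re.sub(r"\s*\([^)]*\)", "", sentence)
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok.lower().strip(",.;") in markers and i > 0:
            # Split at the first marker and keep only the leading clause.
            return " ".join(tokens[:i]).rstrip(",;") + "."
    return text
```

        <p>The complex approach would only extend the marker set; the splitting logic stays the same.</p>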
        <p>
          Ensemble simplifier. To improve robustness and ensure output quality in low-data scenarios, we
implemented a lightweight heuristic ensemble strategy that selects the best simplification among the
available candidates. Specifically, we collect the outputs of the rule-based, T5-based, and optionally
GPT-based modules, discard any empty or whitespace-only strings, and then choose the shortest non-empty
output, inspired by readability studies linking brevity to simplicity. In a quantitative analysis, 68% of the
shortest candidates retained the full factual content while removing peripheral clauses, justifying
length as a proxy for simplicity [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>In cases of ties, we apply a fixed priority: GPT &gt; T5 &gt; Rule. This simple yet effective selection
mechanism favors brevity and leverages the higher performance of GPT-based simplifications when
available. Our ensembler uses the logic shown in Appendix A.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Unified Evaluation and Ensemble</title>
        <p>The outputs with the three highest SARI scores (see the results in Table 1) are passed as candidates
to our LLM-as-a-judge Unified Evaluator, which scores each simplification candidate in terms
of fluency, adequacy, and simplicity on a scale from 1 to 5. This scoring is based on a consistent
prompting format in which the original and simplified sentences are provided, and the model is asked
to return three comma-separated numbers (one per criterion).</p>
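        <p>Such a reply can be parsed with a small helper. This sketch assumes the judge returns exactly three comma-separated numbers as prompted; the function names are ours, not the paper's:</p>

```python
def parse_judge_scores(reply: str) -> tuple:
    """Parse a judge reply like '4, 5, 3' (fluency, adequacy, simplicity) into floats."""
    scores = tuple(float(part.strip()) for part in reply.split(","))
    if len(scores) != 3 or not all(1 <= s <= 5 for s in scores):
        raise ValueError(f"unexpected judge reply: {reply!r}")
    return scores

def average_score(reply: str) -> float:
    """Average of the three criteria; this is the value compared against the threshold."""
    return sum(parse_judge_scores(reply)) / 3
```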
        <p>
          The evaluator is designed to be model-agnostic and supports both local Hugging Face models (e.g.,
LLaMA, Gemma, etc.) and remote deployments via Azure OpenAI [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] (e.g., GPT-3.5-Turbo) through
the ChatCompletion API [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. If the average score of all three criteria for the top candidate falls below
a configurable threshold (2.5 in our setting), the evaluator triggers a fallback regeneration process using
GPT-3.5-Turbo. This regeneration prompt differs slightly from Listing 1 and is more concise (see
Listing 3).
        </p>
        <p>Please simplify this scientific sentence into concise, plain English
(limit to max. 45 characters): "...".</p>
        <p>Listing 3: Unified Evaluator’s fallback prompt.</p>
        <p>The freshly generated simplification replaces the weaker candidates. The final selection always favors
the highest-rated candidate or the regenerated version when necessary.</p>
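        <p>The selection-with-fallback step can be sketched as follows; this is a minimal illustration under our own naming (the paper only fixes the 2.5 threshold and the judge/regenerate roles):</p>

```python
def select_final(candidates, judge, regenerate, threshold=2.5):
    """Return the top-rated candidate, or a regenerated one if even the best
    candidate's average judge score falls below the threshold."""
    best = max(candidates, key=judge)
    if judge(best) < threshold:
        # Fallback: trigger a fresh simplification (e.g., via GPT-3.5-Turbo).
        return regenerate()
    return best
```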
        <p>This unified evaluation pipeline enables automatic quality assessment and controlled generation in a
consistent and scalable manner across different model settings.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>We evaluated our systems and models on the official Task 1.1 test set. Additionally, we experimented
with the text length of the complex sentences in order to determine an adequate length for the
simplified sentences (Truncation in Table 1). We set the optimal length to 45 characters. Table 1
summarizes the performance.</p>
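      <p>The truncation step itself can be sketched as a word-boundary cut; this helper is our illustration, not the paper's exact implementation:</p>

```python
def truncate_to_limit(text: str, limit: int = 45) -> str:
    """Cut at the last word boundary that fits within the character limit."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    # Avoid cutting mid-word when a space exists in the truncated head.
    if " " in head:
        head = head.rsplit(" ", 1)[0]
    return head.rstrip(" ,;")
```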
      <p>Although the absolute differences are modest, GPT-3.5-Turbo with three-shot prompting consistently
surpasses all other modules by +1.8 SARI on average. Fallback generation recovers 0.3 SARI when initial
candidates fail.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>We demonstrate that high-quality sentence simplification can be obtained with minimal, synthetic
data via strategic prompt engineering and a modular ensemble. GPT-3.5-Turbo with three-shot
prompting sets a new low-resource baseline for CLEF SimpleText Task 1.1. Future work will explore
automated example selection, non-synthetic few-shot data, additional setups, and domain adaptation.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was supported by cloud credits from Fujitsu’s Microsoft Azure subscription. The authors
thank the organizers of CLEF 2025 and SimpleText Lab 2025 for creating the perfect ecosystem for the
shared task.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Microsoft Copilot and Writefull to check
grammar and spelling, and DeepL for translation. After using these tools and services, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Ensemble logic</title>
      <p>
        The corresponding logic is implemented in the ensemble component in Python as follows:
      </p>
      <p>def ensemble_select(r_out, t5_out, gpt_out):
    """Select the shortest of the non-empty outputs;
    tie-break by the priority ['gpt', 't5', 'rule']."""
    priority = ['gpt', 't5', 'rule']
    candidates = {'rule': r_out, 't5': t5_out, 'gpt': gpt_out}
    filtered = {k: v for k, v in candidates.items() if v.strip()}
    if not filtered:
        return r_out
    # Pick the shortest output, then break ties by priority.
    best = min(
        filtered.items(),
        key=lambda x: (len(x[1]), priority.index(x[0])),
    )
    return best[1]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ondov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Attal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <article-title>A survey of automated methods for biomedical text simplification</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>29</volume>
          (
          <year>2022</year>
          )
          <fpage>1976</fpage>
          -
          <lpage>1988</lpage>
          . doi:10.1093/jamia/ocac149.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] OpenAI, Model - OpenAI API - gpt-3.5-turbo,
          <year>2025</year>
          . https://platform.openai.com/docs/models/gpt-3.5-turbo [Date: 2025-06-16].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] OpenAI, Model - OpenAI API - o4-mini,
          <year>2025</year>
          . https://platform.openai.com/docs/models/o4-mini [Date: 2025-06-16].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fedus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abnar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yogatama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <article-title>Scale efficiently: Insights from pretraining and finetuning transformers</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2022</year>
          . URL: https://openreview.net/forum?id= f2OYVDyfIB.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Swayamdipta</surname>
          </string-name>
          ,
          <article-title>Evaluation under imperfect benchmarks and ratings: A case study in text simplification</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2504.09394. arXiv:2504.09394.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pavlick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <article-title>Optimizing statistical machine translation for text simplification</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>4</volume>
          (
          <year>2016</year>
          )
          <fpage>401</fpage>
          -
          <lpage>415</lpage>
          . URL: https://aclanthology.org/Q16-1029/. doi:10.1162/tacl_a_00107.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vendeville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 SimpleText track: Simplify scientific texts (and nothing more)</article-title>
          , in: J.
          <string-name>
            <surname>Carillo de Albornoz</surname>
          </string-name>
          , et al. (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), LNCS, Springer-Verlag,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vendeville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 SimpleText Task 1: Simplify scientific text</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2025</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] Microsoft,
          <source>What is Microsoft 365 Copilot?</source>
          ,
          <year>2025</year>
          . https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365-copilot-overview [Date: 2025-06-18].
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <article-title>Few-shot text generation with natural language instructions</article-title>
          , in: M.-F. Moens, X. Huang, L. Specia, S. W.-t. Yih (Eds.),
          <source>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Online and Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>390</fpage>
          -
          <lpage>402</lpage>
          . URL: https://aclanthology.org/2021.emnlp-main.32/. doi:10.18653/v1/2021.emnlp-main.32.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Collier</surname>
          </string-name>
          ,
          <article-title>Few-shot table-to-text generation with prototype memory</article-title>
          , in: M.-F. Moens, X. Huang, L. Specia, S. W.-t. Yih (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2021</year>
          , Association for Computational Linguistics, Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>910</fpage>
          -
          <lpage>917</lpage>
          . URL: https://aclanthology.org/2021.findings-emnlp.77/. doi:10.18653/v1/2021.findings-emnlp.77.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Siddharthan</surname>
          </string-name>
          ,
          <article-title>Syntactic simplification and text cohesion</article-title>
          ,
          <source>Research on Language and Computation</source>
          <volume>4</volume>
          (
          <year>2006</year>
          )
          <fpage>77</fpage>
          -
          <lpage>109</lpage>
          . doi:10.1007/s11168-006-9011-1.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] Microsoft Azure, Azure OpenAI in Foundry Models,
          <year>2025</year>
          . https://azure.microsoft.com/en-us/products/ai-services/openai-service [Date: 2025-07-07].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] Microsoft Learn,
          <article-title>Work with chat completion models - Azure OpenAI in Azure AI Foundry models</article-title>
          ,
          <year>2025</year>
          . https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/chatgpt [Date: 2025-07-07].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>