<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sentence-level Scientific Text Simplification With Just a Pinch of Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marvin M. Agüero-Torales</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Rodríguez Abellán</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos A. Castaño Moraga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CoE of Data Intelligence</institution>
          ,
          <addr-line>Fujitsu, Camino Cerro de los Gamos, 1, Pozuelo de Alarcón, 28224, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present our CLEF 2025 SimpleText Task 1.1 submission (Lenguaje-Claro team), demonstrating that competitive sentence simplification of scientific text can be achieved with only a 'pinch' of high-quality data. Our approach uses GPT-3.5-Turbo, o4-mini and T5-Efficient with zero-shot and three-shot prompting over three sample sentence pairs, complemented by a lightweight ensemble and a rule-based simplifier. A unified LLM-based judge then selects or, if necessary, regenerates outputs below a quality threshold. Experiments show that GPT-3.5-Turbo with a three-shot prompt outperforms all other modules, establishing a good baseline in data-scarce settings.</p>
      </abstract>
      <kwd-group>
        <kwd>Text Simplification</kwd>
        <kwd>Plain Language</kwd>
        <kwd>Few-Shot Learning</kwd>
        <kwd>Ensemble Methods</kwd>
        <kwd>LLM-as-a-judge</kwd>
        <kwd>Low-resource NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Our contribution comprises:</p>
      <p>• Zero-shot and three-shot prompting of GPT-3.5-Turbo [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], o4-mini [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and T5-Efficient-small [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] models,
• A rule-based model,
• A lightweight ensemble (choosing the shorter simplified text) and,
• A unified LLM-based judge that selects or, if necessary, regenerates outputs.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        In this section, we describe in more detail our methodology for our participation in the SimpleText
shared task [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] on scientific text simplification at the sentence level [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. First, we describe the data used for the few-shot prompt, then the simplifiers, and finally the
ensembles and LLM-judges over the simplifiers’ results. (The SARI score is the arithmetic mean of the
n-gram precision and recall for the add, keep and delete operations; a higher SARI score indicates
greater simplicity or readability [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].)
      </p>
      <sec id="sec-2-1">
        <title>2.1. Few-Shot Prompting with a Pinch of Data</title>
        <p>
          We curate three representative pairs of complex-simple sentence samples for few-shot prompting,
using the Microsoft Copilot tool [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. These examples were synthetically created to cover three core
phenomena rather than being chosen at random: (i) biomedical terminology, (ii) numerical information,
and (iii) discourse marker splitting. Previous studies [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ] showed that high-quality targeted samples
outperform random sampling (by more than 0.2 SARI in our pilots), demonstrating that a small but
carefully curated ’pinch’ of data can reliably guide LLM simplification.
        </p>
        <p>Each input to GPT-3.5-Turbo and o4-mini is prefixed with the prompt listed in Listing 1.</p>
        <p>You are a helpful plain language assistant for clear, easy, plain,
and simple text simplification.</p>
        <p>Simplify the following scientific-style sentence into a concise,
plain-English sentence that preserves meaning.</p>
        <p>Example 1: Complex: "..." Simplified: "..."
Example 2: Complex: "..." Simplified: "..."
Example 3: Complex: "..." Simplified: "..."
Now simplify:
Complex: "..."
Simplified (limit to max. 45 characters):</p>
        <p>Listing 1: Prompt for GPT-based models.</p>
        <p>On the other hand, for the T5-Efficient-small model we use a simpler prompt (see Listing
2). These minimal settings ensure that the model learns key simplification patterns without extensive
fine-tuning.</p>
        <p>Simplify:
Input: "..." Output: "..."
Input: "..." Output: "..."
Input: "..." Output: "..."
Simplify:
Input: "..."
Output (limit to max. 45 characters):</p>
        <p>Listing 2: T5-Efficient-small prompt.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Modular Simplifiers</title>
        <p>In addition to the aforementioned models, we implement a rule-based simplifier and an ensemble model.
Rule-based simplifier. The rule-based simplifier removes parentheticals and splits sentences at discourse
markers. We used the markers {and, but, or, so, because} for the simple approach, and additionally
{although, however, therefore, moreover, meanwhile, nevertheless, nonetheless, yet, still, then, thus,
consequently} for the complex approach.</p>
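        <p>As a minimal sketch of this component (the marker sets come from the paper, but the concrete splitting heuristic and the function name are our assumptions), the simple approach could look like:</p>

```python
import re

# Markers for the simple approach, as listed in the paper.
SIMPLE_MARKERS = {"and", "but", "or", "so", "because"}

def rule_based_simplify(sentence: str, markers=SIMPLE_MARKERS) -> str:
    """Remove parentheticals, then keep the clause before the first discourse marker."""
    # Drop parenthetical asides such as "(in vitro)".
    text = re.sub(r"\s*\([^)]*\)", "", sentence)
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok.lower().strip(",.;") in markers and i > 0:
            # Split at the first marker and keep only the leading clause.
            return " ".join(tokens[:i]).rstrip(",;") + "."
    return text
```

        <p>The complex approach would only extend the marker set; the splitting logic stays the same.</p>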
        <p>
          Ensemble simplifier. To improve robustness and ensure output quality in low-data scenarios, we
implemented a lightweight heuristic ensemble strategy that selects the best simplification among the
available candidates. Specifically, we collect the outputs of the rule-based, T5-based, and optionally
GPT-based modules, discard any empty or whitespace-only strings, and then choose the shortest non-empty
output, inspired by readability studies linking brevity to simplicity. In a quantitative analysis, 68% of the
shortest candidates retained the full factual content while removing peripheral clauses, justifying
length as a proxy for simplicity [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>In cases of ties, we apply a fixed priority: GPT &gt; T5 &gt; Rule. This simple yet effective selection
mechanism favors brevity and leverages the higher performance of GPT-based simplifications when
available. Our ensembler uses the logic shown in Appendix A.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Unified Evaluation and Ensemble</title>
        <p>The outputs with the three highest SARI scores (see the results in Table 1) are passed as candidates
to our LLM-as-a-judge Unified Evaluator, which scores each simplification candidate in terms
of fluency, adequacy, and simplicity on a scale from 1 to 5. This scoring is based on a consistent
prompting format in which the original and simplified sentences are provided, and the model is asked
to return three comma-separated numbers (one per criterion).</p>
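        <p>Such a reply can be parsed with a small helper. This sketch assumes the judge returns exactly three comma-separated numbers as prompted; the function names are ours, not the paper's:</p>

```python
def parse_judge_scores(reply: str) -> tuple:
    """Parse a judge reply like '4, 5, 3' (fluency, adequacy, simplicity) into floats."""
    scores = tuple(float(part.strip()) for part in reply.split(","))
    if len(scores) != 3 or not all(1 <= s <= 5 for s in scores):
        raise ValueError(f"unexpected judge reply: {reply!r}")
    return scores

def average_score(reply: str) -> float:
    """Average of the three criteria; this is the value compared against the threshold."""
    return sum(parse_judge_scores(reply)) / 3
```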
        <p>
          The evaluator is designed to be model-agnostic and supports both local Hugging Face models (e.g.,
LLaMA, Gemma, etc.) and remote deployments via Azure OpenAI [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] (e.g., GPT-3.5-Turbo) through
the ChatCompletion API [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. If the average score of all three criteria for the top candidate falls below
a configurable threshold (2.5 in our setting), the evaluator triggers a fallback regeneration process using
GPT-3.5-Turbo. This regeneration prompt differs slightly from Listing 1 and is more concise (see
Listing 3).
        </p>
        <p>Please simplify this scientific sentence into concise, plain English
(limit to max. 45 characters): "...".</p>
        <p>Listing 3: Unified Evaluator’s fallback prompt.</p>
        <p>The freshly generated simplification replaces the weaker candidates. The final selection always favors
the highest-rated candidate or the regenerated version when necessary.</p>
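        <p>The selection-with-fallback step can be sketched as follows; this is a minimal illustration under our own naming (the paper only fixes the 2.5 threshold and the judge/regenerate roles):</p>

```python
def select_final(candidates, judge, regenerate, threshold=2.5):
    """Return the top-rated candidate, or a regenerated one if even the best
    candidate's average judge score falls below the threshold."""
    best = max(candidates, key=judge)
    if judge(best) < threshold:
        # Fallback: trigger a fresh simplification (e.g., via GPT-3.5-Turbo).
        return regenerate()
    return best
```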
        <p>This unified evaluation pipeline enables automatic quality assessment and controlled generation in a
consistent and scalable manner across different model settings.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>We evaluated our systems and models on the official Task 1.1 test set. Additionally, we experimented
with the text length of the complex sentences in order to determine an adequate length for the
simplified sentences (Truncation in Table 1). We set the optimal length to 45 characters. Table 1
summarizes the performance.</p>
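      <p>The truncation step itself can be sketched as a word-boundary cut; this helper is our illustration, not the paper's exact implementation:</p>

```python
def truncate_to_limit(text: str, limit: int = 45) -> str:
    """Cut at the last word boundary that fits within the character limit."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    # Avoid cutting mid-word when a space exists in the truncated head.
    if " " in head:
        head = head.rsplit(" ", 1)[0]
    return head.rstrip(" ,;")
```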
      <p>Although the absolute differences are modest, GPT-3.5-Turbo with three-shot prompting consistently
surpasses all other modules by +1.8 SARI on average. Fallback generation recovers 0.3 SARI when initial
candidates fail.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>We demonstrate that high-quality sentence simplification can be obtained with minimal, synthetic
data via strategic prompt engineering and a modular ensemble. GPT-3.5-Turbo with three-shot
prompting sets a new low-resource baseline for CLEF SimpleText Task 1.1. Future work will explore
automated example selection, non-synthetic few-shot data, additional setups, and domain adaptation.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was supported by cloud credits from Fujitsu’s Microsoft Azure subscription. The authors
thank the organizers of CLEF 2025 and SimpleText Lab 2025 for creating the perfect ecosystem for the
shared task.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Microsoft Copilot and Writefull to check
grammar and spelling, and DeepL for translation. After using these tools and services, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Ensemble logic</title>
      <p>
        The corresponding logic is implemented in the ensemble component in Python as follows:
      </p>
      <p>def ensemble_select(r_out, t5_out, gpt_out):
    """Select the shortest of the non-empty outputs;
    tie-break by the priority ['gpt', 't5', 'rule']."""
    priority = ['gpt', 't5', 'rule']
    candidates = {'rule': r_out, 't5': t5_out, 'gpt': gpt_out}
    filtered = {k: v for k, v in candidates.items() if v.strip()}
    if not filtered:
        return r_out
    # Pick the shortest output, then break ties by priority.
    best = min(
        filtered.items(),
        key=lambda x: (len(x[1]), priority.index(x[0])),
    )
    return best[1]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ondov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Attal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <article-title>A survey of automated methods for biomedical text simplification</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>29</volume>
          (
          <year>2022</year>
          )
          <fpage>1976</fpage>
          -
          <lpage>1988</lpage>
          . doi:10.1093/jamia/ocac149.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] OpenAI, Model - OpenAI API - gpt-3.5-turbo,
          <year>2025</year>
          . https://platform.openai.com/docs/models/gpt-3.5-turbo [Date: 2025-06-16].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] OpenAI, Model - OpenAI API - o4-mini,
          <year>2025</year>
          . https://platform.openai.com/docs/models/o4-mini [Date: 2025-06-16].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fedus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abnar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yogatama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <article-title>Scale efficiently: Insights from pretraining and finetuning transformers</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2022</year>
          . URL: https://openreview.net/forum?id= f2OYVDyfIB.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Swayamdipta</surname>
          </string-name>
          ,
          <article-title>Evaluation under imperfect benchmarks and ratings: A case study in text simplification</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2504.09394. arXiv:2504.09394.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pavlick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <article-title>Optimizing statistical machine translation for text simplification</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>4</volume>
          (
          <year>2016</year>
          )
          <fpage>401</fpage>
          -
          <lpage>415</lpage>
          . URL: https://aclanthology.org/Q16-1029/. doi:10.1162/tacl_a_00107.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vendeville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 SimpleText track: Simplify scientific texts (and nothing more)</article-title>
          , in: J.
          <string-name>
            <surname>Carillo de Albornoz</surname>
          </string-name>
          , et al. (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), LNCS, Springer-Verlag,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vendeville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 SimpleText Task 1: Simplify scientific text</article-title>
          , in: G.
          <string-name>
            <surname>Faggioli</surname>
          </string-name>
          , et al. (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2025</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] Microsoft,
          <source>What is Microsoft 365 Copilot?</source>
          ,
          <year>2025</year>
          . https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365-copilot-overview [Date: 2025-06-18].
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <article-title>Few-shot text generation with natural language instructions</article-title>
          , in: M.-F. Moens, X. Huang, L. Specia, S. W.-t. Yih (Eds.),
          <source>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Online and Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>390</fpage>
          -
          <lpage>402</lpage>
          . URL: https://aclanthology.org/2021.emnlp-main.32/. doi:10.18653/v1/2021.emnlp-main.32.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Collier</surname>
          </string-name>
          ,
          <article-title>Few-shot table-to-text generation with prototype memory</article-title>
          , in: M.-F. Moens, X. Huang, L. Specia, S. W.-t. Yih (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2021</year>
          , Association for Computational Linguistics, Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>910</fpage>
          -
          <lpage>917</lpage>
          . URL: https://aclanthology.org/2021.findings-emnlp.77/. doi:10.18653/v1/2021.findings-emnlp.77.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Siddharthan</surname>
          </string-name>
          ,
          <article-title>Syntactic simplification and text cohesion</article-title>
          ,
          <source>Research on Language and Computation</source>
          <volume>4</volume>
          (
          <year>2006</year>
          )
          <fpage>77</fpage>
          -
          <lpage>109</lpage>
          . doi:10.1007/s11168-006-9011-1.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] Microsoft Azure, Azure OpenAI in Foundry Models,
          <year>2025</year>
          . https://azure.microsoft.com/en-us/products/ai-services/openai-service [Date: 2025-07-07].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] Microsoft Learn,
          <article-title>Work with chat completion models - Azure OpenAI in Azure AI Foundry models</article-title>
          ,
          <year>2025</year>
          . https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/chatgpt [Date: 2025-07-07].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>