1. Introduction

SINAI Participation in SimpleText Task 2 at CLEF 2024: Zero-shot Prompting on GPT-4-Turbo for Lexical Complexity Prediction

Jenny Ortiz-Zambrano

César Espin-Riofrio

Arturo Montejo-Ráez

1 0 University of Guayaquil , Delta Av. s/n, Guayaquil, 090510 , Ecuador 1 University of Jaén , Las Lagunillas s/n, Jaén, 23071 , Spain

In this article, we present our participation in Tasks 2.1 and 2.2 of the SimpleText track of CLEF 2024. Our work focused on the implementation of zero-shot learning using the GPT-4 Turbo autoregressive model. To this end, we develop and evaluate various cues to optimize the model's ability to predict lexical complexity. The results of our experiments indicated that GPT-4 Turbo can perform this task with remarkably robust performance, demonstrating its potential to assess language complexity efectively without the need for additional training.

eol>Lexical Complexity Prediction Auto-regressive models GPT-4 Turbo Prompting Zero-shot learning

1. Introduction

Readability is defined as the quality that makes a text more accessible and easier to read [ 1 ], but for many people, the way a text is written can be a barrier to understanding its content [ 2 ] due to the presence of infrequent or unknown words, and phrases with lexical and semantic complexity which drastically complicate the reader’s understanding [ 3 ]; this is especially evident in cases such as children, non-native speakers [ 4 ] and people with various cognitive abilities or reading disabilities [ 5 ]. The success or failure of understanding a text will depend on the reader’s prior knowledge of the meaning of the words [ 6 ].

Information technologies have facilitated access to a wide and abundant amount of information in various fields such as education, news, social networks, health, government, and also science; In the case of scientific literature to the general public, it has increased thanks to digitalization. However, this wealth of information is not available to everyone, since many people face significant obstacles to understanding , such as the complexity of grammatical structures, the use of technical language and the length of sentences, which directly afects individuals with intellectual disabilities, people with low levels of literacy, and even university students who, despite their academic training and specialized knowledge, can also be found among those who experience dificulties in reading and understanding complex texts [ 7 ]. In the case of scientific information, a significant barrier persists that makes direct access to scientific knowledge from the original sources dificult, where one of the main obstacles lies in the complexity of scientific texts, which presents dificulties for those without experience in the field due to to the lack of prior specialized knowledge [ 8 ].

The SimpleText laboratory [ 9 ] is part of the CLEF 2024 [ 10 ] initiative, which promotes the systematic evaluation of information access systems through experimentation with shared tasks. SimpleText addresses the challenges of text simplification, especially in the context of making scientific information accessible. In this sense, it provides relevant data and reference points, as the general public tends to avoid reliable sources such as scientific literature due to its complex language and lack of prior knowledge. Instead, they rely on superficial and derivative sources on the web and social media, which are often published for commercial or political rather than informational purposes.

Can simplifying the text help overcome these barriers to access? That is the central question that this task addresses. Its objective is to generate a simplified summary of several scientific documents, based on a popular science query. This summary will provide the user with an accessible overview of the specific topic, with the hope of making scientific information more accessible and understandable to a broader audience.

The main objective of this research is to demonstrate the capacity of the language model based on Transformers like GPT-4 Turbo for lexical simplification. To achieve this, several variants of prompts without samples have been created and evaluated. This approach allows us to determine which concepts in scientific summaries require additional explanation and contextualization to help the reader understand the scientific text more efectively.

The article follows the following structure: first, a brief description is given of the current stateof-the-art in the prediction of lexical complexity and the identification of complex words (which are synonymous with the same natural language processing task). Section 3 presents Task 2 of the Simple Text track. Section 5 details our solution and the results obtained using diferent indication variants. Finally, Section 6 presents the conclusions and raises some ideas for future research.

2. Related Work

In recent years, Complex Word Identification (CWI) has attracted great interest in the scientific community and among computational linguistics researchers. These initiatives have significantly boosted development and research in the field of text simplification and improving information accessibility. The proposals include conferences, workshops and specific tasks that address challenges and promote advances in the simplification of texts, thus facilitating greater access to information for various audiences.

This advancement is reflected in the development of computational semantic analysis systems, as illustrated by several shared Complex Word Identification (CWI) tasks at notable events such as SemEval 2016 [ 11 ], NAACL-HTL 2018 [ 12 ], ALexS at IberLEF 2020 [ 13 ], the 15th edition of SemEval and the first lexical complexity prediction task [ 14 ], TSAR-2022 - the workshop on simplicfiation, accessibility and readability of texts [ 15 ], the SimpleText tasks in CLEF in 2021 [16], 2022 [17], and 2023 [ 8 ] among other relevant initiatives in this area.

2.1. GPT-4 Turbo for solving NLP tasks

State-of-the-art deep learning models, such as BERT [18], RoBERTa [19], GPT-3 [20], and GPT-4 Turbo [21], significantly outperform existing traditional approaches. GPT-4 Turbo, the transformer-based large language model generation developed by OpenAI1, is a prominent example of these advances. These models have demonstrated superior capabilities in various natural language processing tasks, raising the bar for performance and accuracy in the field.

GPT-3 Contains 175 billion parameters. Due to its extensive language knowledge, enormous processing power, and potential to learn from large amounts of online text data, GPT-3 can perform a wide variety of general natural language-based tasks with unprecedented ease, such as the generation and classification of text. [ 22]. The enormous scale of the model allows generating results with quality, precision and diversity of the generated content. This development has sparked considerable interest and concern in the field of NLP, the machine learning industry in general, the media, the AI ethics communities and civil society [23], The latest GPT-4 Turbo model of the OpenAI has been recognized for its ability to process text and images [24].

Although GTP-3 is a generative model, several approaches may be necessary to classify text, mainly ranging from zero-shot classification to single-shot and few-shot classification [ 22]. In zero-shot learning no samples of data are needed [25] .

3. SimpleText@CLEF-2024 Tasks

3.1. Task 2: “What is unclear?” Dificult concept identification and explanation The goal of this task is to identify key concepts that need to be contextualized with a definition, example, and/or use-case and provide useful and understandable explanations for them. Thus, there are three subtasks: 1. To predict what are the terms in a passage of a document and their dificulty (easy/medium/dificult). 2. To generate a definition and an explanation only for the dificult terms. 3. To retrieve the provided definitions of the dificult terms in “correct” order: manual (2), generated positive 1 (1), generated positive 2 (1), generated negative 1 (0), generated negative 2 (0).

In Task 2.1, for each passage of a document, participants should provide a list of terms with corresponding dificulty scores (easy/medium/dificult). Passages (sentences) are considered to be independent, so term repetition is allowed (the same term can be detected in diferent sentences, even in the same document).

In Task 2.2, for each dificult term (terms that have been evaluated with the highest level of dificulty), participants should provide the definition and explanation which will be evaluated both from a qualitative point of view (manual review by terminologists) and from a quantitative point of view (applying overlapping text measures like BLUE, ROUGE, etc.).

In Task 2.3, participants should rank the set of definitions provided for the dificult terms in a way that the “best” definitions are ranked higher in the list of definitions. In particular, for each term there will be one manual definition (considered the best one) and two automatically generated good definitions that should be placed at the top of the list of retrieved definitions. Quantitative metrics (for example, P@1, P@3, rank correlation measures) will be used to evaluate participants’results.

Task 2.1 and Task 2.3 can be performed independently. Participants that want to run experiments on Task 2.2 need to accomplish Task 2.1 first.

4. Proposed system

As mentioned above, our approach is to apply learning cues without prior examples to the GPT-4 Turbo model, extracting predictions from the generated sequence. Details of the model configuration are presented in Table 1. We use the OpenAI API in Python to interact with the model, allowing us to easily integrate GPT-4 Turbo into our workflow. In addition, the OpenAI Playground ofers an intuitive web interface that facilitates experimentation and rapid generation of code necessary for our tests and developments.

4.1. Experiments with zero-shot learning

We execute the model applying zero-shot learning where no examples are specified for learning and a sample of a generated output

Prompt PRM_ZS_TASK2_V1:

Performing the following actions according to a given text: Select a number from 1 to 5 terms within the sentence named text Assign the difficulty of the words: easy, medium and difficult, giving the initial letter of each difficulty as an answer ([e]fácil/[m]edium/[d]ifficul) Give the definition of each difficult term, add an explanation and use case according to context of the sentences.

Provide the response in following JSON format: { "word": <word>, "difficulty": <word_difficult>, "definition": "<definition>", "explanation": <explanation> } Context: {{context}} Text : {{text}} Response: . ## Context: Digital assistant Text : Abstract Novel technological advances in mobile devices and applications can be exploited in wildfire confrontation, enabling endusers to easily conduct several everyday tasks, such as access to data and information, sharing of intelligence and coordination of personnel and vehicles.

Response: json [ { }, { "word": "exploited", "difficulty": "m", "definition": "make full use of and derive benefit from (a resource)", "explanation": "In the context, ’exploited’ refers to utilizing the new technological capabilities of mobile devices and applications to their full advantage, specifically in managing and combating wildfires.", "use_case": "The novel technologies can be exploited to improve the efficiency and effectiveness of wildfire management." "word": "confrontation", "difficulty": "m", "definition": "a hostile or argumentative meeting or situation between opposing parties", "explanation": "Here, ’confrontation’ is used metaphorically to describe the act of dealing with wildfires, suggesting a direct and active engagement with the fires.", "use_case": "Mobile technologies are being used in the confrontation of wildfires, aiding in direct engagement and management." "word": "coordination", "difficulty": "m", "definition": "the organization of the different elements of a complex body or activity so as to enable them to work together effectively", "explanation": "In this context, ’coordination’ refers to the organized management of personnel and vehicles to tackle wildfires efficiently, facilitated by mobile technology.", "use_case": "Effective coordination of personnel and vehicles is crucial for the successful management of wildfire incidents."

Prompt PRM_ZS_TASK2_V2:

Depending on the text, select up to 5 terms considered complex within the sentence, then assign the complexity of the terms to easy, medium and difficult using the first letter of each word, for example if the complexity is easy add the letter "e", If the complexity of the term is medium, add the letter "m", and if the complexity of the term is difficult, you should add the letter "d". Then, from each complex term generate its definition, an explanation and a use case according to the context of the sentence. Finally, provide the response in the following JSON format: { } "word": <word>, "difficulty" <difficult_word>, "definition": "<definition>", "explanation" : <explanations> ## Context: Drug discovery Sentence: A pharmacophore analysis approach was used to investigate and compare different classes of compounds relevant to the drug discovery process (specifically, drug molecules, compounds in high throughput screening libraries, combinatorial chemistry building blocks and nondrug molecules).

Response: json [ { }, { "word": "pharmacophore", "difficulty": "d", "definition": "A set of structural features in a molecule that is recognized by a biological macromolecule and is responsible for its biological activity.", "explanation": "In drug discovery, a pharmacophore analysis is used to identify these structural features which can interact with a biological target, such as a protein, to induce a therapeutic effect. This helps in designing or identifying new drugs that can mimic or block the biological activity.", "use_case": "In the sentence, a pharmacophore analysis approach helps in comparing different classes of compounds to find potential drug candidates." "word": "combinatorial chemistry", "difficulty": "d", "definition": "A method in chemistry where different combinations of building blocks are systematically mixed to generate a large number of different compounds.", "explanation": "Combinatorial chemistry is used in drug discovery to rapidly synthesize and screen large libraries of compounds for potential drug activity. It allows researchers to }, { }, { }, { explore a wide variety of chemical structures.", "use_case": "In the sentence, combinatorial chemistry building blocks are mentioned as part of the classes of compounds relevant to drug discovery." "word": "high throughput screening", "difficulty": "d", "definition": "A method used in drug discovery to quickly conduct millions of chemical, genetic, or pharmacological tests.", "explanation": "High throughput screening (HTS) is crucial in the early stages of drug discovery. It allows researchers to quickly identify active compounds, antibodies, or genes that modulate a particular biomolecular pathway.", "use_case": "The sentence refers to compounds in high throughput screening libraries, indicating these are tested in large-scale assays to find promising drug candidates." "word": "nondrug", "difficulty": "m", "definition": "Substances or compounds that are not considered drugs and do not have therapeutic effects.", "explanation": "In the context of drug discovery, nondrug molecules are those that are used as controls or are part of the compound libraries but are not expected to lead to a therapeutic drug.", "use_case": "The sentence compares nondrug molecules with drug molecules and other compounds to understand their relevance or irrelevance in the drug discovery process." "word": "compounds", "difficulty": "e", "definition": "Chemical substances composed of two or more elements chemically bonded together.", "explanation": "In the context of the sentence, ’compounds’ refers to various chemical entities that are being investigated in the drug discovery process to find those with potential as drug molecules.", "use_case": "The sentence discusses investigating different classes of compounds in relation to their relevance to drug discovery."

Prompt PRM_ZS_TASK2_V3

Identify up to a total of 5 complex words in the sentence and determine the difficulty of each complex word by classifying it into the category of "easy", "medium" or "difficult". For the "easy" difficulty place the letter "e", for the "medium" difficulty place the letter "m", and for the "hard" difficulty place the letter "d". Furthermore, in the case of words identified as complex taking into account the context of the sentence, three things must be generated: a definition, an explanation, and a use case.

Finally, provide the response in the following JSON format: { "word": <word>, "difficulty" :<hard_word>, "definition": "<definition>", "explanation" : <explanations> ## Context: Humanoid robots Sentence: In many situations, users walk on typical paths between specific destinations at which the service of a mobile robot is needed. }, { }, { } "word": "humanoid", "difficulty": "m", "definition": "having an appearance or character resembling that of a human.", "explanation": "In the context of robots, ’humanoid’ refers to robots designed to look and sometimes behave like humans, often used to interact with environments built for human use.", "use_case": "The humanoid robot was programmed to assist customers in the store." "word": "destinations", "difficulty": "m", "definition": "the place to which someone or something is going or being sent.", "explanation": "In the given sentence, ’destinations’ refers to the specific places where the mobile robot is required to perform services.", "use_case": "The tourists set off early in the morning to reach their destinations before sunset." "word": "mobile", "difficulty": "e", "definition": "able to move or be moved freely or easily.", "explanation": "Here, ’mobile’ describes a robot that is capable of moving around rather than being fixed in one location.", "use_case": "The company developed a mobile app to increase accessibility for users on the go." ] ###

4.2. Results

The Table 2 presents various results in response to the approach of sub-task 2.1. To automatically support this task, we have proposed an innovative approach that explores the potential of GPT-4 Turbo, an advanced large language model (LLM) developed by OpenAI as in the work done by [21]. On this occasion, we seek to evaluate the improvements and additional capabilities of GPT-4 Turbo in comparison with the proposal made for the same task in the SimpleText 2023 workshop where the results obtained were outstanding, demonstrating the efectiveness of the GPT-3 model, text-davinci-003 version [26].

The Table 3 presents several results of sub-task 2.2 applying GPT-4 Turbo with zero-shot learning according to the PRM_ZS_TASK2_V2 prompt. As we can see, the table illustrates an example that corresponds to Snt_id G01.1_1000902583_1. In the solution proposed for sub-task 2.2, the model generates a definition, an explanation, and also generates a use case illustrating in a broad way the complexity of the terms for diferent groups of users, making suggestions for definitions, explanations and use cases based on the context of the sentence (Abstract).

As we can see, the table illustrates an example that corresponds to Snt_id G01.1_1000902583_1. In the solution proposed for sub-task 2.2, the model generates a definition, an explanation, and also generates a use case illustrating in a broad way the complexity of the terms for diferent groups of users, making suggestions for definitions, explanations and use cases based on the context of the sentence (Abstract).

Table 4, we present the oficial results published by the organizers [ 27], including the scores obtained d m m d m d m m d m e m d m in our participation in subtasks 2.1 and 2.2. The meaning of each column is the following: • recall overall: the proportion of terms (independently from the dificulty) that were found. • recall average: the average of the recall of terms computed per sentence. • recall dificult terms:

the proportion of dificult terms that were found. • precision dificult:

the precision of terms that were labeled as dificult.

• bleu_nx: the BLEU score computed with ngrams n =1, 2, 3, 4.

According to the results presented in table 4, the UboNLP_Task2.1_phi3-oneshot team showed the best overall performance in terms of recall_overall (0.5424) and recall_average (0.5620), which indicates a high ability to retrieve relevant terms. However, its precision in dificult terms is relatively low (0.3743). Sharingans_Task2.2_GPT also performs well with a recall_overall of 0.4722 and a recall_average of 0.5302, in addition, it has a high recall_terms (0.5448), demonstrating a good ability to handle dificult terms.

According to the performance analysis in the dificult terms, the AIIRLab_Task2.2_LLaMAFT and SINAI_task_2_PRM_ZS_TASK2_V3 teams have a precision_dificult_terms of 1.0000 and 0.8333 respectively, which indicates that these models are very precise with dificult terms, although their recall_overall and recall_average are not as high in comparison with other equipment.

In the performance in BLEU Scores, the unipd_t21t22_chatgpt teams and their variants (mod1 and mod2) have the best bleu_n1_average, bleu_n2_average, bleu_n3_average and bleu_n4_average demonstrating good performance in terms of precision of n-grams in text generation. In the case of AIIRLab_Task2.2_Mistral, this team has a good balance with a recall_overall of 0.4118 and precision_dificult_terms of 0.4907, suggesting that their model is quite robust in general terms and in dificult terms. The SINAI team variants show variations in their results, with SINAI_task_2_PRM_ZS_TASK2_V2 having the best performance among its variants, with a recall_overall term exploited confrontation coordination d m m

Definition: make full use of and derive benefit from (a resource).

Explication: In the context of the sentence, exploited refers to utilizing the new technological capabilities of mobile devices and applications to their fullest potential in managing and combating wildfires. Use case: The novel technologies can be exploited to improve the eficiency and efectiveness of wildfire management.

Definition: a hostile or argumentative meeting or situation between opposing parties.

Explication: Here, confrontation is used metaphorically to describe the act of dealing with or managing wildfires, which are seen as the opposing party to human safety and property. Use case: Technological advances aid in the confrontation of wildfires by providing better tools for managing the crisis.

Definition: the organization of the diferent elements of a complex body or activity so as to enable them to work together efectively.

Explication: In this context, coordination refers to the organized management of personnel and vehicles, ensuring they operate in a synchronized and eficient manner during wildfire emergencies. Use case: Efective coordination of personnel and vehicles is crucial for rapid and eficient wildfire suppression.

Snt_id: G01.1_1000902583_1 Abstract: Novel technological advances in mobile devices and applications can be exploited in wildfire confrontation, enabling end-users to easily conduct several everyday tasks, such as access to data and information, sharing of intelligence and coordination of personnel and vehicles.

dificulty of 0.1556 and precision_dificult_terms of 0.7746. There are teams with values of 0 in almost all metrics, which indicates very low performance or problems in the implementation of their models.

5. Conclusions and Future Work

The model has proven to be able to generate robust responses based on the text or instruction (message) provided. We observe that GPT-4 Turbo analyzes the text, uses its extensive knowledge to identify words that can be considered complex, and categorizes them according to their complexity. Our preliminary analysis evaluates the model’s ability to understand and generate arguments in specific contexts. The results show that GPT-4 Turbo is highly competent in natural language processing tasks, such as predicting lexical complexity, demonstrating its efectiveness in identifying and categorizing complex terms accurately and consistently.

We have applied the GPT-4 Turbo model in the construction of several solutions for sub-tasks 2.1 and 2.2, where the model has demonstrated robust performance in the execution of natural language processing tasks, specifically in lexical simplification and identification of complex words. Not only has it successfully tackled these tasks, but it has also been able to generate detailed illustrations by creating definitions, explanations and use cases based on the context of the sentence (text). By identifying complex words and phrases and ofering simpler versions, it facilitates access to scientific sources in an understandable way, helping readers to understand the complexity of terms in diferent user groups, which makes it very useful for a general audience. and especially valuable for readers with reading dificulties or cognitive disabilities.

The results showed a variety of performances among the participating teams. Some teams, such as UboNLP_Task2.1_phi3-oneshot and Sharingans_Task2.2_GPT, stand out for their overall performance and ability to handle dificult terms, while other teams show specific strengths such as accuracy on dificult terms or good BLEU scores. However, there are also teams with significantly low performances, which could indicate the need for improvements in their approaches or implementations.

6. Acknowledgments

This work has been partially supported by projects CONSENSO (PID2021-122263OB-C21), MODERATES (TED2021-130145B-I00), SocialTOX (PDC2022-133146-C21) funded by the Spanish Government. in: Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), 2022. [16] L. Ermakova, P. Bellot, P. Braslavski, J. Kamps, J. Mothe, D. Nurbakova, I. Ovchinnikova, E. Sanjuan, Overview of SimpleText CLEF 2021 workshop and pilot tasks, in: 12th Conference and Labs of the Evaluation Forum (CLEF 2021), volume 2936, CEUR-WS. org, 2021, pp. 2212–2227. [17] L. Ermakova, E. SanJuan, J. Kamps, S. Huet, I. Ovchinnikova, D. Nurbakova, S. Araújo, R. Hannachi, E. Mathurin, P. Bellot, Overview of the clef 2022 simpletext lab: Automatic simplification of scientific texts, in: A. Barrón-Cedeño, G. Da San Martino, M. Degli Esposti, F. Sebastiani, C. Macdonald, G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction, Springer International Publishing, Cham, 2022, pp. 470–494. [18] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018). [19] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019). [20] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in neural information processing systems 33 (2020) 1877–1901. [21] K. K. Shahandashti, M. Sivakumar, M. M. Mohajer, A. B. Belle, S. Wang, T. C. Lethbridge, Evaluating the efectiveness of gpt-4 turbo in creating defeaters for assurance cases, 2024. arXiv:2401.17991. [22] S. Kublik, S. Saboo, GPT-3: Building innovative NLP products using large language models,

O’Reilly Media (2022). [23] A. Chan, GPT-3 and InstructGPT: technological dystopianism, utopianism, and “Contextual” perspectives in AI ethics and industry, AI and Ethics 3 (2023) 53–64. [24] Y. Hirano, S. Hanaoka, T. Nakao, S. Miki, T. Kikuchi, Y. Nakamura, Y. Nomura, T. Yoshikawa, O. Abe, Gpt-4 turbo with vision fails to outperform text-only gpt-4 turbo in the japan diagnostic radiology board examination, Japanese Journal of Radiology (2024) 1–9. [25] A. Bhattacharjee, R. Morafah, J. Garland, H. Liu, Zero-shot llm-guided counterfactual generation for text, 2024. arXiv:2405.04793. [26] J. Ortiz-Zambrano, C. Espin-Riofrio, A. Montejo-Ráez, Sinai participation in simpletext task 2 at clef 2023: Gpt-3 in lexical complexity prediction for general audience (2023). [27] G. M. D. Nunzio, et al., Overview of the CLEF 2024 SimpleText task 2: Identify and explain dificult concepts, in: G. Faggioli, et al. (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, CEUR-WS.org, 2024.

[1] W. H. DuBay, The principles of readability ., Online Submission ( 2004 ).

[2]

Saggion ,

Štajner ,

Bott ,

Mille ,

Rello ,

Drndarevic , Making it simplext: Implementation and evaluation of a text simplification system for spanish , ACM Transactions on Accessible Computing (TACCESS) 6 ( 2015 ) 1 - 36 .

[3]

North ,

Zampieri ,

Shardlow , Lexical complexity prediction: An overview , ACM Computing Surveys 55 ( 2023 ) 1 - 42 .

[4]

Segura-Bedmar ,

Martinez , Simplifying drug package leaflets written in spanish by using word embedding , Journal of Biomedical Semantics 8 ( 2017 ). doi:10.1186/s13326-017-0156-7.

[5]

Saggion ,

Štajner ,

Bott ,

Mille ,

Rello ,

Drndarevic , Making it simplext: Implementation and evaluation of a text simplification system for spanish , ACM Trans. Access. Comput. 6 ( 2015 ). URL: https://doi.org/10.1145/2738046. doi: 10 .1145/2738046.

[6]

Anula , Lecturas adaptadas a la enseñanza del español como l2: variables lingüísticas para la determinación del nivel de legibilidad, La evaluación en el aprendizaje y la enseñanza del español como LE L 2 ( 2008 ) 162 - 170 .

[7]

Alarcón ,

Moreno ,

Martínez , Hulat-ALexS CWI Task- CWI for Language and Learning Disabilities Applied to University Educational Texts, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020 ), CEUR-WS, Malaga , Spain, 2020 .

[8]

Ermakova ,

Azarbonyad ,

Bertin ,

Augereau , Overview of the clef 2023 simpletext task 2: Dificult concept identification and explanation ( 2023 ).

[9]

Ermakova , et al., Overview of CLEF 2024 SimpleText track on improving access to scientific texts , in: L. Goeuriot , et al. (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024 ), Lecture Notes in Computer Science, Springer, 2024 .

[10]

Faggioli ,

Ferro ,

Galuščáková , A . García Seco de Herrera (Eds.), Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum , CLEF 2024 , CEUR Workshop Proceedings, CEUR-WS.org, Grenoble, France, 2024 .

[11]

Paetzold , L. Specia, SemEval 2016 Task 11 : Complex Word Identification , 2016 , pp. 560 - 569 . doi: 10 .18653/v1/ S16 -1085.

[12]

S. M.

Yimam ,

Biemann ,

Malmasi ,

Paetzold ,

Specia ,

Štajner ,

Tack ,

Zampieri , A Report on the Complex Word Identification Shared Task 2018 , in: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications , Association for Computational Linguistics, New Orleans, Louisiana, 2018 , pp. 66 - 78 . URL: https://aclanthology. org/W18-0507. doi: 10 .18653/v1/ W18 -0507.

[13]

Ortiz-Zambrano ,

Montejo-Ráez , Overview of ALexS 2020: First workshop on lexical analysis at SEPLN, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020 ), volume 2664 , 2020 , pp. 1 - 6 .

[14]

Shardlow ,

Evans ,

G. H.

Paetzold , M. Zampieri, SemEval -2021 task 1: Lexical complexity prediction , in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval2021) , Association for Computational Linguistics , Online, 2021 , pp. 1 - 16 . URL: https://aclanthology. org/ 2021 .semeval- 1 .1. doi: 10 .18653/v1/ 2021 .semeval- 1 .1.

[15]

Štajner ,

Saggion ,

Ferrés ,

Shardlow ,

K. C.

Sheang , K. North,

Zampieri , W. Xu, Proceedings of the Workshop on Text Simplification , Accessibility, and Readability (TSAR- 2022 ),