<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ITL-International Journal of Applied Lin</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3389/fpsyg.2022.707630</article-id>
      <title-group>
        <article-title>Introducing MultiLS-IT: A Dataset for Lexical Simplification in Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Laura Occhipinti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>29</volume>
      <fpage>6057</fpage>
      <lpage>6062</lpage>
      <abstract>
<p>Lexical simplification is a fundamental task in Natural Language Processing, aiming to replace complex words with simpler synonyms while preserving the original meaning of the text. This task is crucial for improving the accessibility of texts, particularly for users with reading difficulties, second language learners, and individuals with lower literacy levels. In this paper, we present MultiLS-IT, the first dataset specifically designed for automatic lexical simplification in Italian, as part of the larger multilingual Multi-LS dataset. We provide a detailed account of the data collection and annotation process, including complexity scores and synonym suggestions, along with a comprehensive statistical analysis of the dataset. With MultiLS-IT, we fill a significant gap in the field of Italian lexical simplification, offering a valuable resource for developing and evaluating automatic simplification models. Our analysis highlights the diversity of complexity levels in the dataset and discusses the moderate agreement among annotators, underscoring the subjective nature of lexical complexity assessment.</p>
      </abstract>
      <kwd-group>
<kwd>lexical simplification</kwd>
        <kwd>lexical complexity prediction</kwd>
        <kwd>Italian dataset</kwd>
        <kwd>human annotations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Lexical simplification is a highly complex task within Natural Language Processing, encompassing broader automatic text simplification efforts [1]. It is defined as the task of replacing complex words with simpler synonyms that are more accessible to speakers, while preserving the original text’s meaning [2]. A complex word is one that is difficult for some readers to decode due to various characteristics that hinder comprehension [3, 4].</p>
      <p>This area of research is of significant interest both socially and in computational applications. Socially, automatic simplification can enhance text comprehension for individuals with reading difficulties [5, 6], second language learners [7], those with cognitive disabilities [8], or individuals with lower literacy levels [9]. In general, making texts accessible to everyone is a democratic act, as it ensures that information and knowledge are available to all members of society, regardless of their reading ability or educational background [10].</p>
      <p>From a computational perspective, it proves valuable for complex tasks such as machine translation [11], information retrieval [12], and summarisation [13], in addition to being an integral part of generic text simplification [1]. The ability to simplify text effectively can improve the performance of these applications by making the input data more uniform and easier to process [2].</p>
      <p>Lexical simplification encompasses various subtasks [14]. The two most important ones are:
1. the prediction of word complexity, which involves identifying the words that need to be simplified [15];
2. the replacement of complex words with simple synonyms [16].</p>
      <p>Lexical complexity prediction (1) normally involves assigning a complexity value to a lexical item in context, ranging from 0 to 1, where 0 represents maximum simplicity and 1 denotes maximum complexity [4]. This approach is a more advanced evolution of the traditional binary Complex Word Identification (CWI) [3], which classified words simply as complex or not complex. By moving towards a gradual approach, lexical complexity prediction provides a finer-grained, continuous assessment of word difficulty, allowing for more tailored simplification efforts.</p>
      <p>The replacement of complex words with simpler synonyms (2) comprises three subtasks: the generation of substitutes, their ranking based on complexity, and the selection of the most appropriate substitute [14]. This multi-step process ensures that the chosen synonym not only reduces complexity but also fits seamlessly into the original context.</p>
      <p>One of the major challenges for such a user-dependent and therefore complex task is the lack of extensive annotated linguistic resources needed to train and evaluate automatic simplification models [2, 4]. Annotated datasets are crucial for developing and testing algorithms that can perform these tasks accurately.</p>
      <p>In this context, we present MultiLS-IT, which is, to the best of our knowledge, the first dataset specifically designed for automatic lexical simplification in the Italian language. This resource is part of a larger multilingual dataset, Multi-LS (Multilingual Lexical Simplification) [17], created for a shared task at the BEA workshop [18]<sup>1</sup>. The main contributions of this work are:
• A detailed description of the data collection and annotation process of the Italian sub-dataset;
• A descriptive analysis including statistics and visualizations providing an overview of the dataset’s characteristics;
• The establishment of a reference point for future research in lexical simplification for Italian.</p>
      <p>With this work, we aim to fill a significant gap in lexical simplification research for Italian and provide a solid foundation for future studies and more effective lexical simplification technologies.</p>
      <p><sup>1</sup> While some general information about the entire dataset has already been published in these papers [17, 18], the detailed process of constructing the Italian resource has not been thoroughly discussed until now.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Most datasets developed for lexical simplification have primarily focused on a few languages, with English being the most resourced language [18]. In recent years, however, there has been notable progress in creating resources for other languages, such as Spanish, Portuguese, and Japanese, which has facilitated advancements in lexical simplification tasks for these languages. Despite these efforts, specific datasets for the Italian language have been notably absent, hindering the development of comprehensive lexical simplification systems for Italian.</p>
      <p>Many of these valuable datasets have been developed within the context of various shared tasks. The first one was proposed for SemEval 2012 [19]. It addressed English lexical simplification and provided a platform for evaluating systems that could rank substitution candidates by simplicity, using a dataset enriched with simplicity rankings from second language learners.</p>
      <p>The CWI task at SemEval 2016 [20] focused on predicting which words in a sentence would be considered complex by non-native English speakers, creating a new dataset of 9,200 instances and attracting significant participation.</p>
      <p>Expanding to multiple languages, the BEA 2018 CWI shared task [21] included English, German, and Spanish, and introduced a multilingual task with French, promoting the development of models capable of classifying word complexity across different languages.</p>
      <p>The IberLEF 2020 forum [22] advanced Spanish lexical simplification by providing binary complexity judgments over educational texts, contributing to the available resources for Spanish.</p>
      <p>The SemEval 2021 shared task on lexical complexity prediction [15] offered datasets for both single words and multi-word expressions in English, emphasizing continuous complexity judgments rather than binary classifications.</p>
      <p>The SimpleText workshop at CLEF [23], initiated in 2021, aims to improve the accessibility of scientific information by providing benchmarks for text simplification, further expanding resources for this task.</p>
      <p>The TSAR-2022 shared task [16] provided extensive annotations for lexical simplification in English, Spanish, and Portuguese, allowing participants to predict simple substitutions for complex words.</p>
      <p>These datasets have catalyzed significant research and development in the field. For instance, the availability of such resources has enabled the implementation of full lexical simplification pipelines [24, 25, 26].</p>
      <p>The majority of these datasets have typically concentrated on individual sub-tasks within the simplification pipeline, such as complex word identification (or lexical complexity prediction) or substitute generation. This division often limits the ability to comprehensively address the entire lexical simplification process.</p>
      <p>In this context, Multi-MLSP represents a significant advancement [17]. It serves as a foundational resource for the entire simplification pipeline, annotated for both complexity values and potential substitutes. By providing a well-structured and annotated dataset, Multi-MLSP facilitates comprehensive research and development in lexical simplification, addressing both complexity prediction and the generation of simpler substitutes<sup>2</sup>.</p>
      <p><sup>2</sup> The resource, including the Italian part, is available for download from https://github.com/MLSP2024/MLSP_Data.</p>
      <p>Despite these advancements, Italian has lagged behind due to the lack of dedicated resources.</p>
      <p>2.1. Lexical Simplification Research in Italian</p>
      <p>Numerous studies have explored automatic simplification for Italian [27], and several parallel corpora have been developed within these research projects [28, 29, 30, 31]. These corpora provide a valuable foundation for implementing automatic models for text simplification by presenting original texts aligned with their simplified versions. However, they primarily focus on syntactic simplification rather than lexical simplification, limiting their utility for tasks that require detailed lexical annotations.</p>
      <p>We attempted to extract the lexical simplifications present in the available corpora using text comparison between simple and complex sentences with the difflib library. The lack of annotations made the recognition of substitutions complex and required significant manual effort. From the exploration of these substitutions, however, we realized that the steps of lexical simplification have never been truly systematized.</p>
      <p>The only resource used to identify complex words and potential simpler substitutes has been the Nuovo Vocabolario di Base [32], a dictionary of common Italian words. This resource, although fundamental and significant for the Italian language, is primarily built on the basis of word frequency. However, as we know from the literature [33], we cannot consider only a single measure, such as frequency, as a comprehensive parameter of complexity.</p>
      <p>Furthermore, this resource, due to its nature as a static list, has inherent limitations in identifying complex words and generating suitable substitutes. For instance, consider the word abolizione (abolition), which is not included in De Mauro’s basic vocabulary list, whereas its verb counterpart abolire (to abolish) is present. Speakers familiar with the meaning of abolire would likely comprehend abolizione relatively easily, deducing its meaning as the action or process of abolishing. This example underscores the limitation of solely relying on predefined reference lists, as speakers can understand logically connected words within their lexicon.</p>
      <p>Given this scenario, there is a clear need for more comprehensive and annotated datasets that specifically address lexical simplification in Italian.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>MultiLS-IT is the Italian portion of a broader multilingual dataset, MultiLS. The overall dataset comprises 10 different languages: Catalan, English, Filipino, French, German, Italian, Japanese, Sinhala, Portuguese, and Spanish. To ensure consistency across the sub-datasets for each language, shared guidelines were established [17]<sup>3</sup>. This section will outline the key aspects specific to the construction of the resource for Italian.</p>
      <p>MultiLS-IT comprises 200 distinct contexts, each containing 3 target words. This design means that each sentence is repeated 3 times, as illustrated in Table 1, with each repetition focusing on a different target word. Consequently, the dataset includes a total of 600 sentences, corresponding to 600 target words.</p>
      <p>Table 1: Three entries sharing the same context, each with its target word and average complexity score. Context: “Lo stile è molto popolareggiante, a volte quasi con ostentazione (specialmente in alcune canzoni, che sembrano costituite da centoni di proverbi popolari), ma senza per questo risultare affettato.” Targets: popolareggiante (0.3), ostentazione (0.12), affettato (0.52).</p>
      <p>For each target word, the dataset provides an average complexity value. This value is calculated by aggregating the complexity ratings assigned by individual annotators. Additionally, the dataset includes a series of substitute words for each target word. These substitutes are ordered primarily by the frequency with which they were suggested by the annotators. In cases where multiple substitutes have the same frequency, they are listed alphabetically.</p>
      <p>3.1. Data Preparation</p>
      <p>For the construction of the MultiLS-IT dataset, we started by selecting the first 200 Italian words as outlined in the guidelines. The chosen words represent single lexical units; thus, multi-word expressions were excluded<sup>4</sup>.</p>
      <p>The selection process ensured that the words were sufficiently complex to justify lexical complexity annotation and that simpler substitutes could be found within the context. Each target word required a minimum of 10 annotators.</p>
      <p>Prior to selecting the words, we chose texts for the corpus. Given that the shared task, in the context of which this dataset was constructed, focused on educational applications, we selected texts related to educational settings, specifically Italian literature. This choice was reinforced by the importance of lexical simplification tasks in educational contexts, such as schools.</p>
      <p>To ensure privacy and copyright compliance, texts from Wikimedia, specifically Wikibooks and Wikiquote, were used. These texts are released under the Creative Commons Attribution-ShareAlike 3.0 license, allowing for use and sharing. We maintained a balanced ratio by selecting 50% of the texts from Wikibooks and 50% from Wikiquote, as indicated in [18].</p>
    </sec>
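The substitute ordering described in Section 3 (descending suggestion frequency, with alphabetical tie-breaking) can be sketched in Python; the function name and the sample suggestions below are illustrative, not taken from the released data:

```python
from collections import Counter

def rank_substitutes(suggestions):
    """Order annotator-suggested substitutes by how often each was proposed,
    breaking frequency ties alphabetically."""
    counts = Counter(suggestions)
    # Sort key: descending frequency first, then ascending alphabetical order.
    return sorted(counts, key=lambda word: (-counts[word], word))

# Hypothetical suggestions collected for one target word.
print(rank_substitutes(["semplice", "facile", "chiaro", "facile", "chiaro"]))
# ['chiaro', 'facile', 'semplice']
```

With real data, `suggestions` would be the flat list of all annotators’ proposals for a single target word.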
    <sec id="sec-4">
      <p><sup>3</sup> The full guidelines are available at: https://github.com/MLSP2024/MLSP_Data/blob/main/MLSP%20Shared%20Task%20%40%20BEA%202024%20-%20Annotation%20Guidelines%20-%20V1.0.pdf.</p>
    </sec>
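The difflib-based comparison of simple and complex sentences mentioned in Section 2.1 could look roughly like the sketch below; the sentence pair and the helper name are invented for illustration, and, as noted above, real parallel corpora required substantial manual checking on top of this:

```python
import difflib

def extract_substitutions(complex_sent, simple_sent):
    """Heuristically pair up replaced word spans between an original sentence
    and its simplified counterpart using difflib's opcode alignment."""
    a, b = complex_sent.split(), simple_sent.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # token spans that differ between the versions
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs

print(extract_substitutions(
    "lo stile risulta affettato",
    "lo stile risulta artificioso",
))  # [('affettato', 'artificioso')]
```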
    <sec id="sec-5">
      <p><sup>4</sup> The guidelines provided two options for selecting words: we could either translate part of a sample list of 200 English words provided, or use this list as a guide to understand the type and distribution of words to select. We opted for the second approach, selecting the Italian words independently while using the English list only as a reference.</p>
      <p>Web material extraction was carried out using BootCaT [34], a tool that allows for automated collection of texts from the web.</p>
      <p>To ensure the dataset reflected modern Italian usage, we applied specific filters to exclude archaic or outdated terms. We configured BootCaT to focus on texts from the 20th century by using keywords such as ‘20th-century Italian literature’, ‘authors’, ‘female authors’, and ‘writers’. These filters helped us target contemporary Italian language and avoid the inclusion of words or expressions that are no longer in common usage. Through this approach, we ensured that the vocabulary extracted was relevant for current readers and aligned with modern Italian linguistic practices.</p>
      <p>We employed a binary classifier developed for Italian CWI to select the words. The Random Forest model, detailed in [35], classifies words as simple (0) or complex (1), using various linguistic parameters to define lexical complexity.</p>
      <p>The model was trained on a dataset comprising 13,319 words, labeled as simple or complex. To avoid subjective choices, this list of words was created based on linguistic resources related to L2 learning, ensuring an objective selection process. It is important to note that the complexity classification was done without considering the context in which the words appear, due to the lack of available resources. This dataset includes features such as word frequency from two corpora (ItWaC [36] and Subtlex-it [37]), word length, syllable count, vowel count, stop word identification, number of senses, POS tags, number of morphemes, morphological density, and the frequency of lexical morphemes. These metrics are commonly used because they have a significant impact on lexical complexity [38]. Additionally, pre-trained word embeddings from fastText were incorporated to enhance the model’s predictions. The model underwent rigorous validation, demonstrating strong performance in accuracy, precision, recall, and F1 score. The classifier effectively utilized the combined linguistic features and word embeddings, providing a robust method for predicting word complexity.</p>
      <p>This model was applied to the corpus of educational texts. To select the 200 words, we observed the complexity probabilities assigned by the model and chose those with the highest probabilities, ensuring that they allowed for easy identification of simpler synonyms.</p>
      <p>For each sentence, in addition to the primary target word, we selected two additional content words to ensure a balanced representation of lexical complexity within the context. These words were chosen based on their semantic relevance to the sentence and their potential for simplification, meaning they could plausibly be replaced with simpler synonyms. The aim was to cover a range of complexity levels, avoiding an over-representation of either very simple or overly complex words.</p>
      <p>The selection of the two additional words involved a manual search for content words (nouns, verbs, or adjectives) that could be substituted without altering the meaning or coherence of the sentence. In cases where multiple suitable content words were identified, we prioritized those for which a higher number of simpler substitutes could be found, applying the same approach used for the primary target word.</p>
      <p>If a sentence did not allow for the selection of all three target words with suitable substitutions, it was excluded to ensure consistency across the dataset. This method guaranteed that all selected words were valid candidates for lexical simplification and provided a meaningful basis for analyzing word complexity and substitution potential.</p>
      <p>3.2. Annotation</p>
      <p>Our dataset provides a complexity rating for each target word, along with a set of synonyms perceived by annotators as simpler alternatives for replacement.</p>
      <p>For the first task, annotators were instructed to assign a complexity rating based on ‘how simple or complex the target word might be for a typical Italian native speaker’. Ratings were distributed on a 5-point Likert scale:
1. very easy - words that are very familiar
2. easy - words that are mostly familiar
3. neutral - when the word is neither difficult nor easy
4. difficult - words whose meanings are unclear but can be inferred from the context
5. very difficult - words that are very unclear.</p>
      <p>The prediction of lexical complexity involves assigning a complexity score to a lexical item in context, typically ranging from 0 to 1. The aggregated complexity score, computed as the average of individual complexity ratings, initially ranged from 1 to 5 and was normalized using the min-max function following the Complex 2.0 format [39], as provided by the guidelines. The resulting scores were rounded to two decimal places.</p>
      <p>For the second task, annotators were asked to suggest 1 to 3 synonyms that could replace the target word with simpler alternatives, aiming to enhance sentence comprehension. The substitutions were selected to ensure that the meaning of the original word and the overall context was preserved, and that the substitution was easier to understand than the original target. If the annotator could not find a simpler substitute, they were instructed to enter the target word itself as the suggestion, to indicate that the term is the simplest word.</p>
      <p>Specific instructions were provided to the annotators for the Italian dataset to avoid further complicating the already challenging task of finding suitable synonyms. It was permissible to disregard gender agreement within the context. Additionally, pronominal verbs were to be treated as single entities that could be replaced by other types of verbs. For example, mobilitarsi (to mobilise oneself) could be substituted with agire (to act).</p>
      <p>To ensure dataset robustness, a minimum of 10 annotations per word was required. Both the complexity rating and synonym suggestion tasks were assigned to the same group of annotators for consistency.</p>
      <p>Data collection was facilitated through Google Forms, where annotators evaluated sentences and proposed substitutions. We distributed 20 unique forms, each containing 30 sentences, and automated data compilation using Google Apps Script. Distribution channels included social media platforms like Instagram and Facebook, along with direct outreach to native speakers for participation.</p>
      <p>Additionally, manual quality control was performed to ensure the reliability of the annotations. This included checking that annotators had used the full range of annotations and verifying that the complexity judgments were consistent with those of other annotators. For synonym suggestions, we checked the suitability of the substitutions within the context and monitored the frequency with which annotators were unable to find a simplification.</p>
      <p>In total, 215 annotators participated, ensuring diverse and comprehensive representation. The metadata summarizing annotator demographics is presented in Table 2.</p>
      <p>We calculated the Spearman correlation coefficient for each pair of annotators, using the spearmanr function from the scipy.stats module. This process was repeated for all possible annotator pairs within each of the 20 Google Forms, each annotated by at least 10 annotators. For each form, we then calculated the mean Spearman correlation coefficient to summarize the level of agreement among annotators for that form.</p>
      <p>The overall mean of the Spearman correlation coefficients across all forms provides a single numerical measure of inter-annotator agreement for the entire dataset. This value is 0.4230.</p>
      <p>The inter-annotator agreement value indicates a moderate level of consistency among annotators in their complexity ratings. This reflects the inherent subjectivity in assessing lexical complexity but also highlights the general alignment in annotators’ judgments.</p>
      <p>The process of finding and suggesting synonyms is inherently more variable and subjective, making it difficult to measure agreement in the same statistical manner as for ordinal complexity ratings.</p>
      <p>3.4. Statistical Analysis</p>
    </sec>
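A minimal sketch of the score aggregation described in Section 3.2, assuming the Likert endpoints 1 and 5 as the min-max bounds (the exact normalisation constants follow the shared-task guidelines, which we do not reproduce here):

```python
def aggregate_complexity(ratings, lo=1, hi=5):
    """Average the 1-5 Likert ratings for one target word and min-max
    normalise the result to [0, 1], rounded to two decimal places."""
    avg = sum(ratings) / len(ratings)
    return round((avg - lo) / (hi - lo), 2)

# Hypothetical ratings from five annotators for one target word.
print(aggregate_complexity([1, 2, 2, 3, 1]))  # 0.2
```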
    <sec id="sec-6">
      <p>To gain a comprehensive statistical overview of our corpus, we calculated key metrics including the distribution of complexity values and the average length of sentences. This analysis provides insights into the characteristics of the dataset, which are essential for understanding the nature of the lexical simplification task.</p>
      <p>Table 2: Metadata summarizing annotator demographics. Age: 36.39 (11.23); Years in education: 17.33 (3.27); Number of L2 languages: 2.17 (0.93); Hours reading per week: 7.39 (6.96); Number of native annotators: 215; L1 language: Italian.</p>
    </sec>
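Descriptive statistics of the kind reported in this section (mean, standard deviation, range) can be reproduced with the standard library; the scores below are placeholders, not the actual dataset values:

```python
from statistics import mean, stdev

def describe(values):
    """Mean, sample standard deviation, and range of a list of scores."""
    return {"mean": round(mean(values), 3),
            "sd": round(stdev(values), 3),
            "min": min(values),
            "max": max(values)}

print(describe([0.0, 0.12, 0.3, 0.52, 0.88]))
```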
    <sec id="sec-7">
      <title>3.3. Inter-Annotator Agreement</title>
      <p>This structured approach ensured data quality and reliability, crucial for subsequent analyses and computational model development in lexical complexity research.</p>
    </sec>
    <sec id="sec-8">
      <p>To evaluate the reliability of the complexity ratings, we calculated the inter-annotator agreement. This was done by assessing the consistency of the complexity scores assigned by different annotators to the same target words.</p>
      <p>Given that our dataset consists of ordinal data representing complexity values ranging from 1 to 5, we employed Spearman’s rank correlation coefficient to measure agreement. Spearman’s correlation is appropriate for ordinal data as it assesses the strength and direction of the association between two ranked variables without assuming a linear relationship.</p>
    </sec>
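The agreement computation described for this study (pairwise Spearman correlations within each form, then averaged) can be sketched as follows. The authors used spearmanr from scipy.stats; the dependency-free rho below computes the same rank correlation, including average ranks for ties, and the toy form data are invented for illustration:

```python
from itertools import combinations
from statistics import mean

def average_ranks(xs):
    """1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

def mean_pairwise_agreement(ratings_by_annotator):
    """Mean Spearman correlation over all annotator pairs of one form."""
    return mean(spearman(a, b)
                for a, b in combinations(ratings_by_annotator, 2))

# Toy form: three annotators rating the same four target words.
form = [[1, 2, 3, 4], [1, 2, 3, 4], [2, 1, 4, 3]]
print(round(mean_pairwise_agreement(form), 4))  # 0.7333
```

Averaging these per-form means over the 20 forms yields the single agreement figure reported above.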
    <sec id="sec-9">
      <p>The distribution of complexity values in the MultiLS-IT dataset is summarized as follows: the average complexity score across all target words is 0.276, with a standard deviation of 0.168. The range of complexity values spans from 0.0 to 0.88. This distribution is visualized in Figure 1.</p>
      <p>Additionally, we analyzed the sentence lengths within the dataset. The average sentence length is 29.30 words, with a standard deviation of 10.36 words. This measure helps in understanding the context provided for each target word, which is crucial for annotators when assigning complexity scores and suggesting simpler synonyms.</p>
      <p>Furthermore, we investigated the correlation between sentence length and word complexity. The correlation coefficient between these two variables is 0.11, indicating a very weak relationship. This suggests that the complexity of a word is not significantly influenced by the length of the sentence in which it appears.</p>
      <p>4. Conclusions</p>
      <p>In this study, we present MultiLS-IT, the first dataset specifically designed for automatic lexical simplification in Italian. As part of the larger Multi-LS dataset, it addresses a significant gap in resources for lexical simplification in Italian. Despite its limited size, we believe that MultiLS-IT offers a valuable starting point for the development and evaluation of automatic simplification models. Our detailed description of the data collection and annotation process, including complexity ratings and synonym suggestions, provides a protocol that we hope will be followed and extended to increase the resources available for the Italian language.</p>
      <p>Our analysis revealed that the average complexity score of all target words is 0.276, with a standard deviation of 0.168, highlighting the range of complexity levels within the dataset. Including more diverse and complex contexts would provide a richer resource for training and evaluating simplification models.</p>
      <p>The inter-annotator agreement value of 0.4230 reflects a moderate level of consistency among annotators, emphasizing the inherent subjectivity in assessing lexical complexity. This relatively low value highlights the need to increase the sample size of both the dataset and the number of annotators to obtain more robust results.</p>
      <p>Future work should focus on expanding the dataset to include a greater variety of texts and more annotators to improve the reliability and generalizability of the results. Our goal is to create broader resources that enable the development of robust and effective lexical simplification technologies that can improve text accessibility and comprehension for a wide range of readers.</p>
      <p>In conclusion, while MultiLS-IT represents a significant step forward in the field of lexical simplification for Italian, there is still considerable potential for growth. Expanding the dataset to include a broader range of texts, increasing the number of annotators, and refining the annotation guidelines are all crucial steps toward improving the dataset’s quality. Additionally, the application of more advanced computational models and the exploration of real-world use cases will further contribute to the development of sophisticated tools for lexical simplification. We hope that this dataset will serve as a foundation for future research and development in automatic simplification, ultimately making information more accessible and comprehensible to all.</p>
      <p>References</p>
      <p>[1] H. Saggion, G. Hirst, Automatic text simplification, volume 32, Springer, 2017.
[2] G. Paetzold, L. Specia, Lexical simplification with neural ranking, in: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2017, pp. 34–40. URL: https://aclanthology.org/E17-2006.
[3] M. Shardlow, A comparison of techniques to automatically identify complex words, in: 51st Annual Meeting of the Association for Computational Linguistics: Proceedings of the Student Research Workshop, 2013, pp. 103–109.
[4] K. North, M. Zampieri, M. Shardlow, Lexical complexity prediction: An overview, ACM Computing Surveys 55 (2023) 1–42.
[5] D. De Hertog, A. Tack, Deep learning architecture for complex word identification, in: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, 2018, pp. 328–334.
[6] S. Stajner, Automatic text simplification for social good: Progress and challenges, in: C. Zong, F. Xia, W. Li, R. Navigli (Eds.), Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, Online, 2021, pp. 2637–2652. URL: https://aclanthology.org/2021.findings-acl.233. doi:10.18653/v1/2021.findings-acl.233.
[7] J. S. Lee, C. Y. Yeung, Personalizing lexical simplification, in: Proceedings of the 27th International Conference on Computational Linguistics, 2018, pp. 224–232.
[8] X. Chen, D. Meurers, Linking text readability and learner proficiency using linguistic complexity feature vector distance, Computer Assisted Language Learning 32 (2019) 418–447.
[9] W. M. Watanabe, A. C. Junior, V. R. Uzêda, R. P. d. M. Fortes, T. A. S. Pardo, S. M. Aluísio, Facilita: reading assistance for low-literacy readers, in: Proceedings of the 27th ACM International Conference on Design of Communication, 2009, pp. 29–36.
[10] H. Saggion, J. O’Flaherty, T. Blanchet, S. Sharof, S. Sanfilippo, L. Muñoz, M. Gollegger, A. Rascón, J. L. Martí, S. Szasz, et al., Making democratic deliberation and participation more accessible: the idem project, in: SEPLN – CEDI 2024 Seminar of the Spanish Society for Natural Language Processing - 7th Spanish Conference on Informatics, 2024.
[11] S. Štajner, M. Popović, Can text simplification help machine translation?, in: Proceedings of the 19th Annual Conference of the European Association for Machine Translation, 2016, pp. 230–242.
[12] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. Hullender, Learning to rank using gradient descent, in: Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 89–96.
[13] Z. Cao, F. Wei, L. Dong, S. Li, M. Zhou, Ranking with recursive neural networks and its application to multi-document summarization, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.
[14] M. Shardlow, Out in the open: Finding and categorising errors in the lexical simplification pipeline, in: N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk,
[17] … on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024, pp. 38–46. URL: https://aclanthology.org/2024.readi-1.4.
[18] M. Shardlow, F. Alva-Manchego, R. Batista-Navarro, S. Bott, S. Calderon Ramirez, R. Cardon, T. François, A. Hayakawa, A. Horbach, A. Hülsing, Y. Ide, J. M. Imperial, A. Nohejl, K. North, L. Occhipinti, N. P. Rojas, N. Raihan, T. Ranasinghe, M. S. Salazar, S. Štajner, M. Zampieri, H. Saggion, The BEA 2024 shared task on the multilingual lexical simplification pipeline, in: E. Kochmar, M. Bexte, J. Burstein, A. Horbach, R. Laarmann-Quante, A. Tack, V. Yaneva, Z. Yuan (Eds.), Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), Association for Computational Linguistics, Mexico City, Mexico, 2024, pp. 571–589. URL: https://aclanthology.org/2024.bea-1.51.</p>
      <p>[19] L. Specia, S. K. Jauhar, R. Mihalcea, SemEval-2012 task 1: English lexical simplification, in: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and
S. Piperidis (Eds.), Proceedings of the Ninth In- Volume 2: Proceedings of the Sixth International
ternational Conference on Language Resources Workshop on Semantic Evaluation (SemEval 2012),
and Evaluation (LREC’14), European Language Re- 2012, pp. 347–355.
sources Association (ELRA), Reykjavik, Iceland, [20] G. Paetzold, L. Specia, SemEval 2016 task 11:
Com2014, pp. 1583–1590. URL: http://www.lrec-conf. plex word identification, in: S. Bethard, M. Carpuat,
org/proceedings/lrec2014/pdf/479_Paper.pdf . D. Cer, D. Jurgens, P. Nakov, T. Zesch (Eds.),
[15] M. Shardlow, R. Evans, G. H. Paetzold, M. Zampieri, Proceedings of the 10th International Workshop
SemEval-2021 task 1: Lexical complexity predic- on Semantic Evaluation (SemEval-2016), 2016, pp.
tion, in: A. Palmer, N. Schneider, N. Schluter, 560–569. URL: https://aclanthology.org/S16-1085.
G. Emerson, A. Herbelot, X. Zhu (Eds.), Proceed- doi:10.18653/v1/S16-1085.
ings of the 15th International Workshop on Se- [21] S. M. Yimam, C. Biemann, S. Malmasi, G. Paetzold,
mantic Evaluation (SemEval-2021), Association for L. Specia, S. Štajner, A. Tack, M. Zampieri, A
reComputational Linguistics, Online, 2021, pp. 1– port on the complex word identification shared task
16. URL: https://aclanthology.org/2021.semeval-1.1. 2018, in: Proceedings of the Thirteenth Workshop
doi:10.18653/v1/2021.semeval-1.1. on Innovative Use of NLP for Building Educational
[16] H. Saggion, S. Štajner, D. Ferrés, K. C. Sheang, Applications, 2018, pp. 66–78.</p>
      <p>M. Shardlow, K. North, M. Zampieri, Findings of [22] J. A. Ortiz-Zambranoa, A. Montejo-Ráezb,
the tsar-2022 shared task on multilingual lexical Overview of alexs 2020: First workshop on lexical
simplification, in: Proceedings of the Workshop on analysis at sepln, in: Proceedings of the Iberian
Text Simplification, Accessibility, and Readability Languages Evaluation Forum (IberLEF 2020),
(TSAR-2022), 2022, pp. 271–283. volume 2664, 2020, pp. 1–6.
[17] M. Shardlow, F. Alva-Manchego, R. Batista-Navarro, [23] L. Ermakova, P. Bellot, P. Braslavski, J. Kamps,
S. Bott, S. Calderon Ramirez, R. Cardon, T. François, J. Mothe, D. Nurbakova, I. Ovchinnikova, E.
SanA. Hayakawa, A. Horbach, A. Hülsing, Y. Ide, Juan, Overview of simpletext 2021-clef workshop
J. M. Imperial, A. Nohejl, K. North, L. Occhip- on text simplification for scientific information
acinti, N. Peréz Rojas, N. Raihan, T. Ranasinghe, cess, in: Experimental IR Meets Multilinguality,
M. Solis Salazar, M. Zampieri, H. Saggion, An Multimodality, and Interaction: 12th International
extensible massively multilingual lexical simplifi- Conference of the CLEF Association, CLEF 2021,
cation pipeline dataset using the MultiLS frame- Virtual Event, September 21–24, 2021, Proceedings
work, in: R. Wilkens, R. Cardon, A. Todirascu, 12, Springer, 2021, pp. 432–449.</p>
      <p>N. Gala (Eds.), Proceedings of the 3rd Workshop [24] K. North, M. Zampieri, T. Ranasinghe, Alexsis-pt: A</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>