
1613-0073

Artificial Intelligence (AI)-based Framework to Automatically Adapt Short Stories in Spanish into Easier and Accessible Versions

Isam Diab

isam.diab@upm.es 0

CEUR Workshop Proceedings (CEUR-WS.org)

Cognitive Accessibility, Text Adaptation, Easy-to-Read Methodology, Artificial Intelligence

0 Ontology Engineering Group (OEG), Universidad Politécnica de Madrid (UPM) , Madrid , Spain

The Easy-to-Read (E2R) Methodology was created with the aim of presenting clear and easily understood documents to improve the daily life of people who present reading comprehension difficulties, such as persons with cognitive disabilities. To do that, the methodology provides a set of guidelines regarding both writing and layout aspects. However, the E2R guidelines are applied manually to create easy-to-read text materials, which demands considerable resources and effort. To help in such a manual process, and considering that cultural materials, in general, and literary materials, in particular, should be accessible for all, our research objective is to develop (a) a technological framework for (semi)-automatically adapting short stories in Spanish into easier and accessible versions close to the E2R principles, and (b) an evaluation framework to ensure that the easier versions of the short stories provided by the technological framework preserve the content and semantics of the original versions. In this regard, the developed technological framework should be useful to help (a) different target groups who present comprehension difficulties (e.g. people with cognitive disabilities, low literacy skills or non-native speakers) and (b) E2R professionals in terms of streamlining the task of text adaptation.


1. Introduction and Motivation

People with cognitive disabilities have some difficulties related to reading comprehension. Over the last decade, the need for comprehensible and accessible materials for people with learning difficulties has received increased attention in the social sphere [1]. In that sense, a methodology called Easy-to-Read (E2R) [2, 3, 4, 5] was created to present clear and easily understood contents to different sectors of the population that include people with cognitive disabilities and people with limited reading proficiency, among others. The methodology provides a collection of guidelines and recommendations related to both writing and layout aspects, such as avoiding the use of complex syntax, long words, or metaphorical and figurative language. However, this methodology presents three main limitations: (a) it lacks clear information on how to apply the guidelines and recommendations; (b) the guidelines it proposes are general, i.e. there is no specific information for adapting different types of texts; and (c) it relies heavily on manual adaptation, a highly time-consuming and subjective task.

To cover these identified gaps, our research is focused on applying different Artificial Intelligence (AI) methods and techniques to (semi)-automatically¹ perform both the analysis and the adaptation, based on the E2R Methodology², of short stories³ written in Spanish. We chose this specific type of text for the following three reasons:

1. Variety of linguistic aspects. Short stories harbour wide vocabulary registers, different syntactic structures, pragmatic implications and heterogeneous semantics.
2. Fixed structure. This type of text generally presents a tripartite organisation: introduction, development and denouement. In addition, short stories are of moderate length.
3. Social impact. The processes involved in reading comprehension are essential for people’s cognitive development [6]. Furthermore, the act of reading literary texts is seen as part of a wider process of human development and growth based on understanding both one’s own experience and the social world [7]. On this account, cultural materials, in general, and literary materials such as short stories, in particular, should be accessible for all.

The rest of the paper is organised as follows: Section 2 is devoted to the state of the art on the automatic approaches both for checking the E2R guidelines in texts and for adapting texts in Spanish into simpler versions. In Section 3 we pose the Open Research Problem as well as the Hypothesis and Research Questions on which we base this PhD thesis. Section 4 explains the main Objective and the different Sub-Objectives of this research. We proceed with the Research Methodology in Section 5, and Section 6 shows some Ongoing Work carried out to date. Finally, Section 7 presents conclusions and further research.

2. State of the Art

The already introduced Easy-to-Read (E2R) Methodology plays a decisive role in the process of adapting materials. However, as mentioned, its main drawback lies in the manual adaptation method, which demands considerable resources and effort. To address this limitation and make the process more efficient, there are semi-automatic tools both for checking the E2R guidelines in texts and for providing simpler adaptations in Spanish. Regarding E2R assessment, we can mention Easy-to-Read Advisor [8], an E2R conformance checker for assessing a particular document with respect to the E2R guidelines, and FACILE [9], an extension and improvement of Easy-to-Read Advisor. Regarding the automatic adaptation of texts into simpler versions, we can point out the following approaches: Simplext [10], LexSIS [11], DysWebxia [12] and EASIER [13]. These existing works are based on simplification techniques at both lexical and syntactic levels. Regarding lexical simplification, the tool presented in LexSIS [11] employs three techniques to find a simpler lexical substitute: a word-based vector model, word frequency, and word length. To make such a substitution for a simpler candidate, it relies on semantic resources available online such as OpenThesaurus⁴ and the Corpus de Referencia del Español Actual (CREA⁵). The work presented in DysWebxia [12] makes use of LexSIS [11] to replace words that have been identified as complex for people with dyslexia with synonyms that fit their comprehension needs. For its part, the system proposed in EASIER [13] identifies complex words using Support Vector Machine (SVM) algorithms and replaces them with easier synonyms by consulting external open-access resources such as BabelNet⁶ and OpenThesaurus. In the framework of the Simplext project [10], we find a double automatic simplification: on the one hand, lexical, based on the implementation of LexSIS and a rule-based simplification; and on the other hand, syntactic. For the syntactic simplification, the authors make use of a hand-written computational grammar and focus on reducing sentence complexity.

¹ We refer to a semi-automatic adaptation in the sense that we cannot assume that such an adaptation is completely accurate, so in many cases it will be the adapter who tailors the text to the appropriate context.

² The automatic adaptation can be considered an intralinguistic automatic translation. However, we decided to use the word adaptation throughout the research since this is the most appropriate terminology in the E2R area.

³ In Literature, a short story is a piece of prose fiction that typically can be read in one sitting and focuses on a self-contained incident or series of linked incidents.
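The lexical substitution strategy described above for LexSIS [11] can be illustrated with a minimal sketch. It is not the LexSIS implementation: it covers only two of the three signals (word frequency and word length) and leaves out the word-based vector model. The synonym table, the frequency values and the function names are hypothetical and invented for illustration; a real system would obtain them from resources such as a thesaurus and a reference corpus.

```python
# Hypothetical toy resources (assumptions, not real thesaurus or corpus data).
SYNONYMS = {
    "sendero": ["camino", "vereda"],
    "anciano": ["viejo", "longevo"],
}
FREQUENCY = {  # invented relative frequencies, for illustration only
    "sendero": 12, "camino": 240, "vereda": 5,
    "anciano": 30, "viejo": 180, "longevo": 2,
}


def simplicity(word: str) -> float:
    """Higher means simpler: frequent words score up, long words score down."""
    return FREQUENCY.get(word, 0) / len(word)


def simplest_substitute(word: str) -> str:
    """Return the simplest candidate among the word and its known synonyms."""
    candidates = [word] + SYNONYMS.get(word, [])
    # max() keeps the first candidate on ties, so unknown words stay unchanged.
    return max(candidates, key=simplicity)


def simplify(sentence: str) -> str:
    """Replace each whitespace-separated token with its simplest candidate."""
    return " ".join(simplest_substitute(token) for token in sentence.split())


print(simplify("el anciano cruza el sendero"))  # el viejo cruza el camino
```

Dividing frequency by length is only one possible way to combine the two signals; it was chosen here because it needs no tuning, not because any of the cited systems uses it.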

However, to the best of our knowledge, there is no research work oriented towards the automatic adaptation of a specific literary text type, such as short stories or short narratives, into easy-to-read versions in Spanish. In languages other than Spanish, it is worth mentioning the ERNESTA tool [14], whose authors have worked on the syntactic simplification of short stories in Italian, developing a system which focuses on (a) resolving anaphoric references and (b) rewriting sentences in a simpler form using the present tense of verbs. In this regard, it is important to note that the typology of a text is crucial when considering its adaptation, since not all text types share similar linguistic properties and, therefore, a general adaptation of texts is not valid for all typologies. In addition, as we mentioned, all existing work focuses on automatically simplifying texts, but simplifying and adapting are not the same process: adapting implies taking into account the needs of target users, while simplifying focuses on reducing the complexity of formal aspects of a text without necessarily taking into account the target user. Thus, an adaptation must be based on user research in order to be useful in terms of functionality.

3. Open Research Problem, Hypothesis and Research Questions

Based on the foregoing, the Open Research Problem (ORP) on which our research objectives are based is “The number of short stories in Spanish adapted into easy-to-read versions is quite scarce. Thus, literature is limited for people with reading comprehension difficulties”.

In this line, our research work raises the following Research Hypothesis (RH): “The (semi)-automatic adaptation of short stories written in Spanish into easier versions close to the E2R Methodology is possible and the resulting adapted versions will maintain the content and semantic information for people with cognitive disabilities”. Having considered both the Open Research Problem (ORP) and the Research Hypothesis (RH), the following Research Questions (RQ) can be proposed:

• RQ1. Which AI-based techniques and methods can be used in the automatic adaptation of short stories in Spanish?
• RQ2. Which metrics (indices or formulas) can be useful for assessing the reading comprehension of automatically adapted short stories by people with cognitive disabilities?
• RQ3. Are the adapted versions by the technological framework faithful to the content and semantics of the original versions?

⁴ https://www.openthesaurus.de/

⁵ https://www.rae.es/banco-de-datos/crea

⁶ https://babelnet.org/

4. Objective and Sub-Objectives

As a first attempt, the Research Objective (RO) of this work in progress is to develop (a) a technological framework for (semi)-automatically adapting short stories in Spanish into easier versions close to the E2R Methodology, and (b) an evaluation framework to ensure that the easier versions of the short stories provided by the technological framework preserve the content and semantics of the original versions.

Hence, to achieve our purpose, we propose the following sub-objectives (SOs):

• SO1. To identify the main cognitive difficulties in short stories regarding reading comprehension by people with cognitive disabilities. This identification will be experiment-based, involving user studies.
• SO2. To determine and classify the different linguistic criteria (in Spanish) related to the identified cognitive problems, in order to (a) select which E2R guidelines should be included in our framework, and (b) establish a priority order for the guidelines to be included in the framework.
• SO3. To determine which AI-based methods and techniques should be considered in the technological framework development, and thus use them in the automation of the guidelines.
• SO4. To evaluate whether the adapted versions of short stories provided by the technological framework maintain both the content and semantics of the original versions.

5. Research Methodology

In order to achieve the aforementioned objectives and bear the hypothesis out, the current research work is structured into three stages:

1. User-based Studies. In this first stage we conduct an exhaustive analysis of the cognitive and linguistic issues that affect the readability of short stories, in order to complete the information provided by the E2R Methodology on how to apply it. To achieve this goal, user-based studies are needed. Thus, we designed and set up several studies to find the linguistic aspects involved in reading comprehension difficulties. This stage is directly related to SO1.
2. Technological Framework Development. In this second stage we will (a) select which guidelines should be included in the framework and (b) establish a priority order based on the results obtained in the user studies, with the aim of automating the selected guidelines by means of Artificial Intelligence (AI) methods and techniques such as Natural Language Processing (NLP), Pattern Matching, or Machine Learning.
3. Evaluation Framework Development. This last stage will cover the evaluation of the previous processes. Since it will involve persons with cognitive disabilities, the evaluation will also include user studies. In this stage we will implement readability and understandability measures to analyse whether the (semi)-automatically adapted versions of short stories maintain both the content and semantics of the original versions.

Stages 2 and 3 will be iterative and they are related to SO2, SO3 and SO4.
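As an illustration of the kind of readability measure that the evaluation stage could implement, the sketch below computes the Szigriszt-Pazos perspicuity index, a readability formula for Spanish, in its commonly cited form: 206.835 − 62.3 × (syllables / words) − (words / sentences). The choice of this index is an assumption made for illustration, since the metrics to be used are still an open research question (RQ2). The syllable counter is a deliberately rough approximation that counts each run of consecutive vowels as one syllable, so hiatuses are undercounted; the function names and sample texts are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough approximation: each run of consecutive vowels counts as one syllable."""
    groups = re.findall(r"[aeiouáéíóúü]+", word.lower())
    return max(1, len(groups))


def szigriszt_pazos(text: str) -> float:
    """Perspicuity index as commonly cited; higher values mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 62.3 * syllables / len(words) - len(words) / len(sentences)


# Invented example pair: an original sentence and a possible easier version.
original = "El anciano caminaba lentamente por el sendero que atravesaba el bosque."
adapted = "El anciano camina despacio. El camino cruza el bosque."

print(round(szigriszt_pazos(original), 2))
print(round(szigriszt_pazos(adapted), 2))
```

With this sketch the adapted version obtains a higher score than the original. A formula of this kind only captures surface features (word and sentence length), so it could complement, but not replace, the user studies and the check that content and semantics are preserved.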

In more detail, considering both the objective and sub-objectives and the stages in which the research is structured, the distribution schedule of the different tasks is presented in Figure 2, organised over the four years foreseen.

6. Ongoing Work

We are currently in the first stage of the PhD thesis, in which, thanks to the collaboration with several institutions in Spain, we have launched a user study to gather data on the problems that affect the reading comprehension of persons with cognitive disabilities. More than 70 users with cognitive or intellectual disabilities participated in this study, including a small group of validators (i.e. people with disabilities who are familiar with the E2R methodology guidelines). In order to extract the cognitive-linguistic problems in the participants’ reading comprehension of short stories, we based the user study on questionnaires⁷ in which we asked participants about different short stories (we collected a corpus of texts ranging from 100 to 350 lines in length each), proposing questions based on free recall methods, induced recall methods, and metacognitive questions.

After extracting the data, we observed that the difficulties in terms of reading comprehension of short stories are grouped into the following linguistic dimensions:

• Vocabulary. Participants present difficulties regarding infrequent words, or adverbs ending in -mente (-ly in English).
• Syntax. Structures such as clause chains (complex subordinate and coordinated constructions), juxtaposed clauses, or tempo-causal appositions raise a problem for people with reading comprehension difficulties.
• Figurative Language. The use of figurative language, such as metaphors in morals, is very common in this type of text, which poses a major problem in the literal comprehension of the narratives.
• Cohesion and Coherence. Likewise, this type of text, as it is narrative, presents challenges when there appear, for example, anaphoric references to characters (it is preferable to always use proper names instead of pronouns or other reference marks), spatio-temporal inferences of events (discourse markers should be used to organise the narrative clearly), or ambiguity in references to different characters (the use of pronouns pointing to more than one character can cause confusion).
• Layout. In this dimension the focus is mainly on the format of the dialogues, as they should appear in a theatrical style (i.e. giving prior notice of who the speaker is before the introductory hyphen).

Considering these dimensions, we now have a roadmap of the aspects that should be addressed in the second stage, the technological framework development.

Furthermore, alongside the launch of the user study, we have been analysing which AI techniques may be the most suitable for our development by implementing several proofs-of-concept based on E2R guidelines, such as avoiding the use of verbal periphrases, adverbs ending in -mente and superlative forms, abbreviations and acronyms, passive voice, or lexical repetitions. To implement such proofs-of-concept we have mainly relied on Natural Language Processing (NLP)-based techniques. Some of this work is already described in published papers [9, 15, 16, 17], and other work is accepted for presentation at conferences in the coming months. In these papers we address some of the difficulties in the different dimensions outlined above. We have also carried out proofs-of-concept in final theses, trying to solve syntactic⁸, metaphorical⁹, semantic¹⁰ or dialogue formatting¹¹ issues. For the time being, based on the work already carried out, we have observed that techniques based on declarative methods are proving to be the most effective. The results, which can be found in these published works, are apparently acceptable.

⁷ A sample questionnaire (in Spanish) in relation to short story 1 is available here: https://short.upm.es/argi7

⁸ https://oa.upm.es/72751/

⁹ https://oa.upm.es/71157/

¹⁰ https://oa.upm.es/75098/
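To give an idea of what a declarative, rule-based check for one of these guidelines can look like, the following minimal sketch flags adverbs ending in -mente. It is not the implementation described in the cited papers: it relies only on a regular expression and a small, hand-made exception list, whereas a fuller solution would likely use part-of-speech tagging. The exception list, the function name and the sample text are hypothetical.

```python
import re

# Words ending in -mente that are not adverbs (illustrative, incomplete list).
NOT_ADVERBS = {"demente", "clemente", "inclemente", "vehemente"}

# At least one word character followed by the suffix "mente" at a word boundary.
MENTE_PATTERN = re.compile(r"\b\w+mente\b", re.IGNORECASE)


def find_mente_adverbs(text: str) -> list[str]:
    """Return candidate -mente adverbs in order of appearance."""
    return [
        match.group(0)
        for match in MENTE_PATTERN.finditer(text)
        if match.group(0).lower() not in NOT_ADVERBS
    ]


sample = "El lobo caminaba lentamente y habló con voz vehemente. Rápidamente llegó."
print(find_mente_adverbs(sample))  # ['lentamente', 'Rápidamente']
```

A detector like this only locates the candidates; deciding how to rewrite each adverb (for instance, as "de forma lenta") would be a separate adaptation step.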

7. Conclusions and Further Research

Provided the results are favourable, we expect this PhD thesis to be an original contribution, since the task of adapting literary texts into easy-to-read versions has not yet been addressed in Spanish.

For this reason, the work plan is ambitious, and there will be a broad path along which to continue this research. As a further step, we would like to study particular impairments (not only cognitive disabilities in general), in order to analyse whether a particular adaptation for each disability is needed or not.

Additionally, it would be worthwhile to compile a corpus to apply other AI techniques such as Machine Learning or Deep Learning in terms of, for instance, developing a language model for text adaptation into easy-to-read versions.

Moreover, the developed framework should be useful to help (a) different target groups who present comprehension difficulties (e.g. people with cognitive disabilities, low literacy skills or non-native speakers) and (b) E2R professionals in terms of streamlining the task of text adaptation.

In addition, this research work will contribute to (a) the development and fulfilment of the Sustainable Development Goals¹² (SDGs) 4, 5 and 10 proposed by the UN, and (b) the recognition and achievement of cognitive accessibility.

Acknowledgments

This work has been funded by the program PEJ-2020-AI/TIC-19542 supported by both Comunidad Autónoma de Madrid (Spain) and Fondo Social Europeo. First, I would like to thank my supervisor, Mari Carmen Suárez de Figueroa Baonza. Furthermore, we thank the different institutions that are helping us to address the daily-life problems that people with reading impairments present.

¹¹ https://oa.upm.es/75513/

¹² https://sdgs.un.org/goals

References

[1] Matausch, Nietzio, Easy-to-read and plain language: Defining criteria and refining rules, 2012. URL: http://www.w3.org/WAI/RD/2012/easy-to-read/paper11.

[2] Nomura, G. S. Nielsen, International Federation of Library Associations and Institutions, Library Services to People with Special Needs Section, Guidelines for easy-to-read materials, IFLA Headquarters, The Hague, 2010.

[3] AENOR, Lectura Fácil. Pautas y recomendaciones para la elaboración de documentos (UNE 153101:2018 EX), Asociación Española de Normalización, 2018.

[4] Inclusion Europe, Information for All. European standards for making information easy to read and understand, Inclusion Europe, Brüssel, 2009. OCLC: 838005460.

[5] European Blind Union, Making information accessible for all, 2017. URL: https://www.euroblind.org/publications-and-resources/making-information-accessible-all.

[6] M. Nikolajeva, Leer ficción es bueno para el desarrollo cognitivo, emocional y social, Alabe Revista de Investigación sobre Lectura y Escritura (2019). doi:10.15645/Alabe2019.20.12.

[7] P. Freire, L. Slover, The importance of the act of reading, The Journal of Education 165 (1983) 5–11.

[8] M. C. Suárez-Figueroa, E. Ruckhaus, J. López-Guerrero, I. Cano, Á. Cervera, Towards the Assessment of Easy-to-Read Guidelines Using Artificial Intelligence Techniques, in: K. Miesenberger, R. Manduchi, M. C. Rodriguez, P. Peňáz (Eds.), Computers Helping People with Special Needs. ICCHP 2020, volume 12376 of Lecture Notes in Computer Science, Springer, 2020, pp. 74–82.

[9] M. C. Suárez-Figueroa, I. Diab, E. Ruckhaus, I. Cano, First steps in the development of a support application for easy-to-read adaptation, Universal Access in the Information Society (2022). doi:10.1007/s10209-022-00946-z.

[10] H. Saggion, S. Stajner, S. Bott, S. Mille, L. Rello, B. Drndarevic, Making It Simplext: Implementation and Evaluation of a Text Simplification System for Spanish, ACM Transactions on Accessible Computing 6 (2015).

[11] S. Bott, L. Rello, B. Drndarevic, H. Saggion, Can Spanish Be Simpler? LexSiS: Lexical Simplification for Spanish, in: Proceedings of COLING 2012, The COLING 2012 Organizing Committee, Mumbai, India, 2012, pp. 357–374.

[12] L. Rello, R. Baeza-Yates, H. Saggion, DysWebxia: Textos más Accesibles para Personas con Dislexia, Procesamiento del Lenguaje Natural 51 (2013) 205–208.

[13] L. Moreno, R. Alarcón, P. Martínez, EASIER system. Language resources for cognitive accessibility, in: The 22nd International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’20, Association for Computing Machinery, Virtual Event, Greece, 2020, pp. 1–3.

[14] G. Barlacchi, S. Tonelli, Ernesta: A sentence simplification tool for children’s stories in Italian, 2013, pp. 476–487. doi:10.1007/978-3-642-37256-8_39.

[15] M. C. Suárez-Figueroa, I. Diab, A. González, J. Rivero-Espinosa, First attempt to an easy-to-read adaptation of repetitions in captions, in: Computers Helping People with Special Needs: 18th International Conference, ICCHP-AAATE 2022, Lecco, Italy, July 11–15, 2022, Proceedings, Part I, Springer-Verlag, Berlin, Heidelberg, 2022, pp. 417–424. doi:10.1007/978-3-031-08648-9_48.

[16] M. C. Suárez-Figueroa, I. Diab, Á. González Sanz, J. Rivero-Espinosa, Automatic easy-to-read translation of morphological structures in Spanish texts, Procesamiento del Lenguaje Natural 71 (2023) 191–203. URL: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6553.

[17] M. C. Suárez-Figueroa, I. Diab, Á. González, J. Rivero-Espinosa, Towards an automatic easy-to-read adaptation of morphological features in Spanish texts, in: J. Abdelnour Nocera, M. Kristín Lárusdóttir, H. Petrie, A. Piccinno, M. Winckler (Eds.), Human-Computer Interaction – INTERACT 2023, Springer Nature Switzerland, Cham, 2023, pp. 176–198.