Building Foundations for Inclusiveness through Expert-Annotated Data

Moreno La Quatra 1,†, Salvatore Greco 2,*,†, Luca Cagliero 2, Michela Tonti 3, Francesca Dragotto 4, Rachele Raus 5, Stefania Cavagnoli 4 and Tania Cerquitelli 2

1 Kore University of Enna, Enna, Italy
2 Politecnico di Torino, Turin, Italy
3 Università degli Studi di Bergamo, Bergamo, Italy
4 Università degli Studi di Roma Tor Vergata, Rome, Italy
5 Università di Bologna, Bologna, Italy


                                           Abstract
Natural Language Understanding and Generation models suffer from a limited capability to understand the nuances of inclusive communication, as they are trained on massive data that often include significant portions of non-inclusive content. Even when the models are specifically designed to address non-inclusive language detection or reformulation, they largely disregard inclusiveness-related features that are likely correlated with inclusive language nuances, such as the discourse type, the level of inclusiveness, and the intended context of use. To assess the importance of additional inclusiveness-related features, we collect a new corpus of Italian administrative documents manually annotated by linguistic experts. The experts not only highlight non-inclusive text snippets and propose possible reformulations, but also annotate multi-aspect labels related to different inclusive language nuances. We empirically show that a multi-task learning approach that leverages the multi-aspect annotations can improve non-inclusive text reformulation performance, thereby confirming the potential of expert-annotated data in inclusive language processing.

                                           Keywords
                                           inclusive language, natural language processing, text generation, deep learning



Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy
* Corresponding author.
† These authors contributed equally.
Email: moreno.laquatra@unikore.it (M. La Quatra); salvatore_greco@polito.it (S. Greco); luca.cagliero@polito.it (L. Cagliero); michela.tonti@unibg.it (M. Tonti); francesca.dragotto@gmail.com (F. Dragotto); rachele.raus@unibo.it (R. Raus); stefania.cavagnoli@uniroma2.it (S. Cavagnoli); tania.cerquitelli@polito.it (T. Cerquitelli)
Web: https://www.mlaquatra.me/ (M. La Quatra); https://grecosalvatore.github.io/ (S. Greco)
ORCID: 0000-0001-8838-064X (M. La Quatra); 0000-0001-7239-9602 (S. Greco); 0000-0002-7185-5247 (L. Cagliero); 0000-0001-5306-3054 (R. Raus); 0000-0002-9039-6226 (T. Cerquitelli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

Non-inclusive expressions are widespread in human-written documents [1]. Training Natural Language Understanding and Generation models on massive data exposes them to bias issues related to language inclusiveness. Addressing this issue is particularly relevant because Artificial Intelligence (AI)-based solutions must be used responsibly to correctly model inclusive language practices and not unintentionally marginalize or disadvantage certain groups.

To mitigate the presence of bias in data, AI-based applications rely on human supervision for model training and post-processing evaluation. This is quite common in the areas of Natural Language Understanding and Generative AI, in which applications like Large Language Models (LLMs) provide end users with conversational and language editing services [2].

The computational linguistics community has agreed on the need to leverage human expert annotations in experience-based learning for bias detection and mitigation [3, 4, 5, 6]. However, the linguistics literature often underestimates the importance of linguistic annotators because of the widespread tendency to value the figures of pre- and post-editors [7, 8]. Editing and annotation are substantially different: while language editing tools rewrite parts of the source text based on predefined expert-provided rules, Natural Language Understanding and Generation models can leverage annotations to capture the nuances of annotated text in a self-supervised manner. The use of textual annotations also relieves annotators of the task of explicitly formulating or adhering to ad hoc linguistic rules.

In the context of inclusive language understanding and generation, most of the previous work exploits rule-based approaches or round-trip translations to annotate texts for inclusivity issues [9, 10, 11, 12]. However, these works often overlook the significance of human expert annotations, opting instead for rule-based approaches or artificially created datasets generated through round-trip translations. The role of linguistic annotators in providing specific understanding and annotations of language data is crucial for developing more inclusive AI models [13, 14].

A limited body of work has been devoted to generating and exploiting multi-faceted expert human annotations to drive AI models for inclusive language, e.g., [15, 16, 17]. However, existing benchmarks of annotated text for inclusive language processing neglect potentially relevant aspects such as the level of inclusiveness, the intended context of use, and the text genre. These aspects have the potential to improve the inclusive language understanding and generation capabilities of AI models.

This paper proposes an expert-annotated dataset covering these new aspects and investigates their usefulness in enhancing the performance of the task of non-inclusive text reformulation in the absence of rule-based editing models.

To this end, we enrich a corpus of Italian administrative documents with multi-aspect annotations, providing more insights into the inclusive language nuances. The purpose is to enable the study of new features describing inclusiveness aspects neglected by existing approaches, such as the level of inclusiveness, register, and genre. By enriching the language descriptions with new inclusiveness-related features, we provide the research community with new resources to
enhance the understanding and writing capabilities of AI-based solutions.

We also collect preliminary results on the use of multi-aspect annotations in a multi-task learning approach to enhance non-inclusive language reformulations. The results confirm the potential of the inclusiveness-related expert annotations.

2. The annotation process

The term annotation is often used to indicate the process by which textual data are subjected to a tightly interrelated two-phase activity [6]: (a) identification, selection, and localisation of specific documents, and (b) interpretation and labeling of those documents. The first phase entails identifying and detailing the text segments that exhibit the linguistic phenomenon under investigation. Subsequently, in the interpretation phase, the selected occurrences are manually labeled. These annotations may take various forms, ranging from a selection of pre-established alternatives to free-text comments or possible reformulations.

Unlike human annotators, AI models often lack cognitive abilities such as common-sense reasoning and generalization, due to the relatively limited number of linguistic examples used for model training compared to the impressive variety of natural language forms.

Human annotators need sufficient expertise to interpret nuanced linguistic phenomena and assign appropriate labels. Their annotations form the basis of a supervised learning process. The trained models can progressively learn from annotated data, much as human annotators do, but at a scale not achievable through manual work alone.

Annotation of Italian administrative documents. We have designed and utilized a novel benchmark dataset for inclusive language writing in Italian. This dataset comprises administrative communications sourced from the Italian public administration, spanning both national and regional levels. We annotate the corpus at the sentence level. To this end, we set up a heterogeneous team of 13 linguistic experts with diverse experiences and expertise in inclusive language. The team consists of predominantly female individuals, all native Italian speakers. All the annotators are educated: 57% have at least 10 years of experience in linguistics, and 50% have at least 3 years of experience in inclusive language. In addition, the annotators received, on average, about 30 hours of training specific to inclusive language annotation.

Each human annotator independently assigns inclusiveness-related metadata to the document sentences. Each sentence can be enriched with multiple annotations. The annotations consist of: (a) the reformulation of any non-inclusive piece of text, i.e., an alternative inclusive form; (b) the level of inclusiveness of the input sentence, indicating whether the sentence is non-inclusive, inclusive, or not pertinent; (c) the register or intended context of use, i.e., Standard, Specialized, or Informative/Educational; and (d) the discourse type or genre, i.e., Legal, Administrative, Technical, or Informative/Educational.

Additional contextual aspects could be included in future annotations to further enhance models' understanding of inclusive language usage. By jointly providing these annotations, the experts aimed to capture the nuanced, multi-faceted nature of inclusive language.

By learning language inclusiveness patterns from a diversified, context-dependent set of expert annotations, AI models gain exposure to subtle interpretive differences. Consistency across annotations is ensured through detailed guidelines and instructions provided to the experts. Before the full annotation, a collaborative analysis of a sample set identified any divergent interpretations to refine the guidance.

Statistics on annotated data. Table 1 reports the number of annotated sentences for each aspect, separately for the training, validation, and test sets.

  Task ID   Train   Validation   Test
  NILR       6491       956       579
  ILC        9207      1421       866
  RC         2167       338       247
  GC         2166       338       248

Table 1
Statistics on data. NILR=Non-Inclusive Language Reformulation, ILC=Inclusiveness Level Classification, RC=Register Classification, GC=Genre Classification.

Example of annotations. Table 2 shows an example of an annotated Italian sentence (along with the corresponding English translation for non-Italian readers). Linguistic experts assign different annotations to each sentence. In this example, they assigned three labels to the sentence. Regarding inclusiveness, the sentence has been categorized as non-inclusive because it contains "Il Presidente" (i.e., Chair/President) and "Rettore" (i.e., Rector), which are masculine declensions of professional roles. In addition, the sentence also contains "suo decreto", which refers to a decree issued by a male person, so the sentence is not inclusive. The discourse sequence is of the administrative type, as the content refers to an administrative topic, and the language used is specialized, as the content describes specific and technical aspects.

3. Case study: Leveraging Aspects for Italian Inclusive Language Reformulation

We conduct an empirical analysis to examine the impact of utilizing expert annotations in inclusive language generation. Specifically, we investigate the advantages of simultaneously addressing two key objectives: reformulating non-inclusive language and predicting various aspects of inclusiveness.

Tasks. Given a non-inclusive piece of text T, the Non-Inclusive Language Reformulation (NILR) task aims at generating an equivalent inclusive natural language form. The NILR task is a sequence-to-sequence problem, where the input is a non-inclusive sentence and the output is the corresponding inclusive sentence.

Given T and an aspect A, the goal is to predict the value of A for T. A can be the level of inclusiveness, the register or intended context of use, or the discourse type or genre. According to the aspect under analysis, the corresponding sub-tasks are denoted as Inclusiveness Level Classification (ILC), Register Classification (RC), and Genre Classification (GC).
IT sentence: "Il Presidente, scelto dal Rettore tra i professori ordinari dell'Ateneo con competenze in ambito di valutazione, accreditamento e qualità e nominato con suo decreto, previo parere del Senato Accademico;"
IT reformulation: "Chi ricopre la carica di Presidente, su scelta di chi riveste il ruolo di Rettore tra il personale docente ordinario dell'Ateneo con competenze in ambito di valutazione, accreditamento e qualità e in seguito a nomina con suo decreto, previo parere del Senato Accademico;"
IT labels: Inclusive Class: Non-inclusivo; Discursive Sequence: Amministrativo; Clear Language: Specialistico

EN sentence: "The Chair/President, selected by the Rector among the full professors of the University, with expertise in the fields of evaluation, accreditation, and quality and appointed by his decree, subject to the opinion of the Academic Senate;"
EN reformulation: "Who serves as Chair/President, selected by who holds the position of Rector, among the full professors of the University with expertise in the fields of evaluation, accreditation, and quality and appointed by his or her decree, subject to the opinion of the Academic Senate;"
EN labels: Inclusive Class: Non-inclusive; Discursive Sequence: Administrative; Clear Language: Specialized

Table 2
Example of sentence annotations illustrating non-inclusive language reformulation in Italian (IT) and English (EN), along with the corresponding inclusiveness classification, discursive sequence, and clear language class.
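For illustration, each expert-annotated sentence can be represented as a simple record holding the four annotation aspects described in Section 2. The following sketch is our own illustrative data structure (the dataclass, field names, and label strings are assumptions for exposition, not the authors' released format), populated with the Table 2 example:

```python
from dataclasses import dataclass
from typing import Optional

# Label sets described in Section 2 (string values are illustrative).
INCLUSIVENESS_LEVELS = {"non-inclusive", "inclusive", "not pertinent"}
REGISTERS = {"Standard", "Specialized", "Informative/Educational"}
GENRES = {"Legal", "Administrative", "Technical", "Informative/Educational"}


@dataclass
class AnnotatedSentence:
    """One expert-annotated sentence with its multi-aspect labels."""
    sentence: str
    reformulation: Optional[str]  # inclusive rewrite, if the sentence is non-inclusive
    inclusiveness_level: str      # ILC label
    register: str                 # RC label (intended context of use)
    genre: str                    # GC label (discourse type)

    def __post_init__(self) -> None:
        # Validate labels against the annotation scheme.
        assert self.inclusiveness_level in INCLUSIVENESS_LEVELS
        assert self.register in REGISTERS
        assert self.genre in GENRES


# The Table 2 example as a record (sentences abridged).
example = AnnotatedSentence(
    sentence="Il Presidente, scelto dal Rettore tra i professori ordinari ...",
    reformulation="Chi ricopre la carica di Presidente, su scelta di chi ...",
    inclusiveness_level="non-inclusive",
    register="Specialized",
    genre="Administrative",
)
```

A record like this carries both the sequence-to-sequence supervision (sentence and reformulation pairs for NILR) and the three classification labels (ILC, RC, GC) used in the multi-task experiments below.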



(GC). The ILC, RC, and GC tasks are treated as separate classification problems, where the input is a sentence and the output is the corresponding aspect value.

Single- vs. Multi-Task Learning. To compare the performance of models trained using different learning approaches, we conducted experiments in both single-task and multi-task learning settings.

In Single-Task Learning, we focus exclusively on the Non-Inclusive Language Reformulation (NILR) task, disregarding all aspect-related annotations. We leverage an encoder-decoder architecture, specifically BART-IT [18], a BART architecture [19] pre-trained on a clean Italian corpus [20]. The model is fine-tuned on the NILR task with the twofold objective of modifying the input sentence to make it inclusive while maintaining the original meaning.

Conversely, in Multi-Task Learning, we integrate the NILR task with the aspect classification tasks (i.e., ILC, RC, and GC) during training. For the additional tasks, we specifically leverage the encoder component of the model, which extracts representations of the input text. The encoder is additionally trained with a classification objective: each task is associated with a separate classification head, trained to predict the corresponding aspect value for the input sentence. By interleaving these tasks during training, the model learns to simultaneously address NILR and produce encoder representations that capture various aspects related to inclusiveness.

Evaluation Metrics. We evaluate the quality of the text reformulation using a standard train-validation-test split of our expert-annotated data. To compare the automatically generated and expected reformulations, we use the established ROUGE F1-scores [21]. They measure the unit overlap, in terms of the number of n-grams in common, between the two pieces of text: the larger the score, the higher the syntactic similarity. R-1, R-2, and R-L count the unit overlap in terms of unigrams, bigrams, and longest common subsequences, respectively.

To complement the quantitative evaluation, we also perform a qualitative evaluation of the achieved results. We involved six human evaluators who were asked to label each model-generated sentence as: correct, if it accurately maintained the original meaning while using inclusive language appropriately for the context; partially correct, if some aspects were reformulated correctly but others were missed or inaccurate; or not correct, if the rewriting fundamentally failed to capture the original meaning or usage intention. This multi-level feedback aims at capturing the models' ability to perform the rewriting task sensitively across different scenarios, beyond string-matching metrics alone.

For each reformulation, we assign a score to each annotation as follows: 1 for correct, 0.5 for partially correct, and 0 for incorrect. The score for each reformulation is computed as the average over all the expert annotations (m = 6). Finally, we average the scores of all the reformulations (n = 30) to obtain a single score for each model.

Results' overview. Columns 2, 3, and 4 of Table 3 show the ROUGE scores for both models. The multi-task learning setting achieves the best performance on all the quantitative metrics. Regarding the human evaluation, we obtained 6 annotations for each of the 30 reformulations per model. For the model trained in the single-task configuration, 93 annotations were correct, 55 were partially correct, and 32 were incorrect. For the multi-task model, 101 were correct, 49 were partially correct, and 30 were incorrect. Column 5 reports the average human evaluation score for each model. The human scores are coherent with the quantitative ones, showing that the model trained in the multi-task setting benefits from the additional labels. Based on these preliminary results, we conclude that nuanced, multidimensional annotations of inclusive language have the potential to support a more comprehensive approach to modeling inclusive language.

  Setting       R-1     R-2     R-L    Human Eval
  Single-Task   74.95   64.09   74.79     0.67
  Multi-Task    75.58   64.37   75.36     0.70

Table 3
Performance comparison between Single- and Multi-Task Learning approaches in inclusive language generation, evaluated based on ROUGE scores (R-1, R-2, R-L) and human evaluation.
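As a sanity check, the human-evaluation scores reported in Table 3 can be reproduced from the raw annotation counts given in the Results' overview. The following sketch (the function name is ours) applies the 1 / 0.5 / 0 scoring scheme and averages over all m x n = 180 annotations per model:

```python
def human_eval_score(correct: int, partial: int, incorrect: int) -> float:
    """Average score over all annotations: 1 for correct,
    0.5 for partially correct, 0 for incorrect."""
    total = correct + partial + incorrect  # m * n = 6 * 30 = 180 annotations
    return (1.0 * correct + 0.5 * partial + 0.0 * incorrect) / total


# Annotation counts reported in the Results' overview
# (6 annotators x 30 reformulations per model).
single_task = human_eval_score(correct=93, partial=55, incorrect=32)
multi_task = human_eval_score(correct=101, partial=49, incorrect=30)

print(f"single-task: {single_task:.2f}")  # single-task: 0.67
print(f"multi-task: {multi_task:.2f}")    # multi-task: 0.70
```

Both values match Column 5 of Table 3, confirming that the reported scores are simple weighted averages over the 180 per-model annotations.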
4. Conclusions

This paper discussed and experimentally demonstrated that the role and contribution of human annotators are of paramount importance in improving the quality of NLP results and the writing capability of generative approaches in inclusive communication. Starting from a new Italian administrative corpus, we enriched it with a variety of annotations with the help of a team of language experts. This included (i) reformulating gendered language and acronyms, (ii) rewriting to enhance readability for the visually impaired, and (iii) defining the intended context of use (register) and the text genre. The preliminary experimental results on the annotated corpus are promising and highlight the potential of the newly proposed annotations to develop a more comprehensive and richer approach that improves the ability of the generative algorithm to propose comprehensive and integrative reformulations.

Limitations. (i) The annotation is language-specific, limited to Italian, thereby constraining its utility in multilingual scenarios; and (ii) it is specific to formal communication. Tailored to tackle the challenge of inclusive language in administrative and academic settings, the natural language tasks are trained exclusively on administrative documents, potentially lacking suitability for diverse contexts such as legal and web communications.

Future work. As part of the E-MIMIC (Empowering Multilingual Inclusive Communication) project, we are currently working on a multilingual annotation process to overcome these issues and foster inclusive communication across different domains and languages. A team of experts is annotating a large corpus of documents according to linguistic criteria to label linguistic resources in a multilingual setting.

Finally, we plan to exploit text-based explainability techniques [22, 23] to perform further human validation of the produced models.

Ethical Considerations. All the gathered documents are public and therefore freely accessible on the internet. All references to proper names of people and institutions have been anonymized and replaced with random names for privacy reasons.

Acknowledgments

This study was carried out within the project "E-MIMIC: Empowering Multilingual Inclusive Communication", funded by the Ministero dell'Università e della Ricerca with the PRIN 2022 (D.D. 104 - 02/02/2022) program.

References

[1] S. J. Ashwell, P. K. Baskin, S. L. Christiansen, S. A. DiBari, A. Flanagin, T. Frey, R. Jemison, M. Ricci, Three recommended inclusive language guidelines for scholarly publishing: Words matter, Learn. Publ. 36 (2023) 94–99. URL: https://doi.org/10.1002/leap.1527. doi:10.1002/LEAP.1527.
[2] A. Balayn, J. Yang, Z. Szlávik, A. Bozzon, Automatic identification of harmful, aggressive, abusive, and offensive language on the web: A survey of technical biases informed by psychology literature, ACM Trans. Soc. Comput. 4 (2021) 11:1–11:56. URL: https://doi.org/10.1145/3479158. doi:10.1145/3479158.
[3] R. Artstein, M. Poesio, Bias decreases in proportion to the number of annotators, in: Proceedings of FG-MoL 2005: The 10th Conference on Formal Grammar and The 9th Meeting on, volume 139, 2009.
[4] R. Artstein, M. Poesio, Inter-coder agreement for computational linguistics, Computational Linguistics 34 (2008) 555–596.
[5] J. Carletta, Assessing agreement on classification tasks: The kappa statistic, Computational Linguistics 22 (1996) 249–254. URL: https://aclanthology.org/J96-2004.
[6] P. S. Bayerl, K. I. Paul, What determines inter-coder agreement in manual annotations? A meta-analytic investigation, Computational Linguistics 37 (2011) 699–725. URL: https://aclanthology.org/J11-4004. doi:10.1162/COLI_a_00074.
[7] J. Monti, Dalla zairja alla traduzione automatica: riflessioni sulla traduzione nell'era digitale [From the zairja to machine translation: reflections on translation in the digital era], Loffredo, 2019.
[8] P. Sánchez-Gijón, D. Kenny, Selecting and preparing texts for machine translation: Pre-editing and writing for a global audience, Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence 18 (2022) 81.
[9] B. Alhafni, N. Habash, H. Bouamor, User-centric gender rewriting, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 618–631. URL: https://aclanthology.org/2022.naacl-main.46. doi:10.18653/v1/2022.naacl-main.46.
[10] C. Amrhein, F. Schottmann, R. Sennrich, S. Läubli, Exploiting biased models to de-bias text: A gender-fair rewriting model, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 4486–4506. URL: https://aclanthology.org/2023.acl-long.246. doi:10.18653/v1/2023.acl-long.246.
[11] T. Sun, K. Webster, A. Shah, W. Y. Wang, M. Johnson, They, them, theirs: Rewriting with gender-neutral English, CoRR abs/2102.06788 (2021). URL: https://arxiv.org/abs/2102.06788. arXiv:2102.06788.
[12] E. Vanmassenhove, C. Emmery, D. Shterionov, NeuTral Rewriter: A rule-based and neural approach to automatic rewriting into gender neutral alternatives, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 8940–8948. URL: https://aclanthology.org/2021.emnlp-main.704. doi:10.18653/v1/2021.emnlp-main.704.
[13] A. Piergentili, D. Fucci, B. Savoldi, L. Bentivogli, M. Negri, Gender neutralization for an inclusive machine
                                                                         translation: from theoretical foundations to open chal-
                                                                         lenges, in: Proceedings of the First Workshop on
                                                                         Gender-Inclusive Translation Technologies, European
1
    https://dbdmg.polito.it/e-mimic/                                     Association for Machine Translation, Tampere, Fin-
     land, 2023, pp. 71–83. URL: https://aclanthology.org/
     2023.gitt-1.7.
[14] M. Rosola, S. Frenda, A. T. Cignarella, M. Pellegrini, A. Marra, M. Floris, et al., Beyond obscuration and visibility: Thoughts on the different strategies of gender-fair language in Italian, in: CLiC-it 2023: Proceedings of the 9th Italian Conference on Computational Linguistics, Venice, Italy, November 30 - December 2, 2023, volume 3596, CEUR-WS, 2023, pp. 1–10.
[15] G. Attanasio, S. Greco, M. La Quatra, L. Cagliero, M. Tonti, T. Cerquitelli, R. Raus, E-MIMIC: Empowering multilingual inclusive communication, in: 2021 IEEE International Conference on Big Data (Big Data), IEEE, 2021, pp. 4227–4234.
[16] M. La Quatra, S. Greco, L. Cagliero, T. Cerquitelli, Inclusively: An AI-based assistant for inclusive writing, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2023, pp. 361–365.
[17] R. Raus, M. Tonti, T. Cerquitelli, L. Cagliero, G. Attanasio, M. La Quatra, S. Greco, L'analyse du discours et l'intelligence artificielle pour réaliser une écriture inclusive : le projet E-MIMIC, SHS Web Conf. 138 (2022) 01007. URL: https://doi.org/10.1051/shsconf/202213801007. doi:10.1051/shsconf/202213801007.
[18] M. La Quatra, L. Cagliero, BART-IT: An efficient sequence-to-sequence model for Italian text summarization, Future Internet 15 (2022) 15. URL: http://dx.doi.org/10.3390/fi15010015. doi:10.3390/fi15010015.
[19] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad,
     A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer,
     BART: Denoising sequence-to-sequence pre-training
     for natural language generation, translation, and
     comprehension, in: Proceedings of the 58th An-
     nual Meeting of the Association for Computational
     Linguistics, Association for Computational Linguis-
     tics, Online, 2020, pp. 7871–7880. URL: https://
     aclanthology.org/2020.acl-main.703. doi:10.18653/
     v1/2020.acl-main.703.
[20] G. Sarti, M. Nissim, IT5: Large-scale text-to-text pretraining for Italian language understanding and generation, arXiv preprint arXiv:2203.03759 (2022).
[21] C.-Y. Lin, ROUGE: A package for automatic evaluation
     of summaries, in: Text Summarization Branches Out,
     Association for Computational Linguistics, Barcelona,
     Spain, 2004, pp. 74–81. URL: https://aclanthology.org/
     W04-1013.
[22] G. Sarti, N. Feldhus, L. Sickert, O. van der Wal, M. Nis-
     sim, A. Bisazza, Inseq: An interpretability toolkit
     for sequence generation models, in: Proceedings
     of the 61st Annual Meeting of the Association for
     Computational Linguistics (Volume 3: System Demon-
     strations), Association for Computational Linguis-
     tics, Toronto, Canada, 2023, pp. 421–435. URL: https:
     //aclanthology.org/2023.acl-demo.40. doi:10.18653/
     v1/2023.acl-demo.40.
[23] F. Ventura, S. Greco, D. Apiletti, T. Cerquitelli,
     Trusting       deep       learning     natural-language
     models via local and global explanations,
     Knowl. Inf. Syst. 64 (2022) 1863–1907. URL:
     https://doi.org/10.1007/s10115-022-01690-9.
     doi:10.1007/s10115-022-01690-9.