=Paper=
{{Paper
|id=Vol-3651/DARLI-AP_paper3
|storemode=property
|title=Building Foundations for Inclusiveness through Expert-Annotated Data
|pdfUrl=https://ceur-ws.org/Vol-3651/DARLI-AP-3.pdf
|volume=Vol-3651
|authors=Moreno La Quatra,Salvatore Greco,Luca Cagliero,Michela Tonti,Francesca Dragotto,Rachele Raus,Stefania Cavagnoli,Tania Cerquitelli
|dblpUrl=https://dblp.org/rec/conf/edbt/QuatraGCTDRCC24
}}
==Building Foundations for Inclusiveness through Expert-Annotated Data==
Moreno La Quatra(1,†), Salvatore Greco(2,*,†), Luca Cagliero(2), Michela Tonti(3), Francesca Dragotto(4), Rachele Raus(5), Stefania Cavagnoli(4) and Tania Cerquitelli(2)

(1) Kore University of Enna, Enna, Italy
(2) Politecnico di Torino, Turin, Italy
(3) Università degli studi di Bergamo, Bergamo, Italy
(4) Università degli Studi di Roma Tor Vergata, Rome, Italy
(5) Università di Bologna, Bologna, Italy
Abstract
Natural Language Understanding and Generation models have a limited capability of understanding the nuances of inclusive communication, as they are trained on massive data that often include significant portions of non-inclusive content. Even when the models are specifically designed for non-inclusive language detection or reformulation, they largely disregard inclusiveness-related features that are likely correlated with inclusive language nuances, such as the discourse type, the level of inclusiveness, and the intended context of use. To assess the importance of such additional inclusiveness-related features, we collect a new corpus of Italian administrative documents annotated by linguistic experts. The experts not only highlight non-inclusive text snippets and propose possible reformulations, but also annotate multi-aspect labels related to different inclusive language nuances. We empirically show that a multi-task learning approach that leverages the multi-aspect annotations can improve non-inclusive text reformulation performance, thereby confirming the potential of expert-annotated data in inclusive language processing.

Keywords
inclusive language, natural language processing, text generation, deep learning
Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy.
* Corresponding author.
† These authors contributed equally.
Emails: moreno.laquatra@unikore.it (M. La Quatra); salvatore_greco@polito.it (S. Greco); luca.cagliero@polito.it (L. Cagliero); michela.tonti@unibg.it (M. Tonti); francesca.dragotto@gmail.com (F. Dragotto); rachele.raus@unibo.it (R. Raus); stefania.cavagnoli@uniroma2.it (S. Cavagnoli); tania.cerquitelli@polito.it (T. Cerquitelli)
Web: https://www.mlaquatra.me/ (M. La Quatra); https://grecosalvatore.github.io/ (S. Greco)
ORCID: 0000-0001-8838-064X (M. La Quatra); 0000-0001-7239-9602 (S. Greco); 0000-0002-7185-5247 (L. Cagliero); 0000-0001-5306-3054 (R. Raus); 0000-0002-9039-6226 (T. Cerquitelli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

Non-inclusive expressions are widespread in humanly written documents [1]. Training Natural Language Understanding and Generation models on massive data exposes them to bias issues related to language inclusiveness. Addressing this issue is particularly relevant because Artificial Intelligence (AI)-based solutions must be used responsibly to correctly model inclusive language practices and not unintentionally marginalize or disadvantage certain groups.

To mitigate the presence of bias in data, AI-based applications rely on human supervision for model training and post-processing evaluation. This is quite common in the areas of Natural Language Understanding and Generative AI, in which applications like Large Language Models (LLMs) provide end users with conversational and language editing services [2].

The computational linguistics community has agreed on the need to leverage human expert annotations in experience-based learning for bias detection and mitigation [3, 4, 5, 6]. However, the linguistics literature often underestimates the importance of linguistic annotators because of the widespread tendency to value the figures of pre- and post-editors [7, 8]. Editing and annotation are substantially different: while language editing tools rewrite parts of the source text based on predefined expert-provided rules, Natural Language Understanding and Generation models can leverage annotations to capture the nuances of annotated text in a self-supervised manner. The use of textual annotations also relieves annotators of the task of explicitly formulating or adhering to ad hoc linguistic rules.

In the context of inclusive language understanding and generation, most of the previous work exploits rule-based approaches or round-trip translations to annotate texts for inclusivity issues [9, 10, 11, 12]. These works often overlook the significance of human expert annotations, opting instead for hand-crafted rules or artificially created datasets. The role of linguistic annotators in providing specific understanding and annotations of language data is crucial for developing more inclusive AI models [13, 14].

A limited body of work has been devoted to generating and exploiting multi-faceted expert human annotations to drive AI models for inclusive language, e.g., [15, 16, 17]. However, existing benchmarks of annotated text for inclusive language processing neglect potentially relevant aspects such as the level of inclusiveness, the intended context of use, and the text genre. These aspects have the potential to improve the inclusive language understanding and generation capabilities of AI models.

This paper proposes an expert-annotated dataset covering these new aspects and investigates their usefulness in enhancing the performance of the task of non-inclusive text reformulation in the absence of rule-based editing models. To this end, we enrich a corpus of Italian administrative documents with multi-aspect annotations, providing more insights into the inclusive language nuances. The purpose is to enable the study of new features describing inclusiveness aspects neglected by existing approaches, such as the level of inclusiveness, register, and genre. By enriching the language descriptions with new inclusiveness-related features, we provide the research community with new resources to
enhance the understanding and writing capabilities of AI-based solutions.

We also collect preliminary results on the use of multi-aspect annotations in a multi-task learning approach to enhance non-inclusive language reformulations. The results confirm the potential of the inclusiveness-related expert annotations.

2. The annotation process

The term annotation is often used to indicate the process by which textual data are subjected to a tightly interrelated two-phase activity [6]: a) identification, selection, and localisation of specific documents, and b) interpretation and labeling of those documents. The first phase entails identifying and detailing the text segments that exhibit the linguistic phenomenon under investigation. Subsequently, in the interpretation phase, the selected occurrences are manually labeled. These annotations may take various forms, ranging from a selection of pre-established alternatives to free-text comments or possible reformulations.

Unlike human annotators, AI models often lack cognitive abilities such as common-sense reasoning and generalization, because the number of linguistic examples used for model training is limited compared to the impressive variety of natural language forms.

Human annotators need sufficient expertise to interpret nuanced linguistic phenomena and assign appropriate labels. Their annotations are the basis of a supervised learning process: the trained models can progressively learn from annotated data, similarly to humans, but at a scale not achievable through manual work alone.

Annotation of Italian administrative documents. We have designed and utilized a novel benchmark dataset for inclusive language writing in Italian. The dataset comprises administrative communications sourced from the Italian public administration at both the national and regional levels. We annotate the corpus at the sentence level. To this end, we set up a heterogeneous team of 13 linguistic experts with diverse experiences and expertise in inclusive language. The team consists of predominantly female individuals, all native Italian speakers. All the annotators are highly educated: 57% have at least 10 years of experience in linguistics, and 50% have at least 3 years of experience in inclusive language. In addition, the annotators received, on average, about 30 hours of training specific to inclusive language annotation.

Each human annotator independently assigns inclusiveness-related metadata to the document sentences. Each sentence can be enriched with multiple annotations. The annotations consist of (a) the reformulation of any non-inclusive piece of text, i.e., an alternative inclusive form; (b) the level of inclusiveness of the input sentence, indicating whether a sentence is non-inclusive, inclusive, or not pertinent; (c) the register or intended context of use, i.e., Standard, Specialized, or Informative/Educational; and (d) the discourse type or genre, i.e., Legal, Administrative, Technical, or Informative/Educational. Additional contextual aspects could be included in future annotations to further enhance models' understanding of inclusive language usage. By jointly providing those annotations, the experts aimed to capture the nuanced, multi-faceted nature of inclusive language.

By learning language inclusiveness patterns from a diversified, context-dependent set of expert annotations, AI models gain exposure to subtle interpretive differences. Consistency across annotations is ensured through detailed guidelines and instructions provided to the experts. Before the full annotation, a collaborative analysis of a sample set identifies any divergent interpretations to refine the guidance.

Statistics on annotated data. Table 1 reports the number of annotated sentences for each aspect, separately for the training, validation, and test sets.

Table 1
Statistics on data. NILR=Non-Inclusive Language Reformulation, ILC=Inclusiveness Level Classification, RC=Register Classification, GC=Genre Classification.

Task ID | Train | Validation | Test
NILR    | 6491  |  956       | 579
ILC     | 9207  | 1421       | 866
RC      | 2167  |  338       | 247
GC      | 2166  |  338       | 248

Example of annotations. Table 2 shows an example of an annotated Italian sentence (along with the corresponding English translation for non-Italian readers). Linguistic experts assign different annotations to each sentence. In this example, they have assigned three labels to the sentence. Regarding inclusiveness, the sentence has been categorized as non-inclusive because it contains "Il Presidente" (i.e., Chair/President) and "Rettore" (i.e., Rector), which are masculine declensions of professional roles. In addition, the sentence contains "suo decreto", which refers to a decree issued by a male person, so the sentence is not inclusive. The discourse sequence is of the administrative type, as the content refers to an administrative topic, and the language used is specialized, as the content describes specific and technical aspects.

3. Case study: Leveraging Aspects for Italian Inclusive Language Reformulation

We conduct an empirical analysis to examine the impact of utilizing expert annotations in inclusive language generation. Specifically, we investigate the advantages of simultaneously addressing two key objectives: reformulating non-inclusive language and predicting various aspects of inclusiveness.

Tasks. Given a non-inclusive piece of text T, the Non-Inclusive Language Reformulation (NILR) task aims at generating an equivalent inclusive natural language form. The NILR task is a sequence-to-sequence problem, where the input is a non-inclusive sentence and the output is the corresponding inclusive sentence.

Given T and an aspect A, the goal is to predict the value of A for T. A can be the level of inclusiveness, the register or intended context of use, or the discourse type or genre. According to the aspect under analysis, the corresponding sub-tasks are denoted as Inclusiveness Level Classification (ILC), Register Classification (RC), and Genre Classification (GC).
Table 2
Example of sentence annotations illustrating non-inclusive language reformulation in Italian (IT) and English (EN), along with the corresponding inclusiveness classification, discursive sequence, and clear language class.

IT
  Sentence: "Il Presidente, scelto dal Rettore tra i professori ordinari dell'Ateneo con competenze in ambito di valutazione, accreditamento e qualità e nominato con suo decreto, previo parere del Senato Accademico;"
  Reformulation: "Chi ricopre la carica di Presidente, su scelta di chi riveste il ruolo di Rettore tra il personale docente ordinario dell'Ateneo con competenze in ambito di valutazione, accreditamento e qualità e in seguito a nomina con suo decreto, previo parere del Senato Accademico;"
  Inclusive Class: Non-inclusivo — Discursive Sequence: Amministrativo — Clear Language: Specialistico

EN
  Sentence: "The Chair/President, selected by the Rector among the full professors of the University, with expertise in the fields of evaluation, accreditation, and quality and appointed by his decree, subject to the opinion of the Academic Senate;"
  Reformulation: "Who serves as Chair/President, selected by who holds the position of Rector, among the full professors of the University with expertise in the fields of evaluation, accreditation, and quality and appointed by his or her decree, subject to the opinion of the Academic Senate;"
  Inclusive Class: Non-inclusive — Discursive Sequence: Administrative — Clear Language: Specialized
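An annotated sentence such as the one in Table 2 can be viewed as a simple multi-aspect record. The sketch below is illustrative only: the field names and the dictionary format are our own assumptions, not the dataset's actual serialization.

```python
# Illustrative record for one expert-annotated sentence (hypothetical field
# names; sentence text abridged from the example in Table 2).
annotation = {
    "sentence": "Il Presidente, scelto dal Rettore tra i professori ordinari "
                "dell'Ateneo [...] e nominato con suo decreto [...];",
    "reformulation": "Chi ricopre la carica di Presidente, su scelta di chi "
                     "riveste il ruolo di Rettore [...];",
    "inclusiveness_level": "non-inclusive",  # non-inclusive | inclusive | not pertinent
    "register": "specialized",               # standard | specialized | informative/educational
    "genre": "administrative",               # legal | administrative | technical | informative/educational
}

def needs_reformulation(record: dict) -> bool:
    """Only sentences labeled as non-inclusive are targets for the NILR task."""
    return record["inclusiveness_level"] == "non-inclusive"

print(needs_reformulation(annotation))  # → True
```

A record like this carries both the sequence-to-sequence supervision (sentence/reformulation pair) and the three aspect labels used by the classification sub-tasks.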
The ILC, RC, and GC tasks are treated as separate classification problems, where the input is a sentence and the output is the corresponding aspect value.

Single- vs. Multi-Task Learning. To compare the performance of models trained using different learning approaches, we conducted experiments in both single-task and multi-task learning settings.

In Single-Task Learning, we exclusively focus on the Non-Inclusive Language Reformulation (NILR) task, disregarding all aspect-related annotations. We leverage an encoder-decoder architecture, specifically BART-IT [18], a BART architecture [19] pre-trained on a clean Italian corpus [20]. The model is fine-tuned on the NILR task with the twofold objective of modifying the input sentence to make it inclusive while maintaining the original meaning.

Conversely, in Multi-Task Learning, we integrate the NILR task with the aspect classification tasks (i.e., ILC, RC, and GC) during training. For the additional tasks, we specifically leverage the encoder component of the model, which extracts representations of the input text. The encoder is additionally trained with a classification objective: each task is associated with a separate classification head, trained to predict the corresponding aspect value for the input sentence. By interleaving these tasks during training, the model learns to simultaneously address NILR and create encoder representations that capture various aspects related to inclusiveness.

Evaluation Metrics. We evaluate the quality of the text reformulation using a standard train-validation-test split on our expert-annotated data. To compare the automatically generated and expected reformulations, we use the established ROUGE F1-scores [21]. They measure the unit overlap, in terms of the number of n-grams in common, between the two pieces of text: the larger the score, the higher the syntactic similarity. R-1, R-2, and R-L count the unit overlap in terms of unigrams, bigrams, and longest common subsequences, respectively.

To complement the quantitative evaluation, we also perform a qualitative evaluation of the achieved results. We involved six human evaluators who were asked to label each model-generated sentence as: correct if it accurately maintained the original meaning while using inclusive language appropriately for the context; partially correct if some aspects were reformulated correctly, but others were missed or inaccurate; or not correct if the rewriting fundamentally failed to capture the original meaning or usage intention. This multi-level feedback aims at capturing the models' ability to perform the rewriting task sensitively across different scenarios, beyond string-matching metrics alone.

For each reformulation, we assign a score to each annotation as follows: 1 for correct, 0.5 for partially correct, and 0 for not correct. The final score for each reformulation is computed as the average over all the evaluators' annotations (m = 6). Finally, we average the scores of all the reformulations (n = 30) to obtain a single score for each model.

Table 3
Performance comparison between the Single- and Multi-Task Learning approaches in inclusive language generation, evaluated with ROUGE scores (R-1, R-2, R-L) and human evaluation.

Setting     | R-1   | R-2   | R-L   | Human Eval
Single-Task | 74.95 | 64.09 | 74.79 | 0.67
Multi-Task  | 75.58 | 64.37 | 75.36 | 0.70

Results' overview. Columns 2, 3, and 4 of Table 3 show the ROUGE scores for both models. The multi-task model achieves the best performance on all the quantitative metrics. Regarding the human evaluation, we obtained 6 annotations for each of the 30 reformulations per model. For the model trained in the single-task configuration, 93 judgments were correct, 55 partially correct, and 32 not correct; for the multi-task model, 101 were correct, 49 partially correct, and 30 not correct. Column 5 reports the average human evaluation score for both models. The human scores are coherent with the quantitative ones, showing that the model trained in the multi-task setting benefits from the additional labels. Based on these preliminary results, we can conclude that nuanced, multidimensional annotations of inclusive language have the potential to support a more comprehensive approach to modeling inclusive language.
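The human-evaluation scores reported above can be reproduced directly from the label counts (6 evaluators × 30 reformulations = 180 judgments per model); a quick sanity check:

```python
# Reproduce the "Human Eval" column of Table 3 from the reported judgment
# counts: 1 point for correct, 0.5 for partially correct, 0 for not correct,
# averaged over all 6 x 30 = 180 judgments per model.
def human_eval_score(correct: int, partial: int, not_correct: int) -> float:
    total = correct + partial + not_correct
    return (1.0 * correct + 0.5 * partial) / total

single_task = human_eval_score(93, 55, 32)   # 120.5 / 180
multi_task = human_eval_score(101, 49, 30)   # 125.5 / 180

print(round(single_task, 2))  # → 0.67
print(round(multi_task, 2))   # → 0.7
```

Averaging per reformulation first and then over the 30 reformulations yields the same value, since every reformulation receives the same number (m = 6) of judgments.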
4. Conclusions

This paper discussed and experimentally demonstrated that the role and contribution of human annotators are of paramount importance in improving the quality of NLP results and the writing capability of generative approaches in inclusive communication. Starting from a new Italian administrative corpus, we enriched it with a variety of annotations with the help of a team of language experts. This included (i) reformulating gendered language and acronyms, (ii) rewriting to enhance readability for the visually impaired, and (iii) defining the intended context of use (register) and text genre. The preliminary experimental results on the annotated corpus are promising and highlight the potential of the newly proposed annotations to develop a more comprehensive and richer approach that improves the ability of generative algorithms to propose comprehensive and integrative reformulations.

Limitations. i) The annotation is language-specific, limited to Italian, thereby constraining its utility in multilingual scenarios; and ii) it is specific to formal communication. Tailored to tackle the challenge of inclusive language in administrative and academic settings, the natural language tasks are exclusively trained on administrative documents, potentially lacking suitability for diverse contexts like legal and web communications.

Future work. As part of the E-MIMIC (Empowering Multilingual Inclusive Communication) project (https://dbdmg.polito.it/e-mimic/), we are currently working on a multilingual annotation process to overcome these issues and foster inclusive communication across different domains and languages. A team of experts is annotating a large corpus of documents according to linguistic criteria to label linguistic resources in a multilingual setting. Finally, we want to exploit text-based explainability techniques [22, 23] to perform further human validation of the produced models.

Ethical Considerations. All the gathered documents are public and therefore freely accessible on the internet. All references to proper names of people and institutions have been anonymized and replaced with random names for privacy reasons.

Acknowledgments

This study was carried out within the project "E-MIMIC: Empowering Multilingual Inclusive Communication", funded by the Ministero dell'Università e della Ricerca with the PRIN 2022 (D.D. 104 - 02/02/2022) program.

References

[1] S. J. Ashwell, P. K. Baskin, S. L. Christiansen, S. A. DiBari, A. Flanagin, T. Frey, R. Jemison, M. Ricci, Three recommended inclusive language guidelines for scholarly publishing: Words matter, Learn. Publ. 36 (2023) 94–99. URL: https://doi.org/10.1002/leap.1527. doi:10.1002/leap.1527.
[2] A. Balayn, J. Yang, Z. Szlávik, A. Bozzon, Automatic identification of harmful, aggressive, abusive, and offensive language on the web: A survey of technical biases informed by psychology literature, ACM Trans. Soc. Comput. 4 (2021) 11:1–11:56. URL: https://doi.org/10.1145/3479158. doi:10.1145/3479158.
[3] R. Artstein, M. Poesio, Bias decreases in proportion to the number of annotators, in: Proceedings of FG-MoL 2005: The 10th Conference on Formal Grammar and the 9th Meeting on Mathematics of Language, volume 139, 2009.
[4] R. Artstein, M. Poesio, Inter-coder agreement for computational linguistics, Computational Linguistics 34 (2008) 555–596.
[5] J. Carletta, Assessing agreement on classification tasks: The kappa statistic, Computational Linguistics 22 (1996) 249–254. URL: https://aclanthology.org/J96-2004.
[6] P. S. Bayerl, K. I. Paul, What determines inter-coder agreement in manual annotations? A meta-analytic investigation, Computational Linguistics 37 (2011) 699–725. URL: https://aclanthology.org/J11-4004. doi:10.1162/COLI_a_00074.
[7] J. Monti, Dalla zairja alla traduzione automatica: riflessioni sulla traduzione nell'era digitale, Loffredo, 2019.
[8] P. Sánchez-Gijón, D. Kenny, Selecting and preparing texts for machine translation: Pre-editing and writing for a global audience, Machine translation for everyone: Empowering users in the age of artificial intelligence 18 (2022) 81.
[9] B. Alhafni, N. Habash, H. Bouamor, User-centric gender rewriting, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 618–631. URL: https://aclanthology.org/2022.naacl-main.46. doi:10.18653/v1/2022.naacl-main.46.
[10] C. Amrhein, F. Schottmann, R. Sennrich, S. Läubli, Exploiting biased models to de-bias text: A gender-fair rewriting model, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 4486–4506. URL: https://aclanthology.org/2023.acl-long.246. doi:10.18653/v1/2023.acl-long.246.
[11] T. Sun, K. Webster, A. Shah, W. Y. Wang, M. Johnson, They, them, theirs: Rewriting with gender-neutral English, CoRR abs/2102.06788 (2021). URL: https://arxiv.org/abs/2102.06788. arXiv:2102.06788.
[12] E. Vanmassenhove, C. Emmery, D. Shterionov, NeuTral Rewriter: A rule-based and neural approach to automatic rewriting into gender neutral alternatives, in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 2021, pp. 8940–8948. URL: https://aclanthology.org/2021.emnlp-main.704. doi:10.18653/v1/2021.emnlp-main.704.
[13] A. Piergentili, D. Fucci, B. Savoldi, L. Bentivogli, M. Negri, Gender neutralization for an inclusive machine translation: from theoretical foundations to open challenges, in: Proceedings of the First Workshop on Gender-Inclusive Translation Technologies, European Association for Machine Translation, Tampere, Finland, 2023, pp. 71–83. URL: https://aclanthology.org/2023.gitt-1.7.
[14] M. Rosola, S. Frenda, A. T. Cignarella, M. Pellegrini,
A. Marra, M. Floris, et al., Beyond obscuration and vis-
ibility: Thoughts on the different strategies of gender-
fair language in italian, in: CLiC-it 2023. Proceedings
of the 9th Italian Conference on Computational Lin-
guistics. Venice, Italy, November 30-December 2, 2023.,
volume 3596, CEUR-WS, 2023, pp. 1–10.
[15] G. Attanasio, S. Greco, M. La Quatra, L. Cagliero,
M. Tonti, T. Cerquitelli, R. Raus, E-mimic: Empower-
ing multilingual inclusive communication, in: 2021
IEEE International Conference on Big Data (Big Data),
IEEE, 2021, pp. 4227–4234.
[16] M. La Quatra, S. Greco, L. Cagliero, T. Cerquitelli, In-
clusively: An ai-based assistant for inclusive writing,
in: Joint European Conference on Machine Learn-
ing and Knowledge Discovery in Databases, Springer,
2023, pp. 361–365.
[17] R. Raus, M. Tonti, T. Cerquitelli, L. Cagliero, G. Attanasio, M. La Quatra, S. Greco, L'analyse du discours et l'intelligence artificielle pour réaliser une écriture inclusive : le projet emimic, SHS Web Conf. 138 (2022) 01007. URL: https://doi.org/10.1051/shsconf/202213801007. doi:10.1051/shsconf/202213801007.
[18] M. La Quatra, L. Cagliero, BART-IT: An efficient sequence-to-sequence model for Italian text summarization, Future Internet 15 (2022) 15. URL: http://dx.doi.org/10.3390/fi15010015. doi:10.3390/fi15010015.
[19] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad,
A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer,
BART: Denoising sequence-to-sequence pre-training
for natural language generation, translation, and
comprehension, in: Proceedings of the 58th An-
nual Meeting of the Association for Computational
Linguistics, Association for Computational Linguis-
tics, Online, 2020, pp. 7871–7880. URL: https://
aclanthology.org/2020.acl-main.703. doi:10.18653/
v1/2020.acl-main.703.
[20] G. Sarti, M. Nissim, IT5: Large-scale text-to-text pretraining for Italian language understanding and generation, arXiv preprint arXiv:2203.03759 (2022).
[21] C.-Y. Lin, ROUGE: A package for automatic evaluation
of summaries, in: Text Summarization Branches Out,
Association for Computational Linguistics, Barcelona,
Spain, 2004, pp. 74–81. URL: https://aclanthology.org/
W04-1013.
[22] G. Sarti, N. Feldhus, L. Sickert, O. van der Wal, M. Nis-
sim, A. Bisazza, Inseq: An interpretability toolkit
for sequence generation models, in: Proceedings
of the 61st Annual Meeting of the Association for
Computational Linguistics (Volume 3: System Demon-
strations), Association for Computational Linguis-
tics, Toronto, Canada, 2023, pp. 421–435. URL: https:
//aclanthology.org/2023.acl-demo.40. doi:10.18653/
v1/2023.acl-demo.40.
[23] F. Ventura, S. Greco, D. Apiletti, T. Cerquitelli,
Trusting deep learning natural-language
models via local and global explanations,
Knowl. Inf. Syst. 64 (2022) 1863–1907. URL:
https://doi.org/10.1007/s10115-022-01690-9.
doi:10.1007/s10115-022-01690-9.