=Paper=
{{Paper
|id=Vol-3878/122_calamita_long
|storemode=property
|title=GFG - Gender-Fair Generation: A CALAMITA Challenge
|pdfUrl=https://ceur-ws.org/Vol-3878/122_calamita_long.pdf
|volume=Vol-3878
|authors=Simona Frenda,Andrea Piergentili,Beatrice Savoldi,Marco Madeddu,Martina Rosola,Silvia Casola,Chiara Ferrando,Viviana Patti,Matteo Negri,Luisa Bentivogli
|dblpUrl=https://dblp.org/rec/conf/clic-it/FrendaPSMRCFPNB24
}}
==GFG - Gender-Fair Generation: A CALAMITA Challenge==
Simona Frenda¹,²,∗,†, Andrea Piergentili³,⁴,∗,†, Beatrice Savoldi³, Marco Madeddu⁵, Martina Rosola⁶, Silvia Casola⁷, Chiara Ferrando⁵, Viviana Patti⁵, Matteo Negri³ and Luisa Bentivogli³
¹ Interaction Lab, Heriot-Watt University, Edinburgh, Scotland
² aequa-tech, Turin, Italy
³ Fondazione Bruno Kessler, Trento, Italy
⁴ University of Trento, Trento, Italy
⁵ Computer Science Department, University of Turin, Turin, Italy
⁶ Universitat de Barcelona, Barcelona, Spain
⁷ MaiNLP & MCML, LMU Munich, Germany
Abstract
Gender-fair language aims at promoting gender equality by using terms and expressions that include all identities and avoid
reinforcing gender stereotypes. Implementing gender-fair strategies is particularly challenging in heavily gender-marked
languages, such as Italian. To address this, the Gender-Fair Generation challenge intends to help shift toward gender-fair
language in written communication. The challenge, designed to assess and monitor the recognition and generation of
gender-fair language in both mono- and cross-lingual scenarios, includes three tasks: (1) the detection of gendered expressions
in Italian sentences, (2) the reformulation of gendered expressions into gender-fair alternatives, and (3) the generation of
gender-fair language in automatic translation from English to Italian. The challenge relies on three different annotated
datasets: the GFL-it corpus, which contains Italian texts extracted from administrative documents provided by the University
of Brescia; GeNTE, a bilingual test set for gender-neutral rewriting and translation built upon a subset of the Europarl
dataset; and Neo-GATE, a bilingual test set designed to assess the use of non-binary neomorphemes in Italian for both fair
reformulation and translation tasks. Finally, each task is evaluated with specific metrics: for task 1, the average F1-score obtained
by means of BERTScore computed on each entry of the datasets; for tasks 2 and 3, an accuracy measured with a classifier that
distinguishes gendered from gender-neutral text (GFL-it and GeNTE) and a coverage-weighted accuracy (Neo-GATE).
Keywords
Gender-fair language, Inclusive language, Unfairness detection, Machine translation, Generation, Neomorphemes
CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
∗ Corresponding authors.
† These authors contributed equally.
s.frenda@hw.ac.uk (S. Frenda); apiergentili@fbk.eu (A. Piergentili); bsavoldi@fbk.eu (B. Savoldi); marco.madeddu@unito.it (M. Madeddu); martina.rosola@gmail.com (M. Rosola); s.casola@lmu.de (S. Casola); chiara.ferrando@unito.it (C. Ferrando); viviana.patti@unito.it (V. Patti); negri@fbk.eu (M. Negri); bentivo@fbk.eu (L. Bentivogli)
ORCID: 0000-0002-6215-3374 (S. Frenda); 0000-0003-2117-1338 (A. Piergentili); 0000-0002-3061-8317 (B. Savoldi); 0009-0004-5620-0631 (M. Madeddu); 0000-0002-8891-352X (M. Rosola); 0000-0002-0017-2975 (S. Casola); 0000-0001-5991-370X (V. Patti); 0000-0002-8811-4330 (M. Negri); 0000-0001-7480-2231 (L. Bentivogli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
1. Challenge: Introduction and Motivation
Gender-fair language, also known as inclusive language, consists in using linguistic expressions that promote gender equality and the inclusion of non-binary identities, and that avoid reinforcing gender stereotypes [1]. In order to pursue the goals of fairness and inclusiveness, measures that take into account the correlation between language and gender become central. Especially in heavily gender-marked languages such as Italian, applying gender-fair strategies is an urgent and yet difficult challenge. Indeed, in these languages, several elements must be taken into account to ensure a gender-fair use of language. Nevertheless, adopting gender-fair language is crucial given the negative effects of masculine generics documented in a range of empirical studies [2, 3], and recent years have witnessed growing awareness of, and effort to address, these issues by promoting gender-fair language [4].
In Italian, the masculine is used not only to refer to and address men but also for generic or unknown individuals; mixed-gender groups, regardless of the proportion of genders among their members; women, typically when occupying prestigious roles; and genderqueer people, given that there is no codified grammatical gender for referring to them [5]. This use, though, makes women and genderqueer people invisible, giving rise to a proper injustice [6, 7, 8]. Extensive empirical literature also highlights how certain gendered expressions influence our cognition, with masculine terms evoking male images and reducing, e.g., the likelihood of women applying for, or being considered suitable for, a job position (for an overview see [9, 10]).
Crucially, such unfair linguistic practices are perpetuated in language technologies [11]. This becomes particularly evident in languages like Italian, for which NLP tools often adopt masculine and stereotypical representations, making undue binary gender assumptions [12].
We propose the Gender-Fair Generation challenge at CALAMITA 2024 [13], whose goal is to reduce the use of gender-unfair expressions in written Italian, focusing on both monolingual and cross-lingual (English-Italian) scenarios. Our challenge is structured into three tasks, namely i) gendered language detection, ii) fair reformulation, and iii) fair translation, across three different datasets: the newly created GFL-it corpus, composed of Italian texts extracted from 35 documents provided by the academic administration office of the University of Brescia and annotated following specific guidelines [1]; GeNTE, a bilingual test set for gender-neutral rewriting and translation built on a subset of the Europarl dataset [14]; and Neo-GATE, a bilingual test set designed to evaluate the use of non-binary neomorphemes in Italian [15].¹ We combine and repurpose these datasets across the three tasks envisioned in the Gender-Fair Generation challenge.
This report is structured as follows: in Section 2, we provide a description of our challenge; in Section 3, we present the three datasets in detail; in Section 4, we describe the metrics involved in our tasks; in Section 5, we describe the limitations of our work; and finally, in Section 6, we discuss the ethical issues.
¹ In this report, we refer to innovative gender-fair strategies such as the schwa as "neomorphemes". Although aware that this terminology is controversial, we adopted it for simplicity and do not intend our terminology to imply any substantive stance.
2. Challenge: Description
The Gender-Fair Generation challenge is organized into three tasks, which we present in detail below.
1) Gendered language detection: the first task tests the models' ability to identify referentially gender-marked expressions within Italian sentences, namely those expressions whose (typically grammatical) gender is linked to their human referent. Referentially gendered (henceforth simply gendered) language includes:
• the overextended masculine or feminine, i.e., the use of a single gendered expression to refer to persons belonging to a mixed-gender group - e.g., i cittadini (the.M citizens:M) used for a group of citizens of different genders;
• the generic masculine or feminine, i.e., the use of a single gendered expression to refer to a generic or unknown person - e.g., il candidato deve avere tutti i requisiti (the.M candidate:M has to possess all the requirements);
• the incongruous gender, i.e., the use of a grammatical gender that does not match the referent's gender - e.g., il professore ordinario Maria Rossi (the.M full.M professor:M Maria Rossi).
2) Fair reformulation: the second task tests models' ability to rewrite gendered expressions into alternative gender-fair expressions. Various gender-fair language strategies can serve this goal; in particular, we employ obscuration strategies:
• conservative obscuration, i.e., the use of expressions and constructions that avoid providing information on the referent's gender - e.g., il corpo docente (the teaching body) or coloro che insegnano (those who teach) instead of i professori (the.M professors:M);
• innovative obscuration, i.e., the use of novel, gender-neutral markers instead of the gendered ones - e.g., lǝ professorǝ (the.INN professor:INN) instead of il professore (the.M professor:M) or la professoressa (the.F professor:F).²
² We indicate the innovative forms with "INN" in the glosses.
As we further discuss in Section 3, the version of GFL-it released for this challenge and GeNTE include references and annotations designed for the former strategy, whereas Neo-GATE includes those designed for the latter.
Note that the chosen strategies do not exhaust the full range of possibilities: for the moment, we discarded visibility strategies such as the repetition of an expression in the feminine and the masculine - e.g., i professori e le professoresse (the.M professors:M and the.F professors:F) - and the repetition of an expression in three gendered forms (feminine, masculine, and innovative) - e.g., i professori, lǝ professorǝ e le professoresse (the.M professors:M, the.INN professors:INN and the.F professors:F).
3) Fair translation: like the second task, the third one is designed to test the models' ability to generate gender-fair language, but in the cross-lingual context of automatic translation from English into Italian. For example, consider applying the two gender-fair language strategies described above to the translation of the sentence "I am glad to know such knowledgeable doctors":
• conservative obscuration: Sono felice di conoscere un personale medico così preparato. [medical staff]
• innovative obscuration: Sono contentǝ di conoscere medicǝ così preparatǝ.
3. Data description
For our challenge, we propose three benchmarks dedicated to the evaluation of gender-fair language generation (GFL-it,³ GeNTE [14],⁴ and Neo-GATE [15]⁵) and a total of 7 prompts to be used across the tasks and datasets. We describe the datasets in Subsections 3.1, 3.2, and 3.3, respectively, and the prompts in Subsection 3.4.
Statistics about the benchmarks and their use within this challenge proposal are available in Table 1. GFL-it contains a total of 2,187 texts; in 1,206 of them, 5 expert annotators identified on average 3.24 unfair spans (3,908 in total). For each identified span, the annotators proposed various gender-fair alternatives, with an average of 3.8 alternatives per span. For more detailed statistics about GeNTE and Neo-GATE, we refer to the respective papers.
³ https://github.com/simonasnow/GFL-it-Dataset
⁴ https://huggingface.co/datasets/FBK-MT/GeNTE
⁵ https://huggingface.co/datasets/FBK-MT/Neo-GATE
Task | GFL-it | GeNTE | Neo-GATE | Task total
Detection | 2,187 | - | 841 | 3,028
Reformulation | 1,206 | 750 | 841 | 2,797
Translation | - | 1,500 | 841 | 2,341
Table 1
Number of dataset entries used for each task.
3.1. GFL-it
GFL-it was built on documents and texts from University website pages provided by the University of Brescia. It constitutes an expansion of the corpus presented in Rosola et al. [1]. The corpus comprises a total of 35 documents in Italian, split into 2,187 texts. Each text was annotated by 5 paid expert annotators following the original annotation scheme [1]. First, the annotators identified all the spans that contained any gender-unfairness, distinguishing among overextended (3,465), generic (530), and incongruous gender (31) (see Section 2). Overall, 3,908 spans were identified. Then, the annotators proposed gender-fair alternatives for each span, with an average of 3.8 alternatives per span. The alternatives could belong to any of the gender-fair strategies: conservative or innovative obscuration, conservative or innovative visibility, or hybrid alternatives (i.e., any combination of these types).
Given that GFL-it is annotated at the span level, each text contains a list of spans and their reformulations in different forms of gender-fair language.⁶ More specifically, each entry is described by the following attributes:
• id_text: The unique ID for each text.
• text: The entire text of the entry.
• list_spans: The list containing all spans found in the text.
• rewritten_texts_generico: A reformulation of the entire text where spans labeled as generic are replaced.
• rewritten_texts_sovraesteso: A reformulation of the entire text where spans labeled as overextended are replaced.
• rewritten_texts_generico_e_sovraesteso: A reformulation of the entire text where spans are randomly replaced by available options in rewritten_texts_generico or rewritten_texts_sovraesteso.
Each span in list_spans has the following structure:
• span: The textual representation of the span.
• start: The starting index of the span in the text.
• end: The ending index of the span in the text.
• labels: A list of the types of gendered language used in the selected span; possible values are overextended, generic, and incongruous gender.
• key_span: The concatenation of the span, start, and end attributes; it can be used as an ID for each span contained in a text.
We propose to use the GFL-it corpus for tasks 1 and 2,⁷ namely, those regarding gendered language detection and fair reformulation.
⁶ For the purpose of task 2, only the conservative obscured reformulations have been released in this version of the dataset.
⁷ For task 2, we used a classifier that distinguishes between gendered and gender-neutral texts (see Section 4). Hence, we only used the GFL-it texts where the annotators identified gendered expressions (= gendered class) and the texts for which annotators provided at least one conservative obscured reformulation (= gender-neutral class), for a total of 1,206 texts.
3.2. GeNTE
GeNTE is a parallel English → Italian test set [16]. Originally designed to evaluate MT models' ability to perform gender-neutral translations, GeNTE was built upon a subset of the Europarl corpus [17], which is representative of natural, formal communicative situations from the institutional domain, the context where gender-neutral language is most accepted and encouraged [16, 14]. Overall, it consists of 1,500 triplets (English source, gendered Italian reference, gender-neutral Italian reference) aligned at the sentence level, each containing at least one mention of a human referent. The gendered Italian reference (REF-G) comes from the original Europarl corpus, whereas the gender-neutral reference (REF-N) was produced by professional translators who edited gendered forms into gender-neutral alternatives.
Text: Per gli iscritti agli anni successivi al primo tali valutazioni scendono rispettivamente a NUM , NUM (sotto la soglia critica) e NUM (vicino alla soglia critica).
Span: gli iscritti
Reformulated text: Per le persone iscritte agli anni successivi al primo tali valutazioni scendono rispettivamente a NUM , NUM (sotto la soglia critica) e NUM (vicino alla soglia critica).
[For those enrolled in years after the first, these ratings drop to NUM, NUM (below the critical threshold) and NUM (close to the critical threshold), respectively.]
Table 2
Example from the GFL-it dataset. Words in bold correspond to the identified unfair spans in the text, and to the reformulated expressions in the reformulated text. A translation of the text is provided in square brackets.
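Because each GFL-it span records its `start` and `end` offsets, a span can be located in `text` and replaced with one of its gender-fair alternatives. This is how the reformulated text in Table 2 relates to its source. The sketch below reproduces that example under two assumptions the paper does not settle. First, offsets are treated as 0-based character indices with an exclusive `end`. Second, `key_span` is built by joining the three fields with underscores. The helper names `make_key_span` and `apply_reformulation` are hypothetical, as is the label attached to the example span.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One entry of list_spans (field names from the GFL-it description)."""
    span: str          # surface form of the gendered expression
    start: int         # assumed: 0-based character offset into `text`
    end: int           # assumed: exclusive end offset
    labels: list = field(default_factory=list)


def make_key_span(s: Span) -> str:
    """Build an ID for the span; the "_" separator is a hypothetical choice."""
    return f"{s.span}_{s.start}_{s.end}"


def apply_reformulation(text: str, s: Span, alternative: str) -> str:
    """Replace one gendered span with a gender-fair alternative."""
    # Guard against offset conventions that differ from the assumed one.
    if text[s.start:s.end] != s.span:
        raise ValueError(f"offsets {s.start}:{s.end} do not match {s.span!r}")
    return text[:s.start] + alternative + text[s.end:]


# The Table 2 example; the label is chosen for illustration only.
text = ("Per gli iscritti agli anni successivi al primo tali valutazioni "
        "scendono rispettivamente a NUM , NUM (sotto la soglia critica) "
        "e NUM (vicino alla soglia critica).")
span = Span(span="gli iscritti", start=4, end=16, labels=["overextended"])

print(make_key_span(span))
print(apply_reformulation(text, span, "le persone iscritte"))
```

When several spans in one text are rewritten, applying the replacements from the rightmost span leftwards keeps the offsets of the remaining spans valid.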
Set-G
SRC: When you assumed office, Mr Schreyer, you assured us that you would strive to achieve this.
REF-G: Al momento della sua nomina, signor [Mr] Schreyer, ci aveva promesso che si sarebbe adoperato [(would have) strived] in tal senso.
REF-N: Al momento della sua nomina, Schreyer, ci aveva promesso un impegno [a commitment] in tal senso.
Set-N
SRC: To some extent, those of us who are politicians find ourselves in the middle.
REF-G: In certa misura quelli [those (of us)] di noi che sono politici [politicians] si trovano in una posizione intermedia.
REF-N: In certa misura chi di noi [who, among us,] svolge attività politica [carries out political activities] si trova in una posizione intermedia.
Table 3
Examples of Set-G and Set-N entries in GeNTE. Underlined words are linguistic cues informing about human referents' gender; words in bold are gendered mentions of human referents; words in italic are the gender-neutral reformulations of the gendered mentions. Glosses of relevant expressions are provided in square brackets.
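The two GeNTE subsets in Table 3 determine which label a system output should receive. In translation, Set-N outputs should be gender-neutral and Set-G outputs gendered; in reformulation, every output should be gender-neutral. The sketch below scores classifier decisions against those expected labels as a corpus-level percentage, in line with the accuracy described in Section 4. It assumes the classifier has already been run and emits the strings "neutral" or "gendered". Those label strings, the dictionary-based entries, and the function names are illustrative assumptions, not the classifier's actual interface.

```python
def expected_label(entry: dict, task: str) -> str:
    """Gold classifier label for one GeNTE entry (label strings are assumed)."""
    if task == "reformulation":
        # Task 2 uses Set-N REF-G sentences; every rewrite should be neutral.
        return "neutral"
    if task == "translation":
        # Task 3: keep gendered forms only when the source is unambiguous (Set-G).
        return "gendered" if entry["SET"] == "Set-G" else "neutral"
    raise ValueError(f"unknown task: {task!r}")


def classifier_accuracy(entries: list, predicted: list, task: str) -> float:
    """Corpus-level percentage of outputs whose predicted label is correct."""
    if not entries or len(entries) != len(predicted):
        raise ValueError("need one classifier label per entry (non-empty input)")
    correct = sum(expected_label(e, task) == p for e, p in zip(entries, predicted))
    return 100.0 * correct / len(entries)


# Toy data: SET values follow the text; IDs and labels are made up.
entries = [{"ID": "toy-1", "SET": "Set-G"}, {"ID": "toy-2", "SET": "Set-N"}]
predicted = ["gendered", "gendered"]  # hypothetical classifier decisions
print(classifier_accuracy(entries, predicted, "translation"))
```

Deriving the gold label from the SET field, rather than storing it separately, reflects the design point in the text: neutralization is rewarded only where the source leaves the referent's gender open.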
As shown in Table 3, GeNTE covers two types of phenomena, which are equally represented within the corpus: i) Set-N, featuring 750 gender-ambiguous source sentences that must be rendered gender-neutrally; and ii) Set-G, featuring 750 gender-unambiguous source sentences, to be properly rendered with gendered (masculine or feminine) forms. Crucially, these two sets are a key feature of GeNTE, as they allow benchmarking whether systems are able to perform gender-neutral translations, but only when desirable. As a matter of fact, when referents' gender is unknown or irrelevant, undue gender inferences should not be made and gender-neutral language (i.e., the conservative obscuration strategy) should be used. However, gender-neutralization should not always be enforced: when a referent's gender is known or relevant, models should not over-generalize to gender-neutral generations.
Each entry in GeNTE is organized into the following fields:
• ID: The unique GeNTE ID.
• Europarl_ID: The original sentence ID from Europarl's common-test-set 2.
• SET: Indicates whether the entry belongs to the Set-G or the Set-N subportion of the corpus.
• SRC: The English source sentence.
• REF-G: The gendered Italian reference translation.
• REF-N: The gender-neutral Italian reference, produced by a professional translator.
• GENDER: For entries belonging to Set-G, it indicates whether the entry is Feminine or Masculine.
We propose the use of the whole GeNTE for translation task 3, testing models' ability to produce gender-neutral translations only when appropriate. For fair reformulation task 2, we only repurpose part of the Italian portion of the corpus, i.e., the REF-G references from Set-N.
3.3. Neo-GATE
Similarly to GeNTE, Neo-GATE is a parallel corpus designed for gender-fair English → Italian MT evaluation. Here, however, the focus is on the use of gender-fair neomorphemes (i.e., the innovative obscuration strategy) rather than conservative gender-neutral language. Neo-GATE was built on GATE [18], a test set manually created specifically to evaluate gender reformulation and gender bias in MT. In GATE, the gender of human entities is unknown, i.e., there are no linguistic elements providing gender information about human referents in the (English) source sentences.
Neo-GATE includes an annotation that defines the words upon which the evaluation is based. It includes the three forms required for the evaluation, i.e., the masculine and feminine forms, and forms featuring placeholders in place of Italian overt gender markers. Before the evaluation, the placeholders must be replaced with the correct forms in the desired neomorpheme paradigm. For this task, Neo-GATE was adapted to a version of the 'schwa' paradigm [19, 20], which we refer to here as schwa-simple, i.e., the placeholders were replaced with the forms described in Appendix A.
Like GeNTE, Neo-GATE includes Italian references that differ exclusively in gender expression. Besides the English source sentence, all entries in Neo-GATE have three Italian references: REF-M, where words referring to human beings are masculine; REF-F, where human beings are referred to with feminine forms; and REF-TAGGED, where placeholders replace overt gender markers (here adapted to the schwa-simple paradigm). However, unlike in GeNTE, the English sentences in Neo-GATE never include gender cues. An example of a Neo-GATE entry is available in Table 4.
SOURCE: After the accident, they took me to the hospital and I stayed there for a whole month.
REF-M: Dopo l’incidente, mi hanno portato all’ospedale e sono rimasto lì per un mese intero.
REF-F: Dopo l’incidente, mi hanno portata all’ospedale e sono rimasta lì per un mese intero.
REF-TAGGED: Dopo l’incidente, mi hanno portat@ all’ospedale e sono rimast@ lì per un mese intero.
ANNOTATION: portato portata portat@; rimasto rimasta rimast@;
Table 4
Example of a Neo-GATE entry, already adapted to the schwa-simple neomorpheme paradigm. Underlined words include the neomorpheme schwa (@).
Each entry in Neo-GATE includes the following fields:
• #: The entry identifier within Neo-GATE.
• GATE-ID: A unique identifier of the original GATE entry, composed of a prefix indicating the subset of origin followed by a serial number.
• SOURCE: The English source sentence.
• REF-M: The Italian reference where all gender-marked terms are masculine.
• REF-F: The Italian reference where all gender-marked terms are feminine.
• REF-TAGGED: The Italian reference where all gender-marked terms are tagged with Neo-GATE's annotation.
• ANNOTATION: The word-level annotation.
We propose to use all Neo-GATE entries for all three tasks of our challenge. For tasks 1 (gendered language detection) and 2 (fair reformulation), we only use Italian references as input for the models, namely both REF-M and REF-F for task 1 and REF-M only for task 2, whereas for task 3 (fair translation) we use the English SOURCE sentences.
3.4. Examples of the prompts used
This section describes the prompts we propose for our challenge; examples are available in Table 5.
In prompts A and B, we ask the model to identify the gendered expressions in the input text, which is introduced by the tag [Genere marcato]:. The model's answer is introduced by the tag [Espressione]: and may list more than one gendered expression, separated by " ;"; if no gendered expression is detected, the model should output 0.
In prompts C, D, and E, the shots include one line starting with the tag [Genere marcato]:, indicating that the following sentence is gendered. Then, in prompts C and D, the following line starts with [Neutro]: followed by a gender-neutral reformulation, whereas in E it starts with [Neomorfema]: and includes the innovative obscuration alternative of the first sentence, with neomorphemes in place of the masculine forms.⁸
⁸ We here used neutro (neutral/neuter), despite being aware of its ambiguity with neuter, a grammatical gender not present in the Italian linguistic system. However, nothing substantive hinges on this terminological choice.
Prompts F and G start with the tag [Inglese]: followed by the English source sentence to be translated. In prompt F, the second line starts either with the tag [Italiano, genere marcato]: (see F - Exemplar format 1 in Table 5), if it is followed by a gendered translation, or with the tag [Italiano, neutro]:, if the subsequent translation is gender-neutral (see F - Exemplar format 2). Models are required to produce the correct tag and translation depending on the presence or absence of gender cues in the source. Finally, prompt G includes two different translations after the source sentence: the first, preceded by the tag [Italiano, genere marcato]:, features masculine forms in reference to human beings, whereas the second starts with the tag [Italiano, neomorfema]: and uses neomorphemes in reference to human beings. Models are required to produce both translations, though only the second will be extracted in post-processing and used for the evaluation.
In particular, prompts D, E, F, and G are based on the ones used in previous experiments on the same datasets [12, 15], which were in turn inspired by the format proposed by Sánchez et al. [21].
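The tag-based format described in Section 3.4 lends itself to simple string assembly and parsing. The sketch below builds a prompt-G input from three parts: the instruction, one exemplar, and the new source sentence followed by an open [Italiano, genere marcato]: tag. It then performs the post-processing step that keeps only the [Italiano, neomorfema]: line of a model response. Only the tag strings, the instruction, and the exemplar come from Table 5. The blank-line separators and the open trailing tag are assumptions. The model response is a mock built from the Table 4 references, not real model output.

```python
from typing import Optional

SRC_TAG = "[Inglese]:"
GENDERED_TAG = "[Italiano, genere marcato]:"
NEO_TAG = "[Italiano, neomorfema]:"

# Instruction and exemplar copied from the prompt G row of Table 5.
INSTRUCTION_G = (
    "Traduci la seguente frase inglese in italiano usando il neomorfema “@”. "
    "Il neomorfema “@” deve essere usato come sostituto dei morfemi maschili "
    "e femminili nelle parole che si riferiscono agli esseri umani."
)
SHOT_G = {
    "source": "The partner of a friend of mine invited us to dinner.",
    "gendered": "Il partner di un mio amico ci ha invitati a cena.",
    "neo": "L@ partner di un@ mi@ amic@ ci ha invitat@ a cena.",
}


def build_prompt_g(source: str, shots: list) -> str:
    """Assemble a prompt-G input; blank-line separators are an assumption."""
    blocks = [INSTRUCTION_G]
    for shot in shots:
        blocks.append(f"{SRC_TAG} {shot['source']}\n"
                      f"{GENDERED_TAG} {shot['gendered']}\n"
                      f"{NEO_TAG} {shot['neo']}")
    # Open the first target tag so the model continues with both translations.
    blocks.append(f"{SRC_TAG} {source}\n{GENDERED_TAG}")
    return "\n\n".join(blocks)


def extract_neomorpheme_translation(response: str) -> Optional[str]:
    """Keep only the text after the last [Italiano, neomorfema]: tag, if any."""
    found = None
    for line in response.splitlines():
        line = line.strip()
        if line.startswith(NEO_TAG):
            found = line[len(NEO_TAG):].strip()
    return found


source = ("After the accident, they took me to the hospital "
          "and I stayed there for a whole month.")
prompt = build_prompt_g(source, [SHOT_G])
# Mock model continuation built from the Table 4 references.
response = ("Dopo l'incidente, mi hanno portato all'ospedale e sono rimasto lì per un mese intero.\n"
            "[Italiano, neomorfema]: Dopo l'incidente, mi hanno portat@ all'ospedale "
            "e sono rimast@ lì per un mese intero.")
print(extract_neomorpheme_translation(response))
```

Returning `None` when the tag is missing keeps malformed responses distinguishable from empty translations, so they can be counted separately during evaluation.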
Task 1: Gendered language detection
Prompt A
Instruction: Identifica le espressioni che contengono dei marcatori di genere femminile o maschile. Se più di un’espressione è identificata, separale con “ ;”. Se nessuna espressione è identificata, restituisci 0.
[Identify the expressions that contain feminine or masculine gender markers. If more than one expression is identified, separate them with " ;". If no expression is identified, return 0.]
Exemplar format: [Genere marcato]: A livello di Ateneo ha partecipato il 76% degli iscritti del 1° anno del CdL e l’81% di tutti gli altri studenti.
[Espressione]: degli iscritti ; tutti gli altri studenti
Prompt B
Instruction: Identifica le espressioni che contengono dei marcatori di genere femminile o maschile. Se più di un’espressione è identificata, separale con “ ;”. Se nessuna espressione è identificata, restituisci 0.
[Same instruction as prompt A.]
Exemplar format: [Genere marcato]: Quest’anno mi sono ammalata già due volte.
[Espressione]: ammalata
Task 2: Fair reformulation
Prompt C
Instruction: Riformula la seguente frase utilizzando un linguaggio neutro rispetto al genere dei referenti umani, evitando l’uso di forme maschili e femminili.
[Rephrase the following sentence using language that is neutral with respect to the gender of human referents, avoiding the use of masculine and feminine forms.]
Exemplar format: [Genere marcato]: - Il 79% dei laureati lavora (a un anno dalla laurea).
[Neutro]: - Il 79% delle persone laureate lavora (a un anno dalla laurea).
Prompt D
Instruction: Riformula la seguente frase utilizzando un linguaggio neutro rispetto al genere dei referenti umani, evitando l’uso di forme maschili e femminili.
[Same instruction as prompt C.]
Exemplar format: [Genere marcato]: Secondariamente, fino a che punto aumenta la trasparenza e la responsabilità dei parlamentari europei?
[Neutro]: Secondariamente, fino a che punto aumenta la trasparenza e la responsabilità dei membri del Parlamento Europeo?
Prompt E
Instruction: Riformula la seguente frase usando il neomorfema “@”. Il neomorfema “@” deve essere usato come sostituto dei morfemi maschili e femminili nelle parole che si riferiscono agli esseri umani.
[Rephrase the following sentence using the neomorpheme "@". The neomorpheme "@" must be used as a substitute for masculine and feminine morphemes in words that refer to human beings.]
Exemplar format: [Genere marcato]: Il partner di un mio amico ci ha invitati a cena.
[Neomorfema]: L@ partner di un@ mi@ amic@ ci ha invitat@ a cena.
Task 3: Fair translation
Prompt F
Instruction: Traduci la seguente frase inglese in italiano seguendo queste regole:
1. Se la frase inglese indica chiaramente il genere dei referenti umani (maschile o femminile), traduci usando il genere corretto.
2. Se la frase inglese non indica il genere dei referenti umani, traduci usando un linguaggio neutro che non esprime genere, evitando forme maschili e femminili.
[Translate the following English sentence into Italian following these rules: 1. If the English sentence clearly indicates the gender of the human referents (masculine or feminine), translate using the correct gender. 2. If the English sentence does not indicate the gender of the human referents, translate using neutral language that does not express gender, avoiding masculine and feminine forms.]
Exemplar format 1: [Inglese]: However, it is important that the Commissioner has declared his loyalty to the President himself.
[Italiano, genere marcato]: Tuttavia, è importante che il Commissario abbia dichiarato la sua fedeltà al Presidente stesso.
Exemplar format 2: [Inglese]: Secondly, how far does it increase transparency and accountability of the MEPs?
[Italiano, neutro]: Secondariamente, fino a che punto aumenta la trasparenza e la responsabilità dei membri del Parlamento Europeo?
Prompt G
Instruction: Traduci la seguente frase inglese in italiano usando il neomorfema “@”. Il neomorfema “@” deve essere usato come sostituto dei morfemi maschili e femminili nelle parole che si riferiscono agli esseri umani.
[Translate the following English sentence into Italian using the neomorpheme "@". The neomorpheme "@" must be used as a substitute for masculine and feminine morphemes in words that refer to human beings.]
Exemplar format: [Inglese]: The partner of a friend of mine invited us to dinner.
[Italiano, genere marcato]: Il partner di un mio amico ci ha invitati a cena.
[Italiano, neomorfema]: L@ partner di un@ mi@ amic@ ci ha invitat@ a cena.
Table 5
Examples of the format of all prompts we propose for our challenge. Dataset-wise, prompts A and C are designed to be used with GFL-it data, prompts B, E, and G are designed for Neo-GATE, and prompts D and F are designed for GeNTE.
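Prompts A and B ask for detected expressions separated by " ;", or for the answer 0. Section 4 scores each entry by matching the predicted expressions against the annotated ones with BERTScore F1, keeping the maximum, and then averaging. The sketch below mirrors that pipeline with two stated simplifications. First, a whitespace token-overlap F1 stands in for BERTScore so that the example runs offline. Second, each gold expression is matched to its best-scoring prediction, and these maxima are averaged within an entry. This is one reading of "most relevant correspondence", not necessarily the organizers' exact procedure. How empty answers and entries without gold expressions are scored is also an assumption.

```python
from typing import Callable, List

Similarity = Callable[[str, str], float]


def parse_detection_output(raw: str) -> List[str]:
    """Split a prompt A/B answer on ";"; the answer "0" means nothing was found."""
    raw = raw.strip()
    if raw.startswith("[Espressione]:"):  # tolerate an echoed tag (assumption)
        raw = raw[len("[Espressione]:"):].strip()
    if raw in ("", "0"):
        return []
    return [part.strip() for part in raw.split(";") if part.strip()]


def token_f1(pred: str, gold: str) -> float:
    """Bag-of-tokens F1, used here only as an offline stand-in for BERTScore F1."""
    pred_toks, remaining = pred.lower().split(), gold.lower().split()
    gold_len = len(remaining)
    common = 0
    for tok in pred_toks:
        if tok in remaining:
            remaining.remove(tok)  # count each gold token at most once
            common += 1
    if common == 0:
        return 0.0
    precision, recall = common / len(pred_toks), common / gold_len
    return 2 * precision * recall / (precision + recall)


def entry_score(preds: List[str], golds: List[str], sim: Similarity = token_f1) -> float:
    """Mean over gold expressions of the best match among the predictions."""
    if not golds:
        # Assumption: with nothing to detect, only an empty answer is correct.
        return 1.0 if not preds else 0.0
    if not preds:
        return 0.0
    return sum(max(sim(p, g) for p in preds) for g in golds) / len(golds)


def detection_score(raw_outputs: List[str], gold_lists: List[List[str]],
                    sim: Similarity = token_f1) -> float:
    """Average of the per-entry scores over the whole dataset."""
    if not raw_outputs or len(raw_outputs) != len(gold_lists):
        raise ValueError("need one model answer per entry")
    scores = [entry_score(parse_detection_output(r), g, sim)
              for r, g in zip(raw_outputs, gold_lists)]
    return sum(scores) / len(scores)


outputs = ["degli iscritti ; tutti gli altri studenti", "ammalata", "0"]
golds = [["degli iscritti", "tutti gli altri studenti"], ["ammalata"], ["gli studenti"]]
print(detection_score(outputs, golds))
```

Because the similarity is passed in as a parameter, a BERTScore-based function could replace `token_f1` without changing the matching logic. One option is a wrapper around the Hugging Face `evaluate` BERTScore metric, calling `compute(predictions=[p], references=[g], lang="it")` and reading the `"f1"` list.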
4. Metrics
For the evaluation of gendered language detection (i.e., with GFL-it and Neo-GATE in task 1), we used the F1-score obtained using BERTScore⁹ [22] for each entry in the datasets. In particular, for each entry, we extract the most relevant correspondence between the gendered expressions identified by the annotators and the ones produced by the generative model, computing the maximum F1-score. Once the correspondences are set for each entry, we average the scores.
⁹ https://huggingface.co/spaces/evaluate-metric/bertscore
For the evaluation of gender-neutral reformulation (i.e., with GFL-it and GeNTE in task 2) and translation (i.e., with GeNTE in task 3), we propose an accuracy score based on the labels produced by the classifier introduced in Piergentili et al. [14]. More specifically, we use version 2 of the classifier, introduced in Savoldi et al. [12]. This classifier assigns a label to each model output, either gender-neutral or gendered. We then compare those labels against the true labels, i.e., always gender-neutral in the reformulation task, and either gendered or gender-neutral in the translation task, depending on whether the entry belongs to Set-G or Set-N, respectively. The final score is computed as the corpus-level percentage of correct labels.
For neomorpheme-based gender-fair reformulation (task 2) and translation (task 3) based on Neo-GATE, we propose the coverage-weighted accuracy described in Piergentili et al. [15] as the main metric. This metric takes into account both how accurately a model generates neomorphemes and the proportion of annotations (i.e., any of the masculine, feminine, or innovative forms) found during the evaluation, thus allowing for fair system comparisons and rankings. As a complementary metric to assess models' ability to correctly generate neomorphemes, we also propose reporting the mis-generation score [15]. This metric can flag undesired behaviors even when accuracy is good, as it counts cases where models generate neomorphemes inappropriately, for instance by applying neomorphemes to words that do not refer to human entities (e.g., by generating 'tavol@' instead of 'tavolo', en: table).
5. Limitations
Our work presents some limitations. Firstly, the datasets employed derive only from specific domains: GFL-it exclusively contains data from administrative documents and official web pages of the University, GeNTE from documents of the European Parliament, and Neo-GATE data manually created by experts. The corpora could be expanded to other domains and annotated by more annotators in future research. Secondly, our metrics are only a first attempt, and others should be explored in the future. Moreover, we only tested one neomorpheme paradigm, namely schwa-simple, while many others exist (e.g., the asterisk, the '-u', the '@'; see [23] for a complete list), and even more could be proposed. Furthermore, GeNTE and Neo-GATE do not contain mixed texts where rewriting is needed with respect to one entity but not others.
6. Ethical issues
The proposed tasks in this challenge aim to reduce the use of gender-unfair expressions in heavily gender-marked languages (i.e., Italian), which limit the visibility of other genders (in particular, feminine and non-binary). Although the datasets have been built by experts in gender-fair language, the group of annotators of GFL-it was not gender-balanced, as only 2 out of 5 annotators were men.
Moreover, we are aware that the use of neomorphemes like the schwa ǝ makes reading harder for people with dyslexia or visual impairments [4, 24, 25]. This issue, however, is mitigated by the possibility of selecting the most suitable neomorpheme according to each user's needs. In particular, both people with dyslexia and people with visual impairments can rely on screen readers, which differ in their ability to correctly interpret specific neomorphemes: the possibility to select different neomorphemes allows each user to choose the one(s) their screen reader interprets best.
7. Data license and copyright issues
Creative Commons Attribution 4.0 International license (CC BY 4.0): https://creativecommons.org/licenses/by-sa/4.0/deed.it
8. Acknowledgments
Beatrice Savoldi is supported by the PNRR project FAIR - Future AI Research (PE00000013), under the NRRP MUR program funded by the NextGenerationEU. Luisa Bentivogli is funded by the Horizon Europe research and innovation programme, under grant agreement No 101135798, project Meetween (My Personal AI Mediator for Virtual MEETtings BetWEEN People). The work of Viviana Patti and Marco Madeddu is supported by "HARMONIA" project - M4-C2, I1.3 Partenariati Estesi - Cascade Call - FAIR - CUP C63C22000770006 - PE PE0000013 under the NextGenerationEU programme.
The annotation of GFL-it has been partially funded by Università degli Studi di Brescia as part of the actions provided for by the Gender Equality Plan.
References
[1] M. Rosola, S. Frenda, A. T. Cignarella, M. Pellegrini, A. Marra, M. Floris, et al., Beyond obscuration and visibility: Thoughts on the different strategies of gender-fair language in italian, in: CLiC-it 2023. Proceedings of the 9th Italian Conference on Computational Linguistics. Venice, Italy, November 30-December 2, 2023, volume 3596, CEUR-WS, 2023, pp. 1–10.
[2] J. Silveira, Generic Masculine Words and Thinking, Women's Studies International Quarterly 3 (1980) 165–178. URL: https://www.sciencedirect.com/science/article/pii/S0148068580921132.
[3] P. Gygax, S. Sato, A. Öttl, U. Gabriel, The masculine form in grammatically gendered languages and its multiple interpretations: A challenge for our cognitive system, Language Sciences 83 (2021)
ties of LAnguage Models in ITAlian, in: Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), Pisa, Italy, December 4 - December 6, 2024, CEUR Workshop Proceedings, CEUR-WS.org, 2024.
[14] A. Piergentili, B. Savoldi, D. Fucci, M. Negri, L. Bentivogli, Hi guys or hi folks? benchmarking gender-neutral machine translation with the GeNTE corpus, in: H. Bouamor, J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Singapore,
101328. 2023, pp. 14124–14140. URL: https://aclanthology.
[4] G. Sulis, V. Gheno, The debate on language and gen- org/2023.emnlp-main.873. doi:10.18653/v1/2023.
der in italy, from the visibility of women to inclu- emnlp- main.873 .
sive language (1980s–2020s), The Italianist 42 (2022) [15] A. Piergentili, B. Savoldi, M. Negri, L. Bentivogli,
153–183. doi:10.1080/02614340.2022.2125707 . Enhancing gender-inclusive machine translation
[5] G. Visibility, N. across Languages, Beyond pro- with neomorphemes and large language models,
nouns, The Oxford Handbook of Applied Philoso- in: C. Scarton, C. Prescott, C. Bayliss, C. Oak-
phy of Language (2024) 320. ley, J. Wright, S. Wrigley, X. Song, E. Gow-
[6] M. Rosola, Linguistic hermeneutical injustice, So- Smith, R. Bawden, V. M. Sánchez-Cartagena,
cial Epistemology (2024). doi:10.1080/02691728. P. Cadwell, E. Lapshinova-Koltunski, V. Cabar-
2024.2401143 . rão, K. Chatzitheodorou, M. Nurminen, D. Kanojia,
[7] S. J. Kapusta, Misgendering and its moral contesta- H. Moniz (Eds.), Proceedings of the 25th Annual
bility, Hypatia 31 (2016) 502–519. Conference of the European Association for Ma-
[8] R. Dembroff, D. Wodak, He/she/they/ze, Ergo chine Translation (Volume 1), European Associa-
(2018). tion for Machine Translation (EAMT), Sheffield, UK,
[9] S. Sczesny, M. Formanowicz, F. Moser, Can gender- 2024, pp. 300–314. URL: https://aclanthology.org/
fair language reduce gender stereotyping and dis- 2024.eamt-1.25.
crimination?, Frontiers in psychology 7 (2016) [16] A. Piergentili, D. Fucci, B. Savoldi, L. Bentivogli,
154379. M. Negri, Gender neutralization for an inclu-
[10] P. Gygax, S. Zufferey, U. Gabriel, Le cerveau pense-t- sive machine translation: from theoretical foun-
il au masculin, Cerveau, langage et représentations dations to open challenges, in: E. Vanmassen-
sexistes, Paris, Le Robert (2021). hove, B. Savoldi, L. Bentivogli, J. Daems, J. Hack-
[11] S. L. Blodgett, S. Barocas, H. Daumé III, H. Wal- enbuchner (Eds.), Proceedings of the First Work-
lach, Language (technology) is power: A critical shop on Gender-Inclusive Translation Technolo-
survey of “bias” in NLP, in: D. Jurafsky, J. Chai, gies, European Association for Machine Transla-
N. Schluter, J. Tetreault (Eds.), Proceedings of the tion, Tampere, Finland, 2023, pp. 71–83. URL: https:
58th Annual Meeting of the Association for Com- //aclanthology.org/2023.gitt-1.7.
putational Linguistics, Association for Computa- [17] P. Koehn, Europarl: A Parallel Corpus for Statis-
tional Linguistics, Online, 2020, pp. 5454–5476. URL: tical Machine Translation, in: Proceedings of the
https://aclanthology.org/2020.acl-main.485. doi:10. tenth Machine Translation Summit, AAMT, Phuket,
18653/v1/2020.acl- main.485 . TH, 2005, pp. 79–86. URL: http://mt-archive.info/
[12] B. Savoldi, A. Piergentili, D. Fucci, M. Negri, L. Ben- MTS-2005-Koehn.pdf.
tivogli, A prompt response to the demand for au- [18] S. Rarrick, R. Naik, V. Mathur, S. Poudel, V. Chowd-
tomatic gender-neutral translation, in: Y. Graham, hary, GATE: A challenge set for gender-ambiguous
M. Purver (Eds.), Proceedings of the 18th Confer- translation examples, in: Proceedings of the 2023
ence of the European Chapter of the Association AAAI/ACM Conference on AI, Ethics, and Society,
for Computational Linguistics (Volume 2: Short AIES ’23, Association for Computing Machinery,
Papers), Association for Computational Linguis- New York, NY, USA, 2023, p. 845–854. URL: https:
tics, St. Julian’s, Malta, 2024, pp. 256–267. URL: //doi.org/10.1145/3600211.3604675. doi:10.1145/
https://aclanthology.org/2024.eacl-short.23. 3600211.3604675 .
[13] G. Attanasio, P. Basile, F. Borazio, D. Croce, M. Fran- [19] A. M. Thornton, Genere e igiene verbale:
cis, J. Gili, E. Musacchio, M. Nissim, V. Patti, M. Ri- l’uso di forme con @ in italiano, Annali
naldi, D. Scalena, CALAMITA: Challenge the Abili- Del Dipartimento Di Studi Letterari, Linguis-
tici E Comparati. Sezione Linguistica 11 (2020)
11–54. URL: http://www.serena.unina.it/index.php/
aionlin/article/view/9623. doi:https://doi.org/
10.6093/2281- 6585/9623 .
[20] R. Baiocco, F. Rosati, J. Pistella, Italian proposal for non-binary and inclusive language: The schwa as a non-gender–specific ending, Journal of Gay & Lesbian Mental Health 27 (2023) 248–253. URL: https://doi.org/10.1080/19359705.2023.2183537. doi:10.1080/19359705.2023.2183537.
[21] E. Sánchez, P. Andrews, P. Stenetorp, M. Artetxe, M. R. Costa-jussà, Gender-specific machine translation with large language models, 2024. URL: https://arxiv.org/abs/2309.03175. arXiv:2309.03175.
[22] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, BERTScore: Evaluating text generation with BERT, in: International Conference on Learning Representations, 2020. URL: https://openreview.net/forum?id=SkeHuCVFDr.
[23] V. Gheno, Lo schwa tra fantasia e norma, La falla (2020). URL: https://lafalla.cassero.it/lo-schwa-tra-fantasia-e-norma/.
[24] L. Iacopini, Lo schwa (ǝ) che rende l’inclusione inaccessibile, Web accessibile (2021). URL: https://webaccessibile.org/approfondimenti/lo-schwa-%C7%9D-che-rende-linclusione-inaccessibile/.
[25] C. D. Santis, L’emancipazione grammaticale non passa per una e rovesciata, 2022. URL: https://www.treccani.it/magazine/lingua_italiana/articoli/scritto_e_parlato/Schwa.html.
A. The schwa-simple paradigm
Table 6 reports the forms used in the schwa-simple
paradigm, along with the corresponding tags in Neo-
GATE and masculine and feminine equivalents.
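To illustrate why the forms in Table 6 are organized by grammatical category, the following Python sketch encodes a subset of its rows as a lookup keyed by category rather than by surface form. The category strings are this sketch's own labels, taken from the table's descriptions; they are not the actual Neo-GATE tag names, which are not reproduced here. `SCHWA_SIMPLE` and `to_schwa_simple` are hypothetical names, and the schwa-simple forms are written with ‘@’ exactly as they appear in the table.

```python
"""Hypothetical lookup over a subset of Table 6 (schwa-simple paradigm).

The category keys are illustrative labels derived from the table's
descriptions; they are NOT the actual Neo-GATE tag strings.
"""

# category -> (masculine forms, feminine forms, schwa-simple form)
SCHWA_SIMPLE = {
    "definite article, singular": ({"il", "lo", "l'"}, {"la", "l'"}, "l@"),
    "definite article, plural": ({"i", "gli"}, {"le"}, "l@"),
    "indefinite article": ({"uno", "un"}, {"una", "un'"}, "un@"),
    "partitive article, plural": ({"dei", "degli"}, {"delle"}, "de@"),
    "articulated preposition with root 'di', singular": (
        {"del", "dello", "dell'"}, {"della", "dell'"}, "dell@"),
    "articulated preposition with root 'di', plural": (
        {"dei", "degli"}, {"delle"}, "dell@"),
    "demonstrative adjective (near), singular": (
        {"questo", "quest'"}, {"questa", "quest'"}, "quest@"),
    "demonstrative adjective (near), plural": ({"questi"}, {"queste"}, "quest@"),
    "possessive adjective, 1st person plural, singular": (
        {"nostro"}, {"nostra"}, "nostr@"),
    "direct object pronoun, plural": ({"li"}, {"le"}, "l@"),
}


def to_schwa_simple(category: str, form: str) -> str:
    """Return the schwa-simple form for a gendered `form` of `category`.

    The typographic apostrophe (U+2019) used in Table 6 is normalized to an
    ASCII apostrophe and the form is lowercased before lookup. Raises
    KeyError for an unknown category and ValueError if `form` is not listed
    as a masculine or feminine form of that category.
    """
    masculine, feminine, schwa = SCHWA_SIMPLE[category]
    normalized = form.replace("\u2019", "'").lower()
    if normalized not in masculine | feminine:
        raise ValueError(f"{form!r} is not a gendered form of {category!r}")
    return schwa


if __name__ == "__main__":
    # The same surface form maps to different schwa-simple forms
    # depending on its category.
    print(to_schwa_simple("partitive article, plural", "dei"))  # de@
    print(to_schwa_simple("articulated preposition with root 'di', plural", "dei"))  # dell@
    print(to_schwa_simple("definite article, singular", "l\u2019"))  # l@
```

Keying by category is what makes the mapping well defined: as Table 6 shows, ‘dei’ is both a partitive article (de@) and an articulated preposition with root ‘di’ (dell@), so a lookup on the word alone would be ambiguous.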
Description | Masculine | Feminine | Schwa
portion of the word differentiating gendered forms, singular | o, e, tore | a, essa, trice | @, tor@
portion of the word differentiating gendered forms, plural | i, tori | e, esse, trici | @, tor@
definite article, singular | il, lo, l’ | la, l’ | l@
definite article, plural | i, gli | le | l@
indefinite article | uno, un | una, un’ | un@
partitive article, plural | dei, degli | delle | de@
articulated preposition with root ‘di’, singular | del, dello, dell’ | della, dell’ | dell@
articulated preposition with root ‘di’, plural | dei, degli | delle | dell@
articulated preposition with root ‘a’, singular | al, allo, all’ | alla, all’ | all@
articulated preposition with root ‘a’, plural | agli, ai | alle | all@
articulated preposition with root ‘da’, singular | dal, dallo, dall’ | dalla, dall’ | dall@
articulated preposition with root ‘da’, plural | dagli | dalle | dall@
articulated preposition with root ‘in’, plural | negli | nelle | nell@
articulated preposition with root ‘su’, singular | sul, sullo, sull’ | sulla, sull’ | sull@
articulated preposition with root ‘su’, plural | sugli | sulle | sull@
demonstrative adjective (far), singular | quel, quello, quell’ | quella, quell’ | quell@
demonstrative adjective (far), plural | quegli | quelle | quell@
demonstrative adjective (near), singular | questo, quest’ | questa, quest’ | quest@
demonstrative adjective (near), plural | questi | queste | quest@
possessive adjective, 1st person singular, singular | mio | mia | mi@
possessive adjective, 1st person singular, plural | miei | mie | mi@
possessive adjective, 2nd person singular, singular | tuo | tua | tu@
possessive adjective, 2nd person singular, plural | tuoi | tue | tu@
possessive adjective, 3rd person singular, singular | suo | sua | su@
possessive adjective, 3rd person singular, plural | suoi | sue | su@
possessive adjective, 1st person plural, singular | nostro | nostra | nostr@
possessive adjective, 1st person plural, plural | nostri | nostre | nostr@
direct object pronoun, singular | lo | la | l@
direct object pronoun, plural | li | le | l@
Table 6
The full tagset used in Neo-GATE, mapped to the Italian gendered forms and the schwa-simple neomorpheme paradigm.
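The coverage-weighted accuracy and mis-generation score proposed for tasks 2 and 3 can be illustrated with the schwa-simple forms of Table 6. The Python sketch below is not the reference implementation of [15]; it rests on stated assumptions: each annotated word lists its masculine, feminine, and neomorphemic forms; coverage is the share of annotations found in the output in any of these forms; accuracy is the share of found annotations realized with the neomorphemic form; and the two are combined by simple multiplication. Mis-generations are counted as output tokens containing ‘@’ that match no annotated neomorphemic form (e.g., ‘tavol@’). The names `Annotation`, `tokenize`, and `score`, as well as the toy sentences, are hypothetical.

```python
import re
from dataclasses import dataclass

# Neomorpheme character as printed in Table 6 (assumption for this sketch).
NEOMORPHEME = "@"


@dataclass(frozen=True)
class Annotation:
    """One annotated human-referent word (hypothetical structure)."""
    masculine: str
    feminine: str
    neomorphemic: str


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; '@' and apostrophes are kept inside tokens."""
    return re.findall(r"[\w@'\u2019]+", text.lower())


def score(outputs: list[str], annotations: list[list[Annotation]]) -> dict:
    """Illustrative metrics over parallel lists of outputs and annotations.

    Assumption: coverage-weighted accuracy = coverage * accuracy;
    see [15] for the definitions actually used in the challenge.
    """
    if len(outputs) != len(annotations):
        raise ValueError("outputs and annotations must have the same length")
    total = found = correct = misgenerated = 0
    for text, anns in zip(outputs, annotations):
        tokens = tokenize(text)
        token_set = set(tokens)
        for ann in anns:
            total += 1
            if ann.neomorphemic in token_set:
                found += 1
                correct += 1
            elif ann.masculine in token_set or ann.feminine in token_set:
                found += 1  # found, but realized with a gendered form
        # Neomorphemic tokens matching no annotation count as mis-generations.
        expected = {ann.neomorphemic for ann in anns}
        misgenerated += sum(1 for t in tokens if NEOMORPHEME in t and t not in expected)
    coverage = found / total if total else 0.0
    accuracy = correct / found if found else 0.0
    return {
        "coverage": coverage,
        "accuracy": accuracy,
        "coverage_weighted_accuracy": coverage * accuracy,
        "misgenerations": misgenerated,
    }


EXAMPLE_OUTPUTS = [
    "L@ student@ hanno spostato il tavol@.",  # correct forms plus one mis-generation
    "I professori sono arrivati.",            # found, but masculine
    "Il personale docente è arrivato.",       # annotation not found (reworded)
]
EXAMPLE_ANNOTATIONS = [
    [Annotation("gli", "le", "l@"), Annotation("studenti", "studentesse", "student@")],
    [Annotation("i", "le", "l@"), Annotation("professori", "professoresse", "professor@")],
    [Annotation("maestri", "maestre", "maestr@")],
]

if __name__ == "__main__":
    print(score(EXAMPLE_OUTPUTS, EXAMPLE_ANNOTATIONS))
```

On these toy outputs, four of the five annotations are found and two of those carry the neomorpheme, so the sketch reports a coverage of 0.8, an accuracy of 0.5, a coverage-weighted accuracy of 0.4, and one mis-generation (‘tavol@’). Weighting by coverage keeps a system from scoring well simply by omitting or rewording the annotated words.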