=Paper=
{{Paper
|id=Vol-3878/122_calamita_long
|storemode=property
|title=GFG - Gender-Fair Generation: A CALAMITA Challenge
|pdfUrl=https://ceur-ws.org/Vol-3878/122_calamita_long.pdf
|volume=Vol-3878
|authors=Simona Frenda,Andrea Piergentili,Beatrice Savoldi,Marco Madeddu,Martina Rosola,Silvia Casola,Chiara Ferrando,Viviana Patti,Matteo Negri,Luisa Bentivogli
|dblpUrl=https://dblp.org/rec/conf/clic-it/FrendaPSMRCFPNB24
}}
==GFG - Gender-Fair Generation: A CALAMITA Challenge==
<pdf width="1500px">https://ceur-ws.org/Vol-3878/122_calamita_long.pdf</pdf>
<pre>
                                GFG - Gender-Fair Generation:
                                A CALAMITA Challenge
                                Simona Frenda1,2,∗,†, Andrea Piergentili3,4,∗,†, Beatrice Savoldi3, Marco Madeddu5,
                                Martina Rosola6, Silvia Casola7, Chiara Ferrando5, Viviana Patti5, Matteo Negri3 and
                                Luisa Bentivogli3
                                1 Interaction Lab, Heriot-Watt University, Edinburgh, Scotland
                                2 aequa-tech, Turin, Italy
                                3 Fondazione Bruno Kessler, Trento, Italy
                                4 University of Trento, Trento, Italy
                                5 Computer Science Department, University of Turin, Turin, Italy
                                6 Universitat de Barcelona, Barcelona, Spain
                                7 MaiNLP & MCML, LMU Munich, Germany


                                                  Abstract
                                                  Gender-fair language aims at promoting gender equality by using terms and expressions that include all identities and avoid
                                                  reinforcing gender stereotypes. Implementing gender-fair strategies is particularly challenging in heavily gender-marked
                                                  languages, such as Italian. To address this, the Gender-Fair Generation challenge intends to help shift toward gender-fair
                                                  language in written communication. The challenge, designed to assess and monitor the recognition and generation of
                                                  gender-fair language in both mono- and cross-lingual scenarios, includes three tasks: (1) the detection of gendered expressions
                                                  in Italian sentences, (2) the reformulation of gendered expressions into gender-fair alternatives, and (3) the generation of
                                                  gender-fair language in automatic translation from English to Italian. The challenge relies on three different annotated
                                                  datasets: the GFL-it corpus, which contains Italian texts extracted from administrative documents provided by the University
                                                  of Brescia; GeNTE, a bilingual test set for gender-neutral rewriting and translation built upon a subset of the Europarl
                                                  dataset; and Neo-GATE, a bilingual test set designed to assess the use of non-binary neomorphemes in Italian for both fair
                                                  formulation and translation tasks. Finally, each task is evaluated with specific metrics: average of F1-score obtained by means
                                                  of BERTScore computed on each entry of the datasets for task 1, an accuracy measured with a gender-neutral classifier, and a
                                                  coverage-weighted accuracy for tasks 2 and 3.

                                                  Keywords
                                                  Gender-fair language, Inclusive language, Unfairness detection, Machine translation, Generation, Neomorphemes


CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
∗ Corresponding authors.
† These authors contributed equally.
Email: s.frenda@hw.ac.uk (S. Frenda); apiergentili@fbk.eu (A. Piergentili); bsavoldi@fbk.eu (B. Savoldi);
marco.madeddu@unito.it (M. Madeddu); martina.rosola@gmail.com (M. Rosola); s.casola@lmu.de (S. Casola);
chiara.ferrando@unito.it (C. Ferrando); viviana.patti@unito.it (V. Patti); negri@fbk.eu (M. Negri);
bentivo@fbk.eu (L. Bentivogli)
ORCID: 0000-0002-6215-3374 (S. Frenda); 0000-0003-2117-1338 (A. Piergentili); 0000-0002-3061-8317 (B. Savoldi);
0009-0004-5620-0631 (M. Madeddu); 0000-0002-8891-352X (M. Rosola); 0000-0002-0017-2975 (S. Casola);
0000-0001-5991-370X (V. Patti); 0000-0002-8811-4330 (M. Negri); 0000-0001-7480-2231 (L. Bentivogli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073


1. Challenge: Introduction and Motivation

Gender-fair language, also known as inclusive language, consists in using linguistic expressions
that promote gender equality and the inclusion of non-binary identities, and that avoid reinforcing
gender stereotypes [1].
   In order to pursue the goals of fairness and inclusiveness, measures that take into account the
importance of the correlation between language and gender become central. Especially in heavily
gender-marked languages such as Italian, the use and application of gender-fair strategies is an
urgent and yet difficult challenge. Indeed, in these languages, there are several elements one has
to take into account to ensure a gender-fair use of language. However, adopting a gender-fair
language is crucial given the negative effects of the masculine generics, documented in a range of
empirical studies [2, 3]; and recent years have witnessed an increase in awareness and effort to
address these issues by promoting gender-fair language [4].
   In Italian, the masculine is not only used to refer to and address men but also generic or
unknown individuals; mixed-gender groups, regardless of the proportion of genders of their members;
women, typically when occupying prestigious roles; and genderqueer people, given that there is no
codified grammatical gender for referring to them [5]. This use, though, makes women and genderqueer
people invisible, giving rise to a proper injustice [6, 7, 8]. Extensive empirical literature also
highlights how certain gendered expressions influence our cognition, with masculine terms evoking
male images and reducing, e.g., the likelihood of women applying for or being considered suitable
for a job position (for an overview see [9, 10]).
   Crucially, such unfair linguistic practices are perpetuated in language technologies [11]. This
becomes particularly evident in languages, like Italian, for which NLP tools often adopt masculine
and stereotypical representations, making undue binary gender assumptions [12].
   We propose the Gender-Fair Generation challenge at CALAMITA 2024 [13], whose goal is to reduce
the use of gender-unfair expressions in written Italian, focusing on both monolingual and
cross-lingual scenarios (English-Italian). Our challenge is structured into three tasks, namely
i) gendered language detection, ii) fair reformulation, and iii) fair translation, across three
different datasets: the newly created GFL-it corpus, composed of Italian texts extracted from 35
documents provided by the academic administration office of the University of Brescia and annotated
following specific guidelines [1]; GeNTE, a bilingual test set for gender-neutral rewriting and
translation built on a subset of the Europarl dataset [14]; and Neo-GATE, a bilingual test set
designed to evaluate the use of nonbinary neomorphemes in Italian [15].¹ We combine and repurpose
these datasets across the three tasks envisioned in the Gender-Fair Generation challenge.
   This report is structured as follows: in Section 2, we provide a description of our challenge;
in Section 3, we present the three datasets in detail; in Section 4, we describe the metrics
involved in our task; in Section 5, we describe the limitations of our work, and finally, in
Section 6, we discuss the ethical issues.


2. Challenge: Description

The Gender-Fair Generation challenge is organized into three tasks, which we present in detail below.

1) Gendered language detection: the first task tests the models’ ability to identify referentially
gender-marked expressions within Italian sentences, namely those expressions whose (typically
grammatical) gender is linked to their human referent. Referentially gendered (henceforth simply
gendered) language includes:
      • the overextended masculine or feminine, i.e., the use of a single gendered expression to
        refer to persons belonging to a mixed-gender group - e.g., i cittadini (the.M citizens:M)
        used for a group of citizens of different genders;
      • the generic masculine or feminine, i.e., the use of a single gendered expression to refer
        to a generic or unknown person - e.g., il candidato deve avere tutti i requisiti (the.M
        candidate:M has to possess all the requirements);
      • the incongruous gender, i.e., the use of a grammatical gender that does not match the
        referent’s gender - e.g., il professore ordinario Maria Rossi (the.M full.M professor:M
        Maria Rossi).

2) Fair reformulation: the second task tests models’ ability to rewrite gendered expressions into
alternative gender-fair expressions. To achieve this goal, various gender-fair language strategies
can be employed. In particular, we will employ obscuration strategies:
      • conservative obscuration, i.e., the use of expressions and constructions that avoid
        providing information on the referent’s gender – e.g., il corpo docente (the teaching body)
        or coloro che insegnano (those who teach) instead of i professori (the.M professors:M);
      • innovative obscuration, i.e., the use of novel, gender-neutral markers instead of the
        gendered ones – e.g., lǝ professorǝ (the.INN professor:INN) instead of il professore (the.M
        professor:M) or la professoressa (the.F professor:F).²

   As we further discuss in Section 3, the released version of GFL-it for this challenge and GeNTE
include references and annotations designed for the former strategy, whereas Neo-GATE targets the
latter.
   Note that the chosen strategies do not exhaust the full range of possibilities: we discarded,
for the moment, visibility strategies such as the repetition of an expression in the feminine and
the masculine - e.g., i professori e le professoresse (the.M professors:M and the.F professors:F) -
and the repetition in three gendered forms (feminine, masculine and innovative) – e.g., i
professori, lǝ professorǝ e le professoresse (the.M professors:M, the.INN professors:INN and the.F
professors:F).

¹ In this report, we refer to innovative gender-fair strategies such as the schwa as
  “neomorphemes”. Although aware that this terminology is controversial, we adopted it for
  simplicity and do not intend our terminology to imply any substantive stance.
² We indicate the innovative forms with “INN” in the glosses.

3) Fair translation: like the second task, the third one is designed to test the models’ ability to
generate gender-fair language texts, but in the cross-lingual context of automatic translation from
English into Italian. For example, consider applying the two gender-fair language strategies
described above to the translation of the sentence “I am glad to know such knowledgeable doctors”:
      • conservative obscuration: Sono felice di conoscere un personale medico così preparato.
        [medical staff]
      • innovative obscuration: Sono contentǝ di conoscere medicǝ così preparatǝ.

Task             GFL-it    GeNTE    Neo-GATE    Task total
Detection         2,187        -         841         3,028
Reformulation     1,206      750         841         2,797
Translation           -    1,500         841         2,341

Table 1
Number of dataset entries used for each task.


3. Data description

For our challenge, we propose three benchmarks dedicated to the evaluation of gender-fair language
generation (GFL-it,³ GeNTE [14],⁴ and Neo-GATE [15]⁵) and a total of 7 prompts to be used across the
tasks and datasets. We describe the datasets in subsections 3.1, 3.2, and 3.3 respectively, and the
prompts in subsection 3.4.
   Statistics about the benchmarks and their use within this challenge proposal are available in
Table 1. GFL-it contains a total of 2,187 texts; in 1,206 of them, 5 expert annotators identified
unfair spans (3,908 in total, an average of 3.24 per text). For each identified span, the
annotators proposed various gender-fair alternatives, with an average of 3.8 alternatives per span.
For more detailed statistics about GeNTE and Neo-GATE we refer to the respective papers.

3.1. GFL-it

GFL-it was built on documents and texts from University website pages provided by the University of
Brescia. It constitutes an expansion of the corpus presented in Rosola et al. [1]. The corpus
comprises a total of 35 documents in Italian, split into 2,187 texts. Each text was annotated by 5
paid expert annotators following the original annotation scheme [1]. First, the annotators
identified all the spans that contained any gender-unfairness, distinguishing among: overextended
(3,465), generic (530) and incongruous gender (31) (see Section 2). Overall, 3,908 spans were
identified. Then, they provided at least one alternative per span. The alternatives could belong to
any of the gender-fair strategies: conservative or innovative obscuration, conservative or
innovative visibility, or hybrid alternatives (i.e., any combination of these types).
   Given that GFL-it is annotated for spans, each text contains a list of different spans and their
reformulations in different forms of gender-fair language.⁶ More specifically, each entry is
described by the following attributes:
      • id_text: The unique ID for each text.
      • text: The entire text of the entry.
      • list_spans: The list containing all spans found in the text.
      • rewritten_texts_generico: A reformulation of the entire text where spans labeled as generic
        are replaced.
      • rewritten_texts_sovraesteso: A reformulation of the entire text where spans labeled as
        overextended are replaced.
      • rewritten_texts_generico_e_sovraesteso: A reformulation of the entire text where spans are
        randomly replaced by available options in rewritten_texts_generico or
        rewritten_texts_sovraesteso.

   Each span in list_spans follows the structure:
      • span: The textual representation of the span.
      • start: The starting index of the span in the text.
      • end: The ending index of the span in the text.
      • labels: A list of the types of gendered language used in the selected spans; possible
        values are overextended, generic and incongruous gender.
      • key_span: The concatenation of span, start and end attributes; it can be used as an ID for
        each span contained inside a text.

   We propose to use the GFL-it corpus for tasks 1 and 2,⁷ namely, those regarding gendered language
detection and fair reformulation.

3.2. GeNTE

GeNTE is a parallel English → Italian test set [16]. Originally designed to evaluate MT models’
ability to perform gender-neutral translations, GeNTE was built upon a subset of the Europarl corpus
[17], which is representative of natural, formal communicative situations from the institutional
domain, the context where gender-neutral language is most accepted and encouraged [16, 14]. Overall,
it consists of 1,500 <English source, gendered Italian reference, gender-neutral Italian reference>
triplets aligned at the sentence level, which always contain at least one mention of human
referents. The gendered Italian reference (REF-G) comes from the original Europarl corpus, whereas
the gender-neutral reference (REF-N) was produced by professional translators who edited gendered
forms into gender-neutral alternatives.

³ https://github.com/simonasnow/GFL-it-Dataset
⁴ https://huggingface.co/datasets/FBK-MT/GeNTE
⁵ https://huggingface.co/datasets/FBK-MT/Neo-GATE
⁶ For the purposes of task 2, only the conservative obscured reformulations have been released in
  this version of the dataset.
⁷ For task 2, we used a classifier that distinguishes between gendered and gender-neutral texts (see
  Section 4). Hence, we only used the GFL-it texts where the annotators identified gendered
  expressions (= gendered class) and the texts for which annotators provided at least one
  conservative obscured reformulation (= gender-neutral class) for a total amount of 1,206 texts.
 Text                    Per gli iscritti agli anni successivi al primo tali valutazioni scendono rispettivamente a NUM ,
                         NUM (sotto la soglia critica) e NUM (vicino alla soglia critica).
 Span                    gli iscritti
 Reformulated Text       Per le persone iscritte agli anni successivi al primo tali valutazioni scendono rispettivamente a
                         NUM , NUM (sotto la soglia critica) e NUM (vicino alla soglia critica).
                         [For those enrolled in years after the first, these ratings drop to NUM, NUM (below the critical threshold) and NUM (close to the critical
                         threshold), respectively.]

Table 2
Example from the GFL-it dataset. Words in bold correspond to the identified unfair spans in the text, and the reformulated
expressions in the reformulated text. A translation of the text is provided in square brackets.
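
The span attributes listed in Section 3.1 (span, start, end, labels, key_span) can be illustrated on the Table 2 example. The following is only a minimal sketch, not the dataset's loading code: it assumes 0-based character offsets with an exclusive end, an underscore separator in key_span, and the label "overextended" for this span, none of which is stated in the text above. The helper names make_span and replace_span are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    span: str          # textual representation of the span
    start: int         # assumed: 0-based character offset, inclusive
    end: int           # assumed: character offset, exclusive
    labels: List[str]  # e.g. ["overextended"], ["generic"]

    @property
    def key_span(self) -> str:
        # Assumption: the real separator used in GFL-it is not given in the text.
        return f"{self.span}_{self.start}_{self.end}"


def make_span(text: str, surface: str, labels: List[str]) -> Span:
    """Locate the first occurrence of `surface` in `text` and build a Span."""
    start = text.index(surface)
    return Span(surface, start, start + len(surface), labels)


def replace_span(text: str, span: Span, alternative: str) -> str:
    """Replace the annotated span with a gender-fair alternative."""
    if text[span.start:span.end] != span.span:
        raise ValueError("span offsets do not match the text")
    return text[:span.start] + alternative + text[span.end:]


# Text and reformulation taken from Table 2; the label is an assumption.
text = ("Per gli iscritti agli anni successivi al primo tali valutazioni "
        "scendono rispettivamente a NUM , NUM (sotto la soglia critica) e NUM "
        "(vicino alla soglia critica).")
span = make_span(text, "gli iscritti", ["overextended"])
rewritten = replace_span(text, span, "le persone iscritte")
print(span.key_span)
print(rewritten)
```

Checking the offsets against the text before replacing guards against spans whose indices were computed on a differently normalized version of the text.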


 Set-G    SRC       When you assumed office, Mr Schreyer, you assured us that you would strive to achieve this.
          REF-G     Al momento della sua nomina, signor [Mr] Schreyer, ci aveva promesso che si sarebbe adoperato
                    [(would have) strived] in tal senso.
          REF-N     Al momento della sua nomina, Schreyer, ci aveva promesso un impegno [a commitment] in tal senso.
 Set-N    SRC       To some extent, those of us who are politicians find ourselves in the middle.
          REF-G     In certa misura quelli [those (of us)] di noi che sono politici [politicians] si trovano in una
                    posizione intermedia.
          REF-N     In certa misura chi di noi [who, among us,] svolge attività politica [carries out political
                    activities] si trova in una posizione intermedia.

Table 3
Examples of Set-G and Set-N entries in GeNTE. Underlined words are linguistic cues informing about human referents’ gender;
words in bold are gendered mentions of human referents; words in italic are the gender-neutral reformulations of the gendered
mentions. Glosses of relevant expressions are provided in square brackets.
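
The way GeNTE is repurposed across tasks (all SRC sentences for task 3, only the REF-G references of Set-N entries for task 2) can be sketched on the two Table 3 entries. This is an illustrative sketch only: the field names SET, SRC and REF-G come from the GeNTE field list, but the literal values "Set-G" and "Set-N" stored in the SET field and the function name gente_task_inputs are assumptions.

```python
from typing import Dict, List, Tuple

# The two Table 3 entries, glosses removed. SET values are assumed spellings.
entries: List[Dict[str, str]] = [
    {
        "SET": "Set-G",
        "SRC": "When you assumed office, Mr Schreyer, you assured us that "
               "you would strive to achieve this.",
        "REF-G": "Al momento della sua nomina, signor Schreyer, ci aveva "
                 "promesso che si sarebbe adoperato in tal senso.",
    },
    {
        "SET": "Set-N",
        "SRC": "To some extent, those of us who are politicians find "
               "ourselves in the middle.",
        "REF-G": "In certa misura quelli di noi che sono politici si trovano "
                 "in una posizione intermedia.",
    },
]


def gente_task_inputs(rows: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (task 3 inputs, task 2 inputs) from GeNTE-style rows."""
    # Task 3 (fair translation): every English source sentence is used.
    translation_inputs = [row["SRC"] for row in rows]
    # Task 2 (fair reformulation): only gendered references from Set-N.
    reformulation_inputs = [row["REF-G"] for row in rows if row["SET"] == "Set-N"]
    return translation_inputs, reformulation_inputs


translation_inputs, reformulation_inputs = gente_task_inputs(entries)
print(len(translation_inputs), len(reformulation_inputs))
```

Applied to the full test set, the same split yields the 1,500 translation inputs and 750 reformulation inputs reported in Table 1.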


   As shown in Table 3, GeNTE represents two types         Each entry in GeNTE is organized into the following
of phenomena, which are equally represented within fields:
the corpus. Namely, i) Set-N, featuring 750 gender-
ambiguous source sentences that require to be ren-           • ID: The unique GeNTE ID.
dered gender-neutrally; and ii) Set-G featuring gender-      • Europarl_ID: The original sentence ID from Eu-
unambiguous source sentences, to be properly rendered           roparl’s common-test-set 2.
with gendered (masculine or feminine) forms. Crucially,      • SET : Indicates whether the entry belongs to the
these two sets are a key feature of GeNTE, as they al-          Set-G or the Set-N subportion of the corpus.
low benchmarking whether systems are able to per-            • SRC: The English source sentence.
form gender-neutral translations, but only when desir-       • REF-G: The gendered Italian reference transla-
able. As a matter of fact, when referents’ gender is un-        tion.
known or irrelevant, undue gender inferences should          • REF-N : The gender-neutral Italian reference, pro-
not be made and gender-neutral language (i.e., conser-          duced by a professional translator.
vative obscuration strategy) should be used. However,        • GENDER: For entries belonging to the Set-G, it
gender-neutralization should not be always enforced,            indicates if the entry is Feminine or Masculine.
and when a referent’s gender is known or relevant, mod-
els should not over-generalize to gender-neutral genera-   We propose the use of the whole GeNTE for the
tions.                                                   translation task 3, testing models’ ability to produce
                                                         gender-neutral translations only when appropriate. For


 SOURCE             After the accident, they took me to the hospital and I stayed there for a whole month.
 REF-M              Dopo l’incidente, mi hanno portato all’ospedale e sono rimasto lì per un mese intero.
 REF-F              Dopo l’incidente, mi hanno portata all’ospedale e sono rimasta lì per un mese intero.
 REF-TAGGED         Dopo l’incidente, mi hanno portat@ all’ospedale e sono rimast@ lì per un mese intero.
 ANNOTATION         portato portata portat@; rimasto rimasta rimast@;

Table 4
Example of a Neo-GATE entry, already adapted to the schwa-simple neomorpheme paradigm. Underlined words include the
neomorpheme schwa (@).
the fair reformulation task 2, we only repurpose part          We propose to use all Neo-GATE entries for all three
of the Italian portion of the corpus, i.e., REF-G referencestasks of our challenge. While for tasks 1 (gendered
from Set-N.                                                 language detection) and 2 (fair reformulation) we
                                                            only use Italian references – namely both REF-M and
3.3. Neo-GATE                                               REF-F for task 1, and REF-M only for task 2 – as input
                                                            for the models, for task 3 (fair translation) we use the
Similarly to GeNTE, Neo-GATE is a parallel corpus de- English SOURCE sentences.
signed for gender-fair English → Italian MT evaluation.
Here, however, the focus is on the use of gender-fair neo-
                                                            3.4. Example of used prompts
morphemes (i.e., innovative obscuration strategy) rather
than conservative gender-neutral language. Neo-GATE This section describes the prompts we propose for our
was built on GATE [18], a test set manually created specif- challenge, with examples available in Table 5.
ically to evaluate gender reformulation and gender bias in     In prompts A and B, we ask the model to identify
MT. In GATE, the gender of human entities is unknown, the gendered expressions (introduced by the tag [Espres-
i.e., there are no linguistic elements providing gender in- sione]:) in the text given as input; if no gendered ex-
formation about human referents in the (English) source pression is detected in the text (initialized with the tag
sentences.                                                  [Genere marcato]:) the model should output 0. The model
   Neo-GATE includes an annotation that defines the can recognize more than one gendered expression.
words upon which the evaluation is based. It includes          In prompts C, D, and E, the shots include one line
the three forms required for the evaluation, i.e., the mas- starting with the tag [Genere marcato]:, indicating that the
culine and feminine forms, and forms featuring place- following sentence is gendered. Then, in prompts C and
holders in place of Italian overt gender markers. Before D the following line starts with [Neutro]: followed by a
the evaluation, the placeholders must be replaced with gender-neutral reformulation, whereas in E it starts with
the correct forms in the desired neomorpheme paradigm.
[Neomorfema]: and includes the innovative obscuration alternative of the first sentence, with
neomorphemes in place of the masculine forms.8

   For this task, Neo-GATE was adapted to a version of the ‘schwa’ paradigm [19, 20], to which we
refer as schwa-simple here, i.e., the placeholders were replaced with the forms described in
Appendix A.
   Like GeNTE, Neo-GATE includes Italian references that differ exclusively in gender expression.
Besides the English source sentence, all entries in Neo-GATE have three Italian references: REF-M,
where the gender of words referring to human beings is masculine, REF-F, where human beings are
referred to as feminine, and REF-TAGGED, where placeholders replace overt markers of gender, here
adapted to the schwa-simple paradigm. However, differently from GeNTE, the English sentences in
Neo-GATE never include gender cues. An example of a Neo-GATE entry is available in Table 4.
   Each entry in Neo-GATE includes the following fields:

       • #: The entry identifier within Neo-GATE.
       • GATE-ID: A unique identifier of the original GATE entry, composed of a prefix indicating
         the subset of origin followed by a serial number.
       • SOURCE: The English source sentence.
       • REF-M: The Italian reference where all gender-marked terms are masculine.
       • REF-F: The Italian reference where all gender-marked terms are feminine.
       • REF-TAGGED: The Italian reference where all gender-marked terms are tagged with
         Neo-GATE’s annotation.
       • ANNOTATION: The word-level annotation.

   Prompts F and G start with the tag [Inglese]: followed by the English source sentence to be
translated. In prompt F, the second line either starts with the tag [Italiano, genere marcato]:
(see F - Exemplar format 1 in Table 5) if it is followed by a gendered translation or with the tag
[Italiano, neutro]: if the subsequent translation is gender-neutral (see F - Exemplar format 2).
Models are required to produce the correct tag and translation depending on the presence or absence
of gender cues in the source. Finally, prompt G includes two different translations after the
source sentence: the first, preceded by the tag [Italiano, genere marcato]:, includes a translation
featuring masculine forms in reference to human beings, whereas the second translation starts with
the tag [Italiano, neomorfema]: and uses neomorphemes in reference to human beings. Models are
required to produce both translations, though only the second will be extracted in post-processing
and used for the evaluation.
   In particular, prompts D, E, F, and G are based on the ones used in previous experiments on the
same datasets [12, 15], and were in turn inspired by the format proposed by Sánchez et al. [21].

8 We here used neutro (neutral/neuter), despite being aware of its ambiguity with neuter, a
  grammatical gender not present in the Italian linguistic system. However, nothing substantive
  hinges on this terminological choice.
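
The prompt layout described above can be illustrated with a minimal sketch. It assumes that a prompt is the instruction, followed by solved exemplars, followed by the unsolved source line, separated by blank lines; the function name `build_prompt`, the separators, and the test sentence are hypothetical and are not taken from the challenge code.

```python
def build_prompt(instruction, exemplars, source_tag, source_text):
    """Join the instruction, the solved exemplars, and the unsolved test item.

    Each exemplar is a list of (tag, text) pairs, e.g. the source line
    followed by one or two reference lines. Blank-line separators are an
    assumption of this sketch.
    """
    blocks = [instruction]
    for exemplar in exemplars:
        blocks.append("\n".join(f"[{tag}]: {text}" for tag, text in exemplar))
    # The test item carries only the source line; the model is expected to
    # continue with the tag(s) and the Italian text.
    blocks.append(f"[{source_tag}]: {source_text}")
    return "\n\n".join(blocks)


# Exemplar taken from prompt F (Exemplar format 2) in Table 5.
exemplar_f2 = [
    ("Inglese",
     "Secondly, how far does it increase transparency and accountability of the MEPs?"),
    ("Italiano, neutro",
     "Secondariamente, fino a che punto aumenta la trasparenza e la responsabilità "
     "dei membri del Parlamento Europeo?"),
]

prompt = build_prompt(
    "Traduci la seguente frase inglese in italiano seguendo queste regole: [...]",
    [exemplar_f2],
    "Inglese",
    "The students thanked their teacher.",  # hypothetical test sentence
)
print(prompt)
```

Because prompt F leaves the choice of tag to the model, the test item stops after the [Inglese] line rather than pre-filling an Italian tag.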
                                         Task 1: Gendered language detection
 A   Instruction         Identifica le espressioni che contengono dei marcatori di genere femminile o maschile. Se più di
                         un’espressione è identificata, separale con “ ;”. Se nessuna espressione è identificata, restituisci 0.
     Exemplar format     [Genere marcato]: A livello di Ateneo ha partecipato il 76% degli iscritti del 1° anno del CdL e
                         l’81% di tutti gli altri studenti.
                         [Espressione]: degli iscritti ; tutti gli altri studenti
 B   Instruction         Identifica le espressioni che contengono dei marcatori di genere femminile o maschile. Se più di
                         un’espressione è identificata, separale con “ ;”. Se nessuna espressione è identificata, restituisci 0.
     Exemplar format     [Genere marcato]: Quest’anno mi sono ammalata già due volte.
                         [Espressione]: ammalata
                                         Task 2: Fair reformulation
 C   Instruction         Riformula la seguente frase utilizzando un linguaggio neutro rispetto al genere dei referenti
                         umani, evitando l’uso di forme maschili e femminili.
     Exemplar format     [Genere marcato]: - Il 79% dei laureati lavora (a un anno dalla laurea).
                         [Neutro]: - Il 79% delle persone laureate lavora (a un anno dalla laurea).
 D   Instruction         Riformula la seguente frase utilizzando un linguaggio neutro rispetto al genere dei referenti
                         umani, evitando l’uso di forme maschili e femminili.
     Exemplar format     [Genere marcato]: Secondariamente, fino a che punto aumenta la trasparenza e la responsabilità
                         dei parlamentari europei?
                         [Neutro]: Secondariamente, fino a che punto aumenta la trasparenza e la responsabilità dei
                         membri del Parlamento Europeo?
 E   Instruction         Riformula la seguente frase usando il neomorfema “ə”. Il neomorfema “ə” deve essere usato come
                         sostituto dei morfemi maschili e femminili nelle parole che si riferiscono agli esseri umani.
     Exemplar format     [Genere marcato]: Il partner di un mio amico ci ha invitati a cena.
                         [Neomorfema]: Lə partner di unə miə amicə ci ha invitatə a cena.
                                         Task 3: Fair translation
 F   Instruction         Traduci la seguente frase inglese in italiano seguendo queste regole:
                         1. Se la frase inglese indica chiaramente il genere dei referenti umani (maschile o femminile),
                         traduci usando il genere corretto.
                         2. Se la frase inglese non indica il genere dei referenti umani, traduci usando un linguaggio neutro
                         che non esprime genere, evitando forme maschili e femminili.
     Exemplar format 1   [Inglese]: However, it is important that the Commissioner has declared his loyalty to the President
                         himself.
                         [Italiano, genere marcato]: Tuttavia, è importante che il Commissario abbia dichiarato la sua
                         fedeltà al Presidente stesso.
     Exemplar format 2   [Inglese]: Secondly, how far does it increase transparency and accountability of the MEPs?
                         [Italiano, neutro]: Secondariamente, fino a che punto aumenta la trasparenza e la responsabilità
                         dei membri del Parlamento Europeo?
 G   Instruction         Traduci la seguente frase inglese in italiano usando il neomorfema “ə”. Il neomorfema “ə” deve
                         essere usato come sostituto dei morfemi maschili e femminili nelle parole che si riferiscono agli
                         esseri umani.
     Exemplar format     [Inglese]: The partner of a friend of mine invited us to dinner.
                         [Italiano, genere marcato]: Il partner di un mio amico ci ha invitati a cena.
                         [Italiano, neomorfema]: Lə partner di unə miə amicə ci ha invitatə a cena.

Table 5
Examples of the format of all prompts we propose for our challenge. Dataset-wise, prompts A and C are designed to be used
with GFL-it data, prompts B, E, and G are designed for Neo-GATE, and prompts D and F are designed for GeNTE.
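
As an illustration of how outputs in the formats of Table 5 might be post-processed, the following minimal sketch splits a model output into tagged fields. It assumes that each field sits on a single line of the form "[Tag]: text" and that the output contains the tags themselves; the function names are hypothetical and do not come from the challenge code.

```python
import re

# One field per line, written as "[Tag]: text".
TAG_LINE = re.compile(r"^\s*\[([^\]]+)\]:\s*(.*)$")


def parse_tagged_output(text):
    """Return (tag, content) pairs for every line of the form "[Tag]: content"."""
    pairs = []
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if match:
            pairs.append((match.group(1).strip(), match.group(2).strip()))
    return pairs


def extract_expressions(text):
    """Task 1 (prompts A, B): split the [Espressione] field on ';'; '0' means none."""
    for tag, content in parse_tagged_output(text):
        if tag == "Espressione":
            if content == "0":
                return []
            return [part.strip() for part in content.split(";") if part.strip()]
    return []  # assumption: a missing field is treated like "no expression"


def extract_translation(text, wanted_tag="Italiano, neomorfema"):
    """Prompt G: keep only the translation that follows the wanted tag."""
    for tag, content in parse_tagged_output(text):
        if tag == wanted_tag:
            return content
    return None


# Exemplar of prompt A from Table 5.
output_a = (
    "[Genere marcato]: A livello di Ateneo ha partecipato il 76% degli iscritti "
    "del 1° anno del CdL e l’81% di tutti gli altri studenti.\n"
    "[Espressione]: degli iscritti ; tutti gli altri studenti"
)
print(extract_expressions(output_a))
# ['degli iscritti', 'tutti gli altri studenti']
```

For prompt F, the same helper can be called with `wanted_tag="Italiano, neutro"` or `wanted_tag="Italiano, genere marcato"` to check which tag the model chose.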


4. Metrics

For the evaluation of gendered language detection (i.e., with GFL-it and Neo-GATE in task 1) we
used the F1-score obtained using BERTScore9 [22] for each entry in the datasets. In particular, for
each entry, we extract the most relevant correspondence between the gendered expressions identified
by the annotators and the ones produced by the generative model, computing the maximum F1-score.
Once the correspondences are set for each entry, we average the scores.

   9 https://huggingface.co/spaces/evaluate-metric/bertscore

   For the evaluation of gender-neutral reformulation (i.e., with GFL-it and GeNTE in task 2) and
translation (i.e., with GeNTE and Neo-GATE in task 3), we propose an accuracy score based on the
labels produced by the classifier introduced in Piergentili et al. [14]. More specifically, we use
version 2 of the classifier, introduced in Savoldi et al. [12]. This classifier assigns a label to
each model output, either gender-neutral or gendered. We then compare those labels against the true
labels, i.e., always gender-neutral in the reformulation task and either gendered or gender-neutral
for the translation task, depending on whether the entry belongs to Set-G or Set-N respectively.
The final score is computed as the corpus-level percentage of correct labels.
   For neomorpheme-based gender-fair reformulation (task 2) and translation (task 3) based on
Neo-GATE, we propose the coverage-weighted accuracy described in Piergentili et al. [15] as the
main metric. This metric takes into account both how accurately a model generates neomorphemes and
the proportion of annotations (i.e., any of the masculine, feminine, or innovative forms) found
during the evaluation, thus allowing for fair system comparisons and rankings. As a complementary
metric to assess models’ ability to correctly generate neomorphemes, we propose reporting the
mis-generation score [15] as well. This metric can flag undesired behaviors even when accuracy is
good, as it counts cases where models generate neomorphemes inappropriately, for instance by
applying neomorphemes to words that do not refer to human entities (e.g., by generating ‘tavolə’
instead of ‘tavolo’, en: table).

5. Limitations

Our work presents some limitations. Firstly, the datasets employed only derive from specific
domains: GFL-it exclusively contains data from administrative documents and official web pages of
the University, GeNTE from documents of the European Parliament, and Neo-GATE from data manually
created by experts. The corpora could be expanded to other domains and annotated by more annotators
in future research. Secondly, our metrics are only a first attempt and others should be explored in
the future. Moreover, we only tested one paradigm of neomorphemes, namely the schwa-simple, while
many others exist (e.g., the asterisk, the ‘-u’, the ‘@’; see [23] for a complete list), and even
more could be proposed. Furthermore, GeNTE and Neo-GATE do not contain mixed texts where rewriting
is needed with respect to one entity but not others.

6. Ethical issues

The proposed tasks in this challenge have the purpose of reducing the use of gender-unfair
expressions in heavily gender-marked languages (i.e., Italian) that affect the visibility of other
genders (in particular, feminine and non-binary). Although the datasets have been built by experts
of gender-fair language, the group of annotators of GFL-it was not gender-balanced as only 2 out of
5 annotators were men.
   Moreover, we are aware of the fact that the use of neomorphemes like the schwa ǝ makes reading
harder for people with dyslexia or visual impairments [4, 24, 25]. This issue, however, is
mitigated thanks to the possibility of selecting the most suitable neomorpheme according to each
user’s needs. In particular, people with dyslexia and people with visual impairments can both rely
on screen readers, which differ in their ability to correctly interpret specific neomorphemes: the
possibility to select different neomorphemes allows each user to select the one(s) their screen
reader interprets best.

7. Data license and copyright issues

Creative Commons Attribution 4.0 International license (CC BY 4.0).
https://creativecommons.org/licenses/by-sa/4.0/deed.it

8. Acknowledgments

Beatrice Savoldi is supported by the PNRR project FAIR - Future AI Research (PE00000013), under the
NRRP MUR program funded by the NextGenerationEU. Luisa Bentivogli is funded by the Horizon Europe
research and innovation programme, under grant agreement No 101135798, project Meetween (My
Personal AI Mediator for Virtual MEETtings BetWEEN People). The work of Viviana Patti and Marco
Madeddu is supported by “HARMONIA” project - M4-C2, I1.3 Partenariati Estesi - Cascade Call -
FAIR - CUP C63C22000770006 - PE PE0000013 under the NextGenerationEU programme.
   The annotation of GFL-it has been partially funded by Università degli Studi di Brescia as part
of the actions provided for by the Gender Equality Plan.

References

 [1] M. Rosola, S. Frenda, A. T. Cignarella, M. Pellegrini, A. Marra, M. Floris, et al., Beyond
     obscuration and visibility: Thoughts on the different strategies of gender-fair language in
     italian, in: CLiC-it 2023.
     Proceedings of the 9th Italian Conference on Computational Linguistics. Venice, Italy,
     November 30-December 2, 2023., volume 3596, CEUR-WS, 2023, pp. 1–10.
 [2] J. Silveira, Generic Masculine Words and Thinking, Women’s Studies International Quarterly 3
     (1980) 165–178. URL:
     https://www.sciencedirect.com/science/article/pii/S0148068580921132.
 [3] P. Gygax, S. Sato, A. Öttl, U. Gabriel, The masculine form in grammatically gendered languages
     and its multiple interpretations: A challenge for our cognitive system, Language Sciences 83
     (2021) 101328.
 [4] G. Sulis, V. Gheno, The debate on language and gender in italy, from the visibility of women
     to inclusive language (1980s–2020s), The Italianist 42 (2022) 153–183.
     doi:10.1080/02614340.2022.2125707.
 [5] G. Visibility, N. across Languages, Beyond pronouns, The Oxford Handbook of Applied Philosophy
     of Language (2024) 320.
 [6] M. Rosola, Linguistic hermeneutical injustice, Social Epistemology (2024).
     doi:10.1080/02691728.2024.2401143.
 [7] S. J. Kapusta, Misgendering and its moral contestability, Hypatia 31 (2016) 502–519.
 [8] R. Dembroff, D. Wodak, He/she/they/ze, Ergo (2018).
 [9] S. Sczesny, M. Formanowicz, F. Moser, Can gender-fair language reduce gender stereotyping and
     discrimination?, Frontiers in psychology 7 (2016) 154379.
[10] P. Gygax, S. Zufferey, U. Gabriel, Le cerveau pense-t-il au masculin, Cerveau, langage et
     représentations sexistes, Paris, Le Robert (2021).
[11] S. L. Blodgett, S. Barocas, H. Daumé III, H. Wallach, Language (technology) is power: A
     critical survey of “bias” in NLP, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.),
     Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics,
     Association for Computational Linguistics, Online, 2020, pp. 5454–5476. URL:
     https://aclanthology.org/2020.acl-main.485. doi:10.18653/v1/2020.acl-main.485.
[12] B. Savoldi, A. Piergentili, D. Fucci, M. Negri, L. Bentivogli, A prompt response to the demand
     for automatic gender-neutral translation, in: Y. Graham, M. Purver (Eds.), Proceedings of the
     18th Conference of the European Chapter of the Association for Computational Linguistics
     (Volume 2: Short Papers), Association for Computational Linguistics, St. Julian’s, Malta,
     2024, pp. 256–267. URL: https://aclanthology.org/2024.eacl-short.23.
[13] G. Attanasio, P. Basile, F. Borazio, D. Croce, M. Francis, J. Gili, E. Musacchio, M. Nissim,
     V. Patti, M. Rinaldi, D. Scalena, CALAMITA: Challenge the Abilities of LAnguage Models in
     ITAlian, in: Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it
     2024), Pisa, Italy, December 4 - December 6, 2024, CEUR Workshop Proceedings, CEUR-WS.org,
     2024.
[14] A. Piergentili, B. Savoldi, D. Fucci, M. Negri, L. Bentivogli, Hi guys or hi folks?
     benchmarking gender-neutral machine translation with the GeNTE corpus, in: H. Bouamor,
     J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural
     Language Processing, Association for Computational Linguistics, Singapore, 2023,
     pp. 14124–14140. URL: https://aclanthology.org/2023.emnlp-main.873.
     doi:10.18653/v1/2023.emnlp-main.873.
[15] A. Piergentili, B. Savoldi, M. Negri, L. Bentivogli, Enhancing gender-inclusive machine
     translation with neomorphemes and large language models, in: C. Scarton, C. Prescott,
     C. Bayliss, C. Oakley, J. Wright, S. Wrigley, X. Song, E. Gow-Smith, R. Bawden,
     V. M. Sánchez-Cartagena, P. Cadwell, E. Lapshinova-Koltunski, V. Cabarrão,
     K. Chatzitheodorou, M. Nurminen, D. Kanojia, H. Moniz (Eds.), Proceedings of the 25th Annual
     Conference of the European Association for Machine Translation (Volume 1), European
     Association for Machine Translation (EAMT), Sheffield, UK, 2024, pp. 300–314. URL:
     https://aclanthology.org/2024.eamt-1.25.
[16] A. Piergentili, D. Fucci, B. Savoldi, L. Bentivogli, M. Negri, Gender neutralization for an
     inclusive machine translation: from theoretical foundations to open challenges, in:
     E. Vanmassenhove, B. Savoldi, L. Bentivogli, J. Daems, J. Hackenbuchner (Eds.), Proceedings of
     the First Workshop on Gender-Inclusive Translation Technologies, European Association for
     Machine Translation, Tampere, Finland, 2023, pp. 71–83. URL:
     https://aclanthology.org/2023.gitt-1.7.
[17] P. Koehn, Europarl: A Parallel Corpus for Statistical Machine Translation, in: Proceedings of
     the tenth Machine Translation Summit, AAMT, Phuket, TH, 2005, pp. 79–86. URL:
     http://mt-archive.info/MTS-2005-Koehn.pdf.
[18] S. Rarrick, R. Naik, V. Mathur, S. Poudel, V. Chowdhary, GATE: A challenge set for
     gender-ambiguous translation examples, in: Proceedings of the 2023 AAAI/ACM Conference on AI,
     Ethics, and Society, AIES ’23, Association for Computing Machinery, New York, NY, USA, 2023,
     pp. 845–854. URL: https://doi.org/10.1145/3600211.3604675. doi:10.1145/3600211.3604675.
[19] A. M. Thornton, Genere e igiene verbale: l’uso di forme con ə in italiano, Annali Del
     Dipartimento Di Studi Letterari, Linguistici E Comparati. Sezione Linguistica 11 (2020) 11–54.
     URL: http://www.serena.unina.it/index.php/aionlin/article/view/9623.
     doi:10.6093/2281-6585/9623.
[20] R. Baiocco, F. Rosati, J. Pistella, Italian proposal for non-binary and inclusive language:
     The schwa as a non-gender–specific ending, Journal of Gay & Lesbian Mental Health 27 (2023)
     248–253. URL: https://doi.org/10.1080/19359705.2023.2183537.
     doi:10.1080/19359705.2023.2183537.
[21] E. Sánchez, P. Andrews, P. Stenetorp, M. Artetxe, M. R. Costa-jussà, Gender-specific machine
     translation with large language models, 2024. URL: https://arxiv.org/abs/2309.03175.
     arXiv:2309.03175.
[22] T. Zhang*, V. Kishore*, F. Wu*, K. Q. Weinberger, Y. Artzi, BERTScore: Evaluating text
     generation with BERT, in: International Conference on Learning Representations, 2020. URL:
     https://openreview.net/forum?id=SkeHuCVFDr.
[23] V. Gheno, Lo schwa tra fantasia e norma, La falla (2020). URL:
     https://lafalla.cassero.it/lo-schwa-tra-fantasia-e-norma/.
[24] L. Iacopini, Lo schwa (ǝ) che rende l’inclusione inaccessibile, Web accessibile (2021). URL:
     https://webaccessibile.org/approfondimenti/lo-schwa-%C7%9D-che-rende-linclusione-inaccessibile/.
[25] C. D. Santis, L’emancipazione grammaticale non passa per una e rovesciata, 2022. URL:
     https://www.treccani.it/magazine/lingua_italiana/articoli/scritto_e_parlato/Schwa.html.
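
As a supplement to the metric descriptions in Section 4, the following minimal sketch illustrates one possible reading of the task 1 matching and of the label-based accuracy. It rests on several assumptions that the text does not spell out: each annotated expression is matched to its best-scoring predicted expression, entries with no annotated and no predicted expression count as fully correct, and a simple token-overlap F1 stands in for the BERTScore F1 so that the example runs without model downloads. All function names are hypothetical.

```python
from collections import Counter


def token_f1(candidate, reference):
    """Token-overlap F1; a lightweight stand-in for a BERTScore F1 call."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def entry_score(gold, predicted, scorer=token_f1):
    """Match each gold expression to its best-scoring prediction, then average."""
    if not gold and not predicted:
        return 1.0  # assumption: agreeing on "no expression" is a full match
    if not gold or not predicted:
        return 0.0
    best = [max(scorer(p, g) for p in predicted) for g in gold]
    return sum(best) / len(best)


def detection_score(gold_entries, predicted_entries, scorer=token_f1):
    """Average the per-entry scores over the dataset (task 1)."""
    scores = [entry_score(g, p, scorer)
              for g, p in zip(gold_entries, predicted_entries)]
    return sum(scores) / len(scores)


def label_accuracy(classifier_labels, true_labels):
    """Corpus-level percentage of classifier labels that equal the true labels."""
    correct = sum(c == t for c, t in zip(classifier_labels, true_labels))
    return 100.0 * correct / len(true_labels)


gold = [["degli iscritti", "tutti gli altri studenti"], []]
pred = [["degli iscritti", "gli studenti"], []]
print(round(detection_score(gold, pred), 3))

print(label_accuracy(["neutral", "gendered", "neutral", "neutral"],
                     ["neutral", "neutral", "neutral", "neutral"]))
```

Any scorer that takes a candidate and a reference string and returns a value in [0, 1] can be passed as `scorer`, so a wrapper around a BERTScore implementation could replace `token_f1` without changing the matching logic.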


A. The schwa-simple paradigm
Table 6 reports the forms used in the schwa-simple
paradigm, along with the corresponding tags in Neo-
GATE and masculine and feminine equivalents.
 TAG          Description                                                   Masculine             Feminine        Schwa
 <ENDS>       portion of the word differentiating gendered forms, singular  o, e, tore            a, essa, trice  ə, torə
 <ENDP>       portion of the word differentiating gendered forms, plural    i, tori               e, esse, trici  ə, torə
 <DARTS>      definite article, singular                                    il, lo, l’            la, l’          lə
 <DARTP>      definite article, plural                                      i, gli                le              lə
 <IART>       indefinite article                                            uno, un               una, un’        unə
 <PARTP>      partitive article, plural                                     dei, degli            delle           deə
 <PREPdiS>    articulated preposition with root ‘di’, singular              del, dello, dell’     della, dell’    dellə
 <PREPdiP>    articulated preposition with root ‘di’, plural                dei, degli            delle           dellə
 <PREPaS>     articulated preposition with root ‘a’, singular               al, allo, all’        alla, all’      allə
 <PREPaP>     articulated preposition with root ‘a’, plural                 agli, ai              alle            allə
 <PREPdaS>    articulated preposition with root ‘da’, singular              dal, dallo, dall’     dalla, dall’    dallə
 <PREPdaP>    articulated preposition with root ‘da’, plural                dagli                 dalle           dallə
 <PREPinP>    articulated preposition with root ‘in’, plural                negli                 nelle           nellə
 <PREPsuS>    articulated preposition with root ‘su’, singular              sul, sullo, sull’     sulla, sull’    sullə
 <PREPsuP>    articulated preposition with root ‘su’, plural                sugli                 sulle           sullə
 <DADJquelS>  demonstrative adjective (far), singular                       quel, quello, quell’  quella, quell’  quellə
 <DADJquelP>  demonstrative adjective (far), plural                         quegli                quelle          quellə
 <DADJquestS> demonstrative adjective (near), singular                      questo, quest’        questa, quest’  questə
 <DADJquestP> demonstrative adjective (near), plural                        questi                queste          questə
 <POSS1S>     possessive adjective, 1st person singular, singular           mio                   mia             miə
 <POSS1P>     possessive adjective, 1st person singular, plural             miei                  mie             miə
 <POSS2S>     possessive adjective, 2nd person singular, singular           tuo                   tua             tuə
 <POSS2P>     possessive adjective, 2nd person singular, plural             tuoi                  tue             tuə
 <POSS3S>     possessive adjective, 3rd person singular, singular           suo                   sua             suə
 <POSS3P>     possessive adjective, 3rd person singular, plural             suoi                  sue             suə
 <POSS4S>     possessive adjective, 1st person plural, singular             nostro                nostra          nostrə
 <POSS4P>     possessive adjective, 1st person plural, plural               nostri                nostre          nostrə
 <PRONDOBJS>  direct object pronoun, singular                               lo                    la              lə
 <PRONDOBJP>  direct object pronoun, plural                                 li                    le              lə

Table 6
The full tagset used in Neo-GATE, mapped to the Italian gendered forms and the schwa-simple neomorpheme paradigm.
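
The mapping in Table 6 can be applied mechanically to a tagged reference, as in the minimal sketch below. It makes several assumptions: placeholders appear inline as `<TAG>`, each tag takes the first form listed in the last column (the alternates ending in "tor" plus the neomorpheme for <ENDS> and <ENDP> would require the ANNOTATION field), and the neomorpheme symbol is a parameter defaulting to the schwa character. The tagged sentence is a hypothetical illustration modelled on the exemplar of prompt E, not an actual Neo-GATE entry, and the names `STEMS` and `fill_placeholders` are not part of any released tool.

```python
import re

# Stem of the schwa-simple form for each tag in Table 6; the full form is
# the stem followed by the neomorpheme symbol.
STEMS = {
    "ENDS": "", "ENDP": "",
    "DARTS": "l", "DARTP": "l",
    "IART": "un", "PARTP": "de",
    "PREPdiS": "dell", "PREPdiP": "dell",
    "PREPaS": "all", "PREPaP": "all",
    "PREPdaS": "dall", "PREPdaP": "dall",
    "PREPinP": "nell",
    "PREPsuS": "sull", "PREPsuP": "sull",
    "DADJquelS": "quell", "DADJquelP": "quell",
    "DADJquestS": "quest", "DADJquestP": "quest",
    "POSS1S": "mi", "POSS1P": "mi",
    "POSS2S": "tu", "POSS2P": "tu",
    "POSS3S": "su", "POSS3P": "su",
    "POSS4S": "nostr", "POSS4P": "nostr",
    "PRONDOBJS": "l", "PRONDOBJP": "l",
}

PLACEHOLDER = re.compile(r"<([A-Za-z0-9]+)>")


def fill_placeholders(tagged, symbol="ə"):
    """Replace every <TAG> placeholder with its default schwa-simple form."""
    def substitute(match):
        form = STEMS[match.group(1)] + symbol  # KeyError flags a tag outside Table 6
        if match.start() == 0:  # sentence-initial placeholder: capitalise
            form = form[:1].upper() + form[1:]
        return form
    return PLACEHOLDER.sub(substitute, tagged)


# Hypothetical REF-TAGGED-style string.
tagged = "<DARTS> partner di <IART> <POSS1S> amic<ENDS> ci ha invitat<ENDP> a cena."
print(fill_placeholders(tagged))
# Lə partner di unə miə amicə ci ha invitatə a cena.
```

Keeping the symbol as a parameter makes it straightforward to instantiate the same tagged reference under a different neomorpheme paradigm, provided that paradigm also uses a single suffix per tag.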

</pre>