<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Gender Bias in Italian Word Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Biasion</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Fabris</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianmaria Silvello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gian Antonio Susto</string-name>
          <email>sustogiag@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Università degli Studi di Padova</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work we study gender bias in Italian word embeddings (WEs), evaluating whether they encode gender stereotypes studied in social psychology or present in the labor market. We find strong associations with gender in job-related WEs. Weaker gender stereotypes are present in other domains where grammatical gender plays a significant role.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In the literature, the study of gender bias in word
embeddings (WEs) is of interest for two main
reasons: (i) WEs, as components of automatic
decision systems (e.g. job search tools), may
contribute to harming some user groups
        <xref ref-type="bibr" rid="ref7">(De-Arteaga et
al., 2019)</xref>
        ; (ii) WEs can be employed as a tool to
measure the biases of text corpora
        <xref ref-type="bibr" rid="ref11">(Garg et al.,
2018)</xref>
        and systems for automatic text
classification or information retrieval
        <xref ref-type="bibr" rid="ref10">(Fabris et al., 2020)</xref>
        .
In both applications, it is important to isolate the
gender-related information in a subspace
        <xref ref-type="bibr" rid="ref4">(Bolukbasi et al., 2016)</xref>
        and subsequently (i) eliminate it
via orthogonal projection or (ii) exploit it as a lens
to study the association of concepts with gender.
      </p>
      <p>
        A common taxonomy of bias in algorithms
concentrates on the types of harm that they may cause
        <xref ref-type="bibr" rid="ref1">(Barocas et al., 2017)</xref>
        . Allocational harms
happen when a limited resource (e.g. jobs) is assigned
unfairly to subgroups of a population (e.g. women
and men). Representational harms arise when
groups or individuals are unable to determine their
image, which is presented unfavourably or
neglected. Autocomplete suggestions in search
engines
        <xref ref-type="bibr" rid="ref18 ref19">(Noble, 2018; Olteanu et al., 2020)</xref>
        are a
clear example of this situation. Query
completion suggestions for “why are italian ...”
associate diverse concepts with the country and its
inhabitants. Italians contribute very little to these results,
as they are unlikely to search for information about
themselves in English.</p>
      <p>
        Italian WEs have been developed
        <xref ref-type="bibr" rid="ref2 ref3">(Berardi et
al., 2015; Bojanowski et al., 2017)</xref>
        and analyzed
        <xref ref-type="bibr" rid="ref1 ref21 ref3 ref5">(Tripodi and Li Pira, 2017)</xref>
        , following seminal
work in English; analysis of gender bias has
unfortunately lagged behind. Our main contribution
is to close this gap by undertaking a systematic
study of gender stereotypes in Italian WEs,
adapting established approaches that assess gender bias
in English WEs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Related work</title>
      <p>
        Gender stereotypes are representational harms
which influence the lives of women and men both
descriptively and prescriptively, shaping the
qualities, priorities and needs that members of each
gender are expected to possess
        <xref ref-type="bibr" rid="ref8">(Ellemers, 2018)</xref>
        .
In seminal work, Bolukbasi et al. (2016) uncover
problematic associations with gender in English
WEs. Their approach to identify gender
information is adapted to Italian in Section 3.1.1. Caliskan
et al. (2017) study the stereotypical association of
gender with dichotomies such as career and
family, science and arts, following the Implicit
Association Test (IAT - Greenwald et al. (1998)). We
recall their approach in Section 3.1.2. WEs of
jobs have also been analyzed extensively due to
their potential for allocational harms in resume
search engines
        <xref ref-type="bibr" rid="ref20 ref7">(De-Arteaga et al., 2019; Prost et
al., 2019)</xref>
        and representational harms
        <xref ref-type="bibr" rid="ref5">(Caliskan et
al., 2017)</xref>
        , e.g. in general purpose search engines
        <xref ref-type="bibr" rid="ref16">(Kay et al., 2015)</xref>
        . Grammatical gender has been
found to interact strongly with semantic gender
in Spanish and German
        <xref ref-type="bibr" rid="ref10 ref17 ref19">(McCurdy and Serbetci,
2020)</xref>
        , showing that the study of bias in gendered
languages poses an additional challenge. We adapt
these experiments to the Italian language, detailing
our approach in Sections 3-5.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Gender in Italian WEs</title>
      <sec id="sec-3-1">
        <title>3.1 Identifying gender information</title>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1 Gender score</title>
        <p>To identify a vectorial subspace which encodes
information about gender, we follow Bolukbasi et
al. (2016) by building a list of gender definitional
pairs: [lui (he), lei (she)], [uomo (man), donna
(woman)], [padre (father), madre (mother)],
[marito (husband), moglie (wife)], [fratello (brother),
sorella (sister)], [maschio (male), femmina
(female)].</p>
        <p>These pairs are built so that the second word
denotes a female entity and the first word is,
semantically, its male counterpart. Moreover, given that we are
interested in capturing semantic information about
gender, while avoiding entanglement with
grammatical gender, we ensure that the words in a pair
do not derive from the same root via inflection.
An example of a pair discarded due to this criterion
is [figlio (son), figlia (daughter)].</p>
        <p>Principal Component Analysis. We perform a
Principal Component Analysis (PCA) on the six
vector differences resulting from each gender
definitional pair. The first eigenvalue dominates the
remaining ones, with the first PC explaining 57%
of the variance. We normalize the first PC and
consider it the main gender direction, denoted by
$g_{\mathrm{PCA}}$.</p>
        <p>
          This is an established procedure to isolate the
direction that captures most of the information
about gender
          <xref ref-type="bibr" rid="ref4 ref9">(Bolukbasi et al., 2016; Ethayarajh
et al., 2019)</xref>
          . In other words, by finding the
direction that best fits the six vector differences
($\vec{lui} - \vec{lei}$, $\vec{uomo} - \vec{donna}$, $\dots$), we aim to obtain
a direction that summarizes them.
        </p>
        <p>Vector differences. To evaluate the robustness of
this approach and highlight potential anomalies,
we also consider each vector difference on its own,
defining six unit-length gender directions $g_{\mathrm{diff}_i}$:
$$g_{\mathrm{diff}_0} = \vec{lui} - \vec{lei}, \qquad
g_{\mathrm{diff}_1} = \vec{uomo} - \vec{donna}, \qquad
g_{\mathrm{diff}_2} = \vec{padre} - \vec{madre},$$
$$g_{\mathrm{diff}_3} = \vec{marito} - \vec{moglie}, \qquad
g_{\mathrm{diff}_4} = \vec{fratello} - \vec{sorella}, \qquad
g_{\mathrm{diff}_5} = \vec{maschio} - \vec{femmina},$$
each normalized to unit length.</p>
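        <p>As a concrete illustration, the following minimal numpy sketch builds both kinds of gender direction; the lookup emb (word to unit-length vector) is a hypothetical variable, e.g. populated from the FastText vectors described in Section 4.</p>
        <preformat>
import numpy as np

PAIRS = [("lui", "lei"), ("uomo", "donna"), ("padre", "madre"),
         ("marito", "moglie"), ("fratello", "sorella"),
         ("maschio", "femmina")]

def gender_directions(emb):
    # One difference vector per definitional pair: male term minus female term.
    diffs = np.stack([emb[m] - emb[f] for m, f in PAIRS])
    # g_diff_i: each difference normalized to unit length.
    g_diff = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    # g_PCA: first principal component of the centered differences,
    # normalized and taken as the main gender direction.
    centered = diffs - diffs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    g_pca = vt[0] / np.linalg.norm(vt[0])
    return g_pca, g_diff
        </preformat>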
        <p>Gender score computation. Given a word $w$, let
us indicate with $\vec{w}$ its corresponding word vector.
Let us consider any of the gender directions $g$
defined above. We call gender score the normalized
projection of $\vec{w}$ onto the direction $g$, defined as
$$s_g(w) = \frac{\vec{w} \cdot g}{\|\vec{w}\|\,\|g\|}. \qquad (1)$$
This scalar captures associations of $w$ along
gendered lines. Informally, a highly positive value
means that $w$ is closer to the male terms of the
pairs than to the female ones, while a strongly
negative value entails the opposite.</p>
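        <p>A one-function sketch of Equation 1 (with our unit-length pre-processing this is simply the cosine between the two vectors):</p>
        <preformat>
import numpy as np

def gender_score(w_vec, g):
    # Normalized projection of word vector w_vec onto direction g (Eq. 1):
    # positive values lean towards the male terms of the pairs,
    # negative values towards the female ones.
    return float(w_vec @ g / (np.linalg.norm(w_vec) * np.linalg.norm(g)))
        </preformat>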
      </sec>
      <sec id="sec-3-3">
        <title>3.1.2 WEAT</title>
        <p>The Implicit Association Test (IAT - Greenwald
et al. (1998)) is an assessment developed in
cognitive psychology to measure subconscious
associations between categories and concepts. It
is commonly employed to assess implicit
stereotypes in people. The Word Embedding
Association Test (WEAT - Caliskan et al. (2017)) is a
technique inspired by the IAT to measure
associations between concepts in WEs. Let $X$ and
$Y$ be two equal-sized sets of target words and
$A$ and $B$ two sets of attribute words, e.g., $X =
\{programmer, engineer\}$, $Y = \{nurse, teacher\}$,
$A = \{man, male\}$, $B = \{woman, female\}$. Let
$\cos(\vec{a}, \vec{b})$ be the cosine similarity between the
word vectors $\vec{a}$ and $\vec{b}$. The differential association
of a word $w$ (taken from $X$ or $Y$) with the attribute
sets $A$ and $B$ is measured as
$$c(w, A, B) = \mathrm{mean}_{a \in A} \cos(\vec{w}, \vec{a}) - \mathrm{mean}_{b \in B} \cos(\vec{w}, \vec{b}). \qquad (2)$$
The normalized differential association between
targets and attributes is defined as
$$d = \frac{\mathrm{mean}_{x \in X}\, c(x, A, B) - \mathrm{mean}_{y \in Y}\, c(y, A, B)}{\mathrm{std\text{-}dev}_{w \in X \cup Y}\, c(w, A, B)}. \qquad (3)$$</p>
        <p>This quantity is called effect size in statistics, and
summarizes how different the quantity $c(w, A, B)$ is
when evaluated on elements of target set $X$ as
opposed to target set $Y$. It is computed as a
difference of means within each set, divided by the overall
standard deviation.</p>
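        <p>Both statistics translate directly into code. A minimal sketch, where emb is again a hypothetical word-to-vector lookup and the sample standard deviation is used:</p>
        <preformat>
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def diff_assoc(w, A, B, emb):
    # c(w, A, B) of Equation 2: mean similarity to A minus mean similarity to B.
    return (np.mean([cosine(emb[w], emb[a]) for a in A])
            - np.mean([cosine(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    # d of Equation 3: difference of mean differential associations between
    # the target sets, divided by the standard deviation over all targets.
    cx = [diff_assoc(x, A, B, emb) for x in X]
    cy = [diff_assoc(y, A, B, emb) for y in Y]
    return (np.mean(cx) - np.mean(cy)) / np.std(cx + cy, ddof=1)
        </preformat>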
        <p>Gender score and WEAT. It is worth noting that,
when $|A| = |B| = 1$, WEAT is almost equivalent
to the gender score defined in Section 3.1.1. Let
$A = \{a_0\}$ and $B = \{b_0\}$ be the sets of attribute
words. Since we are using normalized vectors and
the distributive property holds for the dot product,
then
$$c(w, A, B) = \cos(\vec{w}, \vec{a}_0) - \cos(\vec{w}, \vec{b}_0) = \vec{w} \cdot (\vec{a}_0 - \vec{b}_0) = \vec{w} \cdot g = s_g(w). \qquad (4)$$</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.2 Mitigating the effect of grammatical gender</title>
        <p>
          Italian is a gendered language, wherein
grammatical gender is assigned to all nouns. Within a
sentence, each word is surrounded by other words of
agreeing grammatical gender. This phenomenon,
called grammatical gender agreement, in
conjunction with the distributional hypothesis
          <xref ref-type="bibr" rid="ref15">(Harris,
1954)</xref>
          , plays an important role when training WEs.
Due to these properties, words that share the same
grammatical gender tend to have similar vector
representations. Accordingly, grammatical and
semantic gender become entangled in WEs
          <xref ref-type="bibr" rid="ref12 ref17">(McCurdy
and Serbetci, 2020; Gonen et al., 2019)</xref>
          . As a
consequence, when computing the gender score, we
tend to obtain positive values for (grammatically)
masculine terms and negative scores for feminine
ones, making stereotypical associations noisier
and harder to study.
        </p>
        <p>Mean gender score. To compute the gender score
(Equation 1) for gendered words that have both a
feminine and a masculine version, we propose the
following approach. Let us indicate with $w_f$ and
$w_m$ the feminine and masculine versions of a
gendered word $w$. We define their gender score as
$$s_g^{\mathrm{mean}}(w) = \frac{s_g(w_f) + s_g(w_m)}{2}. \qquad (5)$$</p>
        <p>Averaging the masculine and feminine version
with equal weights corresponds to giving both
versions of the word the same importance. Different
approaches, based for instance on word frequency,
may be applicable in other contexts.</p>
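        <p>A short sketch of Equation 5, assuming the two inflections are available as vectors:</p>
        <preformat>
import numpy as np

def mean_gender_score(w_f, w_m, g):
    # Equation 5: equally weighted average of the gender scores (Eq. 1)
    # of the feminine (w_f) and masculine (w_m) inflections.
    s_f = float(w_f @ g / (np.linalg.norm(w_f) * np.linalg.norm(g)))
    s_m = float(w_m @ g / (np.linalg.norm(w_m) * np.linalg.norm(g)))
    return (s_f + s_m) / 2
        </preformat>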
        <p>Orthogonal projection. Some nouns cannot be
inflected into the opposite grammatical gender,
making the above approach impractical. An
example is ufficio (office). In this context, we propose
to mitigate the effect of grammatical gender by
re-embedding every word through an orthogonal
projection. We build a list of 138 inflected word pairs.
Each pair consists of the feminine and masculine
inflections of the same root, such as cara and caro
(dear), which only differ in grammatical gender.
We take the embedding of both words in a pair
and compute their difference.</p>
        <p>We perform PCA on these vector differences.
The resulting PCs span a subspace $U$ that
contains most of the variance due to grammatical
gender. To reduce the influence of grammatical
gender, we re-embed vectors by projecting them onto
the orthogonal complement of $U$. In other words,
given a word embedding $\vec{w}$, let us call $\mathrm{proj}_U \vec{w}$
its orthogonal projection onto the “grammatical
gender subspace” $U$. We propose re-embedding
every word vector $\vec{w}$ to
$$\vec{w}' = \frac{\vec{w} - \mathrm{proj}_U \vec{w}}{\|\vec{w} - \mathrm{proj}_U \vec{w}\|}. \qquad (6)$$
By means of this procedure, we obtain a new set
of WEs. By construction, in this new embedding
space, grammatical gender should have a lower
influence on the geometry of word vectors.</p>
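        <p>A numpy sketch of the whole procedure; INFLECTED_PAIRS stands for our list of 138 feminine/masculine inflection pairs (only two are shown here) and emb is the usual hypothetical lookup.</p>
        <preformat>
import numpy as np

INFLECTED_PAIRS = [("cara", "caro"), ("sincera", "sincero")]  # ...138 pairs in total

def grammatical_gender_basis(emb, n_components=1):
    # PCA over the inflection differences; the leading PCs span the
    # "grammatical gender subspace" U (Section 5.4 uses the first PC only).
    diffs = np.stack([emb[f] - emb[m] for f, m in INFLECTED_PAIRS])
    centered = diffs - diffs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]  # rows form an orthonormal basis of U

def re_embed(w, U_basis):
    # Equation 6: subtract the projection onto U, then re-normalize.
    proj = U_basis.T @ (U_basis @ w)
    residual = w - proj
    return residual / np.linalg.norm(residual)
        </preformat>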
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Datasets and embeddings</title>
      <p>
        To study gender bias we use WEs trained on two
different datasets for the Italian language, both
made available by FastText
        <xref ref-type="bibr" rid="ref13 ref3">(Bojanowski et al.,
2017; Grave et al., 2018)</xref>
        . The first group of vector
representations, which we refer to as wiki,
consists of word vectors trained on a 2016 Wikipedia
dump
        <xref ref-type="bibr" rid="ref3">(Bojanowski et al., 2017)</xref>
        . The authors do not specify which Wikipedia dump they used. The second
group of word vectors (labeled wiki-cc) was
trained on the May 2017 Common Crawl, a corpus of web pages
aimed at representing “a copy of the internet” at a given time
(restricted to pages identified as Italian), and the
Wikipedia dump from September 11, 2017
        <xref ref-type="bibr" rid="ref13">(Grave
et al., 2018)</xref>
        .
      </p>
      <p>We compare our results from the analyses on
Italian WEs with results on their English
counterpart. To this end, we also download two sets of
FastText WEs trained on the English version of
the same corpora, i.e. the English counterparts of
wiki and wiki-cc. Given that Wikipedia is a more
curated source, we expect to find weaker
stereotypes in wiki than in wiki-cc for both
languages. As a pre-processing step we normalize
every word vector to unit length.</p>
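      <p>For reproducibility, one possible way to load the published vectors and apply this normalization is sketched below, assuming gensim 4 and the wiki.it.vec file distributed by FastText (the wiki-cc vectors are distributed as cc.it.300.vec).</p>
      <preformat>
import numpy as np
from gensim.models import KeyedVectors

# Load the Italian wiki vectors (text format) and normalize each
# vector to unit length, as described above.
vectors = KeyedVectors.load_word2vec_format("wiki.it.vec", binary=False)
emb = {w: vectors[w] / np.linalg.norm(vectors[w]) for w in vectors.key_to_index}
      </preformat>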
      <p>
        Census data about the labor market is required
to analyse the correlation between the gender gap
in professions and the gender score of the
respective WEs. Statistics on American
occupations and gender representation are readily
available
        <xref ref-type="bibr" rid="ref6">(Census Bureau, 2019)</xref>
        . For their Italian
counterpart, we retrieve statistics about occupation
participation from several institutions, including
professional chambers (Comitato Unitario
Permanente degli Ordini e Collegi Professionali,
Confprofessioni) and academic databases
(AlmaLaurea); the detailed list of sources is available upon request.
      </p>
      <p>
        Finally, in order to perform the Word
Embedding Association Tests (WEAT), we need sets of
target and attribute words in Italian. The sets of
target words for the gender-science WEAT
(Section 5.2) are derived from the Italian version of the
IAT (https://implicit.harvard.edu/implicit/italy/takeatest.html);
those for the gender-career WEAT (Section
5.3) were unavailable and have been translated by
the authors of this work from the original IAT
        <xref ref-type="bibr" rid="ref14">(Greenwald et al., 1998)</xref>
        .
      </p>
    </sec>
    <sec id="sec-5">
      <title>5 Experiments</title>
      <sec id="sec-5-1">
        <title>5.1 Occupations</title>
        <p>
          This experiment investigates gender
representation for different jobs in Italy and their association
with gender-related information in WEs,
following studies on the English language
          <xref ref-type="bibr" rid="ref11 ref20 ref7">(De-Arteaga et
al., 2019; Garg et al., 2018; Prost et al., 2019)</xref>
          . For
each occupation, we compute its gender score
using the different gender directions defined in
Section 3.1.1, namely $g_{\mathrm{PCA}}$ and $g_{\mathrm{diff}_i}$, $i \in \{0, \dots, 5\}$.
We calculate the plain gender score for the
ungendered occupations (Equation 1) and the mean
gender score for occupations characterized by
grammatical gender (Equation 5).
        </p>
        <p>We compute Pearson’s correlation r between
the gender scores and the percentage of women
employed in each profession. The same analyses
are carried out on English WEs, restricting them to
the same set of occupations considered in Italian.
Results are summarized in Table 1 and Figure 1,
showing that Italian WEs consistently capture
information about different gender representation in
jobs. Informally, this means that ordering jobs by
percentage of women and by projection on a
gender direction yields similar results. The right pane
of Figure 1 demonstrates the significant effect of
grammatical gender.</p>
        <p>[Figure 1: gender scores of job-related WEs plotted against the percentage of women in each profession (x-axis: % female, 0-100); left pane: ungendered occupations; right pane: occupations with a feminine and a masculine version (mean gendered), connected by vertical lines, with a regression line summarizing the trend.]</p>
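        <p>A minimal sketch of this correlation computation follows; the numbers are made up purely to illustrate the call, the real inputs being the per-occupation gender scores (Equation 1 or 5) and the census percentages described in Section 4.</p>
        <preformat>
from scipy.stats import pearsonr

scores = [0.08, 0.05, -0.02, -0.06]   # hypothetical gender scores per job
pct_women = [20.0, 35.0, 60.0, 85.0]  # hypothetical % of women per job
r, p_value = pearsonr(scores, pct_women)
print(f"Pearson r = {r:.2f} (p = {p_value:.3g})")
        </preformat>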
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Science and Arts</title>
        <p>In this WEAT, the sets of target words for Science
and Arts, taken from the Italian version of the IAT,
are: X = {biologia (biology), fisica (physics),
chimica (chemistry), matematica (mathematics),
geologia (geology), astronomia (astronomy),
ingegneria (engineering)}, Y = {filosofia
(philosophy), umanesimo (humanism), arte (arts),
letteratura (literature), italiano (Italian), musica
(music), storia (history)}. The sets of male and female
attributes are taken from the gender definitional
pairs (Section 3.1.1): A = {lui, uomo, padre,
marito, fratello, maschio}, B = {lei, donna, madre,
moglie, sorella, femmina}. We compute the effect
size d and the p-value using the whole attribute
sets A and B, and label this analysis “all”.
Moreover, we also perform the WEAT test over single
word pairs, e.g. A = {lui}, B = {lei}. Results
are reported in Table 2. We find no stereotypical
association in the expected direction. We
hypothesize that this is due to the feminine grammatical
gender of all science-related target words,
deferring a more detailed analysis to Section 5.4.</p>
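        <p>For illustration, this WEAT then amounts to a call like the following, reusing the weat_effect_size sketch from Section 3.1.2 and the (hypothetical) normalized lookup emb from Section 4.</p>
        <preformat>
# Gender-science WEAT with the target and attribute sets listed above.
X = ["biologia", "fisica", "chimica", "matematica",
     "geologia", "astronomia", "ingegneria"]
Y = ["filosofia", "umanesimo", "arte", "letteratura",
     "italiano", "musica", "storia"]
A = ["lui", "uomo", "padre", "marito", "fratello", "maschio"]
B = ["lei", "donna", "madre", "moglie", "sorella", "femmina"]
d = weat_effect_size(X, Y, A, B, emb)  # effect size of Equation 3
        </preformat>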
      </sec>
      <sec id="sec-5-3">
        <title>5.3 Career and Family</title>
        <p>In essence, the Career and Family WEAT is very
similar to the Science and Arts WEAT; the only
difference is in the sets of target words. The
target sets are translated into Italian from the
original English IAT as follows: X = {esecutivo
(executive), management (management),
professionale (professional), azienda (corporation),
stipendio (salary), ufficio (office)}, Y = {casa (home),
genitori (parents), bambini (children), famiglia
(family), cugini (cousins), matrimonio (marriage),
nozze (wedding), parenti (relatives)}. Results are
summarized in Table 3. Stereotypical associations
for wiki-cc WEs are present but weak, whereas
they are more significant for wiki.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.4 Mitigating the effect of grammatical gender</title>
        <p>In this section we quantify the extent to which the
semantic gender information (Section 3.1) is
influenced by grammatical gender, and test one
approach designed to mitigate its influence (Section
3.2).</p>
        <p>The dataset used in the experiment about
job-related WEs (Section 5.1) is suitable for this
analysis, as it consists of words which have (i) a
semantic association with gender, as measured
objectively by the percentage of women in each
profession, and (ii) a grammatical association with
gender, as half of those words admit a feminine
and a masculine version.</p>
        <p>We measure the relative strength of semantic
and grammatical associations in the proposed
gender directions as follows. Let us denote by $S_g$
the set of job-related words which admit a
feminine and a masculine version and by $\Delta_w =
s_g(w_m) - s_g(w_f)$ the difference in their gender
scores (for the sake of brevity, we concentrate on
$g_{\mathrm{PCA}}$; the remaining gender directions
$g_{\mathrm{diff}_i}$ yield similar results). We compute
the average influence of grammatical gender on
direction $g$ (based on set $S_g$) as
$$\gamma_g = \frac{1}{|S_g|} \sum_{w \in S_g} \Delta_w. \qquad (7)$$
Visually, this corresponds to the average (signed)
length of the vertical lines, in the right pane of
Figure 1, connecting the feminine and masculine
version of a job-related word.</p>
        <p>Moreover, let us denote by $S$ the complete set
of job-related words $w_j$ and by $x_j$ the percentage
of women in job $w_j$. Let us indicate with $s_g(w_j)$
the respective gender score, computed according
to Equation 1 or 5, depending on whether $w_j$
admits different masculine and feminine inflections.
We define $\max_S(x)$ ($\min_S(x)$) as the maximum
(minimum) percentage of women in a job from set
$S$. Furthermore, let us call $m$ the angular
coefficient computed by (linearly) regressing $s_g(w_j)$
onto $x_j$ over set $S$. We compute the full-scale
influence of semantic gender on direction $g$ (based
on set $S$) as
$$\sigma_g = |m\,(\max_S(x) - \min_S(x))|. \qquad (8)$$
Visually, this corresponds to the vertical
component of the blue regression line in Figure 1, clipped
between $\min_S(x)$ and $\max_S(x)$.</p>
        <p>Finally, we compute the relative strength of
grammatical and semantic associations in the
proposed gender direction as the ratio
$$k = \frac{\gamma_g}{\sigma_g}. \qquad (9)$$</p>
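        <p>A numpy sketch of Equations 7-9 follows; deltas holds $s_g(w_m) - s_g(w_f)$ for every word in $S_g$, while scores and pct_women hold the gender scores and the percentages of women over the full set $S$ (all hypothetical variable names).</p>
        <preformat>
import numpy as np

def relative_strength(deltas, scores, pct_women):
    gamma = np.mean(deltas)                             # Eq. 7: grammatical influence
    m, _ = np.polyfit(pct_women, scores, deg=1)         # slope of the linear regression
    sigma = abs(m * (max(pct_women) - min(pct_women)))  # Eq. 8: semantic influence
    return gamma / sigma                                # Eq. 9: the ratio k
        </preformat>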
        <p>The first three rows of Table 4 report $\gamma_g$, $\sigma_g$
and $k$ for wiki-cc (first column) on the job
dataset described in Section 4. The second
column concentrates on a set of word embeddings
derived from wiki-cc by removing information
about grammatical gender from every word via
Equation 6 (in this experiment, the grammatical gender
subspace $U$ is spanned by the first PC). We label this
new set of word embeddings wiki-cc⊥. In going from wiki-cc
to wiki-cc⊥, $\gamma_g$ is reduced by over 40% while
$\sigma_g$ decreases by less than 10%. This indicates
that the orthogonal projection procedure reduces
the influence of grammatical gender while
retaining the semantic information present in the
original version of the WEs; hence the value of $k$
decreases.</p>
        <p>
          The final three rows of Table 4 report summary
statistics for the stereotypical associations described
in Sections 5.1-5.3. Interestingly, the significance
of each association is larger for wiki-cc⊥ than
for wiki-cc. In particular, the effect size for the
Science-Arts WEAT becomes positive, in
accordance with the stereotype. We interpret these
results as evidence for the hypothesis that
grammatical gender confounds and outweighs
stereotypical associations in Italian WEs, in line with prior
work on gendered languages
          <xref ref-type="bibr" rid="ref17">(McCurdy and
Serbetci, 2020)</xref>
          .
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Discussion</title>
      <p>We successfully replicated prior analyses of
gender-stereotypical associations in English WEs,
finding them to be consistently stronger when
computed on WEs trained on a weakly curated
corpus. To the best of our knowledge, this is a
novel result.</p>
      <p>
        For Italian WEs, the picture is more nuanced
and tied to grammatical gender. WEs for
occupations, which are ungendered or admit a dual
form, are robustly associated with gender along
a stereotypical direction. Compared against the
other stereotypes analysed in this work, this is
the strongest association, confirming results from
prior work on English WEs
        <xref ref-type="bibr" rid="ref10">(Fabris et al., 2020)</xref>
        . In
the Science-Arts WEAT, science-related words are
all feminine nouns, departing from the expected
stereotypical association. Semantic associations
with gender are outweighed by grammatical
gender in this WEAT, in accordance with prior work
on gendered languages
        <xref ref-type="bibr" rid="ref17">(McCurdy and Serbetci,
2020)</xref>
        . Our analysis in Section 5.4 demonstrates
the importance of grammatical gender in Italian.
On the other hand, the Career-Family WEAT
features a more balanced distribution of
grammatical gender, resulting in a differential association
which is in line with gender stereotypes, especially
for wiki, less so for wiki-cc.
      </p>
      <p>In Italian WEs, we find that wiki
embeddings contain stronger stereotypical associations
than wiki-cc embeddings for the Career-Family
WEAT. This disconfirms our hypothesis that WEs
trained on a less curated corpus (wiki-cc)
would encode stereotypes more strongly. Finally,
we find no consistent property connected to
specific gender directions $g_{\mathrm{diff}_i}$. Across different
corpora and stereotypes, the aggregated analyses
(labelled “all” and $g_{\mathrm{PCA}}$) provide a reasonable
summary of the stereotypical associations encoded in
the single gender directions $g_{\mathrm{diff}_i}$.</p>
    </sec>
    <sec id="sec-7">
      <title>7 Conclusion</title>
      <p>
        Overall, we have analyzed gender bias in Italian
WEs, adapting existing techniques and gathering
data where required. We looked for stereotypical
associations with gender-imbalanced professions,
Career and Family, Science and Arts, finding
significant associations in 2 out of 3. As expected
from prior work
        <xref ref-type="bibr" rid="ref12 ref17">(Gonen et al., 2019; McCurdy and
Serbetci, 2020)</xref>
        , grammatical gender is a strong
confounder in these analyses.
      </p>
      <p>We draw the following preliminary
conclusions: (i) Italian WEs seem to have less
potential than their English counterparts to
systematically reinforce the tested gender
stereotypes, mostly due to grammatical gender.
However, (ii) the influence of grammatical gender on
WEs may cause different harms. As an
example, in the context of job search, masculine is
likely to be the default choice for queries of
recruiters (male as norm - e.g. “psicologo”
[psychologist]). Those queries would likely be closer
to male candidates’ CVs than equivalent female
ones, in some embedded text representations,
potentially putting women at a systematic
disadvantage. Both points above require further
analysis of text retrieval/classification systems based
on Italian WEs. Finally, (iii) isolating
stereotypical concepts and gendered associations in
Italian WEs along a single direction is challenging.
The tested WEs show little promise as a reliable
measurement tool for gender-stereotypical
associations, unless combined with approaches to
mitigate the influence of grammatical gender.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Solon</given-names>
            <surname>Barocas</surname>
          </string-name>
          , Kate Crawford, Aaron Shapiro, and
          <string-name>
            <given-names>Hanna</given-names>
            <surname>Wallach</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The problem with bias: Allocative versus representational harms in machine learning</article-title>
          .
          <source>In 9th Annual Conference of the Special Interest Group for Computing, Information and Society.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Giacomo</given-names>
            <surname>Berardi</surname>
          </string-name>
          , Andrea Esuli, and Diego Marcheggiani.
          <year>2015</year>
          .
          <article-title>Word embeddings go to italy: A comparison of models and training datasets</article-title>
          .
          <source>In IIR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>5</volume>
          :
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Tolga</given-names>
            <surname>Bolukbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kai-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          , James Y. Zou, Venkatesh Saligrama, and
          <string-name>
            <given-names>Adam T.</given-names>
            <surname>Kalai</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Man is to computer programmer as woman is to homemaker? Debiasing word embeddings</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>4349</fpage>
          -
          <lpage>4357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Aylin</given-names>
            <surname>Caliskan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Joanna J.</given-names>
            <surname>Bryson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Arvind</given-names>
            <surname>Narayanan</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Semantics derived automatically from language corpora contain human-like biases</article-title>
          .
          <source>Science</source>
          ,
          <volume>356</volume>
          (
          <issue>6334</issue>
          ):
          <fpage>183</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Census</given-names>
            <surname>Bureau</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Current population survey</article-title>
          .
          <source>Accessed 2020-02-12.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Maria</given-names>
            <surname>De-Arteaga</surname>
          </string-name>
          , Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, and Adam Tauman Kalai.
          <year>2019</year>
          .
          <article-title>Bias in bios: A case study of semantic representation bias in a high-stakes setting</article-title>
          .
          <source>Proceedings of the Conference on Fairness, Accountability, and Transparency - FAT* '19</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Naomi</given-names>
            <surname>Ellemers</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Gender stereotypes</article-title>
          .
          <source>Annual Review of Psychology</source>
          ,
          <volume>69</volume>
          (
          <issue>1</issue>
          ):
          <fpage>275</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Kawin</given-names>
            <surname>Ethayarajh</surname>
          </string-name>
          , David Duvenaud, and
          <string-name>
            <given-names>Graeme</given-names>
            <surname>Hirst</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Understanding undesirable word embedding associations</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>1696</fpage>
          -
          <lpage>1705</lpage>
          , Florence, Italy, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Fabris</surname>
          </string-name>
          , Alberto Purpura, Gianmaria Silvello, and Gian Antonio Susto.
          <year>2020</year>
          .
          <article-title>Gender stereotype reinforcement: Measuring the gender bias conveyed by ranking algorithms</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>57</volume>
          (
          <issue>6</issue>
          ):
          <fpage>102377</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Nikhil</given-names>
            <surname>Garg</surname>
          </string-name>
          , Londa Schiebinger, Dan Jurafsky, and
          <string-name>
            <given-names>James</given-names>
            <surname>Zou</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Word embeddings quantify 100 years of gender and ethnic stereotypes</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>115</volume>
          (
          <issue>16</issue>
          ):
          <fpage>E3635</fpage>
          -
          <lpage>E3644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Hila</given-names>
            <surname>Gonen</surname>
          </string-name>
          , Yova Kementchedjhieva, and
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>How does grammatical gender affect noun representations in gender-marking languages</article-title>
          ?
          <source>In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)</source>
          , pages
          <fpage>463</fpage>
          -
          <lpage>471</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Edouard</given-names>
            <surname>Grave</surname>
          </string-name>
          , Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Learning word vectors for 157 languages</article-title>
          .
          <source>In Proceedings of the International Conference on Language Resources and Evaluation (LREC</source>
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Anthony G.</given-names>
            <surname>Greenwald</surname>
          </string-name>
          , Debbie E.
          <string-name>
            <surname>McGhee</surname>
          </string-name>
          , and
          <string-name>
            <surname>Jordan L. K. Schwartz</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Measuring individual differences in implicit cognition: The implicit association test</article-title>
          .
          <source>Journal of Personality and Social Psychology</source>
          ,
          <volume>74</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1464</fpage>
          -
          <lpage>1480</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Zellig S.</given-names>
            <surname>Harris</surname>
          </string-name>
          .
          <year>1954</year>
          .
          <article-title>Distributional structure</article-title>
          .
          <source>Word</source>
          ,
          <volume>10</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>146</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Kay</surname>
          </string-name>
          , Cynthia Matuszek, and Sean A. Munson.
          <year>2015</year>
          .
          <article-title>Unequal representation and gender stereotypes in image search results for occupations</article-title>
          .
          <source>In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems</source>
          , pages
          <fpage>3819</fpage>
          -
          <lpage>3828</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Katherine</given-names>
            <surname>McCurdy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Oguz</given-names>
            <surname>Serbetci</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings</article-title>
          . arXiv preprint arXiv:2005.08864.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Safiya</given-names>
            <surname>Umoja Noble</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Algorithms of oppression: How search engines reinforce racism</article-title>
          . NYU Press.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Alexandra</given-names>
            <surname>Olteanu</surname>
          </string-name>
          , Fernando Diaz, and
          <string-name>
            <given-names>Gabriella</given-names>
            <surname>Kazai</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>When are search completion suggestions problematic?</article-title>
          .
          <source>In Computer Supported Cooperative Work and Social Computing (CSCW)</source>
          . ACM, August. Pre-print, paper accepted to CSCW '20.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Flavien</given-names>
            <surname>Prost</surname>
          </string-name>
          , Nithum Thain, and
          <string-name>
            <given-names>Tolga</given-names>
            <surname>Bolukbasi</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Debiasing embeddings for reduced gender bias in text classification</article-title>
          .
          <source>Proceedings of the First Workshop on Gender Bias in Natural Language Processing.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Rocco</given-names>
            <surname>Tripodi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stefano Li</given-names>
            <surname>Pira</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Analysis of italian word embeddings</article-title>
          .
          <source>In CLiC-it.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>