Cultural Differences in Bias? Origin and Gender Bias in Pre-Trained German and French Word Embeddings

Mascha Kurpicz-Briki
Bern University of Applied Sciences
Biel/Bienne, Switzerland
mascha.kurpicz@bfh.ch

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Smart applications often rely on training data in the form of text. If there is a bias in that training data, the decisions of the applications might not be fair. Common training data has been shown to be biased towards different groups of minorities. However, there is no generic algorithm to determine the fairness of training data. One existing approach is to measure gender bias using word embeddings. Most research in this field has been dedicated to the English language. In this work, we identified a bias towards gender and origin in both German and French word embeddings. In particular, we found that real-world bias and stereotypes from the 18th century are still included in today's word embeddings. Furthermore, we show that the gender bias in German has a different form from English, and there is indication that bias has cultural differences that need to be considered when analyzing texts and word embeddings in different languages.

1 Introduction

Bias is an important topic in machine learning applications, and in particular in natural language processing. For example, it can easily be shown in automatic translation. As shown in Figure 1, when translating "She is an engineer. He is a nurse." to Turkish and then back to English, we obtain "He's an engineer. She is a nurse.". Because Turkish makes no distinction between he and she, a guess about the gender has to be made when translating back to English. It is therefore highly relevant to identify and mitigate gender bias in natural language processing (Sun et al., 2019).

[Figure 1: Example of bias in Google Translate.]

Word embeddings are applied in several types of applications and enhance the development of machine learning and natural language processing. However, they also amplify existing social stereotypes in the human-generated training data.

Different approaches to identify and mitigate bias in word embeddings have been developed. A word embedding is a vectorial representation of a word (or phrase), trained on co-occurrences in a text corpus. Each word w is represented as a d-dimensional word vector $\vec{w} \in \mathbb{R}^d$ (Bolukbasi et al., 2016), where often d = 300 (Caliskan et al., 2017). In such a vector space, words with similar meaning have vectors that are close (i.e. they have a small vector distance). It has been confirmed that the vector distance can be used to represent the relationship between two words (Mikolov et al., 2013c). Using this method, problems like the following can be solved: man is to king as woman is to x. With simple arithmetic on vectors this problem can be solved by proposing x = queen (Bolukbasi et al., 2016), because

    $\vec{man} - \vec{woman} \approx \vec{king} - \vec{queen}$.

Even if the resultant is not perfectly equal to any vector in the vocabulary, the closest vector to it will often be the answer to the question (Hapke et al., 2019). This is useful for different types of applications; for example, word embeddings are an important source of evidence for document ranking (Nalisnick et al., 2016; Mitra et al., 2016).
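To make the vector arithmetic concrete, a minimal sketch follows. It assumes a dictionary `vectors` mapping words to their d-dimensional numpy arrays, loaded beforehand from any of the pre-trained models discussed below; the function names are ours for illustration, not from a particular library.

    # Minimal sketch of analogy solving by vector arithmetic, assuming
    # `vectors` maps words to d-dimensional numpy arrays (e.g. loaded
    # from pre-trained GloVe or fastText files).
    import numpy as np

    def cosine(a, b):
        """Cosine similarity between two vectors."""
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def analogy(a, b, c, vectors):
        """Solve 'a is to b as c is to x': x maximises similarity
        to the resultant vector b - a + c (the inputs are excluded)."""
        target = vectors[b] - vectors[a] + vectors[c]
        candidates = (w for w in vectors if w not in (a, b, c))
        return max(candidates, key=lambda w: cosine(vectors[w], target))

    # analogy("man", "king", "woman", vectors) would typically return "queen".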
However, this relationship between words can also contain problematic associations. Research demonstrated that words like he or man are associated with jobs like programmer or doctor, whereas words like she or woman are associated with jobs like homemaker or nurse (Bolukbasi et al., 2016; Lu et al., 2018). For example, it has been shown (Bolukbasi et al., 2016) that

    $\vec{man} - \vec{woman} \approx \vec{computer\ programmer} - \vec{homemaker}$.

Human bias in psychology is often measured using the Implicit Association Test (IAT) (Greenwald et al., 1998). The IAT measures differences in the response time of human subjects when they are asked to pair two concepts. Whenever they find these concepts similar, the response time is shorter than when they find the concepts different. Based on these results, a corresponding measure based on word embeddings instead of human subjects has been developed, called the Word Embedding Association Test (WEAT) (Caliskan et al., 2017). The WEAT allows demonstrating different types of bias in word embeddings, replacing the reaction time from the IAT with word similarity (i.e. distance between word vectors). The method has been further developed and applied (e.g. Karve et al., 2019; May et al., 2019), but mostly for the English language and gender bias. We apply this method to pre-trained word embeddings in German and French, and address the following research questions:

• Can known gender and origin bias found in pre-trained English word embeddings be confirmed for German and French?
• Can we identify different forms of gender bias in German word embeddings?

The paper will first discuss the related work and provide more details about the methods used. We will then describe the experimental setup. In the end, the results will be presented and discussed.

2 Related Work

2.1 Word Embeddings

Unless a domain-specific word model is required, pre-trained word vector representations are sufficient, and are easily available online as open source (Hapke et al., 2019). In the following paragraphs we shortly describe the most common word embedding training techniques (a small illustration follows at the end of this subsection):

word2vec was first presented in 2013 (Mikolov et al., 2013a; Mikolov et al., 2013b; Mikolov et al., 2013c). These word embeddings provided a surprising accuracy improvement on several NLP tasks, and can be trained in two different ways (Hapke et al., 2019): with the skip-gram approach, using a word of interest as input, or with the continuous bag-of-words approach, using nearby words as input.

GloVe provides another technology for generating word embeddings (Pennington et al., 2014). Whereas word2vec relies on a neural network with backpropagation, GloVe uses direct optimization.

fastText provides an improvement over word2vec (Bojanowski et al., 2017). Instead of predicting the surrounding words, it predicts the surrounding n-character grams. This gives it the advantage of handling rare words much better than the original approach (Hapke et al., 2019). Pre-trained models are available in 157 languages (Grave et al., 2018).
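As a rough illustration of the two word2vec training modes, the following sketch uses the gensim library (our choice for illustration; this paper itself works with pre-trained vectors only). The toy corpus is a placeholder for a real tokenized text collection.

    # Illustrative sketch (not from the paper): training word2vec
    # embeddings with gensim on a stand-in toy corpus.
    from gensim.models import Word2Vec

    corpus = [["she", "is", "an", "engineer"],
              ["he", "is", "a", "nurse"]]

    # sg=1 selects the skip-gram approach (predict context from a word
    # of interest); sg=0 selects continuous bag-of-words (predict a
    # word from its nearby context words).
    model = Word2Vec(sentences=corpus, vector_size=300, window=5,
                     min_count=1, sg=1)

    vec = model.wv["engineer"]  # a 300-dimensional word vector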
2.2 Bias Identification in Training Data

There is a concern that artificial intelligence and smart decision making will amplify cultural stereotypes (Barocas and Selbst, 2016). Due to historical unfairness, which is represented in the training data, unfair decisions can be made in the future. Research has shown that such bias can be identified, for example by using Bayesian networks (Mancuhan and Clifton, 2014). Commonly used datasets such as Wikipedia have been proven to be biased (Wagner et al., 2015; Wagner et al., 2016). In particular, it was also shown how dialect can lead to racial bias in common training data for hate speech detection (Sap et al., 2019).

Recent research concentrates on bias identification in word embeddings. The state of the art is presented in the next subsection.

2.3 Bias Identification in Word Embeddings

In the original WEAT paper (Caliskan et al., 2017), several different IAT results were confirmed on pre-trained GloVe and word2vec word embeddings for the English language. Through their experiments on off-the-shelf machine learning components, the authors demonstrated that cultural stereotypes have already propagated into state-of-the-art artificial intelligence applications. The WEAT has become a common method to measure bias in word embeddings, being used as a metric when developing methods to reduce bias in word embeddings (Karve et al., 2019). The authors identified different biases, in particular the following categories of gender bias: career vs. family activities, maths vs. arts, and science vs. arts. Furthermore, they detected racial bias concerning African Americans by comparing European American and African American names.

Other research proposed a framework for temporal analysis of word embeddings, observing bias changing over time and relating it to historical events (Garg et al., 2018). The approach helped to quantify stereotypes and attitudes towards women and ethnic minorities in the United States in the 20th and 21st century.

The WEAT has also been applied to word embeddings trained on different specific domains (Twitter, the Wikipedia-based gender-balanced corpus GAP, PubMed and Google News) (Chaloner and Maldonado, 2019). The authors confirmed a statistically significant gender bias for all experiments on the Google News corpus (and for some of the experiments on the other corpora).

It has been shown that current bias mitigation methods cannot directly be applied to languages with grammatical gender such as French or Spanish (Zhou et al., 2019). However, the authors show that different types of bias can still be identified for those languages. They also present the Modified Word Embedding Association Test (MWEAT), which is then used to evaluate the bias in the Spanish language.

The WEAT was extended to measure bias in state-of-the-art sentence encoders (May et al., 2019). The Sentence Encoder Association Test (SEAT) inserts the words from the WEAT experiments into sentence templates such as "This is a[n] ___". The results suggest that recent sentence encoders exhibit less bias than previous models, but future research to further clarify this is suggested. That research focusses on English sentences only. Like WEAT, SEAT can only detect the presence of bias, not its absence.

Other research (Friedman et al., 2019) identifies gender bias in word embeddings trained on Twitter data from 99 countries and 51 U.S. regions. The results are then validated against statistical gender gaps in 18 international and 5 U.S.-based statistics. In this research only tweets in English were considered.

It has been explored (McCurdy and Serbetci, 2017) whether word embeddings in languages with grammatical gender show the same topical semantic bias as in English. In particular, the authors show that for German there is a positive differential association, but the WEAT shows reliable effects only for the evaluated natural gender languages English and Dutch. The training data was prepared from the OpenSubtitles corpus (Lison and Tiedemann, 2016) with translations in German, Spanish, Dutch and English.
3 Method

3.1 WEAT method

The terminology of WEAT (Caliskan et al., 2017) is borrowed from the Implicit Association Test (IAT) (Greenwald et al., 1998) from psychology. The IAT measures a person's subconscious association between concepts and therefore gives a measure of implicit bias. It is a computer-based measure in which users are asked to rapidly categorize two target concepts with an attribute. The IAT questions are based on combining possible answers to parallel non-biased questions, through which implicit stereotypes can be assessed. Easier pairing (i.e., shorter reaction time) is interpreted as a stronger association between the concepts.

In the background, the experiment consists of two sets of target words, for example (math, algebra, ...) and (art, poetry, ...). Furthermore, two sets of attribute words are defined, for example (man, male, ...) and (woman, female, ...) (Caliskan et al., 2017).

In WEAT, the distance between vectors corresponds to the reaction time in the IAT. As a measure of distance between the vectors, the cosine similarity is used. The null hypothesis is that there is no difference between the two sets of target words with regard to their relative similarity to the two sets of attribute words. In other words, there is no bias between the genders regarding the target word groups.

The WEAT test can be formalized as follows (Caliskan et al., 2017): X and Y are the two sets of target words of equal size, and A and B are the two sets of attribute words. The test statistic is

    $s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)$   (1)

where

    $s(w, A, B) = \mathrm{mean}_{a \in A} \cos(\vec{w}, \vec{a}) - \mathrm{mean}_{b \in B} \cos(\vec{w}, \vec{b})$.

s(w, A, B) measures the association of w with the attributes. s(X, Y, A, B) measures the differential association of the two sets of target words with the attributes. In the equation, $\cos(\vec{a}, \vec{b})$ denotes the cosine of the angle between the vectors $\vec{a}$ and $\vec{b}$, which we use to measure the distance between the two vectors.

In WEAT, a permutation test is used to measure the (un)likelihood of the null hypothesis, i.e. to compute the probability that a random re-partition of the target words would produce the observed (or a greater) difference in sample means. Let $\{(X_i, Y_i)\}_i$ denote all the partitions of $X \cup Y$ into two sets of equal size. The one-sided p-value is then defined as (Caliskan et al., 2017):

    $\Pr_i[s(X_i, Y_i, A, B) > s(X, Y, A, B)]$   (2)

In our implementation, instead of the full permutation test we implemented a randomization test with 100'000 iterations, following (Chaloner and Maldonado, 2019).

The effect size is computed as Cohen's d (as for the original IAT):

    $d = \dfrac{\mathrm{mean}_{x \in X}\, s(x, A, B) - \mathrm{mean}_{y \in Y}\, s(y, A, B)}{\mathrm{stddev}_{w \in X \cup Y}\, s(w, A, B)}$   (3)
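A compact sketch of how Equations (1)-(3) and the randomization test can be implemented is shown below. It assumes `vectors` maps each word to its embedding (a numpy array) and that all words in X, Y, A and B are in the vocabulary; this is an illustration consistent with the definitions above, not the paper's original code.

    # Sketch of the WEAT statistics of Eqs. (1)-(3), assuming `vectors`
    # maps words to numpy arrays.
    import numpy as np

    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def s_word(w, A, B, vectors):
        """Association s(w, A, B) of word w with the attribute sets."""
        sim_a = np.mean([cosine(vectors[w], vectors[a]) for a in A])
        sim_b = np.mean([cosine(vectors[w], vectors[b]) for b in B])
        return sim_a - sim_b

    def s_test(X, Y, A, B, vectors):
        """Test statistic s(X, Y, A, B), Eq. (1)."""
        return (sum(s_word(x, A, B, vectors) for x in X)
                - sum(s_word(y, A, B, vectors) for y in Y))

    def effect_size(X, Y, A, B, vectors):
        """Cohen's d, Eq. (3)."""
        sx = [s_word(x, A, B, vectors) for x in X]
        sy = [s_word(y, A, B, vectors) for y in Y]
        return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

    def p_value(X, Y, A, B, vectors, iterations=100_000, seed=0):
        """Randomization approximation of the one-sided p-value, Eq. (2):
        re-partition X u Y at random and count how often the statistic of
        a random partition exceeds the observed one."""
        rng = np.random.default_rng(seed)
        observed = s_test(X, Y, A, B, vectors)
        pool, half = list(X) + list(Y), len(X)
        exceed = 0
        for _ in range(iterations):
            perm = rng.permutation(pool)
            exceed += s_test(perm[:half], perm[half:], A, B, vectors) > observed
        return exceed / iterations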
3.2 Experimental Setup

This section describes the different experiments we executed with our implementation of the WEAT and pre-trained word embeddings in different languages.

3.2.1 Validation: WEAT experiments

To validate our implementation, we executed selected experiments in English (WEAT 5 for origin bias and WEAT 6-8 for gender bias) from the original WEAT paper (Caliskan et al., 2017).

In a first experiment, European American and African American names are used, along with pleasant and unpleasant attributes (WEAT5-ori, detailed setup in Table 1).

We then defined the targets as male and female names and the attributes as words regarding career and family (WEAT6-ori, detailed setup in Table 2).

Another experiment considers words from maths and arts as targets, and female and male terms as attributes. Table 4 shows the exact terms of the experiment. We first executed this experiment in its original form (WEAT7-ori). We then also executed it in a reduced form in which the pronouns were skipped, in order to match the German and French experiments explained in the next sections (WEAT7-mod).

We then executed an experiment that considers words from science and arts as targets, and male and female terms as attributes. Table 5 shows the exact terms of the experiment. We first executed this experiment in its original form (WEAT8-ori). We then also executed it in a reduced form in which the pronouns were skipped, in order to match the German and French experiments explained in the next sections (WEAT8-mod).

WEAT 5-7 are based on existing Implicit Association Tests (IAT) from the literature (Nosek et al., 2002a), as is WEAT 8 (Nosek et al., 2002b).

Table 1: The terms from the original WEAT 5 experiment (Caliskan et al., 2017) and our adaptations/translations to German and French.

Group 1
  WEAT5-ori: Brad, Brendan, Geoffrey, Greg, Brett, Jay, Matthew, Neil, Todd, Allison, Anne, Carrie, Emily, Jill, Laurie, Kristen, Meredith, Sarah
  WEAT5-ger: Peter, Daniel, Hans, Thomas, Andreas, Martin, Markus, Michael, Maria, Anna, Ursula, Ruth, Monika, Elisabeth, Verena, Sandra
  WEAT5-fr: Jean, Daniel, Michel, Pierre, David, Philippe, Nicolas, José, Maria, Marie, Anne, Catherine, Nathalie, Ana, Isabelle, Christine

Group 2
  WEAT5-ori: Darnell, Hakim, Jermaine, Kareem, Jamal, Leroy, Rasheed, Tremayne, Tyrone, Aisha, Ebony, Keisha, Kenya, Latonya, Lakisha, Latoya, Tamika, Tanisha
  WEAT5-ger: Ladina, Fatima, Fatma, Alma, Soraya, Svetlana, Elif, Vesna, Mehmet, Mustafa, Aleksandar, Mohamed, Ibrahim, Dragan, Hasan, Mohammad
  WEAT5-fr: Ladina, Fatima, Fatma, Alma, Soraya, Svetlana, Elif, Vesna, Mehmet, Mustafa, Aleksandar, Mohamed, Ibrahim, Dragan, Hasan, Mohammad

Pleasant
  WEAT5-ori: joy, love, peace, wonderful, pleasure, friend, laughter, happy
  WEAT5-ger: Spass, Liebe, Frieden, wunderbar, Freude, Lachen, glücklich
  WEAT5-fr: joie, amour, paix, magnifique, plaisir, ami, rire, enthousiaste

Unpleasant
  WEAT5-ori: agony, terrible, horrible, nasty, evil, war, awful, failure
  WEAT5-ger: Qual, furchtbar, schrecklich, übel, böse, Krieg, scheusslich, Versagen
  WEAT5-fr: souffrance, terrible, horrible, désagréable, mal, guerre, abominable, défaillance
3.2.2 Reproduction of WEAT 5-8 for German

We translated and/or adapted the experiments to execute them on German pre-trained word embeddings, as described in the next paragraphs.

WEAT5-ger: We reproduced for German the origin experiment that connects names of specific origins to pleasant or unpleasant words. We selected typically Swiss German names by using the 8 most common names of the German-speaking part of Switzerland for women and men respectively (Bundesamt für Statistik, Vornamen der Bevölkerung nach Jahrgang, Schweiz und Sprachgebiete, 2018). From the same source, we then manually selected a list of commonly used names in Switzerland that are of different origin. These names were chosen as representatives of names of foreign origin. A German study has shown that the origin of a name has a major impact on the success of job applications (Schneider et al., 2014). Instead of focussing on the percentages of different minorities in the population, which is complicated due to regional differences, we selected commonly used names of different origins based on the list of the most common names in Switzerland mentioned before. The pleasant and unpleasant terms were translated to German. Table 1 shows the exact terms of the experiment.

WEAT6-ger1 and WEAT6-ger2: We reproduced the gender experiment regarding career vs. family attributes for German. In a first experiment (WEAT6-ger1), we used the 8 most common names of the German-speaking part of Switzerland for women and men respectively (Bundesamt für Statistik, 2018). In a second experiment (WEAT6-ger2), we used the most common names of adults living in Germany (https://www.beliebte-vornamen.de/49519-erwachsene.htm). The career and family terms were translated to German. Tables 2 and 3 show the exact terms used in the experiments; a brief usage sketch follows Table 2 below.

Table 2: The terms from the original WEAT 6 experiment (Caliskan et al., 2017) and our adaptations/translations to German and French for names in Switzerland.

Male names
  WEAT6-ori: John, Paul, Mike, Kevin, Steve, Greg, Jeff, Bill
  WEAT6-ger1: Peter, Daniel, Hans, Thomas, Andreas, Martin, Markus, Michael
  WEAT6-fr1: Jean, Daniel, Michel, Pierre, David, Philippe, Nicolas, José

Female names
  WEAT6-ori: Amy, Joan, Lisa, Sarah, Diana, Kate, Ann, Donna
  WEAT6-ger1: Maria, Anna, Ursula, Ruth, Monika, Elisabeth, Verena, Sandra
  WEAT6-fr1: Maria, Marie, Anne, Catherine, Nathalie, Ana, Isabelle, Christine

Career
  WEAT6-ori: executive, management, professional, corporation, salary, office, business, career
  WEAT6-ger1: Führungskraft, Verwaltung, beruflich, Konzern, Gehalt, Büro, Geschäft, Werdegang
  WEAT6-fr1: équipe, gestion, profession, société, salaire, bureau, affaires, carrière

Family
  WEAT6-ori: home, parents, children, family, cousins, marriage, weddings, relatives
  WEAT6-ger1: Zuhause, Eltern, Kinder, Familie, Cousinen, Ehe, Hochzeit, Verwandtschaft
  WEAT6-fr1: maison, parents, enfants, famille, cousins, mariage, noces, proches
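For illustration, the WEAT6-ger1 setup could be run with the functions sketched in Section 3.1 roughly as follows. The word lists are abridged from Table 2 for brevity (a full run would use all eight terms per set), and `vectors` denotes the loaded German embeddings.

    # Running the WEAT6-ger1 experiment with the Section 3.1 sketch
    # (word lists abridged from Table 2; X and Y must be of equal size).
    X = ["Peter", "Daniel", "Hans", "Thomas"]            # male names (targets)
    Y = ["Maria", "Anna", "Ursula", "Ruth"]              # female names (targets)
    A = ["Führungskraft", "Gehalt", "Büro", "Geschäft"]  # career attributes
    B = ["Zuhause", "Eltern", "Kinder", "Familie"]       # family attributes

    print(f"d = {effect_size(X, Y, A, B, vectors):.2f}, "
          f"p = {p_value(X, Y, A, B, vectors):.4f}")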
WEAT7-ger and WEAT8-ger: We reproduced the gender experiments regarding math vs. arts and science vs. arts for German. The pronouns in the attribute terms were skipped because of conflicts with other terms. For example, sie can be she, but also they; and sein can be his, but can also refer to the verb to be. We considered NASA, Einstein and Shakespeare to be internationally known and kept these words for the German experiments. Tables 4 and 5 show the exact terms of the experiments.

Table 3: The terms from the original WEAT 6 experiment (Caliskan et al., 2017) and our adaptations/translations to German and French for names in Germany and France.

Male names
  WEAT6-ori: John, Paul, Mike, Kevin, Steve, Greg, Jeff, Bill
  WEAT6-ger2: Michael, Thomas, Andreas, Peter, Stefan, Christian, Hans, Klaus
  WEAT6-fr2: Jean, Pierre, Michel, André, Philippe, René, Louis, Alain

Female names
  WEAT6-ori: Amy, Joan, Lisa, Sarah, Diana, Kate, Ann, Donna
  WEAT6-ger2: Sabine, Susanne, Petra, Monika, Claudia, Birgit, Andrea, Stefanie
  WEAT6-fr2: Marie, Jeanne, Françoise, Monique, Catherine, Nathalie, Isabelle, Jacqueline

Career
  WEAT6-ori: executive, management, professional, corporation, salary, office, business, career
  WEAT6-ger2: Führungskraft, Verwaltung, beruflich, Konzern, Gehalt, Büro, Geschäft, Werdegang
  WEAT6-fr2: équipe, gestion, profession, société, salaire, bureau, affaires, carrière

Family
  WEAT6-ori: home, parents, children, family, cousins, marriage, weddings, relatives
  WEAT6-ger2: Zuhause, Eltern, Kinder, Familie, Cousinen, Ehe, Hochzeit, Verwandtschaft
  WEAT6-fr2: maison, parents, enfants, famille, cousins, mariage, noces, proches

Table 4: The terms from the original WEAT 7 experiment (Caliskan et al., 2017) and our adaptations/translations to German and French. In the -mod variants, the pronouns were skipped.

Math
  WEAT7-ori/mod: math, algebra, geometry, calculus, equations, computation, numbers, addition
  WEAT7-ger: Mathematik, Algebra, Geometrie, Calculus, Gleichungen, Berechnung, Zahlen, Addition
  WEAT7-fr: mathématiques, algèbre, géométrie, calcul, équations, calcul, nombres, addition

Arts
  WEAT7-ori/mod: poetry, art, dance, literature, novel, symphony, drama, sculpture
  WEAT7-ger: Poesie, Kunst, Tanz, Literatur, Roman, Symphonie, Drama, Skulptur
  WEAT7-fr: poésie, art, danse, littérature, roman, symphonie, drame, sculpture

Male terms
  WEAT7-ori/mod: male, man, boy, brother, he, him, his, son
  WEAT7-ger: männlich, Mann, Junge, Bruder, Sohn
  WEAT7-fr: masculin, homme, copain, frère, fils

Female terms
  WEAT7-ori/mod: female, woman, girl, sister, she, her, hers, daughter
  WEAT7-ger: weiblich, Frau, Mädchen, Schwester, Tochter
  WEAT7-fr: féminine, femme, copine, soeur, fille

3.2.3 Reproduction of WEAT 6-8 for French

We translated and/or adapted the experiments to execute them on French pre-trained word embeddings, as described in the next paragraphs.
WEAT5-fr: We reproduced for French the experiment that connects names of specific origins to pleasant or unpleasant words. We selected typically Swiss French names by using the 8 most common names of the French-speaking part of Switzerland for women and men respectively (Bundesamt für Statistik, Vornamen der Bevölkerung nach Jahrgang, Schweiz und Sprachgebiete, 2018). From the same source, we then manually selected a list of commonly used names in Switzerland that are of different origin (as described for the experiment WEAT5-ger). The pleasant and unpleasant terms were translated to French. In the translation, words that have the same form for male and female (e.g. magnifique instead of merveilleux) were preferred, in order to keep the number of terms consistent with the English and German experiments. Table 1 shows the exact terms of the experiment.

WEAT6-fr1 and WEAT6-fr2: We reproduced the gender experiment regarding career vs. family attributes for French. To translate the female and male names, in a first experiment (WEAT6-fr1) we used the 8 most common names of the French-speaking part of Switzerland for women and men (Bundesamt für Statistik, 2018). In a second experiment (WEAT6-fr2), we used the most common names given in metropolitan France between 1943 and 2019 (https://tinyurl.com/tkgubf5). The word executive leads to a French word with distinct male and female forms; it was therefore replaced by the business-related word équipe. Tables 2 and 3 show the exact terms of the experiments.

WEAT7-fr and WEAT8-fr: As in German, pronouns were skipped. Additionally, we replaced girl/boy with copine/copain (in English: girlfriend/boyfriend), because the French word fille can mean both girl and daughter. For the gender-specific adjectives we picked the male form for masculin and the female form for féminine, since we expect these forms to appear more frequently. We considered NASA, Einstein and Shakespeare to be internationally known and kept these words for the French experiments. Tables 4 and 5 show the exact terms of the experiments.

Table 5: The terms from the original WEAT 8 experiment (Caliskan et al., 2017) and our adaptations/translations to German and French. In the -mod variants, the pronouns were skipped.

Science
  WEAT8-ori/mod: science, technology, physics, chemistry, Einstein, NASA, experiment, astronomy
  WEAT8-ger: Wissenschaft, Technologie, Physik, Chemie, Einstein, NASA, Experiment, Astronomie
  WEAT8-fr: science, technologie, physique, chimie, Einstein, NASA, expérience, astronomie

Arts
  WEAT8-ori/mod: poetry, art, Shakespeare, dance, literature, novel, symphony, drama
  WEAT8-ger: Poesie, Kunst, Shakespeare, Tanz, Literatur, Roman, Symphonie, Drama
  WEAT8-fr: poésie, art, Shakespeare, danse, littérature, roman, symphonie, drame

Male terms
  WEAT8-ori/mod: brother, father, uncle, grandfather, son, he, his, him
  WEAT8-ger: Bruder, Vater, Onkel, Grossvater, Sohn
  WEAT8-fr: frère, père, oncle, grand-père, fils

Female terms
  WEAT8-ori/mod: sister, mother, aunt, grandmother, daughter, she, hers, her
  WEAT8-ger: Schwester, Mutter, Tante, Grossmutter, Tochter
  WEAT8-fr: soeur, mère, tante, grande-mère, fille

3.2.4 Additional Gender Stereotypes in German Word Embeddings

Based on real-world bias, we defined the following two additional experiments for German:

GER-1: Study choice in Switzerland is often a matter of gender.
A report about equal opportunities in Switzerland (Dubach et al., 2017) indicates that at least four out of five students are female in subjects such as special pedagogy, veterinary medicine, ethnology, educational science and psychology. On the other side, in technical studies such as mechanical engineering or computer science, only around 10-20% of the students are female. In this experiment we examine whether this bias is reflected in the word embeddings. We selected the five subjects with the highest percentage of women in 2015 (Dubach et al., 2017): special pedagogy, veterinary medicine, ethnology, educational science and psychology. We then picked the five subjects with the lowest percentage of women in 2015 (Dubach et al., 2017): electrical engineering, mechanical engineering, computer science, microtechnology and physics. The same male and female terms as for the WEAT7 experiment, which considers the different interests of men and women in arts and maths, were used for this experiment. We therefore defined the target and attribute word sets as shown in Table 6.

Table 6: Experiment GER-1 verifies whether existing bias in study selection also appears in German word embeddings (with English translations for better readability).

Study (lowest percentage of women)
  GER-1: Elektroingenieurwesen, Maschineningenieurwesen, Informatik, Mikrotechnik, Physik
  English: Electrical Engineering, Mechanical Engineering, Computer Science, Microtechnology, Physics

Study (highest percentage of women)
  GER-1: Sonderpädagogik, Veterinärmedizin, Ethnologie, Erziehungswissenschaften, Psychologie
  English: Special Pedagogy, Veterinary Medicine, Ethnology, Educational Science, Psychology

Male terms
  GER-1: männlich, Mann, Junge, Bruder, Sohn
  English: male, man, boy, brother, son

Female terms
  GER-1: weiblich, Frau, Mädchen, Schwester, Tochter
  English: female, woman, girl, sister, daughter

GER-2: Studies have examined the perception of the roles of men and women in the 18th century based on dictionary entries from that time (Hausen, 1981). Based on these results, a list of words describing men and women was deduced (see the figure "Polarisierung der Geschlechterrolle im 18. Jahrhundert" at https://de.wikipedia.org/wiki/Geschlechterrolle). The list is separated into different categories describing the role of women and men in society: Bestimmung für (engl. intended for), Aktivität/Passivität (engl. activity/passivity), Tun/Sein (engl. doing/being), and their characters: Rationalität/Emotionalität (engl. rationality/emotionality) and Tugenden (engl. virtues). In this study we focussed on the words indicating the characters of men and women, to verify whether these stereotypes are still reflected in today's word embeddings. We therefore selected the words from the category Rationalität/Emotionalität for our experiment. The category Tugenden was skipped due to the different number of male and female words. We defined the experiment as shown in Table 7.

Table 7: Experiment GER-2 verifies whether existing historical bias also appears in German word embeddings (with English translations for better readability).

Character (male stereotypes)
  GER-2: Geist, Vernunft, Verstand, Denken, Wissen, Urteilen
  English: Mind, Rationality, Realisation, Thinking, Knowing, Judging

Character (female stereotypes)
  GER-2: Gefühl, Empfinden, Empfänglichkeit, Rezeptivität, Religiosität, Verstehen
  English: Feeling, Sentiment, Receptiveness, Religiousness, Understanding

Male terms
  GER-2: männlich, Mann, Junge, Bruder, Sohn
  English: male, man, boy, brother, son

Female terms
  GER-2: weiblich, Frau, Mädchen, Schwester, Tochter
  English: female, woman, girl, sister, daughter

3.3 Data Sets: Pre-trained Word Embeddings

The validation experiments in English were executed on the same pre-trained word embeddings as the original experiments (Caliskan et al., 2017):

• GloVe pre-trained word embeddings using the "Common Crawl" corpus (300 dimensions) with 840 billion tokens (https://nlp.stanford.edu/projects/glove/)
• word2vec pre-trained word embeddings using Google News (300 dimensions) (https://code.google.com/archive/p/word2vec/)

The German and the French experiments were executed using pre-trained fastText word embeddings with 300 dimensions, trained on Common Crawl and Wikipedia (Grave et al., 2018; https://fasttext.cc/docs/en/crawl-vectors.html). Other word embeddings were considered, but they had either fewer dimensions (e.g. (Kutuzov et al., 2017)) or were missing words in the vocabulary that were relevant for our experiments.
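For orientation, loading these embeddings could look as follows. The file names are those published on the respective download pages; gensim is an assumed choice of library here, not necessarily the tooling used for the paper's experiments.

    # Illustrative loading of the pre-trained embeddings described above
    # (gensim is an assumed library choice, not the paper's tooling).
    from gensim.models import KeyedVectors

    # word2vec Google News vectors (binary format, 300 dimensions).
    w2v = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)

    # fastText German vectors trained on Common Crawl and Wikipedia
    # (text format; cc.fr.300.vec is the French counterpart). The GloVe
    # .txt files would need a conversion step first (e.g. gensim's
    # glove2word2vec script).
    ft_de = KeyedVectors.load_word2vec_format("cc.de.300.vec")

    vectors = ft_de  # plug into the WEAT sketch from Section 3.1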
Table 8: Results of the validation: confirming the results of the original WEAT paper (Caliskan et al., 2017) for the English language on the GloVe and word2vec datasets. We report p-values (p) and the absolute value of the effect size (d). A parenthesized entry denotes a marginal result with p slightly above 0.05.

GloVe
  WEAT5-ori: p < 10^-3, d = 1.36, bias detected
  WEAT6-ori: p < 10^-3, d = 1.8, bias detected
  WEAT7-ori: p = 0.058, d = 0.94, (bias detected)
  WEAT8-ori: p = 0.0097, d = 1.24, bias detected
  WEAT7-mod: p = 0.026, d = 1.09, bias detected
  WEAT8-mod: p = 0.01, d = 1.2, bias detected

word2vec
  WEAT5-ori: p = 0.02937, d = 0.72, bias detected
  WEAT6-ori: p < 10^-3, d = 1.88, bias detected
  WEAT7-ori: p = 0.039, d = 0.99, bias detected
  WEAT8-ori: p = 0.008, d = 1.24, bias detected
  WEAT7-mod: p = 0.04, d = 0.99, bias detected
  WEAT8-mod: p = 0.008, d = 1.24, bias detected

Table 9: Results of the German and French experiments: translated and adapted WEAT experiments and newly defined experiments. We report p-values (p) and the absolute value of the effect size (d).

German
  WEAT5-ger: p < 10^-3, d = 1.134, bias detected
  WEAT6-ger1: p < 10^-3, d = 1.62, bias detected
  WEAT6-ger2: p = 0.003, d = 1.44, bias detected
  WEAT7-ger: p = 0.65, d = 0.23, no bias detected
  WEAT8-ger: p = 0.83, d = 0.11, no bias detected
  GER-1: p < 10^-3, d = 1.74, bias detected
  GER-2: p = 0.002, d = 1.43, bias detected

French
  WEAT5-fr: p < 10^-3, d = 1.29, bias detected
  WEAT6-fr1: p = 0.14, d = 0.75, no bias detected
  WEAT6-fr2: p = 0.03, d = 1.03, bias detected
  WEAT7-fr: p = 0.2, d = 0.62, no bias detected
  WEAT8-fr: p = 0.53, d = 0.32, no bias detected

4 Results

In this work, we consider a bias statistically significant if the p-value is below 0.05, following (Chaloner and Maldonado, 2019) and (Caliskan et al., 2017).

We confirmed the bias detected by (Caliskan et al., 2017) in the WEAT 5-8 experiments for the English language (on both the GloVe and word2vec datasets). Table 8 lists the detailed results.
Whereas the original WEAT5 experiment considered European American and African American names, our experiment considered common Swiss names (from the German- and French-speaking areas respectively) and common names in Switzerland of different origin. We were able to measure a statistically significant bias based on the origin of the name, in relation to pleasant and unpleasant words, for both German and French.

In the WEAT6 experiments for German, we were able to demonstrate that there is a statistically significant gender bias for the categories family and career, for the most common names from Germany and also from Switzerland. For WEAT6 in French, we could not obtain statistically significant results for Switzerland. However, the WEAT method can only detect the presence of bias, not its absence; future research is therefore necessary to further investigate this topic. For the most common names in France, a significant bias was shown in the WEAT6 experiment.

We could not obtain statistically significant results for the word categories math vs. arts (WEAT7) and science vs. arts (WEAT8) for German and French.

However, we identified two new sets of words in German for which a statistically significant bias could be shown. On one side, we confirmed that there is a gender bias in the word categories for different subjects of study (GER-1). On the other side, historical gender bias from the 18th century was found to be still present in today's word embeddings (GER-2).

The detailed results for the German and French experiments are listed in Table 9.

5 Discussion

We confirmed existing results for gender and origin bias in English word embeddings, and examined selected word sets for German and French word embeddings. Whereas we could partially confirm the translated (and where necessary adapted) results of the English experiments for German and French, we identified new word sets for bias in German word embeddings. The identified word sets indicate that specific regional or cultural stereotypes are included in word embeddings, and that bias detection may therefore vary among different languages. Future work needs to further investigate the directions proposed in this paper and extend the word sets our work has identified.

We identified a bias towards names of different origin. We can therefore confirm that stereotypes based on names that are present in our society, e.g. on the labour market (Schneider et al., 2014), also exist in word embeddings. We worked with a selection of names to obtain a first indication; future work must further study the differences between names encoded in word embeddings. Next to origin, it has been shown that prejudices regarding the age, attractiveness and intelligence of the person with a corresponding name exist (Rudolph et al., 2007), and that teachers perceive students differently based on their names (Kube, 2009). Our results indicate that there is potential to further explore existing stereotypes and prejudices around names in word embeddings as well, together with their implications for smart decision making.

Our results on word embeddings suggest an impact on applications using machine learning or AI. Previous studies have raised the concern that such technologies may perpetuate cultural stereotypes (Barocas and Selbst, 2016), and it has been discussed whether all implicit human biases are reflected in the statistical properties of language (Caliskan et al., 2017). Therefore, whenever we build a system that is capable of understanding or producing natural language (e.g. text generation, machine translation), it risks learning the stereotypes and prejudices included in the language as well. Further research to precisely measure the different types of bias in such language models and to mitigate this bias is therefore required. Future work should also identify how the observed bias in word embeddings can be related to the exact text from which it originates.

6 Conclusion

Although we partially confirmed the existing gender and origin bias also in German and French word embeddings, we showed in this research that known bias in pre-trained English word embeddings comes in a different form in German. We demonstrated that real-world bias and stereotypes from the 18th century are still included in today's German word embeddings. Our results indicate that there are cultural differences that need to be considered in future work. The results were obtained from publicly available pre-trained embeddings. Future work to identify and mitigate bias in word embeddings in different languages is therefore highly relevant.

References

Solon Barocas and Andrew D. Selbst. 2016. Big data's disparate impact. Calif. L. Rev., 104:671.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357.

Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.

Kaytlin Chaloner and Alfredo Maldonado. 2019. Measuring gender bias in word embeddings across domains and discovering new gender bias word categories. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 25–32.

Philipp Dubach, Victor Legler, Mario Morger, and Heidi Stutz. 2017. Frauen und Männer an Schweizer Hochschulen: Indikatoren zur Chancengleichheit in Studium und wissenschaftlicher Laufbahn. SBFI.

Scott Friedman, Sonja Schmer-Galunder, Anthony Chen, and Jeffrey Rye. 2019. Relating word embedding gender biases to gender gaps: A cross-cultural analysis. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 18–24.

Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644.

Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning word vectors for 157 languages. arXiv preprint arXiv:1802.06893.

Anthony G. Greenwald, Debbie E. McGhee, and Jordan L. K. Schwartz. 1998. Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology, 74(6):1464.

Hannes Max Hapke, Hobson Lane, and Cole Howard. 2019. Natural Language Processing in Action.

Karin Hausen. 1981. Family and role-division: the polarisation of sexual stereotypes in the nineteenth century—an aspect of the dissociation of work and family life. In The German Family: Essays on the Social History of the Family in Nineteenth- and Twentieth-Century Germany, pages 51–83.

Saket Karve, Lyle Ungar, and João Sedoc. 2019. Conceptor debiasing of word representations evaluated on WEAT. arXiv preprint arXiv:1906.05993.

Julia Isabell Kube. 2009. Vornamensforschung: Fragebogenuntersuchung bei Lehrerinnen und Lehrern, ob Vorurteile bezüglich spezifischer Vornamen von Grundschülern und davon abgeleitete erwartete spezifische Persönlichkeitsmerkmale vorliegen. Ph.D. thesis.

Andrei Kutuzov, Murhaf Fares, Stephan Oepen, and Erik Velldal. 2017. Word vectors, reuse, and replicability: Towards a community repository of large-text resources. In Proceedings of the 58th Conference on Simulation and Modelling, pages 271–276. Linköping University Electronic Press.

Pierre Lison and Jörg Tiedemann. 2016. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles.

Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam Amancharla, and Anupam Datta. 2018. Gender bias in neural natural language processing. arXiv preprint arXiv:1807.11714.

Koray Mancuhan and Chris Clifton. 2014. Combating discrimination using Bayesian networks. Artificial Intelligence and Law, 22(2):211–238.

Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, and Rachel Rudinger. 2019. On measuring social biases in sentence encoders. arXiv preprint arXiv:1903.10561.

Katherine McCurdy and Oguz Serbetci. 2017. Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings. Proceedings of WiNLP.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.
Tomáš Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013c. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746–751.

Bhaskar Mitra, Eric Nalisnick, Nick Craswell, and Rich Caruana. 2016. A dual embedding space model for document ranking. arXiv preprint arXiv:1602.01137.

Eric Nalisnick, Bhaskar Mitra, Nick Craswell, and Rich Caruana. 2016. Improving document ranking with dual word embeddings. In Proceedings of the 25th International Conference Companion on World Wide Web, pages 83–84.

Brian A. Nosek, Mahzarin R. Banaji, and Anthony G. Greenwald. 2002a. Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dynamics: Theory, Research, and Practice, 6(1):101.

Brian A. Nosek, Mahzarin R. Banaji, and Anthony G. Greenwald. 2002b. Math = male, me = female, therefore math ≠ me. Journal of Personality and Social Psychology, 83(1):44.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Udo Rudolph, Robert Böhm, and Michaela Lummer. 2007. Ein Vorname sagt mehr als 1000 Worte. Zeitschrift für Sozialpsychologie, 38(1):17–31.

Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith. 2019. The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1668–1678.

Jan Schneider, Ruta Yemane, and Martin Weinmann. 2014. Diskriminierung am Ausbildungsmarkt: Ausmaß, Ursachen und Handlungsperspektiven. Sachverständigenrat deutscher Stiftungen für Integration und Migration.

Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. 2019. Mitigating gender bias in natural language processing: Literature review. arXiv preprint arXiv:1906.08976.

Claudia Wagner, David Garcia, Mohsen Jadidi, and Markus Strohmaier. 2015. It's a man's Wikipedia? Assessing gender inequality in an online encyclopedia. In Ninth International AAAI Conference on Web and Social Media.

Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: Gender asymmetries in Wikipedia. EPJ Data Science, 5(1):5.

Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, and Kai-Wei Chang. 2019. Examining gender bias in languages with grammatical gender. arXiv preprint arXiv:1909.02224.