<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Emerging Trends in Gender-Specific Occupational Titles in Italian Newspapers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierluigi Cassotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Iovine</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierpaolo Basile</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco De Gemmis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Bari Aldo Moro</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The grammatical gender system can influence the way semantic gender is perceived. Italian is a grammatical gender language, in which nouns are classified for gender. In this work, we investigate the usage of gender-specific forms of occupational titles in a diachronic corpus of 3 billion tokens extracted from two Italian newspapers. The hypothesis is that the usage of gender-specific forms might be influenced by socio-cultural aspects, such as changes in employment policy. We automatically collect a set of occupational titles and perform a diachronic analysis exploiting the frequency of gender-specific forms. Results show a correlation between changes in the usage of gender-specific forms and socio-cultural events.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Throughout history, the privileged use of a specific gender form for particular professions can fade away through changes in the lexicon of a language (e.g., neologisms) or in its usage (e.g., word frequencies). The way the lexicon is affected by those changes depends on the grammatical gender system, i.e. the set of rules that define the agreement between the forms of noun classes and the other parts of speech. Grammatical gender systems can vary dramatically from one language to another. Gygax et al. (2019) propose a classification of languages based on their grammatical gender system. In this work, we focus on the Italian language, a grammatical gender language in which all nouns must be classified for gender.</p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        The Italian gender system admits three categories for nouns: gender-specific ending nouns, mobile gender nouns, and nouns where the gender is specified through determiners and adjectives
        <xref ref-type="bibr" rid="ref12">(Marcato and Thüne, 2002)</xref>. In gender-specific ending nouns, the gender forms are expressed through completely different lexical roots (e.g., genero/nuora). In mobile gender nouns, the specific gender forms share the same lexical root, and the semantic gender is instead represented by different suffixes (e.g., scrittore/scrittrice). In other cases, the semantic gender of a noun is inferred only from the determiner and/or adjective (e.g., il giudice, la giudice). This characteristic of the Italian language has strong repercussions on the way people refer to occupational titles, because one gender form might be preferred over the other for historical reasons, regardless of the gender of the person being talked about
        <xref ref-type="bibr" rid="ref14">(Sabatini, 1985)</xref>. This has become a hot-button issue in recent years, especially as a result of the United Nations Resolution “Transforming our world: the 2030 Agenda for Sustainable Development” with its global indicator framework for Sustainable Development Goals (SDGs), and specifically of SDG 5, “Achieve gender equality and empower all women and girls” (sub-goal 5.1, “End all forms of discrimination against all women and girls everywhere”)
        <xref ref-type="bibr" rid="ref11">(Lee et al., 2016)</xref>.
      </p>
      <p>The objective of this paper is to monitor how the use of gender-specific occupational titles has changed in the Italian language over the years, by means of diachronic analysis tools. We would like to emphasize that the goal is not to map the proportion of men and women in each profession over time, as this cannot be reliably inferred from text. Instead, we are interested in gauging the cultural relevance of the gender-specific titles over time, as reflected in the news domain. Accordingly, the contributions of this paper can be summarized as follows:
(i) We analyze emerging trends in the use of gender-specific occupational titles in the Italian language in a corpus of newspaper articles.<sup>1</sup>
(ii) We perform an in-depth analysis of the public figures who have driven a significant shift for two professions in particular.</p>
      <p>
        Large diachronic corpora have already been used to study social and cultural phenomena that affected language in a significant way. The Google Ngrams Dataset
        <xref ref-type="bibr" rid="ref6">(Goldberg and Orwant, 2013)</xref>
        is a dataset of n-grams extracted from 3.5 million books published between 1520 and 2008. Aiden and Michel (2011) exploit the huge quantity of information contained in the Google Ngrams Dataset to analyze the evolution of the language lexicon over time. In particular, the work offers interesting culturomics results, such as highlighting the spread of the term influenza during historical pandemic periods. Kutuzov et al. (2017) exploit diachronic word embeddings to track wars and conflicts that took place from 1994 to 2010 all around the world. Diachronic word embeddings are trained on the English Gigaword news corpus
        <xref ref-type="bibr" rid="ref13">(Parker et al., 2011)</xref>
        and used to predict conflict states: peace, war and stable. Laine and Watson (2014) analyze the linguistic sexism occurring in The Times newspaper over five decades (1965-2005), relying on the classification of linguistic sexism proposed in
        <xref ref-type="bibr" rid="ref8">(King, 1991)</xref>. The authors hypothesize that occupational titles and agents would be more resistant to change than other forms of sexism over the decades. They confirm their hypothesis by exploring the frequencies of male and female affixes, showing that they remain stable. Burr (1995) performs an empirical analysis on manually annotated occurrences of grammatical agents in a small synchronic corpus of Italian newspapers. The outcomes of this work lead the author to conclude that women are underrepresented in Italian newspapers, especially in higher-ranking roles.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Corpus</title>
      <p>
        Occurrences of occupational titles are extracted from a diachronic corpus that comprises two sub-corpora. The former is the “L’Unità” corpus
        <xref ref-type="bibr" rid="ref2">(Basile et al., 2020)</xref>, which covers the period 1945-2014. The latter is crawled from the publicly available digital archive of the Italian newspaper “La Stampa”, covers the period 1945-2005, and is processed using the same methodology mentioned in
        <xref ref-type="bibr" rid="ref2">(Basile et al., 2020)</xref>. In order to align the time ranges of the two sub-corpora, we consider a sub-portion of the “L’Unità” corpus that spans the period 1948-2005. The overall corpus contains 3,529,820,155 tokens and spans the period 1948-2005. Corpus statistics are reported in Table 1. The corpus presents two main critical issues. First, despite pre-processing and filtering, the documents from the earlier periods suffer from several OCR errors and noise. Second, data is not equally distributed: the number of tokens drops dramatically in the first years. Text is processed using the UDPipe model
        <xref ref-type="bibr" rid="ref15">(Straka et al., 2016)</xref>
        included in spaCy<sup>2</sup>. The UDPipe model is trained on the Italian Stanford Dependency Treebank
        <xref ref-type="bibr" rid="ref3">(Bosco et al., 2014)</xref>. Each sentence is tokenized, lemmatized and annotated with PoS tags, named entity tags and dependency relations. Moreover, the UDPipe model provides information about the inflectional features of nouns, which is exploited in the occupational titles extraction pipeline.
      </p>
      <p><sup>1</sup> All data collected in this experiment is available here: https://github.com/pierluigic/igsot</p>
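      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Corpus</th><th>L’Unità</th><th>La Stampa</th><th>Overall</th></tr>
          </thead>
          <tbody>
            <tr><td>Tokens</td><td>425,833,098</td><td>3,145,959,127</td><td>3,529,820,155</td></tr>
            <tr><td>Period</td><td>1948-2014</td><td>1948-2005</td><td>1948-2005</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-2-3">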
        <p>
          The first step of our investigation consists of extracting a list of occupational titles from a common Knowledge Base. Specifically, we have exploited Wikidata
          <xref ref-type="bibr" rid="ref16">(Vrandečić and Krötzsch, 2014)</xref>, since it has collected a wide range of entities related to professional activities. We first extracted a list of all entities that are an instance of profession (wd:Q28640), or of an entity that is a subclass of it, for which a label in the Italian language is present. This label commonly contains the male gender form of the occupational title. Then, we filtered the list of professions by only including those that possess the female form of label (wdt:P2521) property for the Italian language. This property denotes the female variant of the occupational title, where applicable.
        </p>
        <p>The next step consists of filtering out occupational titles for which the gender is not easily distinguishable from text, such as those in which both gender variants share the same surface form (e.g. the aforementioned il giudice/la giudice), or those that do not feature gender variants at all (e.g. la guardia, i.e. the guard). We also removed all occupational titles that consist of two or more tokens.</p>
        <p>Then, we reduced the list by filtering out polysemous words. A common example of polysemy in the Italian language occurs when an occupational title shares the same lexical form as the discipline to which it belongs, such as matematica (female form of mathematician), or fisica (female form of physicist). For each occupational title, we used WordNet to find all synsets in which it appears and then removed the title if a synset is a hyponym of the discipline.n.01 synset. Moreover, we manually analyzed the list of remaining occupational titles and removed other instances of polysemy, which would otherwise hinder the quality of the results. For instance, we filtered out the word editrice (female form of editor) as it can also appear in the phrase casa editrice (i.e. publishing house), and the word tecnica (female form of technician), which can also mean technique depending on context. We also decided to remove words that have additional figurative meanings, such as cacciatrice (female form of hunter) and guerriera (female form of warrior). This process was undertaken by two independent annotators and then checked for agreement. The final result of this process is T, a set of tokens that unequivocally refer to occupational titles, and that feature distinct male and female gender variants which can reliably be extracted from text.</p>
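        <p>The filtering steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the TitlePair type, the filter_title_pairs function and the sample pairs are hypothetical, and the polysemy check, which relied on WordNet and on manual annotation, is replaced here by a precomputed set of ambiguous forms taken from the examples in the text.</p>
        <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TitlePair:
    """Male and female label of one occupational title (illustrative type)."""
    male: str
    female: str


def filter_title_pairs(pairs, ambiguous_forms):
    """Keep pairs whose two forms are distinct, single-token and unambiguous."""
    kept = []
    for pair in pairs:
        male, female = pair.male.lower(), pair.female.lower()
        if male == female:
            continue  # gender only visible through determiners (il/la giudice)
        if len(male.split()) > 1 or len(female.split()) > 1:
            continue  # titles made of two or more tokens are dropped
        if male in ambiguous_forms or female in ambiguous_forms:
            continue  # polysemous or figurative forms are dropped
        kept.append(TitlePair(male, female))
    return kept


# Hypothetical candidate list; in the paper the labels come from Wikidata.
candidates = [
    TitlePair("scrittore", "scrittrice"),
    TitlePair("giudice", "giudice"),
    TitlePair("direttore generale", "direttrice generale"),
    TitlePair("matematico", "matematica"),
    TitlePair("editore", "editrice"),
    TitlePair("infermiere", "infermiera"),
]
# Stand-in for the WordNet check and the manual annotation step.
ambiguous = {"matematica", "fisica", "editrice", "tecnica",
             "cacciatrice", "guerriera"}

titles = filter_title_pairs(candidates, ambiguous)
print([(t.male, t.female) for t in titles])
```
        </preformat>
        <p>In this toy run only scrittore/scrittrice and infermiere/infermiera survive, since every other candidate violates one of the three conditions.</p>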
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4 Experimental Setup</title>
      <p>Once we have acquired the set of occupational titles T, the next step of the analysis consists of measuring the frequency with which each term w ∈ T occurs for each year in the corpus described in Section 2. We also make use of the part-of-speech information contained in said corpus in order to eliminate any remaining ambiguity in the words. Specifically, for each occupational title, we counted a hit in the corpus only if it appears with the NOUN tag. This allows us to avoid counting occupational titles that can be confused with verbs or adjectives, such as impiegato/impiegata, which can refer to the noun employee in Italian, but also to the past participle of the verb to employ.</p>
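      <p>A minimal sketch of this counting step is given below. The flat (year, form, tag) layout of the input and the count_hits function are assumptions made for the example; they do not reflect the actual storage format of the annotated corpus.</p>
      <preformat>
```python
from collections import Counter


def count_hits(tokens, titles):
    """Count yearly hits of title forms that carry the NOUN tag.

    `tokens` is an iterable of (year, form, upos) triples; this layout is
    an assumption made for the example.
    """
    hits = Counter()
    for year, form, upos in tokens:
        form = form.lower()
        if upos == "NOUN" and form in titles:
            hits[(year, form)] += 1
    return hits


# Hand-made sample annotations, for illustration only.
sample = [
    (1970, "impiegata", "NOUN"),
    (1970, "impiegata", "VERB"),  # participle reading, not counted
    (1970, "impiegato", "NOUN"),
    (1971, "Impiegata", "NOUN"),
    (1971, "casa", "NOUN"),       # not an occupational title
]
hits = count_hits(sample, {"impiegato", "impiegata"})
print(hits[(1970, "impiegata")])
```
      </preformat>
      <p>Only three of the five sample tokens are counted: the participle reading and the unrelated noun are skipped.</p>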
      <p>Moreover, we only counted a hit if the word has been registered with the singular form. This is done for two reasons. First, occurrences of the plural form are outside the scope of this investigation, because in Italian the male plural form is traditionally used as the default, while the female variant of the plural is only used in exceptional cases, such as when referring to a group that is composed entirely of women. Second, this strategy filters out cases where the plural form shares the same surface form as one of the gender variants. An example of this is the word infermiere (i.e. nurse), which can be either the singular masculine form (as in l’infermiere) or the plural feminine form (as in le infermiere).</p>
      <p>Since the objective of this study is to observe the trends in the use of masculine and feminine forms for occupational titles, we are interested in analyzing how their frequency changes from one year to the other. However, measuring the absolute frequency in each year for both forms would be misleading, as it heavily depends on the amount of data that is available for each year in the corpus. Instead, we compute the smoothed relative frequency p<sup>t</sup><sub>w</sub> for each word w and each year t using the following formula:</p>
      <disp-formula id="eq-1">
        <label>(1)</label>
        <tex-math>p^{t}_{w} = \frac{f^{t}_{w} + 1}{C^{t} + |V^{t}|}</tex-math>
      </disp-formula>
      <p>where f<sup>t</sup><sub>w</sub> is the frequency of word w in the year t, C<sup>t</sup> is the count of tokens occurring in the corpus in the year t, and |V<sup>t</sup>| is the vocabulary size computed on the year t. We compute p<sup>t</sup><sub>w</sub> for both gender forms of each occupational title. Then we compute odds(w)<sub>t</sub>, which represents the log ratio of the smoothed relative frequencies of the female form w<sub>f</sub> and the male form w<sub>m</sub>, respectively:</p>
      <disp-formula id="eq-2">
        <label>(2)</label>
        <tex-math>odds(w)_{t} = \log \frac{p^{t}_{w_f}}{p^{t}_{w_m}}</tex-math>
      </disp-formula>
      <p>
        Operationally, odds(w)<sub>t</sub> specifies how likely the feminine variant is to appear in a text relative to the masculine form in the specified year t. We then obtain the time series by concatenating the odds(w) values computed for each year: (odds(w)<sub>1948</sub>, odds(w)<sub>1949</sub>, ..., odds(w)<sub>2004</sub>). Assuming a linear trend in the time series, three different scenarios can occur: (i) the occurrences of the female form are growing; (ii) the occurrences of the male form are growing; (iii) the ratio of the male and female forms of an occupational title is stable over time. We computed the regression line of the time series, using the linear least-squares regression method provided by the SciPy library<sup>3</sup>. We use the slope of the regression line to determine whether the values of odds(w)<sub>t</sub> are changing over time. If the slope is positive (negative), odds(w)<sub>t</sub> is increasing (decreasing) over time, which means that the frequency of w<sub>f</sub> is increasing (decreasing) faster than that of w<sub>m</sub>, or that the frequency of w<sub>m</sub> is decreasing (increasing) faster than that of w<sub>f</sub>. For each regression line, we also compute the statistical significance of the slope parameter relying on the Wald Test
        <xref ref-type="bibr" rid="ref5">(Fahrmeir et al., 2007)</xref>. Specifically, the null hypothesis states that the slope parameter of the regression line is zero. In this stage, occupational titles for which we get a p-value &gt; 0.1 are filtered out.
      </p>
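      <p>The computation described in this section can be sketched in Python as follows. The sketch assumes that scipy.stats.linregress is the SciPy routine meant by the linear least-squares regression method; the function names, the token and vocabulary counts and the yearly frequencies are invented for illustration and are not taken from the corpus.</p>
      <preformat>
```python
import math

from scipy.stats import linregress


def smoothed_rel_freq(freq, total_tokens, vocab_size):
    """Add-one smoothed relative frequency of a word in one year."""
    return (freq + 1) / (total_tokens + vocab_size)


def log_odds(freq_f, freq_m, total_tokens, vocab_size):
    """Log ratio of the smoothed frequencies of the female and male form."""
    p_f = smoothed_rel_freq(freq_f, total_tokens, vocab_size)
    p_m = smoothed_rel_freq(freq_m, total_tokens, vocab_size)
    return math.log(p_f / p_m)


def trend(years, odds_series, alpha=0.1):
    """Fit a regression line; return slope, p-value and a keep flag."""
    fit = linregress(years, odds_series)
    # linregress reports a two-sided p-value for the null hypothesis
    # that the slope is zero.
    keep = not (fit.pvalue > alpha)
    return fit.slope, fit.pvalue, keep


# Invented toy series: the female form grows, the male form is constant.
years = list(range(1948, 2005))
odds = [log_odds(10 + 2 * i, 200, 1_000_000, 50_000)
        for i in range(len(years))]
slope, pvalue, keep = trend(years, odds)
print(slope > 0, keep)
```
      </preformat>
      <p>Because both forms are normalized by the same denominator within a given year, the log ratio depends only on the two smoothed counts, i.e. on (f + 1) for the female and male form; the corpus size of that year cancels out.</p>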
    </sec>
    <sec id="sec-4">
      <title>5 Results</title>
      <p>Figure 1 reports the value of the slope for each occupational title. Depending on the sign of the slope, we can identify two distinct groups of occupational titles. Green bars indicate that the slope of odds(w)<sub>t</sub> is positive, i.e. the frequency of the feminine form is increasing relative to that of the masculine form. Red bars, on the other hand, indicate that the slope is negative, i.e. the frequency of the feminine form is decreasing relative to that of the masculine form. Out of 35 occupational titles, 22 have a positive slope, while 11 have a negative slope. In particular, the most positive slope is the one associated with marciat-ore/-rice (i.e. racewalker), while the most negative slope is the one associated with fotomodell-o/-a (i.e. fashion model).</p>
      <p>For many of these titles, the resulting slope can be mapped to specific social changes. An interesting example in this regard is infermiere (i.e. nurse), for which a negative slope is recorded: indeed, in Italy the position of nurse has been open to men since 1971<sup>4</sup>. The odds(w) time series of infermiera/infermiere is reported in Figure 2.</p>
      <p><sup>3</sup> https://www.scipy.org/</p>
      <p>Moreover, results show that managerial roles such as funzionaria (i.e. civil servant), ispettrice (i.e. inspector) and direttrice (i.e. director) are associated with a positive slope, which is indicative of a stronger perception of women in such roles.</p>
      <p>A similar push can also be observed in the scientific domain, with a positive trend for the words biologa (i.e. biologist) and scienziata (i.e. scientist), as well as in the artistic domain. On the other hand, we observe an increase in the usage of the masculine form for segretario (i.e. secretary), ballerino (i.e. dancer), and stenografo (i.e. stenographer).</p>
      <p><sup>4</sup> https://www.gazzettaufficiale.it/eli/id/1971/04/03/071U0124/sg</p>
      <fig id="fig-3">
        <label>Figure 3</label>
        <caption>
          <p>(a) ballerino. (b) poetessa.</p>
        </caption>
      </fig>
      <p>In the second part of the experiment, we attempt to identify the people who have driven the change in the usage of the feminine and masculine forms of an occupational title. To do this, we retrieve the Named Entities (NEs) to which the occupational titles refer for each year, and monitor their frequency. In particular, we exploit the UDPipe annotations to extract valid NEs, i.e. entities that are directly connected to an occupational title via a dependency relation.</p>
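      <p>One possible reading of this linking criterion is sketched below: an NE is kept when one of its tokens and the occupational title are joined by a single dependency arc, in either direction. The Token layout, the linked_entities function and the hand-written parse are hypothetical; in practice the heads and entity spans would come from the parser output.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # position in the sentence
    form: str
    head: int    # index of the syntactic head (the root points to itself)
    ent_id: int  # id of the named entity the token belongs to, -1 if none


def linked_entities(tokens, titles):
    """Return (title, entity text) pairs joined by one dependency arc."""
    entity_text = {}
    for tok in tokens:
        if tok.ent_id >= 0:
            entity_text.setdefault(tok.ent_id, []).append(tok.form)
    pairs, seen = [], set()
    for tok in tokens:
        head = tokens[tok.head]
        # Check the arc in both directions: title -> entity, entity -> title.
        for title_tok, other in ((tok, head), (head, tok)):
            if title_tok.idx == other.idx:
                continue
            if title_tok.form.lower() in titles and other.ent_id >= 0:
                key = (title_tok.idx, other.ent_id)
                if key not in seen:
                    seen.add(key)
                    name = " ".join(entity_text[other.ent_id])
                    pairs.append((title_tok.form.lower(), name))
    return pairs


# Hand-written parse of "Il ballerino Rudolf Nureyev danza" (illustrative).
sentence = [
    Token(0, "Il", 1, -1),
    Token(1, "ballerino", 4, -1),
    Token(2, "Rudolf", 1, 0),
    Token(3, "Nureyev", 2, 0),
    Token(4, "danza", 4, -1),
]
pairs = linked_entities(sentence, {"ballerino"})
print(pairs)
```
      </preformat>
      <p>Counting the resulting pairs per year would then give the frequency of each NE for a given title.</p>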
      <p>In Figure 3, we report the NEs extracted for two particular occupational titles: ballerino (i.e. male dancer) and poetessa (i.e. female poet). We have chosen these titles because they feature the largest number of occurrences of NEs in the corpus. The data is presented in the form of stacked line charts, which report the absolute frequency of each NE so that the height of a coloured line represents how many times an NE has been mentioned within a specified period. The dotted black line reports the overall smoothed relative frequency for the occupational title. Both the absolute frequency of NE mentions and the overall smoothed relative frequency are aggregated in bins of 5 years.</p>
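      <p>The 5-year aggregation can be sketched as follows. The choice of 1948 as the start of the first bin, the record layout and the placeholder entity names and counts are assumptions made for the example.</p>
      <preformat>
```python
from collections import defaultdict


def bin_start(year, first_year=1948, width=5):
    """First year of the fixed-width bin that contains `year`."""
    return first_year + ((year - first_year) // width) * width


def aggregate_mentions(mentions, first_year=1948, width=5):
    """Sum yearly (year, entity, count) records into fixed-width bins."""
    binned = defaultdict(int)
    for year, entity, count in mentions:
        binned[(bin_start(year, first_year, width), entity)] += count
    return dict(binned)


# Placeholder entities and counts, for illustration only.
mentions = [
    (1953, "poet_a", 2),
    (1957, "poet_a", 3),
    (1958, "poet_a", 1),
    (1990, "poet_b", 4),
]
binned = aggregate_mentions(mentions)
print(binned)
```
      </preformat>
      <p>With these settings the years 1953 and 1957 fall into the same bin, while 1958 opens the next one.</p>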
      <p>Three male dancers are referenced over a wide period due to their historical role in the field: Rudolf Nureyev, Antonio Gades and Gene Kelly. However, recent years have seen a rise in popularity of new figures such as Raffaele Paganini, Joaquin Cortes, André de La Roche and Roberto Bolle.</p>
      <p>Occurrences of specific female poets in the corpus remain low until the late ’70s. Ignoring a spike in 1953-1957, probably due to the quality issues in the data collected, the individual absolute frequency of NE mentions seems to agree with the overall smoothed relative frequency of the noun poetessa. In the 1988-2002 period, four figures dominate the scene: Joy Grisham, Elena Carasso, Maria Luisa Spaziani and Alda Merini. Even though the first work of Maria Luisa Spaziani dates back to 1954, we observe a significant rise in the occurrences in the early ’90s, when she was nominated three times for the Nobel Prize for Literature<sup>5</sup>. The increase in NE mentions over time is even more apparent in this case; however, it follows a different trend compared to that of the overall frequency of the noun poetessa, which suggests that the word may have been used differently in the earliest period.</p>
    </sec>
    <sec id="sec-5">
      <title>6 Conclusion</title>
      <p>This paper investigates the usage of gender-specific forms of occupational titles in the Italian language in a diachronic corpus of 3 billion tokens extracted from two popular Italian newspapers. Through this analysis, we show that there are significant changes in the way newspaper articles refer to the masculine and feminine forms of an occupational title and that they are consistent with socio-cultural events, such as changes in employment policy. Moreover, we performed a more fine-grained analysis by extracting the most influential figures who have guided this shift for two occupational titles (male dancers and female poets).</p>
      <p><sup>5</sup> https://en.wikipedia.org/wiki/Maria_Luisa_Spaziani</p>
      <p>As future work, we propose to continue working in this direction by increasing the size of the corpus and by including sources other than news, such as social media, job applications, and legal documents. This can help reduce any form of linguistic bias that may have been introduced by journalists and increase the significance of the results. Moreover, we will extend the list of occupational titles, as well as group titles together based on category. Finally, we propose to improve the process used to extract named entities that are associated with occupational titles in text.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research has been partially funded by ADISU
Puglia under the post-graduate programme
“Emotional city: a location-aware sentiment analysis
platform for mining citizen opinions and
monitoring the perception of quality of life”.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Erez Lieberman</given-names>
            <surname>Aiden</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jean-Baptiste</given-names>
            <surname>Michel</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Culturomics: Quantitative Analysis of Culture Using Millions of Digitized Books</article-title>
          . In
          <source>6th Annual International Conference of the Alliance of Digital Humanities Organizations, DH</source>
          , page 8, Stanford, CA, USA, June. Stanford University Library.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, and
          <string-name>
            <given-names>Rossella</given-names>
            <surname>Varvara</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>A Diachronic Italian Corpus based on “L'Unità”</article-title>
          . In Johanna Monti, Felice Dell'Orletta, and Fabio Tamburini, editors,
          <source>Proceedings of the Seventh Italian Conference on Computational Linguistics</source>
          , CLiC-it
          <year>2020</year>
          , volume
          <volume>2769</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , Bologna, Italy, 3. CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Felice Dell'Orletta, Simonetta Montemagni, Manuela Sanguinetti, and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Simi</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The EVALITA 2014 dependency parsing task</article-title>
          .
          <source>The Evalita 2014 Dependency Parsing task</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Elisabeth</given-names>
            <surname>Burr</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>Agentivi e sessi in un corpus di giornali italiani</article-title>
          . In Gianna Marcato, editor,
          <source>Atti del Convegno Internazionale di studi Dialettologia al femminile</source>
          , pages
          <fpage>349</fpage>
          -
          <lpage>365</lpage>
          , Padova, Italy, April. Cleup.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Ludwig</given-names>
            <surname>Fahrmeir</surname>
          </string-name>
          , Thomas Kneib, Stefan Lang, and
          <string-name>
            <given-names>Brian</given-names>
            <surname>Marx</surname>
          </string-name>
          .
          <year>2007</year>
          . Regression. Springer.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Goldberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jon</given-names>
            <surname>Orwant</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books</article-title>
          . Atlanta, Georgia, USA, page
          <fpage>241</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Pascal Mark</given-names>
            <surname>Gygax</surname>
          </string-name>
          , Daniel Elmiger, Sandrine Zufferey, Alan Garnham, Sabine Sczesny, Lisa von Stockhausen, Friederike Braun, and
          <string-name>
            <given-names>Jane</given-names>
            <surname>Oakhill</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A Language Index of Grammatical Gender Dimensions to Study the Impact of Grammatical Gender on the Way We Perceive Women and Men</article-title>
          . Frontiers in Psychology,
          <volume>10</volume>
          :
          <fpage>1604</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Ruth Elizabeth</given-names>
            <surname>King</surname>
          </string-name>
          .
          <year>1991</year>
          .
          <article-title>Talking gender: A guide to nonsexist communication</article-title>
          .
          <source>Copp Clark Professional.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          , Erik Velldal, and
          <string-name>
            <given-names>Lilja</given-names>
            <surname>Øvrelid</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Tracing armed conflicts with diachronic word embedding models</article-title>
          .
          In Tommaso Caselli, Ben Miller, Marieke van Erp, Piek Vossen, Martha Palmer, Eduard H. Hovy, Teruko Mitamura, and David Caswell, editors,
          <source>Proceedings of the Events and Stories in the News Workshop@ACL</source>
          <year>2017</year>
          , pages
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          , Vancouver, Canada, August. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Tarutuulia</given-names>
            <surname>Laine</surname>
          </string-name>
          and
          <string-name>
            <given-names>Greg</given-names>
            <surname>Watson</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Linguistic sexism in The Times-A diachronic study</article-title>
          .
          <source>International Journal of English Linguistics</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Bandy X</given-names>
            <surname>Lee</surname>
          </string-name>
          , Finn Kjaerulf, Shannon Turner, Larry Cohen, Peter D Donnelly, Robert Muggah, Rachel Davis, Anna Realini, Berit Kieselbach, Lori Snyder MacGregor, et al.
          <year>2016</year>
          .
          <article-title>Transforming our world: implementing the 2030 agenda through sustainable development goal indicators</article-title>
          .
          <source>Journal of public health policy</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ):
          <fpage>13</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Gianna</given-names>
            <surname>Marcato</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eva-Maria</given-names>
            <surname>Thüne</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Gender and female visibility in Italian</article-title>
          .
          <source>Gender across languages: The linguistic representation of women and men</source>
          ,
          <volume>2</volume>
          :
          <fpage>187</fpage>
          -
          <lpage>217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Robert</given-names>
            <surname>Parker</surname>
          </string-name>
          , David Graff,
          <string-name>
            <given-names>Junbo</given-names>
            <surname>Kong</surname>
          </string-name>
          , Ke Chen, and
          <string-name>
            <given-names>Kazuaki</given-names>
            <surname>Maeda</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>English Gigaword fifth edition</article-title>
          ,
          <year>2011</year>
          . Linguistic Data Consortium, Philadelphia, PA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Alma</given-names>
            <surname>Sabatini</surname>
          </string-name>
          .
          <year>1985</year>
          .
          <article-title>Occupational titles in Italian: Changing the sexist usage</article-title>
          .
          <source>In Sprachwandel und feministische Sprachpolitik: Internationale Perspektiven</source>
          , pages
          <fpage>64</fpage>
          -
          <lpage>75</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Milan</given-names>
            <surname>Straka</surname>
          </string-name>
          , Jan Hajic, and Jana Straková.
          <year>2016</year>
          .
          <article-title>UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing</article-title>
          . In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors,
          <source>Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC</source>
          <year>2016</year>
          , Portorož, Slovenia, 5. European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Denny</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          and Markus Krötzsch.
          <year>2014</year>
          .
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>57</volume>
          (
          <issue>10</issue>
          ):
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          . Publisher: ACM New York, NY, USA.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>