<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fine-grained Sexism Detection in Italian Newspapers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federica Manzi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leon Weber-Genzel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barbara Plank</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IT University of Copenhagen</institution>
          ,
          <addr-line>Rued Langgaards Vej 7, 2300 Copenhagen</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ludwig-Maximilians-University Munich (LMU University)</institution>
          ,
          <addr-line>Geschwister-Scholl-Platz 1, 80539 Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>2015</fpage>
      <lpage>2016</lpage>
      <abstract>
        <p>In recent years, tasks revolving around hate speech detection have attracted growing interest in the field of Natural Language Processing. Two main trends stand out in the context of sexism recognition: the focus on overt forms of sexism, such as misogyny on social media, and tackling the problem as a text classification task. The main objective of this work is to introduce a new approach that tackles sexism recognition as a sequence labelling task, operating on the token level rather than the document level. To achieve this goal, we introduce (i) the FGSDI (Fine-Grained Sexism Detection in Italian) corpus, containing Italian newspaper articles annotated with fine-grained linguistic markers of sexism, and (ii) a two-step pipeline that sequentially performs sexism detection on the sentence level and sexism classification on the token level. Our primary findings are that (i) tackling sexism recognition as a sequence labelling task is possible, although a large amount of labelled data is needed; (ii) leveraging few-shot learning for sexism detection proves to be an effective solution in scenarios where only a limited amount of data is available; (iii) the proposed pipeline approach yields better results than the baseline, doubling the overall precision and achieving a higher F1-score.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Sexism recognition</kwd>
        <kwd>Token classification</kwd>
        <kwd>Hate-speech detection</kwd>
        <kwd>Transformers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        According to the Sapir-Whorf hypothesis [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], language shapes the way we think and interact with the world. It is therefore crucial to analyse our usage of linguistic expressions to reveal the intricate dynamics of societal norms, power structures, and cultural values embedded within our belief system. In this context, language can also become a vehicle for different forms of bias and discrimination, including sexism. Sexism in language encompasses a variety of phenomena, ranging from subtle ones, nested within the grammatical and semantic choices we make when talking about women, to overt instances of misogyny, characterized by aggressiveness and violence against individuals based on their gender identity.
      </p>
      <p>
        In recent years, sexism and misogyny detection and classification have attracted growing interest in Natural Language Processing (NLP), especially after the advent of transformer models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which opened up new possibilities in nearly every NLP task. However, these efforts have mainly focused on misogyny and hate speech in general, tackled as text classification tasks on the document level, and specifically within the context of social media platforms such as Twitter and Facebook.
      </p>
      <p>
        The main contributions of this paper are as follows. First, we concentrate on specific linguistic markers of sexism, introducing more fine-grained classes than those usually considered in sexism detection and classification tasks. Inspired by the linguistic work of Alma Sabatini [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we propose a new annotation scheme and corpus for fine-grained sexism detection, resulting in the FGSDI (Fine-Grained Sexism Detection in Italian) corpus of Italian newspaper articles, with the annotation guidelines released in appendix A. Second, we address the recognition of linguistic markers of sexism as a token-level classification task, assigning a label to each token according to the fine-grained classes introduced above. This constitutes an innovation: to the best of our knowledge, no other work, in Italian or in other languages, has tackled this task at such a granularity.
      </p>
      <p>
        In particular, we compare two different approaches. The first one, which we used as baseline, consists of fine-tuning a RoBERTa [5] model on the token classification task using whole texts as input. The second, novel one is a two-step pipeline inspired by [6] which performs sexism detection and classification in sequence. Sexism detection is tackled as binary classification at the sentence level. Sentences classified as potentially containing linguistic markers of sexism then undergo the second step of the pipeline, which again involves classification on the token level.<sup>1</sup>
      </p>
      <p><sup>1</sup> Code available at: https://github.com/fede-m/Fine-grained-sexismdetection-in-Italian-Newspapers</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Sexism in language</title>
        <p>
          The interest in the role of language in reflecting and perpetuating societal gender inequalities emerged during the so-called second wave of feminism. In Italy, the main contributor was Alma Sabatini, whose works focused on analysing the language used in mass media and educational publishing, identifying discriminatory patterns [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and suggesting alternative non-sexist forms [7].
        </p>
        <p>
          This analysis is particularly relevant since Italian belongs to the class of grammatical gender languages, which assign gender to every noun and inflect articles, pronouns, and adjectives accordingly [8]. Although having linguistic markers for gender does not make a language automatically sexist [8], it does make the language more susceptible to sexist phenomena [9], and there appears to be a positive correlation between countries speaking grammatical gender languages and lower levels of gender equality [10]. We use [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] as the foundation of our research, enriching it with other relevant contributions ([11] [12] [13] [14]) to make the analysis more comprehensive, provide insights on specific phenomena, and consider potential social changes that occurred in the last 20 years.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Automatic sexism recognition</title>
        <p>Automatically assessing the presence of sexism and hate speech in a text has multiple practical applications, from helping reduce gender bias and promoting gender fairness in language to content moderation in social media.</p>
        <p>
          Most relevant works on this topic focus on sexism detection and categorization tackled as classification tasks, which assess whether a sentence exhibits sexist content and which type of sexism it contains. As shown by [6], the most significant shift in this field was the advent of transformers [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and the development of transfer learning techniques [15].
        </p>
        <p>Regarding the Italian language, the main focus so far has been on misogyny and hate speech detection. In particular, [16] and [17] studied misogyny and aggressiveness in Twitter posts. Although they do not use classification techniques, we want to highlight the works of [18] and [19], since they focus on detecting single linguistic phenomena in text that are relevant to the scope of this work.</p>
        <p>Notably, two main trends stand out in the reviewed literature. The first is the focus on more explicit forms of sexism, such as misogyny, in the framework of social media. The second is tackling the detection and categorization of sexism as a text classification task focusing on the document level instead of the token level. These also represent the main differences introduced by the approach adopted in the current work.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. FGSDI - Corpus</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>We concentrated our analysis on newspaper articles, which represent an underexplored text type in automatic sexism recognition and provide the opportunity to investigate the presence of linguistic sexism in a more formal context (and covert style) than social media. In particular, we focused on articles from three Italian newspapers, namely La Repubblica, La Stampa, and Il Corriere della Sera. We chose these newspapers based on their popularity in Italy,<sup>2</sup> the availability of articles online, and the broad range of thematic areas they cover.</p>
        <p>After exploring different datasets, we settled on Webz.io, which contains web-scraped articles from many different Italian newspapers, including the ones mentioned above. The articles all come from the October 2015 dump,<sup>3</sup> are available in JSONL format, and include plenty of additional metadata for each article, such as news category, author, and comments.</p>
        <p>Leveraging the metadata, we chose news categories common to all selected newspapers, following two main criteria. The first was the number of available articles and whether the category was present in all newspapers, while the second concerned the coverage and presence of women in those articles. The final selected categories were Cronaca (News), Politics, and General News. We then selected 50 articles for each newspaper and category combination (or all the available ones when fewer than 50 existed), obtaining a final dataset of 469 articles.</p>
        <p><sup>2</sup> DMS Data is published by ADS (Accertamenti Diffusione Stampa), a company based in Milan which publishes certified data on the circulation of Italian newspapers. The data can be found at the following link: https://www.adsnotizie.it/Dati/DMS_Page</p>
        <p><sup>3</sup> https://webz.io/free-datasets/italian-news-articles/</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Label Definition and Annotation</title>
        <p>Since we decided to approach the recognition of different linguistic markers of sexism as a sequence labelling task, the first step was to define the labels to include in our analysis and to annotate them.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Label Definition</title>
          <p>
            As a starting point, we referred to the work of Alma Sabatini [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], which provides a comprehensive list and analysis of linguistic markers of sexism in the Italian language. However, there is no one-to-one correspondence between our labels and Sabatini's. Depending on the frequency and non-ambiguity of a specific linguistic phenomenon in the corpus, we decided for each label whether to keep it the same, divide it into more fine-grained sub-categories describing more specific phenomena, or combine multiple phenomena together. This process resulted in the following 14 final labels. We provide here a brief description of each label and refer to appendix A for more detailed annotation guidelines with examples.
          </p>
          <list list-type="order">
            <list-item><p>Generic masculine: use of the masculine as a "neutral" form to refer to people of all genders. It is the broadest class we considered in the analysis and encompasses a variety of different phenomena.</p></list-item>
            <list-item><p>Usage of feminine for stereotypically female professions: sub-category of Generic masculine identifying cases where the "rule" of the generic masculine was not applied to professions and roles stereotypically occupied by women.</p></list-item>
            <list-item><p>Masculine of professions: usage of the masculine form of professional titles (especially high-status ones) to refer to specific female referents.</p></list-item>
            <list-item><p>Usage of "-essa" suffix: sub-category of Masculine of professions. The suffix is considered to bear a negative connotation when used to create the feminine form of a profession (see [20], [21], and [22]).</p></list-item>
            <list-item><p>Asymmetric usage of names, surnames, and titles: cases where female referents are referred to by their first name only.</p></list-item>
            <list-item><p>Feminine article before surname: sub-category of Asymmetric usage of names, surnames, and titles; it refers to the usage of the article la in front of the surname of female referents.</p></list-item>
            <list-item><p>Asymmetric usage of adjectives: adjectives belonging to three semantic areas that perpetuate the gender bias of seeing women as small, silent, and uniquely identified through physical characteristics.</p></list-item>
            <list-item><p>Asymmetric usage of substantives: substantives (usually belonging to areas such as sexuality, physical appearance, and marital status) for which only the feminine form exists, and substantives for which both forms exist but only the feminine one bears a negative connotation.</p></list-item>
            <list-item><p>Asymmetric usage of verbs: verbs belonging to semantic areas stereotypically associated with women, and asymmetries in the roles assumed by female and male actors in the usage of agency verbs.</p></list-item>
            <list-item><p>Diminutives: co-occurrence of diminutives and female referents.</p></list-item>
            <list-item><p>Asymmetric usage of tropes and tone: metaphors, metonymy, and synecdoche that reinforce stereotypical representations of women; for the tone, co-occurrence of scare quotes and female referents.</p></list-item>
            <list-item><p>Identification through man: instances where women are presented as the wife, sister, or daughter of a male referent.</p></list-item>
          </list>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Annotation</title>
          <p>We annotated the articles using the doccano<sup>4</sup> annotation tool, which provides an intuitive and easy-to-use interface for different annotation tasks.</p>
          <p>In total, we annotated 469 newspaper articles, which we split into 5 folds to apply cross-validation and obtain more robust evaluation results. Since each document could contain multiple labels, and in order to keep the label distribution consistent across the folds, we used group stratified k-fold with k = 5, which yields a 20:80 ratio between test and training sets.</p>
          <p>In each article, we highlighted spans of text that contained instances of linguistic markers of sexism, following the annotation guidelines in appendix A. We allowed the annotation of multiple and different labels within single documents and within single sentences. However, we did not allow overlapping spans, in order to keep the annotation unambiguous. For the label annotation, we used the BIO (B: Begin, I: Inside, O: Outside) format.</p>
          <p>The annotation process resulted in the label distribution illustrated in appendix B. Notably, the distribution of labels across categories is not well balanced: the labels Generic masculine, Masculine of professions, and Asymmetric usage of names, surnames, and titles contain significantly more examples than the others. Conversely, Usage of feminine for stereotypically female professions and Usage of "-essa" suffix have only one instance each. Therefore, although these classes were included in training, we do not report their classification results.</p>
          <p>Furthermore, the data is particularly sparse at both the inter-document and the intra-document level. The former is caused by the fact that almost half of the analysed documents did not contain any sexist marker at all. The latter arises from the fact that, even in texts that did contain sexist markers, these markers constituted only a small fraction of the overall tokens. Consequently, most tokens in each text were irrelevant to the analysis.</p>
          <p>As a last note, we want to stress that we relied on a single annotator (the first author) for the entire dataset, also to test to what degree the task is feasible at this fine-grained level. This constraint, while providing a higher degree of consistency across the annotation, did not offer the benefit of diversified perspectives and interpretations.</p>
          <p><sup>4</sup> https://github.com/doccano/doccano</p>
        </sec>
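        <p>The BIO scheme can be made concrete with a small conversion routine. The following Python sketch is an illustration, not the tooling used to build FGSDI: it turns non-overlapping character-offset spans into whitespace tokens with BIO tags. It assumes spans arrive as (start, end, label) tuples with an exclusive end offset, and the tag name FEM_ART is a hypothetical short form of Feminine article before surname. A real setup would align the tags to RoBERTa subword tokens rather than to whitespace tokens.</p>
        <preformat>
```python
import re


def spans_to_bio(text, spans):
    """Convert non-overlapping character spans into (tokens, BIO tags).

    spans: list of (start, end, label) with an exclusive end offset.
    Tokens are maximal runs of non-whitespace characters.
    """
    tokens, tags = [], []
    opened = set()  # indices of spans whose first token has been tagged
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for idx, (start, end, label) in enumerate(spans):
            # A token belongs to a span if the two character ranges overlap.
            overlaps = not (tok_start >= end or start >= tok_end)
            if overlaps:
                tag = ("I-" if idx in opened else "B-") + label
                opened.add(idx)
                break  # spans never overlap, so one match is enough
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


tokens, tags = spans_to_bio("Ieri la Rossi ha parlato.", [(5, 13, "FEM_ART")])
print(list(zip(tokens, tags)))
```
        </preformat>
        <p>Because overlapping spans were not allowed during annotation, each token can match at most one span, which is what lets the loop stop at the first match.</p>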
      </sec>
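      <p>The fold construction can be illustrated with a simple greedy stand-in for group stratified k-fold: whole documents (the groups) are assigned to folds so that the per-label marker counts stay balanced. This is a hypothetical sketch for illustration only, not the exact splitter used for FGSDI; off-the-shelf implementations such as scikit-learn's StratifiedGroupKFold use their own heuristics. The document ids and the label name GENERIC_MASC below are made up.</p>
      <preformat>
```python
from collections import Counter


def greedy_group_folds(doc_labels, k=5):
    """Assign whole documents to k folds, spreading label counts evenly.

    doc_labels: dict mapping doc_id -> Counter of marker labels in the doc.
    Returns a dict mapping doc_id -> fold index (0 .. k-1).
    """
    fold_counts = [Counter() for _ in range(k)]
    fold_sizes = [0] * k
    assignment = {}
    # Place marker-rich documents first; documents without markers are
    # assigned last and mainly serve to even out the fold sizes.
    order = sorted(doc_labels, key=lambda d: -sum(doc_labels[d].values()))
    for doc in order:
        labels = doc_labels[doc]

        def cost(fold):
            # Squared per-label load after adding this document, then fold size.
            load = sum((fold_counts[fold][lab] + n) ** 2 for lab, n in labels.items())
            return (load, fold_sizes[fold])

        best = min(range(k), key=cost)
        fold_counts[best].update(labels)
        fold_sizes[best] += 1
        assignment[doc] = best
    return assignment


docs = {f"doc{i}": Counter({"GENERIC_MASC": 1}) for i in range(10)}
docs.update({f"empty{i}": Counter() for i in range(5)})
folds = greedy_group_folds(docs, k=5)
print(Counter(folds.values()))
```
      </preformat>
      <p>With k = 5, each fold serves once as the test set while the other four form the training set, which gives the 20:80 test/training ratio; keeping documents whole prevents sentences from the same article from appearing on both sides of a split.</p>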
    </sec>
    <sec id="sec-4">
      <title>4. Models and Approaches</title>
      <p>After having defined the FGSDI corpus, we next evaluate how well we can automatically detect sexist markers. In particular, we decided to tackle the problem as a token classification task, comparing two approaches: fine-tuning a RoBERTa [5] model and a two-step pipeline inspired by [6]. For each approach, we experimented with different models and settings. All experiments were conducted on Google Colab using a T4 GPU.</p>
      <sec>
        <title>4.1. Baseline</title>
        <p>The first approach, which we used as baseline, involved fine-tuning a RoBERTa model on the token classification task. We chose RoBERTa since it achieves better results than other models from the same BERT family on many different tasks [5]. In particular, we compared the performance of two models, namely XLM-R [23] and the RoBERTa Italian model available on Hugging Face.<sup>5</sup></p>
        <p>We used whole documents as input to maximize the context available to the model for its predictions. As pre-processing, we truncated and padded the texts to fit the 512-token limit of the RoBERTa tokenizer.</p>
        <p>For training, we used cross-entropy loss with cost-sensitive learning techniques [24] to assign a higher penalty to the model when it misclassified one of the minority classes (i.e. all the classes marking signs of sexism). The best results were achieved by initializing the weight of the "O" label to 0.05 and all the others to 2. This intervention was necessary since, due to the sparsity of the data, the majority of tokens were classified with the "O" label, making it hard for the model to focus on tokens associated with the other labels relevant to our analysis.</p>
        <p>During hyperparameter tuning, the best results were achieved when training for 5 epochs with a learning rate of 6e-5 and a weight decay of 0.004. Training for more epochs caused the model to overfit.</p>
        <p><sup>5</sup> https://huggingface.co/osiria/roberta-base-italian</p>
      </sec>
      <sec>
        <title>4.2. Pipeline</title>
        <p>The second approach is a modular two-step pipeline, illustrated in Figure 1, which applies sequence classification and token classification one after the other. The main difference from the baseline is the introduction of a preliminary filtering step, modelled as a binary sentence classification task, whose goal is to reduce the total number of non-sexist tokens passed on to the second step, which performs token classification.</p>
        <p>We changed our input from entire documents to sentences to fit the binary sentence classification task. To prevent the information loss caused by giving the model less context for its predictions, we modified the original text by applying coreference resolution. In particular, we first extracted all coreference heads and their respective clusters from the full text using the crosslingual-coreference package.<sup>6</sup> Then, for each sentence, we checked whether it contained a mention belonging to one of these clusters and, if so, added the corresponding coreference head at the beginning of the sentence in square brackets. Finally, we adjusted the labels by assigning label 1 to all sentences containing at least one sexist marker, and 0 otherwise.</p>
        <p>In the first step of the pipeline, we applied binary sentence classification to filter out sentences that did not contain markers of sexism (i.e. that were assigned label 0 by the model). For this task, we compared two transfer-learning methods, namely fine-tuning and few-shot learning, and selected the one producing the best results.</p>
        <p><sup>6</sup> https://pypi.org/project/crosslingual-coreference/</p>
      </sec>
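      <p>The sentence-level input preparation for the pipeline (coreference heads prepended in square brackets, binary sentence labels derived from the BIO tags) can be sketched as follows. The sketch assumes the coreference output has already been reduced to a hypothetical mapping from each cluster head to the indices of the sentences that contain one of its mentions; this is not the output format of the crosslingual-coreference package. Skipping a head that already appears verbatim in the sentence is our assumption, not a documented detail.</p>
      <preformat>
```python
def add_coref_heads(sentences, clusters):
    """Prefix each sentence with the heads of the clusters it mentions.

    clusters: hypothetical dict mapping head string -> set of sentence
    indices that contain a mention of that head.
    """
    augmented = []
    for i, sentence in enumerate(sentences):
        heads = [head for head, sent_ids in clusters.items()
                 if i in sent_ids and head not in sentence]
        prefix = " ".join(f"[{head}]" for head in heads)
        augmented.append(f"{prefix} {sentence}" if prefix else sentence)
    return augmented


def sentence_label(bio_tags):
    """Binary label: 1 if the sentence contains at least one marker, else 0."""
    return int(any(tag != "O" for tag in bio_tags))


sents = ["Maria Rossi ha vinto le elezioni.", "Lei ha parlato ieri sera."]
print(add_coref_heads(sents, {"Maria Rossi": {0, 1}}))
print(sentence_label(["O", "B-FEM_ART", "I-FEM_ART"]), sentence_label(["O", "O"]))
```
      </preformat>
      <p>The prefix restores the referent that a pronoun such as Lei would otherwise leave implicit once the sentence is separated from its document.</p>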
      <p>For the former approach, we employed the same pre-trained RoBERTa model for Italian that we used for the baseline, this time trained on the binary classification task. The model was trained for 14 epochs using a learning rate of 2e-5. No cost-sensitive learning techniques were applied for this task, since the labels were more balanced than in the token classification setting.</p>
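      <p>For contrast with the unweighted sentence classifier, the cost-sensitive cross-entropy described for the baseline token classifier (weight 0.05 for "O", 2 for all other labels) can be written out in plain Python. This is a sketch under stated assumptions: label index 0 is taken to be "O", the count of 29 labels ("O" plus B- and I- tags for 14 classes) is hypothetical, and the averaging mirrors the weighted-mean reduction documented for PyTorch's CrossEntropyLoss with class weights, where the weighted sum of per-token losses is divided by the sum of the gold labels' weights.</p>
      <preformat>
```python
import math


def weighted_cross_entropy(logits, targets, class_weights):
    """Class-weighted cross-entropy averaged over tokens.

    logits: list of per-token score lists; targets: list of gold label ids.
    The weighted sum of per-token losses is divided by the sum of the
    weights of the gold labels (weighted-mean reduction).
    """
    total_loss, total_weight = 0.0, 0.0
    for scores, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        nll = log_norm - scores[target]  # negative log-likelihood of gold label
        weight = class_weights[target]
        total_loss += weight * nll
        total_weight += weight
    return total_loss / total_weight


NUM_LABELS = 29  # hypothetical: "O" plus B-/I- tags for 14 classes
weights = [0.05] + [2.0] * (NUM_LABELS - 1)  # index 0 is assumed to be "O"

# Two tokens, both scored confidently as "O": one gold "O", one gold marker.
scores = [5.0] + [0.0] * (NUM_LABELS - 1)
loss = weighted_cross_entropy([scores, scores], [0, 1], weights)
print(round(loss, 4))
```
      </preformat>
      <p>In the toy example, the gold marker token carries 40 times the weight of the gold "O" token (2 / 0.05), so its loss dominates the average; this is the intended effect of penalizing errors on the minority classes more heavily.</p>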
      <p>For few-shot learning, we employed the prompt-free SetFit (Sentence Transformer Fine-tuning) framework [25], which consists of two steps. First, it leverages pre-trained Sentence Transformer models [26], fine-tuned with contrastive learning, to generate semantically meaningful embeddings for the provided labelled examples. Then, a classification head assigns a class to the embeddings generated in the first step. After experimenting with different models, we picked distiluse-base-multilingual-cased-v1<sup>7</sup> as the transformer and the default logistic regression model for the predictions. As additional parameters, we used 10 iterations (i.e. the number of sentence pairs to generate for contrastive learning; see [25] for more information), 1 epoch with batch size 16, and cosine similarity to calculate the distance between embeddings in the learned vector space.</p>
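      <p>To clarify what the iteration parameter controls, the sketch below generates labelled sentence pairs for contrastive fine-tuning. It is modelled loosely on the pair-generation loop of early SetFit releases, where each iteration draws, for every training sentence, one positive pair (same label, target similarity 1.0) and one negative pair (different label, target 0.0). It is an illustration rather than SetFit's current implementation, and the example sentences are invented.</p>
      <preformat>
```python
import random


def generate_pairs(sentences, labels, num_iterations, seed=0):
    """Create (sentence_a, sentence_b, target_similarity) training pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_iterations):
        for sentence, label in zip(sentences, labels):
            same = [s for s, lab in zip(sentences, labels) if lab == label]
            other = [s for s, lab in zip(sentences, labels) if lab != label]
            pairs.append((sentence, rng.choice(same), 1.0))   # positive pair
            pairs.append((sentence, rng.choice(other), 0.0))  # negative pair
    return pairs


train_sentences = ["la Rossi ha parlato", "il ministro Anna Bianchi",
                   "piove a Roma", "il treno arriva tardi"]
train_labels = [1, 1, 0, 0]
pairs = generate_pairs(train_sentences, train_labels, num_iterations=10)
print(len(pairs))  # 80 = 2 pairs x 4 sentences x 10 iterations
```
      </preformat>
      <p>In this scheme the number of contrastive pairs grows linearly with both the number of iterations and the number of few-shot examples, which keeps training cheap with the small sample sizes used here.</p>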
      <p>Unlike LLMs and other few-shot learning methods [27] [28], SetFit offers the advantages of not relying on prompt engineering and of providing outputs directly in the form of prediction vectors that need no additional formatting. Moreover, using few-shot learning allowed us to rebalance the labels so that each class was equally represented. In particular, for label 1 we randomly sampled 30 sentences for each phenomenon, whereas for label 0 we sampled 45 sentences.</p>
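      <p>The rebalanced few-shot training set can be drawn as in the sketch below, with up to 30 positive sentences per phenomenon and 45 negative sentences. The input format, in which each positive sentence is tagged with a single phenomenon, is a hypothetical simplification (a sentence may contain several markers), and keeping all sentences of phenomena with fewer than 30 examples is our assumption rather than a documented choice. The phenomenon names are illustrative.</p>
      <preformat>
```python
import random
from collections import defaultdict


def sample_few_shot(examples, per_phenomenon=30, negatives=45, seed=0):
    """Draw a balanced few-shot set of (sentence, binary_label) pairs.

    examples: list of (sentence, phenomenon); phenomenon is None for
    sentences without sexist markers (label 0).
    """
    rng = random.Random(seed)
    positives = defaultdict(list)
    negative_pool = []
    for sentence, phenomenon in examples:
        if phenomenon is None:
            negative_pool.append(sentence)
        else:
            positives[phenomenon].append(sentence)
    train = []
    for phenomenon in sorted(positives):
        pool = positives[phenomenon]
        k = min(per_phenomenon, len(pool))  # rare phenomena: keep everything
        train += [(s, 1) for s in rng.sample(pool, k)]
    k = min(negatives, len(negative_pool))
    train += [(s, 0) for s in rng.sample(negative_pool, k)]
    rng.shuffle(train)
    return train


data = ([(f"gm {i}", "GENERIC_MASC") for i in range(50)]
        + [(f"essa {i}", "ESSA_SUFFIX") for i in range(3)]
        + [(f"neutral {i}", None) for i in range(100)])
train = sample_few_shot(data)
print(sum(lab for _, lab in train), len(train))
```
      </preformat>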
      <p>Finally, the sentences that were assigned label 1 in the filtering step were used to train the RoBERTa model for Italian on the token classification task.</p>
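      <p>Putting the two steps together, inference can be sketched as below. The two classifiers are passed in as plain functions; the keyword-based toy stand-ins and the label name MASC_PROF are hypothetical placeholders for the SetFit filter and the RoBERTa token classifier. Treating filtered-out sentences as all-"O" predictions is our assumption about how they enter the token-level evaluation.</p>
      <preformat>
```python
def run_pipeline(sentences, is_sexist, tag_tokens):
    """Two-step inference: sentence filter first, token tagger second.

    is_sexist: callable taking a sentence string, returning bool.
    tag_tokens: callable taking a list of tokens, returning BIO tags.
    """
    results = []
    for sentence in sentences:
        tokens = sentence.split()
        if is_sexist(sentence):
            tags = tag_tokens(tokens)  # step 2 only for retained sentences
        else:
            tags = ["O"] * len(tokens)  # filtered out: no markers predicted
        results.append(list(zip(tokens, tags)))
    return results


# Toy stand-ins for the trained models (illustration only).
def toy_filter(sentence):
    return "ministro" in sentence


def toy_tagger(tokens):
    return ["B-MASC_PROF" if tok == "ministro" else "O" for tok in tokens]


output = run_pipeline(["Il ministro Anna Bianchi ha parlato", "Piove a Roma"],
                      toy_filter, toy_tagger)
for sentence_result in output:
    print(sentence_result)
```
      </preformat>
      <p>The structure makes one consequence explicit: a sentence that the filter wrongly discards can never receive a marker, so the filter trades some recall for precision in the token classification step.</p>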
      <sec id="sec-4-1">
        <title>4.3. Evaluation Methodology</title>
        <p>The metrics we considered for evaluation are precision, recall, and F1-score. Given the unbalanced distribution of labels in the dataset in favour of non-sexist tokens, we excluded accuracy, since models could achieve high accuracy simply by predicting the majority class for all tokens. To assess the token classification results, we used the seqeval [29] framework, which is specifically suited for measuring models' performance on sequence labelling tasks and provides both overall and per-class metrics. For the pipeline, we additionally incorporated the results</p>
        <p><sup>7</sup> https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1</p>
        <p>Models Performance Metrics. For each metric, mean and standard deviation are reported.</p>
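        <p>To illustrate what entity-level scoring means for BIO sequences, the sketch below reimplements a simplified scorer in the spirit of seqeval's default (conlleval-style) mode: spans are extracted from the tag sequences, and a predicted span counts as correct only if its label, start, and end all match a gold span. It is not seqeval itself and may differ from it on edge cases; the tag names GM and FEM_ART are made-up examples.</p>
        <preformat>
```python
def extract_entities(tags):
    """Return (label, start, end) spans from a BIO tag sequence (end inclusive)."""
    entities = []
    start, current = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == current
        if start is not None and not continues:
            entities.append((current, start, i - 1))
            start, current = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, current = i, label  # an I- without a matching B- opens a span
    return entities


def precision_recall_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall, and F1."""
    gold, pred = set(), set()
    for n, (g, p) in enumerate(zip(gold_seqs, pred_seqs)):
        gold.update((n,) + e for e in extract_entities(g))
        pred.update((n,) + e for e in extract_entities(p))
    tp = len(gold.intersection(pred))
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [["B-GM", "I-GM", "O", "B-FEM_ART", "I-FEM_ART"]]
pred = [["B-GM", "I-GM", "O", "B-GM", "I-FEM_ART"]]
print(precision_recall_f1(gold, pred))
```
        </preformat>
        <p>Here the second gold span is split into two wrongly typed predictions, so one of three predicted spans and one of two gold spans match. This strictness is why token-level accuracy and entity-level F1 can diverge sharply on sparse data like FGSDI.</p>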
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <p>and pipeline approaches. For the baseline, we consider RoBERTa Italian, and for the pipeline, the combination of SetFit for the binary classification and RoBERTa Italian for the token classification.</p>
      <p>The pipeline method almost doubled the precision of the baseline, despite achieving a lower recall. This result was expected: the filtering step reduces the number of non-sexist sentences passed on to the next step, which yields higher precision in the token classification of the remaining sentences, but any sentence containing markers that is wrongly filtered out can no longer be recovered, which lowers the overall recall. This shows the importance of the filtering step in reducing the imbalance between majority and minority labels, allowing the model to concentrate on more subtle relationships between tokens.</p>
      <p>Another aspect to consider is that the baseline is applied
to whole documents, whereas the pipeline is based on
single sentences. Despite using coreference resolution,
this could prevent the model from considering certain
relationships between tokens that could help better
classify them.</p>
      <p>With a higher F1-score, the pipeline approach achieved better overall results, although both approaches only reached modest values. This was expected given the high imbalance of the dataset, the complexity of the task, and the fact that most minority labels did not have sufficiently many examples for the model to learn from. We therefore hypothesize that increasing the amount of relevant data could lead to further performance gains.</p>
      <p>This hypothesis is supported by the error annotation we conducted to obtain a more detailed overview of which phenomena the model recognized better and which worse. In the analysis, we focused on the results of the pipeline only, since it achieved higher precision and therefore showed a more fine-grained understanding of the labels at hand. Moreover, we only consider the classes with a precision higher than 0.25, since the remaining ones have too few instances for an in-depth examination. Overall, the best results were achieved for the labels Feminine article before surname and Identification through man, followed by Masculine of professions, Asymmetric usage of names, surnames, and titles, and Generic masculine.</p>
      <p>Some common trends stand out from this error annotation, which we performed manually for each of the labels mentioned above. The first is that all these classes were indeed characterized by a higher number of examples in the corpus. In particular, Masculine of professions, Asymmetric usage of names, surnames, and titles, and Generic masculine were the labels with the highest number of training instances. However, the results showed that this was not the only crucial aspect to take into consideration. The labels Feminine article before surname and Identification through man, despite having only about half as many instances as the aforementioned classes, were the ones for which the best results were achieved, probably due to the limited variability and high repetitiveness of the phenomena they encompass. This second trend is also supported by the fact that Generic masculine, the most diverse class, was also the label with the worst results.</p>
      <p>Another noteworthy aspect observed across different classes was the tendency of the model to pick up only certain aspects of a pattern, showing only a superficial understanding of the phenomenon analysed. For example, for the class Masculine of professions, characterised by both the highest number of samples and a certain repetitiveness, the model correctly linked the label to the pattern of high-status jobs but completely failed to consider the gender dimension. It therefore classified all instances of words such as minister or lawyer as members of this class, regardless of the gender of the referent, which was the salient aspect to consider. A similar behaviour was also observed for the labels Generic masculine and Identification through man.</p>
      <p>We refer to appendix C for more comprehensive per-label results and error analysis.</p>
      <p>By looking at the discrepancies between annotations and model predictions, we could not only shed light on which specific phenomena within a class needed more examples to improve results but also test the robustness of the annotation. In some cases, legitimate doubts arose, highlighting the difficulty of the task and the need for additional annotators to increase confidence in the annotation itself.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        This work aimed to bridge a gap in the research area of sexism detection and classification in Italian through the following contributions. First, we proposed the FGSDI (Fine-grained Sexism Detection in Italian) corpus, for which, importantly, we provided new in-depth annotation guidelines. They are based on foundational linguistic work by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and can be applied to other text genres in the future. Second, unlike previous research, we modelled the task of sexism classification as a sequence labelling task instead of a text classification task. To this end, we compared two approaches, the baseline and the two-step pipeline, the latter of which achieved better overall performance on the task.
      </p>
      <p>Enriching the corpus with new articles annotated with the relevant labels would be the biggest contribution to bringing this project forward. At the same time, having multiple annotators could provide further insights into the annotations and lower the risk of bias and subjectivity related to having a single annotator. Moreover, the modularity of the pipeline makes it open to further experimentation, especially in scenarios where more relevant data are available. One example could be using the multi-class classification setting of SetFit, which was excluded from the final pipeline since it performed slightly worse than the binary setting we ultimately used. Finally, the use of coreference resolution can be further improved, since in many cases it does not accurately recognize occurrences of the same referent in a text.</p>
    </sec>
    <sec>
      <title>Acknowledgements</title>
      <p>We would like to thank the reviewers for their feedback and encouraging words. This work is supported by the MaiNLP research lab at LMU Munich.</p>
    </sec>
    <sec>
      <title>References</title>
      <p>[5] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv (2019). doi:10.48550/arXiv.1907.11692.</p>
      <p>[6] Angel F. M. de Paula, Roberto F. da Silva, Detection and classification of sexism on social media using multiple languages, transformers, and ensemble models, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2022), CEUR, Spain, 2022.</p>
      <p>[7] Alma Sabatini, Raccomandazioni per un uso non sessista della lingua italiana per la scuola e per l’editoria scolastica, Presidenza del consiglio dei ministri-Direzione generale delle informazioni della editoria e della proprietà letteraria artistica e scientifica, Rome, 1986.</p>
      <p>[8] Dagmar Stahlberg, Friederike Braun, Lisa Irmen, Sabine Sczesny, Representation of the sexes in language, Social Communication, 1st ed., A. W. Kruglanski and J.P. Forgas, New York, 2007, pp. 163–187. doi:10.4324/9780203837702.</p>
      <p>[9] Irene Biemmi, Il sessismo nella lingua e nei libri di testo: Una rassegna della letteratura pubblicata in Italia, in: Educazione sessista: Stereotipi di genere nei libri delle elementari, Rosenberg &amp; Sellier, Torino, 2017, pp. 19–60. doi:10.4000/books.res.4696.</p>
      <p>[10] Jennifer Prewitt-Freilino, T. Andrew Caswell, Emmi Laakso, The gendering of language: A comparison of gender equality in countries with gendered, natural gender, and genderless languages, Sex Roles 66 (2012) 268–281. doi:10.1007/s11199-011-0083-5.</p>
      <p>[11] Gianna Marcato, Eva-Maria Thüne, Italian. Gender and female visibility in Italian, Gender Across Languages, Torino, 2002, pp. 187–217. doi:10.1075/impact.10.14mar.</p>
      <p>[12] Cecilia Robustelli, Linee guida per l’uso del genere nel linguaggio amministrativo, Progetto Accademia della Crusca e Comune di Firenze, Comune di Firenze, Firenze, 2012.</p>
      <p>[13] Federica Formato, Linguistic markers of sexism in the Italian media: A case study of ministra and ministro, Corpora 11 (2016) 371–399. doi:10.3366/cor.2016.0100.</p>
      <p>for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1000.</p>
      <p>[16] Arianna Muti, Francesco Fernicola, Alberto Barrón-Cedeño, Misogyny and aggressiveness tend to come together and together we address them, in: Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 4142–4148. URL: https://aclanthology.org/2022.lrec-1.0.</p>
      <p>[17] Samuel Fabrizi, fabsam @ AMI: A Convolutional Neural Network Approach, EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020, Accademia University Press, 2020, pp. 35–39. doi:10.4000/books.aaccademia.6782.</p>
      <p>[18] Alessandra T. Cignarella, Mirko Lai, Andrea Marra, Manuela Sanguinetti, ’La ministro è incinta’: A Twitter account of women’s job titles in Italian, in: Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021, Torino: Accademia University Press, Milan, Italy, 2022, pp. 85–91. doi:10.4000/books.aaccademia.10525.</p>
      <p>[19] Pierluigi Cassotti, Andrea Iovine, Pierpaolo Basile, Marco de Gemmis, Giovanni Semeraro, Emerging trends in gender-specific occupational titles in Italian newspapers, in: Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021, Torino: Accademia University Press, Milan, Italy, 2022, pp. 369–374. doi:10.4000/books.aaccademia.10907.</p>
      <p>[20] Elisa Merkel, Anne Maass, Laura Frommelt, Shielding women against status loss. The masculine form and its alternatives in the Italian language, Journal of Language and Social Psychology 31 (2012) 311–320. doi:10.1177/0261927X12446599.</p>
      <p>[21] Anna Lepschy, Giulio Lepschy, Helena Sanson, Lingua italiana e femminile, Quaderns d’Italià 6 (2001) 9–18. doi:10.5565/rev/qdi.51.</p>
      <p>[22] Elisabeth Burr, Agentivi e sessi in un corpus di giornali italiani, in: Dialettologia al femminile. Atti del Convegno Internazionale di Studi, Padova: CLUEB, Sappada/Plodn (Belluno), 1995, pp. 349–365.</p>
      <p>[23] Alexis Conneau, Kartikay Khandelwal, Naman
[14] Fabiana Fusco, Stereotipo e genere : il punto di vista Goyal, Vishrav Chaudhary, Guillaume Wenzek,
della lessicografia, Linguistica 49 (2009) 205–225. Francisco Guzmán, Edouard Grave, Myle Ott, Luke
doi:10.4312/linguistica.49.1.205-225. Zettlemoyer, Veselin Stoyanov, Unsupervised
cross[15] Jacob Devlin, Ming-Wei Chang, Kenton Lee, lingual representation learning at scale, in:
ProceedKristina Toutanova, BERT: Pre-training of deep ings of the 58th Annual Meeting of the
Associabidirectional transformers for language understand- tion for Computational Linguistics, Association for
ing, in: Proceedings of the 2019 Conference of Computational Linguistics, Online, 2020, pp. 8440–
the North American Chapter of the Association 8451. doi:10.18653/v1/2020.acl-main.747.
[24] Victoria López, Alberto Fernández, Salvador Gar- Language (2016) 17.</p>
      <p>cía, Vasile Palade, Francisco Herrera, An insight [34] Gilda Sensales, Alessandra Areni, Alessandra Dal
into classification with imbalanced data: Empirical Secco, Linguistic sexism in the news coverage of
results and current trends on using data intrinsic women ministers from four italian governments:
characteristics, Information Sciences 250 (2013) An analysis from a social-psychological
perspec113–141. doi:10.1016/j.ins.2013.07.007. tive, Journal of Language and Social Psychology 35
[25] Lewis Tunstall, Nils Reimers, Unso E. S. Jo, Luke (2016) 1–9. doi:10.1177/0261927X16629787.</p>
      <p>Bates, Daniel Korat, Moshe Wasserblat, Oren Pereg, [35] Daniel Jurafsky, Universal tendencies in the
seEficient few-shot learning without prompts, arXiv mantics of the diminutive, volume 72, De Gruyter
(2022). doi:10.48550/arXiv.2209.11055. Mouton, Berlin, Boston, 1996, pp. 533–578. doi:10.
[26] Nils Reimers, Iryna Gurevych, Sentence-BERT: Sen- 2307/416278.</p>
      <p>tence embeddings using siamese BERT-networks, [36] Robin Lakof, Language and woman’s place,
volin: Proceedings of the 2019 Conference on Empiri- ume 2, Cambridge University Press, 1973, pp.
45—cal Methods in Natural Language Processing, Asso- 79. doi:10.1017/S0047404500000051.
ciation for Computational Linguistics, Hong Kong, [37] Federica Formato, ‘Ci sono troie in giro in
ParlaChina, 2019, pp. 3982—-3992. doi:10.18653/v1/ mento che farebbero di tutto’: Italian female
politiD19-1410. cians seen through a sexual lens, Gender and
Lan[27] Tom Brown, Benjamin Mann, Nick Ryder, Melanie guage 11 (2016) 389–414.</p>
      <p>Subbiah, Jared D. Kaplan, Prafulla Dhariwal, [38] Istituto della Enciclopedia Italiana fondata da
GioArvind Neelakantan, Pranav Shyam, Girish Sastry, vanni Treccani, 2012.</p>
      <p>Amanda Askell, Sandhini Agarwal, Ariel Herbert- [39] Caitlin Hines, Let me call you sweetheart: The
Voss, Gretchen Krueger, Tom Henighan, Rewon WOMAN AS DESSERT metaphor, in: Cultural
perChild, Aditya Ramesh, Daniel Ziegler, Jefrey Wu, formances, Proceedings of the Third Women and
Clemens Winter, Chris Hesse, Mark Chen, Eric Language Conference, April 8-10, 1994, Berkeley
Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Women and Language Group, University of
CaliforJack Clark, Christopher Berner, Sam McCandlish, nia, Berkeley, California, 1994, pp. 295–303.
Alec Radford, Ilya Sutskever, Dario Amodei, Lan- [40] Caitlin Hines, What’s So Easy about Pie?: The
Lexguage models are few-shot learners, arXiv (2020). icalization of a Metaphor, CSLI Publications,
Standoi:10.48550/arXiv.2005.14165. ford, California, 1996, pp. 189–200.
[28] Haokun Liu , Derek Tam, Mohammed Muqeeth, Jay [41] Caitlin Hines, She-wolves, tigresses, and
morphoseMohta, Tenghao Huang, Mohit Bansal, Colin Rafel, mantics, in: Gender and Belief Systems.
ProceedFew-shot parameter-eficient fine-tuning is better ings of the Fourth Berkeley Women and Language
and cheaper than in-context learning, arXiv (2022). Conference, April 19-21, 1996, Berkeley Women and
doi:10.48550/arXiv.2205.05638. Language Group, University of California, Berkeley,
[29] Hiroki Nakayama, seqeval: A python framework California, 1996, pp. 303–311.
for sequence labeling evaluation, 2018. URL: https: [42] Caitlin Hines, Foxy chicks and Playboy bunnies:
//github.com/chakki-works/seqeval. A case study in metaphorical lexicalization,
Ben[30] Jeanette Silveira, Generic masculine words and jamins, Amsterdam, 1999, pp. 9–23. doi:10.1075/
thinking, Women’s Studies International Quarterly cilt.152.04hin.
3 (1980) 165–178. doi:10.1016/S0148-0685(80) [43] Caitlin Hines, Rebaking the Pie: The ‘WOMAN
92113-2. AS DESSERT’ Metaphor, Oxford University Press,
[31] Peter V. Hegarty, Sandra Mollin, Rob Foels, Bi- New-York and Oxford, 1999, pp. 145–162. doi:10.
nomial word order and social status, in: Ad- 1093/oso/9780195126297.001.0001.
vances in intergroup communication, Peter Lang [44] Sara Mills, Feminist Stylistics, Routledge, 1995.
Publishing, 2016, pp. 119—-135. URL: https://api. doi:10.4324/9780203408735.</p>
      <p>semanticscholar.org/CorpusID:151450125. [45] Daniel Gutzmann, Erik Stei, How quotation marks
[32] Marlis Hellinger, Hadumod Bußmann, Gender what people do with words, Journal of
PragmatAcross Languages - The linguistic representation of ics 43 (2011) 2650–2663. doi:10.1016/j.pragma.
women and men., volume 1, John Benjamins Pub- 2011.03.010.</p>
      <p>lishing Company, 2001. doi:10.1075/impact.9.
[33] Gilda Sensales, Alessandra Areni, Alessandra Dal</p>
      <p>Secco, Italian political communication and gender A. Annotation Guidelines
bias: Press representations of men/women
presidents of the houses of parliament (1979, 1994, and We present the annotation guidelines where for each
2013), International Journal of Society, Culture &amp; label we provide a general description of the phenomena
falling within that label, relative examples, and, where
deemed necessary, an explanation of the specific example.</p>
      <p>Additionally, we provide a translation into English made
by us of the Italian examples.</p>
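      <p>Each label in these guidelines marks specific words or multi-word expressions, which is what makes a token-level (sequence labelling) formulation of the task possible. As an illustration only, the following Python sketch encodes labelled token spans as BIO tags, one tag per token. The BIO scheme, the whitespace tokenisation, the span format and the label name MASC_PROFESSION are assumptions made for this example and are not necessarily how the FGSDI corpus is encoded.</p>
      <preformat>
```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label).
# The span format and the label names are illustrative assumptions.
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Encode labelled token spans as BIO tags (one tag per token).

    Overlapping spans are not supported: a later span overwrites the tags
    of an earlier one on the tokens they share.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (start >= 0 and end > start and len(tokens) >= end):
            raise ValueError(f"invalid span {(start, end, label)}")
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


if __name__ == "__main__":
    # Fragment of the Meloni example in A.3: "candidato sindaco" is a
    # masculine compound referring to a woman (tokens 2 and 3).
    tokens = "come possibile candidato sindaco del centrodestra".split()
    print(spans_to_bio(tokens, [(2, 4, "MASC_PROFESSION")]))
    # ['O', 'O', 'B-MASC_PROFESSION', 'I-MASC_PROFESSION', 'O', 'O']
```
      </preformat>
      <p>A multi-token expression such as candidato sindaco thus becomes one B- tag followed by I- tags, so the span boundaries survive the conversion to per-token labels.</p>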
      <sec id="sec-6-1">
        <title>A.1. Generic Masculine</title>
        <p>This phenomenon encompasses the use of the
masculine form of nouns as a "neutral" form to refer to people of
all genders. For this class, the label we used corresponds
to the same level of granularity as the one proposed by
Sabatini. The only difference is that we decided to follow
[12] and exclude the generic masculine used to refer to
indefinite groups or individuals. For example, we decided
not to include the following cases:</p>
        <sec id="sec-6-1-1">
          <title>Examples:</title>
          <p>Italian: [...] la mobilitazione dei giornalisti italiani
contro il ddl recentemente approvato alla Camera
[...]8
English: [...] the mobilisation of Italian journalists
against the recently approved bill [...]
Italian: Ma cosa prevede la legge e quali sono le tappe
in caso di dimissioni di un sindaco?9
English: So, what does the law stipulate and what are
the steps to follow in case a mayor resigns?
Explanation: In both examples, techniques such
as splitting [11] (dei giornalisti e delle giornaliste
italiane and un sindaco o una sindaca) might hurt
the readability of the article, especially if
applied to every occurrence of this type
of generic masculine, which is by far the most
frequent one.
8http://www.repubblica.it/cultura/2015/10/08/news/appello_contro_la_nuova_legge_bavaglio_primo_firmatario_rodota_124630230/?rss
9http://roma.repubblica.it/cronaca/2015/10/08/news/dimissioni_del_sindaco_ecco_l_iter_che_ne_consegue_secondo_la_legge124599210/?rss</p>
          <p>The following is a list of the phenomena the annotator should
include in the category Generic Masculine. For each
phenomenon, we provide examples, an explanation of the
examples where needed, and the motivation for
considering it in the analysis.
a) Use of the words uomo/uomini (man/men) with a
generic meaning, instead of more inclusive
words such as esseri umani (human beings) or
persone (people).</p>
          <p>Examples:
Italian: [...] e dei barconi utilizzati dagli scafisti
e dai mercanti di uomini.10
English: [...] and of the boats used by the smugglers
and the traders in men.</p>
          <p>Italian: Insieme avevano deciso di tenere in
piedi il sindaco fino alla fine del Giubileo
per votare nel 2017, con una sostanziale
sovrapposizione del partito e dei suoi
uomini nella gestione del Campidoglio.11
English: Together, they had decided to keep the
mayor in office until the end of the Jubilee and to vote
in 2017, with a substantial overlap between the
party and its men in the management of
the Capitol.</p>
          <p>Italian: Sono un anarchico io, sono per il libero
pensiero però come diceva Lucrezio metto
l’uomo al centro della natura, noi siamo
figli del De Rerum Natura.12
English: I am an anarchist, I stand for free
thought, but as Lucretius used to say, I put
man at the centre of nature; we are
children of De Rerum Natura.</p>
          <p>Italian: Mi avrebbe fatto piacere se avesse
parlato [Silvio Berlusconi], ma ha scelto di non
intervenire in attesa di un risarcimento, il
pronunciamento della Corte europea dei
diritti dell’uomo.13
English: I would have liked him [Silvio
Berlusconi] to talk, but he decided not
to speak pending compensation, the
pronouncement of the European Court of
Human Rights (literally, of the Rights of Man).</p>
          <p>Italian: [...] hanno firmato la delega affidata agli
uomini del nucleo di polizia giudiziaria
[...].14
English: [...] they signed the proxy entrusted to
the men of the judicial police unit [...].
10https://www.repubblica.it/politica/2015/10/11/news/i_protagonisti_sono_tre_obama_putin_e_francesco-124804296/
11https://www.repubblica.it/politica/2015/10/09/news/renzi_ha_gia_deciso_niente_primarie_il_nome_lo_scelgo_io_-124662736/
12https://firenze.repubblica.it/cronaca/2015/10/26/news/cecchini_gli_allarmismi_oramai_sono_di_moda_-125935450/
13http://www.corriere.it/politica/15_ottobre_22/no-stop-twittergasparri-il-selfie-orban-stimo-8e46ae4e-78f6-11e5-95d8a1e2a86e0e17.shtml
14http://roma.corriere.it/notizie/cronaca/15_ottobre_12/rischi-ilgiubileo-roma-piedi-oltre-duemila-anni-305f7aa4-70bd-11e5a92c-8007bcdc6c35.shtml#post-0
The use of the generic "man" contributes to
making women invisible and reinforces the idea of
women as deviating from the norm.
Unlike the generic masculine used to
refer to indefinite groups or individuals, and as
underlined by Sabatini, there are good alternatives
that can be used to avoid the word "man" in this
generic meaning. Moreover, [30] argues that
it is more difficult for a woman to feel included
in the concept of "man" or "he".
b) Use of the masculine plural for a group of named
people in which at least one person is male, even when
women outnumber men in the group.</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>Examples:</title>
          <p>Italian: In questi giorni infatti diversi attori,
artisti e cantanti come Christiane
Filangeri, Claudia Zanella, Claudio
Corinaldesi, Daniela Poggi, Elena Santarelli,
Fabio Troiano, Filippo Timi, Francesca
Inaudi, Giulia Bevilacqua, Jasmine Trinca,
Libero De Rienzo, Lillo Petrolo, Lorenza
Indovina, Lorenzo Lavia, Luca Argentero,
Lucia Ocone, Ludovico Fremont, Maria
Rosaria Omaggio, Maya Sansa, Michele
Riondino, Sonia Bergamasco, Susanna
Tamaro, Valentina Lodovini, Vinicio
Marchioni, Remo Girone [...]. 15
English: In the past days, a large number of
actors, artists and singers such as
Christiane Filangeri, Claudia Zanella,
Claudio Corinaldesi, Daniela Poggi, Elena
Santarelli, Fabio Troiano, Filippo Timi,
Francesca Inaudi, Giulia Bevilacqua,
Jasmine Trinca, Libero De Rienzo, Lillo
Petrolo, Lorenza Indovina, Lorenzo Lavia,
Luca Argentero, Lucia Ocone, Ludovico
Fremont, Maria Rosaria Omaggio, Maya
Sansa, Michele Riondino, Sonia
Bergamasco, Susanna Tamaro, Valentina Lodovini,
Vinicio Marchioni, Remo Girone [...].</p>
          <p>Explanation: In this rather long list of names,
the majority are female (14 women and 11 men).
Nevertheless, the nouns attori, artisti, and
cantanti are declined only in the masculine
form. Moreover, since such a long list of
names is provided, adding the words attrici (actresses) and artiste
(artists, feminine) would have had a
minimal impact on the readability of the article.
15https://roma.repubblica.it/cronaca/2015/09/30/news/_no_alla_privatizzazione_dei_canili_comunali_di_roma_il_presidio_dei_lavoratori_all_ex_cinodromo-123987648/</p>
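          <p>The counts in the explanation can be checked mechanically. The Python sketch below assumes that the annotator records the gender of each named referent by hand (guessing gender automatically from first names would be unreliable); the dictionary simply transcribes the list in the example, and the variable names are illustrative.</p>
          <preformat>
```python
from collections import Counter

# Gender of each person named in the example, recorded by hand.
# This manual lookup is an assumption of the sketch: nothing is inferred
# automatically from the first names.
REFERENTS = {
    "Christiane Filangeri": "F", "Claudia Zanella": "F",
    "Claudio Corinaldesi": "M", "Daniela Poggi": "F",
    "Elena Santarelli": "F", "Fabio Troiano": "M",
    "Filippo Timi": "M", "Francesca Inaudi": "F",
    "Giulia Bevilacqua": "F", "Jasmine Trinca": "F",
    "Libero De Rienzo": "M", "Lillo Petrolo": "M",
    "Lorenza Indovina": "F", "Lorenzo Lavia": "M",
    "Luca Argentero": "M", "Lucia Ocone": "F",
    "Ludovico Fremont": "M", "Maria Rosaria Omaggio": "F",
    "Maya Sansa": "F", "Michele Riondino": "M",
    "Sonia Bergamasco": "F", "Susanna Tamaro": "F",
    "Valentina Lodovini": "F", "Vinicio Marchioni": "M",
    "Remo Girone": "M",
}

counts = Counter(REFERENTS.values())
women, men = counts["F"], counts["M"]

# Item b) applies because the masculine plurals (attori, artisti, cantanti)
# cover a definite group with at least one man; here women are even the majority.
print(f"women={women}, men={men}, total={len(REFERENTS)}")
# women=14, men=11, total=25
print("women are the majority:", women > men)
# women are the majority: True
```
          </preformat>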
          <p>Italian: [...] per permettere il soccorso dei due
feriti (due donne alla guida delle
utilitarie: non sarebbero gravi).16
English: [...] to allow the two injured (two
women driving small cars; their injuries are
reportedly not serious) to be rescued.</p>
          <p>Explanation: This last example shows that
even in circumstances where the definite
referents are all female, the generic
masculine is still employed. Note that the
author had to add a parenthetical remark to specify
that both injured people were women,
which shows that the generic masculine
alone was not enough to correctly include
them.</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>Motivation for class</title>
          <p>In this case, we are not referring to an indefinite
group of people but to a definite one, in which
both women and men are present. Therefore, the
exclusion stated at the beginning, which leaves the
generic masculine referring to indefinite groups out
of the analysis, does not apply.
c) Use of the masculine form for word pairs in which the feminine
and masculine forms have different lexical roots:
fratello (brother), padre (father), fratellanza
(brotherhood).</p>
        </sec>
        <sec id="sec-6-1-4">
          <title>Examples:</title>
          <p>Italian: Io, che nel tempo vengo da lontano
quando usava il buon costume, la
fratellanza e la gente viveva felice senza tante
pretese [...]. 17
English: I come from a past time when good
manners were used, there was
brotherhood and people lived happily without
many pretensions [...].</p>
          <p>Explanation: The word fratellanza
(brotherhood) derives from the word fratello
(brother). The symmetric feminine would
be sorellanza (sisterhood). Note that
Italian has no word like the English
siblings or the German Geschwister to
denote the generic brother-and-sister
relationship.
16http://www.lastampa.it/2015/10/20/edizioni/cuneo/incidente-infrazione-s-benigno-due-feriti-GbEo1nAQWh0SQFVYZwGSxM/pagina.html
17http://www.corriere.it/cronache/15_ottobre_15/compie-99-annichiede-eutanasia-decido-io-quando-ora-morire-0c8ca15c-730a11e5-8fc1-d31255f25c65.shtml</p>
          <p>Italian: [...] Ma il Papa che c’entra? «È venuto
un anno fa a Redipuglia, ha fatto un gran
discorso sull’amore fraterno, la comunista
si è infatuata e ha montato la tendopoli
davanti alla scuola [...]. 18
English: [...] What does the Pope have to do
with this? «He came last year to Redipuglia
and gave a great speech about fraternal
love, the communist got a crush and put
together the tent city in front of the school
[...].</p>
          <p>Explanation: Like fratellanza
(brotherhood), the word fraterno (fraternal) also
derives from fratello (brother). Note that
in this case no symmetric feminine equivalent of
fraterno is in common use in Italian,
although proposals such as sorerno or
sorellesco have been made.19
Interestingly, the pair materno (maternal) and
paterno (paternal), analogous in meaning and
relationship, does preserve the
symmetry. Also, as noted by [11] in the
case of mother and father, there is a
tendency to explicitly include both genders in
complex expressions, a practice that instead seems
to be considered a "stretch" in
virtually any other situation. A possible
interpretation is that motherhood is the only space and
role in society that is considered
suitable for women and in which women are
at least as important as men.</p>
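          <p>A small lookup table could pre-select candidates for item c) before manual annotation. The Python sketch below is a hypothetical aid, not part of the annotation procedure described here: its table contains only the words discussed in this item, maps each masculine-root word to its feminine counterpart, and uses None where, as for fraterno, no counterpart is in common use. Whether a flagged word is actually used generically remains the annotator's decision.</p>
          <preformat>
```python
import re
from typing import List, Optional, Tuple

# Hypothetical lookup built only from the words discussed in item c):
# masculine-root word -> feminine counterpart in common use (None if none).
FEMININE_COUNTERPART = {
    "fratello": "sorella",
    "fratellanza": "sorellanza",
    "fraterno": None,  # proposals such as sorellesco exist but are not in common use
    "padre": "madre",
    "paterno": "materno",
}


def lexical_root_candidates(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (word, feminine counterpart) for each looked-up word in `text`.

    The text is lower-cased and split on non-word characters, so an
    apostrophe as in "sull’amore" separates "sull" from "amore".
    """
    words = re.findall(r"\w+", text.lower())
    return [(w, FEMININE_COUNTERPART[w]) for w in words if w in FEMININE_COUNTERPART]


print(lexical_root_candidates("ha fatto un gran discorso sull’amore fraterno"))
# [('fraterno', None)]
print(lexical_root_candidates("la fratellanza e la gente viveva felice"))
# [('fratellanza', 'sorellanza')]
```
          </preformat>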
        </sec>
        <sec id="sec-6-1-5">
          <title>Motivation for class</title>
          <p>All the words included in this sub-category
belong to the class of Italian nouns in which
gender is expressed through different lexical roots
rather than through suffixes. Like the generic
"man", these words tend to hide women’s
presence and make them invisible.
18http://www.corriere.it/cronache/15_ottobre_11/goriziamigranti-quel-bivacco-parco-caduti-be33ec74-6fe7-11e5a08a-e76f18e62e8d.shtml#post-0
19https://accademiadellacrusca.it/it/consulenza/concorrenti-alfemminile-di-fraterno-scendono-in-gara-sororale-sororiosorellevole-e-sorellesco/10082
d) Masculine precedence in male/female oppositional
pairs.</p>
        </sec>
        <sec id="sec-6-1-6">
          <title>Examples:</title>
          <p>Italian: E ricordare che "la pari dignità fra uomo
e donna [...] all’insegna della sola
differenza che tenta di allontanare le identità
uomo-donna".20
English: And remember that "the equal dignity
between man and woman [...] under the
sign of the only difference that tries to
keep man-woman identities apart".</p>
          <p>Italian: [...] italiani come noi vogliono una
buona legge sui diritti civili ma non
vogliono che si tolga il diritto ad un
bambino di avere un papà ed una mamma.21
English: [...] Italians like us want a good civil
rights law but don’t want to take away
a child’s right to have a dad and a
mum.</p>
          <p>Italian: "Sono convinto che la maggioranza
degli italiani ritenga che la famiglia
naturale sia quella formata da un uomo e
una donna."22
English: "I believe that the majority of Italians
consider the natural family to be the one formed by a
man and a woman."
Italian: Il progetto presentato dalla società
prevede spazi dedicati alla vendita di
abbigliamento maschile e femminile e
accessori [...].23
English: The project presented by the company
includes spaces dedicated to the sale of
men’s and women’s clothing and accessories [...].</p>
          <p>Italian: Arrivano in piazza del Campidoglio in
piccoli gruppetti, marito e moglie, tre
amiche [...]24
English: They arrive in Campidoglio square in
small groups, husbands and wives, three
friends [...]
20http://www.repubblica.it/vaticano/2015/10/09/news/sinodo_emendamenti_italiani_contro_il_gender_e_per_la_famiglia_uomo-donna-124706948
21http://www.lastampa.it/2015/10/14/italia/politica/unioni-civili-ilsenato-boccia-lo-stop-2JF3S0Cf01foHuw9kzCecM/pagina.html
22https://www.repubblica.it/cronaca/2015/10/01/news/_il_rifiuto_della_diversita_dietro_queste_mistificazioni_-124088213/
23https://milano.repubblica.it/cronaca/2015/09/30/news/milano_hugo_boss_galleria-124024653/
24http://www.lastampa.it/2015/10/26/italia/politica/lapiazza-spontanea-di-marino-adesso-imbarazza-il-pdvxCVzqw4AWLnQM4LlgSb3I/pagina.html
The underlying idea for this class is that word
order can be used as a syntactic means to express
existing hierarchies in society. We refer to [31]
for an in-depth overview of this phenomenon.
e) Use of donne (women) to denote a separate
category (as if women were not part of the
other categories mentioned).</p>
        </sec>
        <sec id="sec-6-1-7">
          <title>Examples:</title>
          <p>Italian: Arrestati cinque cittadini marocchini e
due italiani, tra cui una donna, per rapina
aggravata in concorso. 25
English: Five Moroccan citizens and two Italians,
one of whom is a woman, were arrested
for aggravated robbery committed together.</p>
          <p>Italian: Gorizia è città di frontiera, siamo
abituati ad accogliere, quando scoppiò la
guerra in Jugoslavia arrivarono 17 mila
profughi; ma c’erano anche donne e
bambini. 26
English: Gorizia is a border city, we are used
to hosting, when the war in Yugoslavia
broke out, 17 thousand refugees came; but
at the time there were also women and
children.</p>
          <p>Italian: Tre giovani, di 17, 22 e 23 anni, e una
ragazza di 22 anni [...]. 27
English: Three young people aged 17, 22 and 23,
and a 22-year-old girl [...].</p>
        </sec>
        <sec id="sec-6-1-8">
          <title>Motivation for class</title>
          <p>Singling out women as a separate
category when a group of individuals is
mentioned has two main effects. On the one hand,
it confirms that the generic masculine
is not really neutral, as shown by the first
and last examples. On the other hand, women
are perceived as a single homogeneous category,
as if their gender already attributed certain
characteristics to them.
25https://milano.repubblica.it/cronaca/2015/10/17/news/milano_rapine_sui_treni-125266311/
26http://www.corriere.it/cronache/15_ottobre_11/goriziamigranti-quel-bivacco-parco-caduti-be33ec74-6fe7-11e5a08a-e76f18e62e8d.shtml#post-0
27http://www.corriere.it/cronache/15_ottobre_07/catania-scontromoto-quattro-giovani-muoiono-carbonizzati-3e3ade90-6d3711e5-8dcf-ce34181ab04a.shtml#post-0
f) Use of masculine forms for specific female subjects
(including personifications).
Italian: Dopo 20 anni di gestione animalista e no
profit, vincitore è risultata una impresa
barese, proprietaria di un mega canile da
1200 posti a Bari assai fatiscente e gestore
di stabulari per animali da laboratorio per
l’università di Bari.28
English: After 20 years of animal-welfare,
non-profit management, the winner
turned out to be a company from Bari, owner of a huge,
very run-down 1200-place dog shelter in
Bari and operator of facilities housing laboratory
animals for the University of Bari.</p>
          <p>Explanation: Here the subject
of the sentence is una impresa (a company),
which is grammatically feminine. Despite
that, both vincitore (winner) and gestore
(manager, operator) are declined in
the masculine form. Interestingly,
proprietaria (owner) is instead correctly
feminine.</p>
          <p>Italian: Questo vuole Putin, che sa tuttavia di
dover stipulare un accordo con gli Usa e
con Obama in particolare perché chi tra
un anno gli succederà non è detto che
conceda alla Russia il ruolo di comprimario
che Obama, pur cercando di limitarlo, è
comunque disposto a riconoscergli.29
English: This is what Putin wants; however,
he knows that he will have to make a
deal with the US, and with Obama in
particular, because whoever succeeds Obama in a year
will not necessarily grant Russia the supporting role that
Obama, although trying to limit it, is still
willing to grant it.</p>
          <p>Explanation: Russia is feminine in Italian;
nevertheless, both the word comprimario and
the clitic gli (in riconoscergli) are in the masculine form. As
noted in [11], the clitic gli often replaces the
feminine le even in contexts where the
referent is clearly feminine.
28https://roma.repubblica.it/cronaca/2015/09/30/news/_no_alla_privatizzazione_dei_canili_comunali_di_roma_il_presidio_dei_lavoratori_all_ex_cinodromo-123987648/
29https://www.repubblica.it/politica/2015/10/11/news/i_protagonisti_sono_tre_obama_putin_e_francesco-124804296/</p>
          <p>Italian: Il ministro Maria Elena Boschi [...].31
English: Minister Maria Elena Boschi [...].
This is in line with the tendency towards the generic
masculine observed above. Moreover, in all
these cases the choice could additionally be biased
by the fact that the roles expressed by these terms
are usually associated with men.</p>
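          <p>Cases like Il ministro Maria Elena Boschi, where a masculine title is immediately followed by a woman's first name, could be pre-flagged with a simple rule. The Python sketch below is an illustration under strong assumptions: both lexicons are tiny hypothetical samples built from examples in these guidelines, and a realistic system would need a morphological analyser and a proper first-name resource. The rule misses mentions without a first name (e.g. il ministro Boschi) and any name missing from the lexicon.</p>
          <preformat>
```python
from typing import List, Tuple

# Hypothetical mini-lexicons drawn from examples in these guidelines;
# they are illustrative samples, not resources used by the authors.
MASCULINE_TITLES = {
    "ministro": "ministra",
    "avvocato": "avvocata",
    "sindaco": "sindaca",
    "candidato": "candidata",
}
FEMALE_FIRST_NAMES = {"Maria", "Erika", "Giorgia", "Henriette"}


def flag_masculine_titles(tokens: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, title, feminine form) for each masculine title that is
    directly followed by a female first name."""
    hits = []
    # tokens[:-1] skips the last token, which has no successor to inspect.
    for i, token in enumerate(tokens[:-1]):
        title = token.lower()
        if title in MASCULINE_TITLES and tokens[i + 1] in FEMALE_FIRST_NAMES:
            hits.append((i, token, MASCULINE_TITLES[title]))
    return hits


print(flag_masculine_titles("Il ministro Maria Elena Boschi".split()))
# [(1, 'ministro', 'ministra')]
print(flag_masculine_titles("racconta il suo avvocato Erika Galati".split()))
# [(3, 'avvocato', 'avvocata')]
```
          </preformat>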
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>A.2. Usage of feminine for stereotypically female professions</title>
        <p>We added this label as a sub-category of Generic Masculine
to identify cases where the "rule" of the generic masculine
was not applied because the profession or role denoted by
the noun is stereotypically held by women.
This phenomenon is related to the concept of social
gender described by [32], i.e. the tendency
to use feminine pronouns or nouns for lower-status
professions usually held by women, and masculine
ones in all other cases. However, we found
only one example of this class.</p>
      </sec>
      <sec id="sec-6-3">
        <title>A.3. Masculine of professions</title>
        <p>This class corresponds to Sabatini’s label Asymmetries
in the usage of agentives and covers two phenomena.
One of them is the use of the masculine form of
professional titles (especially high-status ones) to
refer to specific female referents. The other is the use of
donna (woman) as a modifier attached to the masculine
form of the profession. Sabatini also included in this
category the creation of agentive forms through the
suffix -essa, for which we decided to create a separate
class.</p>
        <sec id="sec-6-3-1">
          <title>Examples:</title>
          <p>Italian: [...] racconta il suo avvocato Erika Galati [...].32
English: [...] says her lawyer Erika Galati [...].
Italian: [...] candidare una donna premier?33
English: [...] nominate a woman prime minister?</p>
          <p>Italian: Henriette Reker, la candidata che sabato è stata
vittima di un’aggressione xenofoba per il suo
impegno a favore dei migranti, è stata eletta sindaco
di Colonia.34
English: Henriette Reker, the candidate who on Saturday was the
victim of a xenophobic attack because of
her commitment to migrants, was elected
mayor of Cologne.</p>
          <p>Italian: [...] candidata sindaco di Colonia alle elezioni
in programma domani [...]35
English: [...] mayoral candidate of Cologne in
tomorrow’s elections [...]</p>
          <p>Italian: Per questo, Salvini dopo l’endorsement al
leader di Fratelli d’Italia Giorgia Meloni come
possibile candidato sindaco del centrodestra a
Roma [...].36
English: For this reason, Salvini, after the endorsement
of Giorgia Meloni, leader of Fratelli d’Italia, as a
possible centre-right mayoral candidate in Rome
[...].</p>
          <p>Explanation: Although the neutral form leader is used
to refer to the politician Meloni, the author
of the article still endorses the masculine form by
using the masculine preposition al (to the) instead of the
correct feminine one, alla. The same can be noted
for the compound candidato sindaco, where
both nouns are declined in the masculine form,
although they refer to a woman. This compound
occurred often in the corpus, also in the form
candidata sindaco (first noun in the feminine and
second in the masculine form), but never in the
fully feminine form candidata sindaca.</p>
          <p>One could argue that this last form sounds
incorrect, but this judgement is asymmetric with respect to
similar constructions such as candidata maestra
vs candidata maestro (teacher candidate), where
the first option would probably sound more
appropriate than the second one, although both are
rare in usage.
30http://www.repubblica.it/scuola/2015/10/20/news/l_ora_di_religione_in_aule_semivuote_ma_e_vietato_unire_le_classi_-125463096
31http://www.corriere.it/politica/15_ottobre_13/senato-riformatraguardo-opposizioni-non-voteranno-ae4eb4fc-716c-11e5b015-f1d3b8f071aa.shtml
32http://www.corriere.it/cronache/15_ottobre_08/funeralicattolici-la-madre-fatima-jihadista-italiana-7bdc0eb6-6de311e5-8aec-36d78f2dc604.shtml#post-0
33http://www.lastampa.it/2015/10/20/italia/politica/la-grandetentazione-di-casaleggio-in-campo-direttamente-lui-oppureuna-donna-O6NTs3Vtws2OYuRIPoIl8J/pagina.html
34http://www.corriere.it/esteri/15_ottobre_18/colonia-candidatavittima-aggressione-stata-eletta-sindaco-cfb286da-75bf-11e5a6b0-84415fd3d85.shtml
35http://www.lastampa.it/2015/10/17/esteri/agguato-acolonia-ferita-a-coltellate-candidata-sindaco-indipendentesDXbWLi9YLLuwPujF7TQpO/pagina.html
36https://milano.repubblica.it/cronaca/2015/10/09/news/salvini_maroni-124724471/</p>
          <p>A.4. Usage of the "-essa" suffix</p>
          <p>Among the different suffixes that Italian
uses to derive the feminine form from the masculine
one, the -essa suffix seems to be consistently regarded
in the literature as bearing a negative connotation
(see [20], [21] and [22]). This is also evident from the
fact that alternative forms exist for nearly all
nouns that use this suffix. In this regard,
we must distinguish between words that
are nowadays commonly used in Italian and
have therefore lost the negative connotation, such as
professoressa (professor), and more recent neologisms
such as avvocatessa (lawyer), for which the form
avvocata is to be preferred.</p>
          <p>Explanation: [20] analyses how people perceive
different professional titles used to refer
to women. At the time of the analysis, avvocata
(the grammatically regular feminine derivation of
avvocato) was still considered ungrammatical.
Nevertheless, participants in the study attributed
a higher degree of competence to female referents
designated with this title than with the more
widespread avvocatessa.
37http://www.corriere.it/esteri/15_ottobre_15/oscar-pistoriusandra-domiciliari-partire-20-ottobre-dbda2808-7337-11e5-b97329d2e1846622.shtml</p>
          <p>A.5. Asymmetric usage of names, surnames, and titles</p>
          <p>Sabatini includes in this class instances where female
referents are referred to only by their first name,
asymmetries in the usage of the word signora (which
translates to both lady and Mrs), and the usage of the
feminine article before surnames. We primarily focused
on the first phenomenon, i.e. the asymmetric use of
first names only to refer to women, and on the
last one, to which, given the high frequency with which it
occurred, we dedicated a separate class.
In [33], the authors note that using first names only has
a trivialising and degrading function, since first names
are commonly used to refer to children, to people
belonging to one's personal sphere, or to those deemed to
occupy an inferior position in the social hierarchy.
Additionally, while appearing in the news provides
visibility, this is offset by the impossibility of obtaining
more information about the referenced people, as it is
not feasible to search for somebody by their first
name only (e.g. in a search engine). In general, we observed
this phenomenon almost exclusively
with female referents.</p>
        </sec>
        <sec id="sec-6-3-2">
          <title>Examples:</title>
          <p>Italian: La vita (social) di una moderna eremita Rachel Denton, 52 anni, è una carmelitana cattolica. [...] Ma a differenza degli eremiti del passato, Rachel non vive in una grotta [...] Rachel ha comunque deciso di continuare a vivere in solitudine. 38
English: The social media life of a modern hermit. Rachel Denton, 52 years old, is a Carmelite Catholic. [...] However, unlike the hermits of the past, Rachel does not live in a cave [...] Rachel still decided to keep living in solitude.</p>
        </sec>
        <sec id="sec-6-3-3">
          <title>Italian: Si chiamano Miriam, Liliya, Marsica, Fiona</title>
          <p>o Sonya ma indosseranno il reggiseno «Elena»,
o quello «Sofia», il modello «Gioia» oppure
«Francesca». 39</p>
        </sec>
        <sec id="sec-6-3-4">
          <title>English: Their names are Miriam, Liliya, Marsica,</title>
          <p>Fiona or Sonya, but they will wear the «Elena»
or «Sofia» bra, or the «Gioia» or «Francesca»
model.
38http://www.corriere.it/foto-gallery/esteri/15_ottobre_12/vitasocial-una-moderna-eremita-ba59d8c2-70da-11e5-a92c8007bcdc6c35.shtml#post-0
39http://www.corriere.it/moda/news/15_ottobre_12/miriambarbara-marsica-modelle-sono-ragazze-normali-c5300ad4-710f11e5-a92c-8007bcdc6c35.shtml
Explanation: In the first example, Rachel Denton is introduced with both name and surname only at the beginning of the text. Instead of then referring to her by surname, which we noticed to be the norm in analogous cases with male subjects, the author keeps calling her by her first name only throughout the whole article. In the second example, the women’s surnames are not mentioned even at the beginning of the article.</p>
          <p>A.6. Feminine article before surname</p>
          <p>We decided to dedicate a separate class to this phenomenon due to its high frequency. The asymmetric usage of the feminine article la followed by the surname of a woman, also defined as dissymmetric feminine in [34], is widespread in the Italian language. Since it is not used for men, this marker mainly serves to make the gender of the person visible, attaching even to proper names, as noted in [11], the gender bias that perceives women as the exception to the norm.</p>
          <p>Examples:</p>
          <p>Italian: La Eva Longo... che lo sai, no? è grande amica di Nicola Cosentino, Nick o’ mericano, a sua volta amico dei Casalesi... beh, la Longo s’aspetta di diventare presidente della commissione Infrastrutture... Poi c’è [...]. 40
English: The Eva Longo... who, you know, right?, is a good friend of Nicola Cosentino, Nick the American, who is in turn a friend of the Casalesi family... well, the Longo expects to become president of the Infrastructure Commission... Then there is [...].</p>
          <p>Explanation: Note how here even the full name of Eva Longo is preceded by the feminine article la, whereas Nicola Cosentino’s name has no article.</p>
          <p>Italian: Anche quando la Taverna chiama prostituta la Boschi, o quando Castaldi mi dà del parassita sociale. 41
English: Also when the Taverna calls the Boschi a prostitute, or when Castaldi calls me a social parasite.</p>
          <p>Explanation: Note here the dissymmetric use of the article la in front of the surnames Taverna and Boschi, but not in front of Castaldi.</p>
          <p>Italian: Berlusconi chiede alla Merkel un aiuto [...] 42
English: Berlusconi asks the Merkel for help [...]</p>
          <p>40http://www.corriere.it/politica/15_ottobre_02/accuse-dollarifalsi-veleni-verdiniani-resa-ce-chiudono-3a2b9798-68c5-11e5a7ad-17c7443382c3.shtml
41https://www.corriere.it/politica/15_ottobre_05/non-devoscusarmi-quel-gestaccio-l-ha-fatto-lezzi-io-l-ho-mimato9194e8f4-6b4a-11e5-9423-d78dd1862fd7.shtml
42http://www.lastampa.it/2015/10/23/italia/politica/berlusconichiede-alla-merkel-un-aiuto-per-tornare-in-sella
AYibqnhAmZhvGxdgRHlJaJ/pagina.html</p>
        </sec>
      </sec>
      <sec id="sec-6-4">
        <title>A.7. Asymmetric usage of adjectives</title>
        <p>This category is part of Sabatini’s Asymmetries in the usage of adjectives, substantives, diminutives, and verbs, though we decided to address each of these phenomena separately. The decision was mainly motivated by the low frequency of the single categories, whose specific nuances were easier to identify using smaller and less ambiguous labels.</p>
        <p>The adjectives that we considered in the analysis refer mainly to three semantic areas that perpetuate the gender bias of seeing women as small, silent, and uniquely identified through physical characteristics (which reinforces the idea of women as sex objects). Additionally, we included other adjectives that we noticed being used asymmetrically for men and women. Following the approach used in [13], we double-checked each potentially asymmetric adjective on Word Sketch43, a tool that shows in which contexts a word typically appears and which other words it is generally associated with.</p>
        <p>Examples:</p>
        <p>Italian: Lei è stata per decenni la nostra vivacissima, intelligentissima ’spalla’. [...] era una persona intellettualmente vivace [...] con quel suo musetto dolce e furbo [...] sproporzionata rispetto al corpo esile [...]. Quel lavoro silenzioso [...]. 44
English: She has been for decades our very lively, very clever ’sidekick’. [...] she was an intellectually lively person [...] with her lovely, astute little face [...] disproportionate to the slight body [...]. Her silent work [...].</p>
        <p>43https://www.sketchengine.eu/guide/word-sketch-collocations
and-word-combinations/
44http://www.repubblica.it/cultura/2015/10/14/news/daniela_
bellingeri_lutto-125034845</p>
        <p>Explanation: Noteworthy here is the usage of the word vivace (lively). This term is usually used to refer to children, for example in the expression è un bambino vivace (he is a lively child). This is also backed by Word Sketch, where the only human referents for the adjective are the substantives bambino, bimbo and bambina.</p>
        <p>The expression is not typically used to refer to adult men. The underlying idea is to draw a parallel between women and children [35]. Also, the adjectives dolce (sweet) and esile (slender) are rarely used for men, since they do not fit the stereotypical male gender role.</p>
        <p>Both mostly refer to inanimate subjects, and their only human subject indicated on Word Sketch is femmina (female). Additionally, [36] mentions the adjectives lovely and sweet (both translated as dolce in Italian) as being typically feminine. As for the association between women and silence, here the silent work conveys precisely the idea of knowing one’s place, also highlighted by the use of ’sidekick’ to describe the referent’s tendency to do her job in the shadows, without seeking due recognition.</p>
        <sec id="sec-6-4-1">
          <title>Italian: Grintosa e parecchio determinata, la violinista</title>
          <p>nizzarda Solenne Païdassi approda domani sera alla Verdi, sull’onda di una notorietà ormai internazionale. 45
English: Gutsy and quite resolute, the violinist from Nice Solenne Païdassi will arrive tomorrow evening at the Verdi theatre, riding the wave of what is by now international fame.</p>
          <p>Explanation: On Word Sketch, the word grintoso (gutsy) appears in the combination femminilità grintosa (gutsy femininity) among the nouns it modifies. Note that in [14], in the initial examples that refer to the Signorino Buonasera, we find the word grinta (grit) used sarcastically to refer to a man. It is noteworthy that the word determinata (resolute) would itself be more stereotypically masculine; therefore, the author uses quite to soften its meaning.</p>
        </sec>
        <sec id="sec-6-4-2">
          <title>Italian: Nomi semplici e accattivanti di donne «normali».</title>
          <p>Perchè loro, le splendide «modelle per
caso» di Intimissimi [...] rendendo protagoniste
le personalità di donne reali [...]. 46
English: Simple and charming names for «normal»
women. Because they, the splendid «models
by chance» from Intimissimi [...] featuring real
women [...].
45https://milano.repubblica.it/cronaca/2015/10/08/news/solenne_
pai_dassi_il_mio_stravinskij_brioso_e_ardente_vi_emozionera_124622485/
46http://www.corriere.it/moda/news/15_ottobre_12/miriambarbara-marsica-modelle-sono-ragazze-normali-c5300ad4-710f11e5-a92c-8007bcdc6c35.shtml
Explanation: It is interesting to see how the words normal and real are used here to refer to, and comment on, the bodies and physical appearance of these women. This intention is made evident by the fact that the referenced women are called "models by chance", which explicitly draws a parallel between the physical appearance of models and that of "normal" women. The asymmetric usage of the adjectives becomes apparent when the referent is changed to a male one, since the expression real man relates more to moral and behavioural attitudes. The same holds for the expression normal man, where normal refers more to the mental/psychological sphere than to the physical one. Splendida, which in Italian has a connotation similar to amazing in English, and charming are also mentioned in [36] as typically feminine adjectives.</p>
          <p>Italian: [...] ci sono scatti di Sebastiano F. assieme a una
showgirl bionda che gli cinge la vita, a una mora
altrettanto famosa e procace. 47</p>
        </sec>
      </sec>
      <sec id="sec-6-5">
        <title>A.8. Asymmetric usage of substantives</title>
        <p>This class is part of Sabatini’s Asymmetries in the usage
of adjectives, substantives, diminutives, and verbs and
exhibits asymmetry in two key aspects.</p>
        <p>The first is the presence of words exclusively associated
with women, for which a corresponding male form does
not exist. These words mostly come from semantic
domains such as sexuality, physical appearance, and
marital status, which describe the societal realms to which
women are often relegated. This phenomenon can be
exemplified by the absence of a masculine form for
the word prostituta (prostitute). As noted in [11], this
is not in line with the trend in Italian of creating a
masculine term when men start occupying professions
traditionally occupied by women only (see the word
ostetrico, obstetrician).</p>
        <p>The second phenomenon we investigate in this class concerns word pairs whose feminine form, despite having a denotatively equivalent male version, carries a negative connotation. In this context as well, the semantic loading (see [11]) attributed to the female version of these words often has a sexual undertone.
47http://roma.corriere.it/notizie/cronaca/15_ottobre_23/scandalogay-gigolo-collaborava-la-onlus-fondata-un-cardinale6defd1b8-7903-11e5-95d8-a1e2a86e0e17.shtml</p>
        <p>One example of this phenomenon is the asymmetric usage of zitella (spinster) and scapolo (bachelor).</p>
        <p>Italian: Il video della campagna - che ha come testimonial la showgirl Filippa Lagerbäck - [...]. 48
English: The campaign video - which has showgirl Filippa Lagerbäck as its testimonial - [...]</p>
        <p>Explanation: The term showgirl has no male equivalent, since *showboy does not exist; the word presentatore (host, presenter) would be used for men instead. Also, the original meaning of showgirl in English was a young woman regarded as an object of display49, which gives the term a sexual connotation, shifting the attention to the outward appearance of women rather than to their profession or talent and reinforcing the idea of women as objects.</p>
        <p>Italian: Anche quando la Taverna chiama prostituta la Boschi, o quando Castaldi mi dà del parassita sociale. 50
English: Also when the Taverna calls the Boschi a prostitute, or when Castaldi calls me a social parasite.</p>
        <p>Explanation: Prostituta is asymmetric in that it has no male equivalent, either grammatically (*prostituto) or semantically (gigolò does not carry the same negative connotation).</p>
        <p>Italian: Si tutela il diritto del fanciullo alla continuità affettiva e si rendono entrambi i partner titolari di diritti e doveri verso di esso. 51
English: The right of the child to emotional continuity is protected, and both partners are given rights and duties towards it.
48http://www.repubblica.it/ambiente/2015/10/02/news/
salvaciclisti_limite_auto_citta_-124169246
49https://www.oed.com/dictionary/showgirl_n?tab=meaning_and_
use
50https://www.corriere.it/politica/15_ottobre_05/non-devoscusarmi-quel-gestaccio-l-ha-fatto-lezzi-io-l-ho-mimato9194e8f4-6b4a-11e5-9423-d78dd1862fd7.shtml
51http://www.lastampa.it/2015/10/14/italia/politica/unioni-civili-ilsenato-boccia-lo-stop-2JF3S0Cf01foHuw9kzCecM/pagina.html</p>
        <p>Explanation: In this case, the word fanciullo is used to indicate children in general. The asymmetry lies in the fact that, while the masculine form is used as a synonym for children, the feminine fanciulla is also employed for young women. The definitions of fanciulla include young woman, unmarried woman of any age, and young woman with whom one makes love52.</p>
        <p>This last definition shows how the term can also be loaded with sexual connotations (not carried by the word bambina, which more accurately denotes a girl-child). Therefore, we can argue that the word bambino is to be preferred in this context.</p>
        <p>Italian: La signora, assunta con un contratto a tempo determinato di cinque mesi [...]. 53
English: The lady, employed on a fixed-term contract for five months [...].</p>
        <p>Italian: [...] Lui in due anni e mezzo ha fatto quello che “questi qua” non hanno fatto in 40 anni», protesta una signora. 54
English: [...] In two and a half years, he did what “these people” did not do in 40 years», complains a lady.</p>
        <p>Explanation: In the last two examples, we can see the asymmetric usage of the word signora in the meaning of lady. In the examined corpus, all instances of signore were followed by the last and/or first name of the referent, which suggests that it is used mainly as a title. In contrast, signora, exactly like lady in English, can be used as a synonym for woman and appears in contexts, like the ones in the examples, where the word man would be employed for a male referent.</p>
        <p>Italian: Ma che sarebbe solo la piccola parte scoperta di una imponente rete sommersa, bracconieri e commercianti che farebbe capo proprio alla signora Yang Feng Glan [...]. 55
English: However, this would be only the small uncovered part of a huge underground network of poachers and traders under the control of Mrs. Yang Feng Glan [...].</p>
        <p>Explanation: Conversely, we decided not to consider cases like this last example, where signora was followed by the name and/or surname of the person. This choice was motivated by the fact that, at least in the corpus examined, we did not find strong asymmetries with the masculine counterpart.
52https://www.treccani.it/vocabolario/fanciulla_%28Sinonimi-e
Contrari%29/
53http://roma.corriere.it/notizie/cronaca/15_ottobre_09/buzzivince-tribunale-ma-solo-contro-l-ex-amante-c55a683e-6e8911e5-aad2-b4771ca274f3.shtml#post-0
54http://roma.corriere.it/notizie/cronaca/15_ottobre_12/rischi-ilgiubileo-roma-piedi-oltre-duemila-anni-305f7aa4-70bd-11e5a92c-8007bcdc6c35.shtml
55http://www.corriere.it/cronache/15_ottobre_08/tanzania-larresto-regina-dell-avorio-007-italiani-ca0e9016-6df4-11e58aec-36d78f2dc604.shtml#post-0</p>
      </sec>
      <sec id="sec-6-6">
        <title>A.9. Asymmetric usage of verbs</title>
        <p>Since Sabatini provided only some very specific examples for this category, we tried to identify and assess possible asymmetries based on the examples found in the corpus and on what was examined for the other categories. Through this analysis, we identified two main trends. The first pertains to the usage of verbs derived from the same or similar semantic areas stereotypically associated with women that were pointed out in the previous classes. The second focuses on the roles assumed by female and male actors in certain verbal constructions. In particular, we limited our analysis to verbs whose action involves both men and women referents but in which only men have the agentive role, leaving women the role of passive objects.
Italian: [...] il compagno musicista, la portava in campagna. 56
English: [...] her partner, a musician, took her to the countryside.</p>
        <p>Explanation: In the construction "male subject + take + female object + to do something", men and women do not participate together in the action. Rather, the man takes on an agentive role and the woman the passive role of being the one "taken somewhere to do something".</p>
        <p>Italian: [...] e poi alla Boschi passerà la voglia di ridere, di dare baci e inizierà a sudare freddo. 57
English: [...] and then the Boschi will lose the urge to laugh and give kisses, and she will break out in a cold sweat.</p>
        <p>Explanation: The asymmetry here lies in the reference to giving kisses (dare baci), an action that belongs to the private sphere but is used here in a public context. This is in line with what [37] noted about the overlap of the private and public spheres, which permeates the Italian political scene and becomes even more evident in connection with women.</p>
        <p>Italian: [...] Matteo Salvini che considera «pazzesco» che venga indagato e «sputtanato» un «leghista onesto e concreto». 58
English: [...] Matteo Salvini, who considers it «insane» that an «honest and authentic member of the Lega party» is being investigated and «fucked up».</p>
        <p>Explanation: The verb sputtanare (to fuck up) comes from the root of puttana (slut). As with prostituta, puttana has no male equivalent, which makes the word itself and all its derivations asymmetric.
56http://www.repubblica.it/cultura/2015/10/14/news/daniela_
bellingeri_lutto-125034845
57http://www.corriere.it/politica/15_ottobre_05/ddl-boschisenato-articolo-6-voto-segreto-846e6dae-6b80-11e5-9423d78dd1862fd7.shtml#post-0</p>
      </sec>
      <sec id="sec-6-7">
        <title>A.10. Diminutives</title>
        <p>Diminutives are the last aspect taken into consideration
in Asymmetries in the usage of adjectives, substantives,
diminutives, and verbs. In [35], the author draws a
detailed picture of the semantic meanings associated with
the diminutive. In particular, he identifies a link between
diminutives and the female gender across all languages,
based on the conceptual metaphor of women as children
and "small things" in general. This conceptualization
derives from the female/male opposition, which
sees women as smaller than men, both physically and
in terms of power. It is interesting to note that this parallel
between women and children could also explain the
asymmetry in first-name references to women and men.</p>
        <sec id="sec-6-7-1">
          <title>Examples:</title>
          <p>Italian: Con il sorriso, con quel suo musetto dolce e
furbo, gli occhialetti [...]. 59
English: With her smile, her lovely astute little face,
the small glasses [...].</p>
          <p>Explanation: In Italian, diminutives are formed using the suffixes -etto, -ino, -ello, and -uccio [38] as modifiers of the lexical root to which they are attached. Note that the article from which both examples are taken refers to a woman in her 50s, although the use of diminutives associates her more with a child than with an adult woman. Moreover, the word musetto, diminutive of muso (face, snout), contributes to the metaphor of women as small animals (see A.11).
58http://www.lastampa.it/2015/10/13/italia/politica/berlusconimantovani-corretto-sono-stupito-MeStPSkDhe5HPxfSAV3iyH/
pagina.html
59http://www.repubblica.it/cultura/2015/10/14/news/daniela_
bellingeri_lutto-125034845</p>
        </sec>
      </sec>
      <sec id="sec-6-8">
        <title>A.11. Asymmetric usage of tropes and tone</title>
        <p>This label corresponds to the same level of granularity as
Sabatini’s Asymmetries in the usage of images and tone.
Concerning the tropes, we focused mainly on metaphors,
metonymy and synecdoche, since they are
more common, but other types of tropes should also
be considered in this category if instances of them are
present in the corpus. The metaphors we focused on
are based on [39], [40], [41], [42], and [43], and are:
• Women as small animals: echoes the
idea of women as prey in the "sex-is-hunting"
metaphor
• Women as femmes fatales: compares women,
usually occupying positions of power, either to
felines (tigers, lionesses, cats), to underline their
slyness and charm, or to insects known to have power
over their male counterparts (lucciola, firefly)
• Women as flowers: suggests the idea of the
fragility and powerlessness of women.</p>
        <p>Another trope that seems to be widely used in this
context is metonymy, and more specifically synecdoche,
in which women are presented by referring only to
individual parts of their body. This has the result (and aim) of
objectifying the woman referent by presenting her as a
mere anatomical fragment, only there to please the
male gaze [44].</p>
        <p>As for the asymmetric usage of tone, we limited our
analysis to a single phenomenon which seemed to
co-occur frequently with women referents in the corpus,
namely the use of scare quotes [45]. This decision was
motivated by how open to interpretation the notion of a
"sexist tone" is, and by the difficulty (already for
human beings, let alone for models) of assessing it.
We also included in this class idioms and proverbs that
have a misogynistic and sexist undertone.
Italian: [...] uno scricciolo di donna. 60
English: [...] a little slip of a woman.
60http://www.corriere.it/cronache/15_ottobre_23/ciao-vera-fattamercurio-elegante-irrequieta-dificile-non-averti-qui-23ca7324796f-11e5-a624-46f9df231ebf.shtml
Explanation: Here we can see the usage of the women as small animals metaphor. In Italian, scricciolo literally means Winter Wren, a bird characterized by its small size. Moreover, the definition provided by Treccani 61 attests to its usage to refer specifically to children, which also makes the whole metaphor consistent with the woman-child parallel.</p>
        <p>Italian: E quello che “rinuncia a 42 milioni di euro mentre gli altri hanno approvato la Legge Boccadutri (o bocca di rosa) con tempi da speedy gonzales”. 62
English: And the one who ’gives up 42 million euros, while the others approved the Boccadutri Law (or bocca di rosa) at Speedy Gonzales speed.’
Explanation: The expression bocca di rosa (mouth of rose) is particularly interesting. On the one hand, it represents the women as flowers metaphor through the reference to the rose, which is rich in symbolism in Western cultures. On the other, Bocca di Rosa is the title of a song by Fabrizio De André, a famous Italian singer-songwriter. The song narrates the story of a sex worker who is referred to precisely as bocca di rosa, and the term has therefore become a synonym for prostitute in Italian. Thus, in this example, the dimensions of fragility and sex intertwine in a single oxymoronic metaphor.</p>
        <p>Italian: Dall’altro il pragmatismo di Casaleggio che fa capire con chiarezza chi porta - e continuerà a portare per un po’ - i pantaloni in casa Movimento 5 Stelle [...] 63
English: On the other side, the pragmatism of Casaleggio, which makes it clear who wears - and will keep wearing for a while - the trousers in the house of Movimento 5 Stelle.</p>
        <p>Explanation: This example refers to sexism in idioms.</p>
        <p>In the past, trousers were a garment worn only by men, so the expression has the same meaning as to be the man of the house. This refers to the clear patriarchal hierarchy that sees men as the ones who decide and rule within the domestic walls. This idea is reinforced by the juxtaposition of the word casa (home), which indicates a private space, and the name of the political party, which is instead public [37].
61https://www.treccani.it/vocabolario/scricciolo/
62http://www.corriere.it/politica/15_ottobre_17/grillo-bis-sognotogliere-mio-nome-logo-maio-candidato-premier-non-certoabbiamo-regole-5faea604-750e-11e5-a7e5-eb91e72d7db2.shtml
63http://www.lastampa.it/2015/10/19/italia/politica/
casaleggio-stoppa-di-maio-non-passiamo-il-testimoneVHrr5YruY7MTPfLZtITs2I/pagina.html</p>
        <p>Italian: [...] ci sono scatti di Sebastiano F. assieme a una
showgirl bionda che gli cinge la vita, a una mora
altrettanto famosa e procace. 64
English: [...] there is a photo shoot with Sebastiano F.
together with a blonde showgirl encircling his
waist, and an equally famous and provocative
brunette.</p>
        <p>Explanation: This is an example of synecdoche. Note how the information provided to identify the subjects varies across the sentence. First, the only man among them is presented by his first name (and the initial of his surname, probably for privacy reasons). Then, the first woman is described by her hair colour and her professional title (asymmetric, as we noted in A.8). Finally, the last one is denoted only by a fragment of her body, namely her hair colour, and by her attitude, which additionally carries a clear sexual undertone.</p>
        <sec id="sec-6-8-1">
          <title>Italian: La «regina dell’avorio» è una imprenditrice</title>
          <p>cinese di successo, trafficante di zanne nel tempo libero. 65
English: The «queen of ivory» is a successful Chinese entrepreneur who traffics ivory tusks in her free time.</p>
          <p>Italian: [...] sarebbe diventata un «capo» assoluto [...]. 66
English: [...] she would have become an absolute «boss» [...].</p>
          <p>Italian: [...] la presidente nazionale della Fiab Giulietta Pagliaccio si è "armata" di vernice bianca e pennello [...]. 67
English: [...] the national president of Fiab, Giulietta Pagliaccio, "armed" herself with white paint and a brush [...].
64http://roma.corriere.it/notizie/cronaca/15_ottobre_23/scandalogay-gigolo-collaborava-la-onlus-fondata-un-cardinale6defd1b8-7903-11e5-95d8-a1e2a86e0e17.shtml
65http://www.corriere.it/cronache/15_ottobre_08/tanzania-larresto-regina-dell-avorio-007-italiani-ca0e9016-6df4-11e58aec-36d78f2dc604.shtml#post-0
66http://www.corriere.it/cronache/15_ottobre_23/ciao-vera-fattamercurio-elegante-irrequieta-dificile-non-averti-qui-23ca7324796f-11e5-a624-46f9df231ebf.shtml
67http://www.repubblica.it/ambiente/2015/10/02/news/
salvaciclisti_limite_auto_citta_-124169246</p>
          <p>Italian: Senza mezzi termini le ’ha cantate’ su Facebook a un’agenzia di modelle che le aveva chiesto di dimagrire [...]. 68
English: Bluntly, she ’gave it’ on Facebook to a modelling agency that had asked her to lose weight [...].</p>
          <p>Explanation: In many texts, we detected the use of quotation marks to attenuate the meaning of verbs or substantives usually associated with masculinity when they refer to female subjects. The first two cases exemplifying this phenomenon are the words capo (boss) and regina (queen) in quotation marks. Regarding the first, there is no contextual reason that suggests such use of scare quotes, since being a boss should not be regarded as anything exceptional for a woman. For the second, one can argue that the intention was to mark the whole expression queen of ivory as a nickname for the woman. If that is the case, this would attribute a sense of paternalism and trivialization to the story, which is nevertheless to be considered an instance of sexism in the use of tone and therefore classified under this category.</p>
          <p>The remaining examples employ scare quotes to attenuate verbs. In the first case, the verb armarsi (to arm oneself) clearly echoes images of war and violence. This must have seemed too strong to be associated with a woman, and therefore the author preferred to attenuate its meaning by adding quotation marks. As for the second, the choice of the verb cantarle already lends a note of attenuation and trivialization to the narration, even without the use of scare quotes.
Italian: [...] intelligentissima ’spalla’, l’anima dell’archivio [...]. Lei era la nostra ’complice’ [...] le piaceva ’regalare’ le sue capacità [...] molti di noi hanno continuato a ’saccheggiare’ la disponibilità e cultura di Daniela [...]. 69
English: [...] very clever ’sidekick’, the life of the archive [...]. She was our ’accomplice’ [...] she liked ’giving away’ her abilities [...] many of us continued to ’plunder’ Daniela’s willingness and knowledge [...].
68http://www.corriere.it/salute/nutrizione/15_ottobre_16/modelladice-basta-andate-fare-c-non-posso-tagliarmi-ossa-6e4a9b5a7400-11e5-846d-a354bc1c3c5e.shtml
69http://www.repubblica.it/cultura/2015/10/14/news/daniela_
bellingeri_lutto-125034845</p>
          <p>Explanation: Unlike the previous examples, here we can see the apologetic use of quotation marks (see [45]) to express detachment from the arguably questionable attitude of Daniela’s colleagues towards her. The picture that this description evokes is that of a woman with many capabilities (elsewhere in the text she is described as "very intelligent", "well-read" and "educated") who nonetheless has a marginal role and whose knowledge is exploited by others (here saccheggiare is in quotation marks to achieve some sort of attenuation of the behaviour, although the term describes exactly the attitude of the colleagues towards her).</p>
        </sec>
      </sec>
      <sec id="sec-6-9">
        <title>A.12. Identification through man</title>
        <p>We decided to split Sabatini’s Asymmetries in the usage
of identification of women through men, age, profession
and role into two categories, namely this one and the one
in the following section A.13. Moreover, we did not include
the variables of age and profession in the analysis. On
the one hand, this choice was motivated by the fact that
Sabatini herself did not provide any examples for these
categories. On the other, both profession and age were
variables already analysed in other classes in the current
study.</p>
        <p>In general, this class refers to instances where women are presented in texts through their relationship to a man, in expressions such as daughter of, wife of or girlfriend of.</p>
        <p>Italian: Sergio e la moglie erano finiti in carcere nell’ambito dell’inchiesta del procuratore [...]. 70
English: Sergio and his wife were imprisoned as a result of an investigation by the prosecutor [...].</p>
        <p>Explanation: Here Sergio’s wife is not named and is identified only through her relationship to her husband.</p>
        <p>Italian: La prima vittoria in un’aula di tribunale Salvatore Buzzi l’ha ottenuta con la sua ex amante. Katia Cipolla, con cui [...]. Buzzi aveva denunciato la ex [...]. Dietro la richiesta, la minaccia velata di rivelare la relazione alla moglie. [...] Ma al processo non si è costituito parte civile contro l’ex amante, per la quale il pm aveva chiesto l’assoluzione. 71
English: Salvatore Buzzi achieved his first win in court against his ex-lover. Katia Cipolla, with whom [...]. Buzzi pressed charges against the ex [...]. Behind the request, there was the veiled threat of revealing the affair to the wife. [...] But at the trial, he did not bring a civil action against the ex-lover, for whom the public prosecutor had requested acquittal.
70https://milano.repubblica.it/cronaca/2015/10/05/news/milano_scarcerato_dopo_tre_mesi_i_genitori_di_fatima_la_foreing_ifghter_dell_is-124402650/
71http://roma.corriere.it/notizie/cronaca/15_ottobre_09/buzzivince-tribunale-ma-solo-contro-l-ex-amante-c55a683e-6e8911e5-aad2-b4771ca274f3.shtml#post-0
72shttatplk:/i/nwg-wdwal.llaa-setaxm-mpao.gitl/i2e0-c1i5n/q10u/a1n7t/eendnizei-oinmi/pimerpieesreia-/aagclci-uasrarteos-tdi-idomiciliari-whZpW2q8UJlOinsRECEJvL/pagina.html
73http://firenze.repubblica.it/cronaca/2015/10/11/news/perseguitava_l_ex_fidanzata_arresti_domciliari_per_un_25enne
124838732</p>
        <p>Italian: [...] per evitare che le cose potessero degenerare
in atti di violenza nei confronti della ex moglie
e del figlioletto.
English: [...] to prevent the situation from degenerating
into acts of violence against the ex-wife and the little
son.</p>
        <p>Italian: Il giovane, tra giugno 2011 e aprile 2012, aveva
più volte perseguitato e minacciato l’ex
fidanzata.
English: Between June 2011 and April 2012, the young man
had repeatedly harassed and threatened the
ex-girlfriend.</p>
        <p>Explanation: Sabatini highlights the expressions
ex-girlfriend, ex-lover and ex-wife as particularly
offensive, since they imply that a woman continues to be
identified through her male partner even after the
relationship has ended.</p>
        <p>Note that the last two examples refer to situations
of possible domestic violence. This makes the use of the
terms ex-wife and ex-girlfriend, respectively, even more
problematic, because it identifies possible victims
through their oppressors.</p>
      </sec>
      <sec id="sec-6-10">
        <title>A.13. Identification through gender/role</title>
        <p>In this section, our primary objective is to highlight
instances where women are portrayed in texts through
their role as mothers. Note that we excluded instances
of mother of from the previous category, since one can be
the mother of individuals of any gender, which makes such
cases incongruent with the description Identification
through men.</p>
        <p>In this context, the asymmetry arises from the societal
expectation that becoming a mother is a defining and
all-encompassing experience for women, while the same
expectation does not apply to men. We evaluate this
phenomenon under two aspects: first, when information
about being a mother is mentioned out of context,
diverting attention from other aspects of the referent’s
life; second, when being a mother is relevant to the
context but no additional information is provided about
the woman in question, suggesting that being qualified as
a mother alone suffices for identification. Furthermore,
we consider cases where women are identified by their
gender rather than their profession, particularly in
situations where the latter is relevant.</p>
        <p>Italian: Molti esponenti politici si sono detti
scandalizzati, ma la reazione più efficace è stata quella di
Caroline Boudet, mamma di Louise [...].
English: Many politicians said they were shocked, but the
most effective reaction was that of Caroline
Boudet, mother of Louise [...].</p>
        <p>Explanation: Caroline Boudet is a journalist. Although
in this specific context the fact that she is Louise’s
mother was relevant, it was not the sole focus of the
story. Nonetheless, it is the only qualifier used for her
in the whole article. We argue that the contrast is made
even more evident by the juxtaposition with the word
politicians, who are described exclusively by their
professional role and not as parents (even though most
of them are very likely parents themselves).</p>
        <p>Italian: Finisce così la storia di Assunta, la madre di
Fatima, la jihadista italiana [...].
English: Thus ends the story of Assunta, mother of
Fatima, the Italian jihadist [...].</p>
        <p>Explanation: As in the previous example, we are given
no further information about Assunta except that she is
someone’s mother. Additionally, note how both female
referents (Assunta and Fatima) are introduced here only
by their first names. Notably, in other parts of the
article, Fatima’s father is not presented solely through
his relationship with his daughter.</p>
        <p>Italian: Le mamme sono preoccupate [...].
English: The mothers are worried [...].</p>
        <p>Explanation: Here, women are treated as a separate,
homogeneous category whose members are characterized
solely by being mothers.</p>
        <p>Italian: E così, le ragazze si allenano tutto l’anno con
sessioni di training speciali tra yoga, pilates e
boxe. [...] E poi, diciamocelo, una ragazza
farebbe qualsiasi cosa per non perdere il posto su
quella passerella [...].
English: And so, the girls train all year round with
special training sessions involving yoga, pilates,
and boxing. [...] And let’s be honest, a girl would
do anything not to lose her spot on that catwalk
[...].</p>
        <p>Explanation: Here, ragazze (girls) is used as a synonym
for models, the profession of the subjects of this
article. The effect is a trivialization of the profession,
probably because it is mainly associated with women and
based on outward appearance, which is one of the few
aspects considered important for women.</p>
      </sec>
      <sec id="sec-6-11">
        <title>A.14. Usage of physical characteristics to describe and present women</title>
        <p>This category was not directly included in Sabatini’s
work. Nevertheless, we wanted to gather in one class
all instances in which women were depicted through
their physical appearance and that could not be assigned
to one of the previous categories. Here, we do not focus
on specific word classes, as we did for the asymmetries
in the usage of substantives, adjectives, and verbs.
Instead, our focus lies on the organization of
information and on the decision to emphasize aspects of
women’s outward appearance rather than other facets.</p>
        <p>Explanation: In all these examples, but particularly in
the last two, references to the outward appearance of
these women are completely out of context. Note
that we excluded from this category references
to women’s bodies in cases where they could be
considered relevant to their profession, for example
in the case of models. Although this choice is
debatable, we explicitly wanted to consider only cases
where the inappropriateness of these comments was
obvious.</p>
        <p>Table 2 shows the distribution of labels in the dataset.
The counts are aggregated over all newspapers included in
the analysis, namely Repubblica, Il Corriere della Sera and
La Stampa.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>C. Error Analysis</title>
      <p>We present the error analysis and the concrete results
achieved by both the pipeline and the baseline for the labels
Generic masculine, Masculine of professions, Asymmetric
usage of names, surnames and titles, Feminine article
before surname and Identification through man.</p>
      <p>The error annotation was done manually. We first
extracted all misclassified sentences for each label,
separating false positives from false negatives. Then, we
collected and clustered similar error patterns in the
misclassified instances and analysed the possible reasons
behind the different error types.</p>
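      <p>The extraction step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used in this study: each sentence is assumed to carry aligned lists of gold and predicted token labels, with "O" for tokens without a marker, and the function split_errors, the dictionary keys and the label names are hypothetical. The subsequent clustering of error patterns was done by hand and is not modelled.</p>
      <preformat>
```python
from collections import defaultdict


def split_errors(sentences):
    """Collect misclassified sentences per label, split into false positives and false negatives.

    Each sentence is a dict with the (hypothetical) keys "text", "gold" and "pred";
    "gold" and "pred" are aligned lists of token labels, where "O" means no marker.
    """
    errors = defaultdict(lambda: {"fp": [], "fn": []})
    for sent in sentences:
        pairs = list(zip(sent["gold"], sent["pred"]))
        labels = set(sent["gold"]).union(sent["pred"]) - {"O"}
        for label in sorted(labels):
            # Some token was predicted with this label but not annotated with it.
            if any(p == label and g != label for g, p in pairs):
                errors[label]["fp"].append(sent["text"])
            # Some token was annotated with this label but the model missed it.
            if any(g == label and p != label for g, p in pairs):
                errors[label]["fn"].append(sent["text"])
    return errors


# Toy input with hypothetical label names, not taken from the corpus.
data = [
    {"text": "la Cipolla", "gold": ["ART_SURNAME", "ART_SURNAME"], "pred": ["O", "O"]},
    {"text": "il ministro", "gold": ["O", "O"], "pred": ["O", "MASC_PROF"]},
]
errs = split_errors(data)
print(errs["ART_SURNAME"]["fn"])  # ['la Cipolla']
print(errs["MASC_PROF"]["fp"])    # ['il ministro']
```
      </preformat>
      <p>A sentence can end up among both the false positives and the false negatives of the same label, for example when the model marks the wrong tokens within it.</p>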
      <sec id="sec-7-1">
        <title>C.1. Generic Masculine</title>
        <p>This was the most diverse class among those considered
in this analysis. Table 3 shows the results obtained for
this label. Overall, the model captured the main features
of the phenomena falling into this category, although it
did not always classify them correctly.</p>
        <p>With more false positives than false negatives, the
model tended to label more instances than were annotated,
sometimes showing only a superficial understanding of the
phenomena and at other times raising legitimate doubts
about the annotation itself.</p>
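        <p>How this imbalance shows up in the scores follows from the standard definitions of precision and recall: for a fixed number of true positives, additional false positives lower precision, while additional false negatives lower recall. The sketch below computes the three metrics from raw counts; the function name is hypothetical, and the counts in the example are invented for illustration rather than taken from Table 3.</p>
        <preformat>
```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: more false positives (15) than false negatives (5).
p, r, f1 = precision_recall_f1(tp=10, fp=15, fn=5)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.4 0.667 0.5
```
        </preformat>
        <p>With the same number of true positives, the larger count of false positives pulls precision below recall, which matches the over-labelling tendency described above.</p>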
        <p>In particular, the precedence of the masculine in
female/male oppositional couples was the marker best
recognized by the model, which even pointed out cases that
had not been correctly included in the annotation. The
only instances of this marker that the model misclassified
are the following.</p>
        <p>Italian: «[il Pd] Caro Pd siamo pronti a difendere il
diritto dei bambini ad avere mamma e papà».
Translation: «[Democratic Party] Dear Pd, we are
ready to defend the right of children to have a
mother and a father».</p>
        <p>Explanation: This is a non-sexist oppositional couple,
in which the feminine precedes the masculine; it
should therefore not be classified as a member of
this class.</p>
        <p>Italian: [...] Arrivano in piazza del Campidoglio in
piccoli gruppetti, marito e moglie, tre amiche, due
compagni di sezione del Pd [...].
Translation: [...] They come to Campidoglio Square
in small groups, husband and wife, three friends,
two fellow members of the Pd [...].</p>
        <p>Explanation: This instance was misclassified as
Identification through man, probably because of the
words husband and wife, which are common in that
class.</p>
        <p>Italian: Invece in Italia ci sono voluti circa quindici
anni, e un lavoro di mediazione certosina, perché
si arrivasse ad una legge che permetterà, da domani,
anche ai genitori dell’affido di "concorrere"
all’adozione del ragazzino e della ragazzina
dei quali, di fatto, sono già figure fondamentali.</p>
        <p>Translation: In Italy, by contrast, it took about fifteen
years and painstaking mediation work to arrive at a law
that, starting tomorrow, will also allow foster parents
to "compete" for the adoption of the boy and the girl
for whom they are, in fact, already fundamental
figures.</p>
        <p>Explanation: In this example, both genders are made
explicit through splitting (i.e. both the masculine and
the feminine forms occur). Although the masculine form
precedes the feminine one, during the annotation process
we decided not to classify this case as Generic masculine,
because by using splitting the authors intended precisely
to avoid the generic masculine, and we did not want to
penalize this choice. However, the model correctly
identified the precedence of the masculine form here.
The annotation should therefore probably be revisited
to make it stricter and less ambiguous in this regard.</p>
        <p>Additionally, the model was able to link the presence
of the substantives uomo/uomini (man/men) with this
class. However, it seemed to limit itself to marking all
occurrences of these words rather than showing an actual
understanding of the phenomenon.</p>
        <p>For example, in many cases the model wrongly classified
instances of uomo and uomini referring to one or more
explicit male referents.</p>
        <p>Italian: Moravia, un uomo che amava le donne [...].
Translation: Moravia, a man who loved women [...].</p>
        <p>Italian: Mentre sono partite le indagini continua la
caccia ai due uomini.
Translation: While investigations have started, the
hunt for the two men continues.</p>
        <p>Finally, the model struggled to recognize two sexist
markers: women being treated as a separate category, and
gender disagreement between a subject and its nominal
predicate. Neither provided enough examples for the model
to learn from properly, and the latter faced the additional
obstacle of being more abstract and less attributable to
the occurrence of specific words.</p>
        <p>Italian: Arrestati cinque cittadini marocchini e due
italiani, tra cui una donna, per rapina aggravata in
concorso.
Translation: Five Moroccan and two Italian citizens,
one of whom is a woman, were arrested for
aggravated robbery in complicity.</p>
        <p>Italian: I promotori sono tredici organizzazioni di
varie nazioni [...].
Translation: The promoters are thirteen organizations
from different countries [...].</p>
        <p>As already pointed out, the model shows some
understanding of which phenomena belong to this class and
hardly ever confuses it with other labels. However, the
diversity of the markers included in Generic Masculine
makes it difficult for the model to focus on single
phenomena, especially in our setting, where only a small
number of examples per label is provided. A possible
solution could therefore be to split this class into
smaller classes, each identifying a more specific marker.</p>
      </sec>
      <sec id="sec-7-2">
        <title>C.2. Masculine of Professions</title>
        <p>Although this was the class with the most samples and
describes a less complex phenomenon than other classes,
the model had some difficulties in correctly assessing
this sexist marker (see Table 4).</p>
        <p>Table 4: Pipeline results for label "Masculine of professions".</p>
        <p>Even though the model seemed to recognize the link
between this label and high-status professions such as
minister, lawyer or mayor, it was unable to identify the
key aspect of this class, namely the use of the masculine
form also for women. Rather, it marked all instances of
these titles, regardless of the gender of the referent.</p>
        <p>Italian: «Ma possiamo ancora migliorare», ammette il
direttore sportivo Carlo Deslex.
Translation: «But we can still do better»,
acknowledges the sports director Carlo Deslex.</p>
        <p>Italian: Il ministro dell’Economia Pier Carlo Padoan [...].
Translation: The Minister of Economy Pier Carlo Padoan [...].</p>
        <p>This behaviour could be caused by the absence of
"positive" examples of the correct feminine forms for these
professional titles, some of which still struggle to permeate
and become part of the Italian language. In this case,
efforts to provide more such examples could help the
model focus on the key aspect of this class and thereby
achieve better performance.</p>
      </sec>
      <sec id="sec-7-3">
        <title>C.3. Asymmetric usage of names, surnames and titles</title>
        <p>This was the second-largest class in terms of number of
samples after Masculine of Professions, and, as shown in
Table 5, it obtained comparable results.</p>
        <p>Table 5: Pipeline results for label "Asymmetric usage of
names, surnames and titles".</p>
        <p>We can notice that the model correctly links this class
to the presence of female names. Notably, it seems even
stricter than the annotator in classifying instances where
women are referenced only by name.</p>
        <p>The reason could be that the model struggles to identify
contexts in which using only names might be appropriate.
This is aggravated by the fact that the pipeline processes
single sentences, so only limited context is available to
the model for its prediction.</p>
        <p>Italian: [DANIELA Bellingeri] Daniela era una persona
intellettualmente vivace, colta, amava la musica
e la poesia.
Translation: [DANIELA Bellingeri] Daniela was an
intellectually lively, well-read person; she loved music
and poetry.</p>
        <p>Explanation: In this case, context mattered for the
annotation, since the author of the article was writing
about a person they knew and therefore referred to her
only by her first name. However, this context was not
available to the model, which was therefore correct in
pointing out the use of the first name only.</p>
        <p>Italian: Fatima. Sono stati scarcerati dopo 3 mesi di
detenzione Sergio Sergio e la moglie Assunta, i
genitori di Maria Giulia ’Fatima’ Sergio, la presunta
jihadista italiana convertita all’Islam e partita per
la Siria per combattere nelle fila del Califato.
Translation: Fatima. After 3 months in prison, Sergio
Sergio and his wife Assunta, parents of Maria
Giulia ’Fatima’ Sergio, the alleged Italian jihadist
who converted to Islam and went to Syria to fight
for the Caliphate, have been released.</p>
        <p>Explanation: Here, the name Fatima could legitimately be
considered a member of this class. We decided not to
annotate it since it was used as a nickname, but this
decision is open to interpretation.</p>
        <p>The last two examples show how difficult the annotation
process is and how open to interpretation single phenomena
can be.</p>
        <p>A possible solution could be to make the annotation
stricter, or to expose the model to more fine-grained
examples in which using names only is appropriate. The
trade-off between the two should be weighed against the
specific use case in which the model is employed.</p>
        <p>Additionally, in the second example, the model classifies
Maria as asymmetric, even though the full name does include
her surname. This points to a possible inability of the
model to handle cases where either multiple first and last
names are present or a nickname is inserted in the middle
of the name. Similarly, potential errors derive from not
correctly distinguishing names from surnames or not
recognizing names as such, especially when the referent
does not have an Italian name.</p>
        <p>Italian: [Amazon] [Global] [Jay Carney] [all’inchiesta
del New York Times] In un post su Medium dal
titolo Quello che il New York Times non ti ha
raccontato, Carney ha attaccato duramente il
metodo di lavoro dei due giornalisti che hanno
curato l’inchiesta.
Translation: [Amazon] [Global] [Jay Carney]
[investigation of New York Times] In a Medium post
titled What the New York Times did not tell you,
Carney harshly attacked the working method of the
two journalists who conducted the investigation.</p>
        <p>Explanation: In this case, the model interpreted
Carney as a female name and misclassified it as a
member of this class.</p>
        <p>Italian: Lo ha detto Piera Maggio, la madre di Denise
Pipitone, subito dopo la sentenza di assoluzione
per Jessica Pulizzi, la sorellastra di Denise
accusata di sequestro di persona.
Translation: This is what Piera Maggio, mother of
Denise Pipitone, said right after the verdict of
acquittal for Jessica Pulizzi, Denise’s step-sister,
who had been accused of kidnapping.</p>
        <p>Explanation: Conversely, here Denise was not
recognized as a female name and was therefore not
correctly classified by the model.</p>
      </sec>
      <sec id="sec-7-4">
        <title>C.4. Feminine article before surname</title>
        <p>Thanks to the limited variability and high repetitiveness
of the phenomenon, which made it easier for the model
to recognize, this was the class that achieved the best
overall results (see Table 6).</p>
        <p>Table 6: Pipeline results for label "Feminine article before surname".</p>
        <p>However, we can point out some examples where the
model was unable to identify the label, mainly because it
did not correctly recognize the surname following the
article. In some cases, surnames were interpreted as nouns,
either because they also function as nouns in Italian or
because their structure resembles that of an Italian noun.
This is the case in the following two examples.</p>
        <p>Italian: Anche quando la Taverna chiama prostituta
la Boschi, o quando Castaldi mi dà del parassita
sociale.
Translation: Also when the Taverna calls the Boschi
a prostitute, or when Castaldi calls me a social
parasite.</p>
        <p>Italian: A motivare le pressioni sul protagonista di
[...] svolto dalla Cipolla nell’estate del 2011 [...].
Translation: Pressures on the lead of Mafia Capitale
were motivated by the loss of her job as a
bartender, which the Cipolla held during the summer
of 2011 [...].</p>
        <p>Moreover, the model struggled with some foreign surnames
or surnames with a particular structure, such as O’Hara in
the following example, which the model did not recognize
as a surname.</p>
        <p>Italian: [...] la O’Hara aveva ricevuto nel febbraio
scorso l’Oscar alla carriera.
Translation: [...] last February, the O’Hara had received
the lifetime achievement Oscar.</p>
        <p>Finally, there were a few instances where the model was
misled by the surrounding context, resulting in errors
where names of other entities, such as bands (first example)
or cars (second example), were mistakenly identified as
surnames.</p>
        <p>Italian: San Siro, la Banda Bassotti e la Champions
sfumata: gli striscioni sfottò anti-Juve.
Translation: San Siro, the Banda Bassotti and the
vanished Champions League: the mocking
banners against Juve.</p>
        <p>Italian: Un nuovo diesel per la Opel Meriva Opel torna
alle cabrio la Cascada a 29.400 euro [...].
Translation: A new diesel for the Opel Meriva. Opel returns
to convertibles: the Cascada for 29,400 euros [...].</p>
      </sec>
      <sec id="sec-7-5">
        <title>C.5. Identification through man</title>
        <p>This was the class that achieved the highest recall and,
second only to Feminine article before surname, the
highest precision (see Table 7).</p>
        <p>Table 7: Pipeline results for label "Identification through man".</p>
        <p>The model correctly identifies a link between this class
and the presence of female substantives such as moglie,
figlia, fidanzata or compagna (wife, daughter, girlfriend,
partner). In many cases, the model’s predictions raised
legitimate doubts about the annotation, which sometimes
had to be reconsidered. Nevertheless, as we also noted for
the occurrence of uomo/uomini in the Generic Masculine
class, the model tends to classify any instance of such
words in the text without engaging in more subtle
analysis. However, unlike uomo/uomini, this poses fewer
problems, as it causes fewer false positives.</p>
        <p>One of the most common errors in the model’s
predictions is neglecting whether the relationship is
actually with a man. For example, in the following
sentence, the relationship sorella di (sister of) refers
to a woman, Fatima, and was therefore not included in the
annotation.</p>
        <p>Italian: La sorella di Fatima è ancora detenuta.
Translation: Fatima’s sister is still in custody.</p>
        <p>However, one can argue that the phenomenon can be
extended to all cases where someone is presented through
their relationship with someone else, regardless of gender.
The annotation could therefore be revisited to include
these cases as well.</p>
        <p>Another interesting factor to consider is that the model
classifies instances of the type mother of as members of
this class, whereas we had set up a separate class to
include them, namely Identification through gender/role.</p>
        <p>Italian: [...] hanno denunciato anche la madre del
27enne e una donna di 52 anni che in cambio
di soldi accettava di portare a proprio nome la
refurtiva nei ’compro oro’ della zona.
Translation: [...] they also reported the mother of the
27-year-old and a 52-year-old woman who, in exchange
for money, agreed to take the stolen goods under her
own name to the local gold exchange shops.</p>
        <p>This could lead to two possible solutions: either
introducing more instances of the latter class, so that the
model can correctly learn to distinguish between the two
cases, or restoring the original class by [<xref ref-type="bibr" rid="ref4">4</xref>], which included both phenomena in a single class.</p>
        <p>Furthermore, by analysing the errors, we noticed that the
word compagna could pose a problem, since it can mean
both a partner in a romantic relationship and a teammate
in a sports team. Hence, more focused examples of this
aspect might be needed to teach the model to distinguish
between these two usages.</p>
        <p>Italian: [...] Nadia Fanchini, solo undicesima al
traguardo dello slalom gigante a 3 secondi e un
decimo dalla compagna di squadra.
Translation: [...] Nadia Fanchini, who finished only
eleventh in the giant slalom, 3.1 seconds behind her
teammate.</p>
        <p>Finally, the model correctly identified some instances of
this class in the part added by the coreference resolution
at the beginning of the sentence, which, however, had not
been annotated. This could be solved either by annotating
the coreference part as well or by creating ad hoc
examples that teach the model to ignore that part of the
sentence. However, this has no negative effect on the
performance of the model and can therefore be
overlooked.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Sapir</surname>
            <given-names>Edward</given-names>
          </string-name>
          ,
          <article-title>The status of linguistics as a science</article-title>
          ,
          <source>Language</source>
          <volume>5</volume>
          (
          <year>1929</year>
          )
          <fpage>207</fpage>
          -
          <lpage>214</lpage>
          . doi:10.1525/9780520311893-004.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Whorf</surname>
            <given-names>Benjamin L.</given-names>
          </string-name>
          ,
          <article-title>Science and linguistics</article-title>
          , volume
          <volume>234</volume>
          of
          <source>Bobbs-Merrill Reprint Series in the Social Sciences</source>
          , Technology Review, Indianapolis, IN, USA,
          <year>1940</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Vaswani</surname>
            <given-names>Ashish</given-names>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
          <string-name>
            <surname>Gomez</surname>
            <given-names>Aidan N.</given-names>
          </string-name>
          , Lukasz Kaiser, Illia Polosukhin,
          <article-title>Attention is all you need</article-title>
          , in:
          <source>NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems</source>
          , Curran Associates Inc., Red Hook, NY, USA,
          <year>2017</year>
          , pp.
          <fpage>6000</fpage>
          -
          <lpage>6010</lpage>
          . doi:10.5555/3295222.3295349.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Sabatini</surname>
            <given-names>Alma</given-names>
          </string-name>
          ,
          <source>Il sessismo nella lingua italiana</source>
          ,
          <publisher-name>Presidenza del Consiglio dei Ministri e Commissione Nazionale per la Parità e le Pari Opportunità tra uomo e donna</publisher-name>
          ,
          <publisher-loc>Rome</publisher-loc>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>