=Paper=
{{Paper
|id=Vol-3878/42_main_long
|storemode=property
|title=Exploring YouTube Comments Reacting to Femicide News in Italian
|pdfUrl=https://ceur-ws.org/Vol-3878/42_main_long.pdf
|volume=Vol-3878
|authors=Chiara Ferrando,Marco Madeddu,Viviana Patti,Mirko Lai,Sveva Pasini,Giulia Telari,Beatrice Antola
|dblpUrl=https://dblp.org/rec/conf/clic-it/FerrandoMPLPTA24
}}
==Exploring YouTube Comments Reacting to Femicide News in Italian==
Chiara Ferrando1,*,†, Marco Madeddu1,*,†, Beatrice Antola2, Sveva Silvia Pasini3, Giulia Telari3, Mirko Lai4 and Viviana Patti1
1 Università di Torino, Italy
2 Università di Padova, Italy
3 Università di Pavia, Italy
4 Università del Piemonte Orientale, Italy
Abstract
In recent years, Gender-Based Violence (GBV) has become an important issue in modern society and a central topic in different research areas due to its alarming spread. Several Natural Language Processing (NLP) studies concerning Hate Speech directed against women have focused on misogynistic behaviours, slurs or incel communities. The main contribution of our work is the creation of the first dataset of social media comments reacting to GBV, in particular to a femicide event. Our dataset, named GBV-Maltesi, contains 2,934 YouTube comments annotated following a new schema that we developed in order to study GBV and misogyny with an intersectional approach. During the experimental phase, we trained models on different corpora for binary misogyny detection and found that datasets that mostly include explicit expressions of misogyny pose an easier challenge than the more implicit forms of misogyny contained in GBV-Maltesi.
Warning: This paper contains examples of offensive content.
Keywords
Hate Speech, Misogyny Detection, Femicide, Social Media, News, Responsibility Framing
CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy
* Corresponding authors.
† These authors contributed equally.
chiara.ferrando@unito.it (C. Ferrando); marco.madeddu@unito.it (M. Madeddu); beatrice.antola@studenti.unipd.it (B. Antola); svevasilvia.pasini01@universitadipavia.it (S. S. Pasini); giulia.telari01@universitadipavia.it (G. Telari); mirko.lai@uniupo.it (M. Lai); viviana.patti@unito.it (V. Patti)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Nowadays, the term Gender-Based Violence (GBV) is used to identify all forms of abuse based on gender hatred and sexist discrimination [1]. Scholars in social science have defined as "rape culture" the society that normalizes sexist behaviours: from more common occurrences like victim blaming, slut shaming and the gender pay gap to the apex of violence with femicide [2]. While general violent crimes decreased over time, GBV did not, alarming various bodies in modern society1. A report from the EU commission2 states that 31%, 5% and 43% of European women suffered respectively from physical, sexual and psychological violence. Regarding the Internet sphere, a survey found that 73% of women journalists experienced online violence (threats, belittling, shaming, ...) [3]. These statistics become even more alarming when we consider studies that show the correlation between misogynistic online posts and GBV [4].

Like other countries, Italy is affected by GBV, with the national observatory managed by the "Non Una di Meno" association reporting 117 femicides in 2022, 120 in 2023 and more than 40 until June 20243.

1 https://www.interno.gov.it/it/stampa-e-comunicazione/dati-e-statistiche/omicidi-volontari-e-violenza-genere
2 https://commission.europa.eu/strategy-and-policy/policies/justice-and-fundamental-rights/gender-equality/gender-based-violence/what-gender-based-violence_en
3 https://osservatorionazionale.nonunadimeno.net/anno/

Several studies about Hate Speech (HS) directed towards women often focus on developing taxonomies [5] rather than investigating low-resource subjects in computational linguistics like GBV. These works often gather corpora by keyword search of gender slurs [6], by retrieving comments left in misogynistic spaces like incel blogs [5, 7], or by considering messages directed towards popular women figures highly debated on social media [8].

As GBV is a broad topic, we want to clarify that we focus on GBV in Western societies, particularly in Italy. The main goal of this project is to show the current perception of femicides expressed through comments on social media, focusing on the specific case of Carol Maltesi. We chose this femicide because the victim was a sex worker, meaning that she presented an intersectional trait, and because it was a popular case in the media, enabling us to select enough material for the study. Further, we want to highlight how the socio-demographic characteristics of the victims determine the way they are described and how this influences the perception of the news. For instance, victim's features such as age, job, origin, skin
color, nationality, religion have different weight and determine the lesser or greater spread of the news [9]. To overcome the cited issues in the current literature, in this research we considered the phenomenon by focusing on users' reactions on social media to news about femicides. We collected YouTube comments in response to videos talking about a specific case. In order to overcome the constraints of traditional sentiment analysis schemas, we annotated the data following a new semantic grid that can be used as a standard for comments regarding GBV.

In the experimental phase of this work, we created models based on different Italian misogyny datasets (including ours). The goal of such experiments is to analyze the different features of these corpora and what forms of misogyny are harder to detect. We performed both a quantitative and a qualitative analysis of the results.

In the next sections, we describe related work on hate speech and misogyny detection (Section 2), the annotation scheme and both a quantitative and qualitative analysis of the dataset (Section 3), and the results obtained in our experiments (Section 4). Lastly, we present some conclusions and delineate possible future developments (Section 5).

2. Related Work

In recent times, the creation and dissemination of hate speech have become increasingly pervasive on online platforms, making social media a fertile ground for hateful discussions [10]. The escalation of offensive and abusive language, understood as content that discriminates against a person or group on the basis of specific characteristics such as ethnicity, gender, sexual orientation, and more, has aroused considerable interest in various fields. In fact, over the last decade, a large number of computational methods involving NLP and Machine Learning have been proposed for automatic online hate speech detection [11, 12]. Most prior works have mainly treated hate speech detection as a classification task, distinguishing between hate and non-hate speech. Hate speech takes on different nuances depending on the target groups at which it is directed, i.e. depending on the specific features that the target group has in common. Moreover, in some cases, these traits may intersect with each other, leading to different degrees of discrimination. This concept takes the name of intersectionality [13].

Among abusive languages, misogyny, considered as a specific offensive language against women, has become a contemporary research topic [14]. In the automatic hate speech detection field, the Automatic Misogyny Identification (AMI) [15] series of shared tasks launched in EVALITA [6] and the SemEval-2019 HatEval challenge [16] have produced evaluation frameworks to identify misogynous tweets in English, Italian and Spanish [17].

Misogyny has become a pervasive phenomenon, widespread in very different spheres and expressed in both explicit and implicit forms [5, 18]. For this reason, even in online conversations about a dramatic act such as femicide, it is possible to find examples of veiled or explicit hostility towards the victims. The femicide phenomenon has been studied from different points of view. Several studies focused on GBV representation in Italian media [19, 20]. In 2020, Mandolini focused on the journalistic narratives of femicide in newspapers by means of a qualitative discourse analysis of two specific case studies [21]. The researcher attempted to describe changes in attitudes in the portrayal of femicide, focusing on discursive strategies that (directly or indirectly) blame the victim and implicitly excuse the perpetrator, referring to gender stereotypes and romantic love rhetoric.

Other studies focused on the responsibility framing in femicide news, by conducting an experiment where annotators rated excerpts from local newspapers on how much responsibility was given to the perpetrator [22]. As far as we know, there is only one line of work in NLP on GBV [23, 24, 25], which focuses on readers' perception of femicide news headlines and analyses the perception of responsibility attributed to victim and perpetrator; whereas, to our knowledge, there is no other study analysing social media reactions to GBV cases.

3. Dataset

3.1. Corpus Background

In a preliminary phase of our work, we conducted research on the femicide case of Sara Di Pietrantonio4, a 22-year-old white Italian student from a wealthy family, murdered by her ex-boyfriend in May 2016 [21]. In this preliminary research we set out to develop a corpus by collecting Twitter users' comments on femicide news published online by newspapers5. We created an annotation scheme for the corpus consisting of two layers: the first focused on the dimensions of sentiment analysis and composed of three subtasks (subjectivity, polarity and irony), relevant for the detection of sentiment in social media [26]; the second focused on hate speech detection, including labels for misogyny, aggressiveness and its target. For more details on the annotation scheme and corpus description, see Appendix A.

4 https://www.agi.it/cronaca/news/2019-09-11/sara_di_pietrantonio_processo_tappe-6170806/
5 The dataset is available at https://github.com/madeddumarco/GBV-Maltesi

Observing the results of the preliminary study, we discovered how the victim's characteristics influence the way newspapers present her femicide and users talk about it on social media. In fact, analyzing Di Pietrantonio's case, as she was a young, white, wealthy and Italian
student, we found very few examples of misogyny and, in most cases, the aggressiveness was directed against the perpetrator. Furthermore, the scheme was not considered sufficiently suitable for bringing out important elements of femicide cases. In fact, the annotators reported difficulties caused by the scheme, as it was too simplistic to capture complex features of femicide events. In order to solve these issues, we decided to direct our efforts to another case study in which the victim exhibits intersectionality traits, which we assume may lead to more misogynistic content. In addition, we developed a new schema and guidelines to obtain more accurate annotations specifically related to the femicide domain.

3.2. Data Collection

In this section we provide a description of the new dataset and the methodology used to build it. As mentioned above, we focused our research on the femicide of Carol Maltesi6, a 26-year-old white Italian woman, mother and online sex worker, who was brutally murdered in January 2022 by her ex-partner, Davide Fontana, a 44-year-old white Italian bank employee.

6 https://www.agi.it/cronaca/news/2024-02-21/omicidio-maltesi-condannato-ergastolo-ex-davide-fontana-25397937/

With the aim of collecting users' responses to the femicide, we chose to collect comments using the YouTube Data API, as it is freely available and allows us to easily access comments focused on specific news. The process of obtaining data followed several steps: first, we selected the 31 most popular YouTube videos based on number of views and comments. We chose videos about the Maltesi femicide from different types of sources: national (mainly the Italian broadcaster RAI) and local news. The selection of videos is diachronic, spanning from March 2022 to June 2023; this was done because the various media channels covered the story as it evolved, starting from the discovery of the nameless body and ending with the sentence given to the perpetrator. Afterwards, we collected comments from all the selected videos. Due to the API policy, we were restricted to collecting only first-level comments and at most the 5 oldest responses to them. In total, we retrieved 3,821 comments.

3.3. Annotation Scheme

From the previous experience of the Di Pietrantonio corpus, we concluded that a generic sentiment analysis schema is too rigid to capture such a complex phenomenon. We created an annotation scheme and a new online platform to facilitate the raters' work. We involved 5 annotators, 4 of them self-identified as women and 1 as a man, all interested in the topic and mostly coming from a humanistic background. They were all students and voluntarily participated in the project. The annotation guidelines were decided with the annotators after a pilot study and a subsequent group discussion where the raters pointed out the main faults of the schema. Each annotator analyzed all the comments according to the following guidelines:

• Non classifiable: if the comment cannot be analysed because it is not written in Italian, because it consists only of emojis, or because it is not comprehensible or not relevant to the topic (any comment that was marked as NC by at least 1 annotator was removed from the corpus);
• Empathy: whether the comment contains expressions of empathy in support of the victim, her family or the event in general (i.e., condolences);
• Misogyny: whether the comment contains discriminatory expressions against women, including blaming, objectifying, discriminatory and sexist practices used towards them and their life choices. If misogyny is present, we asked annotators to indicate its target (group or individual) based on [16]. Moreover, we asked them to specify if the expressed misogyny contained intersectionality traits and to select from a list what other dimensions were involved: age, religion, job, nationality, skin color, class, sexual orientation, gender, physical condition, educational background, language and culture;
• Aggressiveness: whether there is aggressiveness in the comment and to whom it is directed (allowing multiple choices): victim, perpetrator, social network (family, friends, colleagues), media, rape culture;
• Responsibility: if there is explicit attribution of responsibility for the murder in the text, state who is blamed (allowing multiple choices): victim, perpetrator, social network (family, friends, colleagues), media, rape culture;
• Humor: specify whether the text conveys humorous content through irony, sarcasm, word games or hyperbole;
• Macabre: specify whether there are macabre aspects detailing how the victim was killed;
• Context: indicate whether the context was helpful to better understand the meaning of the comments;
• Notes: free space for suggestions, observations or doubts.

3.4. Dataset Analysis

The dataset, GBV-Maltesi (available at https://github.com/madeddumarco/GBV-Maltesi), is composed of 2,934 comments annotated on all categories by all annotators. We
Figure 1: Histograms for distributions of relevant labels: (a) distribution of the misogyny label and its subcategories; (b) distribution of the aggressiveness label; (c) distribution of the responsibility label.
aggregated the dimensions through majority voting. As our schema is composed of many different labels, we will focus only on the dimensions that we consider the most relevant; all statistics can be found in Appendix C.

Starting from misogyny, in Appendix C and in Figure 1a, we can see that 9.03% of cases are positive. This unbalance is typical of hate speech datasets [27], and we consider it surprisingly high if we take into account the tragic theme of GBV. It is very interesting that intersectionality represents over 50% of misogynous examples, indicating how the personal traits of the victim affect the perception of the users commenting. Unsurprisingly, as the victim was a sex worker, 'work' is almost always the category chosen by the annotators. The target of misogyny was mostly individual, confirming the findings of SemEval-2019 Task 5 [16]. The annotators explained to us that the misogyny target was a difficult category to annotate, as comments often used the victim as an example to offend the broader group of women and sex workers.

Aggressiveness is more present than misogyny in our dataset, with 24% positive examples, mostly directed towards the perpetrator. Responsibility follows a similar trend, with 32.89% positive examples, mostly directed towards the perpetrator. Unlike aggressiveness, we can see a significant amount of comments holding the victim responsible (6.55%).

In Appendix B, we report the inter-annotator agreement (IAA) scores for all dimensions. As our dataset is fully annotated by multiple people, the metric we chose is Fleiss' Kappa [28]. The metric has a possible range of [-1, 1], with 1 indicating perfect agreement; any value of κ ≤ 0 indicates more disagreement between the annotators than expected by chance. We can see that most dimensions have a κ in the [0.2, 0.7] range, indicating variable levels of agreement depending on the label. The dimensions with the highest agreement at 0.69 are empathy towards the event and aggressiveness towards the perpetrator. In fact, annotators explained to us that these two categories were the easiest phenomena to annotate, as they lacked ambiguity. On the other hand, we can see that agreement on aggressiveness towards the victim is much lower (0.28). In our discussions with the raters, it emerged that attacks towards the victim were harder to identify, as they were more subtle, leading to disagreement among annotators.

4. Experiments

We conducted experiments to validate our resource and to gain more insight into the difficulty of the misogyny detection task. The goal of this analysis is to understand how the presence of different forms of misogyny (implicit and explicit) affects the evaluation of modern classification models. We consider as explicit misogyny discourses that intentionally spread hate towards women, mostly through slurs and other aggressive behaviors. Meanwhile, we intend implicit misogyny as more subtle and less conscious practices like victim blaming, slut shaming, de-responsibilization of the perpetrator and more. In addition to our corpus, we used 3 other datasets on the topic in Italian: AMI [6], PejorativITy [29] and Inters8 [8]. The former two have been mainly gathered by keyword search of sexist terms8; meanwhile, Inters8 and our corpus are focused on more implicit forms of sexist hate directed towards a specific woman (i.e., Silvia Romano and Carol Maltesi). Details about all the datasets can be found in Appendix D.

8 AMI was created following a hybrid approach, also selecting comments from known misogynistic accounts and responses directed at feminist public figures. We conducted a qualitative analysis and found that the misogyny it contains is almost always explicit and depends on slurs. This led us to place it in the keyword category.

To explore the potential bias of models towards explicit forms of misogyny, we created 4 different models for binary misogyny detection: BERT-Maltesi, BERT-AMI, BERT-PejorativITy and BERT-Inters8, respectively trained on the GBV-Maltesi, AMI, PejorativITy and Inters8 datasets.
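Returning briefly to the agreement analysis of Section 3.4: Fleiss' kappa [28] can be computed with a short standalone function. This is an illustrative sketch; the rating counts in the example are invented, not the GBV-Maltesi annotations.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table ratings[item][category] = number of
    raters who assigned that category to that item; every item must
    have the same total number of raters (e.g., 5 annotators who each
    labelled every comment)."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Observed agreement: mean of the per-item agreement P_i
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement from the marginal category proportions
    p_e = sum(
        (sum(row[j] for row in ratings) / (n_items * n_raters)) ** 2
        for j in range(n_cats)
    )
    return (p_bar - p_e) / (1 - p_e)

# Two items, three raters, two categories, full agreement on both items
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```

Values below 0 indicate less agreement than expected by chance, matching the [-1, 1] interpretation given above.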
Model             | Maltesi Test         | Inters8 Test         | PejorativITy Test    | AMI Test
                  | F1 Macro  F1 1-Label | F1 Macro  F1 1-Label | F1 Macro  F1 1-Label | F1 Macro  F1 1-Label
BERT-Maltesi      | 0.611     0.351      | 0.512     0.174      | 0.571     0.436      | 0.633     0.611
BERT-Inters8      | 0.377     0.169      | 0.621     0.490      | 0.550     0.538      | 0.659     0.725
BERT-PejorativITy | 0.528     0.226      | 0.483     0.128      | 0.670     0.604      | 0.675     0.732
BERT-AMI          | 0.494     0.155      | 0.590     0.299      | 0.654     0.601      | 0.877     0.886
Average           | 0.502     0.225      | 0.551     0.273      | 0.611     0.545      | 0.711     0.738

Table 1: Results for binary misogyny detection on all datasets.
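Two training details reported in Section 4 can be sketched in isolation: the binary focal loss used against class imbalance (with the stated γ = 2; the formulation follows [32]) and one possible reading of the early-stopping criterion (halt after 2 consecutive epochs of rising validation loss). In the actual experiments these would operate on AlBERTo's output probabilities; the helper names here are our own, not the paper's code.

```python
import math

def binary_focal_loss(p, y, gamma=2.0):
    """Focal loss for one prediction: p is the predicted probability of
    the positive (misogynous) class, y the gold label (0 or 1).
    The (1 - p_t)**gamma factor down-weights well-classified examples,
    so training concentrates on the rare positive class."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def should_stop(val_losses, patience=2):
    """Stop once the last `patience` epochs each increased the
    validation loss over the epoch before (one reading of the
    paper's early-stopping rule)."""
    if len(val_losses) <= patience:
        return False
    tail = val_losses[-(patience + 1):]
    return all(b > a for a, b in zip(tail, tail[1:]))
```

With gamma = 0 the loss reduces to standard cross-entropy; at gamma = 2, a confidently correct prediction (p_t = 0.9) contributes 100 times less loss than under cross-entropy, steering updates toward the harder, rarer positive examples.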
The models were trained only on the comments and were not given any other extra information such as video transcriptions. The only label we analyzed was misogyny, and all datasets were divided into training, validation and test sets following a 60%, 20% and 20% split. We used the existing splits when provided in the papers9; otherwise, we randomly created them. All models are binary classifiers created by fine-tuning BERT [30]; in particular, we used the Italian version AlBERTo [31]. Due to the imbalanced nature of most corpora, the models were trained with a focal loss [32], setting the hyperparameter γ = 2. Models were trained for 5 epochs but, to avoid overfitting, we implemented an early stopping function which ends training after 2 epochs that report an increase in validation loss. We tested all models on their own test set and on the other 3 corpora.

9 PejorativITy provides a training and test split, but analyzing the code we found that the test set was used as a validation set, so we decided to create a new one.

We want to underline that our goal is not to compare the performance of the different models with each other, as they have different training set sizes and numbers of positive examples. Rather, we intend to focus on how some test sets are more difficult than others, which helps us understand what the current challenges in misogyny detection are.

In Table 1, we report the positive-label and the macro-average F1 scores of all experiments. In addition, we also calculated the average scores for each test set. The best scores achieved on a certain test set are in bold; we underlined the best scores for cross-dataset testing. As expected, we can observe that all models had the highest score on their own test set. Meanwhile, for cross-dataset testing, we can see that the models that tend to perform the best are BERT-PejorativITy and BERT-AMI. We suspect that this is caused by the dataset composition, as their training sets present more positive examples compared to the others.

Interestingly, we can observe that certain models recorded higher scores on test sets that were not their own. This mostly happens for BERT-Maltesi and BERT-Inters8, which record higher scores on AMI and PejorativITy. Even BERT-PejorativITy increases its scores when tested on AMI. Observing the average scores for each test set, we can see that Maltesi and Inters8 are the most challenging datasets. This is especially true when observing the average F1 score on the positive label, with the score being in the [0.2, 0.3] range, compared to much higher scores for PejorativITy and especially AMI. These trends indicate how misogyny detection is a much harder task when considering datasets that contain less explicit forms of hate (e.g., not gathered by keyword search of sexist slurs).

In addition, we conducted a qualitative analysis of the errors of the various classifiers. We found that for each test set most classifiers misclassified the same types of examples. Models almost never recognized texts which contained victim blaming and slut shaming in the GBV-Maltesi dataset. The errors made on Inters8 mostly coincide with examples that are also racist and Islamophobic. The cases which proved to be more difficult in PejorativITy and AMI contain less explicit animal epithets like "cavalla" and nouns that refer to sex workers in a less explicit way like "cortigiana".

5. Conclusion and Future Works

In this paper, we presented GBV-Maltesi, the first dataset regarding social reactions to GBV, in particular to a femicide case. The topic was chosen to shed light on the importance of having misogyny corpora that include forms of sexism that are more implicit and complicated to detect compared to the existing ones that focus on slurs and offensive terms. We also focused on the intersectionality aspects to better explore online hate. GBV-Maltesi is composed of 2,934 comments, all annotated by 5 annotators, and it is available at https://github.com/madeddumarco/GBV-Maltesi. In order to overcome the limitations of generic semantic schemas, the corpus has been annotated following a new schema specifically created for cases of GBV. In the experimental phase of our work, we created different binary misogyny classifiers and tested them in a cross-dataset way. We found that datasets gathered by keyword collection are easier benchmarks, as the models showed bias towards slurs and did not identify more implicit cases of misogyny. This research on online discourse about GBV is not meant to be exhaustive, as several questions are still open.
As future works, we intend to focus on how different framings of the news can cause different online reactions, analyzing the differences between video transcripts of femicide news and the comments collected, in terms of words used, implicit references, attributions of guilt and descriptions of the people involved in the story. We also intend to gather more annotated corpora regarding femicides to explore how other characteristics of the victim (e.g., origin or skin color) and the time of the murder differently influence the online reactions. In this regard, we intend to explore the question by investigating whether and how the discourse on misogyny changes depending on whether it is addressed to living or dead women (i.e., the Giulia Cecchettin femicide and the abusive discourse against her sister, Elena Cecchettin). Lastly, we would like to extend our research by following an intersectional approach, considering all the dimensions and characteristics that make up the identity of both victim and perpetrator. To conclude, we strongly advocate the importance of writing the news correctly, as this has deep consequences on readers' perception and the way they talk about it.

Ethics Statement

The dataset was created in accordance with YouTube's Terms of Service. Considering the large number of users writing the comments collected in the dataset, it was not possible to explicitly ask for their consent. No sensitive data are provided in the dataset, and users' mentions have been anonymized to protect their privacy.

All the annotators involved in this research were free to participate without pressure or obligation. From the initial stages, they were aware of being free to leave at any time without negative consequences. During the annotation phase, we met several times to make sure that the topic did not disturb them psychologically or emotionally. We informed them to take their time, doing the annotation only when they felt like it, and to contact us for support. This approach continued through all the research stages.

Acknowledgements

We would like to thank Chiara Zanchi for discussing with us the direction of this work in its early stages. In addition, we would like to thank Sara Gemelli and Andrea Marra for their contribution to the creation of the annotation scheme and guidelines. Also, we reiterate our gratitude to the annotators who professionally worked on a difficult topic like GBV. This work was also partially supported by the "HARMONIA" project - M4-C2, I1.3 Partenariati Estesi - Cascade Call - FAIR - CUP C63C22000770006 - PE PE0000013 under the NextGenerationEU programme.

References

[1] M. L. Bonura, Che genere di violenza: conoscere e affrontare la violenza contro le donne, Edizioni Centro Studi Erickson, 2018.
[2] C. Vagnoli, Maledetta sfortuna, Rizzoli, 2021.
[3] J. Posetti, K. Bontcheva, D. Maynard, N. Aboulez, A. Lu, B. Gardiner, S. Torsner, J. Harrison, G. Daniels, F. Chawana, O. Douglas, A. Willis, F. Martin, L. Barcia, A. Jehangir, J. Price, G. Gober, J. Adams, N. Shabbir, The Chilling: A global study of online violence against women journalists, 2022.
[4] K. R. Blake, S. M. O'Dean, J. Lian, T. F. Denson, Misogynistic tweets correlate with violence against women, Psychological Science 32 (2021) 315–325.
[5] E. Guest, B. Vidgen, A. Mittos, N. Sastry, G. Tyson, H. Margetts, An expert annotated dataset for the detection of online misogyny, in: P. Merlo, J. Tiedemann, R. Tsarfaty (Eds.), Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, Online, 2021, pp. 1336–1350. URL: https://aclanthology.org/2021.eacl-main.114. doi:10.18653/v1/2021.eacl-main.114.
[6] E. Fersini, D. Nozza, P. Rosso, Overview of the EVALITA 2018 task on automatic misogyny identification (AMI), in: EVALITA@CLiC-it, 2018. URL: https://api.semanticscholar.org/CorpusID:56483156.
[7] S. Gemelli, G. Minnema, Manosphrames: exploring an Italian incel community through the lens of NLP and frame semantics, in: P. Sommerauer, T. Caselli, M. Nissim, L. Remijnse, P. Vossen (Eds.), Proceedings of the First Workshop on Reference, Framing, and Perspective @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024, pp. 28–39. URL: https://aclanthology.org/2024.rfp-1.4.
[8] I. Spada, M. Lai, V. Patti, Inters8: A corpus to study misogyny and intersectionality on Twitter, in: CLiC-it, 2023.
[9] P. Lalli, L'amore non uccide. Femminicidio e discorso pubblico: cronaca, tribunali, politiche, Il Mulino, 2020.
[10] A. Tontodimamma, E. Nissi, A. Sarra, L. Fontanella, Thirty years of research into hate speech: topics of interest and their evolution, Scientometrics 126 (2021) 157–179.
[11] A. Ollagnier, E. Cabrio, S. Villata, Unsupervised fine-grained hate speech target community detection and characterisation on social media, Social Network Analysis and Mining 13 (2023) 58.
[12] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Lang. Resour. Evaluation 55 (2021) 477–523. URL:
https://doi.org/10.1007/s10579-020-09502-8. doi:10.1007/S10579-020-09502-8.
[13] K. W. Crenshaw, Mapping the margins: Intersectionality, identity politics, and violence against women of color, in: The public nature of private violence, Routledge, 2013, pp. 93–118.
[14] K. Manne, Down Girl: The Logic of Misogyny, Oxford University Press, 2018. URL: https://books.google.it/books?id=LqPoAQAACAAJ.
[15] E. W. Pamungkas, A. T. Cignarella, V. Basile, V. Patti, et al., Automatic identification of misogyny in English and Italian tweets at EVALITA 2018 with a multilingual hate lexicon, in: CEUR Workshop Proceedings, volume 2263, CEUR-WS, 2018, pp. 1–6.
[16] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. Rangel Pardo, P. Rosso, M. Sanguinetti, SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter, in: J. May, E. Shutova, A. Herbelot, X. Zhu, M. Apidianaki, S. M. Mohammad (Eds.), Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 54–63. URL: https://aclanthology.org/S19-2007. doi:10.18653/v1/S19-2007.
[17] E. W. Pamungkas, V. Basile, V. Patti, Misogyny detection in Twitter: a multilingual and cross-domain study, Inf. Process. Manag. 57 (2020) 102360. URL: https://doi.org/10.1016/j.ipm.2020.102360. doi:10.1016/J.IPM.2020.102360.
[18] P. Zeinert, N. Inie, L. Derczynski, Annotating online misogyny, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 3181–3197.
[19] F. Formato, Gender, discourse and ideology in Italian, Springer, 2019.
[20] L. Busso, C. R. Combei, O. Tordini, Narrating gender violence a corpus-based study on the representa-
[23] G. Minnema, S. Gemelli, C. Zanchi, T. Caselli, M. Nissim, NLP in Italian: Analyzing responsibility framing in femicide news reports, in: E. Fersini, M. Passarotti, V. Patti (Eds.), Proceedings of the Eighth Italian Conference on Computational Linguistics, CLiC-it 2021, Milan, Italy, January 26-28, 2022, volume 3033 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: https://ceur-ws.org/Vol-3033/paper32.pdf.
[24] G. Minnema, S. Gemelli, C. Zanchi, T. Caselli, M. Nissim, Dead or murdered? predicting responsibility perception in femicide news reports, in: Y. He, H. Ji, S. Li, Y. Liu, C.-H. Chang (Eds.), Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online only, 2022, pp. 1078–1090. URL: https://aclanthology.org/2022.aacl-main.79.
[25] G. Minnema, H. Lai, B. Muscato, M. Nissim, Responsibility perspective transfer for Italian femicide news, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 7907–7918. URL: https://aclanthology.org/2023.findings-acl.501. doi:10.18653/v1/2023.findings-acl.501.
[26] V. Basile, N. Novielli, D. Croce, F. Barbieri, M. Nissim, V. Patti, Sentiment polarity classification at EVALITA: Lessons learned and open challenges, IEEE Transactions on Affective Computing 12 (2021) 466–478.
[27] B. Vidgen, L. Derczynski, Directions in abusive language training data, a systematic review: Garbage in, garbage out, PLoS ONE 15 (2020) e0243300.
[28] J. Fleiss, Measuring nominal scale agreement among many raters, Psychological Bulletin 76 (1971) 378–. doi:10.1037/h0031619.
[29] A. Muti, F. Ruggeri, C. Toraman, L. Musetti, S. Algherini, S. Ronchi, G. Saretto, C. Zapparoli,
tion of gender-based violence in italian media, in: A. Barrón-Cedeño, Pejorativity: Disambiguating
Language, Gender and Hate Speech: A Multidisci- pejorative epithets to improve misogyny detection
plinary Approach, 2020. in italian tweets, arXiv preprint arXiv:2404.02681
[21] N. Mandolini, Femminicidio, prima e dopo. (2024).
un’analisi qualitativa della copertura giornalistica [30] J. Devlin, M. Chang, K. Lee, K. Toutanova,
dei casi stefania noce (2011) e sara di pietrantonio BERT: pre-training of deep bidirectional trans-
(2016), Problemi dell’informazione 45 (2020) 247– formers for language understanding, CoRR
277. abs/1810.04805 (2018). URL: http://arxiv.org/abs/
[22] E. Pinelli, C. Zanchi, Gender-Based Violence in 1810.04805. arXiv:1810.04805.
Italian Local Newspapers: How Argument Struc- [31] M. Polignano, P. Basile, M. de Gemmis, G. Semer-
ture Constructions Can Diminish a Perpetrator’s aro, V. Basile, AlBERTo: Italian BERT Language
Responsibility, 2021, pp. 117–143. doi:10.1007/ Understanding Model for NLP Challenging Tasks
978-3-030-70091-1_6. Based on Tweets, in: Proceedings of the Sixth
[23] G. Minnema, S. Gemelli, C. Zanchi, V. Patti, Italian Conference on Computational Linguistics
T. Caselli, M. Nissim, Frame semantics for social (CLiC-it 2019), volume 2481, CEUR, 2019. URL:
Dimension Yes % No % Dimension Fleiss’ kappa
Subjectivity 70.48% 29.52% Misoginy 0.56
Misogyny 3.76% 96.24% Target 0.48
Polarity-Negative 51.89% 48.11% Intersectionality 0.32
Polarity-Positive 4.93% 95.07% Aggressiveness 0.53
Aggressiveness 24.02% 75.98% Agg. Perpetrator 0.69
Irony 7.09% 92.91% Agg. Victim 0.28
Context 81.48% 18.52% Agg. Social Network 0.23
Agg. Media 0.40
Table 2 Agg. Rape Culture 0.10
Distribution of the dimensions for the DiPietrantonio Dataset Responsibility 0.21
Resp. Perpatrator 0.25
Resp. Victim 0.55
https://www.scopus.com/inward/record.uri? Resp. Social Network 0.13
eid=2-s2.0-85074851349&partnerID=40&md5= Resp. Media 0.23
7abed946e06f76b3825ae5e294ffac14. Resp. Rape Culture 0.19
Empathy towards the event 0.69
[32] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Fo-
Humor 0.45
cal loss for dense object detection, in: Proceedings Macabre 0.49
of the IEEE international conference on computer Context -0.11
vision, 2017, pp. 2980–2988.
[33] F. Barbieri, V. Basile, D. Croce, M. Nissim, Table 3
N. Novielli, V. Patti, et al., Overview of the evalita Agreement of the Maltesi Dataset
2016 sentiment polarity classification task, in:
CEUR Workshop Proceedings, volume 1749, CEUR- Dimension Yes % No %
WS, 2016.
Misoginy 9.03% 90.97%
Intersectionality 4.63% 95.36%
Aggressiveness 24% 76%
A. Details about the Di Agg. Perpetrator 19.19% 80.81%
Agg. Victim 1.23% 98.77%
Pietrantonio Dataset Agg. Social Network 0.88% 99.11%
Agg. Media 2.73% 97.27%
The dataset GBV-DiPietrantonio is composed of 691 Agg. Rape Culture 0.41% 99.59%
tweets fully annotated by 3 annotators, 2 of which self- Responsibility 32.89% 67.11%
identified as women and 1 as a man. The tweets were Resp. Perpetrator 22.09% 77.91%
collected by gathering responses to news which covered Resp. Victim 6.55% 93.45%
the news of Di Pietrantonio femicide. The annotation Resp. Social Network 2.11% 97.89%
scheme is composed of the slightly modified SENTIPOLC Resp. Media 99.01% 0.99%
scheme[33] which consists of Subjectivity, Polarity (Posi- Resp. Rape Culture 4.06% 95.94%
tive, Negative) and Irony. In addition the semantic grid Empathy towards the event 28.25% 71.75%
Humor 3.14% 96.86%
contained Misogyny, Aggressiveness and Target of Ag-
Macabre 3.27% 96.72%
gressiveness (towards Perpetrator, Victim, Other), Con- Context 97.51% 2.49%
text, and Notes.
The statistics of the gold standard for the Di Pietranto- Table 4
nio dataset are in Table 2. Distribution of the binary dimensions of the Maltesi Dataset
B. Agreement of the Maltesi C. Distributions of the Maltesi
Dataset Dataset
Table 3 contains the agreement values calcolated with
Table 4 contains the distribution of the binary labels in
Fleiss’ Kappa for all dimensions in the Maltesi dataset.
the Maltesi dataset. Table 5 contains the type of inter-
sectionality and table 6 contains the type of misogyny
target.
Dimension            Percentage
Work                 96.32%
Age                  0.73%
Work and Education   0.73%
Work and Gender      2.20%

Table 5
Distribution of the intersectionality types selected
Dimension    Percentage
Individual   63.40%
Group        36.60%

Table 6
Distribution of the misogyny target types selected
D. Details of the Misogyny Datasets

Table 7 reports the details of the other existing Italian misogyny datasets used in the experimental phase.
Dataset        Topic                                                          Num Examples   Num Pos.   Pos. %
Inters8        Intersectional hate focusing on Islamophobia in the case       1,500          288        19.2%
               of hate towards Silvia Romano
AMI            Misogynistic slurs, attacks towards important figures who      5,000          2,340      46.8%
               expressed support for women's rights, and posts from
               misogynistic accounts
Pejorativity   Words that can be used as misogynistic pejoratives in          1,200          397        33%
               online discussion (e.g. cavalla, cagna, ...)

Table 7
Details of the Italian misogyny datasets used in the experiments