Exploring YouTube Comments Reacting to Femicide News in Italian

Chiara Ferrando1,*,†, Marco Madeddu1,*,†, Beatrice Antola2, Sveva Silvia Pasini3, Giulia Telari3, Mirko Lai4 and Viviana Patti1

1 Università di Torino, Italy
2 Università di Padova, Italy
3 Università di Pavia, Italy
4 Università del Piemonte Orientale, Italy

Abstract
In recent years, Gender Based Violence (GBV) has become an important issue in modern society and a central topic in different research areas due to its alarming spread. Several Natural Language Processing (NLP) studies concerning Hate Speech directed against women have focused on misogynistic behaviours, slurs or incel communities. The main contribution of our work is the creation of the first dataset of social media comments reacting to GBV, in particular to a femicide event. Our dataset, named GBV-Maltesi, contains 2,934 YouTube comments annotated following a new schema that we developed in order to study GBV and misogyny with an intersectional approach. During the experimental phase, we trained models on different corpora for binary misogyny detection and found that datasets mostly containing explicit expressions of misogyny pose an easier challenge than the more implicit forms of misogyny contained in GBV-Maltesi.

Warning: This paper contains examples of offensive content.

Keywords
Hate Speech, Misogyny Detection, Femicide, Social media, News, Responsibility framing

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
* Corresponding authors.
† These authors contributed equally.
chiara.ferrando@unito.it (C. Ferrando); marco.madeddu@unito.it (M. Madeddu); beatrice.antola@studenti.unipd.it (B. Antola); svevasilvia.pasini01@universitadipavia.it (S. S. Pasini); giulia.telari01@universitadipavia.it (G. Telari); mirko.lai@uniupo.it (M. Lai); viviana.patti@unito.it (V. Patti)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Nowadays, the term Gender Based Violence (GBV) is used to identify all forms of abuse based on gender hatred and sexist discrimination [1]. Scholars in social science have described as "rape culture" a society that normalizes sexist behaviours: from more common occurrences like victim blaming, slut shaming and the gender pay gap to the apex of violence, femicide [2]. While general violent crimes have decreased over time, GBV has not, alarming various bodies in modern society1. A report from the EU Commission2 states that 31%, 5% and 43% of European women suffered respectively from physical, sexual and psychological violence. Regarding the Internet sphere, a survey found that 73% of women journalists experienced online violence (threats, belittling, shaming, ...) [3]. These statistics become even more alarming when we consider studies that show the correlation between misogynistic online posts and GBV [4].

Like other countries, Italy is affected by GBV, with the national observatory managed by the "Non Una di Meno" association reporting 117 femicides in 2022, 120 in 2023 and more than 40 until June 20243.

1 https://www.interno.gov.it/it/stampa-e-comunicazione/dati-e-statistiche/omicidi-volontari-e-violenza-genere
2 https://commission.europa.eu/strategy-and-policy/policies/justice-and-fundamental-rights/gender-equality/gender-based-violence/what-gender-based-violence_en
3 https://osservatorionazionale.nonunadimeno.net/anno/

Several studies about Hate Speech (HS) directed towards women focus on developing taxonomies [5] rather than investigating low-resource subjects in computational linguistics like GBV. These works often gather corpora by keyword search of gender slurs [6], by retrieving comments left in misogynistic spaces like incel blogs [5, 7] or by considering messages directed towards popular women figures highly debated on social media [8].

As GBV is a broad topic, we want to clarify that we focus on GBV in Western societies, particularly in Italy. The main goal of this project is to show the current perception of femicides expressed through comments on social media, focusing on the specific case of Carol Maltesi. We chose this femicide because the victim was a sex worker, meaning that she presented an intersectional trait, and because it was a popular case in the media, enabling us to select enough material for the study. Further, we want to highlight how the socio-demographic characteristics of the victims determine the way they are described and how this influences the perception of the news. For instance, victims' features such as age, job, origin, skin color, nationality and religion carry different weight and determine the lesser or greater spread of the news [9].

To overcome the cited issues in the current literature, in this research we considered the phenomenon by focusing on users' reactions on social media to news about femicides. We collected YouTube comments in response to videos about a specific case. In order to overcome the constraints of traditional sentiment analysis schemas, we annotated the data following a new semantic grid that can be used as a standard for comments regarding GBV.

In the experimental phase of this work, we created models based on different Italian misogyny datasets (including ours). The goal of these experiments is to analyze the different features of these corpora and which forms of misogyny are harder to detect. We performed both a quantitative and a qualitative analysis of the results.

In the next sections, we describe related work on hate speech and misogyny detection (Section 2), the annotation scheme and both a quantitative and qualitative analysis of the dataset (Section 3), and the results obtained in our experiments (Section 4). Lastly, we present some conclusions and delineate possible future developments (Section 5).

2. Related Work

In recent times, the creation and dissemination of hate speech have become increasingly pervasive on online platforms, making social media a fertile ground for hateful discussions [10]. The escalation of offensive and abusive language, understood as content that discriminates against a person or group on the basis of specific characteristics such as ethnicity, gender, sexual orientation, and more, has aroused considerable interest in various fields. In fact, over the last decade, a large number of computational methods involving NLP and Machine Learning have been proposed for automatic online hate speech detection [11, 12]. Most prior works have mainly treated hate speech detection as a classification task, distinguishing between hate and non-hate speech. Hate speech takes on different nuances depending on the target groups at which it is directed, i.e. depending on the specific features that the target group has in common. Moreover, in some cases, these traits may intersect with each other, leading to different degrees of discrimination. This concept takes the name of intersectionality [13].

Among abusive languages, misogyny, considered as a specific offensive language against women, has become a contemporary research topic [14]. In the automatic hate speech detection field, the Automatic Misogyny Identification (AMI) [15] series of shared tasks launched in EVALITA [6] and the SemEval-2019 HatEval challenge [16] have produced evaluation frameworks to identify misogynous tweets in English, Italian and Spanish [17].

Misogyny has become a pervasive phenomenon, widespread in very different spheres and expressed in both explicit and implicit forms [5, 18]. For this reason, even in online conversations about a dramatic act such as femicide, it is possible to find examples of veiled or explicit hostility towards the victims. The femicide phenomenon has been studied from different points of view. Several studies focused on GBV representation in Italian media [19, 20]. In 2020, Mandolini focused on the journalistic narratives of femicide in newspapers by means of a qualitative discourse analysis of two specific case studies [21]. The researcher attempted to describe changes in attitudes in the portrayal of femicide, focusing on discursive strategies that (directly or indirectly) blame the victim and implicitly excuse the perpetrator, referring to gender stereotypes and romantic love rhetoric.

Other studies focused on responsibility framing in femicide news, conducting an experiment where annotators rated excerpts from local newspapers on how much responsibility was given to the perpetrator [22]. As far as we know, there is only one line of work in NLP on GBV [23, 24, 25], which focuses on readers' perception of femicide news headlines and analyses the perception of responsibility attributed to victim and perpetrator; to our knowledge, there is no other study analysing social media reactions to GBV cases.

3. Dataset

3.1. Corpus Background

In a preliminary phase of our work, we conducted research on the femicide case of Sara Di Pietrantonio4, a 22-year-old white Italian student from a wealthy family, murdered by her ex-boyfriend in May 2016 [21]. In this preliminary research we set out to develop a corpus by collecting Twitter users' comments on femicide news published by online newspapers5. We created an annotation scheme for the corpus consisting of two layers: the first focused on the dimensions of sentiment analysis and was composed of three subtasks (subjectivity, polarity and irony), relevant for the detection of sentiment in social media [26]; the second focused on hate speech detection, including labels for misogyny, aggressiveness and its target. For more details on the annotation scheme and corpus description, see Appendix A.

4 https://www.agi.it/cronaca/news/2019-09-11/sara_di_pietrantonio_processo_tappe-6170806/
5 The dataset is available at https://github.com/madeddumarco/GBV-Maltesi

Observing the results of the preliminary study, we discovered how the victim's characteristics influence the way newspapers present her femicide and users talk about it on social media. In fact, analyzing Di Pietrantonio's case, as she was a young, white, wealthy Italian student, we found very few examples of misogyny and, in most cases, the aggressiveness was directed against the perpetrator. Furthermore, the scheme was not considered sufficiently suitable for bringing out important elements of femicide cases: the annotators reported difficulties with the scheme, which proved deficient and too simplistic to capture the complex features of femicide events. To solve these issues, we decided to direct our efforts towards another case study in which the victim exhibits intersectional traits, which we assume may lead to more misogynistic content. In addition, we developed a new schema and guidelines to obtain more accurate annotations specifically related to the femicide domain.

3.2. Data Collection

In this section we describe the new dataset and the methodology used to build it.

As mentioned above, we focused our research on the femicide of Carol Maltesi6, a 26-year-old white Italian woman, mother and online sex worker, who was brutally murdered in January 2022 by her ex-partner, Davide Fontana, a 44-year-old white Italian bank employee.

With the aim of collecting users' responses to the femicide, we chose to collect comments using the YouTube Data API, as it is freely available and allows us to easily access comments on specific news. The process of obtaining the data followed several steps: first, we selected the 31 most popular YouTube videos based on number of views and comments. We chose videos about the Maltesi femicide from different types of sources: national (mainly the Italian broadcaster RAI) and local news. The selection of videos is diachronic, spanning from March 2022 to June 2023; this was done because the various media channels covered the story as it evolved, starting from the discovery of the nameless body and ending with the sentence given to the perpetrator. Afterwards, we collected the comments from all the selected videos. Due to the API policy, we were restricted to collecting only first-level comments and at most the 5 oldest responses to them. In total, we retrieved 3,821 comments.

3.3. Annotation Scheme

From the previous experience with the Di Pietrantonio corpus, we concluded that a generic sentiment analysis schema was too rigid to capture such a complex phenomenon. We created an annotation scheme and a new online platform to facilitate the raters' work. We involved 5 annotators, 4 of whom self-identified as women and 1 as a man, all interested in the topic and mostly coming from a humanistic background. They were all students and voluntarily participated in the project. The annotation guidelines were decided with the annotators after a pilot study and a subsequent group discussion where the raters pointed out the main faults of the schema. Each annotator analyzed all the comments according to the following guidelines:

• Non classifiable: the comment cannot be analysed because it is not written in Italian, consists only of emojis, is not comprehensible or is not relevant to the topic (any comment that was marked as NC by at least 1 annotator was removed from the corpus);
• Empathy: whether the comment contains expressions of empathy in support of the victim, her family or the event in general (i.e., condolences);
• Misogyny: whether the comment contains discriminatory expressions against women, including blaming, objectifying, discriminatory and sexist practices used towards them and their life choices. If misogyny is present, we asked annotators to indicate its target (group or individual) based on [16]. Moreover, we asked them to specify whether the expressed misogyny contained intersectionality traits and to select from a list which other dimensions were involved: age, religion, job, nationality, skin color, class, sexual orientation, gender, physical condition, educational background, language and culture;
• Aggressiveness: whether there is aggressiveness in the comment and to whom it is directed (allowing multiple choices): victim, perpetrator, social network (family, friends, colleagues), media, rape culture;
• Responsibility: if there is explicit attribution of responsibility for the murder in the text, who is blamed (allowing multiple choices): victim, perpetrator, social network (family, friends, colleagues), media, rape culture;
• Humor: whether the text conveys humorous content through irony, sarcasm, word games or hyperbole;
• Macabre: whether there are macabre aspects detailing how the victim was killed;
• Context: whether the context was helpful to better understand the meaning of the comment;
• Notes: free space for suggestions, observations or doubts.

3.4. Dataset Analysis

The dataset, GBV-Maltesi7, is composed of 2,934 comments annotated on all categories by all annotators.
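As described in Section 3.4, the per-dimension gold labels of GBV-Maltesi are obtained by majority voting over the 5 raters. A minimal sketch of that aggregation step, where the dimension name and vote values are purely illustrative, not the dataset's actual field names:

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by the largest number of raters.
    With 5 raters, a binary (yes/no) dimension can never tie."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical example: five raters judging one comment on one dimension.
votes = {"misogyny": ["yes", "no", "yes", "yes", "no"]}
gold = {dim: majority_vote(v) for dim, v in votes.items()}
```

Applying the same reduction to every comment and every dimension yields one aggregated label set per comment.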
6 https://www.agi.it/cronaca/news/2024-02-21/omicidio-maltesi-condannato-ergastolo-ex-davide-fontana-25397937/
7 https://github.com/madeddumarco/GBV-Maltesi

We aggregated the dimensions through majority voting. As our schema is composed of many different labels, we focus only on the dimensions that we consider the most relevant; all statistics can be found in Appendix C.

[Figure 1: Histograms for distributions of relevant labels. (a) Distribution of the misogyny label and its subcategories; (b) distribution of the aggressiveness label; (c) distribution of the responsibility label.]

Starting from misogyny, in Appendix C and in Figure 1a, we can see that 9.03% of cases are positive. This unbalance is typical of hate speech datasets [27], and we consider it surprisingly high if we take into account the tragic theme of GBV. It is very interesting that intersectionality represents over 50% of misogynous examples, indicating how the personal traits of the victim affect the perception of the commenting users. Unsurprisingly, as the victim was a sex worker, 'work' is almost always the category chosen by the annotators. The target of misogyny was mostly individual, confirming the findings of SemEval-2019 Task 5 [16]. The annotators explained to us that the misogyny target was a difficult category to annotate, as comments often used the victim as an example to offend the broader group of women and sex workers.

Aggressiveness is more present than misogyny in our dataset, with 24% positive examples, mostly directed towards the perpetrator. Responsibility follows a similar trend, with 32.89% positive examples, mostly directed towards the perpetrator. Unlike aggressiveness, we can see a significant amount of comments holding the victim responsible (6.55%).

In Appendix B, we report the inter-annotator agreement (IAA) scores for all dimensions. As our dataset is fully annotated by multiple people, the metric we chose is Fleiss' Kappa [28]. The metric has a possible range of [-1, 1], with 1 indicating perfect agreement, and any value of κ ≤ 0 indicating more disagreement between the annotators than expected by chance. We can see that most dimensions have a κ in the [0.2, 0.7] range, indicating variable levels of agreement depending on the label. The dimensions with the highest agreement, at 0.69, are empathy towards the event and aggressiveness towards the perpetrator. In fact, annotators explained to us that these two categories were the easiest phenomena to annotate, as they lacked ambiguity. On the other hand, we can see that agreement on aggressiveness towards the victim is much lower (0.28). In our discussions with the raters, it emerged that attacks towards the victim were harder to identify as they were more subtle, leading to disagreement among annotators.

4. Experiments

We conducted experiments to validate our resource and to gain more insight into the difficulty of the misogyny detection task. The goal of this analysis is to understand how the presence of different forms of misogyny (implicit and explicit) affects the evaluation of modern classification models. We consider as explicit misogyny discourses that intentionally spread hate towards women, mostly through slurs and other aggressive behaviors. Meanwhile, we intend implicit misogyny as more subtle and less conscious practices like victim blaming, slut shaming, de-responsibilization of the perpetrator and more. In addition to our corpus, we used 3 other Italian datasets on the topic: AMI [6], PejorativITy [29] and Inters8 [8]. The former two were mainly gathered by keyword search of sexist terms8; meanwhile, Inters8 and our corpus focus on more implicit forms of sexist hate directed towards a specific woman (i.e., Silvia Romano and Carol Maltesi). Details about all the datasets can be found in Appendix D.

8 AMI was created following a hybrid approach, also selecting comments from known misogynistic accounts and responses directed to feminist public figures. We conducted a qualitative analysis and found that the misogyny it contains is almost always explicit and dependent on slurs. This led us to place it in the keyword category.

To explore the potential bias of models towards explicit forms of misogyny, we created 4 different models for binary misogyny detection: BERT-Maltesi, BERT-AMI, BERT-PejorativITy and BERT-Inters8, respectively trained on the GBV-Maltesi, AMI, PejorativITy and Inters8 datasets. The models were trained only on the comments and were not given any other extra information, such as video transcriptions. The only label we analyzed was misogyny, and all datasets were divided into training, validation and test sets following a 60%, 20% and 20% split. We used the existing splits when provided in the papers9; otherwise, we randomly created them.

                     Maltesi Test        Inters8 Test        PejorativITy Test   AMI Test
Model                F1 Macro  F1 1-Lab  F1 Macro  F1 1-Lab  F1 Macro  F1 1-Lab  F1 Macro  F1 1-Lab
BERT-Maltesi         0.611     0.351     0.512     0.174     0.571     0.436     0.633     0.611
BERT-Inters8         0.377     0.169     0.621     0.49      0.55      0.538     0.659     0.725
BERT-PejorativITy    0.528     0.226     0.483     0.128     0.67      0.604     0.675     0.732
BERT-AMI             0.494     0.155     0.59      0.299     0.654     0.601     0.877     0.886
Average              0.502     0.225     0.551     0.273     0.611     0.545     0.711     0.738

Table 1
Results for binary misogyny detection on all datasets
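Table 1 reports two figures per test set: the macro-averaged F1 and the F1 of the positive (misogynous) class, labeled "F1 1-Label". A minimal sketch of how both can be computed for binary gold/predicted labels (not the paper's actual evaluation script, which is unspecified):

```python
def f1_class(gold, pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # convention: no true positives yields F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def f1_macro(gold, pred):
    """Unweighted mean of the per-class F1 over the two classes."""
    return (f1_class(gold, pred, 1) + f1_class(gold, pred, 0)) / 2
```

Because the corpora are heavily imbalanced towards the negative class, the positive-class F1 is the more telling of the two numbers, which is why the cross-dataset gap is most visible in that column.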
All models are binary classifiers created by fine-tuning BERT [30]; in particular, we used the Italian version AlBERTo [31]. Due to the imbalanced nature of most corpora, the models were trained with a focal loss [32], setting the hyperparameter γ = 2. Models were trained for 5 epochs but, to avoid overfitting, we implemented an early stopping function which ends training after 2 epochs that report an increase in validation loss. We tested all models on their own test set and on the other 3 corpora.

We want to underline that our goal is not to compare the performance of the different models with each other, as they were trained on sets of different sizes and with different numbers of positive examples. Rather, we focus on how some test sets are more difficult than others, which helps us understand the current challenges in misogyny detection.

In Table 1, we report the positive-label and the macro-average F1 scores of all experiments. In addition, we also calculated the average scores for each test set. The best scores achieved on a given test set are in bold; we underlined the best scores for cross-dataset testing. As expected, we can observe that all models achieved the highest score on their own test set. Meanwhile, for cross-dataset testing, the models that tend to perform best are BERT-PejorativITy and BERT-AMI. We suspect that this is caused by the dataset composition, as their training sets present more positive examples than the others.

Interestingly, we can observe that certain models recorded higher scores on test sets other than their own. This mostly happens for BERT-Maltesi and BERT-Inters8, which record higher scores on AMI and PejorativITy. Even BERT-PejorativITy increases its scores when tested on AMI. Observing the average scores for each test set, we can see that Maltesi and Inters8 are the most challenging datasets. This is especially true when observing the average F1 score on the positive label, which lies in the [0.2, 0.3] range, compared to much higher scores for PejorativITy and especially AMI. These trends indicate that misogyny detection is a much harder task on datasets that contain less explicit forms of hate (e.g., not gathered by keyword search of sexist slurs).

In addition, we conducted a qualitative analysis of the errors of the various classifiers. We found that, for each test set, most classifiers misclassified the same type of examples. Models almost never recognized texts containing victim blaming and slut shaming in the GBV-Maltesi dataset. The errors made on Inters8 mostly coincide with examples that are also racist and Islamophobic. The cases which proved more difficult in PejorativITy and AMI contain less explicit animal epithets like "cavalla" and nouns that refer to sex workers in a less explicit way, like "cortigiana".

9 PejorativITy provides a training and test split but, analyzing the code, we found that the test set was used as a validation set, so we decided to create a new one.

5. Conclusion and Future Works

In this paper, we presented GBV-Maltesi, the first dataset regarding social reactions to GBV, in particular to a femicide case. The topic was chosen to shed light on the importance of having misogyny corpora that include forms of sexism that are more implicit and complicated to detect compared to the existing ones that focus on slurs and offensive terms. We also focused on intersectionality aspects to better explore online hate. GBV-Maltesi is composed of 2,934 comments, all annotated by 5 annotators, and it is available at https://github.com/madeddumarco/GBV-Maltesi. In order to overcome the limitations of generic semantic schemas, the corpus has been annotated following a new schema specifically created for cases of GBV. In the experimental phase of our work, we created different binary misogyny classifiers and tested them in a cross-dataset way. We found that datasets gathered through keyword collection are easier benchmarks, as the models showed bias towards slurs and failed to identify more implicit cases of misogyny. This research on online discourse about GBV is not meant to be exhaustive, as several questions are still open.

As future works, we intend to focus on how different framings of the news can cause different online reactions, analyzing the differences between video transcripts of femicide news and the comments collected, in terms of words used, implicit references, attributions of guilt and descriptions of the people involved in the story. We also intend to gather more annotated corpora regarding femicides to explore how other characteristics of the victim (e.g., origin or skin color) and the time of the murder differently influence online reactions. In this regard, we intend to explore the question by investigating whether and how the discourse on misogyny changes depending on whether it is addressed to living or dead women (i.e., the Giulia Cecchettin femicide and the abusive discourse against her sister, Elena Cecchettin). Lastly, we would like to extend our research by following an intersectional approach, considering all the dimensions and characteristics that make up the identity of both victim and perpetrator. To conclude, we strongly advocate the importance of writing the news correctly, as this has deep consequences on readers' perception and the way they talk about it.

Ethics Statement

The dataset was created in accordance with YouTube's Terms of Service. Considering the large number of users writing the comments collected in the dataset, it was not possible to explicitly ask for their consent. No sensitive data are provided in the dataset, and users' mentions have been anonymized to protect their privacy.

All the annotators involved in this research were free to participate without pressure or obligation. From the initial stages, they were aware of being free to leave at any time without negative consequences. During the annotation phase, we met several times to make sure that the topic did not disturb them psychologically or emotionally. We told them to take their time, doing the annotation only when they felt like it, and to contact us for support. This approach continued throughout all the research stages.

Acknowledgements

We would like to thank Chiara Zanchi for discussing with us the direction of this work in its early stages. In addition, we would like to thank Sara Gemelli and Andrea Marra for their contribution to the creation of the annotation scheme and guidelines. Also, we reiterate our gratitude to the annotators who professionally worked on a difficult topic like GBV. This work was also partially supported by the "HARMONIA" project - M4-C2, I1.3 Partenariati Estesi - Cascade Call - FAIR - CUP C63C22000770006 - PE PE0000013 under the NextGenerationEU programme.

References

[1] M. L. Bonura, Che genere di violenza: conoscere e affrontare la violenza contro le donne, Edizioni Centro Studi Erickson, 2018.
[2] C. Vagnoli, Maledetta sfortuna, Rizzoli, 2021.
[3] J. Posetti, K. Bontcheva, D. Maynard, N. Aboulez, A. Lu, B. Gardiner, S. Torsner, J. Harrison, G. Daniels, F. Chawana, O. Douglas, A. Willis, F. Martin, L. Barcia, A. Jehangir, J. Price, G. Gober, J. Adams, N. Shabbir, The Chilling: A global study of online violence against women journalists, 2022.
[4] K. R. Blake, S. M. O'Dean, J. Lian, T. F. Denson, Misogynistic tweets correlate with violence against women, Psychological Science 32 (2021) 315–325.
[5] E. Guest, B. Vidgen, A. Mittos, N. Sastry, G. Tyson, H. Margetts, An expert annotated dataset for the detection of online misogyny, in: P. Merlo, J. Tiedemann, R. Tsarfaty (Eds.), Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, Online, 2021, pp. 1336–1350. URL: https://aclanthology.org/2021.eacl-main.114. doi:10.18653/v1/2021.eacl-main.114.
[6] E. Fersini, D. Nozza, P. Rosso, Overview of the EVALITA 2018 task on automatic misogyny identification (AMI), in: EVALITA@CLiC-it, 2018. URL: https://api.semanticscholar.org/CorpusID:56483156.
[7] S. Gemelli, G. Minnema, Manosphrames: exploring an Italian incel community through the lens of NLP and frame semantics, in: P. Sommerauer, T. Caselli, M. Nissim, L. Remijnse, P. Vossen (Eds.), Proceedings of the First Workshop on Reference, Framing, and Perspective @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024, pp. 28–39. URL: https://aclanthology.org/2024.rfp-1.4.
[8] I. Spada, M. Lai, V. Patti, Inters8: A corpus to study misogyny and intersectionality on Twitter, in: CLiC-it, 2023.
[9] P. Lalli, L'amore non uccide. Femminicidio e discorso pubblico: cronaca, tribunali, politiche, Il Mulino, 2020.
[10] A. Tontodimamma, E. Nissi, A. Sarra, L. Fontanella, Thirty years of research into hate speech: topics of interest and their evolution, Scientometrics 126 (2021) 157–179.
[11] A. Ollagnier, E. Cabrio, S. Villata, Unsupervised fine-grained hate speech target community detection and characterisation on social media, Social Network Analysis and Mining 13 (2023) 58.
[12] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Lang. Resour. Evaluation 55 (2021) 477–523. URL: https://doi.org/10.1007/s10579-020-09502-8. doi:10.1007/S10579-020-09502-8.
[13] K. W. Crenshaw, Mapping the margins: Intersectionality, identity politics, and violence against women of color, in: The public nature of private violence, Routledge, 2013, pp. 93–118.
[14] K. Manne, Down Girl: The Logic of Misogyny, Oxford University Press, 2018. URL: https://books.google.it/books?id=LqPoAQAACAAJ.
[15] E. W. Pamungkas, A. T. Cignarella, V. Basile, V. Patti, et al., Automatic identification of misogyny in English and Italian tweets at EVALITA 2018 with a multilingual hate lexicon, in: CEUR Workshop Proceedings, volume 2263, CEUR-WS, 2018, pp. 1–6.
[16] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. Rangel Pardo, P. Rosso, M. Sanguinetti, SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter, in: J. May, E. Shutova, A. Herbelot, X. Zhu, M. Apidianaki, S. M. Mohammad (Eds.), Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 54–63. URL: https://aclanthology.org/S19-2007. doi:10.18653/v1/S19-2007.
[17] E. W. Pamungkas, V. Basile, V. Patti, Misogyny detection in Twitter: a multilingual and cross-domain study, Inf. Process. Manag. 57 (2020) 102360. URL: https://doi.org/10.1016/j.ipm.2020.102360. doi:10.1016/J.IPM.2020.102360.
[18] P. Zeinert, N. Inie, L. Derczynski, Annotating online misogyny, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 3181–3197.
[19] F. Formato, Gender, discourse and ideology in Italian, Springer, 2019.
[20] L. Busso, C. R. Combei, O. Tordini, Narrating gender violence: a corpus-based study on the representation of gender-based violence in Italian media, in: Language, Gender and Hate Speech: A Multidisciplinary Approach, 2020.
[21] N. Mandolini, Femminicidio, prima e dopo. Un'analisi qualitativa della copertura giornalistica dei casi Stefania Noce (2011) e Sara Di Pietrantonio (2016), Problemi dell'informazione 45 (2020) 247–277.
[22] E. Pinelli, C. Zanchi, Gender-Based Violence in Italian Local Newspapers: How Argument Structure Constructions Can Diminish a Perpetrator's Responsibility, 2021, pp. 117–143. doi:10.1007/978-3-030-70091-1_6.
[23] G. Minnema, S. Gemelli, C. Zanchi, V. Patti, T. Caselli, M. Nissim, Frame semantics for social NLP in Italian: Analyzing responsibility framing in femicide news reports, in: E. Fersini, M. Passarotti, V. Patti (Eds.), Proceedings of the Eighth Italian Conference on Computational Linguistics, CLiC-it 2021, Milan, Italy, January 26-28, 2022, volume 3033 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: https://ceur-ws.org/Vol-3033/paper32.pdf.
[24] G. Minnema, S. Gemelli, C. Zanchi, T. Caselli, M. Nissim, Dead or murdered? Predicting responsibility perception in femicide news reports, in: Y. He, H. Ji, S. Li, Y. Liu, C.-H. Chang (Eds.), Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online only, 2022, pp. 1078–1090. URL: https://aclanthology.org/2022.aacl-main.79.
[25] G. Minnema, H. Lai, B. Muscato, M. Nissim, Responsibility perspective transfer for Italian femicide news, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 7907–7918. URL: https://aclanthology.org/2023.findings-acl.501. doi:10.18653/v1/2023.findings-acl.501.
[26] V. Basile, N. Novielli, D. Croce, F. Barbieri, M. Nissim, V. Patti, Sentiment polarity classification at EVALITA: Lessons learned and open challenges, IEEE Transactions on Affective Computing 12 (2021) 466–478.
[27] B. Vidgen, L. Derczynski, Directions in abusive language training data, a systematic review: Garbage in, garbage out, PLoS ONE 15 (2020) e0243300.
[28] J. Fleiss, Measuring nominal scale agreement among many raters, Psychological Bulletin 76 (1971) 378–. doi:10.1037/h0031619.
[29] A. Muti, F. Ruggeri, C. Toraman, L. Musetti, S. Algherini, S. Ronchi, G. Saretto, C. Zapparoli, A. Barrón-Cedeño, PejorativITy: Disambiguating pejorative epithets to improve misogyny detection in Italian tweets, arXiv preprint arXiv:2404.02681 (2024).
[30] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
[31] M. Polignano, P. Basile, M. de Gemmis, G. Semeraro, V. Basile, AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets, in: Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), volume 2481, CEUR, 2019. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85074851349&partnerID=40&md5=7abed946e06f76b3825ae5e294ffac14.
[32] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
[33] F. Barbieri, V. Basile, D. Croce, M. Nissim, N. Novielli, V. Patti, et al., Overview of the EVALITA 2016 sentiment polarity classification task, in: CEUR Workshop Proceedings, volume 1749, CEUR-WS, 2016.

Dimension            Yes %     No %
Subjectivity         70.48%    29.52%
Misogyny             3.76%     96.24%
Polarity-Negative    51.89%    48.11%
Polarity-Positive    4.93%     95.07%
Aggressiveness       24.02%    75.98%
Irony                7.09%     92.91%
Context              81.48%    18.52%

Table 2
Distribution of the dimensions for the DiPietrantonio Dataset

Dimension                    Fleiss' kappa
Misogyny                     0.56
Target                       0.48
Intersectionality            0.32
Aggressiveness               0.53
Agg. Perpetrator             0.69
Agg. Victim                  0.28
Agg. Social Network          0.23
Agg. Media                   0.40
Agg. Rape Culture            0.10
Responsibility               0.21
Resp. Perpetrator            0.25
Resp. Victim                 0.55
Resp. Social Network         0.13
Resp. Media                  0.23
Resp. Rape Culture           0.19
Empathy towards the event    0.69
Humor                        0.45
Macabre                      0.49
Context                      -0.11

Table 3
Agreement of the Maltesi Dataset

Dimension            Yes %     No %
Misogyny             9.03%     90.97%
Intersectionality    4.63%     95.36%
Aggressiveness       24%       76%
Agg. Perpetrator     19.19%    80.81%
Agg. Victim          1.23%     98.77%
Agg. Social Network  0.88%     99.11%
Agg. Media           2.73%     97.27%
Agg.

A. Details about the Di Pietrantonio Dataset

The dataset GBV-DiPietrantonio is composed of 691
Rape Culture 0.41% 99.59% tweets fully annotated by 3 annotators, 2 of which self- Responsibility 32.89% 67.11% identified as women and 1 as a man. The tweets were Resp. Perpetrator 22.09% 77.91% collected by gathering responses to news which covered Resp. Victim 6.55% 93.45% the news of Di Pietrantonio femicide. The annotation Resp. Social Network 2.11% 97.89% scheme is composed of the slightly modified SENTIPOLC Resp. Media 99.01% 0.99% scheme[33] which consists of Subjectivity, Polarity (Posi- Resp. Rape Culture 4.06% 95.94% tive, Negative) and Irony. In addition the semantic grid Empathy towards the event 28.25% 71.75% Humor 3.14% 96.86% contained Misogyny, Aggressiveness and Target of Ag- Macabre 3.27% 96.72% gressiveness (towards Perpetrator, Victim, Other), Con- Context 97.51% 2.49% text, and Notes. The statistics of the gold standard for the Di Pietranto- Table 4 nio dataset are in Table 2. Distribution of the binary dimensions of the Maltesi Dataset B. Agreement of the Maltesi C. Distributions of the Maltesi Dataset Dataset Table 3 contains the agreement values calcolated with Table 4 contains the distribution of the binary labels in Fleiss’ Kappa for all dimensions in the Maltesi dataset. the Maltesi dataset. Table 5 contains the type of inter- sectionality and table 6 contains the type of misogyny target. Dimension Percentage % Work 96.32% Age 0.73% Work and Education 0.73% Work and Gender 2.20% Table 5 Distribution of the values for the types of intersectionality selected Dimension Percentage % Individual 63.40% Grooup 36.60% Table 6 Distribution of the values for the types of misogyny target selected D. Distributions of the Misogyny Dataset Table 7 contains the details of the other existing misogyny datasets used in the experimental phase. Dataset Topic Num Examples Num Pos. Pos. 
% Intersectional Hate focusing on Inters8 Islamophobia in the case of hate towards 1,500 288 19.2% Silvia Romano Misogynistic slurs, attacks towards important figures who expressed support AMI 5,000 2,340 46.8% for women rights and posts from misogynistic account Words that can be used as misogynistic Pejorativity pejoratives in online discussion (e.g. 1,200 397 33% Cavalla, cagna,...) Table 7 Distribution of the Italian misogyny Dataset
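Appendix B reports inter-annotator agreement with Fleiss' kappa [28]. As an illustrative sketch only (the function, label names, and toy data below are ours, not the authors' released code), kappa values like those in Table 3 can be computed per dimension from the raw annotations along these lines:

```python
from collections import Counter

def fleiss_kappa(ratings, categories=("yes", "no")):
    """Fleiss' kappa for N items, each labelled by the same number of raters.

    `ratings` is a list of per-item label lists, e.g. one entry of three
    labels per comment when three annotators label a binary dimension.
    """
    n_raters = len(ratings[0])
    n_items = len(ratings)
    per_item_agreement = []          # P_i for each item
    totals = Counter()               # marginal counts per category
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # P_i = (sum_j n_ij^2 - n) / (n * (n - 1))
        agree = sum(c * c for c in counts.values())
        per_item_agreement.append((agree - n_raters) / (n_raters * (n_raters - 1)))
    p_bar = sum(per_item_agreement) / n_items
    # Expected agreement from the marginal category proportions
    p_e = sum((totals[c] / (n_items * n_raters)) ** 2 for c in categories)
    return (p_bar - p_e) / (1 - p_e)

# Toy example: 3 annotators labelling the Misogyny dimension on 4 comments
labels = [["yes", "yes", "yes"], ["no", "no", "no"],
          ["yes", "no", "no"], ["no", "no", "no"]]
print(f"kappa = {fleiss_kappa(labels):.2f}")
```

A negative value, like the Context row in Table 3, simply means observed agreement fell below chance-level agreement for that dimension.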