Exploring YouTube Comments Reacting to Femicide News in Italian

Chiara Ferrando1,*,†, Marco Madeddu1,*,†, Beatrice Antola2, Sveva Silvia Pasini3, Giulia Telari3, Mirko Lai4 and Viviana Patti1

1 Università di Torino, Italy
2 Università di Padova, Italy
3 Università di Pavia, Italy
4 Università del Piemonte Orientale, Italy

Abstract
In recent years, Gender Based Violence (GBV) has become an important issue in modern society and a central topic in different research areas due to its alarming spread. Several Natural Language Processing (NLP) studies concerning Hate Speech directed against women have focused on misogynistic behaviours, slurs or incel communities. The main contribution of our work is the creation of the first dataset of social media comments reacting to GBV, in particular to a femicide event. Our dataset, named GBV-Maltesi, contains 2,934 YouTube comments annotated following a new schema that we developed in order to study GBV and misogyny with an intersectional approach. During the experimental phase, we trained models on different corpora for binary misogyny detection and found that datasets mostly containing explicit expressions of misogyny pose an easier challenge than the more implicit forms of misogyny contained in GBV-Maltesi.

Warning: This paper contains examples of offensive content.

Keywords
Hate Speech, Misogyny Detection, Femicide, Social media, News, Responsibility framing

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
* Corresponding authors.
† These authors contributed equally.
chiara.ferrando@unito.it (C. Ferrando); marco.madeddu@unito.it (M. Madeddu); beatrice.antola@studenti.unipd.it (B. Antola); svevasilvia.pasini01@universitadipavia.it (S. S. Pasini); giulia.telari01@universitadipavia.it (G. Telari); mirko.lai@uniupo.it (M. Lai); viviana.patti@unito.it (V. Patti)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Nowadays, the term Gender Based Violence (GBV) is used to identify all forms of abuse based on gender hatred and sexist discrimination [1]. Scholars in social science have described as "rape culture" a society that normalizes sexist behaviours: from more common occurrences like victim blaming, slut shaming and the gender pay gap to the apex of violence, femicide [2]. While general violent crimes have decreased over time, GBV has not, alarming various bodies in modern society1. A report from the EU Commission2 states that 31%, 5% and 43% of European women suffered respectively from physical, sexual and psychological violence. Regarding the Internet sphere, a survey found that 73% of women journalists experienced online violence (threats, belittling, shaming, ...) [3]. These statistics become even more alarming when we consider studies that show the correlation between misogynistic online posts and GBV [4].

Like other countries, Italy is affected by GBV, with the national observatory managed by the "Non Una di Meno" association reporting 117 femicides in 2022, 120 in 2023 and more than 40 until June 20243.

1 https://www.interno.gov.it/it/stampa-e-comunicazione/dati-e-statistiche/omicidi-volontari-e-violenza-genere
2 https://commission.europa.eu/strategy-and-policy/policies/justice-and-fundamental-rights/gender-equality/gender-based-violence/what-gender-based-violence_en
3 https://osservatorionazionale.nonunadimeno.net/anno/

Several studies about Hate Speech (HS) directed towards women focus on developing taxonomies [5] rather than investigating low-resource subjects in computational linguistics like GBV. These works often gather corpora by keyword search of gender slurs [6], by retrieving comments left in misogynistic spaces like incel blogs [5, 7] or by considering messages directed towards popular women figures highly debated on social media [8].

As GBV is a broad topic, we want to clarify that we focus on GBV in Western societies, particularly in Italy. The main goal of this project is to show the current perception of femicides expressed through comments on social media, focusing on the specific case of Carol Maltesi. We chose this femicide because the victim was a sex worker, meaning that she presented an intersectional trait, and because it was a popular case in the media, enabling us to select enough material for the study. Further, we want to highlight how the socio-demographic characteristics of the victims determine the way they are described and how this influences the perception of the news. For instance, victims' features such as age, job, origin, skin color, nationality and religion carry different weight and determine the lesser or greater spread of the news [9].

To overcome the cited issues in the current literature, in this research we considered the phenomenon by focusing on users' reactions on social media to news about femicides. We collected YouTube comments in response to videos about a specific case. In order to overcome the constraints of traditional sentiment analysis schemas, we annotated the data following a new semantic grid that can be used as a standard for comments regarding GBV.

In the experimental phase of this work, we created models based on different Italian misogyny datasets (including ours). The goal of these experiments is to analyze the different features of these corpora and which forms of misogyny are harder to detect. We performed both a quantitative and a qualitative analysis of the results.

In the next sections, we describe related work on hate speech and misogyny detection (Section 2), the annotation scheme and both a quantitative and qualitative analysis of the dataset (Section 3), and the results obtained in our experiments (Section 4). Lastly, we present some conclusions and delineate possible future developments (Section 5).

2. Related Work

In recent times, the creation and dissemination of hate speech have become increasingly pervasive on online platforms, making social media a fertile ground for hateful discussions [10]. The escalation of offensive and abusive language, understood as content that discriminates against a person or group on the basis of specific characteristics such as ethnicity, gender, sexual orientation, and more, has aroused considerable interest in various fields. In fact, over the last decade, a large number of computational methods involving NLP and Machine Learning have been proposed for automatic online hate speech detection [11, 12]. Most prior works have mainly treated hate speech detection as a classification task, distinguishing between hate and non-hate speech. Hate speech takes on different nuances depending on the target groups at which it is directed, i.e. depending on the specific features that the target group has in common. Moreover, in some cases, these traits may intersect with each other, leading to different degrees of discrimination. This concept takes the name of intersectionality [13].

Among abusive languages, misogyny, considered as a specific offensive language against women, has become a contemporary research topic [14]. In the automatic hate speech detection field, the Automatic Misogyny Identification (AMI) [15] series of shared tasks launched in EVALITA [6] and the SemEval-2019 HatEval challenge [16] have produced evaluation frameworks to identify misogynous tweets in English, Italian and Spanish [17].

Misogyny has become a pervasive phenomenon, widespread in very different spheres and expressed in both explicit and implicit forms [5, 18]. For this reason, even in online conversations about a dramatic act such as femicide, it is possible to find examples of veiled or explicit hostility towards the victims. The femicide phenomenon has been studied from different points of view. Several studies focused on GBV representation in Italian media [19, 20]. In 2020, Mandolini focused on the journalistic narratives of femicide in newspapers by means of a qualitative discourse analysis of two specific case studies [21]. The researcher attempted to describe changes in attitudes in the portrayal of femicide, focusing on discursive strategies that (directly or indirectly) blame the victim and implicitly excuse the perpetrator, referring to gender stereotypes and romantic love rhetoric.

Other studies focused on responsibility framing in femicide news, conducting an experiment where annotators rated excerpts from local newspapers on how much responsibility was given to the perpetrator [22]. As far as we know, there is only one line of work in NLP on GBV [23, 24, 25], which focuses on readers' perception of femicide news headlines and analyses the perception of responsibility attributed to victim and perpetrator; to our knowledge, there is no other study analysing social media reactions to GBV cases.

3. Dataset

3.1. Corpus Background

In a preliminary phase of our work, we conducted research on the femicide case of Sara Di Pietrantonio4, a 22-year-old white Italian student from a wealthy family, murdered by her ex-boyfriend in May 2016 [21]. In this preliminary research we set out to develop a corpus by collecting Twitter users' comments on femicide news published by online newspapers5. We created an annotation scheme for the corpus consisting of two layers: the first focused on the dimensions of sentiment analysis and was composed of three subtasks (subjectivity, polarity and irony), relevant for the detection of sentiment in social media [26]; the second focused on hate speech detection, including labels for misogyny, aggressiveness and its target. For more details on the annotation scheme and corpus description, see Appendix A.

4 https://www.agi.it/cronaca/news/2019-09-11/sara_di_pietrantonio_processo_tappe-6170806/
5 The dataset is available at https://github.com/madeddumarco/GBV-Maltesi

Observing the results of the preliminary study, we discovered how the victim's characteristics influence the way newspapers present her femicide and users talk about it on social media. In fact, analyzing Di Pietrantonio's case, as she was a young, white, wealthy Italian student, we found very few examples of misogyny and, in most cases, the aggressiveness was directed against the perpetrator. Furthermore, the scheme was not considered sufficiently suitable for bringing out important elements of femicide cases: the annotators reported difficulties with the scheme, which proved deficient and too simplistic to capture the complex features of femicide events. To solve these issues, we decided to direct our efforts towards another case study in which the victim exhibits intersectional traits, which we assume may lead to more misogynistic content. In addition, we developed a new schema and guidelines to obtain more accurate annotations specifically related to the femicide domain.

3.2. Data Collection

In this section we describe the new dataset and the methodology used to build it.

As mentioned above, we focused our research on the femicide of Carol Maltesi6, a 26-year-old white Italian woman, mother and online sex worker, who was brutally murdered in January 2022 by her ex-partner, Davide Fontana, a 44-year-old white Italian bank employee.

With the aim of collecting users' responses to the femicide, we chose to collect comments using the YouTube Data API, as it is freely available and allows us to easily access comments on specific news. The process of obtaining the data followed several steps: first, we selected the 31 most popular YouTube videos based on number of views and comments. We chose videos about the Maltesi femicide from different types of sources: national (mainly the Italian broadcaster RAI) and local news. The selection of videos is diachronic, spanning from March 2022 to June 2023; this was done because the various media channels covered the story as it evolved, starting from the discovery of the nameless body and ending with the sentence given to the perpetrator. Afterwards, we collected the comments from all the selected videos. Due to the API policy, we were restricted to collecting only first-level comments and at most the 5 oldest responses to them. In total, we retrieved 3,821 comments.

3.3. Annotation Scheme

From the previous experience with the Di Pietrantonio corpus, we concluded that a generic sentiment analysis schema was too rigid to capture such a complex phenomenon. We created an annotation scheme and a new online platform to facilitate the raters' work. We involved 5 annotators, 4 of whom self-identified as women and 1 as a man, all interested in the topic and mostly coming from a humanistic background. They were all students and voluntarily participated in the project. The annotation guidelines were decided with the annotators after a pilot study and a subsequent group discussion where the raters pointed out the main faults of the schema. Each annotator analyzed all the comments according to the following guidelines:

• Non classifiable: the comment cannot be analysed because it is not written in Italian, consists only of emojis, is not comprehensible or is not relevant to the topic (any comment that was marked as NC by at least 1 annotator was removed from the corpus);
• Empathy: whether the comment contains expressions of empathy in support of the victim, her family or the event in general (i.e., condolences);
• Misogyny: whether the comment contains discriminatory expressions against women, including blaming, objectifying, discriminatory and sexist practices used towards them and their life choices. If misogyny is present, we asked annotators to indicate its target (group or individual) based on [16]. Moreover, we asked them to specify whether the expressed misogyny contained intersectionality traits and to select from a list which other dimensions were involved: age, religion, job, nationality, skin color, class, sexual orientation, gender, physical condition, educational background, language and culture;
• Aggressiveness: whether there is aggressiveness in the comment and to whom it is directed (allowing multiple choices): victim, perpetrator, social network (family, friends, colleagues), media, rape culture;
• Responsibility: if there is explicit attribution of responsibility for the murder in the text, who is blamed (allowing multiple choices): victim, perpetrator, social network (family, friends, colleagues), media, rape culture;
• Humor: whether the text conveys humorous content through irony, sarcasm, word games or hyperbole;
• Macabre: whether there are macabre aspects detailing how the victim was killed;
• Context: whether the context was helpful to better understand the meaning of the comment;
• Notes: free space for suggestions, observations or doubts.

3.4. Dataset Analysis

The dataset, GBV-Maltesi7, is composed of 2,934 comments annotated on all categories by all annotators.
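As described in Section 3.4, the per-dimension gold labels of GBV-Maltesi are obtained by majority voting over the 5 raters. A minimal sketch of that aggregation step, where the dimension name and vote values are purely illustrative, not the dataset's actual field names:

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by the largest number of raters.
    With 5 raters, a binary (yes/no) dimension can never tie."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical example: five raters judging one comment on one dimension.
votes = {"misogyny": ["yes", "no", "yes", "yes", "no"]}
gold = {dim: majority_vote(v) for dim, v in votes.items()}
```

Applying the same reduction to every comment and every dimension yields one aggregated label set per comment.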
6 https://www.agi.it/cronaca/news/2024-02-21/omicidio-maltesi-condannato-ergastolo-ex-davide-fontana-25397937/
7 https://github.com/madeddumarco/GBV-Maltesi

We aggregated the dimensions through majority voting. As our schema is composed of many different labels, we focus only on the dimensions that we consider the most relevant; all statistics can be found in Appendix C.

[Figure 1: Histograms for distributions of relevant labels. (a) Distribution of the misogyny label and its subcategories; (b) distribution of the aggressiveness label; (c) distribution of the responsibility label.]

Starting from misogyny, in Appendix C and in Figure 1a, we can see that 9.03% of cases are positive. This unbalance is typical of hate speech datasets [27], and we consider it surprisingly high if we take into account the tragic theme of GBV. It is very interesting that intersectionality represents over 50% of misogynous examples, indicating how the personal traits of the victim affect the perception of the commenting users. Unsurprisingly, as the victim was a sex worker, 'work' is almost always the category chosen by the annotators. The target of misogyny was mostly individual, confirming the findings of SemEval-2019 Task 5 [16]. The annotators explained to us that the misogyny target was a difficult category to annotate, as comments often used the victim as an example to offend the broader group of women and sex workers.

Aggressiveness is more present than misogyny in our dataset, with 24% positive examples, mostly directed towards the perpetrator. Responsibility follows a similar trend, with 32.89% positive examples, mostly directed towards the perpetrator. Unlike aggressiveness, we can see a significant amount of comments holding the victim responsible (6.55%).

In Appendix B, we report the inter-annotator agreement (IAA) scores for all dimensions. As our dataset is fully annotated by multiple people, the metric we chose is Fleiss' Kappa [28]. The metric has a possible range of [-1, 1], with 1 indicating perfect agreement, and any value of κ ≤ 0 indicating more disagreement between the annotators than expected by chance. We can see that most dimensions have a κ in the [0.2, 0.7] range, indicating variable levels of agreement depending on the label. The dimensions with the highest agreement, at 0.69, are empathy towards the event and aggressiveness towards the perpetrator. In fact, annotators explained to us that these two categories were the easiest phenomena to annotate, as they lacked ambiguity. On the other hand, we can see that agreement on aggressiveness towards the victim is much lower (0.28). In our discussions with the raters, it emerged that attacks towards the victim were harder to identify as they were more subtle, leading to disagreement among annotators.

4. Experiments

We conducted experiments to validate our resource and to gain more insight into the difficulty of the misogyny detection task. The goal of this analysis is to understand how the presence of different forms of misogyny (implicit and explicit) affects the evaluation of modern classification models. We consider as explicit misogyny discourses that intentionally spread hate towards women, mostly through slurs and other aggressive behaviors. Meanwhile, we intend implicit misogyny as more subtle and less conscious practices like victim blaming, slut shaming, de-responsibilization of the perpetrator and more. In addition to our corpus, we used 3 other Italian datasets on the topic: AMI [6], PejorativITy [29] and Inters8 [8]. The former two were mainly gathered by keyword search of sexist terms8; meanwhile, Inters8 and our corpus focus on more implicit forms of sexist hate directed towards a specific woman (i.e., Silvia Romano and Carol Maltesi). Details about all the datasets can be found in Appendix D.

8 AMI was created following a hybrid approach, also selecting comments from known misogynistic accounts and responses directed to feminist public figures. We conducted a qualitative analysis and found that the misogyny it contains is almost always explicit and dependent on slurs. This led us to place it in the keyword category.

To explore the potential bias of models towards explicit forms of misogyny, we created 4 different models for binary misogyny detection: BERT-Maltesi, BERT-AMI, BERT-PejorativITy and BERT-Inters8, respectively trained on the GBV-Maltesi, AMI, PejorativITy and Inters8 datasets. The models were trained only on the comments and were not given any other extra information, such as video transcriptions. The only label we analyzed was misogyny, and all datasets were divided into training, validation and test sets following a 60%, 20% and 20% split. We used the existing splits when provided in the papers9; otherwise, we randomly created them.

                     Maltesi Test        Inters8 Test        PejorativITy Test   AMI Test
Model                F1 Macro  F1 1-Lab  F1 Macro  F1 1-Lab  F1 Macro  F1 1-Lab  F1 Macro  F1 1-Lab
BERT-Maltesi         0.611     0.351     0.512     0.174     0.571     0.436     0.633     0.611
BERT-Inters8         0.377     0.169     0.621     0.49      0.55      0.538     0.659     0.725
BERT-PejorativITy    0.528     0.226     0.483     0.128     0.67      0.604     0.675     0.732
BERT-AMI             0.494     0.155     0.59      0.299     0.654     0.601     0.877     0.886
Average              0.502     0.225     0.551     0.273     0.611     0.545     0.711     0.738

Table 1
Results for binary misogyny detection on all datasets
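Table 1 reports two figures per test set: the macro-averaged F1 and the F1 of the positive (misogynous) class, labeled "F1 1-Label". A minimal sketch of how both can be computed for binary gold/predicted labels (not the paper's actual evaluation script, which is unspecified):

```python
def f1_class(gold, pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # convention: no true positives yields F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def f1_macro(gold, pred):
    """Unweighted mean of the per-class F1 over the two classes."""
    return (f1_class(gold, pred, 1) + f1_class(gold, pred, 0)) / 2
```

Because the corpora are heavily imbalanced towards the negative class, the positive-class F1 is the more telling of the two numbers, which is why the cross-dataset gap is most visible in that column.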
All models are binary classifiers created by fine-tuning BERT [30]; in particular, we used the Italian version AlBERTo [31]. Due to the imbalanced nature of most corpora, the models were trained with a focal loss [32], setting the hyperparameter γ = 2. Models were trained for 5 epochs but, to avoid overfitting, we implemented an early stopping function which ends training after 2 epochs that report an increase in validation loss. We tested all models on their own test set and on the other 3 corpora.

We want to underline that our goal is not to compare the performance of the different models with each other, as they were trained on sets of different sizes and with different numbers of positive examples. Rather, we focus on how some test sets are more difficult than others, which helps us understand the current challenges in misogyny detection.

In Table 1, we report the positive-label and the macro-average F1 scores of all experiments. In addition, we also calculated the average scores for each test set. The best scores achieved on a given test set are in bold; we underlined the best scores for cross-dataset testing. As expected, we can observe that all models achieved the highest score on their own test set. Meanwhile, for cross-dataset testing, the models that tend to perform best are BERT-PejorativITy and BERT-AMI. We suspect that this is caused by the dataset composition, as their training sets present more positive examples than the others.

Interestingly, we can observe that certain models recorded higher scores on test sets other than their own. This mostly happens for BERT-Maltesi and BERT-Inters8, which record higher scores on AMI and PejorativITy. Even BERT-PejorativITy increases its scores when tested on AMI. Observing the average scores for each test set, we can see that Maltesi and Inters8 are the most challenging datasets. This is especially true when observing the average F1 score on the positive label, which lies in the [0.2, 0.3] range, compared to much higher scores for PejorativITy and especially AMI. These trends indicate that misogyny detection is a much harder task on datasets that contain less explicit forms of hate (e.g., not gathered by keyword search of sexist slurs).

In addition, we conducted a qualitative analysis of the errors of the various classifiers. We found that, for each test set, most classifiers misclassified the same type of examples. Models almost never recognized texts containing victim blaming and slut shaming in the GBV-Maltesi dataset. The errors made on Inters8 mostly coincide with examples that are also racist and Islamophobic. The cases which proved more difficult in PejorativITy and AMI contain less explicit animal epithets like "cavalla" and nouns that refer to sex workers in a less explicit way, like "cortigiana".

9 PejorativITy provides a training and test split but, analyzing the code, we found that the test set was used as a validation set, so we decided to create a new one.

5. Conclusion and Future Works

In this paper, we presented GBV-Maltesi, the first dataset regarding social reactions to GBV, in particular to a femicide case. The topic was chosen to shed light on the importance of having misogyny corpora that include forms of sexism that are more implicit and complicated to detect compared to the existing ones that focus on slurs and offensive terms. We also focused on intersectionality aspects to better explore online hate. GBV-Maltesi is composed of 2,934 comments, all annotated by 5 annotators, and it is available at https://github.com/madeddumarco/GBV-Maltesi. In order to overcome the limitations of generic semantic schemas, the corpus has been annotated following a new schema specifically created for cases of GBV. In the experimental phase of our work, we created different binary misogyny classifiers and tested them in a cross-dataset way. We found that datasets gathered through keyword collection are easier benchmarks, as the models showed bias towards slurs and failed to identify more implicit cases of misogyny. This research on online discourse about GBV is not meant to be exhaustive, as several questions are still open.

As future works, we intend to focus on how different framings of the news can cause different online reactions, analyzing the differences between video transcripts of femicide news and the comments collected, in terms of words used, implicit references, attributions of guilt and descriptions of the people involved in the story. We also intend to gather more annotated corpora regarding femicides to explore how other characteristics of the victim (e.g., origin or skin color) and the time of the murder differently influence online reactions. In this regard, we intend to explore the question by investigating whether and how the discourse on misogyny changes depending on whether it is addressed to living or dead women (i.e., the Giulia Cecchettin femicide and the abusive discourse against her sister, Elena Cecchettin). Lastly, we would like to extend our research by following an intersectional approach, considering all the dimensions and characteristics that make up the identity of both victim and perpetrator. To conclude, we strongly advocate the importance of writing the news correctly, as this has deep consequences on readers' perception and the way they talk about it.

Ethics Statement

The dataset was created in accordance with YouTube's Terms of Service. Considering the large number of users writing the comments collected in the dataset, it was not possible to explicitly ask for their consent. No sensitive data are provided in the dataset, and users' mentions have been anonymized to protect their privacy.

All the annotators involved in this research were free to participate without pressure or obligation. From the initial stages, they were aware of being free to leave at any time without negative consequences. During the annotation phase, we met several times to make sure that the topic did not disturb them psychologically or emotionally. We told them to take their time, doing the annotation only when they felt like it, and to contact us for support. This approach continued throughout all the research stages.

Acknowledgements

We would like to thank Chiara Zanchi for discussing with us the direction of this work in its early stages. In addition, we would like to thank Sara Gemelli and Andrea Marra for their contribution to the creation of the annotation scheme and guidelines. Also, we reiterate our gratitude to the annotators who professionally worked on a difficult topic like GBV. This work was also partially supported by the "HARMONIA" project - M4-C2, I1.3 Partenariati Estesi - Cascade Call - FAIR - CUP C63C22000770006 - PE PE0000013 under the NextGenerationEU programme.

References

[1] M. L. Bonura, Che genere di violenza: conoscere e affrontare la violenza contro le donne, Edizioni Centro Studi Erickson, 2018.
[2] C. Vagnoli, Maledetta sfortuna, Rizzoli, 2021.
[3] J. Posetti, K. Bontcheva, D. Maynard, N. Aboulez, A. Lu, B. Gardiner, S. Torsner, J. Harrison, G. Daniels, F. Chawana, O. Douglas, A. Willis, F. Martin, L. Barcia, A. Jehangir, J. Price, G. Gober, J. Adams, N. Shabbir, The Chilling: A global study of online violence against women journalists, 2022.
[4] K. R. Blake, S. M. O'Dean, J. Lian, T. F. Denson, Misogynistic tweets correlate with violence against women, Psychological Science 32 (2021) 315–325.
[5] E. Guest, B. Vidgen, A. Mittos, N. Sastry, G. Tyson, H. Margetts, An expert annotated dataset for the detection of online misogyny, in: P. Merlo, J. Tiedemann, R. Tsarfaty (Eds.), Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, Online, 2021, pp. 1336–1350. URL: https://aclanthology.org/2021.eacl-main.114. doi:10.18653/v1/2021.eacl-main.114.
[6] E. Fersini, D. Nozza, P. Rosso, Overview of the EVALITA 2018 task on automatic misogyny identification (AMI), in: EVALITA@CLiC-it, 2018. URL: https://api.semanticscholar.org/CorpusID:56483156.
[7] S. Gemelli, G. Minnema, Manosphrames: exploring an Italian incel community through the lens of NLP and frame semantics, in: P. Sommerauer, T. Caselli, M. Nissim, L. Remijnse, P. Vossen (Eds.), Proceedings of the First Workshop on Reference, Framing, and Perspective @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024, pp. 28–39. URL: https://aclanthology.org/2024.rfp-1.4.
[8] I. Spada, M. Lai, V. Patti, Inters8: A corpus to study misogyny and intersectionality on Twitter, in: CLiC-it, 2023.
[9] P. Lalli, L'amore non uccide. Femminicidio e discorso pubblico: cronaca, tribunali, politiche, Il Mulino, 2020.
[10] A. Tontodimamma, E. Nissi, A. Sarra, L. Fontanella, Thirty years of research into hate speech: topics of interest and their evolution, Scientometrics 126 (2021) 157–179.
[11] A. Ollagnier, E. Cabrio, S. Villata, Unsupervised fine-grained hate speech target community detection and characterisation on social media, Social Network Analysis and Mining 13 (2023) 58.
[12] F. Poletto, V. Basile, M. Sanguinetti, C. Bosco, V. Patti, Resources and benchmark corpora for hate speech detection: a systematic review, Lang. Resour. Evaluation 55 (2021) 477–523. URL: https://doi.org/10.1007/s10579-020-09502-8. doi:10.1007/S10579-020-09502-8.
[13] K. W. Crenshaw, Mapping the margins: Intersectionality, identity politics, and violence against women of color, in: The public nature of private violence, Routledge, 2013, pp. 93–118.
[14] K. Manne, Down Girl: The Logic of Misogyny, Oxford University Press, 2018. URL: https://books.google.it/books?id=LqPoAQAACAAJ.
[15] E. W. Pamungkas, A. T. Cignarella, V. Basile, V. Patti, et al., Automatic identification of misogyny in English and Italian tweets at EVALITA 2018 with a multilingual hate lexicon, in: CEUR Workshop Proceedings, volume 2263, CEUR-WS, 2018, pp. 1–6.
[16] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. Rangel Pardo, P. Rosso, M. Sanguinetti, SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter, in: J. May, E. Shutova, A. Herbelot, X. Zhu, M. Apidianaki, S. M. Mohammad (Eds.), Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 54–63. URL: https://aclanthology.org/S19-2007. doi:10.18653/v1/S19-2007.
[17] E. W. Pamungkas, V. Basile, V. Patti, Misogyny detection in Twitter: a multilingual and cross-domain study, Inf. Process. Manag. 57 (2020) 102360. URL: https://doi.org/10.1016/j.ipm.2020.102360. doi:10.1016/J.IPM.2020.102360.
[18] P. Zeinert, N. Inie, L. Derczynski, Annotating online misogyny, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 3181–3197.
[19] F. Formato, Gender, discourse and ideology in Italian, Springer, 2019.
[20] L. Busso, C. R. Combei, O. Tordini, Narrating gender violence: a corpus-based study on the representation of gender-based violence in Italian media, in: Language, Gender and Hate Speech: A Multidisciplinary Approach, 2020.
[21] N. Mandolini, Femminicidio, prima e dopo. Un'analisi qualitativa della copertura giornalistica dei casi Stefania Noce (2011) e Sara Di Pietrantonio (2016), Problemi dell'informazione 45 (2020) 247–277.
[22] E. Pinelli, C. Zanchi, Gender-Based Violence in Italian Local Newspapers: How Argument Structure Constructions Can Diminish a Perpetrator's Responsibility, 2021, pp. 117–143. doi:10.1007/978-3-030-70091-1_6.
[23] G. Minnema, S. Gemelli, C. Zanchi, V. Patti, T. Caselli, M. Nissim, Frame semantics for social NLP in Italian: Analyzing responsibility framing in femicide news reports, in: E. Fersini, M. Passarotti, V. Patti (Eds.), Proceedings of the Eighth Italian Conference on Computational Linguistics, CLiC-it 2021, Milan, Italy, January 26-28, 2022, volume 3033 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: https://ceur-ws.org/Vol-3033/paper32.pdf.
[24] G. Minnema, S. Gemelli, C. Zanchi, T. Caselli, M. Nissim, Dead or murdered? Predicting responsibility perception in femicide news reports, in: Y. He, H. Ji, S. Li, Y. Liu, C.-H. Chang (Eds.), Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online only, 2022, pp. 1078–1090. URL: https://aclanthology.org/2022.aacl-main.79.
[25] G. Minnema, H. Lai, B. Muscato, M. Nissim, Responsibility perspective transfer for Italian femicide news, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 7907–7918. URL: https://aclanthology.org/2023.findings-acl.501. doi:10.18653/v1/2023.findings-acl.501.
[26] V. Basile, N. Novielli, D. Croce, F. Barbieri, M. Nissim, V. Patti, Sentiment polarity classification at EVALITA: Lessons learned and open challenges, IEEE Transactions on Affective Computing 12 (2021) 466–478.
[27] B. Vidgen, L. Derczynski, Directions in abusive language training data, a systematic review: Garbage in, garbage out, PLoS ONE 15 (2020) e0243300.
[28] J. Fleiss, Measuring nominal scale agreement among many raters, Psychological Bulletin 76 (1971) 378–. doi:10.1037/h0031619.
[29] A. Muti, F. Ruggeri, C. Toraman, L. Musetti, S. Algherini, S. Ronchi, G. Saretto, C. Zapparoli, A. Barrón-Cedeño, PejorativITy: Disambiguating pejorative epithets to improve misogyny detection in Italian tweets, arXiv preprint arXiv:2404.02681 (2024).
[30] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
[31] M. Polignano, P. Basile, M. de Gemmis, G. Semeraro, V. Basile, AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets, in: Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), volume 2481, CEUR, 2019. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85074851349&partnerID=40&md5=7abed946e06f76b3825ae5e294ffac14.
[32] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
[33] F. Barbieri, V. Basile, D. Croce, M. Nissim, N. Novielli, V. Patti, et al., Overview of the EVALITA 2016 sentiment polarity classification task, in: CEUR Workshop Proceedings, volume 1749, CEUR-WS, 2016.

Dimension            Yes %     No %
Subjectivity         70.48%    29.52%
Misogyny             3.76%     96.24%
Polarity-Negative    51.89%    48.11%
Polarity-Positive    4.93%     95.07%
Aggressiveness       24.02%    75.98%
Irony                7.09%     92.91%
Context              81.48%    18.52%

Table 2
Distribution of the dimensions for the DiPietrantonio Dataset

Dimension                    Fleiss' kappa
Misogyny                     0.56
Target                       0.48
Intersectionality            0.32
Aggressiveness               0.53
Agg. Perpetrator             0.69
Agg. Victim                  0.28
Agg. Social Network          0.23
Agg. Media                   0.40
Agg. Rape Culture            0.10
Responsibility               0.21
Resp. Perpetrator            0.25
Resp. Victim                 0.55
Resp. Social Network         0.13
Resp. Media                  0.23
Resp. Rape Culture           0.19
Empathy towards the event    0.69
Humor                        0.45
Macabre                      0.49
Context                      -0.11

Table 3
Agreement of the Maltesi Dataset

Dimension            Yes %     No %
Misogyny             9.03%     90.97%
Intersectionality    4.63%     95.36%
Aggressiveness       24%       76%
Agg. Perpetrator     19.19%    80.81%
Agg. Victim          1.23%     98.77%
Agg. Social Network  0.88%     99.11%
Agg. Media           2.73%     97.27%
Agg.

A. Details about the Di Pietrantonio Dataset

The dataset GBV-DiPietrantonio is composed of 691
Rape Culture 0.41% 99.59% tweets fully annotated by 3 annotators, 2 of which self- Responsibility 32.89% 67.11% identified as women and 1 as a man. The tweets were Resp. Perpetrator 22.09% 77.91% collected by gathering responses to news which covered Resp. Victim 6.55% 93.45% the news of Di Pietrantonio femicide. The annotation Resp. Social Network 2.11% 97.89% scheme is composed of the slightly modified SENTIPOLC Resp. Media 99.01% 0.99% scheme[33] which consists of Subjectivity, Polarity (Posi- Resp. Rape Culture 4.06% 95.94% tive, Negative) and Irony. In addition the semantic grid Empathy towards the event 28.25% 71.75% Humor 3.14% 96.86% contained Misogyny, Aggressiveness and Target of Ag- Macabre 3.27% 96.72% gressiveness (towards Perpetrator, Victim, Other), Con- Context 97.51% 2.49% text, and Notes. The statistics of the gold standard for the Di Pietranto- Table 4 nio dataset are in Table 2. Distribution of the binary dimensions of the Maltesi Dataset B. Agreement of the Maltesi C. Distributions of the Maltesi Dataset Dataset Table 3 contains the agreement values calcolated with Table 4 contains the distribution of the binary labels in Fleiss’ Kappa for all dimensions in the Maltesi dataset. the Maltesi dataset. Table 5 contains the type of inter- sectionality and table 6 contains the type of misogyny target. Dimension Percentage % Work 96.32% Age 0.73% Work and Education 0.73% Work and Gender 2.20% Table 5 Distribution of the values for the types of intersectionality selected Dimension Percentage % Individual 63.40% Grooup 36.60% Table 6 Distribution of the values for the types of misogyny target selected D. Distributions of the Misogyny Dataset Table 7 contains the details of the other existing misogyny datasets used in the experimental phase. Dataset Topic Num Examples Num Pos. Pos. 
% Intersectional Hate focusing on Inters8 Islamophobia in the case of hate towards 1,500 288 19.2% Silvia Romano Misogynistic slurs, attacks towards important figures who expressed support AMI 5,000 2,340 46.8% for women rights and posts from misogynistic account Words that can be used as misogynistic Pejorativity pejoratives in online discussion (e.g. 1,200 397 33% Cavalla, cagna,...) Table 7 Distribution of the Italian misogyny Dataset
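Appendix B reports inter-annotator agreement with Fleiss' kappa [28]. As an illustrative sketch only (the function, label names, and toy data below are ours, not the authors' released code), kappa values like those in Table 3 can be computed per dimension from the raw annotations along these lines:

```python
from collections import Counter

def fleiss_kappa(ratings, categories=("yes", "no")):
    """Fleiss' kappa for N items, each labelled by the same number of raters.

    `ratings` is a list of per-item label lists, e.g. one entry of three
    labels per comment when three annotators label a binary dimension.
    """
    n_raters = len(ratings[0])
    n_items = len(ratings)
    per_item_agreement = []          # P_i for each item
    totals = Counter()               # marginal counts per category
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # P_i = (sum_j n_ij^2 - n) / (n * (n - 1))
        agree = sum(c * c for c in counts.values())
        per_item_agreement.append((agree - n_raters) / (n_raters * (n_raters - 1)))
    p_bar = sum(per_item_agreement) / n_items
    # Expected agreement from the marginal category proportions
    p_e = sum((totals[c] / (n_items * n_raters)) ** 2 for c in categories)
    return (p_bar - p_e) / (1 - p_e)

# Toy example: 3 annotators labelling the Misogyny dimension on 4 comments
labels = [["yes", "yes", "yes"], ["no", "no", "no"],
          ["yes", "no", "no"], ["no", "no", "no"]]
print(f"kappa = {fleiss_kappa(labels):.2f}")
```

A negative value, like the Context row in Table 3, simply means observed agreement fell below chance-level agreement for that dimension.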