<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Women's Professions and Targeted Misogyny Online</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Cascione</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aldo Cerulli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marta Marchiori Manerba</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucia C. Passaro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Filologia, Letteratura e Linguistica, Università di Pisa</institution>
          ,
          <addr-line>Via Santa Maria 36, Pisa, 56126</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica, Università di Pisa</institution>
          ,
          <addr-line>Largo B. Pontecorvo 3, Pisa, 56127</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>With the increasing popularity of social media platforms, the dissemination of misogynistic content has become more prevalent and challenging to address. In this paper, we investigate the phenomenon of online misogyny on Twitter through the lens of hurtfulness, qualifying its different manifestations in English tweets considering the profession of the targets of misogynistic attacks. By leveraging manual annotation and a BERTweet model trained for fine-grained misogyny identification, we find that specific types of misogynistic speech are more intensely directed towards particular professions. For example, derailing discourse predominantly targets authors and cultural figures, while dominance-oriented speech and sexual harassment are mainly directed at politicians and athletes. Additionally, we use the HurtLex lexicon and ItEM to assign hurtfulness scores to tweets based on different hate speech categories. Our analysis reveals that these scores align with the profession-based distribution of misogynistic speech, highlighting the targeted nature of such attacks.</p>
      </abstract>
      <kwd-group>
<kwd>Abusive Language</kwd>
        <kwd>Online Misogyny</kwd>
        <kwd>Hurtfulness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>
        Misogyny is a radical manifestation of sexism directed toward the female gender, which becomes the subject of hatred. Its effects are widespread and systematic, bearing severe consequences, both social and individual, such as verbal and physical violence, rape and femicide. Indeed, misogyny, prejudice, and contempt towards women continue to persist in various forms in our society. While overt acts of discrimination and sexism have received attention, it is crucial to acknowledge that misogyny often manifests in subtle and nuanced ways [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Moreover, with the increasing popularity of social media platforms, the dissemination of misogynistic content has become more prevalent and challenging to address [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        From a socio-historical perspective, women have faced numerous barriers that limited their access to certain professions, hindered their career progression, and subjected them to belittlement and offense related to their work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These gendered biases not only perpetuate inequality but also serve as breeding grounds for misogyny.
      </p>
      <p>In this paper, we focus on automated misogyny detection, specifically investigating whether different professional roles trigger varying degrees of hurtfulness across social media posts. By examining the correlation between the profession of offended women and the prevalence of misogynistic attitudes, we aim to shed light on the extent to which misogyny is perpetuated within specific professional domains.</p>
      <p>
        Fontanella et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] highlight how research focusing on automatic detection of misogyny tends to show weak connections with other conceptual areas addressing different aspects of the phenomenon. This finding suggests that current research has not yet adequately addressed the fine-grained manifestations of online misogynistic attacks. Our contribution conducts novel analyses to uncover and measure misogynistic attitudes within different professional fields. Specifically, we examine how different types of misogyny are distributed across various women’s professions and how the language used in misogynistic posts varies across them. To explore this relationship, we expand the English misogyny identification dataset introduced by Fersini et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], known as AMI, by incorporating the professions of the women targeted. By adding professional categories to AMI, we enable novel analyses on how misogynistic attacks against women differ based on their profession. Our research is driven by the following research questions:
      </p>
      <p>RQ1 How does misogyny distribute across professions? We analyze women’s professions according to the type of misogyny directed towards them.</p>
      <p>RQ2 How does the language used in misogynistic tweets vary across different professions? We investigate how specific hurtful expressions are directed at specific professions more frequently than others.</p>
      <p>To address our RQs, we proceed following the workflow depicted in Figure 1. We begin by utilizing a subset of the AMI dataset, which contains ground-truth annotations for misogyny. This subset is manually labeled with the professions of the victims of misogynistic attacks, as detailed in Section 3.2. We then employ a misogyny classifier to automatically annotate with various types of misogyny a novel collection, the Profession (PRF) dataset, which comprises 760 tweets labeled with professions. The final step involves combining the manually annotated AMI subset with the automatically annotated PRF dataset, resulting in the AMI-PRF dataset¹. This enriched dataset provides a resource that enables a thorough investigation of the phenomenon.</p>
      <p>[Figure 1: Workflow for building the AMI-PRF dataset: tweet filtering, manual annotation of profession, and automatic annotation of misogyny type. Profession distribution in the PRF dataset: politician 21.84%, artist 28.69%, athlete 31.05%, author 18.42%.]</p>
      <p>The remainder of this paper is organized as follows. Section 2 discusses previous works that are closely related to ours, while Section 3 details the enrichment of the AMI dataset with professional categories. Section 4 reports the experiments conducted to answer our RQs, whereas Section 5 outlines conclusions, limitations, and future directions of the work.</p>
      <p>¹ To protect the identities of the affected women, we chose to omit explicit references to profiles and original tweet IDs from the dataset. The dataset is accessible for research purposes by request from the authors.</p>
    </sec>
    <sec id="sec-rel">
      <title>2. Related Work</title>
      <p>
        In recent years, the field of NLP has witnessed a growing interest in detecting misogyny and sexist content on social media platforms. Various works have significantly contributed to this area by publicly introducing diverse datasets and evaluation tasks tailored for misogyny detection [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ]. Indeed, it is a pressing need to develop systems for detecting emotive [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] and offensive word lexicons for harassment research [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], as highlighted by Rezvan et al. [13]. Contributing to the field of sexism categorization, Parikh et al. [14] provide a large dataset for multi-label classification of sexism. Chiril et al. [15] explore the detection of sexist hate speech, examining the relationship between gender stereotype detection and sexism classification. Similarly, Felmlee et al. [16] investigate online aggression towards women on social media platforms, focusing on the strategic nature of sexist tweets and the reinforcement of stereotypes.
      </p>
      <p>Emphasizing the interaction and co-influence of social dimensions, like gender and profession, can assist in capturing complex social dynamics and informing the development of norms that promote equity and justice, as outlined by Hancock [17] and Dhamoon [18]. Specifically, previous social science research has examined hate discourse directed at specific groups of women, such as politicians and celebrities. For example, Silva-Paredes and Ibarra Herrera [19] offer a corpus-based analysis of gender-based aggression towards a Chilean right-wing female politician, while Phipps and Montgomery [20] and Ritchie [21] focus on forms of hate speech in media campaigns against Nancy Pelosi and Hillary Clinton, respectively. Specifically for tweets, Saluja and Thilaka [22] employ the Feminist Critical Discourse Theory to perform gender-specific inferences w.r.t. Twitter discourse concerning Indian political leaders. On the other hand, Ghafari [23] analyzes 2000 user-generated posts focusing on American celebrity Lena Dunham, examining manifestations of hate and stereotypes. To the best of our knowledge, this is the first data-driven work that examines the relationship between women’s professional categories and types of misogynistic attacks on online platforms.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Data Exploration and Enrichment</title>
      <p>In this section, we detail the construction of our novel AMI-PRF dataset.</p>
      <sec id="sec-3-1">
        <title>3.1. AMI Dataset</title>
        <p>
          We address the lack of misogynous data annotated w.r.t. victims’ professions by enriching the AMI dataset² [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The dataset includes a coarse-grained distinction between misogynistic and not-misogynistic tweets, as well as a fine-grained labeling for misogynistic tweets, categorizing them into five different types of misogynistic hate speech: derailing (to justify the abuse of women), discredit (general slurring), dominance (to assert men’s superiority), sexual harassment (sexual advances and violence) and stereotype (oversimplification and objectification).
        </p>
        <p>We enrich AMI by adding information about the professions of the victims. This enrichment is performed through retrieving from Wikidata³ professional figures that are subclasses of the person class. Our annotation of professions includes four categories, namely ‘artist’, ‘author’, ‘athlete’, and ‘politician (and activist)’. We focus on these professions as they are represented in the AMI dataset, based on the popular women referenced. Although the first two are both subclasses of creator, which is an immediate subclass of person, we keep them separate due to their different natures: the former encompasses visual and performing arts, the latter intellectual activities. On the other hand, we choose to group politicians and activists together to highlight their shared involvement in public social activities, even though they are not directly related according to the Wikidata taxonomy. As shown by Fig. 4 (Appendix A), each macro-profession initiates a potentially large set of nested sub-professions based on the Wikidata subclass of relationship.</p>
        <p>We leverage these professions to manually label AMI misogynistic tweets that actually refer to women. In order to produce a consistent labeling, we establish the following conventions: if the tweet refers to a famous woman, we choose the first (or unique) occupation among those appearing on her Wikidata page, tracing it back to the appropriate macro-category. This approach mitigates annotation inconsistencies by leveraging an established external resource for labeling. When such information is unavailable, we determine the professional category by examining relevant job details in the tweet content or on the profile page of the victim, if mentioned. For such cases, a collaborative approach was taken during group meetings to share general insights, ensuring that any disagreements were addressed through discussions and ultimately resolved through consensus. In the absence of clues regarding the profession, the tweet is simply labeled as ‘generic’.</p>
        <p>Finally, we point out that not all tweets in the AMI dataset have women as victims. In several cases, misogynist language is used to insult men, companies or political parties. Out of 5000 AMI tweets, we initially filtered out those that were not directed at women. Among the remaining tweets, 2187 were labelled as misogynistic. However, we were able to obtain professional categories for only a subset of 380 of these tweets, highlighting the need for additional data collection.</p>
        <p>² https://live.european-language-grid.eu/catalogue/corpus/7272 ³ https://www.wikidata.org/wiki/Wikidata:Main_Page</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. PRF Dataset</title>
        <p>To address the issue of having only a small number of tweets annotated for both misogyny and profession, we crawl additional tweets. From the most common expressions in the misogynistic tweets of AMI, we derive a list of misogynistic keywords. For each of our target professions, we choose five representative popular women, collecting tweets containing a reference to them in the form of a hashtag, mention and/or explicit name and surname. As a result, we extract 760 tweets labeled with professions, which have been posted before the beginning of February 2023: we refer to this collection as the Profession (PRF) dataset. Since these tweets are filtered using specific keywords and are directed at popular women, we consider them inherently misogynistic, as a woman is the primary target of hate speech.</p>
        <p>To identify the type of misogyny in PRF, we leverage BERTweet⁴, a transformer-based [24] model trained on the AMI multi-classification dataset. We opt for this model since it is pre-trained on Twitter, and it achieves state-of-the-art performance in Twitter sentiment analysis tasks [25]. Before training, the AMI tweets are preprocessed with a TweetNormalizer function⁵ which maps emojis into text strings and substitutes user mentions and web/URL links with @USER and HTTPURL placeholders. For model selection, we perform a stratified cross-validation with k = 5. We search for the best weight decay and learning rate in [1e-2, 1e-5] and [1e-5, 3e-5], respectively. For each configuration, we set 10 epochs, 500 warm-up steps and train/validation batch sizes of 16/8. The optimal performance is achieved with a learning rate of 3e-5 and a weight decay of 1e-2. Tab. 1 shows BERTweet performances for the multi-class misogyny detection task on the AMI test set, comprising 1000 tweets (460 misogynistic). For the multi-classification task, we focus only on misogynistic tweets. The evaluation metrics include Accuracy, as well as weighted and unweighted average Precision, Recall, and F1-score. We adopt this model to label our PRF dataset with types of misogyny.</p>
        <p>⁴ https://github.com/VinAIResearch/BERTweet</p>
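        <p>The normalization step can be sketched as follows. This is a minimal illustrative re-implementation using the same placeholder conventions, not the actual TweetNormalizer shipped with BERTweet; in particular, the emoji lookup table is a hypothetical stub for the full emoji-to-text mapping.</p>
        <p>
```python
import re

# Hypothetical stub for the emoji-to-text mapping used in preprocessing.
EMOJI_TABLE = {"\U0001F600": ":grinning_face:"}

def normalize_tweet(text):
    # Replace user mentions (@handle) with the @USER placeholder.
    text = re.sub(r"@\w+", "@USER", text)
    # Replace http(s) links with the HTTPURL placeholder.
    text = re.sub(r"https?://\S+", "HTTPURL", text)
    # Map known emojis to their textual form (simplified here).
    for emoji, name in EMOJI_TABLE.items():
        text = text.replace(emoji, name)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()

print(normalize_tweet("@jane_doe look at this https://t.co/abc \U0001F600"))
# → "@USER look at this HTTPURL :grinning_face:"
```
        </p>
        <p>The same function would be applied uniformly to training and inference inputs, so the model never sees raw handles or URLs.</p>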
      </sec>
      <sec id="sec-3-3">
        <title>AMI-PRF Dataset</title>
        <p>By combining the 380 tweets from AMI, having ground-truth information regarding the type of misogyny, and the PRF dataset, labeled with our trained model, we obtain 1140 tweets featuring both misogyny type and professions. Such dataset, named AMI-PRF, is leveraged to investigate the relation between misogyny and professions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Data Analyses</title>
      <sec id="sec-4-1">
        <title>4.1. Misogyny Type by Profession (RQ1)</title>
        <p>To address RQ1, we examine how different types of misogynistic speech are distributed across various professions in AMI-PRF. For each type of misogyny, we find how many tweets belonging to such class are directed towards a specific profession, and qualitatively compare the results in Fig. 2.</p>
        <p>Discussion: We observe distinct patterns in the usage of misogynistic speech across professions. Derailing discourse, which focuses on justifying the abuse of women and rejecting male responsibility, tends to primarily target authors compared to the other professions. This aligns with the nature of derailing speech, which seeks to rationalize the mistreatment of women and deflect male accountability; therefore, this kind of discourse can be expected to be commonly directed at public intellectuals or cultural figures. In contrast, dominance-oriented misogynistic discourse, aimed at asserting male superiority along with stereotypical negative speech, is predominantly directed at powerful figures such as politicians. This prevalence could be explained as an attempt to undermine the legitimacy and value of women holding relevant public roles. Sexual harassment is notably prevalent towards politicians and athletes, as expressions of intent to assert power over women through threats of violence.</p>
        <p>⁵ https://github.com/VinAIResearch/BERTweet/blob/master/TweetNormalizer.py</p>
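        <p>The per-profession counts behind this comparison can be computed with a few lines of standard-library Python; the records below are invented toy examples, not AMI-PRF data.</p>
        <p>
```python
from collections import Counter, defaultdict

# Toy (misogyny_type, profession) records, invented for illustration.
records = [
    ("derailing", "author"), ("derailing", "author"),
    ("dominance", "politician"), ("sexual_harassment", "politician"),
    ("sexual_harassment", "athlete"), ("stereotype", "politician"),
]

# For each type of misogyny, count how many tweets target each profession.
by_type = defaultdict(Counter)
for mis_type, profession in records:
    by_type[mis_type][profession] += 1

# Normalize counts into per-type percentages, as plotted in Fig. 2.
distribution = {
    mis_type: {prof: 100 * n / sum(counts.values())
               for prof, n in counts.items()}
    for mis_type, counts in by_type.items()
}
print(distribution["derailing"])  # → {'author': 100.0}
```
        </p>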
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Hurtfulness by Profession (RQ2)</title>
        <sec id="sec-4-2-1">
          <title>Hurtfulness Evaluation</title>
          <p>To address RQ2 – whether specific hurtful expressions target women in certain professions – we define a quantitative lexicon-based measure for assessing the hurtfulness of tweets.</p>
          <p>To define a hurtfulness measure for tweets, we leverage the HurtLex lexicon, which compiles offensive words and stereotyped expressions aimed at insulting and degrading marginalized individuals and groups [26]. HurtLex organizes words into 17 fine-grained categories, each identifying a specific target or form of offense.</p>
          <p>
            Inspired by the work of Nozza et al. [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], where a harmful sentence completions indicator is defined for generative language models, we employ a subset of 9 HurtLex categories for our purposes: animals, prostitution, professions, negative connotations, homosexuality, male genitalia, female genitalia, derogatory terms, and crime⁶. The hurtfulness score for a tweet w.r.t. one of the 9 categories could be computed as the ratio of HurtLex lemmas⁷ from that category to the total HurtLex lemmas from any category present in the tweet. However, an approach relying solely on the HurtLex lexicon would not provide a sufficiently comprehensive analysis, as HurtLex has low coverage of the vocabulary in the AMI-PRF dataset, with only 15.42% of the lemmas in a tweet occurring in HurtLex on average.
          </p>
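          <p>The lexicon-only ratio described above can be sketched as follows; the three-lemma mapping is an invented stand-in for the HurtLex lemma-to-category lexicon.</p>
          <p>
```python
# Invented stand-in for the HurtLex lemma-to-category mapping.
HURTLEX = {
    "snake": "animals",
    "criminal": "crime",
    "thief": "crime",
}

def category_ratio(lemmas, category):
    """Share of a tweet's HurtLex lemmas that belong to `category`."""
    hits = [l for l in lemmas if l in HURTLEX]
    if not hits:
        return 0.0
    in_category = sum(1 for l in hits if HURTLEX[l] == category)
    return in_category / len(hits)

tweet = ["that", "snake", "is", "a", "criminal"]  # lemmatized toy tweet
print(category_ratio(tweet, "crime"))  # → 0.5
```
        </p>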
        </sec>
          <p>
            To enhance our reference vocabulary, we leverage ItEM⁸, a methodology proposed by Passaro and Lenci [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. For each lemma in the HurtLex subset, we obtain its vectorial representation using ItEM and the Word2vec Twitter embeddings⁹, following Godin [27]. For each category, we compute a centroid embedding by averaging the vectors associated with each lemma in that category. This allows us to represent each category through a unique embedding. Tab. 2 reports the average cosine similarity between lemmas of a specific category and the respective centroid. Finally, we compute the cosine similarity between each word embedding in the Word2vec Twitter vocabulary and each centroid, thus creating a new lexicon featuring a coverage of 76.51% w.r.t. the AMI-PRF dataset.
          </p>
          <p>We leverage the similarity scores to define a hurtful emotive score for each tweet as follows: let t be a lemmatized tweet, l a lemma in t, c one of the 9 HurtLex categories, c̃ the centroid of category c, sim the cosine similarity function and V the set of vocabulary items, i.e. the words for which we have a Twitter embedding. For each l ∈ V, we define the hurt function as: hurt(l, c̃, th) = sim(l, c̃) if sim(l, c̃) ≥ th, and hurt(l, c̃, th) = 0 otherwise (1), where th designates a threshold in the [0, 1] range. In other words, the hurt function outputs the cosine similarity value between l and c’s centroid if such value is greater than or equal to th, while it outputs 0 if it is lower than th. Additionally, if l is not found in the vocabulary V, its hurt value is also considered 0.</p>
          <p>The Emotive score for a tweet t w.r.t. a category c and a threshold th is then computed as: Emotive(t, c, th) = (Σ_{l ∈ t} hurt(l, c̃, th)) / N (2), where N is the number of lemmas in t which occur in V. This allows us to obtain, for each tweet-category pair, a score in [0, 1], indicating the tweet’s hurtfulness tendency.</p>
          <p>Discussion: Fig. 3 provides a visual analysis of the results. The Emotive score is computed category-wise as the average of the scores for each tweet, after having standardized the values with a z-score approach. We keep a th of 0.2 in terms of cosine similarity to filter out excessively noisy category associations, while still allowing low values to contribute to the average score. This provides a general overview of the hurtful language across different professions. According to the Emotive analysis, politicians are mainly targeted with insults related to crime, homosexuality and male genitalia. This is consistent with what has been observed in Fig. 2, where forms of sexual harassment discourse were mainly directed toward political figures. For artists, we notice a peak w.r.t. female genitalia, while for athletes we register a more balanced trend, except for a peak in negative connotation. On the other hand, authors seem to be mainly targeted with crime and profession-related topics, consistent with the fact that the type of misogyny mostly inflicted towards this profession consists of derailing and stereotypes.</p>
          <p>⁶ For detailed descriptions of each category, we refer to Bassignana et al. [26]. ⁷ We retain only conservative-level lemmas. ⁸ https://github.com/Unipisa/ItEM/ ⁹ https://github.com/FredericGodin/TwitterEmbeddings</p>
      </sec>
    </sec>
    <sec id="sec-concl">
      <title>5. Conclusion</title>
      <p>In this paper, we investigated the phenomenon of misogyny on Twitter through the lens of hurtfulness, qualifying its different manifestations considering the professions of the targets of the misogynistic attacks. Specifically, we examined how different types of misogyny are distributed across various professions, unveiling how derailing discourse is mostly used to attack authors, while dominance and sexual harassment speech targets especially politicians.</p>
      <p>Additionally, we studied through a hurtfulness score measure how the language used in misogynistic tweets varies across different professions: politicians tend to be targeted with hate speech revolving around sexuality (female/male genitalia, homosexuality) and crime, while artists seem to be insulted mainly through general derogatory terms. On the other hand, less heterogeneous results were obtained for athletes and authors, except for peaks in hurtful topics regarding crimes and professions.</p>
      <p>We acknowledge two potential limitations of our contribution: the incomplete coverage of our dataset’s vocabulary by the HurtLex-based ItEM lexicon, and our decision to focus on just four professions, which, as motivated, was guided by the representation of those professions in the AMI dataset. We therefore plan to extend the approach adopting a richer vocabulary w.r.t. the datasets, as well as expanding the set of professions. Indeed, as further future investigations, it could be assessed how hurtfulness dimensions change using different lexicons or automatic approaches. We also intend to investigate the distribution of misogynistic language, both textual and multi-modal, as well as the broader expression of emotions in posts associated with different professions.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Research partially funded by PNRR-PE00000013 “FAIR - Future Artificial Intelligence Research” - Spoke 1 “Human-centered AI” under NextGeneration EU, ERC-2018-ADG G.A. 834756 “XAI: Science and technology for the eXplanation of AI decision making” under Horizon 2020, and PRIN 2022 PIANO (Personalized Interventions Against Online Toxicity) project, CUP B53D23013290006.</p>
    </sec>
    <sec id="sec-refs-cont">
      <title>References</title>
      <p>[12] … hurtful sentence completion in language models, in: K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tür, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou (Eds.), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, Association for Computational Linguistics, 2021, pp. 2398–2406.</p>
      <p>[13] M. Rezvan, S. Shekarpour, L. Balasuriya, K. Thirunarayan, V. L. Shalin, A. P. Sheth, A quality type-aware annotated corpus and lexicon for harassment research, in: H. Akkermans, K. Fontaine, I. E. Vermeulen, G. Houben, M. S. Weber (Eds.), Proceedings of the 10th ACM Conference on Web Science, WebSci 2018, Amsterdam, The Netherlands, May 27-30, 2018, ACM, 2018, pp. 33–36. doi:10.1145/3201064.3201103.</p>
      <p>[14] P. Parikh, H. Abburi, P. Badjatiya, R. Krishnan, N. Chhaya, M. Gupta, V. Varma, Multi-label categorization of accounts of sexism using a neural framework, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 1642–1652. doi:10.18653/v1/D19-1174.</p>
      <p>[15] P. Chiril, F. Benamara, V. Moriceau, “Be nice to your wife! The restaurants are closed”: Can gender stereotype detection improve sexism classification?, in: Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics, Punta Cana, Dominican Republic, 2021, pp. 2833–2844. doi:10.18653/v1/2021.findings-emnlp.242.</p>
      <p>[16] D. Felmlee, P. Inara Rodis, A. Zhang, Sexist slurs: Reinforcing feminine stereotypes online, Sex Roles 83 (2020) 16–28.</p>
      <p>[17] A.-M. Hancock, When multiplication doesn’t equal quick addition: Examining intersectionality as a research paradigm, Perspectives on Politics 5 (2007) 63–79.</p>
      <p>[18] R. K. Dhamoon, Considerations on mainstreaming intersectionality, Political Research Quarterly 64 (2011) 230–243.</p>
      <p>[19] D. Silva-Paredes, D. Ibarra Herrera, Resisting anti-democratic values with misogynistic abuse against a Chilean right-wing politician on Twitter: The #camilapeluche incident, Discourse &amp; Communication 16 (2022) 426–444.</p>
      <p>[20] E. B. Phipps, F. Montgomery, “Only YOU Can Prevent This Nightmare, America”: Nancy Pelosi as the Monstrous-Feminine in Donald Trump’s YouTube Attacks, Women’s Studies in Communication 45 (2022) 316–337.</p>
      <p>[21] J. Ritchie, Creating a monster: Online media constructions of Hillary Clinton during the Democratic primary campaign, 2007–8, Feminist Media Studies 13 (2013) 102–119.</p>
      <p>[22] N. Saluja, N. Thilaka, Women leaders and digital communication: Gender stereotyping of female politicians on Twitter, Journal of Content, Community &amp; Communication 7 (2021) 227–241.</p>
      <p>[23] S. Ghafari, Discourses of celebrities on Instagram: digital femininity, self-representation and hate speech, in: Social Media Critical Discourse Studies, Routledge, 2023, pp. 43–60.</p>
      <p>[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2017, pp. 5998–6008. URL: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.</p>
      <p>[25] S. Barreto, R. Moura, J. Carvalho, A. Paes, A. Plastino, Sentiment analysis in tweets: an assessment study from classical to modern word representation models, Data Min. Knowl. Discov. 37 (2023) 318–380. doi:10.1007/s10618-022-00853-0.</p>
      <p>[26] E. Bassignana, V. Basile, V. Patti, HurtLex: A multilingual lexicon of words to hurt, in: E. Cabrio, A. Mazzei, F. Tamburini (Eds.), Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, December 10-12, 2018, volume 2253 of CEUR Workshop Proceedings, CEUR-WS.org, 2018. URL: https://ceur-ws.org/Vol-2253/paper49.pdf.</p>
      <p>[27] F. Godin, Improving and interpreting neural networks for word-level prediction tasks in natural language processing, Ghent University, Belgium (2019).</p>
    </sec>
    <sec id="sec-appendix-a">
      <title>Appendix A</title>
      <p>In Figure 4, we display the tree of nested professions based on the Wikidata taxonomy concerning the popular women selected to collect the PRF dataset (§3.2). Branches identify Wikidata subclass of relationships, while dashed lines mark the connections between women and the first (or unique) occupation appearing on their Wikidata pages. We avoid reporting women’s names to maintain anonymity.</p>
      <p>[Figure 4: tree of nested professions rooted at Person, e.g. Person → Environmentalist / Political activist → POLITICIAN; ARTIST.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>M. E. David,</surname>
          </string-name>
          <article-title>Reclaiming feminism: Challenging everyday misogyny</article-title>
          , Policy Press,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Tileagă</surname>
          </string-name>
          ,
          <article-title>Communicating misogyny: An interdisciplinary research agenda for social psychology</article-title>
          ,
          <source>Social and Personality Psychology Compass</source>
          <volume>13</volume>
          (
          <year>2019</year>
          )
          <article-title>e12491</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Jane</surname>
          </string-name>
          ,
          <article-title>'Back to the kitchen, cunt': Speaking the unspeakable about online misogyny</article-title>
          ,
          <source>Continuum</source>
          <volume>28</volume>
          (
          <year>2014</year>
          )
          <fpage>558</fpage>
          -
          <lpage>570</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ging</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Siapera</surname>
          </string-name>
          ,
          <article-title>Special issue on online misogyny</article-title>
          ,
          <source>Feminist Media Studies</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>515</fpage>
          -
          <lpage>524</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Marques</surname>
          </string-name>
          , Exploring gender at work, Springer,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Fontanella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ignazzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tontodimamma</surname>
          </string-name>
          ,
          <article-title>How do we study misogyny in the digital age? A systematic literature review using a computational linguistic approach</article-title>
          ,
          <source>Humanities and Social Sciences Communications</source>
          <volume>11</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Overview of the EVALITA 2018 task on automatic misogyny identification (AMI)</article-title>
          , in: T. Caselli, N. Novielli, V. Patti, P. Rosso (Eds.),
          <source>Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy, December 12-13, 2018</source>
          , volume
          <volume>2263</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2018</year>
          . URL: http://ceur-ws.org/Vol-2263/paper009.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Rangel Pardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          ,
          <article-title>SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter</article-title>
          ,
          <source>in: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          , Association for Computational Linguistics, Minneapolis, Minnesota, USA,
          <year>2019</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          . URL: https://aclanthology.org/S19-2007. doi:10.18653/v1/S19-2007.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Atanasova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karadzhov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mubarak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pitenis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Çöltekin</surname>
          </string-name>
          ,
          <article-title>SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020)</article-title>
          ,
          <source>in: Proceedings of the Fourteenth Workshop on Semantic Evaluation</source>
          , International Committee for Computational Linguistics, Barcelona (online),
          <year>2020</year>
          , pp.
          <fpage>1425</fpage>
          -
          <lpage>1447</lpage>
          . URL: https://aclanthology.org/2020.semeval-1.188. doi:10.18653/v1/2020.semeval-1.188
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <article-title>Evaluating context selection strategies to build emotive vector space models</article-title>
          , in: N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (Eds.),
          <source>Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 23-28, 2016</source>
          , European Language Resources Association (ELRA),
          <year>2016</year>
          . URL: http://www.lrec-conf.org/proceedings/lrec2016/summaries/637.html.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bondielli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,
          <article-title>Leveraging CLIP for image emotion recognition</article-title>
          , in: E. Cabrio, D. Croce, L. C. Passaro, R. Sprugnoli (Eds.),
          <source>Proceedings of the Fifth Workshop on Natural Language for Artificial Intelligence (NL4AI 2021) co-located with the 20th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2021), Online event, November 29, 2021</source>
          , volume
          <volume>3015</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2021</year>
          . URL: https://ceur-ws.org/Vol-3015/paper172.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hovy</surname>
          </string-name>
          , HONEST: measuring
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>