=Paper=
{{Paper
|id=Vol-3878/21_main_long
|storemode=property
|title=Women's Professions and Targeted Misogyny Online
|pdfUrl=https://ceur-ws.org/Vol-3878/21_main_long.pdf
|volume=Vol-3878
|authors=Alessio Cascione,Aldo Cerulli,Marta Marchiori Manerba,Lucia Passaro
|dblpUrl=https://dblp.org/rec/conf/clic-it/CascioneCMP24
}}
==Women's Professions and Targeted Misogyny Online==
Alessio Cascione¹·*, Aldo Cerulli²·*, Marta Marchiori Manerba¹ and Lucia C. Passaro¹
¹ Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, Pisa, 56127, Italy
² Dipartimento di Filologia, Letteratura e Linguistica, Università di Pisa, Via Santa Maria 36, Pisa, 56126, Italy
Abstract
With the increasing popularity of social media platforms, the dissemination of misogynistic content has become more prevalent and challenging to address. In this paper, we investigate the phenomenon of online misogyny on Twitter through the lens of hurtfulness, qualifying its different manifestations in English tweets according to the profession of the targets of misogynistic attacks. By leveraging manual annotation and a BERTweet model trained for fine-grained misogyny identification, we find that specific types of misogynistic speech are more intensely directed towards particular professions. For example, derailing discourse predominantly targets authors and cultural figures, while dominance-oriented speech and sexual harassment are mainly directed at politicians and athletes. Additionally, we use the HurtLex lexicon and ItEM to assign hurtfulness scores to tweets based on different hate speech categories. Our analysis reveals that these scores align with the profession-based distribution of misogynistic speech, highlighting the targeted nature of such attacks.

Keywords
Abusive Language, Online Misogyny, Hurtfulness
1. Introduction

Misogyny is a radical manifestation of sexism directed toward the female gender, which becomes the subject of hatred. Its effects are widespread and systematic, bearing severe social and individual consequences, such as verbal and physical violence, rape and femicide. Indeed, misogyny, prejudice, and contempt towards women continue to persist in various forms in our society. While overt acts of discrimination and sexism have received attention, it is crucial to acknowledge that misogyny often manifests in subtle and nuanced ways [1, 2]. Moreover, with the increasing popularity of social media platforms, the dissemination of misogynistic content has become more prevalent and challenging to address [3, 4].

From a socio-historical perspective, women have faced numerous barriers that limited their access to certain professions, hindered their career progression, and subjected them to belittlement and offense related to their work [5]. These gendered biases not only perpetuate inequality but also serve as breeding grounds for misogyny.

In this paper, we focus on automated misogyny detection, specifically investigating whether different professional roles trigger varying degrees of hurtfulness across social media posts. By examining the correlation between the profession of offended women and the prevalence of misogynistic attitudes, we aim to shed light on the extent to which misogyny is perpetuated within specific professional domains.

Fontanella et al. [6] highlight how research focusing on automatic detection of misogyny tends to show weak connections with other conceptual areas addressing different aspects of the phenomenon. This finding suggests that current research has not yet adequately addressed the fine-grained manifestations of online misogynistic attacks. Our contribution conducts novel analyses to uncover and measure misogynistic attitudes within different professional fields. Specifically, we examine how different types of misogyny are distributed across various women's professions and how the language used in misogynistic posts varies across them. To explore this relationship, we expand the English misogyny identification dataset introduced by Fersini et al. [7], known as AMI, by incorporating the professions of the women targeted. By adding professional categories to AMI, we enable novel analyses on how misogynistic attacks against women differ based on their profession. Our research is driven by the following research questions:
RQ1: How does misogyny distribute across professions? We analyze women's professions according to the type of misogyny directed towards them.

RQ2: How does the language used in misogynistic tweets vary across different professions? We investigate how specific hurtful expressions are directed at specific professions more frequently than others.

To address our RQs, we proceed following the workflow depicted in Figure 1. We begin by utilizing a subset of the AMI dataset, which contains ground-truth annotations for misogyny. This subset is manually labeled with the professions of the victims of misogynistic attacks, as detailed in Section 3.2. We then employ a misogyny classifier to automatically annotate a novel collection with various types of misogyny: the Profession (PRF) dataset, which comprises 760 tweets labeled with professions. The final step involves combining the manually annotated AMI subset with the automatically annotated PRF dataset, resulting in the AMI-PRF dataset¹. This enriched dataset provides a resource that enables a thorough investigation of the phenomenon.

Figure 1: A subset of the AMI dataset, containing ground-truth misogyny annotations, is manually labeled with the professions of victims of misogynistic attacks, as detailed in Section 3. The PRF dataset, featuring professions by-design, is extracted and automatically annotated with misogyny types using a BERTweet model trained on the AMI dataset. The manually annotated AMI subset and the automatically annotated PRF dataset are then combined to form the AMI-PRF dataset. Label distributions of each dataset are displayed in the workflow.

The remainder of this paper is organized as follows. Section 2 discusses previous works closely related to ours, while Section 3 details the enrichment of the AMI dataset with professional categories. Section 4 reports the experiments conducted to answer our RQs, whereas Section 5 outlines conclusions, limitations, and future directions of the work.

¹ The dataset is accessible for research purposes by requesting it by email from the authors. To protect the identities of the affected women, we chose to omit explicit references to profiles and original tweet IDs from the dataset.

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
* Corresponding authors. These authors contributed equally.
a.cascione@studenti.unipi.it (A. Cascione); a.cerulli1@studenti.unipi.it (A. Cerulli); marta.marchiori@phd.unipi.it (M. Marchiori Manerba); lucia.passaro@unipi.it (L. C. Passaro)
https://martamarchiori.github.io/ (M. Marchiori Manerba); https://luciacpassaro.github.io/ (L. C. Passaro)
ORCID: 0009-0003-5043-5942 (A. Cascione); 0000-0002-0877-7063 (M. Marchiori Manerba); 0000-0003-4934-534 (L. C. Passaro)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

2. Related Work

In recent years, the field of NLP has witnessed a growing interest in detecting misogyny and sexist content on social media platforms. Various works have significantly contributed to this area by publicly introducing diverse datasets and evaluation tasks tailored for misogyny detection [7, 8, 9]. Indeed, there is a pressing need to develop systems for detecting emotive [10, 11] and offensive word lexicons for harassment research [12], as highlighted by Rezvan et al. [13]. Contributing to the field of sexism categorization, Parikh et al. [14] provide a large dataset for multi-label classification of sexism. Chiril et al. [15] explore the detection of sexist hate speech, examining the relationship between gender stereotype detection and sexism classification. Similarly, Felmlee et al. [16] investigate online aggression towards women on social media platforms, focusing on the strategic nature of sexist tweets and the reinforcement of stereotypes.

Emphasizing the interaction and co-influence of social dimensions, like gender and profession, can assist in capturing complex social dynamics and informing the development of norms that promote equity and justice, as outlined by Hancock [17] and Dhamoon [18]. Specifically, previous social science research has examined hate discourse directed at specific groups of women, such as politicians and celebrities. For example, Silva-Paredes and Ibarra Herrera [19] offer a corpus-based analysis of gender-based aggression towards a Chilean right-wing female politician, while Phipps and Montgomery [20] and Ritchie [21] focus on forms of hate speech in media campaigns against Nancy Pelosi and Hillary Clinton, respectively. Specifically for tweets, Saluja and Thilaka [22] employ Feminist Critical Discourse Theory to perform gender-specific inferences w.r.t. Twitter discourse concerning Indian political leaders. On the other hand, Ghaffari [23] analyzes 2000 user-generated posts focusing on American celebrity Lena Dunham, examining manifestations of hate and stereotypes. To the best of our knowledge, this is the first data-driven work that examines the relationship between women's professional categories and types of misogynistic attacks on online platforms.

Table 1: BERTweet multi-classification results on the AMI test set.

            support%   Precision   Recall   F1-score
der         2.391%     0.250       0.273    0.261
dis         30.65%     0.626       0.794    0.700
dom         26.95%     0.811       0.484    0.606
sex         9.565%     0.500       0.773    0.607
ste         30.43%     0.906       0.821    0.861
Macro Avg.  -          0.618       0.629    0.607
Wtd. Avg.   -          0.740       0.704    0.704
Accuracy    -          -           -        0.704

3. Data Exploration and Enrichment

In this section, we detail the construction of our novel AMI-PRF dataset.
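Aggregate metrics of the kind reported in Table 1 (per-class Precision/Recall/F1 plus macro and weighted averages and Accuracy) follow directly from the confusion counts. A minimal pure-Python sketch, using toy labels over the five AMI classes (illustrative only, not the actual test-set predictions):

```python
from collections import Counter

LABELS = ["der", "dis", "dom", "sex", "ste"]
# Toy gold/predicted labels (illustrative only, not real AMI data).
y_true = ["dis", "dis", "ste", "dom", "sex", "dis", "ste", "der"]
y_pred = ["dis", "ste", "ste", "dom", "sex", "dis", "dis", "der"]

def prf_per_class(y_true, y_pred, labels):
    """Per-class (precision, recall, F1), as in the rows of Table 1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # wrongly predicted as p
            fn[t] += 1  # missed instance of t
    scores = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores

def aggregates(y_true, y_pred, labels):
    """Macro (unweighted) and support-weighted averages, plus accuracy."""
    per = prf_per_class(y_true, y_pred, labels)
    support, n = Counter(y_true), len(y_true)
    macro = tuple(sum(per[c][i] for c in labels) / len(labels) for i in range(3))
    wtd = tuple(sum(per[c][i] * support[c] / n for c in labels) for i in range(3))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return macro, wtd, acc
```

On the toy labels above, `aggregates` yields an accuracy of 0.75; the real values in Table 1 come from the held-out AMI test set.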
3.1. AMI Dataset

We address the lack of misogynous data annotated w.r.t. victims' professions by enriching the AMI dataset² [7]. The dataset includes a coarse-grained distinction between misogynistic and not-misogynistic tweets, as well as a fine-grained labeling for misogynistic tweets, categorizing them into five different types of misogynistic hate speech: derailing (justifying the abuse of women), discredit (general slurring), dominance (asserting men's superiority), sexual harassment (sexual advances and violence) and stereotype (oversimplification and objectification).

We enrich AMI by adding information about the professions of the victims. This enrichment is performed by retrieving from Wikidata³ professional figures that are subclasses of the person class. Our annotation of professions includes four categories, namely 'artist', 'author', 'athlete', and 'politician (and activist)'. We focus on these professions as they are represented in the AMI dataset, based on the popular women referenced. Although the first two are both subclasses of creator, which is an immediate subclass of person, we keep them separate due to their different natures: the former encompasses visual and performing arts, the latter intellectual activities. On the other hand, we choose to group politicians and activists together to highlight their shared involvement in public social activities, even though they are not directly related according to the Wikidata taxonomy.

As shown in Fig. 4 (Appendix A), each macro-profession initiates a potentially large set of nested sub-professions based on the Wikidata subclass-of relationship. We leverage these professions to manually label AMI misogynistic tweets that actually refer to women. In order to produce a consistent labeling, we establish the following conventions: if the tweet refers to a famous woman, we choose the first (or unique) occupation among those appearing on her Wikidata page, tracing it back to the appropriate macro-category. This approach mitigates annotation inconsistencies by leveraging an established external resource for labeling. When such information is unavailable, we determine the professional category by examining relevant job details in the tweet content or on the profile page of the victim, if mentioned. For such cases, a collaborative approach was taken during group meetings to share general insights, ensuring that any disagreements were addressed through discussion and ultimately resolved through consensus. In the absence of clues regarding the profession, the tweet is simply labeled as 'generic'.

Finally, we point out that not all tweets in the AMI dataset have women as victims. In several cases, misogynist language is used to insult men, companies or political parties. Out of 5000 AMI tweets, we initially filtered out those that were not directed at women. Among the remaining tweets, 2187 were labelled as misogynistic. However, we were able to obtain professional categories for only a subset of 380 of these tweets, highlighting the need for additional data collection.

3.2. PRF Dataset

To address the issue of having only a small number of tweets annotated for both misogyny and profession, we crawl additional tweets. From the most common expressions in the misogynistic tweets of AMI, we derive a list of misogynistic keywords. For each of our target professions, we choose five representative popular women, collecting tweets containing a reference to them in the form of a hashtag, mention and/or explicit name and surname. As a result, we extract 760 tweets labeled with professions, all posted before the beginning of February 2023: we refer to this collection as the Profession (PRF) dataset. Since these tweets are filtered using specific keywords and are directed at popular women, we consider them inherently misogynistic, as a woman is the primary target of the hate speech.

To identify the type of misogyny in PRF, we leverage BERTweet⁴, a transformer-based [24] model trained on the AMI multi-classification dataset. We opt for this model since it is pre-trained on Twitter, and it achieves state-of-the-art performance in Twitter sentiment analysis tasks [25]. Before training, the AMI tweets are preprocessed with a TweetNormalizer function⁵ which maps emojis into text strings and substitutes user mentions and web/url links with @USER and HTTPURL placeholders. For model selection, we perform a stratified cross-validation with k = 5. We search for the best weight decay and learning rate in [1e-2, 1e-5] and [1e-5, 3e-5], respectively. For each configuration, we set 10 epochs, 500 warm-up steps and train/validation batch sizes of 16/8. The optimal performance is achieved with a learning rate of 3e-5 and a weight decay of 1e-2. Tab. 1 shows BERTweet performances for the multi-class misogyny detection task on the AMI test set, comprising 1000 tweets (460 misogynistic). For the multi-classification task, we focus only on misogynistic tweets. The evaluation metrics include Accuracy, as well as weighted and unweighted average Precision, Recall, and F1-score. We adopt this model to label our PRF dataset with types of misogyny.

AMI-PRF Dataset. By combining the 380 tweets from AMI, having ground-truth information regarding the type of misogyny, and the PRF dataset, labeled with our trained model, we obtain 1140 tweets featuring both misogyny type and profession. Such a dataset, named AMI-PRF, is leveraged to investigate the relation between misogyny and professions.

4. Experiments and Data Analyses

4.1. Misogyny Type by Profession (RQ1)

To address RQ1, we examine how different types of misogynistic speech are distributed across various professions in AMI-PRF. For each type of misogyny, we find how many tweets belonging to such a class are directed towards a specific profession and qualitatively compare the results in Fig. 2.

Figure 2: Alluvial plot depicting the relationship between misogyny types and professions. Thicker streams indicate a higher number of tweets corresponding to the misogyny type originating from the respective block.

Discussion. We observe distinct patterns in the usage of misogynistic speech across professions: derailing discourse, which focuses on justifying the abuse of women and rejecting male responsibility, tends to primarily target authors compared to the other professions. This aligns with the nature of derailing speech, which seeks to rationalize the mistreatment of women and deflect male accountability. Therefore, this kind of discourse can be expected to be commonly directed at public intellectuals or cultural figures. In contrast, dominance-oriented misogynistic discourse, aimed at asserting male superiority along with stereotypical negative speech, is predominantly directed at powerful figures such as politicians. This prevalence could be explained as an attempt to undermine the legitimacy and value of women holding relevant public roles. Sexual harassment is notably prevalent towards politicians and athletes, as an expression of intent to assert power over women through threats of violence.

4.2. Hurtfulness by Profession (RQ2)

To address RQ2 – whether specific hurtful expressions target women in certain professions – we define a quantitative lexicon-based measure for assessing the hurtfulness of tweets.

Hurtfulness Evaluation. To define a hurtfulness measure for tweets, we leverage the HurtLex lexicon, which compiles offensive words and stereotyped expressions aimed at insulting and degrading marginalized individuals and groups [26]. HurtLex organizes words into 17 fine-grained categories, each identifying a specific target or form of offense. Inspired by the work of Nozza et al. [12], where a harmful sentence completion indicator is defined for generative language models, we employ a subset of 9 HurtLex categories for our purposes: animals, prostitution, professions, negative connotations, homosexuality, male genitalia, female genitalia, derogatory terms, and crime⁶. The hurtfulness score for a tweet w.r.t. one of the 9 categories could be computed as the ratio of HurtLex lemmas⁷ from that category to the total HurtLex lemmas from any category present in the tweet. However, an approach relying solely on the HurtLex lexicon would not provide a sufficiently comprehensive analysis, as HurtLex has low coverage of the vocabulary in the AMI-PRF dataset, with only 15.42% of the lemmas in a tweet occurring in HurtLex on average.

² https://live.european-language-grid.eu/catalogue/corpus/7272
³ https://www.wikidata.org/wiki/Wikidata:Main_Page
⁴ https://github.com/VinAIResearch/BERTweet
⁵ https://github.com/VinAIResearch/BERTweet/blob/master/TweetNormalizer.py
⁶ For detailed descriptions of each category, we refer to Bassignana et al. [26].
⁷ We retain only conservative-level lemmas.
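The ratio-based HurtLex score described above (category hits over total lexicon hits in a tweet) can be sketched in a few lines. The tiny lexicon below is a mock stand-in for HurtLex, with invented lemma-category memberships, so the numbers are purely illustrative:

```python
# Mock HurtLex-style lexicon: lemma -> category.
# Toy stand-in for illustration only; real HurtLex memberships differ.
MOCK_LEXICON = {
    "snake": "animals",
    "rat": "animals",
    "criminal": "crime",
    "thief": "crime",
}

def hurtlex_ratio(lemmas, category, lexicon):
    """Share of a tweet's lexicon hits that fall in `category`.

    Returns 0.0 when the tweet contains no lexicon lemma at all,
    which is the low-coverage case discussed in the text.
    """
    hits = [lexicon[w] for w in lemmas if w in lexicon]
    if not hits:
        return 0.0
    return hits.count(category) / len(hits)

tweet = ["that", "snake", "is", "a", "thief", "and", "a", "criminal"]
print(hurtlex_ratio(tweet, "crime", MOCK_LEXICON))  # 2 of 3 lexicon hits
```

Tweets with no lexicon hit receive 0 for every category, which is precisely why the coverage limitation (15.42% of lemmas on average) motivates the ItEM extension introduced next.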
Table 2: Average cosine similarity between HurtLex lemmas and ItEM centroids using Word2vec Twitter embeddings.

HurtLex Category        Centroid similarity
animals                 0.57
prostitution            0.60
professions             0.60
negative connotations   0.55
homosexuality           0.59
male genitalia          0.52
female genitalia        0.56
derogatory              0.56
crime                   0.57

To enhance our reference vocabulary, we leverage ItEM⁸, a methodology proposed by Passaro and Lenci [10]. For each lemma in the HurtLex subset, we obtain its vectorial representation using ItEM and the Word2vec Twitter embeddings⁹, following Godin [27]. For each category, we compute a centroid embedding by averaging the vectors associated with each lemma in that category. This allows us to represent each category through a unique embedding. Tab. 2 reports the average cosine similarity between the lemmas of a specific category and the respective centroid. Finally, we compute the cosine similarity between each word embedding in the Word2vec Twitter vocabulary and each centroid, thus creating a new lexicon featuring a coverage of 76.51% w.r.t. the AMI-PRF dataset.

We leverage the similarity scores to define a hurtful emotive score for each tweet as follows: let $\mathbf{t}$ be a lemmatized tweet, $w$ a lemma in $\mathbf{t}$, $k$ one of the 9 HurtLex categories, $\tilde{k}$ the centroid of category $k$, $s$ the cosine similarity function, and $V$ the set of vocabulary items, i.e. the words for which we have a Twitter embedding. For each $w \in V$, we define the $ItEM$ function as:

$$
ItEM(w, \tilde{k}, thr) =
\begin{cases}
s(w, \tilde{k}) & \text{if } s(w, \tilde{k}) \ge thr \\
0 & \text{if } s(w, \tilde{k}) < thr
\end{cases}
\tag{1}
$$

where $thr$ designates a threshold in the $[0, 1]$ range. In other words, the $ItEM$ function outputs the cosine similarity value between $w$ and $k$'s centroid if such value is greater than or equal to $thr$, while it outputs 0 if it is lower than $thr$. Additionally, if $w$ is not found in the vocabulary, its $ItEM$ value is also considered 0.

The Emotive score for a tweet $\mathbf{t}$ w.r.t. a category $k$ and a threshold $thr$ is then computed as:

$$
Emotive(\mathbf{t}, k) = \frac{\sum_{w \in \mathbf{t}} ItEM(w, \tilde{k}, thr)}{q}
\tag{2}
$$

where $q$ is the number of lemmas in $\mathbf{t}$ which occur in $V$. This allows us to obtain, for each tweet-category pair, a score in $[0, 1]$, indicating the tweet's hurtfulness tendency.

Figure 3: Emotive z-scores for HurtLex categories with respect to professions.

Discussion. Fig. 3 provides a visual analysis of the results. The Emotive score is computed category-wise as the average of the scores for each tweet, after having standardized the values with a z-score approach. We keep a $thr$ of 0.2 in terms of cosine similarity to filter out excessively noisy category associations, while still allowing low values to contribute to the average score. This provides a general overview of the hurtful language across different professions. According to the Emotive analysis, politicians are mainly targeted with insults related to crime, homosexuality and male genitalia. This is consistent with what has been observed in Fig. 2, where forms of sexual harassment discourse were mainly directed toward political figures. For artists, we notice a peak w.r.t. female genitalia, while for athletes we register a more balanced trend, except for a peak in negative connotations. On the other hand, authors seem to be mainly targeted with crime- and profession-related topics, consistent with the fact that the type of misogyny mostly inflicted towards this profession consists of derailing and stereotypes.

⁸ https://github.com/Unipisa/ItEM/
⁹ https://github.com/FredericGodin/TwitterEmbeddings

5. Conclusion

In this paper, we investigated the phenomenon of misogyny on Twitter through the lens of hurtfulness, qualifying its different manifestations considering the profession of the targets of the misogynistic attacks. Specifically, we examined how different types of misogyny are distributed across various professions, unveiling how derailing discourse is mostly used to attack authors,
while dominance and sexual harassment speech especially targets politicians.

Additionally, we studied, through a hurtfulness score measure, how the language used in misogynistic tweets varies across different professions: politicians tend to be targeted with hate speech revolving around sexuality (female/male genitalia, homosexuality) and crime, while artists seem to be insulted mainly through general derogatory terms. On the other hand, less heterogeneous results were obtained for athletes and authors, except for peaks in hurtful topics regarding crimes and professions.

We acknowledge two potential limitations of our contribution: the incomplete coverage of our dataset's vocabulary by the HurtLex-based ItEM lexicon, and our decision to focus on just four professions, which, as motivated, was guided by the representation of those professions in the AMI dataset. We therefore plan to extend the approach by adopting a richer vocabulary w.r.t. the datasets, as well as by expanding the set of professions. Indeed, as further future investigations, it could be assessed how hurtfulness dimensions change using different lexicons or automatic approaches. We also intend to investigate the distribution of misogynistic language, both textual and multi-modal, as well as the broader expression of emotions in posts associated with different professions.

Acknowledgments

Research partially funded by PNRR-PE00000013 "FAIR - Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI" under NextGeneration EU, ERC-2018-ADG G.A. 834756 XAI: Science and technology for the eXplanation of AI decision making under Horizon 2020, and the PRIN 2022 PIANO (Personalized Interventions Against Online Toxicity) project, CUP B53D23013290006.

References

[1] M. E. David, Reclaiming feminism: Challenging everyday misogyny, Policy Press, 2016.
[2] C. Tileagă, Communicating misogyny: An interdisciplinary research agenda for social psychology, Social and Personality Psychology Compass 13 (2019) e12491.
[3] E. A. Jane, 'Back to the kitchen, cunt': Speaking the unspeakable about online misogyny, Continuum 28 (2014) 558-570.
[4] D. Ging, E. Siapera, Special issue on online misogyny, Feminist Media Studies 18 (2018) 515-524.
[5] J. Marques, Exploring gender at work, Springer, 2021.
[6] L. Fontanella, B. Chulvi, E. Ignazzi, A. Sarra, A. Tontodimamma, How do we study misogyny in the digital age? A systematic literature review using a computational linguistic approach, Humanities and Social Sciences Communications 11 (2024) 1-15.
[7] E. Fersini, D. Nozza, P. Rosso, Overview of the EVALITA 2018 task on automatic misogyny identification (AMI), in: Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018) co-located with the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Turin, Italy, December 12-13, 2018, volume 2263 of CEUR Workshop Proceedings, CEUR-WS.org, 2018. URL: http://ceur-ws.org/Vol-2263/paper009.pdf.
[8] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. Rangel Pardo, P. Rosso, M. Sanguinetti, SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 54-63. URL: https://aclanthology.org/S19-2007. doi:10.18653/v1/S19-2007.
[9] M. Zampieri, P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, Ç. Çöltekin, SemEval-2020 task 12: Multilingual offensive language identification in social media (OffensEval 2020), in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, International Committee for Computational Linguistics, Barcelona (online), 2020, pp. 1425-1447. URL: https://aclanthology.org/2020.semeval-1.188. doi:10.18653/v1/2020.semeval-1.188.
[10] L. C. Passaro, A. Lenci, Evaluating context selection strategies to build emotive vector space models, in: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 23-28, 2016, European Language Resources Association (ELRA), 2016. URL: http://www.lrec-conf.org/proceedings/lrec2016/summaries/637.html.
[11] A. Bondielli, L. C. Passaro, Leveraging CLIP for image emotion recognition, in: Proceedings of the Fifth Workshop on Natural Language for Artificial Intelligence (NL4AI 2021) co-located with AI*IA 2021, Online event, November 29, 2021, volume 3015 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: https://ceur-ws.org/Vol-3015/paper172.pdf.
[12] D. Nozza, F. Bianchi, D. Hovy, HONEST: Measuring hurtful sentence completion in language models, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021), Online, June 6-11, 2021, Association for Computational Linguistics, 2021, pp. 2398-2406.
[13] M. Rezvan, S. Shekarpour, L. Balasuriya, K. Thirunarayan, V. L. Shalin, A. P. Sheth, A quality type-aware annotated corpus and lexicon for harassment research, in: Proceedings of the 10th ACM Conference on Web Science (WebSci 2018), Amsterdam, The Netherlands, May 27-30, 2018, ACM, 2018, pp. 33-36. URL: https://doi.org/10.1145/3201064.3201103. doi:10.1145/3201064.3201103.
[14] P. Parikh, H. Abburi, P. Badjatiya, R. Krishnan, N. Chhaya, M. Gupta, V. Varma, Multi-label categorization of accounts of sexism using a neural framework, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 1642-1652. URL: https://aclanthology.org/D19-1174. doi:10.18653/v1/D19-1174.
[15] P. Chiril, F. Benamara, V. Moriceau, "Be nice to your wife! The restaurants are closed": Can gender stereotype detection improve sexism classification?, in: Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics, Punta Cana, Dominican Republic, 2021, pp. 2833-2844. URL: https://aclanthology.org/2021.findings-emnlp.242. doi:10.18653/v1/2021.findings-emnlp.242.
[16] D. Felmlee, P. Inara Rodis, A. Zhang, Sexist slurs: Reinforcing feminine stereotypes online, Sex Roles 83 (2020) 16-28.
[17] A.-M. Hancock, When multiplication doesn't equal quick addition: Examining intersectionality as a research paradigm, Perspectives on Politics 5 (2007) 63-79.
[18] R. K. Dhamoon, Considerations on mainstreaming intersectionality, Political Research Quarterly 64 (2011) 230-243.
[19] D. Silva-Paredes, D. Ibarra Herrera, Resisting anti-democratic values with misogynistic abuse against a Chilean right-wing politician on Twitter: The #CamilaPeluche incident, Discourse & Communication 16 (2022) 426-444.
[20] E. B. Phipps, F. Montgomery, "Only YOU Can Prevent This Nightmare, America": Nancy Pelosi as the Monstrous-Feminine in Donald Trump's YouTube attacks, Women's Studies in Communication 45 (2022) 316-337.
[21] J. Ritchie, Creating a monster: Online media constructions of Hillary Clinton during the Democratic primary campaign, 2007-8, Feminist Media Studies 13 (2013) 102-119.
[22] N. Saluja, N. Thilaka, Women leaders and digital communication: Gender stereotyping of female politicians on Twitter, Journal of Content, Community & Communication 7 (2021) 227-241.
[23] S. Ghaffari, Discourses of celebrities on Instagram: Digital femininity, self-representation and hate speech, in: Social Media Critical Discourse Studies, Routledge, 2023, pp. 43-60.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, 2017, pp. 5998-6008. URL: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
[25] S. Barreto, R. Moura, J. Carvalho, A. Paes, A. Plastino, Sentiment analysis in tweets: An assessment study from classical to modern word representation models, Data Mining and Knowledge Discovery 37 (2023) 318-380. URL: https://doi.org/10.1007/s10618-022-00853-0. doi:10.1007/S10618-022-00853-0.
[26] E. Bassignana, V. Basile, V. Patti, HurtLex: A multilingual lexicon of words to hurt, in: Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, December 10-12, 2018, volume 2253 of CEUR Workshop Proceedings, CEUR-WS.org, 2018. URL: https://ceur-ws.org/Vol-2253/paper49.pdf.
[27] F. Godin, Improving and interpreting neural networks for word-level prediction tasks in natural language processing, Ghent University, Belgium (2019).
A. Supplementary Material

In Figure 4, we display the tree of nested professions based on the Wikidata taxonomy concerning the popular women selected to collect the PRF dataset (§3.2). Branches identify Wikidata subclass-of relationships, while dashed lines mark the connections between women and the first (or unique) occupation appearing on their Wikidata pages. We avoid reporting women's names to maintain anonymity.
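The tracing of a fine-grained occupation back to one of the four macro-categories (Section 3.1) amounts to walking up subclass-of links until a macro-category is reached. A minimal sketch over a toy graph that mirrors a few edges of Figure 4 (illustrative edges only; the real Wikidata taxonomy is far larger, and the paper groups politician and activist into one category):

```python
# Toy fragment of the Wikidata "subclass of" hierarchy from Figure 4
# (illustrative edges only, not the full taxonomy).
SUBCLASS_OF = {
    "sprinter": "runner",
    "runner": "athlete",
    "tennis player": "athlete",
    "political activist": "activist",
    "non-fiction writer": "writer",
    "writer": "author",
    "painter": "visual artist",
    "visual artist": "artist",
}
MACRO = {"athlete", "politician", "activist", "author", "artist"}

def trace_macro(occupation, graph=SUBCLASS_OF, macro=MACRO):
    """Follow subclass-of edges upward until a macro-category is hit."""
    seen = set()
    current = occupation
    while current not in macro:
        if current in seen or current not in graph:
            return None  # cycle or no macro ancestor found
        seen.add(current)
        current = graph[current]
    return current

print(trace_macro("sprinter"))  # athlete
```

Occupations with no macro ancestor in the graph correspond to the 'generic' fallback label used during annotation.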
[Tree diagram: the Wikidata profession taxonomy rooted at Person, branching into Sportsperson, Worker and Creator, with the macro-categories ATHLETE, ACTIVIST, POLITICIAN, AUTHOR and ARTIST, and nested sub-professions such as tennis player, runner/sprinter, swimmer, association football player, volleyball player, environmentalist, political activist, human rights activist, writer, non-fiction writer, director, musician, vocalist, singer-songwriter, actor, painter, photographer, sculptor, designer, architect, researcher, scientist (biologist, astronomer, microbiologist, astrophysicist, virologist) and astronaut.]
Figure 4: Tree of professions held by the group of popular women selected to collect the PRF dataset.
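As a complement, the Emotive score of Equations (1) and (2) in Section 4.2 can be sketched with toy vectors; the embeddings, vocabulary and centroid below are illustrative stand-ins for the Word2vec Twitter embeddings and the ItEM category centroids, not the real resources.

```python
import math

# Toy 2-d "embeddings" standing in for the Word2vec Twitter vectors,
# and a toy centroid k~ for one HurtLex category (illustrative values).
EMB = {
    "slur": (1.0, 0.0),
    "awful": (0.8, 0.6),
    "table": (0.0, 1.0),
}
CENTROID = (1.0, 0.0)

def cos(u, v):
    """Cosine similarity s between two vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def item(w, centroid, thr):
    """Eq. (1): thresholded cosine similarity; 0 for out-of-vocabulary lemmas."""
    if w not in EMB:
        return 0.0
    s = cos(EMB[w], centroid)
    return s if s >= thr else 0.0

def emotive(tweet_lemmas, centroid, thr=0.2):
    """Eq. (2): sum of ItEM scores over q = number of in-vocabulary lemmas."""
    in_vocab = [w for w in tweet_lemmas if w in EMB]
    if not in_vocab:
        return 0.0
    return sum(item(w, centroid, thr) for w in in_vocab) / len(in_vocab)
```

With the toy values, "table" falls below the 0.2 threshold against the centroid and contributes 0, while still counting towards q, exactly as in the paper's definition.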