<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MAMITA: Benchmarking Misogyny in Italian Memes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elisabetta Fersini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Gasparini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulia Rizzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aurora Saibene</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Milano-Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper introduces MAMITA, a novel Italian multimodal benchmark dataset developed for the automatic detection of misogynistic content in online media, with a specific focus on memes. The dataset comprises 1880 memes sourced from popular social platforms (Facebook, Twitter, Instagram, Reddit) and meme-centric websites, selected using misogyny-related</p>
      </abstract>
      <kwd-group>
        <kwd>Misogynous Memes</kwd>
        <kwd>Italian Benchmark</kwd>
        <kwd>Expert vs Crowd Annotation</kwd>
        <kwd>Perspectivism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>In recent years, the proliferation of user-generated content on social media has intensified the creation of hateful content against women, not only through textual messages that can implicitly or explicitly contain harmful content, but also from a multimodal perspective<sup>1</sup>. Among the diverse forms of online expression, memes have emerged as viral communication tools, which can subtly convey harmful ideologies thanks to their combination of visual and textual elements. This kind of digital violence can be an extension or a precursor to physical violence, stalking and harassment, but it can also be a way to punish, abuse or silence women, increasing the isolation of victims (Council of Europe, 2021) [2]. Through the combination of apparently innocuous images coupled with harmless superimposed text, misogynous memes can be easily created and spread, normalizing and trivializing detrimental stereotypes, objectification, and marginalization of women. Their viral nature, usually due to the ironic message behind them, contributes to their rapid spread across several media platforms, also fueling those communities that reinforce misogynistic ideologies.</p>
        <p>Despite growing societal awareness and policy efforts aimed at addressing such an issue, the automatic detection of multimodal misogynistic content remains a significant challenge. A major limitation in the development of robust misogyny detection systems is the scarcity of high-quality, multimodal datasets that reflect the nuanced and subjective nature of such content. Misogyny can manifest in explicit or implicit forms, often relying on cultural references, irony, or layered symbolism. The identification of this kind of abusive content is of paramount importance not only for protecting women and guaranteeing safe online environments, but also for eventually generating counter-narratives<sup>2</sup>.</p>
        <p>In this paper, we provide three main contributions:</p>
      </sec>
      <sec id="sec-1-2">
        <list list-type="order">
          <list-item>
            <p>MAMITA (Multimedia Automatic Misogyny Identification in iTAlian), a novel Italian benchmark focused on misogynistic content in memes, which covers diverse forms of gender-based hate such as body shaming, objectification, stereotype, and violence.</p>
          </list-item>
          <list-item>
            <p>Dual annotation strategy involving both domain experts and crowd annotators, enabling comparative analysis of labeling perspectives and improving the robustness of misogyny detection.</p>
          </list-item>
          <list-item>
            <p>Perspectivist annotation, capturing for each annotator perceived misogyny along with demographic and socio-cultural background such as age, education, and social status, to support research on disagreement in hate speech perception and detection.</p>
          </list-item>
        </list>
        <p>The paper is organized as follows. In Section 2, related works are presented. In Section 3, the proposed benchmark is described, detailing the two types of annotations, i.e., experts and crowd. In Section 4, insights from human annotators and multimodal models are reported. Finally, in Section 5, conclusions are outlined.</p>
        <p><sup>2</sup> https://rm.coe.int/study-on-effectiveness-risks-and-potentials-of-using-counter-and-alter/1680b40775</p>
        <p>Figure 1 panels: (a) Shaming, (b) Stereotype, (c) Objectification, (d) Violence.</p>
      </sec>
      <sec id="sec-1-3">
        <title>2. Related Work</title>
        <p>The automatic detection of hate speech, and misogyny in particular, has received growing attention in Natural Language Processing (NLP). Early efforts have primarily focused on text-based misogyny detection [3], using datasets sourced from Twitter and Reddit. Regarding multilingual settings, several benchmark datasets have been proposed in the literature to cover multiple languages. A few representative benchmarks are HATEVAL [4], focused on English and Spanish, BAJER [5] for the Danish language, BIASLY [6], focused on movie subtitles and colloquial expressions in North American film, ArMIS [7] for the Arabic language, and EXIST [<xref ref-type="bibr" rid="ref4">8, 9</xref>] for dealing with English and Spanish sexist expressions.</p>
        <p>Regarding the Italian language, we can summarize two main text-related benchmarking initiatives, i.e., AMI [<xref ref-type="bibr" rid="ref15">10, 11, 12</xref>] and PejorativITy [<xref ref-type="bibr" rid="ref10">13</xref>]. AMI (Automatic Misogyny Identification) represents a set of benchmark datasets that, starting from the initial challenge at Evalita 2018, have led to three main annotated corpora, i.e., AMI@Evalita 2018, AMI@Evalita 2020, and AMI-PRF. The AMI@Evalita 2018 dataset introduced in [10] provided one of the first benchmarks for detecting misogynistic language on social media in English and Italian tweets. Its extension, presented at AMI@Evalita 2020 [<xref ref-type="bibr" rid="ref15">11</xref>], extends the former benchmark to also capture aggressiveness. Lastly, AMI-PRF [12] is the most recent dataset of tweets annotated for both misogyny and professional categories. A further contribution is represented by PejorativITy [<xref ref-type="bibr" rid="ref10">13</xref>], an Italian tweet corpus annotated at word level for pejorativity, and at the sentence level for misogyny.</p>
        <p>While these efforts advanced text-based detection, they did not address the complexity of multimodal content such as memes, which often rely on implicit visual cues, humor, and cultural references to communicate harmful messages. Among the general hateful meme benchmarks, we can highlight five main initiatives focused on the English language, i.e., Facebook Hateful Memes [14], Memotion2 [15], Harmful Memes [16], MultiOFF [17], and Intervening Cyberbullying in Multimodal Memes (ICMM) [18]. However, these benchmarks do not capture the specificity of misogyny, which often relies on gender norms, implicit bias, and culturally coded references that differ significantly from general offensive content or other forms of targeted hate (e.g., against immigrants or people with disabilities). Only a few benchmarks have been proposed to deal with the peculiarity of hate against women in a multimodal setting, i.e., MAMI [19] for the English language, MIMIC [20] for Hindi, EXIST [21, 22] for English and Spanish, and the Dravidian corpus [23] focused on the Tamil and Malayalam languages.</p>
        <p>Although all the previous initiatives represent a fundamental step towards the identification of hateful memes against women, to the best of our knowledge no benchmark dataset has been developed to specifically address misogynistic content in the Italian language, resulting in a remarkable gap in the resources available for the systematic investigation of this phenomenon within the Italian context. To this purpose, we propose MAMITA (Multimedia Automatic Misogyny Identification in iTAlian), a novel benchmark dataset for the Italian language that focuses on misogynous memes,
composed of a wide range of multimodal expressions denoting body shaming, objectification, stereotyping, and violence. The dataset is developed using a dual annotation strategy that combines input from both domain experts and crowd annotators, enabling robust analysis of labeling perspectives.</p>
        <p><bold>3. MAMITA</bold></p>
        <p>The meme collection was primarily carried out using visual search engines such as Google Images and Pinterest, based on the keywords reported in Table 1. All the keywords have been defined to capture four main categories related to misogynous contents, i.e., body shaming, objectification, stereotyping, and violence. The websites considered are typically dedicated to meme sharing (e.g., me.me and memedroid.com), as well as Instagram accounts focused on themes related to femininity (e.g., alpha woman and scaricatricidiporto). Additional content was sourced from Facebook groups intentionally created for the dissemination of misogynistic memes (e.g., facciaabuco, ignoranza sofocotti pecorina, and Io sono vaginatariano). The initial dataset consisted of approximately 2,000 memes. Pornographic content, low-quality images, and items that could not be clearly categorized as memes were subsequently removed. Memes were also normalized to a maximum resolution of 640×640 pixels, preserving their aspect ratio. The final dataset comprises 1880 memes, with the textual content transcribed using Optical Character Recognition (OCR) tools (https://www.onlineocr.net/). Examples of misogynous memes available in the MAMITA dataset are reported in Figure 1. The dataset has been subsequently labeled by two distinct groups, i.e., expert and crowd annotators. The full dataset can be accessed by filling in the form https://forms.gle/5Xz1gcxJdrh6GHnq5.</p>
        <p><bold>3.1. Expert Annotation</bold></p>
        <p>For the annotation process performed by the experts, we involved two male and three female annotators. In order to label each meme, they adopted the definitions originally provided in [19], suitably adapted to cover the multimodal scenario. Each meme was reviewed by one male and two female experts. Each expert involved in the evaluation process analyzed the memes, classifying them as either misogynistic or non-misogynistic. In cases where a meme was perceived as misogynistic, evaluators were also asked to specify the type of misogyny, selecting among violence, body shaming, stereotyping, and objectification. In cases of uncertainty about the categorization, evaluators were allowed to select multiple types of misogyny.</p>
        <p>The annotation process performed by the experts led to a full agreement in 81.43% of the memes, where in 70.86% of such cases the memes were labeled by the three annotators as misogynous. We computed the Fleiss' Kappa statistic [24] to assess the level of agreement among the experts. The resulting score was 0.749, indicating a substantial inter-annotator reliability in the perception of memes. This value suggests a strong consistency in the evaluators' judgments, particularly in distinguishing between misogynistic and non-misogynistic content.</p>
        <p>The annotations given by the experts have also been aggregated following a majority voting strategy to assign a final golden label about misogyny. The dataset labeled by the experts finally consists of 57.71% misogynous and 42.29% not misogynous memes. Regarding the category of misogyny, since multiple overlapping annotations were possible, the final dataset evaluated by the experts contains, among those memes considered as misogynous by the majority of the experts, 76.12% of the memes labeled as Objectification, 48.29% as Stereotype, 20.18% as Violence and 8.84% as Body Shaming by at least one annotator. Considering that multiple labels are allowed for the type of misogyny, the dataset is provided with soft labels denoting a probability distribution for each category.</p>
        <p><bold>3.2. Crowd Annotation</bold></p>
        <p>For the annotation process performed by the crowd, we prepared a dedicated Google Form and engaged trusted voluntary annotators (from 4 to 10 labelers for each meme). The total number of volunteers involved is 231 (116 male, 110 female, and 5 non-responders). The most frequent age is between 25-34 years old, i.e., about 41% of the annotators. The native language is Italian for 99% of the participants, while the remaining three annotators speak Italian fluently. The dataset was divided into groups of 40 memes each, balanced in terms of classification (20 misogynistic, 20 non-misogynistic) according to the experts' preliminary evaluations, to be subsequently evaluated by the engaged crowd annotators. The choice of presenting a limited number of memes is due to the fact that sensory habituation causes people to reduce their response to repeated or continuous stimuli over time [25].</p>
        <p>Each meme was independently reviewed by a varying number of labelers. Each annotator labeled the memes as either misogynistic or non-misogynistic and, when applicable, selected the primary Category of misogyny that they perceived most, together with the Intensity of the perceived misogyny. Moreover, in order to provide a benchmark that is characterized by perspectivist information, we acquired a few variables to characterize the annotators. In particular, participants were asked to provide the following information about themselves:
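          <list list-type="bullet">
            <list-item>
              <p>Subjective Social Status (SSS): we introduced a variable that has the goal to measure an individual's perception of his/her social position compared to others. To this purpose, we adopted the MacArthur scale introduced in [26]. Participants are asked to place themselves on a graduated scale consisting of ten steps, ranging from the highest to the lowest socioeconomic status. At the top of the scale (10) are individuals with the highest levels of income, education, and occupational prestige. At the bottom of the scale (1) are those with the lowest income, minimal education, and the least respected jobs, or who may be unemployed. This self-placement invites participants to express a subjective evaluation of their social position.</p>
            </list-item>
            <list-item>
              <p>Familiarity with memes: Yes/No response to whether they know what memes are.</p>
            </list-item>
            <list-item>
              <p>Frequency of meme visualization: how often the participant encounters memes, using a 7-point Likert scale ranging from Never to Very Often.</p>
            </list-item>
            <list-item>
              <p>Primary source of meme stimuli: social media, messaging apps, websites and forums, other.</p>
            </list-item>
          </list>
        </p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Keywords used for the meme collection (English keyword and Italian counterpart).</p>
          </caption>
          <table>
            <thead>
              <tr><th>English</th><th>Italian</th></tr>
            </thead>
            <tbody>
              <tr><td>bitch</td><td>stronza</td></tr>
              <tr><td>blondes</td><td>bionde</td></tr>
              <tr><td>call girl</td><td>escort</td></tr>
              <tr><td>cheap</td><td>squallida</td></tr>
              <tr><td>cheat</td><td>tradire / imbrogliare</td></tr>
              <tr><td>clean</td><td>pulire</td></tr>
              <tr><td>cleaning</td><td>pulizia</td></tr>
              <tr><td>cold</td><td>fredda</td></tr>
              <tr><td>complicated</td><td>complicata</td></tr>
              <tr><td>cooking</td><td>cucinare</td></tr>
              <tr><td>cougar</td><td>cugar</td></tr>
              <tr><td>couple</td><td>coppia</td></tr>
              <tr><td>crazy</td><td>pazza</td></tr>
              <tr><td>cunt</td><td>cagna</td></tr>
              <tr><td>dirty</td><td>sporca</td></tr>
              <tr><td>dishwasher</td><td>lavastoviglie</td></tr>
              <tr><td>driving</td><td>guida</td></tr>
              <tr><td>dumb</td><td>stupida</td></tr>
              <tr><td>equal rights</td><td>pari diritti</td></tr>
              <tr><td>escort</td><td>escort</td></tr>
              <tr><td>fat</td><td>grassa</td></tr>
              <tr><td>female</td><td>femmina</td></tr>
              <tr><td>feminism</td><td>femminismo</td></tr>
              <tr><td>feminist</td><td>femminista</td></tr>
              <tr><td>fuck</td><td>fottiti / scopare</td></tr>
              <tr><td>girl</td><td>ragazza</td></tr>
              <tr><td>girlfriend</td><td>fidanzata</td></tr>
              <tr><td>girl power</td><td>potere femminile</td></tr>
              <tr><td>girls</td><td>ragazze</td></tr>
              <tr><td>gold digger</td><td>arrampicatrice sociale</td></tr>
              <tr><td>harsch</td><td>dura / severa</td></tr>
              <tr><td>hooker</td><td>prostituta</td></tr>
              <tr><td>hore</td><td>puttana</td></tr>
              <tr><td>house</td><td>casa</td></tr>
              <tr><td>housewife</td><td>casalinga</td></tr>
              <tr><td>inferior</td><td>inferiore</td></tr>
              <tr><td>kitchen</td><td>cucina</td></tr>
              <tr><td>lazy</td><td>pigra</td></tr>
              <tr><td>marriage</td><td>matrimonio</td></tr>
              <tr><td>Mars &amp; Venus</td><td>Marte e Venere</td></tr>
              <tr><td>milf</td><td>milf</td></tr>
              <tr><td>misogynist</td><td>misogino</td></tr>
              <tr><td>misogyny</td><td>misoginia</td></tr>
              <tr><td>nazifeminist</td><td>nazifemminista</td></tr>
              <tr><td>pregnancy</td><td>gravidanza</td></tr>
              <tr><td>promiscuous</td><td>promiscua</td></tr>
              <tr><td>prostitute</td><td>prostituta</td></tr>
              <tr><td>rape</td><td>stupro</td></tr>
              <tr><td>sandwich</td><td>panino</td></tr>
              <tr><td>sex</td><td>sesso</td></tr>
              <tr><td>sexism</td><td>sessismo</td></tr>
              <tr><td>sexist</td><td>sessista</td></tr>
              <tr><td>slut</td><td>zoccola</td></tr>
              <tr><td>stupid</td><td>stupida</td></tr>
              <tr><td>tits</td><td>tette</td></tr>
              <tr><td>trixie</td><td>ragazza superficiale</td></tr>
              <tr><td>unstable</td><td>instabile</td></tr>
              <tr><td>wife</td><td>moglie</td></tr>
              <tr><td>witch</td><td>strega</td></tr>
              <tr><td>woman</td><td>donna</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Since the number of annotators varies for each meme, each meme has been finally labeled as misogynous if at least 50% of its annotators provided the misogynous label. Based on the crowd annotations, the resulting dataset consists of 58.82% misogynous and 41.17% non-misogynous memes. The annotation process led to full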
agreement for 43.14% of the memes. If we focus on each class, 37.97% of the misogynous memes and 50.45% of the not misogynous ones show a full agreement, denoting (as expected) a higher disagreement on misogynous content. To evaluate the overall level of agreement, we also computed Krippendorff's Alpha statistic [27], which yielded a score of 0.43. While the percentage of full agreement suggests some level of consistency, the Krippendorff's Alpha value indicates that a substantial portion of the agreement may be attributable to chance, highlighting an extremely subjective interpretation of what can be considered as misogynous. As for the specific categories of misogyny, the dataset includes 70.97% of misogynous memes labeled as objectification, 55.87% as stereotype, 30.47% as violence, and 22.47% as body shaming by at least one annotator. Also in this case the dataset is provided with soft labels denoting a probability distribution for each category derived through the crowd annotation process.</p>
      </sec>
      <sec id="sec-1-4">
        <title>4. Insights from MAMITA</title>
        <p>In this section, we present a twofold analysis of the MAMITA dataset. First, we investigate how sociodemographic and cognitive characteristics of human annotators (such as gender, age, and Subjective Social Status) influence the perception and labeling of misogynistic content. Then, we evaluate the performance of multimodal baseline models, specifically mCLIP and mBLIP, in detecting misogyny and disagreement in memes, providing a comparative perspective between human subjectivity and machine predictions.</p>
        <sec id="sec-1-4-1">
          <title>4.1. Human Perspectives</title>
          <p>To better understand how individual differences influence the perception of misogynistic content, we formulated three research questions.</p>
          <p><bold>[Q1] Does the perceived intensity of misogyny significantly differ between male and female annotators?</bold> The aim is to determine whether the observed differences in the perception of misogyny intensity between men and women are statistically significant or could be due to chance. To this purpose, the Welch t-test has been adopted, which does not assume the same variance between the two populations. In this specific case, the null hypothesis is that the two means of the perceived intensity are equal and that any observed difference in the data can be attributed to random error or natural sample variation, rather than to a real effect.</p>
          <p>The Welch t-statistic is -13.98, where the negative sign indicates the direction of the difference, since the mean for women is higher than that for men (5.07 vs. 4.29), and the absolute value indicates how large the difference is in units of its standard error, i.e., the larger the absolute value, the stronger the evidence against the null hypothesis. In this case, the p-value, i.e., the probability of observing a difference at least this large if the null hypothesis were true, is extremely low (2.14 × 10<sup>-43</sup>). The results therefore show a highly significant difference in the perception of intensity between men and women, one that is very unlikely to be due to chance.</p>
          <p><bold>[Q2] Do statistically significant differences exist among age groups in identifying misogynistic content?</bold> The core idea is to assess whether the probability of judging content as misogynistic depends on the annotator's age group. For this purpose, we computed both a Chi-Squared test and a Binary Logistic Regression: the former verifies whether a relationship exists, while the latter estimates how much each age group affects the likelihood of judging content as misogynistic. In our case, the p-value equal to 7.10 related to the Chi-Squared test denotes a statistically significant relationship between age and the misogyny judgment.</p>
          <p>As an additional observation, we report in Table 2 the results of the Binary Logistic Regression where the dependent variable (misogynous or not) is binary. The independent variables are the age categories, compared with the 18-24 age group as reference category. We can easily note that Age is significantly associated with the likelihood of labeling content as misogynistic: all age groups differ significantly from the baseline (18-24) (p-value &lt; 0.01). Moreover, the Odds Ratios increase with age, particularly from age 45 and up. This indicates an increased probability of labeling content as misogynous as age increases (compared to the 18-24 group).</p>
          <p><bold>[Q3] Does the Subjective Social Status have a significant relationship with the intensity of the perceived misogyny?</bold> To explore the relationship between individuals' perceived social standing and their sensitivity to misogynistic content, we computed the Spearman correlation between SSS and the perceived intensity of misogyny. In particular, for each annotator, we considered their self-reported SSS score obtained from the background questionnaire and calculated the average intensity of misogyny they assigned across all memes they annotated as misogynistic. This approach allowed us to assess whether annotators with differing self-reported social positions systematically varied in how strongly they perceived misogynistic content. Spearman's rank correlation was chosen due to its suitability for capturing monotonic relationships without assuming normality in the data distributions.</p>
          <p>The Spearman correlation analysis between the Subjective Social Status (SSS) and the perceived intensity of misogynistic content yielded a statistically significant positive correlation (ρ = 0.209, p = 0.0015). While the correlation is relatively weak, it indicates that annotators with a higher SSS are slightly more likely to assign a higher intensity of perceived misogyny. This finding highlights the influence of annotator-level socio-cognitive traits on subjective annotation tasks and suggests the importance of modeling annotator variability when addressing harmful or sensitive content.</p>
        </sec>
        <sec id="sec-1-4-2">
          <title>4.2. Multimodal Baseline Models</title>
          <p>To assess the effectiveness of multimodal models in identifying misogynistic content and disagreement between annotators, we fine-tune two state-of-the-art architectures: mCLIP<sup>3</sup> [28, 29] and mBLIP<sup>4</sup> [30]. These models leverage both visual and textual information from memes, enabling a comprehensive understanding of their content. Both the vision encoder and text decoder are trained jointly with a classification head, allowing the models to tailor their multimodal representations to the specific task of misogyny and disagreement detection on the MAMITA dataset. To provide a simple and consistent baseline for evaluation, we fine-tune both models by adding a linear classification layer on top of their original representations, without further architectural modifications<sup>5</sup>. The classifier is trained using binary cross-entropy loss and the Adam optimizer. To compare the baseline models, we measure Precision (P), Recall (R), and F-Measure (F1), distinguishing between the misogynous label (+) and the non-misogynous one (-) as well as the agreement label (+) vs the disagreement one (-). We adopt a 10-fold cross-validation approach to ensure robustness and generalizability of the evaluation.</p>
          <p><sup>3</sup> https://huggingface.co/sentence-transformers/clip-ViT-B-32-multilingual-v1</p>
          <p><sup>4</sup> https://huggingface.co/Gregor/mblip-mt0-xl</p>
          <p><sup>5</sup> To ensure reproducibility of our results, we report the main training parameters used: batch size = 4, classification threshold = 0.5, and number of training epochs = 5.</p>
          <p>The results reported in Table 3 highlight the performance of mCLIP and mBLIP in predicting misogynistic content, evaluated against both Crowd and Expert annotations. Overall, both models show good classification capabilities, with mBLIP consistently outperforming mCLIP across all metrics. In the Crowd setting, mBLIP achieves a higher average F1 score (0.79 vs. 0.70), demonstrating better balance between precision and recall for both misogynous and not misogynous labels. It is interesting to note that mBLIP's F1<sup>+</sup> (0.83) and F1<sup>-</sup> (0.76) suggest a strong ability to correctly identify both misogynistic and non-misogynistic content according to crowd judgments. Performance improves further when considering the Expert annotations. Both models exhibit higher F1 scores compared to the Crowd setting, with mBLIP again leading (Avg. F1 = 0.83 vs. 0.73 for mCLIP). This may indicate better alignment between the models' predictions and the expert labeling criteria, possibly due to more consistent or less ambiguous expert judgments. In both evaluation contexts, mBLIP proves to be the more robust of the two models, offering more reliable and accurate misogyny detection. These results suggest that state-of-the-art multimodal models, particularly mBLIP, can effectively capture harmful content signals when fine-tuned appropriately.</p>
          <table-wrap>
            <label>Table 4</label>
            <caption>
              <p>Disagreement prediction performance on Crowd and Expert labels. (*) denotes models calibrated using the Youden's J statistic.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Labels</th><th>Approach</th><th>P+</th><th>R+</th><th>F1+</th><th>P-</th><th>R-</th><th>F1-</th><th>Avg. F1</th></tr>
              </thead>
              <tbody>
                <tr><td>Crowd</td><td>mCLIP</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.57</td><td>1.00</td><td>0.72</td><td>0.36</td></tr>
                <tr><td>Crowd</td><td>mBLIP</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.57</td><td>1.00</td><td>0.72</td><td>0.36</td></tr>
                <tr><td>Crowd</td><td>mCLIP (*)</td><td>0.44</td><td>0.52</td><td>0.48</td><td>0.58</td><td>0.49</td><td>0.53</td><td>0.50</td></tr>
                <tr><td>Crowd</td><td>mBLIP (*)</td><td>0.42</td><td>0.69</td><td>0.53</td><td>0.55</td><td>0.28</td><td>0.37</td><td>0.45</td></tr>
                <tr><td>Expert</td><td>mCLIP</td><td>0.81</td><td>1.00</td><td>0.90</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.45</td></tr>
                <tr><td>Expert</td><td>mBLIP</td><td>0.81</td><td>1.00</td><td>0.90</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.45</td></tr>
                <tr><td>Expert</td><td>mCLIP (*)</td><td>0.82</td><td>0.37</td><td>0.51</td><td>0.19</td><td>0.66</td><td>0.30</td><td>0.40</td></tr>
                <tr><td>Expert</td><td>mBLIP (*)</td><td>0.83</td><td>0.34</td><td>0.49</td><td>0.19</td><td>0.68</td><td>0.30</td><td>0.39</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>Table 4 reports the performance of the considered baseline models in predicting disagreement on crowd and expert judgments, under two conditions: raw model outputs and outputs calibrated using the Youden's J statistic [31] to determine the best classification threshold on the probability distribution. When evaluating against the
Crowd labels, both mCLIP and mBLIP perform poorly, assigning all instances to the negative class. However, applying the Youden correction significantly improves performance, increasing the average F1 from 0.36 to 0.50 for mCLIP and 0.45 for mBLIP. In the Expert setting, uncalibrated models exhibit an inverse pattern: perfect recall and high precision for positive labels (F1 = 0.90), but they do not detect negative samples, again reflecting a strong prediction bias. The use of the Youden's threshold reduces such a bias (F1<sup>-</sup> = 0.30), at the cost of a much lower recall on the positive class. Overall, these results highlight a key challenge in using pretrained multimodal models for subtle content moderation tasks: while default thresholds may lead to heavily skewed predictions, simple calibration strategies can significantly rebalance model behavior, though not without trade-offs.</p>
          <p>Figure 2 panels: (a) Expert, (b) Crowd.</p>
          <p>We further analyzed the models' errors to better evaluate their performance, particularly considering the instances that were mislabeled by both classification models. A first analysis focuses on the evaluation of errors in misogyny identification with respect to the different types of misogyny. Figure 2 reports four violin plots corresponding to different misogyny categories, distinguishing between Expert and Crowd annotations. Each plot displays the distribution of a specific variable as a percentage<sup>6</sup> on the y-axis. The bright-colored regions represent the distributions within the whole dataset, while the darker-colored regions overlaid within each violin illustrate the distribution of the errors for each label. From the visual comparison, we can easily notice that:
            <list list-type="bullet">
              <list-item>
                <p>Stereotype and Objectification labels exhibit relatively symmetrical and balanced distributions with a moderate spread, indicating consistent distribution across a broad range of values. The error distributions for these labels are also centered, suggesting relatively low and uniform prediction errors.</p>
              </list-item>
              <list-item>
                <p>Shaming and Violence have sharp, narrow dataset and error distributions, denoting a lot of misogynous memes not belonging to those classes.</p>
              </list-item>
            </list>
          </p>
          <p><sup>6</sup> The percentage value has been computed with respect to the subset of data labeled as misogynous by the majority of annotators.</p>
          <p>By analyzing the shapes of the violin plots, we can notice that the violins dedicated to Shaming and Violence assume a shape that is broader at the base, denoting a significant portion of misogynous memes not labeled with those types. Considering all the misogyny types, we can notice that the Expert plot is consistent in shape with the Crowd one for all the types, indicating a general ability of the crowd annotators in recognizing all the misogyny types.</p>
          <p>Figure 3: Violin plots showing the distribution of annotator agreement (y-axis, percentage), distinguishing between class label (Misogyny vs. Not-Misogyny) and annotation source (Experts vs. Crowd). The lighter area in each violin represents the full dataset distribution, while the darker overlay indicates the distribution of model prediction errors.</p>
          <p>Subsequently, we evaluated the models' ability to detect misogynistic content with respect to disagreement between annotators. Figure 3 reports two violin plots of the agreement among annotators along with the prediction error distributions for misogyny classification, distinguishing between Expert and Crowd annotators. The y-axis represents annotator agreement as a percentage, with higher values indicating stronger consensus among annotators, both on the Misogynous and Non-Misogynous labels. Each violin, representing the Expert and the Crowd evaluation respectively, is divided into two layers: the lighter area represents the distribution of the full dataset, while the darker overlay highlights the distribution of the model's prediction errors. It is easy to notice that the Expert-dedicated violin assumes an hourglass shape, denoting a tendency for Experts to agree on both classes. The Crowd plot instead shows a more uniform distribution, denoting a greater variability in the disagreement between crowd annotators. In both cases, the error distribution appears to be consistent and unrelated to the disagreement distribution. These patterns indicate that model errors are not influenced by the degree of annotator agreement. As part of future work, we plan to conduct a more in-depth qualitative error analysis, with a specific focus on identifying the most challenging archetypes of controversial or ambiguous memes, following the approach proposed in [32], to better understand the limitations of current models and highlight open challenges in the detection of misogyny in Italian.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>5. Conclusions</title>
      <sec id="sec-2-1">
        <p>In this paper, we presented a novel Italian multimodal benchmark dataset designed to support the automatic detection of misogynistic memes in online social media. The dataset emphasizes diversity in content and labeling perspectives, offering a comprehensive view of how misogyny is manifested and perceived across different annotator groups. The proposed benchmark, collected using a variety of popular platforms and focusing on a wide spectrum of misogynistic expressions, ensures a broad coverage of the phenomenon. Moreover, the dual annotation strategy, which includes both domain experts and crowd annotators, provides an opportunity to investigate the discrepancies in perceiving contents, therefore improving the robustness of future automatic detection systems that account for perspectivism.</p>
        <p><bold>Acknowledgments</bold></p>
        <p>We acknowledge the support of the PNRR ICSC National Research Centre for High Performance Computing, Big Data and Quantum Computing (CN00000013), under the NRRP MUR program funded by the NextGenerationEU. This work has also been supported by ReGAInS, Department of Excellence. The authors would also like to thank the master's students Annalisa Bachir, Gökalp Recep Boz, Gaia Campisi, Marco Cervelli, Lisa Cocchia, Francesca Frigerio, Rosa Gotti, Monica Mantovani, Matteo Parisi, and Emma Salvadori for their significant contributions: their dedicated efforts were fundamental to the development and compilation of the MAMITA dataset.</p>
        <p><bold>References</bold></p>
        <p>[1] C. Bosco, E. Ježek, M. Polignano, M. Sanguinetti, Preface to the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), in: Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), 2025, pp. –.</p>
        <p>[2] The Council of Europe, 6th general report on GREVIO's activities: Group of experts on action against violence against women and domestic violence,</p>
        <p>During the preparation of this work, the author(s) used ChatGPT (OpenAI) and Grammarly in order to: Paraphrase and reword and Grammar and spelling check. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          2024. URL: https://rm.coe.
          <source>int/6th- general- rep volume 2263</source>
          ,
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>ort-on-grevio-s-activities/1680b5cbe8</article-title>
          . [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , et al.,
          <source>Ami</source>
          <volume>@</volume>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tiropanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bokhove</surname>
          </string-name>
          ,
          <article-title>The problem evalita2020: Automatic misogyny identification</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>of identifying misogynist language on twitter (and Proceedings of the 7th evaluation campaign of Natu-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>8th ACM Conference on Web Science</source>
          ,
          <year>2016</year>
          , pp.
          <source>(EVALITA</source>
          <year>2020</year>
          ),
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          333-
          <fpage>335</fpage>
          . [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cascione</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cerulli</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. M. Manerba</surname>
            , L. Passaro, [4]
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Fersini</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Nozza</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>F. M.</given-names>
          </string-name>
          <article-title>Women's professions and targeted misogyny online,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Rangel</given-names>
            <surname>Pardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , M. Sanguinetti, SemEval- in
          <source>: Proceedings of the 10th Italian Conference on</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>2019 task 5: Multilingual detection of hate speech Computational Linguistics (CLiC-it</article-title>
          <year>2024</year>
          ),
          <year>2024</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>against immigrants and women in Twitter</article-title>
          , in:
          <fpage>182</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>May</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shutova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbelot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , M. Apidi- [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Toraman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>13th International Workshop on Semantic Evalu- C. Zapparoli, PejorativITy: Disambiguating pe-</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Minneapolis</surname>
          </string-name>
          , Minnesota, USA,
          <year>2019</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          . Italian tweets, in: N.
          <string-name>
            <surname>Calzolari</surname>
            , M.-
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Hoste</surname>
            , [5]
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Zeinert</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Inie</surname>
            , L. Derczynski, Annotating on- A. Lenci,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Sakti</surname>
          </string-name>
          , N. Xue (Eds.),
          <source>Proceedings of the</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <article-title>line misogyny</article-title>
          , in: C.
          <string-name>
            <surname>Zong</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Navigli</surname>
          </string-name>
          2024 Joint International Conference on Computa-
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          (Eds.),
          <source>Proceedings of the 59th Annual Meeting of tional Linguistics</source>
          , Language Resources and Evalua-
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <article-title>the Association for Computational Linguistics and tion (LREC-COLING 2024), ELRA</article-title>
          and
          <string-name>
            <surname>ICCL</surname>
          </string-name>
          , Torino,
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>the 11th International Joint Conference on Natu- Italia</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>12700</fpage>
          -
          <lpage>12711</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>ral Language Processing</surname>
          </string-name>
          (Volume
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          , [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Firooz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <year>2021</year>
          , pp.
          <fpage>3181</fpage>
          -
          <lpage>3197</lpage>
          . lenge:
          <article-title>Detecting hate speech in multimodal memes</article-title>
          , [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sheppard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Richter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohen</surname>
          </string-name>
          , E. Smith, in: H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
          </string-name>
          , M. Bal-
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <article-title>An expert-annotated dataset for subtle misogyny Processing Systems</article-title>
          , volume
          <volume>33</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>detection and mitigation</article-title>
          , in: L.
          <string-name>
            <surname>-W. Ku</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Martins</surname>
          </string-name>
          , Inc.,
          <year>2020</year>
          , pp.
          <fpage>2611</fpage>
          -
          <lpage>2624</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Srikumar</surname>
          </string-name>
          (Eds.), Findings of the Association for [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramamoorthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gunti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          , S. Suryavar-
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Computational</surname>
            <given-names>Linguistics: ACL</given-names>
          </string-name>
          <year>2024</year>
          ,
          <article-title>Association dan, A</article-title>
          . Reganti,
          <string-name>
            <given-names>P.</given-names>
            <surname>Patwa</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. DaS</surname>
          </string-name>
          , T. Chakraborty,
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>for Computational</surname>
            <given-names>Linguistics</given-names>
          </string-name>
          , Bangkok, Thailand,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekbal</surname>
          </string-name>
          , et al.,
          <source>Memotion</source>
          <volume>2</volume>
          : Dataset on
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <year>2024</year>
          , pp.
          <fpage>427</fpage>
          -
          <lpage>452</lpage>
          .
          <article-title>sentiment and emotion analysis of memes</article-title>
          , in: Pro[7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Almanea</surname>
          </string-name>
          , M. Poesio, ArMIS - the
          <string-name>
            <surname>Arabic</surname>
          </string-name>
          misog- ceedings of De-Factify: workshop on multimodal
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          disagreements, in: N.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Béchet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Blache</surname>
          </string-name>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goggi</surname>
          </string-name>
          , H. Isa- [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , T. Chakraborty,
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>European Language Resources Association</surname>
          </string-name>
          , Mar- Computational
          <source>Linguistics: NAACL</source>
          <year>2022</year>
          , Associa-
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>seille</surname>
          </string-name>
          , France,
          <year>2022</year>
          , pp.
          <fpage>2282</fpage>
          -
          <lpage>2291</lpage>
          . tion for Computational Linguistics, Seattle, United [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rodríguez-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          , States,
          <year>2022</year>
          , pp.
          <fpage>1572</fpage>
          -
          <lpage>1588</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mendieta-Aragón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Marco-Remón</surname>
          </string-name>
          , [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Suryawanshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          , M. Arcan,
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , Overview of exist 2022:
          <article-title>sexism iden- for identifying ofensive content in image and text,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <source>Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          )
          <fpage>229</fpage>
          -
          <lpage>240</lpage>
          . S. Malmasi,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murdock</surname>
          </string-name>
          , D. Kadar (Eds.), Proceed[9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          , R. Morante, ings of the Second Workshop on Trolling, Aggres-
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <article-title>of exist 2023: sexism identification in social net- sources Association (ELRA), Marseille</article-title>
          , France,
          <year>2020</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          works, in: European Conference on Information pp.
          <fpage>32</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Retrieval</surname>
          </string-name>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>593</fpage>
          -
          <lpage>599</lpage>
          . [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mandal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chadha</surname>
          </string-name>
          , S. Saha, [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , et al.,
          <string-name>
            <surname>Overview</surname>
          </string-name>
          of P. Bhattacharyya,
          <article-title>MemeGuard: An LLM and VLM-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <article-title>the evalita 2018 task on automatic misogyny iden- based framework for advancing content moderation</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Srikumar</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 62nd An- J</source>
          . Clark,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krueger</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sutskever</surname>
          </string-name>
          , Learning transfer-
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <surname>Linguistics</surname>
          </string-name>
          (Volume
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <article-title>Association sion</article-title>
          ,
          <source>in: Proc. of the 38th International Conference</source>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>for Computational</surname>
            <given-names>Linguistics</given-names>
          </string-name>
          , Bangkok, Thailand,
          <source>on Machine Learning (ICML)</source>
          , volume
          <volume>139</volume>
          <source>of Proc. of</source>
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <year>2024</year>
          , pp.
          <fpage>8084</fpage>
          -
          <lpage>8104</lpage>
          . Machine Learning Research, PMLR,
          <year>2021</year>
          , pp.
          <fpage>8748</fpage>
          -
          <lpage>[</lpage>
          19]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saibene</surname>
          </string-name>
          , 8763. URL: https://proceedings.mlr.press/v139/rad
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sorensen</surname>
          </string-name>
          , Semeval- ford21a.html.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <article-title>2022 task 5: Multimedia automatic misogyny iden-</article-title>
          [29]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          , Sentence-bert: Sentence
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <article-title>tification, in: Proceedings of the 16th International embeddings using siamese bert-networks</article-title>
          , in: Pro-
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <string-name>
            <surname>Workshop on Semantic</surname>
          </string-name>
          <article-title>Evaluation (SemEval-2022), ceedings of the 2019 Conference on Empirical Meth-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <year>2022</year>
          , pp.
          <fpage>533</fpage>
          -
          <lpage>549</lpage>
          . ods in
          <source>Natural Language Processing</source>
          , Association [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. K.</given-names>
            <surname>Singh</surname>
          </string-name>
          , Mimic: misog- for
          <source>Computational Linguistics</source>
          ,
          <year>2019</year>
          . URL: http:
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <article-title>yny identification in multimodal internet content //arxiv</article-title>
          .org/abs/
          <year>1908</year>
          .10084.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          <article-title>in hindi-english code-mixed language</article-title>
          , ACM Trans- [30]
          <string-name>
            <given-names>G.</given-names>
            <surname>Geigle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Timofte</surname>
          </string-name>
          , G. Glavaš, mblip:
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <string-name>
            <surname>formation Processing</surname>
          </string-name>
          (
          <year>2024</year>
          ).
          <source>in: Proceedings of the 3rd Workshop on Advances</source>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Maeso,
          <source>in Language and Vision Research (ALVR)</source>
          ,
          <year>2024</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          , J. Gonzalo,
          <volume>7</volume>
          -
          <fpage>25</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Morante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <article-title>Overview of EXIST 2024 - learning with disagreement for sexism identification and characterization in tweets and memes</article-title>
          , in: Inter-
          <year>2024</year>
          , pp.
          <fpage>93</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Youden</surname>
          </string-name>
          ,
          <article-title>Index for rating diagnostic tests</article-title>
          ,
          <source>Cancer</source>
          <volume>3</volume>
          (
          <year>1950</year>
          )
          <fpage>32</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saibene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          , Management 60 (
          <year>2023</year>
          )
          <fpage>103474</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Arcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          2025:
          <article-title>Learning with disagreement for sexism iden-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          <source>Information Retrieval</source>
          , Springer,
          <year>2025</year>
          , pp.
          <fpage>442</fpage>
          -
          <lpage>449</lpage>
          . [23]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajiakodi</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          <source>detection: DravidianLangTech@NAACL</source>
          <year>2025</year>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          guages
          ,
          <year>2025</year>
          , pp.
          <fpage>721</fpage>
          -
          <lpage>731</lpage>
          . [24]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Fleiss</surname>
          </string-name>
          , Measuring nominal scale agreement
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          <article-title>among many raters</article-title>
          ,
          <source>Psychological Bulletin</source>
          <volume>76</volume>
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          (
          <year>1971</year>
          )
          <fpage>378</fpage>
          . [25]
          <string-name>
            <given-names>V.</given-names>
            <surname>Tarantino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Passerello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ben-Sasson</surname>
          </string-name>
          , T. Y.
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          naire,
          <source>PLoS ONE</source>
          <volume>19</volume>
          (
          <year>2024</year>
          )
          <elocation-id>e0309030</elocation-id>
          . [26]
          <string-name>
            <given-names>N. E.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Boyce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Chesney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          <source>American Psychologist</source>
          <volume>49</volume>
          (
          <year>1994</year>
          )
          <fpage>15</fpage>
          . [27]
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Krippendorff</surname>
          </string-name>
          , Answering the call
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          <source>Communication Methods and Measures</source>
          <volume>1</volume>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          <fpage>77</fpage>
          -
          <lpage>89</lpage>
          . [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hallacy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>