<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>From Explanation to Detection: Multimodal Insights into Disagreement in Misogynous Memes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giulia Rizzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elisabetta Fersini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Politècnica de València</institution>
          ,
          <addr-line>Valencia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Milano-Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Warning: This paper contains examples of language and images that may be offensive. This paper presents a probabilistic approach to identifying the disagreement-related elements in misogynistic memes by considering both modalities that compose a meme (i.e., the visual and textual sources). Several methodologies to exploit such elements in the identification of disagreement among annotators have been investigated and evaluated on the Multimedia Automatic Misogyny Identification (MAMI) [1] dataset. The proposed unsupervised approach reaches performance comparable to, and in some cases better than, state-of-the-art approaches, but with a reduced number of parameters to be estimated. The source code of our approaches is publicly available†.</p>
      </abstract>
      <kwd-group>
        <kwd>Disagreement</kwd>
        <kwd>Perspectivism</kwd>
        <kwd>Multimodal</kwd>
        <kwd>Misogyny</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Hate detection has been a serious concern in recent years, penetrating internet platforms and causing harm to individuals across various communities. Users have found in the online environment new modes of representation to express various types of hatred, including deeply rooted ideologies and beliefs with historical origins, for example towards women [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <p>Detecting abusive language has become an increasingly important task. The challenges introduced by the new modes of representation, which require a multimodal analysis, are further compounded by the subjectivity of the task, which derives from the fact that individuals’ perceptions of what characterizes a hateful message vary widely. Such diversity is reflected in the labeling phase in the form of disagreement among annotators. Identifying the elements within a sample that can lead to disagreement is of paramount importance for several reasons: for content that can lead to disagreement, specific annotation policies might be introduced, and the number of annotators might be enlarged to capture multiple perspectives [3, 4, 5].</p>
      <p>In this work, we propose a methodology to identify the disagreement-related elements in multimodal samples by exploring both the visual and the textual elements in the Multimedia Automatic Misogyny Identification (MAMI) dataset [<xref ref-type="bibr" rid="ref1">1</xref>]. Moreover, four different strategies to exploit the presence of such elements in the identification of disagreement are investigated.</p>
    </sec>
    <sec id="sec-1-1">
      <title>2. Related Works</title>
      <p>Many natural language tasks, such as hate speech detection, humor detection, and sentiment analysis, involve subjectivity, since they require an interpretation based on human judgment, cultural context, or personal opinion [6]. This phenomenon is reflected in datasets through multiple labels from different annotators or via the inclusion of a confidence level attached to the ground-truth labels. Labels derived from different interpretations are therefore able to capture multiple perspectives and understandings [6].</p>
      <p>Information about annotators’ disagreement has primarily been exploited as a means to improve data quality by excluding controversial instances [7, 8]. Alternatively, aiming at improving model performance, different strategies have been developed to exploit disagreement information in the training phase. For instance, in [9], the authors assign weights to instances to prioritize the ones with higher confidence levels. Another commonly adopted strategy [6, 10] aims at directly learning from disagreement without considering any aggregated label. While a considerable amount of research has been conducted to understand the reasons behind annotators’ disagreement [11, 12, 8] and to leverage disagreement when training classification models, a taxonomy of the reasons behind disagreement has been proposed by [12]. Such taxonomy articulates four macro categories of reasons behind disagreement: sloppy annotations, ambiguity, missing information, and subjectivity. Moreover, the authors evaluate the impact of the different types on classification performance.</p>
      <p>Only recently have works focused on the task of explaining disagreement [20, 21, 22, 23]. In [21], the authors propose exploratory text visualization techniques as a method for analyzing different perspectives in annotated data. In [22], the authors identify the textual constituents that contribute to explaining a hateful message by exploiting integrated gradients within a filtering strategy. A more recent work [23] proposes a probabilistic semantic approach for the identification of disagreement-related constituents (e.g., textual elements) in hateful content. Overall, the findings indicate that, while LLMs can yield promising results, comparable outcomes can be attained with less complex strategies and fewer computational resources. While previous research has concentrated on the analysis of textual disagreement, this study represents, to the best of our knowledge, a first insight into the explanation of multimodal disagreement.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Proposed Approach</title>
      <sec id="sec-2-1">
        <title>3.1. Identification of Disagreement-Related Elements</title>
        <p>The first phase of the proposed approach aims to evaluate the relationship between the elements (both visual and textual) that compose a meme and annotators’ disagreement. Preliminary preprocessing operations have been performed before identifying disagreement-related elements. Concerning the textual component, preprocessing operations (i.e., tokenization, lemmatization, lowercasing, and stop-word removal) have been performed to identify a valid set of tokens<sup>1</sup> that might be related to disagreement. Concerning the image component, the set of 14 human-readable concepts (tags) identified by [<xref ref-type="bibr" rid="ref3">24</xref>] to capture specific characteristics of misogynous content has been adopted. As proposed by the authors, tags were extracted via the Clarifai API [<xref ref-type="bibr" rid="ref4">25</xref>]. These preprocessing steps allowed us to extract a list of visual and textual elements from each meme in the dataset.</p>
        <p>In order to measure the relationship between each element in the memes and the disagreement among annotators, we have revised and extended to the multimodal environment the methodology proposed in [23], in order to consider not only textual elements but also visual ones. In particular, [23] introduces a methodology to identify disagreement-related constituents that, however, is limited to textual content. The approach includes a strategy to identify disagreement-related textual constituents and an approach for generalization towards unseen textual constituents. Both methods have been extended to a multimodal scenario in order to identify disagreement-related elements in both the textual and the visual sources that compose a meme. Given an element e, a corresponding Element Disagreement Score EDS(e) has been computed according to the following equation:</p>
        <p>EDS(e) = P(a | e) − P(¬a | e)   (1)</p>
        <p>where P(a | e) represents the conditional probability that there is agreement on a meme given that the meme contains the element e. Analogously, P(¬a | e) denotes the conditional probability that there is no agreement on a meme given that the meme contains the element e. Given that the EDS represents a difference between two complementary probabilities, it is bounded within the range of −1 to +1. A higher positive score indicates stronger agreement between annotators, whereas a lower negative score suggests disagreement.</p>
        <p>The score can be estimated on the training data and exploited to identify additional disagreement-related elements on unseen memes.</p>
        <p><sup>1</sup>To guarantee a more robust evaluation, tokens that appear less than 10 times in the dataset have been removed.</p>
      </sec>
      <sec id="sec-2-1-1">
        <title>3.2. Disagreement identification</title>
        <p>Once the Element Disagreement Scores have been estimated for each visual and textual element in the training dataset, they can be exploited to qualify the level of disagreement on unseen samples. Analogously to what was carried out in [23], different aggregation strategies have been investigated, relying on the hypothesis that the identified elements can be exploited for identifying the disagreement thanks to their different distribution in samples with and without an agreement.</p>
        <p>For each meme in the test set, the corresponding list of elements and the corresponding Element Disagreement Scores estimated on the training data have been extracted. In particular, for each meme, the textual and visual elements have been identified and paired with the corresponding score when available. The Multimodal Disagreement Score (MDS) has been estimated according to the following strategies: Sum, Mean, Median, and Minimum. A threshold γ has been estimated according to a grid-search approach for each strategy.</p>
        <p>A qualitative evaluation, comprising a comparison with specific misogynistic terminology and an evaluation of the keywords included in the dataset creation phase, has been performed to assess the quality of the EDS, while both the F1-scores for the two considered classes (agreement (+) and disagreement (−)) and a global F1-score have been computed to validate the MDS.</p>
      </sec>
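<p>As a minimal illustration of Equation (1) and of the aggregation strategies of Section 3.2, the following sketch estimates EDS values from (elements, agreement) pairs and aggregates them into an MDS. The input format and function names are hypothetical and not the authors' released code.</p>

```python
from collections import Counter

def element_disagreement_scores(memes):
    """Estimate EDS(e) = P(a|e) - P(not a|e) for every element (token or
    visual tag) observed in the training memes.

    `memes` is a list of (elements, agree) pairs: `elements` is the set of
    textual tokens and visual tags extracted from a meme, and `agree` is
    True when all annotators assigned the same label (the boolean
    disagreement label described in Section 4).
    """
    agree, total = Counter(), Counter()
    for elements, is_agree in memes:
        for e in set(elements):
            total[e] += 1
            agree[e] += int(is_agree)
    # P(a|e) and P(not a|e) are complementary, so EDS(e) = 2 * P(a|e) - 1
    return {e: 2.0 * agree[e] / n - 1.0 for e, n in total.items()}

def multimodal_disagreement_score(elements, eds, strategy="mean"):
    """Aggregate the EDS of a meme's elements into a single MDS.
    Elements unseen at training time are skipped in this basic variant."""
    scores = sorted(eds[e] for e in elements if e in eds)
    if not scores:
        return 0.0
    if strategy == "sum":
        return sum(scores)
    if strategy == "mean":
        return sum(scores) / len(scores)
    if strategy == "median":
        return scores[len(scores) // 2]
    return scores[0]  # "minimum"
```

<p>A meme would then be predicted as controversial when its MDS falls on the disagreement side of the grid-search threshold described above.</p>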
      <sec id="sec-2-2">
        <title>3.3. Generalization towards unseen elements</title>
        <p>The score estimation is strongly based on what is observed in the training data, resulting in a lack of scores for any element that does not appear in the training samples. This is particularly relevant for the textual components rather than the visual ones. In fact, while we can assume an open-world vocabulary for the textual source (where a few terms in unseen data may not appear in the training set), we limited the visual tags to a closed-world setting (only 14 tags can be considered in both training and unseen memes). Since we need to generalize only over unseen textual constituents, for each (unseen) textual element ê an approximated EDS score has been computed as follows:</p>
        <p>• Embeddings of the training lexicon: the contextualized embedding representation of each textual element e has been obtained via mBERT [<xref ref-type="bibr" rid="ref5">26</xref>]. An average embedding vector representation x̄(e) is computed to jointly represent the multiple embedding representations of e derived from the different contexts where it occurs. In particular, given an element e and n sentences containing it, its vector representation is obtained by a simple average x̄(e) = (1/n) Σ_{i=1}^{n} v_i, where v_i is the contextualized embedding vector related to the i-th occurrence of e, obtained through mBERT.</p>
        <p>• Embeddings of unseen terms: given an unseen textual element ê within a given sentence, its contextualized embedding representation v(ê) has been computed via mBERT [<xref ref-type="bibr" rid="ref5">26</xref>].</p>
        <p>• Most similar constituents: given an unseen textual element ê with the corresponding embedding v(ê) and the average embedding x̄(e) of a training element e, the set S of most similar constituents to ê is determined according to:</p>
        <p>S = { e | sim(x̄(e), v(ê)) ≤ α }   (2)</p>
        <p>where sim(x̄(e), v(ê)) is the cosine similarity between the average contextualized embedding representation of element e and that of ê, and α is a grid-search estimated threshold.</p>
        <p>• Unseen terms score: the EDS score for an unseen textual element ê is computed as the weighted average of the EDS of the most similar constituents of the training lexicon:</p>
        <p>EDS(ê) = Σ_{e ∈ S} [ sim(e, ê) · EDS(e) ] / Σ_{e ∈ S} sim(e, ê)   (3)</p>
        <p>• Multimodal Disagreement Score with unseen constituents: all the above-proposed strategies for MDS estimation have been extended to also include elements that do not belong to the training lexicon and for which the EDS score has been estimated. In particular, given a multimodal sample, the aggregation functions presented in Section 3.2 will in this case consider the EDS values of both the seen elements (by considering EDS(e)) and the unseen ones (by considering EDS(ê)). Such generalized aggregation functions will later be referred to through the prefix G-.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Results</title>
      <p>The proposed approach has been evaluated on the Multimedia Automatic Misogyny Identification (MAMI) dataset [<xref ref-type="bibr" rid="ref1">1</xref>], consisting of 10,000 memes for training and 1,000 memes for testing<sup>2</sup>. The dataset comprises a range of memes that exemplify various forms of misogyny, including shaming, stereotyping, objectification, and violence. Each meme has been labeled by three crowdsourced annotators for misogynistic content<sup>3</sup>, with an estimated Fleiss-K [<xref ref-type="bibr" rid="ref6">27</xref>] coefficient equal to 0.5767. In particular, the proposed approach has been adopted to estimate an Element Disagreement Score (EDS) for each element and, consequently, an MDS for each meme in the dataset.</p>
      <p><sup>2</sup>Although both a training and a test dataset are provided, only the training dataset is adopted, as the proposed work is focused on the analysis and prediction of disagreement and the test dataset is constructed to include only samples with complete agreement. The training dataset, instead, is characterized by 65% of data with complete agreement. Therefore, it has been divided in order to isolate 90% for token estimation and the remaining 10% for the evaluation.</p>
      <p><sup>3</sup>Additionally, a boolean disagreement label has been derived to represent complete agreement among annotators. In particular, this last label is set to 1 if all the annotators have indicated the same label, and to 0 otherwise.</p>
      <p>Table 1: Terms with the highest positive and lowest negative scores.</p>
      <p>Table 1 reports the top-10 highest positive and highest negative disagreement scores derived for the textual component. We can notice how terms that are rarely linked with misogynistic messages (e.g., flu) and terms commonly used to address women in a harmful way (e.g., whale), also exploiting stereotypes (e.g., gamer and programmer), achieve a high positive score, indicating a strong relation with the agreement. Additionally, some personal names of famous people (i.e., Bernie and Miley) appear within the ranking. In particular, such names might appear in memes as the target of a hateful message, referring to their personal life, physical appearance, or specific events that involved them. As a consequence, depending on the reasons that lead to such criticism (gender, physical appearance, and personal choices for Miley Cyrus vs. political stance and career, without the same gendered connotations, for Bernie Sanders), there might be disagreement about misogyny.</p>
      <p>The concept of the "sexual marketplace" is often the subject of debate, particularly in relation to its intersection with misogynistic ideologies [<xref ref-type="bibr" rid="ref7 ref8">28, 29</xref>]. Some supporters, often aligned with "manosphere" or "red pill" ideologies, argue that the sexual marketplace disproportionately empowers women, giving them more control over sexual selection and relationships, which can disadvantage men. On the other hand, critics assert that this perspective reduces human relationships to transactional exchanges and objectifies both genders, ultimately reinforcing misogynistic attitudes. This last viewpoint asserts that framing relationships in market terms devalues emotional connection and perpetuates harmful stereotypes about women's worth being tied solely to their sexual desirability. The achieved results suggest the ability of the approach to detect such variety of interpretations and reflect it within the EDS scores.</p>
      <p>Table 2 reports the top-5 highest positive and highest negative disagreement scores derived for the visual component. It is easy to notice how all the scores are positive and achieve small values, denoting a tendency of such tags to be weakly related to the agreement label.</p>
      <p>Figure 1 reports an example of a meme with disagreement, along with the visual representation of the EDS of its textual and visual elements. Moreover, as highlighted with a grey bar, some of the reported scores have been estimated: such scores correspond to constituents that are not present in the training dataset and for which it was not possible to calculate the EDS score. The visual representation of the scores related to such elements corresponds to the score obtained through the estimation strategy. Overall, it is easy to notice the presence of elements strongly related to disagreement (i.e., sexual and market), highlighted in pink.</p>
      <p>Figure 2 reports two memes that share the same text but a different image. Despite such commonalities, the memes have been labeled differently: while the first meme has been labeled as misogynous by 2 annotators out of 3, the second one has been unanimously labeled as non-misogynous. Since such memes share a common textual representation, the derived textual elements and the textual EDS are also equal, resulting in an indistinguishable representation that is ineffective for disagreement identification. Moreover, although the memes differ in their visual content, resulting in different tags and, therefore, different visual EDS, as previously mentioned, such a component alone is not sufficient for disagreement prediction. The findings demonstrate the necessity of jointly considering both modalities.</p>
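<p>The generalization step of Section 3.3 (Equations 2 and 3) can be sketched as follows. In the paper the embeddings come from mBERT; here they are plain NumPy vectors, the parameter name delta stands in for the grid-search threshold, and the sketch assumes the intuitive reading of Eq. 2 that a neighbour must be at least delta-similar to be retained.</p>

```python
import numpy as np

def approximate_eds(unseen_vec, train_vecs, train_eds, delta=0.5):
    """Approximate the EDS of an unseen term: select the training elements
    whose average embedding is close enough to the unseen term's
    contextualized embedding (Eq. 2), then return the similarity-weighted
    average of their EDS values (Eq. 3).

    `train_vecs` maps element -> average embedding, `train_eds` maps
    element -> EDS estimated on the training data.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    sims = {e: cosine(v, unseen_vec) for e, v in train_vecs.items()}
    # Eq. 2: set S of most similar constituents
    similar = [e for e, s in sims.items() if s >= delta]
    if not similar:
        return None  # no neighbour is close enough: leave the term unscored
    # Eq. 3: similarity-weighted average of the neighbours' EDS
    return sum(sims[e] * train_eds[e] for e in similar) / sum(
        sims[e] for e in similar)
```

<p>The G-prefixed aggregation functions would then pool these approximated scores together with the EDS of the seen elements before applying the Sum, Mean, Median, or Minimum strategy.</p>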
      <sec id="sec-3-2">
        <title>5 instead summarises results achieved by the aggregation</title>
        <p>of the scores derived from all the elements (i.e., terms
and tags). Results achieved on the textual component
only highlight G-Mean as the most performing approach.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Overall, the estimation strategy results in an improvement of performances up to 6%, confirming the ability</title>
        <p>
          tionships for unseen terms. Furthermore, BERT [
          <xref ref-type="bibr" rid="ref9">30</xref>
          ]4 has
of the proposed strategy to capture disagreement rela- the best approach in terms of F1-score, and underline
represents the best approach according to the disagreement label.
been reported as a state-of-the-art baseline for unimodal  and  represent the best hyperparameters estimated via a
textual classification. Achieved results show how BERT
performs better on the majority class, struggling in
predicting the disagreement class. The proposed approach,
instead leads to performance more balanced among the
two classes.
provement in performances that, however, remain quite
visual component only. However, while the Sum ap- poor, highlighting the dificulty of the task. The
incluproach (i.e., the most performing approach among the
tagsion of the unseen constituents estimation leads to an
based) demonstrates satisfactory performance in iden- improvement of performance (except for the sum-based
tifying positive instances (achieving an F1+ of 0.69), it
exhibits considerable dificulty in accurately identifying
negative instances.
        </p>
        <p>Finally, Table 5 reports the performances of the
different approaches for disagreement identification jointly
considering both modalities. Furthermore, for a better
comparison of the performance achieved by the proposed
4BERT has been implemented and finetuned using the hugging-face
method) up to 8% for the mean-based approach.
However, the best performances are achieved by the minimum
and G-minimum approaches, for which the estimation
methodology is not efective. Such behavior may be
attributed to the imbalance in the dataset. The larger the
number of samples with agreement, the greater the
num5CLIP has been implemented and finetuned using the huggingface
framework with default hyperparameters. In particular, we used
framework with default hyperparameters. We adopted
"bert-basethe version available at https://huggingface.co/openai/clip-vit-l
cased" available at https://huggingface.co/google-bert/bert-base-c
arge-patch14 to which we concatenated a linear layer for binary
ased.</p>
        <p>classification.
0.34
0.48
0.49
0.49
0.52
0.45
0.40
0.40</p>
      </sec>
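<p>The evaluation described above relies on per-class F1 for the agreement (+) and disagreement (−) classes plus a global score. A minimal sketch, assuming the global F1 is the macro average of the two per-class scores:</p>

```python
def f1_per_class(gold, pred, positive):
    """Binary F1 treating `positive` as the positive class.
    `gold`/`pred` are lists of booleans (True = complete agreement)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def evaluate(gold, pred):
    """Return F1+ (agreement), F1- (disagreement) and their macro average,
    used here as the 'global' F1 (an assumption of this sketch)."""
    f1_pos = f1_per_class(gold, pred, True)
    f1_neg = f1_per_class(gold, pred, False)
    return f1_pos, f1_neg, (f1_pos + f1_neg) / 2
```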
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion and Future Works</title>
      <p>This paper proposes a probabilistic approach to identify disagreement-related elements in multimodal content. The proposed approach allows for the identification of elements that can be used as a proxy to identify samples that might be perceived differently by the annotators and that, therefore, could lead to disagreement. The achieved results highlight the difficulty of the task, denoting the need for more advanced approaches. Future work will include different strategies for image analysis in order to provide a better description of the image itself and of all the elements that compose it. Furthermore, a study of compositionality might be carried out to better represent the relationships among such elements inside the meme. The sense of a meme is often derived from the meanings of its individual parts (i.e., the image and the text) and the way they are combined. By analyzing how different elements interact and contribute to the overall message, it is possible to gain a deeper understanding of how the meaning is represented within the different modalities. This will help in identifying complex patterns and improve the accuracy of classification models.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We acknowledge the support of the PNRR ICSC National Research Centre for High Performance Computing, Big Data and Quantum Computing (CN00000013), under the NRRP MUR program funded by the NextGenerationEU. The work of Paolo Rosso was carried out in the framework of the FairTransNLP-Stereotypes research project (PID2021-124361OB-C31) funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU A way of making Europe.</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[3] P. Kralj Novak, T. Scantamburlo, A. Pelicon, M. Cinelli, I. Mozetič, F. Zollo, Handling disagreement in hate speech modelling, in: International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Springer, 2022, pp. 681–695.</p>
      <p>[4] C. van Son, T. Caselli, A. Fokkens, I. Maks, R. Morante, L. Aroyo, P. Vossen, GRaSP: A multi-layered annotation scheme for perspectives, in: N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), European Language Resources Association (ELRA), Portorož, Slovenia, 2016, pp. 1177–1184. URL: https://aclanthology.org/L16-1187.</p>
      <p>[5] S. Frenda, G. Abercrombie, V. Basile, A. Pedrani, R. Panizzon, A. T. Cignarella, C. Marco, D. Bernardi, Perspectivist approaches to natural language processing: a survey, Language Resources and Evaluation (2024) 1–28.</p>
      <p>[6] A. Uma, T. Fornaciari, D. Hovy, S. Paun, B. Plank, M. Poesio, Learning from disagreement: A survey, Journal of Artificial Intelligence Research 72 (2021) 1385–1470.</p>
      <p>[7] B. Beigman Klebanov, E. Beigman, From annotator agreement to noise models, Computational Linguistics 35 (2009) 495–503.</p>
      <p>[8] Y. Sang, J. Stanton, The origin and value of disagreement among data labelers: A case study of individual differences in hate speech annotation, in: Information for a Better World: Shaping the Global Future: 17th International Conference, iConference 2022, Virtual Event, February 28–March 4, 2022, Proceedings, Part I, Springer, 2022, pp. 425–444.</p>
      <p>[9] A. Dumitrache, F. Mediagroep, L. Aroyo, C. Welty, A crowdsourced frame disambiguation corpus with ambiguity, in: Proceedings of NAACL-HLT, 2019, pp. 2164–2170.</p>
      <p>[10] T. Fornaciari, A. Uma, S. Paun, B. Plank, D. Hovy, M. Poesio, et al., Beyond black &amp; white: Leveraging annotator disagreement via soft-label multi-task learning, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, 2021.</p>
      <p>[11] L. Han, E. Maddalena, A. Checco, C. Sarasua, U. Gadiraju, K. Roitero, G. Demartini, Crowd worker strategies in relevance judgment tasks, in: Proceedings of the 13th International Conference on Web Search and Data Mining, 2020, pp. 241–249.</p>
      <p>[12] M. Sandri, E. Leonardelli, S. Tonelli, E. Ježek, Why don't you do it right? Analysing annotators' disagreement in subjective tasks, in: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023, pp. 2428–2441.</p>
      <p>[13] S. Shahriar, T. Solorio, SafeWebUH at SemEval-2023 task 11: Learning annotator disagreement in derogatory text: Comparison of direct training vs aggregation, arXiv preprint arXiv:2305.01050 (2023).</p>
      <p>[14] E. Gajewska, eevvgg at SemEval-2023 task 11: Offensive language classification with rater-based information, in: A. K. Ojha, A. S. Doğruöz, G. Da San Martino, H. Tayyar Madabushi, R. Kumar, E. Sartori (Eds.), Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 171–176. URL: https://aclanthology.org/2023.semeval-1.24. doi:10.18653/v1/2023.semeval-1.24.</p>
      <p>[15] M. Sullivan, M. Yasin, C. L. Jacobs, University at Buffalo at SemEval-2023 task 11: MASDA–modelling annotator sensibilities through disaggregation, in: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), 2023, pp. 978–985.</p>
      <p>[16] A. de Paula, G. Rizzi, E. Fersini, D. Spina, et al., AI-UPV at EXIST 2023–sexism characterization using large language models under the learning with disagreements regime, in: CEUR Workshop Proceedings, volume 3497, CEUR-WS, 2023, pp. 985–999.</p>
      <p>[17] J. Erbani, E. Egyed-Zsigmond, D. Nurbakova, E. Portier, When multiple perspectives and an optimization process lead to better performance, an automatic sexism identification on social media with pretrained transformers in a soft label context, Working Notes of CLEF (2023).</p>
      <p>[18] M. E. Vallecillo-Rodríguez, F. del Arco, L. A. Ureña-López, M. T. Martín-Valdivia, A. Montejo-Ráez, Integrating annotator information in transformer fine-tuning for sexism detection, Working Notes of CLEF (2023).</p>
      <p>[19] G. Rizzi, M. Fontana, E. Fersini, Perspectives on hate: General vs. domain-specific models, in: Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024, 2024, pp. 78–83.</p>
      <p>[20] M. Michele, V. Basile, F. M. Zanzotto, et al., Change my mind: How syntax-based hate speech recognizer can uncover hidden motivations based on different viewpoints, in: 1st Workshop on Perspectivist Approaches to Disagreement in NLP, NLPerspectives 2022, as part of the Language Resources and Evaluation Conference, LREC 2022 Workshop, European Language Resources Association (ELRA), 2022, pp. 117–125.</p>
      <p>[21] L. Havens, B. Bach, M. Terras, B. Alex, Beyond explanation: A case for exploratory text visualizations of non-aggregated, annotated datasets, in: G. Abercrombie, V. Basile, S. Tonelli, V. Rieser, A. Uma (Eds.), Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC2022, European Language Resources Association, Marseille, France, 2022, pp. 73–82. URL: https://aclanthology.org/2022.nlperspectives-1.10.</p>
      <p>[22] A. Astorino, G. Rizzi, E. Fersini, Integrated gradients as proxy of disagreement in hateful content, in: CEUR Workshop Proceedings, volume 3596, CEUR-WS.org, 2023.</p>
      <p>[23] G. Rizzi, A. Astorino, P. Rosso, E. Fersini, Unraveling disagreement constituents in hateful speech, in: European Conference on Information Retrieval, Springer, 2024, pp. 21–29.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saibene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sorensen</surname>
          </string-name>
          ,
          <article-title>SemEval-2022 task 5: Multimedia automatic misogyny identification</article-title>
          ,
          <source>in: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)</source>
          ,
          Association for Computational Linguistics
          , Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>533</fpage>
          -
          <lpage>549</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Astorino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <article-title>Unraveling disagreement constituents in hateful speech</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gasparini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saibene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fersini</surname>
          </string-name>
          ,
          <article-title>Recognizing misogynous memes: Biased models and tricky archetypes</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>60</volume>
          (
          <year>2023</year>
          )
          <fpage>103474</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Clarifai</surname>
          </string-name>
          , Clarifai guide. URL: https://docs.clarifai.com/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of NAACL-HLT</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Fleiss</surname>
          </string-name>
          ,
          <article-title>Measuring nominal scale agreement among many raters</article-title>
          ,
          <source>Psychological Bulletin 76</source>
          (
          <year>1971</year>
          )
          <fpage>378</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ging</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neary</surname>
          </string-name>
          ,
          <article-title>Gender, sexuality, and bullying special issue editorial</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ignazzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fontanella</surname>
          </string-name>
          , et al.,
          <article-title>Exploring misogyny through time: From historical origins to modern complexities</article-title>
          ,
          <source>Philosophies of Communication</source>
          (
          <year>2023</year>
          )
          <fpage>195</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of NAACL-HLT</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hallacy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Goh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          , et al.,
          <article-title>Learning transferable visual models from natural language supervision</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>8748</fpage>
          -
          <lpage>8763</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>