<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>RMIT-IR at EXIST Lab at CLEF 2024</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tony Kim Smith</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>H Ruda Nie</string-name>
          <email>rudadhtn89@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johanne R. Trippas</string-name>
          <email>j.trippas@rmit.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Damiano Spina</string-name>
          <email>damiano.spina@rmit.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>RMIT University</institution>
          ,
          <addr-line>Melbourne</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tay Nguyen University</institution>
          ,
          <addr-line>Buon Ma Thuot</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
<p>This paper describes the RMIT-IR team's participation in the EXIST Lab at CLEF 2024. The proposed approaches aim to address sexism characterization on microblog posts (Tasks 1, 2, and 3) and sexism identification on memes (Task 4). For Tasks 1-3, we studied the effectiveness of zero-shot In-Context Learning (ICL) [1] with off-the-shelf pre-trained Large Language Models (LLMs) to mimic the scenario of minimal intervention by a practitioner aiming to build sexism characterization systems. Our approaches for meme classification (Task 4) utilize CLIP (Contrastive Language-Image Pre-training) [2] to experiment with multi-modal embeddings and zero-shot sexism identification models. We report the performance of our approaches under the learning with disagreements regime (Soft evaluation) and also for label predictions (Hard evaluation). The code of our submission is available at https://github.com/rmit-ir/exist2024/.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Social media has had a considerable impact on human societies. Applications such as Facebook,
YouTube, Instagram, and TikTok have helped shape the zeitgeist while creating large
communities numbering in the millions. However, many social media platforms have issues with people creating and
posting harmful information. The challenge of detecting and managing such harmful content
remains a concern for social media companies, contributing to consequences ranging from
misinformation to adverse effects on mental health [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In addition, the rise of social media has
empowered influencers who often unwittingly or deliberately propagate harmful stereotypes
and negative gender norms. This type of content attracts an audience and drives advertising
revenue, perpetuating a cycle of negativity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. As a result, it often fosters negative behaviour
towards women and minority groups, impacting many people negatively [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
Sexism is the belief that the members of one sex or gender are inferior to the members of the
other sex, especially that women are less able than men [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This can be categorized into hostile
sexism and benevolent sexism.
      </p>
      <p>
Sexism can limit the opportunities and roles people of different sexes and genders are expected
to take. It can be conveyed through any form of expression, like images, cartoons, memes,
objects, gestures, and symbols, and can be spread offline or online. This oppression can take
different forms, such as economic exploitation and social domination [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Sexist attitudes and behaviours can perpetuate stereotypes of social and gender roles based
on one’s biological sex. Usually, people are socialized with sexist concepts that teach traditional
gender roles for males and females [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Hostile sexism represents a form of sexist ideology,
marked by explicit hostility towards women and the perception of them as inferior and
submissive [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This deeply ingrained perception often results in the mistreatment of women at
both individual and institutional levels [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Benevolent sexism is a nuanced manifestation that
ingrains in men the belief that they should be responsible for providing for women in intimate
relationships [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This belief system dictates specific roles and behaviours for women, such as
expecting them to demonstrate motherly instincts, subtly reinforcing traditional gender roles.
A society that has high rates of hostile and benevolent sexism often has high rates of violence
against women, such as domestic violence, rape, and the commodification of women and their
bodies [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ].
      </p>
      <p>
There has been a recent increase in research on identifying different forms of hate speech,
corresponding with advancements in generative pre-trained transformers and, in general, large
language models (LLMs) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Researchers are asking how LLMs can be trained to identify
subtle and overt sexist content [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ]. However, many questions on how state-of-the-art
LLMs can be used for sexism detection are still open. What criteria should be used to evaluate
what constitutes sexism in varied cultural contexts? If a dataset with binary classifications 1 is
employed, can a machine learning model accurately capture the nuances within the text? And
how do we address the evolution of language with new slang and phrases continually emerging?
These questions highlight the complexity of sexism detection. The cost and technical skills
required to create a system that incorporates LLMs that can identify sexism make it unattainable
for most individuals. We aim to simplify the process using pre-trained LLMs and prompts to
address the EXIST lab tasks of classifying and labelling tweets.
      </p>
      <p>
        In addition to the text classification in Tasks 1–3, we address the problem of identifying
sexism in multi-modal formats for Task 4. Memes — ideas, images, or videos that are spread
very quickly on the internet [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] — exist not only in text form but also include any
accompanying images. Therefore, combining text and the attached image (i.e., making the input
multi-modal) can be more conducive to identifying whether a meme is sexist. Multi-modal
models are usually proposed to deal with multi-modal datasets for classification tasks. Among
existing multi-modal systems, Contrastive Language-Image Pre-Training (CLIP) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a powerful
vision-and-language (VL) pre-trained model that learns directly from raw text about images. In
addition, CLIP has the ability to map data of different modalities, text and images, into a shared
embedding space. Hence, CLIP has been shown to be a powerful tool for zero-shot image and
text classification [
        <xref ref-type="bibr" rid="ref2">2</xref>
]. Furthermore, CLIP can be beneficial for image-text feature fusion, which
can boost model performance on natural language processing (NLP) downstream tasks such as
text classification [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and multi-modal sarcasm detection [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. (Footnote 1: We acknowledge that the classification of sex and gender into two categories
is a simplification of people’s identities.) Motivated by the success of
CLIP on various VL downstream tasks, this study aims to investigate the following research
questions for Task 4:
• How effective is CLIP for zero-shot sexism identification?
• How can the naturally inherited multi-modal knowledge from pre-trained CLIP be
extracted to identify sexism effectively?
Addressing the first research question, we proposed Prompt-CLIP for zero-shot sexism
identification. For the latter question, we employed CLIP to perform supervised sexism classification.
Inspired by the impressive performance of multi-view CLIP for sarcasm detection in a previous
study [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], we adopted multi-view CLIP for supervised sexism classification, namely text-image
multi-view CLIP (TIMV-CLIP), and proposed a text-image multi-modal model via CLIP-Guided
Learning (TI-CLIP) as a baseline.
      </p>
      <p>The paper is organized as follows. Details about the tasks we participated in are described in
Section 2. Section 3 provides details about the proposed approaches. In Section 4, we provide
and discuss the results. Finally, we conclude in Section 5.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Tasks Addressed</title>
      <p>
        The sEXism Identification in Social neTworks (EXIST) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] lab at the Conference and Labs of the
Evaluation Forum (CLEF) 2024 [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] aims to identify and characterize sexism using the learning
with disagreements paradigm [
        <xref ref-type="bibr" rid="ref21 ref22 ref23">21, 22, 23</xref>
        ]. This edition of the EXIST lab consists of sexism
characterization on microblog posts (tweets) and memes.
      </p>
      <sec id="sec-3-1">
        <title>2.1. Tasks 1–3: Sexism Characterization of Microblog Posts</title>
        <p>• Task 1: Addresses sexism identification in tweets as a binary classification, requiring the
system to classify whether a tweet is sexist (YES) or not (NO).
• Task 2: Focuses on determining the source intention in tweets as a multi-class
classification, requiring the system to classify the tweet’s intention as Direct, Reported, or
Judgemental.
• Task 3: Involves sexism categorization in tweets as a multi-label classification, requiring
the system to classify tweets into categories such as Ideological Inequality, Stereotyping
Dominance, Objectification, Sexual Violence, and Misogyny-Non-Sexual Violence.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Task 4: Sexism Identification of Memes</title>
        <p>While the above tasks address sexism identification in text, Task 4 deals with multi-modal input.
Task 4 aims to address sexism identification as a binary classification, requiring the systems to
classify whether a given meme is sexist or not.</p>
        <p>[Figure 1: Overview of the proposed pipeline for Tasks 1–3: task guidelines are instantiated
into a prompt and submitted to a pre-trained Large Language Model (e.g., GPT-4); the LLM output
is then validated (with and without manual intervention) to produce a valid run.]</p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Evaluation approaches</title>
        <p>
          • Soft-Soft Evaluation: For systems that produce probabilities for each category, soft-soft
evaluation is provided to compare the probabilities assigned by the systems with those
assigned by the set of human annotators. The official evaluation metric is ICM-soft [
          <xref ref-type="bibr" rid="ref24 ref23">24, 23</xref>
          ].
        </p>
        <p>Additionally, Cross Entropy is also reported.
• Hard-Hard Evaluation: Hard labels are derived from the diferent annotators’ labels
through a probabilistic threshold computed for each task. Hard-hard evaluation is
provided to evaluate systems that return Hard labels as output by comparing against a ground
truth that combines multiple annotations into one. The original ICM [25] and  1 score
are used as evaluation metrics.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Proposed Approaches</title>
      <sec id="sec-4-1">
        <title>3.1. Unsupervised In-Context Learning for Sexism Characterization in Microblog Posts</title>
        <p>Our goal was to examine the procedure of developing a functional solution with readily available
LLMs while minimizing the manual effort required from the practitioner. As shown in Figure 1,
the basic architecture involves giving the researcher a set of labeling or classification tasks
and asking the LLM to generate an accurate output. To ensure that the responses followed the
predefined criteria for each task, the outputs were systematically stored in JSON format and
manually inspected for errors in the ”value” field. Responses such as ”Yes”, or variations
with additional text or punctuation like ”Yes, the ... is sexist”, required manual corrections
to conform to the expected format. Instances of token rate limits that resulted in HTTP
errors were addressed by re-running the task for the affected tweet using its unique ID. These
occurrences were uncommon, making manual correction a more efficient solution than an
automated one given the time constraint.</p>
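        <p>The normalization and validation step described above can be sketched as follows; this is a minimal illustration, not the authors' code, and the helper names (normalize_answer, validate_record) are hypothetical.

```python
import re

# Labels expected for Task 1; other tasks would extend this set.
VALID_LABELS = {"YES", "NO"}

def normalize_answer(raw: str) -> str:
    """Map a raw model response such as 'Yes, the tweet is sexist' to a
    canonical label, or raise so the record can be corrected manually."""
    tokens = re.sub(r"[^A-Za-z]", " ", raw).strip().split()
    if tokens and tokens[0].upper() in VALID_LABELS:
        return tokens[0].upper()
    raise ValueError(f"Needs manual correction: {raw!r}")

def validate_record(record: dict) -> dict:
    """Normalize the 'value' field of one stored JSON record."""
    record["value"] = normalize_answer(record["value"])
    return record

print(validate_record({"id": "600001", "value": "Yes, the tweet is sexist"}))
# {'id': '600001', 'value': 'YES'}
```

Responses that cannot be mapped to a valid label raise an error, mirroring the manual-review path described above.</p>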
        <p>The prompts used for runs submitted to Tasks 1, 2, and 3 were designed with multiple parts:
• Definition of the underlying concept being addressed in the task (e.g., sexism):
Sexism, prejudice or discrimination based on sex or gender, especially against women and
girls. Although its origin is unclear, the term sexism emerged from the “second-wave”
feminism of the 1960s through ‘80s and was most likely modeled on the civil rights
movement’s term racism (prejudice or discrimination based on race). Sexism can be a
belief that one sex is superior to or more valuable than another sex. It imposes limits
on what men and boys can and should do and what women and girls can and should
do. The concept of sexism was originally formulated to raise consciousness about the
oppression of girls and women, although by the early 21st century it had sometimes been
expanded to include the oppression of any sex, including men and boys, intersex people,
and transgender people.
• Instruction to Address Task and to Obtain Consistent Outputs: You are a robot
who detects sexism from text given in the prompt.
• Perspectivism:
– Level of Education: For each response, consider the perspective of individuals
representing the following study levels: [study_levels_annotators]
– Level of Education and Gender: For each response, consider the perspective of
individuals representing the following study levels: [study_levels_annotators] and
gender: [gender_annotators].
• Output Format:
– Task 1 (Soft): Give me 6 answers with NO or YES. Format: [NO], [YES]
– Task 1 (Hard): Give me 1 answer with [NO] or [YES]
– Task 2 (Soft) : Give me 6 answers with NO, DIRECT, REPORTED or
JUDGEMENTAL using commas for each answer. Example: [NO], [DIRECT], [REPORTED],
[JUDGEMENTAL], [JUDGEMENTAL], [NO]
– Task 2 (Hard): Give me 1 answer with NO, DIRECT, REPORTED or
JUDGEMENTAL using commas for each answer. Example: [NO], [DIRECT], [REPORTED],
[JUDGEMENTAL], [JUDGEMENTAL], [NO]
– Task 3 (Soft): Give me 6 answers with NO, IDEOLOGICAL-INEQUALITY,
STEREOTYPING-DOMINANCE, OBJECTIFICATION, SEXUAL-VIOLENCE, or
MISOGYNY-NON-SEXUAL-VIOLENCE using commas for each answer. Example:
[NO], [IDEOLOGICAL-INEQUALITY], [STEREOTYPING-DOMINANCE],
[OBJECTIFICATION], [SEXUAL-VIOLENCE], [MISOGYNY-NON-SEXUAL-VIOLENCE]
– Task 3 (Hard): Give me 1 answers with NO, IDEOLOGICAL-INEQUALITY,
STEREOTYPING-DOMINANCE, OBJECTIFICATION, SEXUAL-VIOLENCE, or
MISOGYNY-NON-SEXUAL-VIOLENCE using commas for each answer. Example:
[NO], [IDEOLOGICAL-INEQUALITY], [STEREOTYPING-DOMINANCE],
[OBJECTIFICATION], [SEXUAL-VIOLENCE], [MISOGYNY-NON-SEXUAL-VIOLENCE]
• Instance to Classify: #### [tweet] ####
An example of a prompt submitted and output obtained from gpt-4-turbo to classify instance
with id_EXIST 600090 using RMIT-IR_3 for Task 2 (Soft):
• Input: Sexism, prejudice or discrimination based on sex or gender, especially against
women and girls. Although its origin is unclear, the term sexism emerged from the
“second-wave” feminism of the 1960s through ‘80s and was most likely modeled on the
civil rights movement’s term racism (prejudice or discrimination based on race). Sexism
can be a belief that one sex is superior to or more valuable than another sex. It imposes
limits on what men and boys can and should do and what women and girls can and should
do. The concept of sexism was originally formulated to raise consciousness about the
oppression of girls and women, although by the early 21st century it had sometimes been
expanded to include the oppression of any sex, including men and boys, intersex people,
and transgender people. You are a robot who detects sexism from text given in the prompt.
For each response, consider the perspective of individuals representing the following
study levels: [“High school degree or equivalent”, “Bachelor’s degree”, “Bachelor’s degree”,
“Bachelor’s degree”, “Bachelor’s degree”, “High school degree or equivalent”]. Give me
6 answers with NO, DIRECT, REPORTED or JUDGEMENTAL using commas for each
answer. Example: [NO], [DIRECT], [REPORTED], [JUDGEMENTAL], [JUDGEMENTAL],
[NO].
#### Girls, don’t let anyone ever tell you, you’re not as good as a man #gender #girlpower
#equity ####
• Output: [NO],[NO],[NO],[NO],[NO],[NO]
To find the distribution of responses, we initially tried to use GPT to figure out the likelihood
percentage. Unfortunately, GPT only gave absolute values (either 100 or 0) or a consistent split
of 70/30 most of the time. We directed the model to generate six responses for each tweet, which
matched the number of annotators per tweet. For example, a set of responses like ”YES”, ”NO”,
”YES”, ”NO”, ”YES”, and ”NO” would result in a calculated distribution of 50%. Additionally,
in our final submissions, experimental runs two and three included prompts that provided
additional context, such as the annotators’ gender or educational backgrounds. This was done
to see if providing relevant background information would improve the LLM’s ability to predict
annotator responses. The formats for each task can be seen above along with an example
prompt used in RMIT-IR_3 for Task 2 (Soft).</p>
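        <p>The conversion from six generated answers to a Soft label distribution described above can be sketched as follows (an illustrative helper, assuming six answers per tweet to match the six annotators):

```python
from collections import Counter

def soft_distribution(answers, labels=("YES", "NO")):
    """Turn the generated answers into a probability per label,
    mirroring the six human annotators per tweet."""
    counts = Counter(answers)
    total = len(answers)
    return {label: counts[label] / total for label in labels}

# Three YES and three NO answers yield a 50/50 soft label.
print(soft_distribution(["YES", "NO", "YES", "NO", "YES", "NO"]))
# {'YES': 0.5, 'NO': 0.5}
```
</p>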
        <sec id="sec-4-2-1">
          <title>3.1.1. Runs Submitted to Tasks 1–3</title>
          <p>We used OpenAI’s API to submit prompts to the pre-trained model
gpt-4-turbo-2024-04-09 [26]. For each tweet in the test set, we instantiated the prompt from above by appending
the textual content of the instance. We used the syntax #### [tweet] #### to provide explicit
delimiters to the model. For Soft tasks, we asked for six answers and then created a distribution
based on the frequency of the predicted labels.</p>
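          <p>The prompt instantiation with explicit #### delimiters can be sketched as follows; build_prompt is a hypothetical helper, the prompt text is abbreviated from Section 3.1, and the commented-out call shows how such a prompt could be submitted with OpenAI's Python client.

```python
# Abbreviated Task 1 (Soft) prompt parts, following Section 3.1.
DEFINITION = ("Sexism, prejudice or discrimination based on sex or gender, "
              "especially against women and girls.")
INSTRUCTION = "You are a robot who detects sexism from text given in the prompt."
OUTPUT_FORMAT = "Give me 6 answers with NO or YES. Format: [NO], [YES]"

def build_prompt(tweet: str) -> str:
    # '#### ... ####' gives the model explicit delimiters around the instance.
    return f"{DEFINITION} {INSTRUCTION} {OUTPUT_FORMAT}\n#### {tweet} ####"

prompt = build_prompt("Example tweet text")
print(prompt.endswith("#### Example tweet text ####"))  # True

# The instantiated prompt would then be submitted via OpenAI's API, e.g.:
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# response = client.chat.completions.create(
#     model="gpt-4-turbo-2024-04-09",
#     messages=[{"role": "user", "content": prompt}],
# )
```
</p>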
          <p>We experimented with multiple versions of prompt templates using the development set
supplied by the EXIST organizers (we did not use the training set). We found that the following
elements were especially effective in directing the model to concentrate on the specific task and
to ensure the responses were properly formatted (i.e., single-word answers and capitalized):
• Employing a role-playing technique of framing the task with the prompt “You are a robot
who detects sexism from text given in the prompt.”
• Giving explicit formatting instructions such as “Give me 6 answers with NO or YES. Format:
[NO], [YES]”.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>3.2. Multi-modal Contrastive Learning for Sexism Identification on Memes</title>
        <p>
          Inspired by the successful applications of CLIP [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] for NLP [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and computer vision tasks [27],
[28], we adopted CLIP for the sexism identification task (Task 4). Unlike conventional methods
that rely heavily on labelled image-text pairs, CLIP is a cross-modality model pre-trained with
400M noisy image-text pairs collected from the internet to learn high-level semantic features.
CLIP consists of two encoders that embed texts and images into a shared embedding space.
For a matched image-text pair, CLIP is encouraged to maximize the cosine similarity
between the embeddings of the two modalities, while for unmatched pairs the similarity is minimized, so that the
model learns to find the most suitable paired images and texts. Our motivation for using CLIP-based
learning for sexism identification is to capture cross-modal ambiguity by explicitly measuring
the correlation between texts and images of targeted memes and to guide the feature-fusing
and decision-making stages.
        </p>
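        <p>The contrastive objective described above can be illustrated with mock embeddings; this is a generic sketch of CLIP-style training with a symmetric cross-entropy over the similarity matrix, not the pre-training code itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Mock batch of 4 image and 4 text embeddings (matched along the diagonal).
image_emb = l2_normalize(rng.normal(size=(4, 8)))
text_emb = l2_normalize(rng.normal(size=(4, 8)))

# Pairwise cosine similarities: entry (i, j) compares image i with text j.
logits = image_emb @ text_emb.T

def contrastive_loss(logits):
    """Symmetric cross-entropy: each image should match its own text
    (maximized diagonal similarity) and vice versa."""
    labels = np.arange(logits.shape[0])
    def ce(l):
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()
    return (ce(logits) + ce(logits.T)) / 2

print(contrastive_loss(logits))
```

A similarity matrix dominated by its diagonal (matched pairs) yields a lower loss than a uniform one, which is exactly the behaviour the text describes.</p>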
        <p>We propose two supervised contrastive learning models based on CLIP: Text-Image
multimodal model via CLIP-guided learning (TI-CLIP) and Text-Image Multi-View multi-modal model
via CLIP-guided learning (TIMV-CLIP). The architecture of TIMV-CLIP is shown in Figure 2.
We also propose Prompt-CLIP to address zero-shot sexism classification and CLIP-based models
for supervised sexism classification.</p>
        <p>
          • TI-CLIP: The overall architecture of TI-CLIP consists of two feature encoding models
used to encode texts and images. These embeddings are then combined into a multi-modal
embedding before passing into a feedforward network for sexism classification.
• TIMV-CLIP: We adopted a novel multi-view CLIP framework (MV-CLIP) [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] for sexism
identification, namely TIMV-CLIP (Figure 2). In addition to encoding image and text
as TI-CLIP, TIMV-CLIP further considers modelling relationships across text and image
modality using a transformer encoder, which aims to capture the interaction across
different modalities. Unlike MV-CLIP in a previous study [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], TIMV-CLIP employs BERT
Base Multilingual (mBERT) [29] to encode texts.
• Prompt-CLIP: Prompt-CLIP performs zero-shot sexism identification. Prompt-CLIP uses
a pre-trained CLIP model to create a custom classifier without training and takes
images as inputs. It encodes the pre-defined classes (sexism and not sexism), expanded with further
description into prompts, into the learned latent space, and compares their similarity
to the image embeddings. In this study, we used “an image contains no information about
sexism” and “an image contains information about sexism and against women” as prompts
for Prompt-CLIP. The pre-trained text encoder transforms the prompts
into text embedding vectors, while the pre-trained image encoder embeds the image.
• Model Training: We first randomly split the training subset into training (80%) and
validation (20%) splits. We implemented TI-CLIP and TIMV-CLIP
based on the Hugging Face library [30] and adopted clip-vit-base-patch32 as the
backbone. Both TI-CLIP and TIMV-CLIP were trained directly with Soft labels. We
used Adam to optimize the parameters of both the TI-CLIP and TIMV-CLIP
models. After several trials with other hyperparameters, we selected the parameters that
performed best on the validation set: a batch size of 32, a learning rate of 1e-6 for CLIP
and 5e-4 for the other components, a dropout rate of 0.3, and 10 training epochs.
        </p>
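        <p>The key-less attention pooling in TIMV-CLIP's fusion stage can be sketched with mock features as follows; this is an illustrative NumPy version (the actual models are implemented with the Hugging Face library), in which attention weights come from the token features alone via a learned scoring vector, with no separate query/key projections.

```python
import numpy as np

rng = np.random.default_rng(1)

def keyless_attention(features, score_vector):
    """Pool a sequence of fused token features into one vector.

    features: (seq_len, dim) outputs of the transformer encoder over the
              combined image and text representations.
    score_vector: (dim,) learned parameter that scores each token.
    """
    scores = features @ score_vector               # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()              # softmax over tokens
    return weights @ features                      # weighted sum of tokens

fused_tokens = rng.normal(size=(6, 16))            # mock encoder outputs
u = rng.normal(size=16)                            # mock learned scoring vector
pooled = keyless_attention(fused_tokens, u)
print(pooled.shape)                                # (16,)
# 'pooled' would then feed the linear + softmax classification head.
```
</p>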
        <sec id="sec-4-3-1">
          <title>3.2.1. Runs Submitted to Task 4</title>
          <p>The proposed multi-modal sexism identification models mainly focus on Soft label predictions.
For the Hard submissions, hard labels are directly assigned by taking the class with the
highest probability score.</p>
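          <p>This Soft-to-Hard conversion amounts to an argmax over the predicted distribution; a minimal sketch:

```python
def to_hard_label(soft_prediction: dict) -> str:
    """Return the class with the highest probability in the Soft prediction."""
    return max(soft_prediction, key=soft_prediction.get)

print(to_hard_label({"YES": 0.67, "NO": 0.33}))  # YES
```
</p>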
          <p>Table 2 presents the submitted runs for Task 4, which can be summarized as follows:
• RMIT-IR_1: For the first submission, the trained TI-CLIP model was used to predict
whether given memes are sexist or not sexist.
• RMIT-IR_2: We used the trained TIMV-CLIP to generate the second submission.
• RMIT-IR_3: Prompt-CLIP was used to predict Soft and Hard labels for the third
submission.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Results and Discussion</title>
      <sec id="sec-5-1">
        <title>4.1. Tasks 1–3</title>
        <p>The performance of our proposed approaches for Tasks 1, 2, and 3 is presented in Tables 3,
4, and 5, respectively. As shown in Table 3, simplifying the architecture required for creating
an LLM-based system that can identify sexism shows promise, as evidenced by the classification of English
and Spanish tweets. Although the current model achieved a 49% ICM-Soft score, which is 17%
lower than the best-performing run, this result indicates the potential of using prompting for
classification tasks.</p>
        <p>The cost of this process, which included testing with a development dataset and submitting
with a gold dataset, was close to $150 AUD. In particular, it did not require knowledge of cloud
computing, expensive hardware, or much energy. The average time taken to produce an output
was about 90 minutes.</p>
        <p>Looking at Table 3 (Task 1), the use of the Task-specific Prompt yielded an ICM-Soft Norm score
of 49%, securing the 23rd position overall. It is interesting to note that the inclusion of additional
clues such as the annotator’s education level and gender did not bolster performance; instead,
it diminished the score. In particular, the model scored higher across all Spanish test instances
than across all English test instances, despite GPT being trained predominantly on English data.
This underscores the robust cross-lingual applicability of the model, showcasing its proficient
handling of Spanish data despite its primarily English training.</p>
        <p>In Task 2, our analysis revealed challenges in multi-class classification, as shown in Table 4. The
approach yielded a 13% ICM-Soft Norm score, indicating considerable difficulty in discerning the
intention of the tweets. The introduction of additional clues, such as the annotator’s education
level, generally led to a decline in performance. However, adding gender information resulted
in a slight improvement, elevating the score from almost 0% to 3%. The results indicated that
the approach performed more effectively in English without additional clues; however, its
performance diminished once clues were introduced. Conversely, our analysis demonstrated
that the GPT model exhibited greater efficacy with clues in Spanish, suggesting potential
advantages of providing contextual information in non-English scenarios.</p>
        <p>Task 3, with multi-label classification, was also challenging. The initial ICM-Soft Norm score,
as shown in Table 5, stood at 11%. Following a similar trend to Task 2, the integration of clues
such as the annotator’s level of education resulted in a reduction in performance to 8%.
Subsequently, when both education and gender clues were included, performance decreased
further to 4%. The English scores are quite similar to Task 2. Notably, the initial performance
on the Spanish dataset exceeded that of Task 2, but declined with the addition of educational
clues and declined further with the incorporation of both education and gender clues. This
observation underscores a consistent pattern of diminishing returns with the incorporation of
more specific annotator information.</p>
        <p>Our proposed approaches for the second and third runs involve implementing few-shot and
in-context learning. The experiments for these runs were conducted using gpt-4-turbo, and
future tests should include gpt-4o along with other pre-trained LLMs to determine their efficacy
in this context. Our experimental results show that involving few-shot and in-context learning
does not improve model performance on sexism identification in tweets (as shown in Tables
3–5). Although prompting requires less coding and understanding of LLMs, producing the
exact desired response 100% of the time was challenging. The prompts had to be carefully
designed to ensure that GPT provided a single and consistent answer, especially when
dealing with distributions. Although pre-processing text for an LLM is more complex than
post-processing answers from GPT, ensuring the response is in the correct format is simpler.</p>
        <p>Another unexpected aspect of this architecture was the ability to assist GPT with hints. We
tested how adding biases by including the annotator’s education level and gender affected its
ability to classify or label tweets. This gave insights into how such biases can influence model
performance and classification accuracy.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Task 4</title>
        <p>[Table: Results of the proposed approaches for Task 4 (Soft and Hard). Columns: Rank (Soft),
ICM-Soft, ICM-Soft Norm, Cross Entropy, ICM-Hard, ICM-Hard Norm, F1 YES.]</p>
        <p>Among the proposed approaches, TIMV-CLIP performs best in all cases (English+Spanish, English, or Spanish test
instances) considering both Soft and Hard evaluation scenarios. This indicates the
importance of effectively utilizing deep interactions between texts and images of memes with CLIP.
Furthermore, TIMV-CLIP was the best-performing model (RMIT-IR_2) on English test
instances with an ICM-Soft Norm score of 0.4998, ranked first in the leaderboard considering
the Soft evaluation (English test instances). This observation confirms the advantages of CLIP
for text-image pair classification tasks. However, the performance of TIMV-CLIP dropped
on Spanish test instances, which leads to lower performance on the combined (English+Spanish) test
instances. We believe using a translation component for the Spanish text in memes could lead to
better overall performance.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>This paper proposed unsupervised in-context learning with off-the-shelf pre-trained LLMs to
address sexism characterization on microblog posts (Tasks 1, 2, and 3). To deal with multi-modal
inputs, we proposed multi-modal contrastive learning approaches, including Prompt-CLIP, TI-CLIP, and
TIMV-CLIP, for sexism identification in memes (Task 4).</p>
      <p>The results of our experiments demonstrated the effectiveness of TIMV-CLIP under the
Learning with Disagreements regime, indicating the need to consider capturing sexism cues
from different perspectives, including image, text, and image-text interactions.</p>
      <p>Future work includes further experimentation with unsupervised In-Context Learning in
other tasks or meta-tasks such as MonsterCLEF [31], and the inclusion of machine translation
for multi-modal contrastive learning.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research has been carried out in the unceded lands of the Woi Wurrung and Boon Wurrung
peoples of the eastern Kulin Nation. We pay our respects to their Ancestors and Elders, past,
present, and emerging. This research is partially supported by the Australian Research Council
(ARC, projects DE200100064 and CE200100005).</p>
      <p>[24] L. Plaza, J. Carrillo-de-Albornoz, R. Morante, E. Amigó, J. Gonzalo, D. Spina, P. Rosso, Overview of EXIST 2023 – Learning with Disagreement for Sexism Identification and Characterization (Extended Overview), in: M. Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.), Working Notes of CLEF 2023 – Conference and Labs of the Evaluation Forum, 2023, pp. 813–854. URL: https://ceur-ws.org/Vol-3497/paper-070.pdf.</p>
      <p>[25] E. Amigó, A. Delgado, Evaluating Extreme Hierarchical Multi-label Classification, in: S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 5809–5819. doi:10.18653/v1/2022.acl-long.399.</p>
      <p>[26] OpenAI, https://help.openai.com/en/articles/8555510-gpt-4-turbo-in-the-openai-api, 2024. Accessed: 2024-07-04.</p>
      <p>[27] J. Fu, S. Xu, H. Liu, Y. Liu, N. Xie, C.-C. Wang, J. Liu, Y. Sun, B. Wang, CMA-CLIP: Cross-Modality Attention CLIP for Text-Image Classification, in: 2022 IEEE International Conference on Image Processing (ICIP), 2022, pp. 2846–2850. doi:10.1109/ICIP46576.2022.9897323.</p>
      <p>[28] M. V. Conde, K. Turgutlu, CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification, in: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 3951–3955. doi:10.1109/CVPRW53098.2021.00444.</p>
      <p>[29] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.</p>
      <p>[30] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-Art Natural Language Processing, in: Q. Liu, D. Schlangen (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. doi:10.18653/v1/2020.emnlp-demos.6.</p>
      <p>[31] N. Ferro, J. Gonzalo, J. Karlgren, H. Müller, MonsterCLEF: One Lab to Rule Them All, 2024. URL: https://monsterclef.dei.unipd.it/, Accessed: 2024-07-04.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Q. Dong, L. Li, D. Dai, C. Zheng, Z. Wu, B. Chang, X. Sun, J. Xu, L. Li, Z. Sui, A Survey on In-context Learning, 2023. arXiv:2301.00234.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning Transferable Visual Models From Natural Language Supervision, in: M. Meila, T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18–24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 8748–8763. URL: http://proceedings.mlr.press/v139/radford21a.html.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] L. Braghieri, R. Levy, A. Makarin, Social Media and Mental Health, American Economic Review 112 (2022) 3660–3693. doi:10.1257/aer.20211218.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] R. Young, V. Kananovich, B. G. Johnson, Young Adults' Folk Theories of How Social Media Harms Its Users, Mass Communication and Society 26 (2023) 23–46. doi:10.1080/15205436.2021.1970186.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. Wescott, S. Roberts, X. Zhao, The Problem of Anti-feminist 'Manfluencer' Andrew Tate in Australian Schools: Women Teachers' Experiences of Resurgent Male Supremacy, Gender and Education 36 (2024) 167–182. doi:10.1080/09540253.2023.2292622.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Cambridge Dictionary, Definition of sexism, https://dictionary.cambridge.org/dictionary/english/sexism, 2024. Accessed: 2024-07-04.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] M. McIntosh, The State and the Oppression of Women, in: Feminism and Materialism (RLE Feminist Theory), Routledge, 2013, pp. 254–289.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] G. Masequesmay, Sexism | Definition, Types, Examples, &amp; Facts | Britannica, 2024. URL: https://www.britannica.com/topic/sexism.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] P. Glick, S. T. Fiske, Ambivalent sexism, in: Advances in Experimental Social Psychology, volume 33, Academic Press, 2001, pp. 115–188. doi:10.1016/S0065-2601(01)80005-8.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] K. R. Blake, S. M. O'Dean, J. Lian, T. F. Denson, Misogynistic Tweets Correlate with Violence against Women, Psychological Science 32 (2021) 315–325. doi:10.1177/0956797620968529.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] K. Barker, O. Jurasz, Online Misogyny, Journal of International Affairs 72 (2019) 95–114.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] E. Hashmi, S. Y. Yayilgan, Multi-class Hate Speech Detection in the Norwegian Language Using FAST-RNN and Multilingual Fine-Tuned Transformers, Complex &amp; Intelligent Systems 10 (2024) 4535–4556. doi:10.1007/s40747-024-01392-5.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] J. A. García-Díaz, R. Pan, R. Valencia-García, UMUTeam at EXIST 2023: Sexism Identification and Categorisation Fine-tuning Multilingual Large Language Models, in: M. Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.), Working Notes of CLEF 2023 – Conference and Labs of the Evaluation Forum, 2023, pp. 985–999. URL: https://ceur-ws.org/Vol-3497/paper-080.pdf.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] H. Abburi, P. Parikh, N. Chhaya, V. Varma, Multi-task Learning Neural Framework for Categorizing Sexism, Comput. Speech Lang. 83 (2024). doi:10.1016/j.csl.2023.101535.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] L. Plaza, J. Carrillo-de Albornoz, R. Morante, J. Gonzalo, E. Amigó, D. Spina, P. Rosso, Overview of EXIST 2023: sEXism Identification in Social neTworks, in: Proceedings of ECIR'23, 2023, pp. 593–599. doi:10.1007/978-3-031-28241-6_68.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Cambridge Dictionary, Definition of meme, https://dictionary.cambridge.org/dictionary/english/meme, 2024. Accessed: 2024-07-04.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] L. Qin, W. Wang, Q. Chen, W. Che, CLIPText: A New Paradigm for Zero-shot Text Classification, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 1077–1088. doi:10.18653/v1/2023.findings-acl.69.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] L. Qin, S. Huang, Q. Chen, C. Cai, Y. Zhang, B. Liang, W. Che, R. Xu, MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 10834–10845. doi:10.18653/v1/2023.findings-acl.689.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] L. Plaza, J. Carrillo-de Albornoz, E. Amigó, J. Gonzalo, R. Morante, P. Rosso, D. Spina, B. Chulvi, A. Maeso, V. Ruiz, EXIST: sEXism Identification in Social neTworks, http://nlp.uned.es/exist2024/, 2024. Accessed: 2024-07-04.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Conference and Labs of the Evaluation Forum (CLEF), https://www.clef-initiative.eu/, 2024. Accessed: 2024-07-04.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] L. Plaza, J. Carrillo-de Albornoz, E. Amigó, J. Gonzalo, R. Morante, P. Rosso, D. Spina, B. Chulvi, A. Maeso, V. Ruiz, EXIST 2024: sEXism Identification in Social neTworks and Memes, in: Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part V, Springer-Verlag, Berlin, Heidelberg, 2024, pp. 498–504. doi:10.1007/978-3-031-56069-9_68.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] L. Plaza, J. Carrillo-de-Albornoz, V. Ruiz, A. Maeso, B. Chulvi, P. Rosso, E. Amigó, J. Gonzalo, R. Morante, D. Spina, Overview of EXIST 2024 – Learning with Disagreement for Sexism Identification and Characterization in Social Networks and Memes, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), 2024.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] L. Plaza, J. Carrillo-de-Albornoz, V. Ruiz, A. Maeso, B. Chulvi, P. Rosso, E. Amigó, J. Gonzalo, R. Morante, D. Spina, Overview of EXIST 2024 – Learning with Disagreement for Sexism Identification and Characterization in Social Networks and Memes (Extended Overview), in: G. Faggioli, N. Ferro, P. Galuščáková, A. G. S. de Herrera (Eds.), Working Notes of CLEF 2024 – Conference and Labs of the Evaluation Forum, 2024.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>