<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">PINK at EXIST2024: A Cross-Lingual and Multi-Modal Transformer Approach for Sexism Detection in Memes Notebook for the EXIST Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Giulia</forename><surname>Rizzi</surname></persName>
							<email>g.rizzi10@campus.unimib.it</email>
							<affiliation key="aff0">
								<orgName type="department">DISCo</orgName>
								<orgName type="institution">Università degli Studi di Milano-Bicocca</orgName>
								<address>
									<addrLine>Viale Sarca 336</addrLine>
									<postCode>20126</postCode>
									<settlement>Milan</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">PRHLT</orgName>
								<orgName type="institution">Universitat Politècnica de València</orgName>
								<address>
									<addrLine>Camino de Vera, s/n</addrLine>
									<postCode>46022</postCode>
									<settlement>Valencia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">David</forename><surname>Gimeno-Gómez</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">PRHLT</orgName>
								<orgName type="institution">Universitat Politècnica de València</orgName>
								<address>
									<addrLine>Camino de Vera, s/n</addrLine>
									<postCode>46022</postCode>
									<settlement>Valencia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Elisabetta</forename><surname>Fersini</surname></persName>
							<email>elisabetta.fersini@unimib.it</email>
							<affiliation key="aff0">
								<orgName type="department">DISCo</orgName>
								<orgName type="institution">Università degli Studi di Milano-Bicocca</orgName>
								<address>
									<addrLine>Viale Sarca 336</addrLine>
									<postCode>20126</postCode>
									<settlement>Milan</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Carlos-D.</forename><surname>Martínez-Hinarejos</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">PRHLT</orgName>
								<orgName type="institution">Universitat Politècnica de València</orgName>
								<address>
									<addrLine>Camino de Vera, s/n</addrLine>
									<postCode>46022</postCode>
									<settlement>Valencia</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">PINK at EXIST2024: A Cross-Lingual and Multi-Modal Transformer Approach for Sexism Detection in Memes Notebook for the EXIST Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">35BABF82C6378692F6790070A9FF043E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Sexism Characterization</term>
					<term>Learning with Disagreements</term>
					<term>Perspectivism</term>
					<term>Memes</term>
					<term>Ensemble</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Warning: This paper contains examples of language and images which may be offensive. With the increasing influence of social media platforms, new forms of expression have gained popularity, encouraged by their immediate communication and sharing capabilities. Unfortunately, this accessibility has also enabled the dissemination of hateful messages, including those rooted in historical prejudices such as misogyny, which often manifest in memes. Developing automated systems capable of detecting sexism and other hateful expressions in this context poses significant challenges due to the multimodal nature of memes, the presence of irony, the diverse categories of hate, and the varied intentions of authors, particularly within the learning with disagreements regime. This paper presents the PINK team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2024. Focusing on Task 4, which addresses sexism identification and characterization in memes under the learning with disagreements paradigm, we propose a unified, multi-modal Transformer-based architecture capable of dealing with multiple languages, namely English and Spanish. Our approach reached the 10th and 20th places in the final ranking for the soft- and hard-label evaluations, respectively. These results were made possible by well-established, state-of-the-art multilingual models, such as mBERT and CLIP, used for feature extraction, together with comprehensive ablation studies and the design of various model ensemble strategies. The source code of our approaches is publicly available at https://github.com/giulia95/PINK-at-EXIST2024/.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Sexism in online content presents a significant challenge, given the ease of sharing on social media and the variety of forms in which abusive messages can be expressed (text, images, videos, memes, etc.). Moreover, the subjectivity typical of hate-related tasks requires considering different interpretations and perspectives of the conveyed message, which may be influenced by cultural differences and beliefs <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. Traditional sexism detection systems usually rely on predefined labels derived from a fixed definition of sexism that represents only a single perspective; therefore, they cannot capture the complexity and subjectivity of the task. Detection becomes even more challenging when sexist messages are expressed through multimodal content such as memes, in which a hateful message may be conveyed by the image, the text, or their combination. Moreover, due to the presence of irony, the conveyed message might appear harmless at first glance <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>An important contribution to sexism identification in memes under the learning with disagreements paradigm is Task 4 of EXIST 2024: sEXism Identification in Social neTworks <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>. In this paper, we address the task by proposing a unified, multi-modal Transformer-based architecture capable of dealing with multiple languages (English and Spanish). 
The proposed approach exploits well-established, state-of-the-art multilingual models (i.e., mBERT and CLIP) and ensemble strategies.</p><p>The paper is organized as follows. Section 2 provides an overview of the state of the art in sexism detection. Section 3 describes the proposed method, including details about the shared task and the dataset, as well as the architecture of the proposed model. Section 4 reports the results achieved by the proposed approaches. Finally, Section 5 summarizes conclusions and future research directions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Social media platforms provide a fertile environment for users to share their opinions and ideologies through various forms of expression, including text, memes, and videos. Often encouraged by anonymity, content conveying hateful messages is just as easily created and disseminated. As a consequence, hateful messages, including those linked to historical aversions (e.g., hatred towards women), have found new forms of expression, for example, memes <ref type="bibr" target="#b8">[9]</ref>. Consequently, researchers' interest has expanded to consider these forms of expression.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Hateful content identification.</head><p>Research on the identification of hateful content towards women focuses on detecting misogynistic or sexist content, also considering the different ways in which this type of hate can be expressed (e.g., by means of stereotypes, objectification, etc.). In particular, most work on sexism/misogyny detection focuses on text (mostly tweets) <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13]</ref>, and only in recent years has it expanded to include multimodal content such as memes <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref>. A first attempt to counter sexist memes was proposed in <ref type="bibr" target="#b13">[14]</ref>, where both unimodal and multimodal approaches were investigated to evaluate the contribution of the textual and visual modalities to sexism identification. Similarly, the authors of <ref type="bibr" target="#b14">[15]</ref> evaluated the information contributed by both modalities, finding the visual component to be more informative.</p><p>As a consequence of the growing attention of researchers in this area, dedicated challenges and benchmark datasets have been proposed. An initial dataset, proposed in <ref type="bibr" target="#b16">[17]</ref>, is composed of 800 memes gathered from the most popular social media platforms and labeled both by domain experts and by annotators from a crowdsourcing platform. A similar benchmark was built for the MAMI shared task at SemEval 2022 <ref type="bibr" target="#b17">[18]</ref>. 
This benchmark comprises 11k memes split into training and test sets, enabling the investigation of two tasks: (i) the identification of misogynistic memes, and (ii) the recognition of the type of misogyny among Shaming, Stereotype, Objectification, and Violence. Most works in this context relied on pre-trained models <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b15">16]</ref> and/or investigated ensemble strategies <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b23">24,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b25">26]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Learning with Disagreements.</head><p>Regarding the learning with disagreements paradigm, to the best of our knowledge, EXIST 2024 <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref> represents the first attempt at multimodal sexism detection in memes that incorporates perspectivism. Previous works under the learning with disagreements paradigm considered only textual content. In this area, the main contributions are <ref type="bibr" target="#b26">[27]</ref> and the previous edition of EXIST <ref type="bibr" target="#b27">[28]</ref>. In particular, the Learning With Disagreements (LeWiDi) challenge <ref type="bibr" target="#b26">[27]</ref> proposed four datasets with different characteristics in terms of types, languages, goals, and annotation methods, providing for each instance both a hard label (an aggregated hateful/non-hateful label) and a soft label (representing the agreement among annotators). Similarly, the previous edition of EXIST <ref type="bibr" target="#b27">[28]</ref> aimed at capturing sexism in all its forms under the learning with disagreements paradigm, with several tasks addressing sexism identification at different granularities and from different perspectives. The approaches proposed by participants in these challenges mostly relied on fine-tuned pre-trained models and/or ensemble methods <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b29">30,</ref><ref type="bibr" target="#b27">28]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Task Description</head><p>The sEXism Identification in Social neTworks task at CLEF 2024 <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref> aims to address sexism identification across the broad spectrum of its manifestations in social networks. Unlike previous editions of the challenge <ref type="bibr" target="#b27">[28]</ref>, which focused solely on detecting and classifying sexist textual messages, the current edition also includes new tasks that address the same problems for a different form of representation: memes. As in the previous edition, the challenge embraces the Learning with Disagreements paradigm. For both tweets and memes, three tasks address sexism identification at different granularities: (i) the identification of sexist messages, (ii) the identification of the author's intention, and (iii) sexism categorization. This paper focuses only on Task 4, a binary classification task that consists of deciding whether or not a given meme is sexist.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Dataset</head><p>The meme dataset proposed for EXIST 2024 is composed of more than 3000 memes per language (English and Spanish). The memes were collected via Google Images search queries built from 250 terms with varying degrees of use in both sexist and non-sexist contexts, all centered around women. Examples of memes are shown in Figure <ref type="figure" target="#fig_0">1</ref>. The challenge approaches the sexism identification task from the perspective of the Learning with Disagreements paradigm. For each meme, it provides the labels and personal metadata (e.g., age, gender, etc.) of six annotators, which gives rise to two evaluation settings:</p><p>• Hard Evaluation: System performance is evaluated against a hard label derived from the majority class among the annotators' labels. • Soft Evaluation: System performance is evaluated through a soft-soft evaluation that considers the probability distribution derived from the set of human annotators.</p><p>Unlike the tweet-based tasks, the meme-based Task 4 did not come with a predefined validation set. Therefore, to conduct our hyperparameter optimization experiments, we partitioned the official training split to create a validation set. The official training set comprises 4044 samples, evenly balanced between sexist and non-sexist instances across both languages. We held out 20% of these samples for validation and used the remaining data to train our models. Once we identified the best-performing settings, we trained our models on the entire official training set for submission. The results discussed in Section 4 refer to the performance of these models on the official test set of the challenge.</p></div>
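<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two evaluation settings and the validation hold-out can be illustrated with a minimal Python sketch that derives a hard and a soft label from six annotations and performs an 80/20 split of the training samples. The label strings "YES"/"NO", the function names, the treatment of 3-3 ties (returned as None), and the seeded, unstratified shuffle are illustrative assumptions, not the official EXIST data format or our exact partitioning code.</p>
```python
import random
from collections import Counter

# Hypothetical label strings; the official release format may differ.
POSITIVE, NEGATIVE = "YES", "NO"


def hard_label(annotations):
    """Majority class among the annotators' labels.

    With six annotators a 3-3 tie is possible; returning None for ties is an
    assumption of this sketch, not a documented challenge rule.
    """
    counts = Counter(annotations)
    if counts[POSITIVE] == counts[NEGATIVE]:
        return None
    return POSITIVE if counts[POSITIVE] > counts[NEGATIVE] else NEGATIVE


def soft_label(annotations):
    """Probability distribution over the two classes given by the annotators."""
    n = len(annotations)
    return {label: annotations.count(label) / n for label in (POSITIVE, NEGATIVE)}


def train_val_split(samples, val_fraction=0.2, seed=0):
    """Seeded random hold-out split (stratification is not assumed here)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


votes = ["YES", "YES", "YES", "YES", "NO", "NO"]
print(hard_label(votes), soft_label(votes))
train_ids, val_ids = train_val_split(range(4044))
print(len(train_ids), len(val_ids))
```
<p>For instance, four "YES" and two "NO" votes yield the hard label "YES" and a soft label of 4/6 for the sexist class; applied to the 4044 official training samples, the split holds out 809 memes for validation.</p></div>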
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Model Architecture</head><p>This section describes the modules that compose the cross-lingual, multi-modal, Transformer-based model proposed in this paper, whose overall architecture is depicted in Figure <ref type="figure" target="#fig_1">2</ref>.</p><p>Input Data. The dataset provides both the raw image of each meme and its corresponding text. However, this text is not necessarily directly or clearly related to the content of the image; in many cases, the relationship depends on subtle and complex social and contextual details. Therefore, to further inform the model, we also incorporated a caption of the image content as an additional input. We used the state-of-the-art BLIP<ref type="foot" target="#foot_0">1</ref>  <ref type="bibr" target="#b30">[31]</ref> model to generate these captions automatically. Consequently, our model processes three types of input data: the raw image, the OCR-based text, and the image caption.</p><p>Feature Extraction Frontend. We used pre-trained, state-of-the-art multilingual models for feature extraction, namely mBERT<ref type="foot" target="#foot_1">2</ref>  <ref type="bibr" target="#b31">[32]</ref> and mCLIP<ref type="foot" target="#foot_2">3</ref>  <ref type="bibr" target="#b32">[33]</ref>. These or similar models have been widely studied for sexism detection <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b15">16]</ref> because of the multimodal nature of memes, in which image and text are closely related or complementary, which supports our choice. 
mBERT was used to extract feature embeddings for the OCR-based text and the image caption separately, while all three types of input data, including the raw image, were processed by mCLIP to obtain their corresponding latent feature representations. As a result, we obtained five different input modalities. Note that mBERT and mCLIP remained frozen during training, so they were not adapted to the task. The rationale behind this decision was not only to lower the computational cost of our approach, but also to investigate how robust these general, task-agnostic, pre-trained representations are for sexism detection without fine-tuning.</p></div>
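<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mapping from the three inputs to the five feature modalities can be made explicit with the following sketch. To keep it runnable without downloading any model, the encoders are injected as plain callables; in practice they would wrap the frozen mBERT and mCLIP checkpoints listed in the footnotes. The function name, the modality keys, and the dummy encoders are hypothetical.</p>
```python
from typing import Callable, Dict, List

Vector = List[float]
TextEncoder = Callable[[str], Vector]
ImageEncoder = Callable[[object], Vector]


def build_modalities(image: object, ocr_text: str, caption: str,
                     mbert_text: TextEncoder, mclip_text: TextEncoder,
                     mclip_image: ImageEncoder) -> Dict[str, Vector]:
    """Return the five frozen feature vectors used as model inputs.

    mBERT encodes the two textual inputs; mCLIP encodes all three inputs.
    """
    return {
        "mbert_ocr": mbert_text(ocr_text),
        "mbert_caption": mbert_text(caption),
        "mclip_image": mclip_image(image),
        "mclip_ocr": mclip_text(ocr_text),
        "mclip_caption": mclip_text(caption),
    }


# Hypothetical stand-ins for the frozen pre-trained encoders (no downloads).
def dummy_text_encoder(text: str) -> Vector:
    return [float(len(text)), float(text.count(" "))]


def dummy_image_encoder(image: object) -> Vector:
    return [0.0, 1.0]


features = build_modalities(
    image=None,
    ocr_text="example meme text",
    caption="a person holding a sign",
    mbert_text=dummy_text_encoder,
    mclip_text=dummy_text_encoder,
    mclip_image=dummy_image_encoder,
)
print(sorted(features))
```
<p>In a PyTorch implementation, keeping the extractors frozen typically amounts to calling <code>requires_grad_(False)</code> on their modules, or computing the features once under <code>torch.no_grad()</code> and caching them, which is also where the reduced computational cost comes from.</p></div>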
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Random Modality Masking.</head><p>We also employed a modality masking strategy. By randomly masking one of the five input modalities, the model is forced to extract relevant information from all modalities, which prevents complete reliance on any single modality. This approach also enhances the model's robustness in scenarios where a modality might be absent. The effectiveness of this strategy is supported by our experimental results. Note that masking is applied stochastically, so not every sample has one of its input modalities masked.</p><p>Transformer Encoder Backbone. The extracted feature representations are first processed by a 1D batch normalization layer followed by a linear layer that projects all feature embeddings into a 256-dimensional space. These projected features are then conditioned via learnable embeddings that enable the model to differentiate between the five input modalities and their corresponding languages. This conditioning strategy, inspired by numerous works in natural language processing <ref type="bibr" target="#b31">[32]</ref>, allows the model to handle multiple languages and modalities simultaneously within a unified framework. Specifically, all conditioned input modalities are concatenated into a sequence and processed by an encoder backbone based on the Transformer architecture <ref type="bibr" target="#b33">[34]</ref>, where the model performs what can be regarded as a soft cross-attention between modalities. Finally, the class prediction is obtained through average pooling of the encoder output sequence, followed by a linear projection.</p></div>
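<div xmlns="http://www.tei-c.org/ns/1.0"><p>Three steps of the backbone described above (random modality masking, conditioning with modality and language embeddings, and average pooling followed by a linear classifier) can be sketched on plain Python lists. The sketch rests on assumptions not stated in the text: a masked modality is replaced by a zero vector, the conditioning embeddings are added to the features (as segment embeddings are added in BERT) rather than concatenated, and the Transformer encoder itself is omitted, so pooling is applied directly to the conditioned sequence. All function names are hypothetical.</p>
```python
import random
from typing import Dict, List

Vector = List[float]


def mask_random_modality(features: Dict[str, Vector], p: float,
                         rng: random.Random) -> Dict[str, Vector]:
    """With probability p, replace one randomly chosen modality by zeros.

    Zero-filling is an assumption. Otherwise the input is returned unchanged
    (as a copy), so not every sample has a masked modality.
    """
    masked = {name: list(vec) for name, vec in features.items()}
    if rng.random() >= p:  # no masking for this sample
        return masked
    victim = rng.choice(sorted(masked))
    masked[victim] = [0.0] * len(masked[victim])
    return masked


def condition(projected: Dict[str, Vector], language: str,
              modality_emb: Dict[str, Vector],
              language_emb: Dict[str, Vector]) -> List[Vector]:
    """Add a modality embedding and a language embedding to each projected
    feature, returning the sequence that would be fed to the encoder."""
    sequence = []
    for name in sorted(projected):
        sequence.append([x + m + g for x, m, g in
                         zip(projected[name], modality_emb[name], language_emb[language])])
    return sequence


def average_pool(sequence: List[Vector]) -> Vector:
    """Mean over the sequence (modality) dimension."""
    return [sum(column) / len(sequence) for column in zip(*sequence)]


def linear(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """y = W x + b, with one row of W per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


rng = random.Random(0)
projected = {"mclip_image": [1.0, 2.0], "mbert_ocr": [3.0, 4.0]}
masked = mask_random_modality(projected, p=1.0, rng=rng)
seq = condition(masked, "es",
                modality_emb={"mclip_image": [0.1, 0.1], "mbert_ocr": [0.2, 0.2]},
                language_emb={"en": [0.0, 0.0], "es": [0.5, -0.5]})
logits = linear(average_pool(seq), weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(masked, logits)
```
<p>In this sketch, a masked modality still receives its modality and language embeddings, so the encoder could in principle tell which modality is missing; whether the original implementation behaves this way is not stated in the text.</p></div>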
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Implementation Details</head><p>All our models were developed using the open-source PyTorch library, together with the corresponding HuggingFace pre-trained models for feature extraction. Experiments were conducted on a GeForce RTX 2080 GPU with 8 GB of memory.</p><p>Model Architecture. Hyperparameter optimization was conducted to determine the best-performing model architecture for both the hard- and soft-label evaluation settings. This process involved exploring a range of learning rates, numbers of encoder layers, attention heads, and feed-forward layer sizes. For the hard-label setting, the optimal architecture was a 1-layer encoder with a self-attention module of 8 attention heads and a modality masking probability of 0.8. For the soft-label setting, a 4-layer encoder with 4 attention heads and a modality masking probability of 0.7 yielded the best results. In both cases, the latent dimension of the Transformer encoder was set to 256, while the inner hidden representation of its feed-forward modules was 2048-dimensional.</p><p>Training Settings. In all our experiments, we used the AdamW optimizer <ref type="bibr" target="#b34">[35]</ref> and a one-cycle linear scheduler with a batch size of 16 samples. Based on our preliminary experiments, we determined the optimal settings for each evaluation setting. For hard-label evaluation, the model was trained for 5 epochs with a learning rate of 0.0002. For soft-label evaluation, the best results were achieved by training the model for 7 epochs with a learning rate of 0.0005. In both cases, the dropout rate was set to 0.1, and the cross-entropy loss was used as the objective function.</p><p>Evaluation Metrics. All reported metrics were computed using the official PyEvALL library<ref type="foot" target="#foot_3">4</ref>. Different metrics were used depending on the type of labels considered. 
Following the challenge instructions, we employed the F1-score, ICM, and ICM-Norm metrics for hard-label evaluation, and the cross entropy, ICM-Soft, and ICM-Soft-Norm metrics for soft-label evaluation.</p><p>Soft-Label Evaluation Postprocessing. As described in Subsection 3.2, each sample of the dataset was annotated by six annotators. Consequently, the ground-truth probabilities can take only one of seven possible values (k/6, with k = 0, ..., 6). Therefore, in soft-label settings, the outputs predicted by our models were rounded to the closest of these admissible values.</p></div>
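<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reported settings and the soft-label postprocessing can be summarized in a short sketch. The dataclass only collects the hyperparameters stated in this section; the rounding function snaps a predicted probability for the sexist class to the nearest of the seven admissible values k/6; and the cross-entropy function is a simple illustrative version, not the PyEvALL implementation, whose exact conventions (e.g., logarithm base or clipping) may differ. The names and the clipping constant are assumptions.</p>
```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Best settings reported in the text for one evaluation setting."""
    encoder_layers: int
    attention_heads: int
    masking_prob: float
    epochs: int
    learning_rate: float
    d_model: int = 256
    ffn_dim: int = 2048
    dropout: float = 0.1
    batch_size: int = 16


HARD = ExperimentConfig(encoder_layers=1, attention_heads=8, masking_prob=0.8,
                        epochs=5, learning_rate=2e-4)
SOFT = ExperimentConfig(encoder_layers=4, attention_heads=4, masking_prob=0.7,
                        epochs=7, learning_rate=5e-4)

N_ANNOTATORS = 6


def round_to_annotator_grid(p: float, n: int = N_ANNOTATORS) -> float:
    """Clip p to [0, 1] and snap it to the nearest admissible value k/n.

    Python's round() resolves exact halves to the nearest even integer.
    """
    p = min(max(p, 0.0), 1.0)
    return round(p * n) / n


def cross_entropy(gold, pred, eps: float = 1e-12) -> float:
    """Cross entropy (natural log) between a gold and a predicted distribution.

    Predicted probabilities are clipped at eps to avoid log(0).
    """
    return -sum(g * math.log(max(q, eps)) for g, q in zip(gold, pred))


q = round_to_annotator_grid(0.7)  # snapped probability of the sexist class
print(q, cross_entropy([4 / 6, 2 / 6], [q, 1 - q]))
```
<p>One consequence of this postprocessing is worth noting: snapping a prediction to exactly 0 or 1 makes the cross entropy very large whenever the gold distribution disagrees, which is why the sketch clips probabilities before taking the logarithm.</p></div>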
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Proposed Approaches</head><p>In this work, we proposed three different approaches to the automatic detection of sexism in memes for both the hard- and soft-label evaluation scenarios. Similar to previous studies <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b23">24,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b25">26]</ref>, we explored not only a single-model approach but also two model ensemble strategies, aiming at a more stable and robust solution:</p><p>• Single Model (SM). This approach relies on a single model to obtain the final predictions. For each evaluation setting, we consider the corresponding best-performing model architecture determined through the hyperparameter optimization process described above.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>• Majority Voting Ensemble (MVE).</head><p>To provide a more stable and robust approach that does not depend on a single training run, we adopted a model ensemble strategy. We trained five models using the best-performing architecture, each initialized with a different random seed, and aggregated their predictions via majority voting. For soft-label evaluation settings, we converted each model's continuous soft-label predictions into discrete classes by selecting the class with the highest predicted value.</p><p>• Average Probability Ensemble (APE). Similarly, this strategy involves training five models with different seeds based on the best-performing architecture. However, instead of working with discrete classes, we consider the raw logits predicted by the models and aggregate them via an unweighted average. From these averaged scores, we obtain the discrete class by selecting the class with the highest value.</p></div>
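<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two ensemble strategies can be contrasted with a minimal per-sample sketch. Each model contributes a list of class scores (for instance, [non-sexist, sexist]); MVE lets each model vote for its top-scoring class, whereas APE averages the scores before taking the argmax. The function names are hypothetical, and the sketch averages whatever scores it receives, so it does not distinguish between averaging logits and averaging softmax probabilities, which can give different results.</p>
```python
from collections import Counter
from typing import List, Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (the first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def majority_voting(per_model_scores: List[Sequence[float]]) -> int:
    """MVE: each model votes for its top class; the most frequent class wins."""
    votes = Counter(argmax(s) for s in per_model_scores)
    return votes.most_common(1)[0][0]


def average_probability(per_model_scores: List[Sequence[float]]) -> int:
    """APE: unweighted mean of the models' scores, followed by argmax."""
    n_models = len(per_model_scores)
    mean = [sum(col) / n_models for col in zip(*per_model_scores)]
    return argmax(mean)


scores = [
    [0.45, 0.55], [0.45, 0.55], [0.45, 0.55],  # three mildly confident models
    [0.95, 0.05], [0.95, 0.05],                # two highly confident models
]
mve = majority_voting(scores)       # 1: three of five models vote for class 1
ape = average_probability(scores)   # 0: mean scores are about [0.65, 0.35]
print(mve, ape)
```
<p>With five models and two classes, majority voting cannot tie. The example is constructed so that the two strategies disagree: a narrow majority of mildly confident models wins the vote, while two highly confident models dominate the average.</p></div>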
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results &amp; Discussion</head><p>Preliminary hyperparameter optimization experiments were carried out using the validation split described in Subsection 3.2. Once we determined the best-performing settings for each one of our three proposed approaches, we used the entire training set for final model estimation and submission in both hard-and soft-label evaluation scenarios. Tables <ref type="table">1, 2</ref>, and 3 compare our proposed approaches to the challenge baselines on the official test set in the context of the EXIST2024 Task 4. These challenge baselines are described as follows:</p><p>• Gold. Given that the ICM measure is unbounded, this baseline provides the best possible reference by using an oracle that perfectly predicts the ground truth. • Majority Class. A non-informative baseline that classifies all instances according to the majority class based on the six annotators. • Minority Class. A non-informative baseline that classifies all instances according to the minority class based on the six annotators.</p><p>Note that the ICM-Norm, both for hard-and soft-label scenarios, is considered the official evaluation metric of the challenge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Overall Results Analysis</head><p>Table <ref type="table">1</ref> compares the overall performance of our submitted approaches on the official test set, considering both English and Spanish instances. Our best-performing approaches achieved 10th place in the soft-label evaluation and 20th place in the hard-label evaluation. In general terms, the majority voting ensemble (MVE) stands out as the most robust approach. However, the performance of the single model (SM) does not differ substantially, suggesting that this simpler approach might be more appealing for deployment, as it does not require combining multiple model decisions and incurring their associated time cost. Furthermore, the SM approach provided the best hard-label results in terms of ICM-Norm. That ensembles pay off mainly in the soft-label setting reflects the greater complexity of soft-label prediction and its reliance on model ensembles to offer a more accurate outcome.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Overall analysis of our proposed approaches for EXIST2024 Task 4 for both English and Spanish instances. Results reported for the test set, using variants of ICM (↑), F1-score (↑), and cross entropy (↓) as evaluation metrics. Best results among our approaches are highlighted in bold for each evaluation metric.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Language-Specific Results Analysis</head><p>Tables <ref type="table">2 and 3</ref> compare the performance of our submitted approaches on the official test set for the English and Spanish instances, respectively. For English, our best-performing approaches achieved 5th place in the soft-label evaluation and 13th place in the hard-label evaluation. For Spanish, we achieved 11th and 21st places in the soft- and hard-label settings, respectively. One of the most notable findings is not only the performance gap between the two languages, but also the fact that the best-performing approach varies depending on whether we are dealing with English or Spanish memes. While English mostly benefits from model ensembles, Spanish shows a substantial improvement in the hard-label scenario with a single-model approach. In the soft-label settings, both languages show the same reliance on ensembles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Overall analysis of our proposed approaches for EXIST2024 Task 4 for the English language. Results reported for the test set, using variants of ICM (↑), F1-score (↑), and cross entropy (↓) as evaluation metrics. Best results among our approaches are highlighted in bold for each evaluation metric.</p><p>One possible explanation for the performance gap between English and Spanish is the imbalance in the amount of data available for each language when our pre-trained, state-of-the-art feature extractors were estimated. Although these models were designed to be multilingual, they may have been better optimized for English because of its abundance of data. Consequently, when applied to Spanish memes, these models may not generalize as effectively, resulting in lower performance. This discrepancy highlights the need for further research on the challenges posed by multiple languages in automatic sexism detection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Overall analysis of our proposed approaches for EXIST2024 Task 4 for the Spanish language. Results reported for the test set, using variants of ICM (↑), F1-score (↑), and cross entropy (↓) as evaluation metrics. Best results among our approaches are highlighted in bold for each evaluation metric.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>In this work, we proposed a cross-lingual and multi-modal Transformer-based approach for sexism identification under the paradigm of learning with disagreements. Although the ensemble methods did not yield a significant improvement in performance, future work could investigate aggregating models with different input configurations, as well as more sophisticated ensemble strategies such as Bayesian Model Averaging (BMA) <ref type="bibr" target="#b35">[36]</ref>. Finally, the discrepancy in performance between languages highlights the need for further research to effectively handle multiple languages in automatic sexism detection.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Examples of memes in the EXIST meme dataset, showcasing the inherent challenges of the task. En and Sp refer to English and Spanish, respectively.</figDesc><graphic coords="3,292.34,340.13,108.31,99.21" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Our proposed Cross-Lingual, Multi-Modal Transformer-based architecture extracts high-level features using large-scale, pre-trained models that are kept frozen during training. These features are normalized and projected into the same dimensional space. Next, they are conditioned on the language of the sample and its modality before being processed by a Transformer encoder backbone. The final classification is predicted through average pooling and a linear projection.</figDesc><graphic coords="4,72.00,125.21,451.26,212.30" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/Salesforce/blip-image-captioning-large</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://huggingface.co/google-bert/bert-base-multilingual-cased</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://huggingface.co/sentence-transformers/clip-ViT-B-32-multilingual-v1</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://github.com/UNEDLENAR/PyEvALL</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We acknowledge the support of the PNRR ICSC National Research Centre for High Performance Computing, Big Data and Quantum Computing (CN00000013), under the NRRP MUR program funded by the NextGenerationEU. The work of D. Gimeno-Gómez and C.-D. Martínez Hinarejos was partially supported by Grant CIACIF/2021/295 funded by Generalitat Valenciana and by Grant PID2021-124719OB-I00 under project LLEER funded by MCIN/AEI/10.13039/501100011033/ and by ERDF, EU "A way of making Europe".</p></div>
			</div>


			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>https://www.unimib.it/giulia-rizzi (G. Rizzi); https://www.prhlt.upv.es/david-gimeno/ (D. Gimeno-Gómez); https://www.unimib.it/elisabetta-fersini (E. Fersini); https://personales.upv.es/carmarhi/ (Carlos-D. Martínez-Hinarejos); 0000-0002-0619-0760 (G. Rizzi); 0000-0002-7375-9515 (D. Gimeno-Gómez); 0000-0002-8987-100X (E. Fersini); 0000-0002-6139-2891 (Carlos-D. Martínez-Hinarejos)</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The origin and value of disagreement among data labelers: A case study of individual differences in hate speech annotation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Sang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Stanton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Information for a Better World: Shaping the Global Future: 17th International Conference, iConference 2022, Virtual Event</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2022-03-04">February 28-March 4, 2022</date>
			<biblScope unit="page" from="425" to="444" />
		</imprint>
	</monogr>
	<note>Proceedings, Part I</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Why don&apos;t you do it right? Analysing annotators&apos; disagreement in subjective tasks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sandri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Leonardelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tonelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Jezek</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.eacl-main.178</idno>
		<ptr target="https://aclanthology.org/2023.eacl-main.178" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Vlachos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Augenstein</surname></persName>
		</editor>
		<meeting>the 17th Conference of the European Chapter of the Association for Computational Linguistics<address><addrLine>Dubrovnik, Croatia</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="2428" to="2441" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Social consequences of disparagement humor: A prejudiced norm theory</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">E</forename><surname>Ford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Ferguson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Personality and Social Psychology Review</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="79" to="94" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">More than &quot;just a joke&quot;: The prejudice-releasing function of sexist humor</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">E</forename><surname>Ford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">F</forename><surname>Boxer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Armstrong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Edel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Personality and Social Psychology Bulletin</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="159" to="170" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Social or individual disagreement? Perspectivism in the annotation of sexist jokes</title>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fontanella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Labadie-Tamayo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">3494</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Everybody hurts, sometimes overview of hurtful humour at IberLEF 2023: Detection of humour spreading prejudice in Twitter</title>
		<author>
			<persName><forename type="first">R</forename><surname>Labadie-Tamayo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Chulvi-Ferriols</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="page" from="383" to="395" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Overview of EXIST 2024 – Learning with Disagreement for Sexism Identification and Characterization in Social Networks and Memes (Extended Overview)</title>
		<author>
			<persName><forename type="first">Laura</forename><surname>Plaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jorge</forename><surname>Carrillo-De-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Víctor</forename><surname>Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alba</forename><surname>Maeso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Berta</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Enrique</forename><surname>Amigó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julio</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roser</forename><surname>Morante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Damiano</forename><surname>Spina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 – Conference and Labs of the Evaluation Forum</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>Herrera</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Overview of EXIST 2024 – Learning with Disagreement for Sexism Identification and Characterization in Social Networks and Memes</title>
		<author>
			<persName><forename type="first">L</forename><surname>Plaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-De-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maeso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Amigó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Morante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Spina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">How do we study misogyny in the digital age? A systematic literature review using a computational linguistic approach</title>
		<author>
			<persName><forename type="first">L</forename><surname>Fontanella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ignazzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sarra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tontodimamma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Humanities and Social Sciences Communications</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="1" to="15" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Automatic identification and classification of misogynistic language on Twitter</title>
		<author>
			<persName><forename type="first">M</forename><surname>Anzovino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Applications of Natural Language to Information Systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="57" to="64" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Regularising LSTM classifier by transfer learning for detecting misogynistic tweets with small training set</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Bashar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nayak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Suzor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge and Information Systems</title>
		<imprint>
			<biblScope unit="volume">62</biblScope>
			<biblScope unit="page" from="4029" to="4054" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Transfer learning from multilingual DeBERTa for sexism identification</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">T</forename><surname>Ta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">B S</forename><surname>Rahman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Najjar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">3202</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Enhancing the detection of misogynistic content in social media by transferring knowledge from song phrases</title>
		<author>
			<persName><forename type="first">R</forename><surname>Calderón-Suarez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Ortega-Mendoza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-Y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Toxqui-Quitl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Márquez-Vera</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="13179" to="13190" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Detecting sexist MEME on the Web: A study on textual and visual cues</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gasparini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Corchs</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="226" to="231" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">O</forename><surname>Sabat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Ferrer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">G</forename><surname>Nieto</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.02334</idno>
		<title level="m">Hate speech in pixels: Detection of offensive memes towards automatic moderation</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Recognizing misogynous memes: Biased models and tricky archetypes</title>
		<author>
			<persName><forename type="first">G</forename><surname>Rizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gasparini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Saibene</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="page">103474</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Benchmark dataset of memes with text transcriptions for automatic detection of multi-modal misogynistic content</title>
		<author>
			<persName><forename type="first">F</forename><surname>Gasparini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Saibene</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data in Brief</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="page">108526</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">SemEval-2022 task 5: Multimedia automatic misogyny identification</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gasparini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Saibene</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lees</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sorensen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)</title>
				<meeting>the 16th International Workshop on Semantic Evaluation (SemEval-2022)<address><addrLine>Seattle, United States</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="533" to="549" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">DD-TIG at SemEval-2022 Task 5: Investigating the relationships between multimodal and unimodal information in misogynous memes detection and classification</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 16th International Workshop on Semantic Evaluation</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="563" to="570" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">RIT boston at SemEval-2022 task 5: Multimedia misogyny detection by using coherent visual and language features from CLIP model and data-centric AI principle</title>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">W</forename><surname>Chou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 16th International Workshop on Semantic Evaluation</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="636" to="641" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">TIB-VA at SemEval-2022 task 5: A multimodal architecture for the detection and classification of misogynous memes</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hakimov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Cheema</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ewerth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 16th International Workshop on Semantic Evaluation</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="756" to="760" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">PAIC at SemEval-2022 Task 5: Multi-Modal Misogynous Detection in MEMES with Multi-Task Learning And Multi-model Fusion</title>
		<author>
			<persName><forename type="first">Jin</forename><surname>Mei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhi</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Zhou</forename><surname>Mengyuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mengfei</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dou</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiyang</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lianxin</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yang</forename><surname>Mo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaofeng</forename><surname>Shi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 16th International Workshop on Semantic Evaluation</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="555" to="562" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Bias mitigation in misogynous meme recognition: A preliminary study</title>
		<author>
			<persName><forename type="first">G</forename><surname>Balducci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">3596</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">RubCSG at SemEval-2022 Task 5: Ensemble learning for identifying misogynous memes</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Boenninghoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Röhrig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kolossa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th International Workshop on Semantic Evaluation</title>
				<meeting>the 16th International Workshop on Semantic Evaluation (SemEval-2022)</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="626" to="635" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">taochen at SemEval-2022 Task 5: Multimodal multitask learning and ensemble learning</title>
		<author>
			<persName><forename type="first">C</forename><surname>Tao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Kim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th International Workshop on Semantic Evaluation</title>
				<meeting>the 16th International Workshop on Semantic Evaluation (SemEval-2022)</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="648" to="653" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">TechSSN at SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification using Deep Learning Models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Sivanaiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Deborah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Rajendram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">T</forename><surname>Mirnalinee</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.semeval-1.78</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th International Workshop on Semantic Evaluation</title>
				<meeting>the 16th International Workshop on Semantic Evaluation (SemEval-2022)</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="571" to="574" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">SemEval-2023 Task 11: Learning with Disagreements (LeWiDi)</title>
		<author>
			<persName><forename type="first">E</forename><surname>Leonardelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Abercrombie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Almanea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Fornaciari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Plank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Rieser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Uma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Poesio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th International Workshop on Semantic Evaluation</title>
				<meeting>the 17th International Workshop on Semantic Evaluation (SemEval-2023)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="2304" to="2318" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Overview of EXIST 2023 – Learning with disagreement for sexism identification and characterization</title>
		<author>
			<persName><forename type="first">Laura</forename><surname>Plaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jorge</forename><surname>Carrillo-De-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roser</forename><surname>Morante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Enrique</forename><surname>Amigó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julio</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Damiano</forename><surname>Spina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="316" to="342" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">UMUTeam at SemEval-2023 Task 11: Ensemble learning applied to binary supervised classifiers with disagreements</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>García-Díaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Alcaráz-Mármol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Marín-Pérez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Valencia-García</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th International Workshop on Semantic Evaluation</title>
				<meeting>17th International Workshop on Semantic Evaluation (SemEval-2023)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1061" to="1066" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">AI-UPV at EXIST 2023 - Sexism Characterization Using Large Language Models Under the Learning with Disagreements Regime</title>
		<author>
			<persName><forename type="first">A</forename><surname>De Paula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Spina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">3497</biblScope>
			<biblScope unit="page" from="985" to="999" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hoi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning (ICML)</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="12888" to="12900" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Learning transferable visual models from natural language supervision</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hallacy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Goh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krueger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<ptr target="https://proceedings.mlr.press/v139/radford21a.html" />
	</analytic>
	<monogr>
		<title level="m">Proc. of the 38th International Conference on Machine Learning (ICML)</title>
				<meeting>38th International Conference on Machine Learning (ICML)<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">139</biblScope>
			<biblScope unit="page" from="8748" to="8763" />
		</imprint>
	</monogr>
	<note>Proc. of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems (NeurIPS)</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="6000" to="6010" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Decoupled Weight Decay Regularization</title>
		<author>
			<persName><forename type="first">I</forename><surname>Loshchilov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ICLR</title>
				<meeting>International Conference on Learning Representations (ICLR)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Sentiment analysis: Bayesian Ensemble Learning</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Messina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>Pozzi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Decision Support Systems</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<biblScope unit="page" from="26" to="38" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
