<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Multilingual Sexism Detection in Memes: A CLIP-Enhanced Machine Learning Approach</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Umera</forename><forename type="middle">Wajeed</forename><surname>Pasha</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">TK33</orgName>
								<orgName type="institution">University of Galway</orgName>
								<address>
									<addrLine>University Road</addrLine>
									<postCode>H91</postCode>
									<settlement>Galway</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Multilingual Sexism Detection in Memes: A CLIP-Enhanced Machine Learning Approach</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">EE03BD5FCF10174E2E0DA5EF86605F27</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Sexism detection</term>
					<term>Meme Analysis</term>
					<term>Machine Learning (ML)</term>
					<term>Contrastive Learning</term>
					<term>Learning with disagreement</term>
					<term>Multilingual Natural Language Processing (NLP)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this work, we apply machine learning approaches to the problem of sexism identification in memes. The study begins by loading and visualising a meme dataset, then pre-processing the images with techniques including cropping, scaling, and normalisation to prepare them for model training. Features are extracted with the pre-trained CLIP model, and the dataset is split into training and validation sets for memes in both Spanish and English. The extracted features are used to train and evaluate a range of machine learning models, including Logistic Regression, SVM, XGBoost, Decision Trees, Random Forest, a Neural Network, AdaBoost, and SGD, with performance assessed through accuracy scores, classification reports, and confusion matrices. The Random Forest model performed best of all. Its predictions of the presence of sexism in a test dataset are then written to a JSON file. The results highlight how well-trained models and sophisticated machine learning approaches can identify harmful content on social media, offering useful insights for future studies and practical applications that help create safer online spaces.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Social networks have become an essential communication tool in the current digital era, enabling people to express their ideas and opinions openly. This openness, however, has also enabled the spread of offensive material, including sexism, a form of gender-based discrimination that predominantly targets women. Because sexism on social media is so widespread, automated solutions must be developed to identify and remove such offensive content. To address this problem, the EXIST 2024 shared task challenges participants to develop models that can recognise sexist content in multilingual settings, specifically in Spanish and English <ref type="bibr" target="#b0">[1]</ref>.</p><p>The nuanced and context-dependent nature of the language involved makes automatic sexism detection difficult. Conventional machine learning techniques such as logistic regression and support vector machines (SVM) have been applied with varying degrees of effectiveness. More recently, Transformer-based models have shown stronger performance on natural language processing (NLP) tasks, including sexism detection; examples include BERT <ref type="bibr" target="#b1">[2]</ref>, <ref type="bibr" target="#b2">[3]</ref>, RoBERTa <ref type="bibr" target="#b3">[4]</ref>, and their multilingual variants. This work presents a method for identifying sexism in social networks by combining pre-trained embeddings with machine learning models. Key steps in the approach include loading and exploring the datasets, pre-processing images, extracting features with the CLIP model <ref type="bibr" target="#b4">[5]</ref>, splitting the datasets, and training and evaluating models. Memes were classified as sexist or non-sexist using a variety of machine learning methods, including AdaBoost, SVM, XGBoost, Decision Trees, Random Forest, Logistic Regression, and SGD <ref type="bibr" target="#b5">[6]</ref>. 
The best-performing model, Random Forest, was then used to predict whether sexism is present in each meme of a test dataset, and the outcomes were stored in a JSON file for further analysis.</p><p>The dataset of memes used in this study has been annotated for sexism. To ensure high-quality input for model training, the dataset is subjected to a thorough pre-processing protocol. By combining powerful machine learning models with sophisticated feature extraction techniques, the proposed method aims to support ongoing efforts to build more inclusive and safer online environments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>Detecting harmful and sexist content on social media has been a major field of study, with many different strategies and techniques put forward. Early on, sexism was frequently studied as a type of harassment or as a subcategory of hate speech. Some work classified tweets as racist, sexist, or neither using character-level and word n-grams with logistic regression, while other research applied conventional machine learning techniques such as Random Forests and Support Vector Machines (SVM) with TF-IDF representations and manually chosen features such as sentiment scores and Bag of Words (BoW) <ref type="bibr" target="#b0">[1]</ref>.</p><p>Deep learning has greatly improved performance on NLP tasks, including sexism detection. In particular, Transformer-based models like BERT <ref type="bibr" target="#b1">[2]</ref>, RoBERTa <ref type="bibr" target="#b3">[4]</ref>, and their multilingual variants (e.g., XLM-RoBERTa <ref type="bibr" target="#b2">[3]</ref>) have made a substantial contribution to this improvement <ref type="bibr">[7] [8]</ref>. The Transformer architecture enables these models to capture complex language semantics and context <ref type="bibr" target="#b8">[9]</ref>. For example, XLM-RoBERTa, trained on a large multilingual corpus, has shown strong performance in capturing multilingual contextual nuances, which makes it especially useful for tasks involving diverse languages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">System Overview</head><p>A number of crucial processes are involved in the proposed method for detecting sexism in memes: importing and exploring datasets, pre-processing images, extracting features using the CLIP model, partitioning datasets, training and evaluating models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset Loading and Exploration</head><p>The collection includes memes, as shown in Figure <ref type="figure">1</ref>, with sexism annotations in both Spanish and English <ref type="bibr" target="#b9">[10]</ref>. The first steps are to load and visualise the dataset in order to understand its structure and content; samples of memes with their associated labels are displayed to give a visual sense of the diversity and spread of the data. The EXIST 2024 Memes Dataset <ref type="bibr" target="#b10">[11]</ref>, <ref type="bibr" target="#b11">[12]</ref> comprises more than 5,000 labelled memes in English and Spanish: 4,044 memes for training and 1,053 for testing. The dataset ensures that the two languages are distributed evenly, which makes thorough multilingual analysis possible. Every meme is organised as a JSON object with detailed properties such as a unique identifier ("id_EXIST"), the meme's language ("lang"), and the text automatically extracted from the meme ("text"). The filename ("meme") and the file's path ("path_memes") are also included. The annotator data carefully documents the number of annotators, their unique identifiers, gender, age group, self-reported ethnicity, level of education, and country of residence. Multiple annotators label each meme to indicate whether or not it contains sexist expressions or behaviours, with possible labels "YES" or "NO". This extensive annotation offers a strong basis for developing and testing machine learning models designed to identify sexism in memes. This detailed view of the dataset, depicted in Table <ref type="table" target="#tab_0">1</ref>, demonstrates the structured approach to guaranteeing fair and thorough data coverage for both the training and testing phases.</p></div>
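The loading and exploration step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the fields "id_EXIST" and "lang" come from the dataset description, while the per-annotator label field name ("labels") and the file layout are assumptions for the example.

```python
import json
from collections import Counter

def load_memes(json_path):
    """Load the EXIST memes JSON file (assumed to map ids to records)."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    return list(data.values())

def summarise(records):
    """Count memes per language and derive a majority sexism label per meme.

    Each record carries per-annotator "YES"/"NO" votes; the majority vote
    gives one hard label per meme for exploratory statistics.
    """
    langs = Counter(r["lang"] for r in records)
    majority = {}
    for r in records:
        votes = Counter(r.get("labels", []))  # hypothetical field name
        if votes:
            majority[r["id_EXIST"]] = votes.most_common(1)[0][0]
    return langs, majority
```

A quick check on two toy records shows the language counts and majority labels a visualisation step would rely on.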
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Pre-processing</head><p>Preparing images for feature extraction and model training requires a crucial step called pre-processing, which guarantees consistency and ideal input quality. In the pre-processing stage, images undergo several critical transformations to ensure consistency and optimal input quality for the model. First,  the images are resized to a uniform dimension of 256 × 256 pixels, standardizing input sizes to reduce computational complexity and enhance processing performance. The images are then centrally cropped to 224 × 224 pixels, which helps eliminate extraneous background elements and focus on the main content of the memes. Following cropping, the pixel values are normalized to a range typically between 0 and 1, ensuring uniform feature scaling which accelerates the convergence process during training. The pre-processing pipeline also involves converting images to RGB format to maintain color consistency and handling any potential image loading errors. These meticulously crafted steps, performed using Python libraries such as PIL for image handling and Torchvision for transformations, are essential for meeting the input specifications of the CLIP model <ref type="bibr" target="#b4">[5]</ref>, which relies on consistently processed images for precise feature extraction.</p></div>
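The pre-processing steps above (resize to 256 × 256, centre-crop to 224 × 224, RGB conversion, scaling to [0, 1], and error handling) can be sketched with PIL alone; in practice a `torchvision.transforms.Compose` pipeline would perform the same operations as tensors.

```python
from PIL import Image

TARGET_RESIZE, TARGET_CROP = 256, 224

def preprocess(path_or_image):
    """Resize, centre-crop, convert to RGB, and scale pixels to [0, 1]."""
    try:
        img = (path_or_image if isinstance(path_or_image, Image.Image)
               else Image.open(path_or_image))
        img = img.convert("RGB")                       # enforce colour consistency
    except OSError:
        return None                                    # skip unreadable images
    img = img.resize((TARGET_RESIZE, TARGET_RESIZE))   # uniform input size
    left = top = (TARGET_RESIZE - TARGET_CROP) // 2    # centre-crop box
    img = img.crop((left, top, left + TARGET_CROP, top + TARGET_CROP))
    # Normalise pixel values to the [0, 1] range expected downstream.
    return [[tuple(c / 255.0 for c in img.getpixel((x, y)))
             for x in range(img.width)] for y in range(img.height)]
```

The output is a 224 × 224 grid of (r, g, b) floats in [0, 1], matching the dimensions the CLIP image encoder expects.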
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Feature Extraction</head><p>For feature extraction, we employ the state-of-the-art pre-trained model CLIP (Contrastive Language-Image Pre-training) <ref type="bibr" target="#b4">[5]</ref>, which is renowned for its effectiveness in encoding text and images into a common feature space. CLIP is able to understand and categorise complex multimodal input by using contrastive learning to align visual and textual representations. We extract high-dimensional feature vectors that capture the semantic content of the pre-processed images by feeding them into the CLIP model; these features then serve as inputs to the machine learning models. CLIP was selected for its strong ability to capture the subtle correlations between textual and visual data, which makes it especially suitable for applications such as meme classification, where both the text and the image carry crucial signal <ref type="bibr" target="#b4">[5]</ref>.</p></div>
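A batched extraction loop in the spirit of the above might look like this. The encoder is injected as a function (e.g. the `encode_image` method of a loaded CLIP model, such as `clip.load("ViT-B/32")` from OpenAI's `clip` package); that keeps the helper testable without model weights, and the exact loading call is an assumption, not a detail given in the paper.

```python
import numpy as np

def extract_features(images, encode_fn, batch_size=32):
    """Encode pre-processed images into fixed-size embedding vectors.

    `encode_fn` maps a batch of images to a batch of feature vectors
    (for CLIP, the image encoder output used as classifier input).
    """
    feats = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        feats.append(np.asarray(encode_fn(batch), dtype=np.float32))
    return (np.concatenate(feats, axis=0) if feats
            else np.empty((0, 0), dtype=np.float32))
```

The resulting (n_images × embedding_dim) matrix is what the classical models in Section 3.5 consume.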
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Dataset Splitting</head><p>To guarantee an even distribution of memes in Spanish and English, the dataset is carefully divided into training and validation sets. Stratified splitting is essential to keep the training data representative and to ensure that the models trained on it can generalise to new examples across languages. The training set is used to fit the models, while the validation set acts as an impartial benchmark for assessing how well the machine learning models perform. This process is necessary to estimate how effectively the models will function in practical situations and to tune hyperparameters without overfitting. In addition, the balanced distribution prevents the models from becoming biased in favour of any one language, improving their cross-linguistic applicability.</p></div>
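The stratification idea can be shown in a few lines of pure Python; in practice one would typically reach for scikit-learn's `train_test_split(..., stratify=...)`, so treat this as an illustrative sketch of the mechanism rather than the paper's implementation.

```python
import random
from collections import defaultdict

def stratified_split(items, key, val_fraction=0.2, seed=42):
    """Split items into train/validation sets, preserving the proportion
    of each stratum (here: the meme's language, or language x label)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)     # bucket by stratum
    train, val = [], []
    for group in groups.values():
        rng.shuffle(group)                 # randomise within each stratum
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Because each language bucket is split at the same ratio, the validation set mirrors the language balance of the full dataset.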
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Model Training and Evaluation</head><p>After feature extraction and dataset splitting, different machine learning models were trained to classify the memes as sexist or non-sexist. Logistic Regression, Support Vector Machines (SVM), XGBoost, Decision Trees, Random Forest, AdaBoost, Neural Networks, and Stochastic Gradient Descent (SGD) were among the models assessed. After training on the extracted features, each model is evaluated using confusion matrices, accuracy scores, and classification reports. These measurements offer a thorough assessment of each model's effectiveness, pointing out its strengths and weaknesses in detecting sexist content.</p><p>The Random Forest model outperformed the other models assessed, exhibiting the best classification accuracy and robustness, as shown in Figure <ref type="figure" target="#fig_2">6</ref>. Known for its ensemble learning method, which combines many decision trees, this model excels at handling complicated datasets and reducing overfitting. Having been identified as the top performer, the Random Forest model is then used to predict whether sexism is present in the test dataset. The predictions are stored in a JSON file for further examination, producing a structured output that is simple to interpret and to reuse in reports and further study.</p><p>Together, these steps of pre-processing, feature extraction, dataset splitting, model training, and evaluation provide a robust and comprehensive approach to the problem of sexism detection in memes. Every stage is carefully designed to maximise the effectiveness and applicability of the models, increasing the system's overall usefulness in practical scenarios. </p></div>
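The train/evaluate/export loop for the winning model can be sketched with scikit-learn. The hyperparameters and the output field names ("id", "value") are illustrative assumptions, not values stated in the paper.

```python
import json
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

def train_and_predict(X_train, y_train, X_val, y_val, X_test, test_ids):
    """Fit a Random Forest on CLIP features, report validation metrics,
    and serialise test predictions as JSON."""
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(X_train, y_train)
    val_pred = clf.predict(X_val)
    # The three evaluation artefacts used in the paper:
    print("accuracy:", accuracy_score(y_val, val_pred))
    print(classification_report(y_val, val_pred))
    print(confusion_matrix(y_val, val_pred))
    predictions = [{"id": i, "value": "YES" if p == 1 else "NO"}
                   for i, p in zip(test_ids, clf.predict(X_test))]
    return json.dumps(predictions)
```

The returned JSON string can be written to disk as the structured prediction file described above.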
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Overall Performance</head><p>In the comprehensive assessment over all instances, the system produced the following outcomes. Based on these results, the model ranked 36th among all participants. The system's capacity to </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Spanish Instances</head><p>The system's performance improved when tested on Spanish instances, demonstrating its capacity to handle multilingual data successfully: with these results, the model placed 30th in this category. The enhanced scores in the Spanish context suggest that the pre-processing and feature extraction strategies are especially beneficial for Spanish-language memes. The higher F1_YES score denotes a better balance between precision and recall when identifying sexist content in Spanish memes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">English Instances</head><p>In contrast, the evaluation on English instances highlighted areas for improvement in the system's performance: here, the model ranked 37th. The lower scores in the English context indicate the model's difficulties with English-language memes and point to possible improvements in the pre-processing or feature extraction for English content. The relatively lower F1_YES score indicates that a better balance between recall and precision is needed for this language.</p><p>The performance measures across the different scenarios highlight the strengths and weaknesses of this methodology. Although its results varied across languages, the Random Forest model, which proved the best-performing model during training and validation, exhibited resilient performance overall. The insights gained from these results can guide future improvements, focusing on customised feature extraction and pre-processing strategies that address the distinct characteristics of English and Spanish memes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Methodology Enhancement</head><p>The insights from the existing results make clear that refining the methods could greatly increase the resilience and efficacy of the sexism detection system. This section describes a number of possible improvements, with particular attention to sophisticated feature extraction methods, dynamic pre-processing pipelines, and hybrid approaches that combine multiple techniques for better outcomes.</p><p>1. Improved Feature Extraction: Sophisticated feature extraction methods can help capture the semantic and contextual details of memes, which frequently contain nuanced and intricate indicators of sexism. Using more advanced models and methods can improve the quality of the features retrieved from memes' textual and visual components. Models built to handle both visual and textual data, such as VisualBERT or ViLBERT <ref type="bibr" target="#b12">[13]</ref>, align visual and textual elements into a shared embedding space and can be integrated to provide a more thorough understanding of memes, enabling more accurate classification. Furthermore, state-of-the-art convolutional neural networks (CNNs) such as EfficientNet, which offers a scalable and effective architecture, can enhance feature extraction from images by capturing fine details and patterns that simpler models may overlook. For memes with rich text-image interactions, Graph Neural Networks (GNNs) can also model the relationships between the components of a meme, capturing the dependencies and contextual relationships that are essential for understanding the content.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Dynamic Pre-processing:</head><p>Preserving the consistency and quality of input features requires building dynamic pre-processing pipelines that can adjust to various data formats and language combinations. A flexible and versatile pre-processing architecture can handle the heterogeneity in meme formats and content more effectively. Adaptive scaling and cropping algorithms, for example using object detection to recognise and preserve the relevant portions of an image, can guarantee that important regions of the images are not discarded. Additionally, normalisation techniques that account for the particular characteristics of different languages, handling various alphabets, special characters, and idiomatic expressions with customised approaches, can improve text processing performance. Increasing the diversity of the training data through data augmentation techniques such as random cropping, rotation, and colour modifications can also help create more resilient models that perform better on previously unseen data.</p></div>
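The augmentation techniques named above (random cropping, rotation, colour modification) can be sketched with PIL; the crop ratio, rotation range, and brightness range below are arbitrary illustrative choices, not values from the paper.

```python
import random
from PIL import Image, ImageEnhance

def augment(img, rng=None):
    """Apply a random 90% crop, a small rotation, and a brightness
    adjustment to one PIL image, producing a new training sample."""
    rng = rng or random.Random()
    w, h = img.size
    cw, ch = int(w * 0.9), int(h * 0.9)                 # random 90% crop
    left, top = rng.randint(0, w - cw), rng.randint(0, h - ch)
    img = img.crop((left, top, left + cw, top + ch))
    img = img.rotate(rng.uniform(-10, 10), expand=False)  # small rotation
    img = ImageEnhance.Brightness(img).enhance(rng.uniform(0.8, 1.2))
    return img
```

Applying this once or twice per training meme multiplies the effective dataset size while keeping labels unchanged.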
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Hybrid Approaches:</head><p>Combining rule-based systems with machine learning models can handle edge cases more effectively and increase the system's overall accuracy. By uniting the best features of deterministic and probabilistic techniques, hybrid approaches can offer a more complete solution. Rule-based filters built on specified terms, phrases, or patterns indicative of sexist content can help identify explicit and obvious occurrences of sexism that machine learning models might overlook. Ensemble methods that combine the predictions of several models can improve overall performance by leveraging each model's strengths, increasing the system's robustness and accuracy. Contextual data, such as user interaction patterns and social network metadata, can also offer extra insight into the environment in which memes are shared and their possible effects, improving the detection of sexist content.</p></div>
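A minimal sketch of such a hybrid: a deterministic lexicon pass that overrides the models, falling back to a majority vote over the ensemble's predictions. The lexicon entry is a placeholder; a real system would use a curated list.

```python
from collections import Counter

def rule_filter(text, lexicon):
    """Deterministic pass: flag explicit matches against a curated lexicon."""
    return any(term in text.lower() for term in lexicon)

def hybrid_predict(text, model_votes, lexicon=("example_sexist_phrase",)):
    """A rule hit wins outright; otherwise take the majority of the
    machine learning models' "YES"/"NO" votes (a simple ensemble)."""
    if rule_filter(text, lexicon):
        return "YES"
    return Counter(model_votes).most_common(1)[0][0]
```

The rule layer catches explicit cases the statistical models might miss, while the vote smooths over individual model errors.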
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Future Work</head><p>To significantly enhance the efficacy and robustness of the sexism detection system, several advanced strategies and techniques can be explored. These improvements focus on various aspects of the model development lifecycle, from data augmentation and pre-processing to model architecture and evaluation metrics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.">Enhanced Data Augmentation</head><p>Using Generative Adversarial Networks (GANs) is one promising way to increase the model's robustness. GANs can produce realistic yet synthetic meme images to enrich the current dataset; by addressing class imbalance and broadening the pool of training samples, this strategy can improve the model's capacity to generalise across different forms of sexist material. Furthermore, the textual material within memes can be made more diverse by using textual data augmentation techniques such as synonym replacement, paraphrasing, and back-translation. By ensuring that the model learns robust features from a variety of linguistic expressions, these techniques further raise the model's accuracy.</p></div>
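Synonym replacement, the simplest of the textual techniques mentioned, can be sketched as follows; the toy synonym table is purely illustrative (a real pipeline would draw on a thesaurus such as WordNet).

```python
import random

SYNONYMS = {"funny": ["amusing", "hilarious"]}  # toy lookup table

def synonym_replace(text, seed=0, p=1.0):
    """Replace words that have known synonyms with probability p,
    yielding a paraphrased variant of the input text."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        out.append(rng.choice(options) if options and rng.random() < p else word)
    return " ".join(out)
```

Back-translation and paraphrasing follow the same pattern at the sentence level, producing label-preserving textual variants.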
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.">Advanced Model Architectures</head><p>Text analysis performance in the system can be greatly improved by using transformer-based models such as BERT <ref type="bibr" target="#b1">[2]</ref>, RoBERTa <ref type="bibr" target="#b3">[4]</ref>, and XLM-R. These models are particularly good at capturing contextual subtleties and intricate language semantics, which are essential for identifying nuanced instances of sexism. Additionally, multimodal transformers that incorporate textual and visual inputs, such as VisualBERT or ViLBERT <ref type="bibr" target="#b12">[13]</ref>, can offer a comprehensive meme analysis. Predictions from several models can also be combined using ensemble techniques such as stacking and blending <ref type="bibr" target="#b6">[7]</ref>. By combining the advantages of several models, this method lowers the risk of overfitting while improving prediction accuracy. Pre-trained models such as VGG <ref type="bibr" target="#b13">[14]</ref>, ResNet <ref type="bibr" target="#b14">[15]</ref>, or EfficientNet can be used for image feature extraction, or custom CNN architectures suited to the distinctive features of meme images can be created.</p></div>
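A stacking ensemble of the kind suggested above can be assembled with scikit-learn's `StackingClassifier`; the choice of base learners and meta-learner here is an illustrative assumption.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def build_stack():
    """Stacking ensemble: cross-validated predictions of the base
    learners become input features for a logistic-regression meta-learner."""
    return StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("svm", SVC(probability=True, random_state=0)),
        ],
        final_estimator=LogisticRegression(),
        cv=3,  # out-of-fold predictions avoid leaking training labels
    )
```

Because the meta-learner sees only out-of-fold predictions, stacking tends to generalise better than simply averaging the base models.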
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3.">Cross-lingual and Multimodal Models</head><p>Effective management of multilingual text data requires cross-lingual embeddings such as Multilingual BERT (mBERT). These embeddings improve the system's global applicability by guaranteeing consistent performance across many languages. In addition to its cross-lingual capabilities, creating shared embedding spaces for text and images through multimodal learning can greatly enhance the model's comprehension of memes. Pre-training models on large multimodal datasets strengthens the system's ability to interpret meme content by capturing the complex interactions between textual and visual aspects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4.">Fine-tuning Pre-trained Models</head><p>General-purpose models can be tailored to the specifics of the target domain by fine-tuning pre-trained models on domain-specific datasets for sexism detection and social media analytics. This task-specific fine-tuning increases the relevance and accuracy of the models. Furthermore, the model's performance on sexism identification can be improved through layer-wise transfer from models already pre-trained on comparable tasks, such as hate speech detection, leveraging transfer learning. This method, which exploits shared features across related domains, cuts down training time and cost while offering a strong basis for the new task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.5.">Multimodal Data Integration</head><p>Experimenting with fusion strategies, such as early, late, and hybrid fusion, is essential to capturing the complementary information in memes' textual and visual aspects. By fusing textual and visual elements, these methods offer a thorough understanding of memes. Furthermore, the model's comprehension of memes in the context of social media can be improved by employing contextual embeddings that take into account each meme's larger context, including user metadata and engagement metrics. This approach guarantees that the model captures the full range of information contained in memes, increasing the accuracy of detection <ref type="bibr" target="#b8">[9]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.6.">Improved Evaluation Metrics</head><p>It is crucial to keep assessing hierarchical and multilabel classification problems using sophisticated metrics such as ICM and ICM-Soft. These measures capture the complexities of sexism detection and offer a detailed assessment of model performance. Furthermore, user studies that assess the system's functionality in real-world situations and collect input for further improvement can yield valuable insights. A user-centric evaluation ensures that the model meets user expectations and works well in real-world applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>This system for detecting sexist content in memes demonstrated moderate performance in the EXIST 2024 Task 4 shared task. The results indicate that while the model is competitive, there is considerable room for improvement, particularly on the English instances, where it ranked lower. The system performed better on Spanish instances, which suggests that the pre-processing and feature extraction steps may be more effective for Spanish-language content. The outcomes highlight the intricacy of the task and the subtlety of sexist content, which present serious difficulties for automated detection systems. Although this method, which combined sophisticated feature extraction using the CLIP model with conventional machine learning models, provided a strong basis, further refinements will be needed to increase its accuracy and robustness.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>CLEF 2024 :</head><label>2024</label><figDesc>Conference and Labs of the Evaluation Forum, September 09-12, 2024, Grenoble, France U.Pasha1@universityofgalway.ie (U. W. Pasha)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Dataset: EXIST: sEXism Identification in Social neTworks<ref type="bibr" target="#b10">[11]</ref>,<ref type="bibr" target="#b11">[12]</ref> </figDesc><graphic coords="3,73.99,89.93,216.62,218.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Model Performance Summary Chart</figDesc><graphic coords="5,72.00,87.93,451.28,174.27" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Description of Datasets and Image Ranges</figDesc><table><row><cell cols="3">Dataset Language Image Count</cell><cell>Range</cell></row><row><cell>Training</cell><cell>Spanish</cell><cell>2034</cell><cell>110001-112034</cell></row><row><cell></cell><cell>English</cell><cell>2010</cell><cell>210001-212010</cell></row><row><cell>Testing</cell><cell>Spanish</cell><cell>540</cell><cell>310001-310540</cell></row><row><cell></cell><cell>English</cell><cell>513</cell><cell>410001-410513</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc></figDesc><table><row><cell cols="2">Overall Performance Metrics for Sexist Content Detection</cell><cell></cell></row><row><cell cols="3">ICM-Hard ICM-Hard Norm F1_YES</cell></row><row><cell>-0.3083</cell><cell>0.3432</cell><cell>0.5956</cell></row></table><note>manage intricate, hierarchical classification tasks is shown by the ICM-Hard score, a metric that takes into account the information content of both correct and incorrect classifications. A standardised view of this performance is given by the normalised ICM-Hard score (ICM-Hard Norm), while the F1_YES score emphasises the recall and precision for the positive class-in this example, the identification of sexist memes.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Performance Metrics for Spanish Instances in Sexist Content Detection</figDesc><table><row><cell cols="3">ICM-Hard ICM-Hard Norm F1_YES</cell></row><row><cell>-0.2216</cell><cell>0.3871</cell><cell>0.6032</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc></figDesc><table><row><cell cols="3">Performance Metrics for English Instances in Sexist Content Detection</cell></row><row><cell cols="3">ICM-Hard ICM-Hard Norm F1_YES</cell></row><row><cell>-0.3953</cell><cell>0.2993</cell><cell>0.5882</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>I thank the EXIST 2024 shared task organisers for providing a platform to further the study of sexist content identification in social networks. I would also like to express my appreciation to the annotators for their work in labelling the dataset, which made this study possible.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Sexism identification in social networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kumar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="891" to="900" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Bert: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NAACL-HLT</title>
				<meeting>NAACL-HLT</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">2</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Bernice: A multilingual pre-trained encoder for Twitter</title>
		<author>
			<persName><forename type="first">A</forename><surname>DeLucia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mueller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Aguirre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Resnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dredze</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 conference on empirical methods in natural language processing</title>
				<meeting>the 2022 conference on empirical methods in natural language processing</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="6191" to="6205" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<title level="m">RoBERTa: A robustly optimized BERT pretraining approach</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Learning transferable visual models from natural language supervision</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hallacy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Goh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mishkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="8748" to="8763" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Loshchilov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1711.05101</idno>
		<title level="m">Decoupled weight decay regularization</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Sexism identification using BERT and data augmentation - EXIST 2021</title>
		<author>
			<persName><forename type="first">S</forename><surname>Butt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ashraf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Gelbukh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IberLEF@SEPLN</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="381" to="389" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations</title>
				<meeting>the 2020 conference on empirical methods in natural language processing: system demonstrations</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wenzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guzmán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1911.02116</idno>
		<title level="m">Unsupervised cross-lingual representation learning at scale</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Sexism prediction in Spanish and English tweets using monolingual and multilingual BERT and ensemble models</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F M</forename><surname>De Paula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">F</forename><surname>Da Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">B</forename><surname>Schlicht</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2111.04551" />
		<idno type="arXiv">arXiv:2111.04551</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Overview of EXIST 2024 -Learning with Disagreement for Sexism Identification and Characterization in Social Networks and Memes</title>
		<author>
			<persName><forename type="first">L</forename><surname>Plaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-De-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maeso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Amigó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Morante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Spina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association</title>
				<meeting><address><addrLine>CLEF</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of EXIST 2024 -Learning with Disagreement for Sexism Identification and Characterization in Social Networks and Memes (Extended Overview)</title>
		<author>
			<persName><forename type="first">L</forename><surname>Plaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-De-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maeso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Amigó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Morante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Spina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 -Conference and Labs of the Evaluation Forum</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>Herrera</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yatskar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Hsieh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-W</forename><surname>Chang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1908.03557</idno>
		<title level="m">VisualBERT: A simple and performant baseline for vision and language</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1409.1556</idno>
		<title level="m">Very deep convolutional networks for large-scale image recognition</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Deep residual learning for image recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="770" to="778" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
