<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Enhancing Sexism Detection in Tweets with Annotator-Integrated Ensemble Methods and Multimodal Embeddings for Memes: Notebook for the EXIST Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Martha</forename><forename type="middle">Paola</forename><surname>Jimenez-Martinez</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Centro de Investigación Científica y de Educación Superior de Ensenada</orgName>
								<address>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Joan</forename><forename type="middle">Manuel</forename><surname>Raygoza-Romero</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Centro de Investigación Científica y de Educación Superior de Ensenada</orgName>
								<address>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Carlos</forename><forename type="middle">Eduardo</forename><surname>Sánchez-Torres</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Universidad Autónoma de Baja California</orgName>
								<address>
									<settlement>Ensenada</settlement>
									<region>Baja California</region>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Irvin</forename><forename type="middle">Hussein</forename><surname>Lopez-Nava</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Centro de Investigación Científica y de Educación Superior de Ensenada</orgName>
								<address>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Universidad Autónoma de Baja California</orgName>
								<address>
									<settlement>Ensenada</settlement>
									<region>Baja California</region>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Manuel</forename><surname>Montes-y-Gómez</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Instituto Nacional de Astrofísica, Óptica y Electrónica</orgName>
								<address>
									<country key="MX">Mexico</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Enhancing Sexism Detection in Tweets with Annotator-Integrated Ensemble Methods and Multimodal Embeddings for Memes: Notebook for the EXIST Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">86252BD35207730DE681027B74B0F079</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:59+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Sexism detection</term>
					<term>Sexism identification</term>
					<term>Sexism classification</term>
					<term>Social media</term>
					<term>Transformer models</term>
					<term>0009-0005-8701-9875 (M. P. Jimenez-Martinez)</term>
					<term>0000-0003-3085-5678 (J. M. Raygoza-Romero)</term>
					<term>0000-0001-5799-4067 (C. E. Sánchez-Torres)</term>
					<term>0000-0003-3979-9465 (I. H. Lopez-Nava)</term>
					<term>0000-0002-7601-501X (M. Montes-y-Gómez)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper details MMICI's participation in the EXIST challenge at CLEF 2024, focusing on the identification and categorization of sexism in social media and memes. For tweets, we employed pre-trained transformer models and ensemble voting approaches. For memes, we utilized CLIP embeddings using a Vision Transformer (ViT) model and two types of classifiers: feed-forward neural networks and factorization machines. The tasks encompassed detecting sexism in tweets and memes, as well as categorizing their type and the author's intention. Our methodology for tweets integrates annotator profiles, such as gender and age, to enhance the accuracy of sexism identification, source intention, and sexism categorization. For memes, we utilized all annotator features (gender, age, ethnicity, study level, and country) for the same tasks. The results demonstrate the effectiveness of our models across various tasks, emphasizing the integration of diverse perspectives. Notably, our best performances include a 10th place ranking in Task 1, a 15th place ranking in Task 2, and a 13th place ranking in Task 3 for Spanish tweets. For memes, we achieved a 3rd place ranking in Task 4 for English memes, two 1st place rankings in Task 5 for both English and Spanish memes, and a 2nd place ranking in Task 6 for English memes. These results underscore the importance of incorporating the demographic factors of annotators and taking advantage of multimodal embeddings for robust performance in sexism detection.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>According to the Cambridge Dictionary, sexism is defined as "(actions based on) the belief that the members of one sex are less intelligent, able, skilful, etc. than the members of the other sex, especially that women are less able than men" <ref type="bibr">[1]</ref>. In contrast, the Royal Spanish Academy defines sexism as "discrimination against individuals based on their sex" (in Spanish: discriminación de las personas por razón de sexo) <ref type="bibr">[2]</ref>. Both interpretations, based on the meaning and expression in both languages, agree that sexism not only reflects but also communicates and perpetuates the stereotypes and roles historically assigned to women and men in society. This perpetuation of stereotypes is a significant factor in the struggle for gender equity <ref type="bibr" target="#b0">[3]</ref>.</p><p>Research on gender ideologies employs the Ambivalent Sexism Inventory and the Ambivalence toward Men Inventory. The Ambivalent Sexism Inventory measures hostile sexism, which reflects antagonistic attitudes towards women, and benevolent sexism, which consists of subjectively favorable but patronizing beliefs about women. The Ambivalence toward Men Inventory assesses hostility toward men, rooted in the resentment of men's perceived greater power, and benevolence toward men, which involves favorable views of men as protectors and providers. Ambivalent sexism theory posits that hostile sexism and benevolent sexism arise due to social and biological factors common across cultures, such as patriarchy, gender differentiation, and heterosexuality. Systemically, hostile sexism and benevolent sexism function as complementary ideologies that justify and perpetuate gender inequality, showing a strong correlation across cultures. 
This underscores the necessity of addressing both hostile and benevolent forms of sexism in the pursuit of gender equality <ref type="bibr" target="#b1">[4]</ref>.</p><p>This paper details MMICI's participation in the "sEXism Identification in Social neTworks" (EXIST) shared task at CLEF 2024. EXIST aims to broadly capture instances of sexism, ranging from overt misogyny to subtler expressions of implicit sexist behavior, a task it has been undertaking since 2021. The goal of utilizing automatic tools is not only to detect and alert against sexist behaviors and discourses but also to estimate the prevalence of sexist and abusive situations on social media platforms, identify the most common forms of sexism, and understand how sexism manifests in these media <ref type="bibr" target="#b0">[3]</ref>.</p><p>Over the years, EXIST has evolved significantly. In 2021 and 2022, it provided a dataset with definitive (hard) labels for each tweet. However, starting from 2023 and continuing into 2024, the task expanded to generate six different labels per tweet, each derived from six distinct annotator profiles. These profiles include three women and three men from distinct age groups: 18-22, 23-45, and 46+. Furthermore, the most recent edition incorporates the demographic parameters of the annotators, such as gender, age, level of education, ethnicity, and country of residence.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Dataset EXIST 2024</head><p>In its fourth edition <ref type="bibr" target="#b2">[5]</ref>, the task has incorporated new challenges involving images, specifically memes. The six tasks are as follows:</p><p>• Task 1: Sexism Identification in Tweets involves identifying whether a tweet is sexist or not.</p><p>• Task 2: Source Intention in Tweets follows, where once a tweet is classified as sexist, it involves categorizing the intention of the author, whether direct, reported, or judgemental.</p><p>• Task 3: Sexism Categorization in Tweets involves assigning each sexist tweet one or more categories of sexism.</p><p>• Task 4: Sexism Identification in Memes involves identifying whether a meme is sexist or not.</p><p>• Task 5: Source Intention in Memes involves categorizing the intention behind a sexist meme.</p><p>• Task 6: Sexism Categorization in Memes involves assigning each sexist meme one or more categories of sexism.</p><p>These tasks aim to enhance the understanding and detection of sexism across various forms of social media content in both English and Spanish, ultimately supporting efforts to combat sexism online. Given that information is provided from expressions in different languages, it cannot be assumed that models for detecting sexism in one language can be applied directly to another. This is due to the syntactic and semantic differences in the manifestations of sexism across various countries and contexts <ref type="bibr" target="#b3">[6]</ref>. To better illustrate the differences between expressions in both languages, Table <ref type="table" target="#tab_3">1</ref> provides examples of the labels for the different dataset tasks on which all annotators reached a consensus.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>TASK 1: Sexism Identification in Tweets</head><p>@messyworldorder it's honestly so embarrassing to watch and they'll be like "not all white women are like that"</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>TASK 2: Source Intention in Tweets</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Direct</head><p>Una mujer necesita amor, llenar la nevera, si un hombre puede darle esto a cambio de sus servicios (tareas domésticas, cocinar, etc.), no veo qué más necesita.</p><p>Women shouldn't code. . . perhaps be influencer/creator instead. . . it's their natural strength.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Reported</head><p>Me duermo en el metro, abro los ojos sintiendo algo raro: la mano del hombre sentado a mi lado en mi pierna #SquealOnYourPig.</p><p>Today, one of my year 1 class pupils could not believe he'd lost a race against a girl.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Judgemental</head><p>Como de costumbre, la mujer fue la que dejó su trabajo por el bienestar de la familia. . .</p><p>21st century and we are still earning 25% less than men #Idonotrenounce.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>TASK 3: Sexism Categorization in Tweets</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Ideological and Inequality</head><p>Mi hermana y mi madre se burlan de mí por defender todo el tiempo los derechos de todos y me acaban de decir feminazi, la completaron.</p><p>I think the whole equality thing is getting out of hand. We are different, thats how were made!</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Stereotyping and Dominance</head><p>@Paula2R @faber_acuria A las mujeres hay que amarlas. . . solo eso. . . Nunca las entenderás.</p><p>Most women no longer have the desire or the knowledge to develop a high quality character, even if they wanted to.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Objectification</head><p>"Pareces una puta con ese pantalón" -Mi hermano de 13 cuando me vio con un pantalón de cuero.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Overview of the proposal</head><p>Previous research has developed methods to model annotators in subjective tasks, allowing for the prediction of personalized labels for each annotator. For instance, Akhtar et al. <ref type="bibr" target="#b4">[7]</ref> conducted an exhaustive search to classify annotators into two groups based on their annotation patterns. Their study demonstrated that an ensemble model, composed of two distinct classifiers representing the perspectives of each group, outperformed the traditional single-task model that only considers aggregated labels. Additionally, traditional classification methods typically aggregate labels through majority voting or averaging before training. However, this approach has been found to potentially "silence the voices" of socio-demographic minority groups <ref type="bibr" target="#b4">[7]</ref>. One of the objectives of this study is to leverage the individual opinions of annotators, or group them based on specific demographic characteristics, to ensure that their "voices" are effectively integrated into the sexism detection models.</p><p>Building on these concepts, our approach to the EXIST tasks encompasses multiple strategies across the different runs:</p><p>• Run 1 for Tasks 1, 2, and 3: The model predicts labels by employing an ensemble method that combines outputs based on different age groups and gender.</p><p>• Run 2 for Tasks 1, 2, and 3: The model predicts labels by employing an ensemble method that integrates outputs from the profiles of the six annotators.</p><p>• Run 3 for Tasks 1, 2, and 3: The model predicts labels using a majority vote approach, where the final prediction is based on the consensus among all annotators.</p><p>• Runs 1 and 2 for Tasks 4 and 5: Our approach uses embeddings for both the text and the image of each meme, which represent deep features of the meme. Additionally, annotator attributes are incorporated to develop a model capable of predicting labels for each annotator. The final label is determined by a voting mechanism among the predictions of the annotators.</p><p>• Runs 1 and 2 for Task 6: A specialized model is trained for each label using only sexism data, with the data balanced for each class. Embeddings for the meme text and image are utilized. The final output combines the model's prediction for non-sexist cases (from Task 4) with the outputs of the specialized models for each sexism category to produce a single prediction.</p><p>• Run 3 for Tasks 4, 5, and 6: The system predicts labels by concatenating the annotator's profile embedding with an image embedding in the same space. A multimodal embedding model assesses the relationship between annotators and items, and a voting mechanism is then applied to determine the final score.</p><p>Our general approach is presented in Figure <ref type="figure" target="#fig_0">1</ref> and integrates text and visual processing using transformer models to extract features and perform classifications. Texts (tweets) are preprocessed and fed into a transformer model to generate text embeddings, while images (memes) are processed through a vision transformer model to produce visual embeddings. Annotator features are extracted from the text embeddings, and a classifier is trained using these features along with the text and visual embeddings. An ensemble technique is applied to combine the outputs of the models, enhancing the accuracy of the classifier. The performance is then evaluated across the specific tasks to ensure a comprehensive assessment and optimization of the results.</p><p>For Spanish text analysis, our dataset comprised 2526 samples for training, 639 for validation, and 490 for testing. For English, it comprised 1832 samples for training and 574 for validation, with a local test set of 978 samples serving as a benchmark for evaluating the generalization capabilities of our models. The metrics used for each task are as follows: for Task 1 and Task 4, we report the ICM-Hard Norm and the F1-score of the positive class (sexism); for Tasks 2, 3, 5, and 6, we report the ICM-Hard Norm and the macro F1-score, i.e., the average of the F1-scores over all classes.</p></div>
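As a concrete illustration of the macro F1-score used for Tasks 2, 3, 5, and 6, the sketch below computes a one-vs-rest F1 per class and averages over classes; the intention labels and predictions are invented purely for illustration.

```python
def f1_per_class(y_true, y_pred, label):
    # Precision/recall/F1 for a single class, treated one-vs-rest.
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    # Macro F1: the unweighted average of the per-class F1 scores.
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, l) for l in labels) / len(labels)

y_true = ["direct", "reported", "direct", "judgemental"]
y_pred = ["direct", "direct", "direct", "judgemental"]
score = macro_f1(y_true, y_pred)  # (0.8 + 1.0 + 0.0) / 3 = 0.6
```

Because every class contributes equally to the average, rare classes weigh as much as frequent ones, which is why macro F1 suits the imbalanced label distributions of these tasks.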
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Sexism Detection in Tweets</head><p>Firstly, for the detection of sexism in tweets, we focus on integrating annotator information, particularly considering their profiles such as gender and age, as summarized in Figure <ref type="figure" target="#fig_1">2</ref>.</p><p>• Text Preprocessing: Mentions within the tweets were substituted with '@USER', while any URLs were replaced with 'HTTPURL'.</p><p>• Transformer Model: We used pre-trained models specific to tweets: "cardiffnlp/twitter-roberta-base-sentiment" for English and "pysentimiento/robertuito-base-uncased" for Spanish, both from Hugging Face, since these models were trained with data in the respective languages.</p><p>• Ensemble: We employed two different ensembles. The first used a majority vote over the outputs of six different models, one for each annotator. The second used a majority vote over the outputs of five different models, grouped by gender and age (females, males, 18-22, 23-45, and 46+).</p><p>As mentioned in the previous section, our runs for the first three text-focused tasks were:</p><p>1. Run 1: An ensemble was created from the outputs of five different models, grouped by gender and age. A majority vote was taken from the outputs of these five models, with the label being assigned if three or more groups agreed.</p><p>2. Run 2: An ensemble was created from the outputs of six different models, one for each annotator. A majority vote was taken from the outputs of these six models, with the label being assigned if four or more annotators agreed.</p><p>3. Run 3: As the baseline, a majority vote was taken directly from the six annotators' labels. Similarly, the label was assigned if four or more annotators agreed.</p><p>To ensure a decision was always made in each ensemble without ties, we used probabilistic voting rather than hard voting from each model. 
This means that even if three models classify a tweet as sexist and three do not, the probabilities are compared, and the decision is made based on the highest probability, ensuring a definitive decision for all predictions in the ensembles.</p><p>For Task 2, which requires determining the intention of the tweets (single label), the label was assigned based on the highest probability prediction among the types of intentions if the tweet was sexist. To achieve this, a binary model was trained for each label. This approach ensures that the classification is both precise and comprehensive, taking into account the nuanced nature of the intentions expressed in the tweets.</p><p>For Task 3, which involves identifying the types of sexism in a tweet (multi-label), the ensemble takes into account all types of sexism indicated by the annotators. For example, if one annotator labels a tweet as objectification and another labels it as misogynistic and sexual violence, all three types of sexism are included in our ensemble prediction. Furthermore, we employ multiple binary classification models for each label, allowing us to address each facet of identified sexism with specificity and precision.</p><p>To analyze in depth the impact of considering individualized annotators' opinions (A1-A6), opinions grouped before the classification models (All), opinions by demographic group (Females, Males, 18-22, 23-45, 46+), or assembled at the end (Ensemble Annotators, Ensemble Groups), Figure <ref type="figure" target="#fig_3">3</ref> presents the results of the sexism identification, intention, and categorization models, respectively.</p><p>The selection of the group ensemble and annotator ensemble approaches as Run 1 and Run 2, respectively, is grounded in their ability to integrate a wide range of perspectives and individual judgments. 
The group ensemble, by combining different demographics, offers an enriched and balanced overview, which is crucial for tackling the complexity of the tasks at hand. On the other hand, the annotator ensemble capitalizes on the diversity of individual judgments, ensuring a robust and competitive performance. Finally, the direct majority vote of annotators is established as the baseline (Run 3) due to its simplicity and effectiveness, providing a clear reference for evaluating ensemble methods. These choices are backed by the best results obtained in each task, where the group ensemble consistently outperforms others in terms of performance and ability to capture the inherent complexity in the datasets.</p></div>
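The tie-free probabilistic voting described above can be sketched as follows. Averaging the positive-class probabilities is one plausible realization of "comparing the probabilities"; the per-model probabilities here are invented stand-ins for the sigmoid outputs of the fine-tuned transformer classifiers.

```python
def soft_vote(probs, threshold=0.5):
    # Probabilistic voting: average each model's probability of the
    # positive (sexist) class and threshold the mean, so a 3-vs-3 tie
    # among hard votes still yields a definitive decision.
    mean_p = sum(probs) / len(probs)
    return ("YES" if mean_p >= threshold else "NO"), mean_p

# Six annotator models: three say sexist and three do not, but the
# confident positives dominate the average, breaking the tie.
probs = [0.91, 0.85, 0.62, 0.48, 0.40, 0.35]
label, mean_p = soft_vote(probs)  # label == "YES"
```

With hard votes this example would deadlock at 3-3; the soft vote resolves it from the models' confidence.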
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Sexism Detection in Memes</head><p>We chose a different path for Tasks 4, 5, and 6, as shown in Figure <ref type="figure" target="#fig_4">4</ref>. Although we leveraged the annotator data from the dataset, the text preprocessing steps depicted in Figure <ref type="figure" target="#fig_0">1</ref> were not applied. Instead, embeddings were extracted directly from the raw data using different approaches. We utilized CLIP embeddings for the memes and text along with annotator features. We decided to address the tasks from the textual domain due to the high variability in the representation and graphic styles of the memes (see examples in Table <ref type="table" target="#tab_3">1</ref>).</p><p>In Runs 1 and 2, annotator features were represented using one-hot encoding for gender, age range, ethnicity, study level, and country. The reader can explore this approach in subsection 5.1. In Run 3, a descriptive text was created from the annotator features, from which embeddings were extracted. For Runs 1 and 2, the classifier used was a feed-forward neural network (FNN) with two hidden layers, containing 4096 and 512 neurons, respectively. Following these layers, a dropout layer with a dropout rate of 0.1 was applied. In Run 3, we further leverage the annotator-meme relationship and propose a Factorization Machine model, a collaborative filtering technique, to predict the annotation based on annotator and meme CLIP embeddings. We explain more about this approach in subsection 5.2.</p></div>
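The FNN described above (two hidden layers of 4096 and 512 neurons followed by dropout of 0.1) can be sketched as a plain forward pass. The hidden sizes and dropout rate come from the text; the 512-dimensional CLIP text/image embeddings and the size of the one-hot annotator vector are assumptions for illustration, and the weights are random stand-ins for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def fnn_forward(x, params, train=False, p_drop=0.1):
    # Two hidden layers (4096 and 512 units) with ReLU, dropout after
    # the second hidden layer, and a sigmoid output neuron.
    h1 = relu(x @ params["W1"] + params["b1"])
    h2 = relu(h1 @ params["W2"] + params["b2"])
    if train:  # inverted dropout, applied only at training time
        h2 = h2 * (rng.random(h2.shape) >= p_drop) / (1.0 - p_drop)
    logit = h2 @ params["w3"] + params["b3"]
    return 1.0 / (1.0 + np.exp(-logit))  # probability of "sexist"

# Input: CLIP text (512-d) and image (512-d) embeddings concatenated
# with a one-hot annotator profile (size 20 chosen for illustration).
d_in = 512 + 512 + 20
params = {
    "W1": rng.normal(0, 0.01, (d_in, 4096)), "b1": np.zeros(4096),
    "W2": rng.normal(0, 0.01, (4096, 512)), "b2": np.zeros(512),
    "w3": rng.normal(0, 0.01, 512), "b3": 0.0,
}
p = fnn_forward(rng.normal(size=d_in), params)  # a probability in (0, 1)
```

In the actual pipeline these parameters would of course be learned; the sketch only fixes the shapes and the flow of data.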
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Feed-forward neural network with CLIP embeddings</head><p>For Task 4, the output layer of the FNN consisted of a single neuron producing the probability of the sexism class. We evaluated various approaches for the model: using only text embeddings, using only image embeddings, using both text and image embeddings, utilizing a general model (without annotator characteristics), and some combinations of the outputs of these models. Table <ref type="table" target="#tab_6">2</ref> outlines the features of each evaluated model.</p><p>It is essential to define two concepts: early fusion and late fusion. In early fusion, the model simultaneously receives both text and image embeddings, meaning the model's input includes annotator features, text embeddings, and image embeddings (as in the "Text+Image" and "Text+Image General" models). In late fusion, the outputs of two models are combined. For example, in the "Text|Image" model, the outputs of the "Text" model (trained only with text embeddings) and the "Image" model (trained only with image embeddings) are combined by averaging. Similarly, the "Text|Image &amp; Text|Image General" model averages the outputs of the "Text|Image" and "Text+Image General" models. The two models that achieved the highest mean F1 scores with low variation in performance were selected as Run 1 and Run 2, respectively.</p><p>For Task 5, the output layer of the FNN consisted of 3 neurons, yielding the probability of each label. Similar to Task 4, we evaluated several approaches for the model: using only text embeddings, using only image embeddings, using both text and image embeddings, and a combination of the outputs of the "Text" and "Image" models by averaging their outputs. 
Figure <ref type="figure" target="#fig_6">6</ref> displays the macro F1 score across 10 runs for each model in Task 5.</p><p>The results in Figure <ref type="figure" target="#fig_6">6</ref> indicate that the "Text+Image" model achieves a higher mean F1 score with low variation in performance; this model corresponds to Run 1. For Run 2, the "Text|Image" model was selected. Although it did not achieve the highest F1 score, it demonstrated a strong MSE score comparable to the "Text+Image" model.</p><p>For Task 6, the output layer of the FNN consisted of one neuron yielding the probability for each label of sexism categorization. We created 5 models, one per category, each trained exclusively on data from sexist memes, with a random subset of negative training cases equal in size to the positive training cases. Consequently, each model was trained on a balanced dataset. The probability output from the Task 4 model was used as the probability of the not sexism label and then combined with the outputs of these 5 models to produce a final prediction.</p><p>There are two exceptional cases to consider: i) If the probability of not sexism is higher than 0.5, as is that of one of the 5 categories of sexism, the final prediction is always not sexism. ii) If the probability of not sexism is lower than 0.5, as are those of the 5 categories of sexism, the meme is still classified as sexist, and the category of sexism with the highest probability is selected. Similar to Tasks 4 and 5, we evaluated various approaches for the model. Figure <ref type="figure" target="#fig_7">7</ref> presents the macro F1 scores for each model. We observed similar performance among the "Text", "Text+Image", and "Text|Image" models. Based on these results, we selected the "Text|Image" model for Run 1 and the "Text+Image" model for Run 2. The "Text" model was not chosen, as we believe that the combination of text and image embeddings yields better results.</p></div>
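The Task 6 combination rule described above, including both exceptional cases, can be sketched as follows. The category names follow the task, while the probabilities are invented for illustration.

```python
def combine_task6(p_not_sexist, category_probs):
    # Rule i): if the 'not sexist' probability from the Task 4 model
    # exceeds 0.5, predict NO even if some category is also above 0.5.
    if p_not_sexist > 0.5:
        return "NO"
    # Rule ii): otherwise the meme is sexist; pick the category with
    # the highest probability among the five specialized models.
    return max(category_probs, key=category_probs.get)

cats = {
    "IDEOLOGICAL-INEQUALITY": 0.7,
    "STEREOTYPING-DOMINANCE": 0.4,
    "OBJECTIFICATION": 0.2,
    "SEXUAL-VIOLENCE": 0.1,
    "MISOGYNY-NON-SEXUAL-VIOLENCE": 0.3,
}
pred = combine_task6(0.3, cats)  # "IDEOLOGICAL-INEQUALITY"
```

Note that the Task 4 output acts as a gate: the five category models are only consulted once the meme has effectively been declared sexist.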
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Multimodal Collaborative Filtering employing CLIP embeddings and Factorization Machines</head><p>In this approach, we model the problem similarly to assigning a score in a recommendation system, or to predicting links between nodes in a bipartite graph, leveraging the fact that we have both the annotator and the item features: given known subject-item preferences, predict new subject-item preferences. Formally, let 𝑈 be the set of all subjects and 𝑉 the set of all items; our core task is to find a real-valued scalar function 𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑣) where 𝑢 ∈ 𝑈 and 𝑣 ∈ 𝑉. To provide a hard label or multi-label, 𝑘 subjects vote with their encoded scores. Hence, we have reduced our problem to a score prediction problem. For each user 𝑢 ∈ 𝑈, let 𝑢 ∈ R^𝐷 be its 𝐷-dimensional embedding; likewise, for each item 𝑣 ∈ 𝑉, let 𝑣 ∈ R^𝐷 be its 𝐷-dimensional embedding. So, 𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑣) ≡ 𝑓 : R^𝐷 × R^𝐷 → R. In this approach, memes and annotators are transformed into the same embedding space using CLIP. Specifically, user demographics such as age, gender, and ethnicity are encoded with a phrase such as "A female aged 18-22, of Hispanic or Latino ethnicity, with a high school degree or equivalent, and located in Mexico" into one CLIP embedding. In contrast, the meme, which may include both image and text components, is encoded into another CLIP embedding. These embeddings capture the nuanced features of both the user and the meme content. We then concatenate these two embeddings into a single embedding that represents the combined features of the user and the meme.</p><p>For instance, Table 3 illustrates a complete utility matrix for Task 4 with known score entries 𝑓(𝑢, 𝑣), where 0 represents the label "NO" and 1 represents the label "YES". Our encoding method ignores "UNKNOWN" labels, but other encodings are possible. In this case, the voting policy is the selection of the class annotated by more than 3 subjects.</p></div>
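The construction of the joint annotator-meme embedding can be sketched as below. The phrase template follows the example in the text; the stand-in encoder (a deterministic random unit vector) merely replaces the real CLIP text/image encoders, and the 512-dimensional embedding size is an assumption.

```python
import hashlib
import numpy as np

def annotator_phrase(gender, age, ethnicity, study, country):
    # Render annotator demographics as the descriptive phrase fed to
    # the CLIP text encoder (template from the example in the text).
    return (f"A {gender} aged {age}, of {ethnicity} ethnicity, "
            f"with a {study}, and located in {country}")

def fake_clip_encode(text, dim=512):
    # Stand-in for CLIP: a deterministic unit vector per input. The
    # real pipeline would call the CLIP text/image encoders here.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

u = fake_clip_encode(annotator_phrase(
    "female", "18-22", "Hispanic or Latino",
    "high school degree or equivalent", "Mexico"))
v = fake_clip_encode("meme image and overlaid text")
x = np.concatenate([u, v])  # joint input for score(u, v)
```

Because annotator and meme live in the same embedding space, the concatenated vector `x` is what the Factorization Machine consumes to predict 𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑣).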
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>An example of a utility matrix for Task 4</p><formula xml:id="formula_0">𝑉1 𝑉2 𝑉3
𝑈1 1 0 1
𝑈2 0 1 1
𝑈3 1 0 0
𝑈4 1 1 1
𝑈5 0 0 0
𝑈6 1 0 0
𝑉𝑜𝑡𝑖𝑛𝑔 1 0 Undefined
𝐿𝑎𝑏𝑒𝑙 YES NO</formula><p>For Task 5, the 𝑠𝑐𝑜𝑟𝑒 function is encoded similarly to Task 4, with the addition of a voting policy and a method to map scores to hard labels. The voting policy is the arithmetic mean of the votes, which leads to the following encoding to predict the hard label: 𝑠𝑐𝑜𝑟𝑒 ∈ [0, 0.67] =⇒ No; 𝑠𝑐𝑜𝑟𝑒 ∈ (0.67, 1.34] =⇒ Direct; 𝑠𝑐𝑜𝑟𝑒 ∈ (1.34, 2] =⇒ Judgemental. We apply softmax over the votes to find the probabilities, thus solving the soft-soft task.</p><p>For Task 6, the different label combinations are encoded into a compact bit set as follows: each 𝑙𝑎𝑏𝑒𝑙 𝑖 is a bit 2^𝑖 where 𝑖 ≥ 0, and the union of the bits encodes a combination. We provide an example below:</p><formula xml:id="formula_1">𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑣) = 0𝑏000001 =⇒ -
𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑣) = 0𝑏000010 =⇒ IDEOLOGICAL-INEQUALITY
𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑣) = 0𝑏000100 =⇒ MISOGYNY-NON-SEXUAL-VIOLENCE
𝑠𝑐𝑜𝑟𝑒(𝑢, 𝑣) = 0𝑏000010 | 0𝑏000001 = 0𝑏000011 =⇒ -, IDEOLOGICAL-INEQUALITY</formula><p>Similarly to Task 5, we count the number of common bits and apply softmax to find the probability distribution.</p><p>We have defined how to decode 𝑠𝑐𝑜𝑟𝑒 to solve the tasks, but how can we learn 𝑠𝑐𝑜𝑟𝑒 from the annotator and meme CLIP embeddings? Embedding-based models include memory-based CF, model-based CF, neighborhood methods, Neural Graph Collaborative Filtering, Factorization Machines <ref type="bibr" target="#b5">[8]</ref>, and GCN-based CF. Among these, the Factorization Machine model stands out for being efficient and accurate, enabling it to effectively predict the score <ref type="bibr" target="#b6">[9]</ref> from the concatenated embedding. Figure <ref type="figure" target="#fig_8">8</ref> shows how well this approach performs on our validation dataset after 10 runs.</p></div>
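The Task 6 bit-set encoding can be made concrete as below. The first three bit assignments follow the example above ('-' = 2^0, IDEOLOGICAL-INEQUALITY = 2^1, MISOGYNY-NON-SEXUAL-VIOLENCE = 2^2); the order of the remaining labels is an assumption.

```python
LABELS = [
    "-",                              # bit 2**0: not sexist
    "IDEOLOGICAL-INEQUALITY",         # bit 2**1
    "MISOGYNY-NON-SEXUAL-VIOLENCE",   # bit 2**2
    "STEREOTYPING-DOMINANCE",         # remaining order assumed
    "OBJECTIFICATION",
    "SEXUAL-VIOLENCE",
]
BIT = {label: 1 << i for i, label in enumerate(LABELS)}

def encode(labels):
    # Union of label bits: each label_i contributes bit 2**i.
    score = 0
    for lab in labels:
        score |= BIT[lab]
    return score

def common_bits(a, b):
    # Number of shared labels between two encoded scores.
    return bin(a & b).count("1")

assert encode(["-"]) == 0b000001
assert encode(["IDEOLOGICAL-INEQUALITY"]) == 0b000010
assert encode(["-", "IDEOLOGICAL-INEQUALITY"]) == 0b000011
```

Counting common bits between a predicted and a reference bit set yields the overlap values over which the softmax is applied.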
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Outcomes of the Evaluation Phase</head><p>Table <ref type="table" target="#tab_7">4</ref> presents the combined results for both English and Spanish submissions in the sexism detection challenge across six different tasks. Each task involves several runs evaluated using two metrics: Hard-Hard and Soft-Soft. Below, we describe the results, focusing on the best runs for each task. For Task 1 (Tweets), the best performance was achieved by run MMICI_3, which ranked 17th in the Hard-Hard metric with an ICM-Hard Norm of 0.7676, and an F1 score of 0.7637. In the Soft-Soft metric, this run ranked 21st with an ICM-Soft Norm of 0.5736, indicating it was the most effective in both metrics for this task.</p><p>For Task 4 (Memes), run MMICI_2 excelled, ranking 8th in the Hard-Hard metric with an ICM-Hard Norm of 0.5515, and an F1 score of 0.7261. For Task 5, the top run was MMICI_1, which ranked 7th in the Hard-Hard metric with an ICM-Hard Norm of 0.3934, and an F1 score of 0.4179. In the Soft-Soft metric, this run performed even better, ranking 2nd with an ICM-Soft Norm of 0.3654, making it the most effective in both categories. Lastly, for Task 6, the best run was MMICI_1, which ranked 3rd in the Hard-Hard metric with an ICM-Hard Norm of 0.2954, and an F1 score of 0.4342. The results for the Spanish submissions are showcased in Table <ref type="table" target="#tab_8">5</ref>. Hereafter, we delve into these outcomes, centering our attention on the most successful runs for each task. For Task 1, the best performance was achieved by run MMICI_3, which ranked 10th in the Hard-Hard metric with an ICM-Hard Norm of 0.7802, and an F1 score of 0.7892. In Task 2, the best run was MMICI_1, ranking 15th in the Hard-Hard metric with an ICM-Hard Norm of 0.5522, and an F1 score of 0.5133. For Task 3, the top run was MMICI_1, ranking 13th in the Hard-Hard metric with an ICM-Hard Norm of 0.4586, and an F1 score of 0.5486. 
In Task 4, run MMICI_2 excelled, ranking 14th in the Hard-Hard metric with an ICM-Hard Norm of 0.4900, and an F1 score of 0.6997. For Task 5, the top run was MMICI_1, which ranked 7th in the Hard-Hard metric with an ICM-Hard Norm of 0.3945, and an F1 score of 0.4198. In the Soft-Soft metric, this run performed even better, ranking 1st with an ICM-Soft Norm of 0.3461, making it the best-performing in both categories. Lastly, in Task 6, the best run was MMICI_1, which ranked 4th in the Hard-Hard metric with an ICM-Hard Norm of 0.2473, and an F1 score of 0.3868. The outcomes for the English submissions are outlined in Table <ref type="table" target="#tab_9">6</ref>. We elaborate on these results, highlighting the top performance for each task. In Task 4, run MMICI_2 excelled, ranking 3rd in the Hard-Hard metric with an ICM-Hard Norm of 0.6129, and an F1 score of 0.7559. For Task 5, the top run was MMICI_3, which ranked 1st in the Hard-Hard metric with an ICM-Hard Norm of 0.4413, and an F1 score of 0.4094. Lastly, in Task 6, the best run was MMICI_1, which ranked 2nd in the Hard-Hard metric with an ICM-Hard Norm of 0.3419, and an F1 score of 0.4726.</p><p>The strong results achieved with memes can be attributed to the use of CLIP (Contrastive Language-Image Pre-training) embeddings. CLIP effectively learns visual concepts from natural language descriptions, aligning images and text within a shared embedding space. This alignment is achieved by training on a vast dataset of images paired with their corresponding textual descriptions, enabling the model to understand and relate visual and textual information. Using CLIP, Vision Transformers can be employed for image encoding and Text Transformers for text encoding, resulting in a unified model that excels in multi-modal tasks. The Vision Transformer processes the image data, while the Text Transformer processes the text data. 
Both sets of embeddings are then projected into a common space where their similarities can be measured and aligned, allowing the model to leverage the strengths of both visual and textual information effectively. This approach enabled the extraction of sexist expressions from memes in the dataset across both languages. By transferring the representation to the textual domain, it became possible to adopt state-of-the-art techniques for the classification tasks.</p><p>In summary, the combined analysis of English and Spanish submissions in the sexism detection challenge illuminates diverse approaches and performances across tasks. Each language cohort showcased distinct strengths, with notable runs such as MMICI_1 and MMICI_3 consistently demonstrating effectiveness across multiple tasks. These results underscore the complexity of sexism detection and highlight the importance of multilingual evaluation frameworks. Further exploration and refinement of these methodologies promise continued advancements in combating bias and fostering inclusivity in online content. </p></div>
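The shared-space idea described above can be illustrated with a small NumPy sketch. The projection heads here are random matrices standing in for CLIP's learned ones, and all dimensions are illustrative, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs: image features from a Vision Transformer
# and text features from a Text Transformer (dimensions are illustrative).
image_feat = rng.normal(size=(4, 768))  # 4 memes
text_feat = rng.normal(size=(4, 512))   # their 4 overlaid texts

# Projection heads map both modalities into one shared space; CLIP learns
# these during training, here they are random purely for illustration.
W_img = rng.normal(size=(768, 256))
W_txt = rng.normal(size=(512, 256))

def project(feats, weights):
    """Project into the shared space and L2-normalize each row."""
    z = feats @ weights
    return z / np.linalg.norm(z, axis=1, keepdims=True)

img_emb = project(image_feat, W_img)
txt_emb = project(text_feat, W_txt)

# Cosine similarity between every image and every text: the quantity that
# CLIP's contrastive objective aligns.
similarity = img_emb @ txt_emb.T                    # shape (4, 4)

# Concatenating both views gives one per-meme vector for the classifiers.
fused = np.concatenate([img_emb, txt_emb], axis=1)  # shape (4, 512)
```

Because both rows are unit-normalized, every entry of `similarity` is a cosine in [-1, 1], so scores are comparable across modalities.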
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>This paper has detailed MMICI's participation in the EXIST shared task at CLEF 2024, focusing on the detection and categorization of sexism in social media content. By leveraging innovative methodologies, including ensemble approaches that incorporate diverse annotator profiles and multimodal embeddings, our models have demonstrated substantial efficacy in identifying and understanding sexism in both tweets and memes. The results of our evaluation phase reveal that our ensemble methods, particularly those combining annotator profiles with text and image embeddings, achieve robust performance across multiple tasks. Specifically, our runs have shown competitive results in detecting sexism, discerning the intent behind sexist content, and categorizing different types of sexism. For instance, the ensemble approaches used in Runs 1 and 2 consistently outperformed traditional majority voting methods, highlighting the value of integrating diverse perspectives in addressing complex subjective tasks like sexism detection. Our approach emphasizes the importance of considering individual annotator characteristics, such as gender and age, to ensure that our models capture a wide range of viewpoints and avoid silencing minority voices. In most tasks, our baseline strategy performed the best. However, for Tasks 2 and 3 in Spanish, our ensembles surpassed the baseline by capturing a broader range of perspectives. 
This nuanced understanding of sexism, facilitated by advanced machine learning techniques and diverse data representation, is crucial for effectively combating sexist behaviors and discourses online.</p><p>As future work, there is significant potential in exploring the additional data collected on annotators in the EXIST 2024 dataset, including their ethnicities, study levels, and countries of origin, to enhance the cross-lingual and cross-cultural analysis capabilities of sexism detection systems. Developing models that effectively handle multiple languages and cultural contexts, possibly through cross-lingual transfer learning and the creation of culturally nuanced models, would improve global applicability. Additionally, further exploration of Transformer-based models and the creation of ensembles can leverage their strengths to improve detection accuracy. Expanding the dataset to include more diverse and underrepresented demographic groups would also contribute to building more robust and generalizable models. This could involve collecting additional annotated data from various social media platforms and cultural contexts. Moreover, improving multimodal techniques by leveraging advanced neural network architectures and incorporating additional features can further enhance model performance in detecting sexism.</p><p>Overall, our participation in the EXIST task underscores the potential of advanced ensemble methods and multimodal analysis in improving the detection and categorization of sexism in social media. 
These methods not only enhance the accuracy of automatic tools but also contribute to a deeper understanding of how sexism manifests in various forms, thereby supporting broader efforts to promote gender equity and reduce discrimination in digital spaces.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of the proposal for Sexism Detection in EXIST 2024.</figDesc><graphic coords="5,72.00,65.61,451.29,214.82" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Leveraging Annotator Consensus and Profiles for Sexism Detection in Tweets.</figDesc><graphic coords="6,72.00,65.60,451.26,149.74" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 5</head><label>5</label><figDesc>displays the F1 score for the positive case (sexism) across 10 runs for each model in Task 4. The results indicate that the "Text|Image" model and the "Text|Image &amp; Text|Image General" model (a) Task 1: Sexism Identification in Tweets. (b) Task 2: Source Intention in Tweets. (c) Task 3: Sexism Categorization in Tweets.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Classification results for Sexism Detection in Tweets</figDesc><graphic coords="8,101.33,505.36,392.62,195.15" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Leveraging Annotator Consensus and Profiles for Sexism Detection in Memes.</figDesc><graphic coords="9,72.00,65.60,451.28,155.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Classification results of different approaches for task 4.</figDesc><graphic coords="9,94.57,443.52,406.16,146.51" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Classification results of different approaches for task 5.</figDesc><graphic coords="10,162.25,65.61,270.77,122.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Classification results of different approaches for task 6.</figDesc><graphic coords="10,150.98,383.26,293.33,133.08" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: F1 score of hard-hard task 4, 5, 6 employing Collaborative Filtering.</figDesc><graphic coords="12,139.69,65.61,315.90,103.70" type="bitmap" /></figure>
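Figure 8 reports the F1 obtained when a Factorization Machine predicts score(u, v) from the concatenated annotator and CLIP features. A minimal second-order FM forward pass can be sketched as follows (the feature layout, dimensions, and random weights are illustrative, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(1)

def fm_score(x, w0, w, V):
    """Second-order Factorization Machine forward pass (Rendle, 2010).

    x  : feature vector, e.g. annotator features concatenated with a
         CLIP meme embedding (illustrative layout, our assumption);
    w0 : global bias; w : linear weights; V : (n_features, k) factors.
    """
    linear = w0 + w @ x
    # Pairwise interactions computed in O(n*k) instead of O(n^2):
    #   sum_{i<j} <v_i, v_j> x_i x_j
    # = 0.5 * sum_f [ (V^T x)_f^2 - ((V^2)^T x^2)_f ]
    vx = V.T @ x
    pairwise = 0.5 * np.sum(vx ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + pairwise

# Illustrative sizes: a few annotator dims plus a 512-d CLIP embedding.
n, k = 532, 8
x = rng.normal(size=n)
w0, w = 0.0, rng.normal(size=n) * 0.01
V = rng.normal(size=(n, k)) * 0.01
y_hat = fm_score(x, w0, w, V)
```

The O(nk) rewriting of the pairwise term is what makes FMs efficient on high-dimensional concatenated embeddings, which is why they were a practical fit here.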
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>• Task 3: Sexism Categorization in Tweets focuses</head><label></label><figDesc></figDesc><table /><note>on classifying sexist tweets into specific categories such as ideological and inequality, stereotyping and dominance, objectification, sexual violence, misogyny, and non-sexual violence.• Task 4: Sexism Identification in Memes is similar to Task 1 but applied to memes, determining whether a meme is sexist. • Task</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>5: Source Intention in Memes mirrors</head><label></label><figDesc>Task 2 but for memes, categorizing them based on the author's intention, either direct or judgmental. •</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Task 6: Sexism Categorization in Memes parallels</head><label></label><figDesc></figDesc><table /><note>Task 3, classifying sexist memes into the same categories as tweets.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 1 :</head><label>1</label><figDesc>Examples of Tweets and Memes from the dataset EXIST 2024</figDesc><table><row><cell>Task</cell><cell>Label</cell><cell>Example 1 (Spanish)</cell><cell>Example 2 (English)</cell></row><row><cell>TASK 1: Sexism</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Identification in</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Tweets</cell><cell></cell><cell></cell><cell></cell></row></table><note>SexistMujer al volante, tenga cuidado! People really try to convince women with little to no ass that they should go out and buy a body. Like bih, I don't need a fat ass to get a man. Never have.Continued on next page</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 1 -</head><label>1</label><figDesc>Continued from previous page</figDesc><table><row><cell>Task</cell><cell>Label</cell><cell>Example 1 (ES)</cell><cell>Example 2 (EN)</cell></row><row><cell></cell><cell>Not Sexist</cell><cell>Alguien me explica que zorra hace la gente en el cajero que se demora tanto.</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>TASK 4: Sexism Identification in Memes Sexist Not Sexist TASK 5: Source Intention in Memes Direct Judgemental Continued on next pageTable 1 -Continued from previous page Task Label Example 1 (ES) Example 2 (EN) TASK 6: Sexism Categorization in Memes</head><label>1</label><figDesc></figDesc><table><row><cell>Ideological and</cell><cell></cell><cell></cell></row><row><cell>Inequality</cell><cell></cell><cell></cell></row><row><cell>Stereotyping</cell><cell></cell><cell></cell></row><row><cell>and Dominance</cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell>Don't get married than blame all woman for</cell></row><row><cell>Objectification</cell><cell></cell><cell>your poor investment. You should of got a hooker but instead you choose to go get a wed-</cell></row><row><cell></cell><cell></cell><cell>ding ring.</cell></row><row><cell>Sexual Violence</cell><cell>#MeToo Estas 4 no han conseguido su objetivo. El juez estima que se abrieron de patas</cell><cell>Fuck that cunt, I would with my fist.</cell></row><row><cell>Misogyny and Non-Sexual Vio-lence</cell><cell>Las mujeres de hoy en dia te enseñar a querer. . . estar soltero</cell><cell>Some woman are so toxic they don't even know they are draining everyone around them in poison. If you lack self awareness you won't even notice how toxic you really are.</cell></row><row><cell>Sexual Violence</cell><cell></cell><cell></cell></row><row><cell>Misogyny and</cell><cell></cell><cell></cell></row><row><cell>Non-Sexual Vio-</cell><cell></cell><cell></cell></row><row><cell>lence</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 2</head><label>2</label><figDesc>Features of different models for the task 4.</figDesc><table><row><cell>Model Name</cell><cell cols="5">Annotator Features Text Embeddings Image Embbedings Early Fusion Late Fusion</cell></row><row><cell>Text</cell><cell>Yes</cell><cell>Yes</cell><cell>No</cell><cell>N/A</cell><cell>N/A</cell></row><row><cell>Image</cell><cell>Yes</cell><cell>No</cell><cell>Yes</cell><cell>N/A</cell><cell>N/A</cell></row><row><cell>Text+Image</cell><cell>Yes</cell><cell>Yes</cell><cell>Yes</cell><cell>Yes</cell><cell>No</cell></row><row><cell>Text General</cell><cell>No</cell><cell>Yes</cell><cell>No</cell><cell>N/A</cell><cell>N/A</cell></row><row><cell>Image General</cell><cell>No</cell><cell>No</cell><cell>Yes</cell><cell>N/A</cell><cell>N/A</cell></row><row><cell>Text+Image General</cell><cell>No</cell><cell>Yes</cell><cell>Yes</cell><cell>Yes</cell><cell>No</cell></row><row><cell>Text|Image</cell><cell>Yes</cell><cell>Yes</cell><cell>Yes</cell><cell>No</cell><cell>Yes</cell></row><row><cell>Text|Image &amp; Text|Image General</cell><cell>Yes&amp;No</cell><cell>Yes</cell><cell>Yes</cell><cell>No</cell><cell>Yes</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 4</head><label>4</label><figDesc>Results of Submission on Leaderboard for both Spanish and English (ALL)</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell cols="2">Hard-Hard</cell><cell></cell><cell></cell><cell>Soft-Soft</cell></row><row><cell>Task</cell><cell>Run</cell><cell cols="3">Ranking ICM-Hard ICM-Hard Norm</cell><cell>F1</cell><cell cols="3">Ranking ICM-Soft ICM-Soft Norm</cell></row><row><cell cols="2">Task1 MMICI_1</cell><cell>31</cell><cell>0.4705</cell><cell>0.7365</cell><cell>0.7455</cell><cell>29</cell><cell>-0.3394</cell><cell>0.4456</cell></row><row><cell cols="2">Task1 MMICI_2</cell><cell>28</cell><cell>0.4780</cell><cell>0.7402</cell><cell>0.7460</cell><cell>30</cell><cell>-0.3622</cell><cell>0.4419</cell></row><row><cell cols="2">Task1 MMICI_3</cell><cell>17</cell><cell>0.5324</cell><cell>0.7676</cell><cell>0.7637</cell><cell>21</cell><cell>0.4589</cell><cell>0.5736</cell></row><row><cell cols="2">Task2 MMICI_1</cell><cell>27</cell><cell>-0.0987</cell><cell>0.4679</cell><cell>0.4548</cell><cell>24</cell><cell>-4.5753</cell><cell>0.1314</cell></row><row><cell cols="2">Task2 MMICI_2</cell><cell>32</cell><cell>-0.2406</cell><cell>0.4218</cell><cell>0.4383</cell><cell>25</cell><cell>-4.6285</cell><cell>0.1271</cell></row><row><cell cols="2">Task2 MMICI_3</cell><cell>28</cell><cell>-0.1076</cell><cell>0.4650</cell><cell>0.4525</cell><cell>20</cell><cell>-3.6350</cell><cell>0.2071</cell></row><row><cell cols="2">Task3 MMICI_1</cell><cell>27</cell><cell>-1.4509</cell><cell>0.1631</cell><cell>0.4026</cell><cell>22</cell><cell>-7.9356</cell><cell>0.0809</cell></row><row><cell cols="2">Task3 MMICI_2</cell><cell>28</cell><cell>-1.5003</cell><cell>0.1516</cell><cell>0.4017</cell><cell>23</cell><cell>-7.9380</cell><cell>0.0808</cell></row><row><cell cols="2">Task3 
MMICI_3</cell><cell>23</cell><cell>-0.8105</cell><cell>0.3118</cell><cell>0.4805</cell><cell>20</cell><cell>-7.6413</cell><cell>0.0965</cell></row><row><cell cols="2">Task4 MMICI_1</cell><cell>12</cell><cell>0.0751</cell><cell>0.5382</cell><cell>0.7202</cell><cell>17</cell><cell>-0.6189</cell><cell>0.4005</cell></row><row><cell cols="2">Task4 MMICI_2</cell><cell>8</cell><cell>0.1014</cell><cell>0.5515</cell><cell>0.7261</cell><cell>16</cell><cell>-0.6183</cell><cell>0.4006</cell></row><row><cell cols="2">Task4 MMICI_3</cell><cell>24</cell><cell>-0.0361</cell><cell>0.4816</cell><cell>0.6781</cell><cell>19</cell><cell>-0.6410</cell><cell>0.3970</cell></row><row><cell cols="2">Task5 MMICI_1</cell><cell>7</cell><cell>-0.3066</cell><cell>0.3934</cell><cell>0.4179</cell><cell>2</cell><cell>-1.2660</cell><cell>0.3654</cell></row><row><cell cols="2">Task5 MMICI_2</cell><cell>10</cell><cell>-0.3868</cell><cell>0.3655</cell><cell>0.3770</cell><cell>3</cell><cell>-1.3738</cell><cell>0.3539</cell></row><row><cell cols="2">Task5 MMICI_3</cell><cell>8</cell><cell>-0.3297</cell><cell>0.3854</cell><cell>0.3814</cell><cell>13</cell><cell>-3.4751</cell><cell>0.1304</cell></row><row><cell cols="2">Task6 MMICI_1</cell><cell>3</cell><cell>-0.9863</cell><cell>0.2954</cell><cell>0.4342</cell><cell>19</cell><cell>-16.1248</cell><cell>0.0000</cell></row><row><cell cols="2">Task6 MMICI_2</cell><cell>7</cell><cell>-1.3446</cell><cell>0.2210</cell><cell>0.4453</cell><cell>20</cell><cell>-19.3246</cell><cell>0.0000</cell></row><row><cell cols="2">Task6 MMICI_3</cell><cell>24</cell><cell>-3.8341</cell><cell>0.0000</cell><cell>0.2347</cell><cell>21</cell><cell>-45.0237</cell><cell>0.0000</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 5</head><label>5</label><figDesc>Results of Submission on Leaderboard for Spanish</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell cols="2">Hard-Hard</cell><cell></cell><cell></cell><cell>Soft-Soft</cell></row><row><cell>Task</cell><cell>Run</cell><cell cols="3">Ranking ICM-Hard ICM-Hard Norm</cell><cell>F1</cell><cell cols="3">Ranking ICM-Soft ICM-Soft Norm</cell></row><row><cell cols="2">Task1 MMICI_1</cell><cell>16</cell><cell>0.5323</cell><cell>0.7662</cell><cell>0.7817</cell><cell>24</cell><cell>0.0894</cell><cell>0.5143</cell></row><row><cell cols="2">Task1 MMICI_2</cell><cell>22</cell><cell>0.5007</cell><cell>0.7504</cell><cell>0.7705</cell><cell>25</cell><cell>0.0170</cell><cell>0.5027</cell></row><row><cell cols="2">Task1 MMICI_3</cell><cell>10</cell><cell>0.5603</cell><cell>0.7802</cell><cell>0.7892</cell><cell>15</cell><cell>0.6706</cell><cell>0.6076</cell></row><row><cell cols="2">Task2 MMICI_1</cell><cell>15</cell><cell>0.1670</cell><cell>0.5522</cell><cell>0.5133</cell><cell>23</cell><cell>-4.1728</cell><cell>0.1658</cell></row><row><cell cols="2">Task2 MMICI_2</cell><cell>26</cell><cell>0.0064</cell><cell>0.5020</cell><cell>0.4933</cell><cell>24</cell><cell>-4.2127</cell><cell>0.1626</cell></row><row><cell cols="2">Task2 MMICI_3</cell><cell>29</cell><cell>-0.1146</cell><cell>0.4642</cell><cell>0.4779</cell><cell>20</cell><cell>-3.4962</cell><cell>0.2200</cell></row><row><cell cols="2">Task3 MMICI_1</cell><cell>13</cell><cell>-0.1853</cell><cell>0.4586</cell><cell>0.5486</cell><cell>24</cell><cell>-7.8261</cell><cell>0.0927</cell></row><row><cell cols="2">Task3 MMICI_2</cell><cell>14</cell><cell>-0.2269</cell><cell>0.4493</cell><cell>0.5446</cell><cell>25</cell><cell>-7.8356</cell><cell>0.0922</cell></row><row><cell cols="2">Task3 
MMICI_3</cell><cell>22</cell><cell>-0.5870</cell><cell>0.3689</cell><cell>0.5165</cell><cell>22</cell><cell>-7.4291</cell><cell>0.1134</cell></row><row><cell cols="2">Task4 MMICI_1</cell><cell>17</cell><cell>-0.0591</cell><cell>0.4699</cell><cell>0.6906</cell><cell>14</cell><cell>-0.6655</cell><cell>0.3939</cell></row><row><cell cols="2">Task4 MMICI_2</cell><cell>14</cell><cell>-0.0196</cell><cell>0.4900</cell><cell>0.6997</cell><cell>15</cell><cell>-0.6689</cell><cell>0.3933</cell></row><row><cell cols="2">Task4 MMICI_3</cell><cell>26</cell><cell>-0.1848</cell><cell>0.4059</cell><cell>0.6470</cell><cell>18</cell><cell>-0.8361</cell><cell>0.3667</cell></row><row><cell cols="2">Task5 MMICI_1</cell><cell>7</cell><cell>-0.3028</cell><cell>0.3945</cell><cell>0.4198</cell><cell>1</cell><cell>-1.4813</cell><cell>0.3461</cell></row><row><cell cols="2">Task5 MMICI_2</cell><cell>9</cell><cell>-0.4077</cell><cell>0.3580</cell><cell>0.3728</cell><cell>2</cell><cell>-1.5486</cell><cell>0.3392</cell></row><row><cell cols="2">Task5 MMICI_3</cell><cell>10</cell><cell>-0.4875</cell><cell>0.3302</cell><cell>0.3545</cell><cell>13</cell><cell>-4.0400</cell><cell>0.0804</cell></row><row><cell cols="2">Task6 MMICI_1</cell><cell>4</cell><cell>-1.2346</cell><cell>0.2473</cell><cell>0.3868</cell><cell>18</cell><cell>-14.9495</cell><cell>0.0000</cell></row><row><cell cols="2">Task6 MMICI_2</cell><cell>13</cell><cell>-1.6925</cell><cell>0.1536</cell><cell>0.4141</cell><cell>20</cell><cell>-18.0902</cell><cell>0.0000</cell></row><row><cell cols="2">Task6 MMICI_3</cell><cell>24</cell><cell>-3.8686</cell><cell>0.0000</cell><cell>0.2225</cell><cell>21</cell><cell>-42.6540</cell><cell>0.0000</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_9"><head>Table 6</head><label>6</label><figDesc>Results of Submission on Leaderboard for English</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell cols="2">Hard-Hard</cell><cell></cell><cell></cell><cell>Soft-Soft</cell></row><row><cell>Task</cell><cell>Run</cell><cell cols="3">Ranking ICM-Hard ICM-Hard Norm</cell><cell>F1</cell><cell cols="3">Ranking ICM-Soft ICM-Soft Norm</cell></row><row><cell cols="2">Task1 MMICI_1</cell><cell>40</cell><cell>0.3840</cell><cell>0.6960</cell><cell>0.6971</cell><cell>32</cell><cell>-0.8805</cell><cell>0.3586</cell></row><row><cell cols="2">Task1 MMICI_2</cell><cell>33</cell><cell>0.4402</cell><cell>0.7246</cell><cell>0.7141</cell><cell>31</cell><cell>-0.8349</cell><cell>0.3659</cell></row><row><cell cols="2">Task1 MMICI_3</cell><cell>25</cell><cell>0.4912</cell><cell>0.7507</cell><cell>0.7315</cell><cell>21</cell><cell>0.1413</cell><cell>0.5227</cell></row><row><cell cols="2">Task2 MMICI_1</cell><cell>33</cell><cell>-0.4572</cell><cell>0.3418</cell><cell>0.3680</cell><cell>23</cell><cell>-5.0641</cell><cell>0.0861</cell></row><row><cell cols="2">Task2 MMICI_2</cell><cell>36</cell><cell>-0.5728</cell><cell>0.3018</cell><cell>0.3570</cell><cell>24</cell><cell>-5.1264</cell><cell>0.0810</cell></row><row><cell cols="2">Task2 MMICI_3</cell><cell>30</cell><cell>-0.1384</cell><cell>0.4521</cell><cell>0.4087</cell><cell>19</cell><cell>-3.8024</cell><cell>0.1892</cell></row><row><cell cols="2">Task3 MMICI_1</cell><cell>32</cell><cell>-2.8962</cell><cell>0.0000</cell><cell>0.2357</cell><cell>22</cell><cell>-7.9094</cell><cell>0.0666</cell></row><row><cell cols="2">Task3 MMICI_2</cell><cell>34</cell><cell>-2.9573</cell><cell>0.0000</cell><cell>0.2373</cell><cell>21</cell><cell>-7.9059</cell><cell>0.0668</cell></row><row><cell cols="2">Task3 
MMICI_3</cell><cell>26</cell><cell>-1.1024</cell><cell>0.2298</cell><cell>0.4287</cell><cell>19</cell><cell>-7.7476</cell><cell>0.0755</cell></row><row><cell cols="2">Task4 MMICI_1</cell><cell>5</cell><cell>0.2094</cell><cell>0.6063</cell><cell>0.7538</cell><cell>20</cell><cell>-0.5779</cell><cell>0.4062</cell></row><row><cell cols="2">Task4 MMICI_2</cell><cell>3</cell><cell>0.2224</cell><cell>0.6129</cell><cell>0.7559</cell><cell>19</cell><cell>-0.5735</cell><cell>0.4069</cell></row><row><cell cols="2">Task4 MMICI_3</cell><cell>18</cell><cell>0.1131</cell><cell>0.5574</cell><cell>0.7122</cell><cell>17</cell><cell>-0.4621</cell><cell>0.4250</cell></row><row><cell cols="2">Task5 MMICI_1</cell><cell>6</cell><cell>-0.3112</cell><cell>0.3920</cell><cell>0.4156</cell><cell>2</cell><cell>-1.1089</cell><cell>0.3790</cell></row><row><cell cols="2">Task5 MMICI_2</cell><cell>8</cell><cell>-0.3657</cell><cell>0.3731</cell><cell>0.3815</cell><cell>3</cell><cell>-1.2447</cell><cell>0.3642</cell></row><row><cell cols="2">Task5 MMICI_3</cell><cell>1</cell><cell>-0.1691</cell><cell>0.4413</cell><cell>0.4094</cell><cell>13</cell><cell>-2.9704</cell><cell>0.1760</cell></row><row><cell cols="2">Task6 MMICI_1</cell><cell>2</cell><cell>-0.7441</cell><cell>0.3419</cell><cell>0.4726</cell><cell>15</cell><cell>-18.3643</cell><cell>0.0000</cell></row><row><cell cols="2">Task6 MMICI_2</cell><cell>7</cell><cell>-1.0095</cell><cell>0.2855</cell><cell>0.4752</cell><cell>16</cell><cell>-21.6764</cell><cell>0.0000</cell></row><row><cell cols="2">Task6 MMICI_3</cell><cell>20</cell><cell>-3.8687</cell><cell>0.0000</cell><cell>0.2447</cell><cell>17</cell><cell>-49.2040</cell><cell>0.0000</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partially supported by CONAHCYT (The National Council of Humanities, Sciences, and Technologies of Mexico), which promotes scientific and technological development in the country.</p><p>Additionally, we acknowledge the support provided through the following scholarships: Martha Paola Jimenez-Martinez (scholarship number 828539) and Joan Manuel Raygoza-Romero (scholarship number 806073).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Comisión Nacional para Prevenir y Erradicar la Violencia Contra las Mujeres , ¿qué es el lenguaje sexista y por qué es importante visibilizarlo?</title>
		<ptr target="https://www.gob.mx/conavim/articulos/que-es-el-lenguaje-sexista-y-por-que-es-importante-visibilizarlo?idiom=es" />
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Ambivalent sexism</title>
		<author>
			<persName><forename type="first">P</forename><surname>Glick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Fiske</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in experimental social psychology</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="115" to="188" />
			<date type="published" when="2001">2001</date>
			<publisher>Elsevier</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of EXIST 2024 - Learning with Disagreement for Sexism Identification and Characterization in Social Networks and Memes</title>
		<author>
			<persName><forename type="first">L</forename><surname>Plaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-De-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maeso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Amigó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Morante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Spina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association</title>
				<meeting><address><addrLine>CLEF</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024. 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>García-Díaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cánovas-García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Colomo-Palacios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Valencia-García</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Future Generation Computer Systems</title>
		<imprint>
			<biblScope unit="volume">114</biblScope>
			<biblScope unit="page" from="506" to="518" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Akhtar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Patti</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2106.15896</idno>
		<title level="m">Whose opinions matter? perspective-aware models to identify opinions of hate speech victims in abusive language detection</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Factorization machines for data with implicit feedback</title>
		<author>
			<persName><forename type="first">B</forename><surname>Loni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Larson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hanjalic</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:56517380" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Factorization machines</title>
		<author>
			<persName><forename type="first">S</forename><surname>Rendle</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICDM.2010.127</idno>
	</analytic>
	<monogr>
		<title level="m">2010 IEEE International Conference on Data Mining</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="995" to="1000" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
