<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">VerbaNex AI at CLEF EXIST 2024: Detection of Online Sexism using Transformer Models and Profiling Techniques</title>
				<title level="a" type="sub">Notebook for the VerbaNex AI Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Elizabeth</forename><surname>Martinez</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Engineering</orgName>
								<orgName type="institution">Universidad Tecnológica de Bolívar</orgName>
								<address>
									<addrLine>Cartagena de Indias</addrLine>
									<postCode>130010</postCode>
									<country key="CO">Colombia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Juan</forename><surname>Cuadrado</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Engineering</orgName>
								<orgName type="institution">Universidad Tecnológica de Bolívar</orgName>
								<address>
									<addrLine>Cartagena de Indias</addrLine>
									<postCode>130010</postCode>
									<country key="CO">Colombia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Juan</forename><forename type="middle">Carlos</forename><surname>Martinez Santos</surname></persName>
							<email>jcmartinezs@utb.edu.co</email>
							<affiliation key="aff0">
								<orgName type="department">School of Engineering</orgName>
								<orgName type="institution">Universidad Tecnológica de Bolívar</orgName>
								<address>
									<addrLine>Cartagena de Indias</addrLine>
									<postCode>130010</postCode>
									<country key="CO">Colombia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Edwin</forename><surname>Puertas</surname></persName>
							<email>epuerta@utb.edu.co</email>
							<affiliation key="aff0">
								<orgName type="department">School of Engineering</orgName>
								<orgName type="institution">Universidad Tecnológica de Bolívar</orgName>
								<address>
									<addrLine>Cartagena de Indias</addrLine>
									<postCode>130010</postCode>
									<country key="CO">Colombia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">VerbaNex AI at CLEF EXIST 2024: Detection of Online Sexism using Transformer Models and Profiling Techniques</title>
						<title level="a" type="sub">Notebook for the VerbaNex AI Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">B932E7D25BFC0059B9F8AF4D5377C021</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:01+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Online Sexism Detection</term>
					<term>Profiling Techniques</term>
					<term>Natural Language Processing</term>
					<term>Social Media Analysis</term>
					<term>Binary Classification</term>
					<term>Transformer Models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The integration of social networks into modern life has revolutionized global communication, allowing instantaneous interaction. However, this convenience has also been misused, leading to the proliferation of inappropriate and often sexist remarks on social media. To address this, the field of natural language processing has been developing techniques to identify and mitigate such content. Our research, conducted as part of the CLEF EXIST 2024 competition, introduces a novel approach. We combined features from the 'twitter-roberta-base-sentiment-latest' transformer model with traditional lexical elements and profiling. The profiling involved grouping profiles by gender, age, and education level. Then, we categorized them based on their positive response rate to sexism and trained classifiers accordingly. This method was evaluated using the testing profiles, achieving an F1 score of 0.745. In the evaluation phase, our approach yielded an F1 score of 0.63. The effective combination of linguistic, transformer-based features and profiling was crucial to achieving these results.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In the modern era, social media has become an integral part of daily life, captivating nearly 80% of individuals through its ubiquitous digital platforms <ref type="bibr" target="#b0">[1]</ref>. Social media facilitates communication among citizens, corporations, and governments, making its impact far-reaching and undeniable. However, this digital space has also seen a troubling rise in hate speech, particularly manifesting as gender-based inequities and injustices that disproportionately affect women <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>. The prevalence of sexist content exacerbates feelings of vulnerability and insecurity among women, both in online and offline environments <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>.</p><p>To address this issue, our research takes part in the CLEF EXIST 2024 competition <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>, focusing on Task 1, which involves identifying sexist expressions and behaviors in tweets and memes. The goal of Task 1 is to develop effective techniques to detect and mitigate online sexism. In our approach, we combine features from the 'twitter-roberta-base-sentiment-latest' transformer model with traditional lexical elements and profiling techniques. Profiling involves grouping users based on gender, age, and education level, and further categorizing them according to their positive response rates to sexism. We then train classifiers based on these groups to enhance detection accuracy.</p><p>Our method includes rigorous pre-processing, the integration of lexical and transformer-based feature extraction, and the application of profiling techniques. This strategy was evaluated using the testing profiles, achieving an F1 score of 0.745 in the training phase. In the official competition evaluation, our approach yielded an F1 score of 0.63. The integration of these diverse features and profiling techniques demonstrates the potential to improve the detection of online sexism.</p><p>The following sections detail our methodology, including the pre-processing steps, feature extraction techniques, regularization methods, and the evaluation metrics used to assess our system's performance. Through this research, we aim to contribute to the broader efforts of combating online sexism, providing insights and tools that can be used to create a safer and more equitable digital environment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Online sexism poses a significant problem, impacting women profoundly and creating a sense of insecurity both online and offline <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b11">12]</ref>. Addressing this issue necessitates the development of robust strategies to foster safer online environments while maintaining freedom of speech. In response to this need, several competitions and initiatives have emerged, focusing on the detection and mitigation of hate speech and sexism on social media platforms.</p><p>Competitions such as EVALITA and IberEval have been pivotal in this effort, leveraging diverse datasets from various social media platforms, including Twitter, Reddit, and Gab <ref type="bibr" target="#b12">[13]</ref>. These datasets are crucial for developing and evaluating models aimed at detecting online hate speech and sexism. For instance, datasets developed by Waseem and Hovy contain annotations for both sexist and racist content and have served as foundational resources for numerous studies.</p><p>Research efforts utilizing these datasets have explored various methodologies. For example, <ref type="bibr" target="#b13">[14]</ref> employed word vectors and contextual analysis to detect sexism and racism, using five Long Short-Term Memory (LSTM) networks as classifiers, achieving a precision of 0.9334. Similarly, <ref type="bibr" target="#b14">[15]</ref> combined LSTM networks with random embeddings to extract features for Gradient Boosting Decision Trees (GBDT), achieving a precision of 0.930.</p><p>The Student Research Workshop (SRW) dataset, highlighted in <ref type="bibr" target="#b15">[16]</ref>, focuses specifically on sexist hate speech. In this study, a combination of bag-of-words and sequential word features was used with a Support Vector Machine (SVM) classifier <ref type="bibr" target="#b16">[17]</ref>, resulting in an accuracy of 0.8932. Other studies have experimented with techniques such as sentence embeddings, term frequency-inverse document frequency (TF-IDF) <ref type="bibr" target="#b17">[18]</ref>, and bag-of-words (BoW) methods, though these approaches generally achieved lower accuracy, with a maximum of 0.704 <ref type="bibr" target="#b18">[19]</ref>.</p><p>The continuous development and release of these datasets through various competitions have facilitated significant advancements in the field. Recently, the integration of transformer models, available through libraries such as Transformers, has shown considerable promise in enhancing the detection of online sexism. These models represent a critical advancement in improving the accuracy and efficiency of detection systems <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data</head><p>For the CLEF EXIST 2024 competition, we utilized the dataset provided by the organizers, focusing specifically on the identification of sexism in tweets for Task 1. This dataset builds on the EXIST 2023 dataset, incorporating both English and Spanish tweets. The dataset includes a curated lexicon of 250 terms indicative of sexist content. These terms were used to gather a comprehensive collection of over 10,000 annotated tweets, with a balanced representation of English and Spanish content. To achieve a balanced dataset, excessively imbalanced terms were discarded, resulting in approximately 5,000 tweets labeled as sexist and 5,000 tweets labeled as non-sexist, ensuring an even distribution for training and testing. Six annotators from the Prolific app, guided by experts in gender issues, labeled each tweet, considering gender and age to mitigate label bias. Additional demographic details such as education level, ethnicity, and country of residence were also included for the 2023 and 2024 datasets. A learning with disagreements approach was employed, providing all annotations per instance rather than aggregated labels, to capture a diversity of perspectives.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Architecture</head><p>In this study, we developed a comprehensive system to detect sexism in online text, specifically tweets. Our architecture integrates multiple stages of data processing, feature extraction, and regularization to enhance the detection accuracy. The following subsections provide a detailed overview of each component within our system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Pre-Processing</head><p>In the preprocessing stage, we aimed to standardize and clean the textual data to ensure consistency and clarity. Using the Natural Language Toolkit (NLTK) library <ref type="bibr" target="#b21">[22]</ref>, we performed a series of transformations on the text data. Hashtags were replaced with the term "hashtag," and user mentions were substituted with "mention." URLs within the text were replaced by the placeholder "URL," and emojis were converted to their corresponding UTF-8 encoded descriptions, labeled as "emoji." Following these substitutions, we further refined the text by removing punctuation, converting all characters to lowercase, and eliminating common stopwords to reduce noise and enhance the quality of the data for subsequent analysis.</p></div>
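<div xmlns="http://www.tei-c.org/ns/1.0"><p>The substitutions described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses only the Python standard library instead of NLTK, the regular expressions are assumptions, the stopword set is a tiny illustrative subset, and emojis are approximated as characters in the Unicode category "So".</p><p>
```python
import re
import string
import unicodedata

# Illustrative stopword subset (assumption); the paper relied on NLTK instead.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "at"}

# Hypothetical patterns for the three substitutions named in the text.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def replace_emojis(text):
    """Replace characters in Unicode category 'So' with the word 'emoji'."""
    return "".join(
        " emoji " if unicodedata.category(ch) == "So" else ch for ch in text
    )


def preprocess(text):
    """Apply placeholders, strip punctuation, lowercase, drop stopwords."""
    # URLs go first so that '#' or '@' inside a link is not matched later.
    text = URL_RE.sub(" URL ", text)
    text = MENTION_RE.sub(" mention ", text)
    text = HASHTAG_RE.sub(" hashtag ", text)
    text = replace_emojis(text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.lower().split()
    return " ".join(token for token in tokens if token not in STOPWORDS)
```
</p><p>Because lowercasing runs after the substitutions, the "URL" placeholder ends up in lowercase in the cleaned text, which mirrors the order of operations given in the paragraph above.</p></div>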
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Profiling</head><p>In the provided dataset, each message was annotated by six different individuals. Because the number of annotators is even, three annotators sometimes labeled a message as sexist while the other three labeled it as non-sexist, resulting in a tie. To address this, we implemented a profiling approach based on demographic factors: gender, education level, and age.</p><p>We grouped the annotators' responses according to these demographic profiles. For each profile, we calculated the total number of responses and the number of times a message was labeled as sexist. This allowed us to categorize the profiles into four groups based on their likelihood of labeling messages as sexist or non-sexist.</p><p>We analyzed these profiles to predict the probability of a message being labeled as sexist based on the annotators' demographic tendencies. This profiling approach helped us resolve ties and make more informed decisions regarding the classification of messages.</p><p>Following the profiling, we performed feature extraction and trained four distinct systems based on the grouped profiles and their responses. These systems were then evaluated to determine their effectiveness in accurately detecting sexist messages.</p></div>
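<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the profile grouping is given below. The paper does not state how the four groups were delimited, so the cut points in THRESHOLDS are purely hypothetical, as are the function names and the tuple layout of the annotation records.</p><p>
```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical cut points splitting positive-response rates into four groups.
THRESHOLDS = [0.25, 0.50, 0.75]


def profile_rates(annotations):
    """Return the share of 'YES' labels per (gender, age, education) profile.

    annotations: iterable of (gender, age, education, label) tuples,
    where label is 'YES' (sexist) or 'NO' (non-sexist).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for gender, age, education, label in annotations:
        key = (gender, age, education)
        totals[key] += 1
        positives[key] += int(label == "YES")
    return {key: positives[key] / totals[key] for key in totals}


def assign_groups(rates):
    """Map each profile to a group index from 0 (lowest rate) to 3 (highest)."""
    return {key: bisect_right(THRESHOLDS, rate) for key, rate in rates.items()}
```
</p><p>Under these assumptions, one classifier per group index could then be trained on the responses of the profiles assigned to that group.</p></div>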
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Lexical Feature Extraction</head><p>To begin our analysis, we focused on extracting traditional lexical features to gain insights into the linguistic patterns present in the data. This process involved identifying various lexical elements as described by Puertas et al. <ref type="bibr" target="#b22">[23]</ref>. We categorized these features into 27 distinct groups, including word usage, hashtags, URLs, emojis, frequently used Part-of-Speech (POS) tags, adverbs, and adjectives. This extraction allowed us to examine the corpus thoroughly, providing a solid foundation for understanding the data's linguistic characteristics.</p><p>To enhance our approach, we incorporated the Twitter-roBERTa-base model, specifically fine-tuned for sentiment analysis <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b20">21]</ref>. This variant of RoBERTa-base was trained on a large collection of tweets from January 2018 to December 2021 and evaluated using the TweetEval benchmark. By leveraging this model, we were able to extract sentiment-based features alongside traditional lexical features. The two feature sets (lexical and transformer-based) were combined through concatenation, resulting in a single feature representation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Transformer Integration</head><p>To further enhance our feature extraction process, we incorporated the Twitter-roBERTa-base model, specifically fine-tuned for sentiment analysis. This transformer model, a variant of RoBERTa-base, was trained on a large dataset of tweets collected from January 2018 to December 2021 and evaluated using the TweetEval benchmark. The primary advantage of using this model lies in its ability to capture nuanced sentiment features from the text, which are crucial for identifying subtle expressions of sexism.</p><p>The integration process involved several key steps. First, we preprocessed the text data as described in the Pre-Processing section, ensuring consistency and clarity. Next, we passed the cleaned text through the Twitter-roBERTa-base model to extract sentiment-based features. These features encapsulate the emotional tone and contextual sentiment of each tweet, providing a deeper understanding of the underlying sentiment patterns.</p><p>By combining these sentiment features with the traditional lexical features extracted earlier, we created a comprehensive feature set. This combined feature set was achieved through a concatenation process, where both sets of features were merged to form a unified representation. This approach allowed us to leverage the strengths of both traditional lexical analysis and modern transformer-based sentiment analysis.</p><p>The resulting feature representation was then used as input for our classification models. This hybrid approach not only improved the accuracy of sexism detection but also provided a richer and more nuanced understanding of the text data.</p></div>
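<div xmlns="http://www.tei-c.org/ns/1.0"><p>The concatenation step can be illustrated with a short sketch. The four lexical counts below are a hypothetical subset of the 27 feature groups, and the transformer vector is a stand-in that would, in practice, come from the Twitter-roBERTa-base sentiment model; none of the function names are taken from the authors' code.</p><p>
```python
def lexical_features(text):
    """Return a few illustrative lexical counts for a raw tweet."""
    tokens = text.split()
    return [
        float(len(tokens)),                                # number of tokens
        float(sum(t.startswith("#") for t in tokens)),     # hashtags
        float(sum(t.startswith("@") for t in tokens)),     # user mentions
        float(sum(t.startswith("http") for t in tokens)),  # URLs
    ]


def combine_features(lexical, transformer):
    """Concatenate the lexical and transformer-based vectors into one."""
    return list(lexical) + list(transformer)


# Stand-in for the model output (assumption): three sentiment scores.
sentiment_scores = [0.1, 0.2, 0.7]
features = combine_features(
    lexical_features("@user see #tag http://x.y now"), sentiment_scores
)
```
</p><p>The resulting vector keeps the lexical values first and the transformer values last, so a downstream classifier always sees each feature at the same position.</p></div>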
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Regularization</head><p>Regularization was an essential step in our methodology to ensure that our model performed well and was not prone to overfitting. First, we divided the dataset into training and validation sets to facilitate model training and evaluation. To address the issue of class imbalance, we employed techniques to generate synthetic instances, ensuring that both classes were adequately represented in the training data. Specifically, we used the K-Fold Stratified Shuffle-Split technique, as described by Sandoval et al. <ref type="bibr" target="#b24">[25]</ref>, to create multiple splits of the data. This approach allowed us to maintain the original distribution of classes within each fold, enhancing the robustness and generalizability of our model through effective cross-validation.</p></div>
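<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of keeping the class distribution stable across splits can be sketched with a small stratified k-fold routine. This is a standard-library stand-in written for illustration, not the procedure of Sandoval et al. nor the authors' code; in practice, scikit-learn's StratifiedKFold or StratifiedShuffleSplit would be the usual choice. The function names and default values are assumptions.</p><p>
```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Distribute sample indices over folds, class by class, after shuffling."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Round-robin assignment keeps class proportions similar in each fold.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


def train_validation_splits(labels, n_splits=5, seed=0):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    folds = stratified_folds(labels, n_splits, seed)
    for k in range(n_splits):
        validation = sorted(folds[k])
        train = sorted(
            index for j, fold in enumerate(folds) if j != k for index in fold
        )
        yield train, validation
```
</p><p>Each sample appears in exactly one validation fold, and every fold receives a near-equal share of each class, which is the property the paragraph above relies on.</p></div>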
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Evaluation</head><p>The performance of our system was evaluated using two approaches: one without profiling and one with profiling. We assessed the system's effectiveness based on four key metrics: F1 score, precision, recall, and accuracy. The results for each approach during the training phase are presented in Table <ref type="table" target="#tab_0">1</ref>.</p><p>The results from the approach without profiling demonstrate a solid performance across all metrics. The higher precision and recall indicate that the system is capable of effectively identifying sexist messages, while maintaining a reasonable balance between false positives and false negatives. The results from the approach with profiling show a balanced performance, with precision slightly higher than recall. This suggests that the profiling method contributed to a consistent detection of sexist messages while maintaining a moderate false positive rate.</p><p>Overall, the training phase evaluation results from both approaches indicate that our system is capable of reliably detecting sexism in online text. While the metrics are satisfactory, they highlight areas for further improvement and refinement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Competition Evaluation</head><p>The final evaluation of our system was conducted during the CLEF EXIST 2024 competition, where the performance was assessed using the competition's official metrics. The results for both approaches are as follows: The approach without profiling achieved a position of 54 in the competition, with an ICM-Hard score of 0.1064, an ICM-Hard Norm score of 0.5532, and an F1_YES score of 0.6320. This indicates a relatively strong performance in detecting sexist messages.</p><p>The approach with profiling, on the other hand, achieved a position of 56, with an ICM-Hard score of 0.0390, an ICM-Hard Norm score of 0.5195, and an F1_YES score of 0.6221. While this approach showed slightly lower performance metrics, it still demonstrated the system's capability in the competition context.</p><p>Overall, the competition evaluation results suggest that both approaches have their strengths, with the non-profiling approach slightly outperforming the profiling approach. These results provide valuable insights for further refinement and optimization of our detection system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>This study presented a novel approach to detecting online sexism in tweets by combining transformer models and profiling techniques. By integrating features from the 'twitter-roberta-base-sentiment-latest' model with traditional lexical elements, and grouping annotator profiles based on demographic factors such as gender, age, and education level, we aimed to improve the accuracy of sexism detection. Our methodology included rigorous pre-processing, comprehensive feature extraction, and robust regularization techniques to ensure the reliability of our model.</p><p>The evaluation results demonstrated that both approaches, with and without profiling, performed satisfactorily in the training phase, achieving F1 scores of 74.5 and 75.68, respectively. The competition results showed that the approach without profiling slightly outperformed the profiling approach, with higher scores in all metrics. While the metrics indicate a solid performance, there is still room for improvement, especially in refining the profiling techniques to better capture demographic biases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Future Work</head><p>Future work will focus on several areas to enhance the detection of online sexism. First, we plan to refine our profiling techniques by incorporating more detailed demographic data and exploring additional factors that may influence annotator biases. Second, we aim to experiment with other transformer models and fine-tune them specifically for sexism detection to improve the performance further.</p><p>Additionally, expanding the dataset to include a wider variety of social media platforms and languages could provide a more comprehensive understanding of online sexism. We also intend to investigate the use of advanced regularization techniques and ensemble methods to increase the robustness of our models.</p><p>Finally, collaborating with experts in gender studies and psychology could provide valuable insights into the nuances of sexist language, leading to more accurate and sensitive detection systems. By addressing these areas, we hope to contribute to the development of more effective tools for combating online sexism and promoting safer online environments.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: System Pipeline.</figDesc><graphic coords="3,83.28,99.57,428.70,137.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Training Phase Evaluation Metrics for Different Approaches</figDesc><table><row><cell>Metric</cell><cell>Without Profiling</cell><cell>With Profiling</cell></row><row><cell>F1 Score</cell><cell>75.68</cell><cell>74.5</cell></row><row><cell>Precision</cell><cell>75.85</cell><cell>74.71</cell></row><row><cell>Recall</cell><cell>75.71</cell><cell>74.55</cell></row><row><cell>Accuracy</cell><cell>75.71</cell><cell>74.55</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Competition Evaluation Metrics for Different Approaches</figDesc><table><row><cell>Approach</cell><cell>Position</cell><cell>ICM-Hard</cell><cell>ICM-Hard Norm</cell><cell>F1_YES</cell></row><row><cell>Without Profiling</cell><cell>54</cell><cell>0.1064</cell><cell>0.5532</cell><cell>0.6320</cell></row><row><cell>With Profiling</cell><cell>56</cell><cell>0.0390</cell><cell>0.5195</cell><cell>0.6221</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The authors would like to acknowledge the support provided by the master's degree scholarship program in engineering at the Universidad Tecnológica de Bolívar (UTB) in Cartagena, Colombia.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Young people&apos;s uses and understandings of online social networks in their everyday lives</title>
		<author>
			<persName><forename type="first">F</forename><surname>Awan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gauntlett</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Young</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="111" to="132" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Automatic classification of sexism in social networks: An empirical study on twitter data</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rodríguez-Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-de-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Plaza</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2020.3042604</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="219563" to="219576" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Social identity theory: Past achievements, current problems and future challenges</title>
		<author>
			<persName><forename type="first">R</forename><surname>Brown</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">European journal of social psychology</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="745" to="778" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Gender bias and sexism in language</title>
		<author>
			<persName><forename type="first">M</forename><surname>Menegatti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rubini</surname></persName>
		</author>
		<idno type="DOI">10.1093/ACREFORE/9780190228613.013.470</idno>
		<ptr target="https://cris.unibo.it/handle/11585/623058" />
	</analytic>
	<monogr>
		<title level="j">Oxford Research Encyclopedia of Communication</title>
		<imprint>
			<biblScope unit="page" from="451" to="468" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Categorizing sexism and misogyny through neural approaches</title>
		<author>
			<persName><forename type="first">P</forename><surname>Parikh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Abburi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Chhaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Varma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on the Web (TWEB)</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="1" to="31" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Overview of EXIST 2021: sexism identification in social networks</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rodríguez-Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-de-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Plaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Comet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Donoso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procesamiento del Lenguaje Natural</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="page" from="195" to="207" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Online hate, digital discourse and critique: Exploring digitally-mediated discursive practices of gender-based hostility</title>
		<author>
			<persName><forename type="first">M</forename><surname>Khosravinik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Esposito</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lodz Papers in Pragmatics</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="45" to="68" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Overview of EXIST 2024 - Learning with Disagreement for Sexism Identification and Characterization in Social Networks and Memes</title>
		<author>
			<persName><forename type="first">L</forename><surname>Plaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-de-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maeso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Amigó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Morante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Spina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association</title>
				<meeting><address><addrLine>CLEF</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Overview of EXIST 2024 - Learning with Disagreement for Sexism Identification and Characterization in Social Networks and Memes (Extended Overview)</title>
		<author>
			<persName><forename type="first">L</forename><surname>Plaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carrillo-De-Albornoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maeso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Amigó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Morante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Spina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>Herrera</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">SWSR: A Chinese dataset and lexicon for online sexism detection</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zubiaga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Online Social Networks and Media</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page">100182</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The rising fourth wave: Feminist activism on digital platforms in India</title>
		<author>
			<persName><forename type="first">S</forename><surname>Jain</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ORF Issue Brief</title>
		<imprint>
			<biblScope unit="volume">384</biblScope>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Online misogyny</title>
		<author>
			<persName><forename type="first">K</forename><surname>Barker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Jurasz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of International Affairs</title>
		<imprint>
			<biblScope unit="volume">72</biblScope>
			<biblScope unit="page" from="95" to="114" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Overview of the Evalita 2018 task on Automatic Misogyny Identification (AMI)</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fersini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nozza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EVALITA Evaluation of NLP and Speech Tools for Italian: Proceedings of the Final Workshop, 12-13 December 2018</title>
				<meeting><address><addrLine>Naples</addrLine></address></meeting>
		<imprint>
			<publisher>Accademia University Press</publisher>
			<date type="published" when="2018-12">December 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Effective hate-speech detection in Twitter data using recurrent neural networks</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Pitsilis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ramampiaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Langseth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Intelligence</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page" from="4730" to="4742" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Deep learning for hate speech detection in tweets</title>
		<author>
			<persName><forename type="first">P</forename><surname>Badjatiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Varma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th International Conference on World Wide Web Companion</title>
				<meeting>the 26th International Conference on World Wide Web Companion</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="759" to="760" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Andreas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lazaridou</surname></persName>
		</author>
		<title level="m">Proceedings of the NAACL Student Research Workshop</title>
				<meeting>the NAACL Student Research Workshop</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Text classification by augmenting bag of words (BoW) representation with co-occurrence feature</title>
		<author>
			<persName><forename type="first">K</forename><surname>Soumya George</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Joseph</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IOSR Journal of Computer Engineering</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="34" to="38" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Ullman</surname></persName>
		</author>
		<title level="m">Mining of massive datasets</title>
				<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mathew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mukherjee</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1812.06700</idno>
		<title level="m">Hateminers: Detecting hate speech against women</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">TweetEval: Unified benchmark and comparative evaluation for tweet classification</title>
		<author>
			<persName><forename type="first">F</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Camacho-Collados</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Neves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Espinosa-Anke</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.12421</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">TimeLMs: Diachronic language models from Twitter</title>
		<author>
			<persName><forename type="first">D</forename><surname>Loureiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Neves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Espinosa Anke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Camacho-Collados</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.acl-demo.25</idno>
		<ptr target="https://aclanthology.org/2022.acl-demo.25" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations</title>
				<editor>
			<persName><forename type="first">V</forename><surname>Basile</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Kozareva</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Stajner</surname></persName>
		</editor>
		<meeting>the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="251" to="260" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">NLTK: The natural language toolkit</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bird</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions</title>
				<meeting>the COLING/ACL 2006 Interactive Presentation Sessions</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="69" to="72" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Bots and gender profiling on Twitter using sociolinguistic features</title>
		<author>
			<persName><forename type="first">E</forename><surname>Puertas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">G</forename><surname>Moreno-Sandoval</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Plaza-Del Arco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Alvarado-Valencia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pomares-Quimbaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Alfonso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">TweetNLP: Cutting-edge natural language processing for social media</title>
		<author>
			<persName><forename type="first">J</forename><surname>Camacho-Collados</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rezaee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Riahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ushio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Loureiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Antypas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Boisson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Espinosa Anke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Martínez Cámara</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.emnlp-demos.5</idno>
		<ptr target="https://aclanthology.org/2022.emnlp-demos.5" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
				<editor>
			<persName><forename type="first">W</forename><surname>Che</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Shutova</surname></persName>
		</editor>
		<meeting>the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations<address><addrLine>Abu Dhabi, UAE</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="38" to="49" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Sandoval</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Puertas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Quimbaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Valencia</surname></persName>
		</author>
		<title level="m">Assembly of polarity, emotion and user statistics for detection of fake profiles</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>CLEF</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
