<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ANLP-Uniso at EXIST 2025: Sexism Identification and Characterization in Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ghada Ben Amor</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nawres Medimagh</string-name>
          <email>medimaghnaoures@isgs.u-sousse.tn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saoussen Ben Chaabene</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Omar Trigui</string-name>
          <email>omar.trigui@isgs.u-sousse.tn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Miracl laboratory, University of Sfax</institution>
          ,
          <country country="TN">Tunisia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Sousse, The Higher Institute of Management of Sousse</institution>
          ,
          <addr-line>Sousse 4000</addr-line>
          ,
          <country country="TN">Tunisia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The rise of sexist discourse on social media platforms, especially Twitter, has become a pressing societal concern, necessitating the development of automated detection systems. In this study, we present a comprehensive approach to detecting sexist content in Spanish tweets as part of the EXIST 2025 shared task at CLEF. Leveraging the multilingual T5 (mT5) model for contextual embeddings, our system integrates a variety of machine learning and deep learning classifiers, including traditional machine learning approaches (Logistic Regression and SVM) and neural networks (RNN, GRU, and hybrid FNN+GRU). To enhance classification accuracy, we apply extensive preprocessing, feature normalization, dimensionality reduction via PCA, and data balancing techniques such as SMOTE and class weighting. Our experiments show that while simpler models like Logistic Regression achieve strong performance, ensemble strategies further improve robustness. The results underscore the value of combining transformer-based embeddings with classical and neural classifiers to address the nuanced challenge of online sexism detection.</p>
      </abstract>
      <kwd-group>
        <kwd>Sexism detection</kwd>
        <kwd>Social media</kwd>
        <kwd>Twitter</kwd>
        <kwd>Spanish tweets</kwd>
        <kwd>EXIST 2025</kwd>
        <kwd>CLEF</kwd>
        <kwd>Machine learning</kwd>
        <kwd>Deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The rapid proliferation of social media platforms has transformed the way individuals
communicate and express opinions. However, this digital revolution has also given rise to significant
challenges, particularly in the form of online abuse and discrimination. Among these, sexist content,
ranging from explicit harassment to subtle gender-based bias, has become alarmingly prevalent,
especially on platforms like Twitter. The automatic detection of such harmful language is thus a
crucial task in the broader effort to foster safer and more inclusive online environments.</p>
      <p>Traditional natural language processing (NLP) techniques have struggled to accurately identify
sexism due to its nuanced and context-dependent nature. Modern approaches, however, increasingly
rely on machine learning (ML) and deep learning (DL) methods that can capture more complex
linguistic patterns. In particular, transformer-based models such as mT5, which leverage large-scale
pretraining and contextual embeddings, have demonstrated significant advances in understanding
and processing human language. This study aims to develop an effective sexism detection system by
combining state-of-the-art transformer models with traditional ML/DL classifiers, enhanced through
robust preprocessing, feature extraction, and handling of class imbalance.</p>
      <p>By applying this hybrid approach to social media data, we seek not only to improve detection
accuracy but also to gain deeper insights into the linguistic characteristics of sexist discourse online.</p>
      <p>With the exponential rise of social media platforms, especially Twitter, sexist discourse has found
new ways to spread rapidly and widely. Whether explicit or implicit, such content significantly
impacts gender perception and reinforces stereotypes and online abuse.</p>
      <p>Automatically detecting this type of language has thus become a critical task. However, it remains
particularly challenging due to the subjective nature of language, cultural and linguistic diversity, and
the often-subtle manifestations of sexism. Additionally, the class imbalance in the dataset, where
sexist tweets are fewer than non-sexist ones, further complicates the development of effective machine
learning systems. The central research question of this study is: How can we develop a reliable,
robust, and multilingual system for the automatic detection of sexist content on social media,
particularly Spanish-language tweets, while ensuring both interpretability and generalization?</p>
      <p>The main objective of this study is to design and implement an automatic sexism detection system
for tweets, as part of the EXIST 2025 shared task at the CLEF campaign. To achieve this, we rely on:</p>
      <p>• Contextual text representations generated using the mT5 multilingual transformer model.</p>
      <p>• Rigorous text preprocessing techniques to ensure the linguistic consistency and quality of the
dataset.</p>
      <p>• A combination of traditional machine learning models (such as Logistic Regression and SVM)
and deep learning architectures (such as RNN, GRU, FNN, and hybrid models).</p>
      <p>• Techniques for dimensionality reduction (PCA), data balancing (SMOTE, class weighting),
and robust evaluation (F1-score, confusion matrices, accuracy, precision, macro average, weighted
average, support).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The automatic detection of online sexism has become a growing area of research within Natural
Language Processing, particularly due to the proliferation of gender-based hate speech on social
media platforms. Since the launch of the EXIST shared task in 2021, the community has developed
diverse methodologies for handling the binary and fine-grained classification of sexist content in
multiple languages. Early approaches to sexism detection primarily relied on classical machine
learning algorithms such as logistic regression, support vector machines (SVMs), and random forests
using hand-crafted features like TF-IDF [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Although these models provided lightweight and
interpretable solutions, their performance often lagged behind neural approaches, especially in
capturing implicit and contextual sexism. The widespread adoption of transformer-based
architectures, such as BERT, mBERT, and XLM-RoBERTa, has significantly improved the performance
of sexism detection systems [2]. These models leverage contextual embeddings to better understand
the subtle and nuanced nature of online discourse. Khan et al. [3] addressed multilingual sexism
detection by leveraging transformer models like XLM-RoBERTa and mBERT. They proposed an
ensemble approach combining multiple fine-tuned models to identify explicit and implicit sexism in
both English and Spanish texts, achieving excellent performance in the EXIST 2024 shared task. The
SemEval-2023 Task 10 further expanded the classification framework by introducing a hierarchical
taxonomy of sexism, distinguishing between threats, stereotypes, and derogatory remarks. These
systems often combine fine-tuned transformer models (e.g., DeBERTa-v3, TwHIN-BERT) with
auxiliary techniques like multi-label learning and contextual data augmentation [4]. Hybrid
approaches have also been proposed to integrate textual and non-textual signals. A notable example is
the use of ByT5, a byte-level multilingual transformer, combined with TabNet for incorporating
structured metadata such as platform, language, and readability [5]. While promising, such systems
remain computationally expensive and require extensive fine-tuning to outperform simpler baselines.
Finally, Azadi et al. tackled bilingual sexism detection using fine-tuned XLM-RoBERTa and GPT-3.5
few-shot learning. Their approach showed high performance in English and Spanish, demonstrating
the effectiveness of both fine-tuning and prompt-based methods for identifying sexism with minimal
annotated data [6].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset Description: EXIST 2025 Tweets Dataset</title>
        <p>As part of our research on automatic sexism detection in social media content, we utilized the EXIST
2025 Tweets Dataset, released by the organizers of the CLEF 2025 conference under the shared task
entitled Explainable Detection of Sexism in Social Networks. This multilingual dataset focuses
primarily on tweets written in Spanish and English, with the goal of enabling the development of
explainable AI models capable of identifying and classifying sexist content. The dataset is organized
into three main subdirectories.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Text Preprocessing</title>
        <p>The preprocessing phase was essential to ensure the quality and linguistic consistency of the dataset
prior to model training. Since the EXIST 2025 dataset contains multilingual content, including both
English and Spanish tweets, we applied a language filtering step to retain only tweets written in
Spanish, in line with the objectives of our study. To prepare the text data, we first performed a
normalization process that involved the removal of irrelevant textual elements such as hyperlinks,
user mentions, and hashtags. This was followed by the conversion of emojis into their textual
representations to preserve semantic information, which were subsequently cleaned to avoid
introducing noise. Next, we eliminated all special characters and non-alphanumeric symbols, except
for characters specific to the Spanish language, such as accented vowels and the letters ñ and ü. This
step helped maintain the integrity of Spanish orthography while reducing unnecessary variability in
the data. We also removed common Spanish stopwords to reduce redundancy and focus on the most
informative components of the tweets. Finally, any tweets that could not be reliably identified as
Spanish or became empty after preprocessing were excluded from the dataset. This cleaning process
resulted in a more homogeneous and relevant dataset, optimized for the detection of sexist content in
Spanish-language tweets.</p>
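        <p>The cleaning steps described above can be sketched as follows. This is a minimal illustration, not the exact pipeline: the stopword list is abbreviated for brevity (the actual system uses NLTK's Spanish stopwords, per Section 3.3), and emoji-to-text conversion via the emoji library is omitted here.</p>

```python
import re

# Abbreviated Spanish stopword list for illustration only; the real
# pipeline uses NLTK's full Spanish stopword list (see Section 3.3).
STOPWORDS_ES = {"de", "la", "las", "los", "el", "en", "y", "a",
                "un", "una", "que", "qué"}

def clean_tweet_es(text: str) -> str:
    text = text.lower()
    text = re.sub(r"http\S+", " ", text)   # remove hyperlinks
    text = re.sub(r"[@#]\w+", " ", text)   # remove mentions and hashtags
    # keep alphanumerics plus Spanish-specific characters
    # (accented vowels, ñ, ü), as described in Section 3.2
    text = re.sub(r"[^a-z0-9áéíóúñü\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOPWORDS_ES]
    return " ".join(tokens)

print(clean_tweet_es("Las mujeres no saben conducir 😡 @user123 #ejemplo"))
# mujeres no saben conducir
```

Tweets that come out empty after this step would then be dropped, as described above.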
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Libraries and Tools</title>
        <p>This study employed a range of Python libraries to handle data preprocessing, feature extraction,
model training, and evaluation for sexism detection in tweets.</p>
        <p>• Emoji</p>
        <p>The emoji library was used to process emojis, which are common in social media texts and can
carry semantic or emotional information [7]. Emojis were either removed or analyzed as
features, as certain emojis may correlate with sexist language.
• CatBoost, LightGBM, and XGBoost</p>
        <p>These gradient boosting libraries were used to train classification models [8]. CatBoost is
particularly efficient with categorical features, while LightGBM and XGBoost offer fast,
scalable implementations. All were evaluated for their performance in binary classification and
feature importance analysis.
• Transformers (mT5)</p>
        <p>The transformers library from Hugging Face provided access to the pre-trained mT5 model,
which was used to generate contextualized text embeddings [5]. These embeddings capture
deep semantic meaning and are well-suited for nuanced NLP tasks like sexism detection.
• Scikit-learn (sklearn)</p>
        <p>Used for essential machine learning tasks including data splitting, preprocessing, baseline
modeling, and evaluation through metrics such as accuracy and F1-score [9].
• TensorFlow, PyTorch, and SciKeras</p>
        <p>These libraries supported the development and evaluation of deep learning models. SciKeras
enables integration of Keras models within scikit-learn pipelines, combining deep learning
with traditional ML workflows [10].
• Nltk</p>
        <p>Provided basic NLP functions such as tokenization, stop word removal, and lemmatization to
prepare tweets for modeling [11].
• Langdetect</p>
        <p>Automatically identified the language of each tweet, ensuring that only Spanish-language
content was processed [12].
• Imbalanced-learn (imblearn)</p>
        <p>Addressed class imbalance using techniques such as SMOTE to improve the model’s ability to
detect minority (sexist) classes [13].
• Matplotlib and seaborn</p>
        <p>These visualization libraries were used for exploratory data analysis and to graphically present
classification results and feature distributions [14].</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Applied Techniques</title>
        <p>• Text Embedding Using mT5 Encoder</p>
        <p>We used the multilingual T5 (mT5) model from Google to generate contextual embeddings of
the tweets. The model extracts dense vector representations from the text using the encoder
part of mT5, allowing the input data to be transformed into meaningful numerical features
suitable for machine learning models. Each embedding corresponds to a sentence-level
representation averaged over the token dimension.
• Feature Normalization</p>
        <p>Before dimensionality reduction, the extracted embeddings were normalized using z-score
normalization (Standard scaling). This technique ensures that each feature has a mean of zero
and a standard deviation of one, which is essential for models sensitive to scale, such as PCA.
• Dimensionality Reduction Using Principal Component Analysis (PCA)
PCA was applied to reduce the dimensionality of the embedding vectors from their original size
(e.g., 768 dimensions for mT5-base) to 50 principal components. This reduces computational
complexity, alleviates noise, and allows for better visualization and training
efficiency without significant loss of information.
• Label Encoding</p>
        <p>Categorical labels (e.g., "sexist", "non-sexist") were converted into numeric format using label
encoding. This step is necessary for machine learning models that require numerical input for
training.
• Data Balancing Using SMOTE</p>
        <p>The Synthetic Minority Over-sampling Technique (SMOTE) was applied to the training data to
address class imbalance. It generates synthetic examples of the minority class by interpolating
between existing minority class samples, improving model generalization and reducing bias
toward the majority class.
• Class Weighting</p>
        <p>In addition to SMOTE, class weighting was used to handle imbalance by assigning higher
weights to the minority class during model training. This encourages the classifier to pay more
attention to underrepresented samples, improving performance on the minority class without
altering the dataset itself.
• Cosine Similarity Analysis</p>
        <p>A cosine similarity matrix was generated using the final training embeddings. This matrix
measures the pairwise similarity between tweets in the embedding space, useful for
exploratory data analysis or understanding the semantic relationships between samples.
• Classification Models</p>
        <p>To perform the classification task, a diverse set of models was trained and evaluated. Among
the traditional machine learning models, we employed Logistic Regression, Support Vector
Machines (SVM), Random Forest, Naive Bayes, k-Nearest Neighbors (KNN), LightGBM,
XGBoost, and CatBoost. These models were trained using PCA reduced embeddings, and class
weights were applied to address the imbalance in the dataset. In addition to these, neural
network architectures were also explored. We implemented Feedforward Neural Networks
(FNN), which consist of multiple dense layers with dropout regularization, as well as recurrent
models such as RNNs and GRUs, capable of capturing sequential dependencies in the input
data. Furthermore, we tested hybrid architectures combining FNN with GRU and RNN with
GRU to assess potential synergies between feedforward and recurrent mechanisms.
• Evaluation Metrics and Confusion Matrices</p>
        <p>Models were evaluated using standard classification metrics: accuracy, precision, recall, and
F1-score. Confusion matrices were visualized to better understand each model’s performance
on the validation set.
• Ensemble Prediction with Fallback Mechanism</p>
        <p>Final predictions on the test set were made using a majority voting strategy across all trained
models. In case all predictions failed for a sample, a fallback to a subset of more robust models
was applied. This ensemble approach improves robustness and leverages the diversity of
models.</p>
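        <p>Putting the steps above together, the training flow can be sketched as follows. This is a reduced illustration under stated assumptions: synthetic vectors stand in for the mT5 embeddings, the PCA component count is lowered from 50 to 5 so the example stays self-contained, and only three of the classifiers are shown.</p>

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Synthetic stand-in for sentence-level mT5 embeddings (real size: 768 dims).
X = np.vstack([rng.normal(0, 1, (60, 20)), rng.normal(1, 1, (40, 20))])
y = np.array([0] * 60 + [1] * 40)  # 0 = non-sexist, 1 = sexist (imbalanced)

# z-score normalization, then PCA (50 components in the paper; 5 here).
X_red = PCA(n_components=5).fit_transform(StandardScaler().fit_transform(X))

# Class weighting counters the imbalance without altering the data itself.
models = [
    LogisticRegression(class_weight="balanced").fit(X_red, y),
    SVC(class_weight="balanced").fit(X_red, y),
    GaussianNB().fit(X_red, y),
]

def majority_vote(sample: np.ndarray) -> int:
    """Majority vote across the trained models for one sample."""
    votes = [int(m.predict(sample.reshape(1, -1))[0]) for m in models]
    # With an odd number of models a strict majority always exists; the
    # paper's fallback to a subset of robust models would apply whenever
    # a vote could not be formed for a sample.
    return int(np.bincount(votes, minlength=2).argmax())

preds = [majority_vote(x) for x in X_red]
```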
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Logistic Regression</title>
        <p>Logistic Regression achieved the highest performance among all models, with an accuracy of 0.618,
precision of 0.619, recall of 0.618, and F1-score of 0.617. The confusion matrix shows that the model
correctly predicted 176 "NO" instances and 152 "YES" instances, while misclassifying 89 "NO" and 114
"YES" cases. The balanced precision and recall values indicate that the model generalizes well without
significant bias toward either class. This suggests that Logistic Regression is robust for this dataset,
likely due to its ability to handle linear decision boundaries effectively.</p>
      </sec>
      <sec id="sec-4-1-1">
        <title>4.2. SVM</title>
        <p>The SVM model achieved an accuracy of 0.595 and an F1-score of 0.594. While the performance is
comparable to other models, it does not stand out. The similar precision and recall values indicate
balanced classification, but the overall metrics suggest that the kernel or parameters used may not be
optimal for this dataset. Experimentation with different kernels or regularization parameters could
yield better results.</p>
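        <p>The reported metrics can be cross-checked from the confusion-matrix counts given for Logistic Regression in Section 4.1 (176 correct "NO", 152 correct "YES", with 89 and 114 misclassifications):</p>

```python
# Confusion-matrix counts reported for Logistic Regression (Section 4.1),
# treating "YES" as the positive class.
tn, fp = 176, 89    # "NO" instances: correctly predicted, misclassified
tp, fn = 152, 114   # "YES" instances: correctly predicted, misclassified

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_yes = tp / (tp + fp)
recall_yes = tp / (tp + fn)
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

print(round(accuracy, 3))  # 0.618, matching the reported accuracy
```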
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Naive Bayes</title>
        <p>Naive Bayes achieved an accuracy of 0.595 and an F1-score of 0.589. The precision for "YES" (0.602) is
higher than for "NO" (0.596), but the recall values are balanced. The model’s simplicity and
assumptions of feature independence may limit its performance, especially if the data violates these
assumptions. Despite this, it performs comparably to more complex models like SVM.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.4. Random Forest</title>
        <p>Random Forest performed poorly, with an accuracy of 0.571 and an F1-score of 0.571. The low metrics
suggest that the ensemble approach did not generalize well for this dataset. This could be due to
overfitting or suboptimal hyperparameters. Techniques like feature selection or increasing the
number of trees might enhance results.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.5. LightGBM</title>
        <p>LightGBM showed modest performance, with an accuracy of 0.569 and an F1-score of 0.569. The
confusion matrix reveals 149 correct "NO" and 153 correct "YES" predictions, with balanced precision
and recall. While the results are not outstanding, LightGBM’s efficiency and scalability make it a
viable option for larger datasets.</p>
      </sec>
      <sec id="sec-4-4-1">
        <title>4.6. MLP</title>
        <p>The MLP model underperformed, with an accuracy of 0.561 and an F1-score of 0.559. The low metrics
suggest that the neural network architecture or training process may need optimization, such as
adjusting layers, activation functions, or learning rates.</p>
      </sec>
      <sec id="sec-4-4-2">
        <title>4.7. KNN</title>
        <p>KNN achieved an accuracy of 0.55. The confusion matrix shows 156 correct "YES" predictions
alongside high misclassification rates, and the low recall indicates poor performance on one of the
classes. This could be because the chosen k or distance metric does not align well with the data.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.8. CatBoost</title>
        <p>CatBoost yielded an accuracy of 0.584 and an F1-score of 0.583. The confusion matrix shows 147
correct "NO" and 163 correct "YES" predictions, with moderate misclassifications. The recall for "YES"
(0.61) is higher than for "NO" (0.55), indicating a slight bias toward the "YES" class. Hyperparameter
tuning or addressing class imbalance might improve its performance.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.9. XGBoost</title>
        <p>XGBoost achieved an accuracy of 0.540 and an F1-score of 0.540, the second-lowest among all models.
The results suggest that the default parameters or training setup were ineffective. Parameter tuning
or feature engineering might be necessary to leverage XGBoost’s potential.</p>
      </sec>
      <sec id="sec-4-6-1">
        <title>4.10. RNN</title>
        <p>The RNN model achieved moderate performance, with an accuracy of 0.597 and an F1-score of 0.597.
The precision and recall values are nearly identical, indicating balanced performance across classes.
However, the results are slightly lower than Logistic Regression, which may imply that the sequential
nature of the data does not significantly enhance predictions for this task. Further hyperparameter
tuning or feature engineering might improve its performance.</p>
      </sec>
      <sec id="sec-4-6-2">
        <title>4.11. FNN</title>
        <p>The FNN model showed competitive results, with an accuracy of 0.599 and an F1-score of 0.596. The
confusion matrix reveals 137 correct "NO" predictions and 181 correct "YES" predictions, but with
higher misclassifications (128 and 85, respectively). The recall for "YES" (0.68) is notably higher than
for "NO" (0.52), suggesting the model is more sensitive to the "YES" class. This imbalance could be
addressed by adjusting class weights or using techniques like oversampling. Figure 11 illustrates the
confusion matrix for FNN, while Table 11 presents the corresponding classification report.</p>
      </sec>
      <sec id="sec-4-6-3">
        <title>4.12. GRU</title>
        <p>The GRU model underperformed, with an accuracy of 0.589 and an F1-score of 0.589. The low metrics
suggest that the GRU’s ability to capture temporal dependencies did not translate into better
performance for this task. This could indicate that the dataset does not contain significant sequential
patterns or that the model requires a deeper architecture or more training data.</p>
      </sec>
      <sec id="sec-4-6-4">
        <title>4.13. FNN+GRU</title>
        <p>The hybrid FNN+GRU model achieved an accuracy of 0.573 and an F1-score of 0.570. The performance
is similar to Random Forest, indicating that combining these architectures did not provide a
significant advantage. This suggests that the added complexity did not capture additional meaningful
patterns in the data.</p>
      </sec>
      <sec id="sec-4-6-5">
        <title>4.14. RNN+GRU</title>
        <p>The RNN+GRU hybrid model had the lowest accuracy (0.550) and F1-score (0.550). The confusion
matrix shows 148 correct "NO" and 144 correct "YES" predictions, with high misclassifications. This
poor performance indicates that the combination of RNN and GRU is not suitable for this dataset,
possibly due to overfitting or insufficient training data.</p>
      </sec>
      <sec id="sec-4-7">
        <title>4.15. Model performance comparison</title>
        <p>Logistic Regression emerged as the best performing model, demonstrating robustness and balance
across metrics. Simpler models like Naive Bayes and SVM performed comparably to more complex
ones, suggesting that the dataset may not benefit significantly from advanced architectures. Hybrid
models (e.g., FNN+GRU, RNN+GRU) underperformed, indicating that their added complexity did not
translate into better predictions (Figure 15). Future work could focus on hyperparameter tuning,
addressing class imbalance, or exploring alternative feature representations to improve model
performance.</p>
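      <p>For reference, the scores reported above can be collected programmatically to confirm the ranking; the values below are transcribed from Sections 4.1–4.14 (KNN is omitted because its F1-score is not fully reported).</p>

```python
# (accuracy, F1) pairs as reported in Section 4; KNN omitted (F1 not reported).
results = {
    "Logistic Regression": (0.618, 0.617),
    "SVM": (0.595, 0.594),
    "Naive Bayes": (0.595, 0.589),
    "Random Forest": (0.571, 0.571),
    "LightGBM": (0.569, 0.569),
    "MLP": (0.561, 0.559),
    "CatBoost": (0.584, 0.583),
    "XGBoost": (0.540, 0.540),
    "RNN": (0.597, 0.597),
    "FNN": (0.599, 0.596),
    "GRU": (0.589, 0.589),
    "FNN+GRU": (0.573, 0.570),
    "RNN+GRU": (0.550, 0.550),
}

# Rank by F1-score: Logistic Regression comes out on top.
best = max(results, key=lambda m: results[m][1])
print(best)  # Logistic Regression
```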
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Our experimental findings reveal that traditional models like Logistic Regression and SVM can
perform surprisingly well in the task of sexism detection when combined with high-quality
contextual embeddings from mT5. Despite their simplicity, these models offer strong generalization
and efficiency. In contrast, deep learning and hybrid models did not consistently outperform simpler
methods, suggesting that complex architectures are not always advantageous, especially when
embeddings already capture rich semantic information. Models like KNN and XGBoost struggled,
highlighting challenges related to high-dimensionality and class imbalance. Despite using SMOTE
and class weighting, the imbalance between sexist and non-sexist tweets remained a challenge. This
points to the need for more advanced data balancing techniques. Finally, ensemble methods showed
potential for improving robustness, indicating multiple models may be a promising future direction
for enhanced performance.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>This paper presented a comprehensive system for the automatic detection of sexist content in Spanish
tweets, developed as part of the EXIST 2025 shared task. Our approach combined the powerful
contextual representations from the multilingual mT5 transformer model with a diverse set of
traditional machine learning and deep learning classifiers. Experimental results highlighted that
Logistic Regression achieved the best overall performance, demonstrating robustness and
generalization across classes. Interestingly, simpler models such as Naive Bayes and SVM performed
comparably to more complex architectures, indicating that when combined with high-quality
embeddings and preprocessing, lightweight models can effectively address the nuanced task of
sexism detection. Nevertheless, several challenges remain. The relatively modest performance of
deep learning and hybrid models suggests the need for further architectural refinement or the
integration of additional linguistic and semantic features. Moreover, the persistent issue of class
imbalance underscores the importance of advanced sampling and augmentation techniques to
improve model fairness and generalization. For future work, we aim to explore dynamic ensemble
strategies that adapt model contributions based on context, incorporate multimodal signals such as
images and metadata, and enhance interpretability through explainable AI tools like SHAP and LIME.
We also plan to deploy the system in a real-time environment and extend its evaluation to other
languages to assess its cross-lingual robustness and applicability.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors made limited use of ChatGPT, DeepSeek, and
Microsoft Copilot. These tools were employed exclusively for grammar and spelling checking, for
paraphrasing and rewording sentences in order to improve clarity and style, and for providing
occasional code suggestions. All outputs from these tools were carefully reviewed, edited, and
validated by the authors to ensure accuracy, originality, and scientific integrity. The authors take full
responsibility for the entire content of this publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]</source>
          [13]
          <string-name>
            <surname>Amaan</surname>
            <given-names>Rizvi</given-names>
          </string-name>
          , Anupam Jamatia, «
          <article-title>NIT-Agartala-NLP-Team at EXIST 2022: Sexism» IberLEF</article-title>
          , Spain,
          <year>2022</year>
          . URL: https://anupamjamatia.github.io/assets/pdf/rizvi2022.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Schutz</given-names>
            <surname>Mina</surname>
          </string-name>
          , Boeck Jaqueline, Liakhovets Daria, Slijepcevic Djordje, Kirchknopf Armin, Hecht Manuel, Bogensperger Johannes, Schlarb Sven, Schindler Alexander, and Zeppelzauer Matthias «
          <article-title>Automatic Sexism Detection with Multilingual Transformer Models» IberLEF, Spain</article-title>
          .,
          <year>2021</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2943</volume>
          /exist_paper1.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Sahrish</given-names>
            <surname>Khan</surname>
          </string-name>
          , Gabriele Pergola and Arshad Jhumka, «
          <article-title>Multilingual Sexism Identification via Fusion of Large Language Models» Conference and Labs of the Evaluation Forum</article-title>
          , France,
          <year>2024</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3740</volume>
          /paper-99.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Hannah</given-names>
            <surname>Kirk</surname>
          </string-name>
          , Wenjie Yin, Bertie Vidgen, Paul Röttger, «
          <source>Explainable Detection of Online Sexism» The 17th International Workshop on Semantic Evaluation (SemEval-2023)</source>
          , Toronto, Canada,
          <year>2023</year>
          . URL: https://aclanthology.org/
          <year>2023</year>
          .semeval-
          <volume>1</volume>
          .305/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Arjumand</given-names>
            <surname>Younus</surname>
          </string-name>
          , Muhammad Atif Qureshi, «
          <article-title>A Framework for Sexism Detection on Social Media via ByT5 and TabNet» IberLEF</article-title>
          , Coruña, Spain.,
          <year>2022</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3202</volume>
          /exist-paper15.
          <fpage>pdf</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>AmirMohammad Azadi</surname>
          </string-name>
          , Baktash Ansari, Sina Zamani and Sauleh Eetemadi, «
          <article-title>Bilingual Sexism Classification: Fine-Tuned XLM-RoBERTa and GPT-3.5 Few-Shot Learning» Conference and Labs of the Evaluation Forum</article-title>
          , Grenoble, France,
          <year>2024</year>
          . URL: https://arxiv.org/pdf/2406.07287.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Zi</given-names>
            <surname>Yun</surname>
          </string-name>
          <string-name>
            <surname>Yang</surname>
          </string-name>
          , Ziqing Zhang, Yisong Miao, «
          <article-title>The ELCo Dataset: Bridging Emoji and Lexical Composition» The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING</article-title>
          <year>2024</year>
          ), Torino, Italia,
          <year>2024</year>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .lrec-main.
          <volume>1381</volume>
          /? Rajkiran, «
          <article-title>XGBoost vs LightGBM vs CatBoost</article-title>
          » Medium,
          <fpage>27</fpage>
          <lpage>06</lpage>
          2025. [Online]. Available: https://medium.com/%40rajkiranrao205/
          <article-title>xgboost-vs-lightgbm-vs-catboost-a-practicalcomparison-with-coffee-cats-code-5fab396ed39d.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Dan</given-names>
            <surname>Wang</surname>
          </string-name>
          ,Yanbo Shen, Dong Ye,Yanchao Yang,
          <article-title>Xuanfang Da and Jingyue Mo, «Evaluation of Scikit-Learn Machine Learning Algorithms for Improving CMA-WSP v2.0 Solar Radiation Prediction» Atmosphere</article-title>
          , vol.
          <volume>15</volume>
          , n° %
          <fpage>18</fpage>
          ,
          <year>2024</year>
          . URL: https://doi.org/10.3390/atmos15080994.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Martin</given-names>
            <surname>Abadi</surname>
          </string-name>
          , Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis,
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthieu</given-names>
            <surname>Devin</surname>
          </string-name>
          , Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, «
          <article-title>TensorFlow: A system for large-scale machine learning» 2016</article-title>
          . URL: https://arxiv.org/pdf/1605.08695.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Uzair</given-names>
            <surname>Adamjee</surname>
          </string-name>
          , «Introduction to NLTK library in Python»
          <year>2020</year>
          10
          <fpage>19</fpage>
          . [Online]. Available: https://python.plainenglish.io/introduction-to
          <article-title>-nltk-library-in-python-6fa729b54ad.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Kunjal</given-names>
            <surname>Chawhan</surname>
          </string-name>
          , «
          <article-title>Language Detection in Python Using LangDetect: A Quick Guide» 01 11</article-title>
          <year>2024</year>
          . [Online]. Available: https://www.decodedigitalmarket.
          <article-title>com/language-detection-usinglangdetect-library-in-python/? «SMOTE for Imbalanced Classification with Python» 24 04</article-title>
          <year>2025</year>
          . [Online]. Available: https://www.analyticsvidhya.com/blog/2020/10/overcoming-class
          <article-title>-imbalance-using-smotetechniques/? Ankur Kumar, «</article-title>
          <source>Seaborn: Visualize data beyond matplotlib» 02 09</source>
          <year>2024</year>
          . [Online]. Available: https://techifysolutions.com/blog/seaborn-vs matplotlib/?
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>