<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Models for Detecting Offensive Content and Quantifying Prejudice in Online Platforms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David Borregón Sacristán</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Pérez Muñoz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucas Sebastián Peris</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present a comprehensive project focused on developing models that address three critical tasks related to prejudicial humour recognition: Hurtful Humour Detection, Prejudice Target Detection, and Degree of Prejudice Prediction. Offensive and prejudiced language are prevalent on online platforms, posing significant challenges for content moderation and for fostering inclusive communities. Our work therefore aims to contribute to the identification and quantification of such problematic content through the application of state-of-the-art natural language processing (NLP) techniques.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>offensive content</kwd>
        <kwd>hurtful humour detection</kwd>
        <kwd>prejudice target detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The proliferation of online platforms has revolutionized communication and information sharing,
enabling individuals from diverse backgrounds to connect and engage in virtual communities.
However, this increased accessibility and connectivity has also exposed society to the darker side
of online interactions, including offensive content and prejudiced language. The prevalence of
hurtful humour, hate speech, and discrimination on digital platforms poses significant challenges
in fostering inclusive environments and ensuring the well-being of users.</p>
      <p>Addressing these challenges requires the development of effective methods for detecting
and quantifying offensive content and prejudice. The field of natural language processing
(NLP) offers powerful tools and techniques to analyze and understand text data, making it a
promising avenue for combating online harassment and promoting respectful communication.
In these working notes on our submission to the HUHU IberLEF Shared Task 2023, we present a
comprehensive work aimed at building models that address three vital tasks: Hurtful Humour
Detection, Prejudice Target Detection, and Degree of Prejudice Prediction.</p>
      <p>Hurtful Humour Detection is the first task we tackle, as it serves as a foundational step in
identifying offensive content. Distinguishing between humorous content that is innocuous
and content that aims to demean or target individuals or groups is crucial in understanding the
potential harm caused by certain forms of humour. By developing models capable of recognizing
hurtful humour, we can lay the groundwork for subsequent analyses and interventions.</p>
      <p>Building upon the foundation of Hurtful Humour Detection, we delve into the task of Prejudice
Target Detection. Identifying the specific targets of prejudice within offensive content allows us
to gain insights into the various marginalized groups or individuals who are disproportionately
affected by online discrimination. By shedding light on these targets, we aim to raise awareness,
promote empathy, and encourage targeted interventions to counteract prejudiced discourse.</p>
      <p>Lastly, we explore the Degree of Prejudice Prediction task, which involves quantifying the
severity of prejudice within offensive language. Recognizing that not all instances of offensive
content carry the same degree of harm, our models provide fine-grained analysis to assess the
intensity and harmfulness of prejudiced language. This nuanced understanding empowers
content moderators and platform administrators to make informed decisions regarding content
policies and community guidelines.</p>
      <p>The importance of this work lies in its potential to contribute to the field of NLP, offering
practical solutions to mitigate the negative impact of offensive content and prejudice in online
environments. By developing accurate and efficient models for hurtful humour detection,
prejudice target detection, and degree of prejudice prediction, we aim to support online platforms
in implementing proactive measures to foster inclusive communities. Through this research, we
hope to contribute to the creation of safer online spaces, promoting positive interactions and
protecting vulnerable individuals or groups from the detrimental effects of prejudiced content.</p>
      <p>The remainder of this paper is structured as follows: Section 2 delves into the state of the
art, describing the most used tools for the tasks at hand and naming related papers on hurtful
humour detection and prejudice analysis. Section 3 describes the methodology and dataset
used in our work. In Section 4, we present detailed results and analyses for each of the three
tasks. Finally, Section 5 discusses the implications of our findings, limitations of the study, and
avenues for future research, concluding with the significance of our project in addressing the
challenges of offensive content and prejudice detection in online platforms.</p>
    </sec>
    <sec id="sec-2">
      <title>2. State of the art</title>
      <p>
        The field of natural language processing has witnessed remarkable advancements in recent
years, particularly with the emergence of transformer-based models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] such as BERT, GPT, and
RoBERTa. These models have revolutionized various NLP tasks, including offensive content
detection and prejudice analysis. Transformers excel in capturing contextual dependencies and
semantic nuances in text, enabling them to comprehend and interpret complex language patterns.
Researchers have leveraged pre-trained transformer models to develop fine-tuned architectures
that exhibit superior performance in tasks related to hurtful humour detection, prejudice target
detection, and degree of prejudice prediction. Additionally, specialized datasets annotated with
offensive language and prejudice markers have been curated to train and evaluate these models
effectively. The combination of transformer architectures and domain-specific datasets has
greatly enhanced the state of the art in NLP tasks related to prejudice in humour, empowering
the development of more accurate and robust models for content moderation and fostering
inclusive online communities.
      </p>
      <p>
        In fact, previous tasks have investigated the use of offensive language in humour, in particular
for Spanish HAHA at IberEval 2018 (Castro et al., 2018) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and IberLEF 2019 and 2021 (Chiruzzo
et al., 2019; Chiruzzo et al., 2021) [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] or the dissemination of stereotypes using irony
(Ortega-Bueno et al., 2022) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and previous work was done to study the hurtfulness of other types
of figurative language such as sarcasm (Frenda et al., 2022) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. More related work followed
with the use of linguistic features to foster explainability (Merlo et al., 2022) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and HUHU at
IberLEF 2023 (Labadie-Tamayo et al., 2023) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Nonetheless, in HUHU, our focus is on examining the use of humour to express prejudice
towards minorities, specifically analyzing Spanish tweets that are prejudicial towards:
• Women and feminists
• LGBTIQ community
• Immigrants and racially discriminated people
• Overweight people</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Our dataset comprises a diverse collection of tweets obtained from the shared task website,
providing a rich and varied corpus for analysis. The dataset consists of a structured format
with several key features that capture important information about each tweet. These features
include the following:
• index: A unique identifier for each tweet.
• tweet: The actual content of the tweet, which may include text, hashtags, mentions, and URLs.
• humour: A binary label indicating whether the tweet contains hurtful humour or not.
• fatphobia: A binary label indicating whether the prejudice expressed in the tweet is targeting overweight people.
• prejudice_woman: A binary label indicating whether the prejudice expressed in the tweet is targeting women.
• prejudice_lgbtiq: A binary label indicating whether the prejudice expressed in the tweet is targeting the LGBTIQ+ collective.
• prejudice_inmigrant: A binary label indicating whether the prejudice expressed in the tweet is targeting immigrants.
• mean_prejudice: A real number indicating the degree of prejudice present in the tweet.</p>
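      <p>As an illustration, rows following this schema could be represented as a pandas DataFrame (the values below are invented for illustration; only the column names come from the description above):</p>

```python
import pandas as pd

# Toy rows mimicking the shared-task schema described above
# (tweet texts and label values are invented, not from the corpus).
df = pd.DataFrame({
    "index": [0, 1],
    "tweet": ["primer tweet de ejemplo", "segundo tweet de ejemplo"],
    "humour": [1, 0],
    "fatphobia": [0, 0],
    "prejudice_woman": [1, 0],
    "prejudice_lgbtiq": [0, 0],
    "prejudice_inmigrant": [0, 1],
    "mean_prejudice": [2.7, 4.1],
})
print(df.shape)  # two rows, eight features
```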
      <sec id="sec-3-1">
        <title>3.1. Preprocessing</title>
        <p>Before delving into the specific tasks, it is important to discuss the preprocessing steps we
performed on the “tweet” column. We carried out two distinct preprocessing approaches.</p>
        <p>The first preprocessing step focused on basic text cleaning techniques, including converting
the text to lowercase, removing punctuation marks, and eliminating words such as hashtags,
mentions, URLs, and stopwords. This initial preprocessing was performed to eliminate potential
sources of noise that could hinder the effectiveness of subsequent techniques, such as models
based on the transformers architecture.</p>
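        <p>A minimal sketch of this first cleaning pass (the exact rules and stopword list used are not specified in the paper; the stopword subset below is illustrative):</p>

```python
import re
import string

STOPWORDS = {"de", "la", "el", "que", "y", "en"}  # illustrative Spanish subset

def basic_clean(tweet: str) -> str:
    """Lowercase, then strip URLs, mentions, hashtags, punctuation, stopwords."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # remove mentions and hashtags
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(basic_clean("Mira esto @usuario #broma https://t.co/xyz ..."))  # → mira esto
```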
        <p>
          The second preprocessing step, which was more in-depth, aimed to prepare the data for
techniques that do not have their own embeddings, unlike transformers. In this step, we repeated
the aforementioned basic cleaning processes and additionally performed tasks such as stopword
removal, lemmatization, and obtaining the keyed vector representation for each tweet. In our
case, the chosen approach to vectorize the words was based on word embedding, a technique by
which each word is associated with an n-dimensional vector, which means that closely related
words have very close vector representations, and allows the models to understand the meaning
of the words in a certain way. Specifically, the fastText algorithm [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] was employed to compute
the word embeddings. This algorithm was preferred due to its optimized nature for obtaining
word embeddings quickly. Additionally, its open-source nature, availability, and cost-free usage
made it accessible to all, facilitating replication of our experiments.
        </p>
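        <p>Turning a cleaned tweet into a single vector by averaging its word embeddings can be sketched as follows (a small dict of random vectors stands in for the trained fastText keyed vectors; in the actual pipeline a fastText model would supply the lookup):</p>

```python
import numpy as np

DIM = 4  # toy size; real fastText vectors are typically 100- or 300-dimensional
rng = np.random.default_rng(0)
# Stand-in for a trained fastText keyed-vector lookup table.
wv = {w: rng.normal(size=DIM) for w in ["odio", "broma", "gente"]}

def tweet_vector(tokens):
    """Average the embeddings of known tokens; zero vector if none are known."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

v = tweet_vector(["broma", "gente", "desconocida"])
print(v.shape)  # (4,)
```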
        <p>These additional preprocessing steps allowed us to apply techniques that rely on traditional
feature engineering approaches. By understanding the diferent preprocessing methods
employed throughout our research, we can now delve into the specific tasks at hand.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Addressing Task 1: Hurtful Humour Detection</title>
        <p>
          The first task consists of determining whether a prejudicial tweet is intended to cause humour,
using the “humour” feature from our dataset as the target feature. Prior to undertaking the task,
it was of particular interest to study the distribution of classes in the “humour” feature, revealing
an extreme class imbalance. Despite the potential information loss, undersampling was applied
to the data since models seemed to classify predominantly into the majority class when this
step was omitted. Moreover, employing oversampling techniques like SMOTE considerably
worsened the results, as generating meaningful synthetic text samples proved to be highly
delicate [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Consequently, after lightly preprocessing the data and applying undersampling, we
utilized the tokenizer provided by the chosen transformer model, specifically BETO [11], which
exhibited slightly superior performance compared to RoBERTa [12] after multiple trials. We
then computed the embeddings for the data and performed fine-tuning of the transformer [13]
through a 5-fold cross-validation process, incorporating early stopping techniques to prevent
overfitting and obtain a genuine assessment of the model’s performance.
        </p>
        <p>To further enhance the predictions provided by BETO, we compared its errors with those of
other techniques that also yielded satisfactory results, such as SVM and XGBoost. Given the
discernible differences in error patterns between these models, and considering their relatively
similar precision levels, we decided to combine them to achieve improved results. After
fine-tuning the hyperparameters for SVM and XGBoost, the best outcomes were obtained using a
voting ensemble method based on soft voting. In this approach, given an input, we extracted
the probabilities assigned to each class by the models and ultimately determined the class of
the observation based on the highest percentage returned by the three models.</p>
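        <p>The soft-voting combination can be sketched with scikit-learn (shown here on synthetic data; a logistic regression stands in for the fine-tuned BETO probabilities and gradient boosting stands in for XGBoost):</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Soft voting averages the class probabilities of the base models
# and predicts the class with the highest averaged probability.
ensemble = VotingClassifier(
    estimators=[
        ("proba_model", LogisticRegression(max_iter=1000)),  # stand-in for BETO
        ("svm", SVC(probability=True, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
    ],
    voting="soft",
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```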
        <p>The approach described in detail above can be simplified in a diagram, as shown in Figure 1.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Addressing Task 2a: Prejudice Target Detection</title>
        <p>For the second task of Prejudice Target Detection, a multilabel classification problem, we
followed a systematic methodology to build an effective model. The steps involved in this
process are as follows. First, we performed basic cleaning on the ’tweet’ column to preprocess
the text data. This step ensured that the input data was in a standardized format and ready for
further processing. Next, we computed sentence embeddings using BETO (a BERT-based model
specifically trained for Spanish) for the preprocessed tweets. Sentence embeddings transform
the textual data into numerical representations that capture the semantic meaning of the text.
These embeddings serve as input to the neural network model for training and prediction.</p>
        <p>To optimize the model’s hyperparameters, we created an Optuna study [14]. Optuna is a
hyperparameter optimization framework that explores different combinations of hyperparameters
to find the best configuration. We defined an objective function to maximize the evaluation
metric’s performance (e.g., F1 score) and utilized Optuna to search for the optimal
hyperparameters. The best trial’s value and corresponding parameters were printed for reference. It
is important to note that in developing this function we took two additional precautionary
measures to avoid overfitting: adjusting the learning rate during training and triggering an
early stop if the validation loss does not improve for a certain number of epochs.</p>
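        <p>The objective-function pattern can be sketched as follows (a self-contained stand-in: random sampling replaces Optuna's samplers, and a scikit-learn classifier replaces the Keras network, so the loop runs without either framework):</p>

```python
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def objective(params):
    """Score one hyperparameter configuration on held-out data."""
    model = RandomForestClassifier(**params, random_state=0).fit(X_tr, y_tr)
    return f1_score(y_val, model.predict(X_val))

random.seed(0)
best_score, best_params = -1.0, None
for _ in range(10):  # Optuna would sample these more cleverly (e.g. with TPE)
    params = {"n_estimators": random.choice([50, 100, 200]),
              "max_depth": random.choice([3, 5, None])}
    score = objective(params)
    if score > best_score:
        best_score, best_params = score, params
print(best_score, best_params)
```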
        <p>Based on the optimal hyperparameters, we constructed a Keras Sequential model [15]. The
model architecture consisted of dense layers, dropout layers, and an output layer. Dense layers
learn complex patterns and representations from the input data, while dropout layers help
prevent overfitting by randomly dropping out a fraction of neurons during training. The output
layer, with a sigmoid activation function, produced an independent probability for each of the
different prejudice groups being predicted. All this information can be compressed in a diagram like the
one shown in Figure 2.</p>
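        <p>To keep things framework-agnostic, here is a toy numpy forward pass through the kind of stack just described (one dense hidden layer, dropout applied only at training time, and a sigmoid output giving one independent probability per prejudice group; the layer sizes are invented):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_OUT = 8, 16, 4  # toy sizes; N_OUT = number of prejudice groups

W1, b1 = rng.normal(size=(N_IN, N_HID)), np.zeros(N_HID)
W2, b2 = rng.normal(size=(N_HID, N_OUT)), np.zeros(N_OUT)

def forward(x, train=False, drop_rate=0.5):
    h = np.maximum(x @ W1 + b1, 0.0)             # dense layer + ReLU
    if train:                                    # dropout only during training
        h *= rng.binomial(1, 1 - drop_rate, size=h.shape) / (1 - drop_rate)
    logits = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))         # sigmoid: one prob. per label

probs = forward(rng.normal(size=N_IN))
print(probs)  # four values in [0, 1], one per prejudice group
```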
        <p>To assess the model’s performance and ensure its generalizability, we employed 10-fold
cross-validation. This technique involved splitting the dataset into 10 subsets, training the
model on 9 subsets, and evaluating its performance on the remaining subset. By performing
k-fold (“k” being 10, in this case) cross-validation, we obtained more reliable estimates of the
model’s performance and identified any potential overfitting or underfitting issues.</p>
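        <p>The 10-fold protocol in scikit-learn terms (synthetic features stand in for the BETO sentence embeddings):</p>

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Each of the 10 folds is held out once while the other 9 train the model.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=10, scoring="f1_macro")
print(scores.mean(), scores.std())
```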
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Addressing Task 2b: Degree of Prejudice Prediction</title>
        <p>One of the labels available in the provided dataset was “prejudice_degree”, representing the
degree of prejudice reflected in each tweet. This information is valuable as not all tweets have
the same impact on readers. Identifying the degree of prejudice allows us to distinguish between
mildly offensive and highly offensive messages. The “prejudice_degree” label is a numerical
feature ranging from 0 to 5, where 0 signifies a mildly offensive tweet and 5 represents a highly
offensive one.</p>
        <p>The task at hand involves predicting the degree of prejudice in text using regression models,
with evaluation based on the root mean squared error (RMSE) on the test set. Our initial
approach was relatively simple: finding individual models that could provide acceptably small
RMSE values and combining them using an ensemble technique. It was preferable for these
models to be as diverse as possible so that any errors in one model could be compensated by
the others, preventing overfitting. Some of the models that yielded the best results were SVR
(Support Vector Machine for Regression) [16], XGBoost for regression, and Random Forest. The
next step was to tune the hyperparameters of each model and select an ensemble method. Since
it is a regression task, we considered stacking as one of the best options for the ensemble method
[17, 18, 19]. We experimented with L1 and L2 penalties and found that L2 with Ridge regression
produced the best results. We also considered using ElasticNet, but the combination of penalties
worsened the results with higher percentages of L1 penalties. In the end, we presented the
results of the stacking model with L2 penalty using Ridge regression, employing SVR, XGBoost,
and RandomForest models.</p>
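        <p>This first submission's ensemble can be sketched with scikit-learn on synthetic data (gradient boosting stands in for XGBoost; Ridge provides the L2-penalized combiner):</p>

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Ridge (L2-penalized linear regression) learns how to weight the
# out-of-fold predictions of the three base regressors.
stack = StackingRegressor(
    estimators=[("svr", SVR()),
                ("gb", GradientBoostingRegressor(random_state=0)),  # XGBoost stand-in
                ("rf", RandomForestRegressor(random_state=0))],
    final_estimator=Ridge(),
)
stack.fit(X, y)
rmse = mean_squared_error(y, stack.predict(X)) ** 0.5
print(rmse)
```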
        <p>Given that two submissions were admitted to the shared task, we aimed to be more innovative
in the next iteration and opted for a more complex method that made conceptual sense to us.
During our tests with regression models, we noticed that due to the normal distribution of the
data, the models tended to make predictions around the mean for the majority of instances.
Consequently, extreme value tweets (between 1 and 2, and between 4 and 5) had a higher
classification error. To address this, we trained three distinct models [20]: one for central values
(2-4), one for low prejudice tweets (1-2), and one for high prejudice messages (4-5). Predictions
from the model were then obtained through stacking, considering the decisions made by the
three models. The main challenge of this model was that, when training with different subsets
of data, there were instances where two models had to predict a degree of prejudice that was
not present in their training samples.</p>
        <p>Another version of this model involved creating a classifier model first. In this approach, a
sample was classified into one of the three groups, and the corresponding model within that
group was responsible for regression. However, the main issue with this model was its strong
dependence on having a good classifier since even if the regression models were perfect, a poor
prior classification would result in a significant difference in RMSE. The proposed approaches
can be summarized in a scheme, as shown in Figure 3.</p>
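        <p>The classifier-first variant can be sketched compactly (synthetic data; a sample is first routed into a prejudice-range bucket, and that bucket's own regressor then predicts the degree):</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
# Synthetic prejudice degrees clipped to the 1-5 range used in the task.
y = np.clip(X[:, 0] * 0.8 + 3.0 + rng.normal(scale=0.3, size=300), 1.0, 5.0)

# Route samples into low (1-2), central (2-4) and high (4-5) buckets.
bucket = np.digitize(y, [2.0, 4.0])  # 0 = low, 1 = central, 2 = high

router = RandomForestClassifier(random_state=0).fit(X, bucket)
regressors = {b: RandomForestRegressor(random_state=0).fit(X[bucket == b],
                                                           y[bucket == b])
              for b in np.unique(bucket)}

def predict(x_row):
    """Classify the prejudice range first, then regress within that range."""
    b = router.predict(x_row.reshape(1, -1))[0]
    return regressors[b].predict(x_row.reshape(1, -1))[0]

print(predict(X[0]))
```

As the text notes, the weakness of this design is that a misrouted sample is handed to a regressor that never saw its true range.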
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>This section provides an overview of the results obtained from the trained models for each
task. For the first task (Hurtful Humour Detection), after fine-tuning the hyperparameters for
SVM and XGBoost, the best results were achieved using a soft voting ensemble method. This
method involves extracting the class probabilities provided by each model when given an input
and making the final decision for the class of an observation based on the highest percentage
returned by the three models. By employing this approach, we obtained an average F1 score
of 0.83 using cross-validation on the test data. However, our model performed worse than
expected when used on the unseen data, obtaining an F1 score of 0.744.</p>
      <p>Our approach for the second task (Prejudice Target Detection) showed promising results
when tested on the data provided by the shared task organizers, achieving a macro F1 score of
0.79. That being said, our model did not prove to be as reliable as we had first thought, as the
published results revealed it had only achieved a 0.607 macro F1 score, leaving us wondering
whether we could have done more to avoid overfitting.</p>
      <p>As for the third task, Degree of Prejudice Prediction, we experimented with three main
models. Regarding the results, our best-performing model, based on our validation process,
was the original stacking model. The subsequent models combined range-specific sub-models,
without and with a prior classifier. The models submitted to the competition were the first two mentioned,
achieving RMSEs of 0.99 and 1.07, respectively. Nonetheless, we believe that the model utilizing
the classifier has the potential to deliver promising results with some improvements in its
implementation.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The implications of our work are twofold: practical and societal. On a practical level, our models
for hurtful humour detection, prejudice target detection, and degree of prejudice prediction
have significant implications for content moderation on online platforms. By accurately
identifying offensive content, platforms can take proactive measures to remove or flag such content,
promoting a safer and more inclusive environment for users. Additionally, the ability to detect
the specific targets of prejudice enables a targeted approach in combating discrimination and
promoting empathy and understanding.</p>
      <p>Furthermore, our models provide a nuanced understanding of the degree of prejudice within
offensive language. This insight empowers platform administrators and policymakers to make
informed decisions regarding content moderation policies and community guidelines. By
quantifying the severity of prejudice, platforms can prioritize the enforcement of stricter
measures for highly offensive content, thereby reducing the potential harm inflicted upon
marginalized individuals or groups.</p>
      <p>Societally, our work contributes to raising awareness about the prevalence and impact of
offensive content and prejudice in online discourse. By shedding light on these issues, we foster
conversations surrounding the responsible use of language, respect for diverse perspectives,
and the importance of fostering inclusive communities. The availability of accurate models for
detecting and quantifying prejudice can lead to a more informed public discourse, challenging
harmful narratives and promoting social harmony.</p>
      <p>Moreover, our research opens avenues for further exploration and development in the field
of NLP and content moderation. As online platforms continue to grapple with the challenges
posed by offensive content, our work provides a foundation for future research in refining and
expanding the capabilities of models in detecting and combating different forms of harmful
language.</p>
      <p>In summary, the implications of our work encompass practical advancements in content
moderation, societal awareness and discourse, and the potential for continued research and
development. By addressing the challenges of offensive content and prejudice detection, we
contribute to creating more inclusive online spaces that foster respectful interactions and protect
individuals from the adverse effects of prejudiced language.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We would like to express our sincere gratitude to Professor Paolo Rosso and Professor Reynier
Ortega for their invaluable guidance, support, and expertise throughout the duration of this
project. Their insights and knowledge in the field of natural language processing have been
instrumental in shaping our research and methodologies.</p>
      <p>We would also like to extend our appreciation to IberLEF for organizing the shared task that
provided us with the opportunity to enhance our NLP skills and contribute to the advancement
of the field. The platform’s dedication to fostering collaboration and innovation in NLP research
is commendable, and we are grateful for the opportunity to participate in this meaningful
project.</p>
      <p>[11] J. Cañete, S. Donoso, F. Bravo-Marquez, A. Carvallo, V. Araujo, ALBETO and DistilBETO: Lightweight Spanish Language Models, 2023. arXiv:2204.09145.
[12] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish Pre-Trained BERT Model and Evaluation Data, in: PML4DC at ICLR 2020, 2020.
[13] Fine-tuning a BERT model | Text | TensorFlow, 2023. URL: https://www.tensorflow.org/tfmodels/nlp/fine_tune_bert.
[14] Optuna - A hyperparameter optimization framework, 2023. URL: https://optuna.org/.
[15] The Sequential model | TensorFlow Core, 2023. URL: https://www.tensorflow.org/guide/keras/sequential_model?hl=es-419.
[16] T. Sharp, An Introduction to Support Vector Regression (SVR), 2023. URL: https://towardsdatascience.com/an-introduction-to-support-vector-regression-svr-a3ebc1672c2.
[17] V. Margot, An original method to combine regression estimators in Python, 2022. URL: https://towardsdatascience.com/an-original-method-to-combine-regression-estimators-in-python-b9247141263.
[18] Y. Verma, A beginner’s guide to stacking ensemble deep learning models, 2022. URL: https://analyticsindiamag.com/a-beginners-guide-to-stacking-ensemble-deep-learning-models/.
[19] 1.11. Ensemble methods, scikit-learn documentation, 2023. URL: https://scikit-learn.org/stable/modules/ensemble.html.
[20] Building an Ensemble Learning Based Regression Model using Python, 2023. URL: https://www.section.io/engineering-education/ensemble-learning-based-regression-model-using-python/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <year>2019</year>
          . arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosá</surname>
          </string-name>
          ,
          <source>Overview of the HAHA Task: Humor Analysis Based on Human Annotation at IberEval</source>
          <year>2018</year>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Etcheverry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Prada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosá</surname>
          </string-name>
          ,
          <source>Overview of HAHA at IberLEF 2019: Humor Analysis Based on Human Annotation</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Góngora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosá</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Meaney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          , Overview of HAHA at IberLEF 2021:
          <article-title>Detecting, Rating and Analyzing Humor in Spanish</article-title>
          ,
          <source>in: Procesamiento del Lenguaje Natural (SEPLN)</source>
          , volume
          <volume>67</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>257</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ortega-Bueno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , E. Fersini,
          <source>Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO) at PAN</source>
          <year>2022</year>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Frenda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Cignarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Patti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>The Unbearable Hurtfulness of Sarcasm</article-title>
          ,
          <source>Expert Systems with Applications (ESWA)</source>
          , volume
          <volume>193</volume>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Merlo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ortega-Bueno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>When Humour Hurts: Linguistic Features to Foster Explainability</article-title>
          ,
          <source>in: Procesamiento del Lenguaje Natural (SEPLN)</source>
          , volume
          <volume>70</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Labadie-Tamayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Everybody Hurts, Sometimes. Overview of HUrtful HUmour at IberLEF 2023: Detection of Humour Spreading Prejudice in Twitter</article-title>
          ,
          <source>in: Procesamiento del Lenguaje Natural (SEPLN)</source>
          , volume
          <volume>71</volume>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          ,
          <source>arXiv preprint arXiv:1607.04606</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wongvorachan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bulut</surname>
          </string-name>
          ,
          <article-title>A Comparison of Undersampling, Oversampling, and SMOTE Methods for Dealing with Imbalanced Classification in Educational Data Mining</article-title>
          ,
          <source>Information</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>54</fpage>
          . URL: https://www.mdpi.com/2078-2489/14/1/54. doi:10.3390/info14010054, number: 1, Publisher: Multidisciplinary Digital Publishing Institute.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>