CLEF 2024 JOKER Task 2: Using RoBERTa and BERT-uncased for Humour Classification According to
Genre and Technique
Notebook for the Joker Lab at CLEF 2024

Sarvesh Narayanan¹,†, Jayasimman J¹,† and Shiva Ganesh V¹,†
¹SSN College of Engineering


                                        Abstract
                                        Humor classification is a complex task in natural language processing (NLP) that involves identifying and
                                        categorizing humor based on its various forms and techniques. This study explores the use of two pre-trained
                                        transformer models, RoBERTa and BERT-uncased, for humor classification according to genre and technique. The
                                        models are fine-tuned on a dataset annotated with different humor genres (such as sarcasm, irony, exaggerations
                                        and witty jokes). The performance of each model is evaluated based on accuracy, precision, recall, and F1-score.
                                        Our results show that both RoBERTa and BERT-uncased are reasonably effective at capturing the nuances of humor,
                                        with RoBERTa showing a slight edge in accuracy. This research highlights the potential of transformer
                                        models in advancing the field of humor classification and provides insights into their applicability for more
                                        nuanced and context-aware NLP tasks.

                                        Keywords
                                        Humor Classification, Natural Language Processing (NLP), RoBERTa, BERT-uncased


                         1. Introduction
                         Humor is an intricate aspect of human communication that poses significant challenges for natural
                         language processing (NLP). The ability to recognize and classify humor is crucial for enhancing human-
                         computer interactions, particularly in applications like virtual assistants, chatbots, and social media
                         analysis. Despite its importance, automatic humor classification remains a difficult task due to the
                         subjective and context-dependent nature of humor.

                         1.1. Motivation for Research
                         The motivation for this research stems from the need to improve the accuracy and sophistication of
                         humor classification systems. Previous approaches have often struggled to capture the subtle nuances
                         and diverse techniques of humor, which can vary significantly across different genres. By leveraging
                         advanced transformer models such as RoBERTa and BERT-uncased, we aim to develop more robust
                         and nuanced humor classification methods. These models, pre-trained on vast amounts of text data,
                         have shown remarkable performance in various NLP tasks and offer the potential to enhance humor
                         recognition.

                         1.2. Task Description
                         This paper focuses on humor classification according to genre and technique, as outlined in the CLEF
                         2024 JOKER task 2 [1][2]. The task involves categorizing humorous texts into predefined genres (such
                         as satire, sarcasm, and puns) and techniques (such as wordplay, incongruity, and exaggeration). The
JOKER lab [1][2] provides an annotated dataset for this task, allowing rigorous evaluation of our models.

CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
† These authors contributed equally.
sarveshnarayanan4@gmail.com (S. Narayanan); jayasimmanj27@gmail.com (J. J); shivaviswanathan07@gmail.com (S. G. V)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1.3. State-of-the-Art Overview
Recent advancements in NLP have been dominated by transformer-based models, which have achieved
state-of-the-art results in numerous tasks. RoBERTa, a robustly optimized BERT pretraining approach, has
outperformed BERT on several benchmarks owing to its longer, larger-scale pre-training. BERT-uncased,
a bidirectionally pre-trained transformer encoder, is effective at modelling context and nuance in text.
Prior studies have explored these models for various classification tasks, but their application to humor
classification remains under-explored. This study addresses this gap by evaluating RoBERTa and
BERT-uncased on humor classification.
   This comparison aims to clarify how transformer models can be applied to humor classification,
highlighting their strengths and limitations on this nuanced task.


2. Approach
The dataset for this task is provided by the CLEF 2024 JOKER lab, which focuses on automatic humor
analysis. The dataset consists of humorous texts categorized according to genre (such as sarcasm, irony,
exaggerations and witty jokes). This diverse dataset enables the development and evaluation of models
capable of distinguishing between different types and techniques of humor. The two provided files,
train-input (texts) and train-qrels (gold classes), are combined by ID into the format shown in Table 1.

Table 1
Table with ID, Text, and Class columns.
                   ID     Text                                                    Class
                  1661    How do you organize a space party? You planet.           AID
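A minimal sketch of the combination step, assuming both files are JSON arrays whose entries use the keys "id" and "text" (train-input) and "id" and "class" (train-qrels). These field names, the file names, and the helpers `load_json` and `merge_rows` are hypothetical and may not match the released files.

```python
import json


def load_json(path):
    """Read a JSON array from disk (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def merge_rows(inputs, qrels):
    """Join texts and gold classes on their ID into (ID, Text, Class) rows.

    Assumed keys: "id"/"text" in train-input, "id"/"class" in train-qrels.
    IDs are compared as strings so that 1661 and "1661" match.
    """
    labels = {str(row["id"]): row["class"] for row in qrels}
    return [
        (str(row["id"]), row["text"], labels[str(row["id"])])
        for row in inputs
        if str(row["id"]) in labels  # drop texts that have no gold class
    ]


# Hypothetical file names:
# rows = merge_rows(load_json("train_input.json"), load_json("train_qrels.json"))
```

Joining on the ID rather than on file order keeps each text paired with its own label even if the two files list entries in different orders.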

   The dataset is pre-processed to ensure consistent formatting: texts are tokenized and paired with
their genre and technique labels. This structured data allows for effective training and evaluation of
machine learning models.
   To tackle the humor classification task, we employed two transformer-based models: BERT-uncased
and RoBERTa. These models were chosen for their proven capabilities in understanding contextual and
semantic nuances in text.

2.0.1. BERT-uncased
BERT [3] (Bidirectional Encoder Representations from Transformers) is a transformer model pre-trained
on a large corpus of English text. The uncased version lowercases the input (and strips accent marks)
before tokenization, making it robust to variations in capitalization. BERT-uncased is fine-tuned for
humor classification by adding a classification layer on top of the pre-trained model.
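The two ingredients named in this paragraph can be sketched in plain Python: the lowercasing and accent stripping that an uncased BERT tokenizer applies before WordPiece splitting, and a linear classification layer with softmax over the pooled [CLS] vector. This is an illustration with toy vectors and no dropout, not the authors' implementation; in practice a library such as Hugging Face Transformers provides both pieces.

```python
import math
import unicodedata


def uncased_normalize(text):
    """Lowercase and strip accent marks, as an uncased BERT tokenizer does."""
    text = text.lower()
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (Unicode category "Mn"), e.g. the accent in "é".
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def classification_head(cls_vector, weights, bias):
    """Linear layer followed by softmax over the pooled [CLS] representation.

    weights has one row per humour class; each row has len(cls_vector) entries.
    Returns one probability per class.
    """
    logits = [
        sum(w * h for w, h in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(uncased_normalize("Café PLANET"))  # cafe planet
```

During fine-tuning, the weights and bias of this head are learned together with the pre-trained encoder weights.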

2.0.2. RoBERTa
RoBERTa [4] (A Robustly Optimized BERT Pretraining Approach) keeps BERT's architecture but changes the
pre-training recipe: it trains longer, with larger batches, on about ten times more text (160 GB versus
16 GB), uses dynamic masking, and drops the next-sentence prediction objective, which improves
performance on downstream tasks.
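Dynamic masking, one of RoBERTa's documented pre-training changes, can be sketched as follows: a fresh masked-language-modelling pattern is drawn every time a sequence is used, instead of being fixed once during preprocessing. The sketch follows the usual BERT recipe (select about 15% of tokens; replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged). `MASK_ID`, the vocabulary size, and the helper name `dynamic_mask` are placeholder assumptions. This is pre-training background only; the classification fine-tuning in this paper starts from already pre-trained models.

```python
import random

MASK_ID = 4    # placeholder id for the mask token; the real id depends on the tokenizer
IGNORE = -100  # label for positions excluded from the MLM loss (a common convention)


def dynamic_mask(token_ids, vocab_size, rng, mask_prob=0.15):
    """Draw a fresh masking pattern for one sequence of token ids.

    Under dynamic masking this is called every time the sequence is fed to
    the model, so different epochs see different patterns.
    """
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                               # token not selected
        labels[i] = tok                            # model must recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


rng = random.Random(0)
sequence = list(range(1000, 1020))
first_inputs, first_labels = dynamic_mask(sequence, 50_000, rng)
second_inputs, second_labels = dynamic_mask(sequence, 50_000, rng)  # new draw, same sequence
```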

2.1. Training Setup
    • Data Split: The dataset is split into training, validation, and test sets with an 80-20 ratio.
    • Batch Size: 8
    • Learning Rate: 5e-5
    • Epochs: 3
    • Optimizer: Adam
    • Loss Function: Sparse Categorical Cross-Entropy Loss
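The split and batching settings above can be sketched as follows. The section does not say how the validation set is carved out of the data, so this sketch shows only a single shuffled 80-20 split; the shuffling, the seed value 42, and the helper names `train_eval_split` and `batches` are assumptions.

```python
import random

# Hyperparameters listed in Section 2.1.
BATCH_SIZE = 8
LEARNING_RATE = 5e-5
EPOCHS = 3


def train_eval_split(rows, train_frac=0.8, seed=42):
    """Shuffle rows reproducibly and split them 80-20 (seed is a hypothetical choice)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def batches(rows, batch_size=BATCH_SIZE):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Example with 100 hypothetical rows: 80 for training, 20 held out.
train_rows, held_out = train_eval_split(list(range(100)))
steps_per_epoch = sum(1 for _ in batches(train_rows))
```

With a batch size of 8, an 80-text training split gives 10 optimizer steps per epoch, i.e. 30 steps over the 3 epochs.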

2.2. Fine-Tuning Process
The models are fine-tuned on the training set, with hyperparameters selected based on performance on
the validation set. Fine-tuning updates the weights of the pre-trained model, together with the added
classification layer, to minimize the cross-entropy loss on the training data.
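The loss and optimizer listed in Section 2.1 can be sketched in plain Python. Sparse categorical cross-entropy takes integer class indices rather than one-hot vectors, and its gradient with respect to the logits is the softmax minus the one-hot target. The Adam update uses the common default betas and epsilon, which are assumptions since the paper states only the learning rate of 5e-5; deep-learning frameworks provide optimized versions of both.

```python
import math


def sparse_categorical_cross_entropy(logits, labels):
    """Mean loss over a batch plus its gradient with respect to the logits.

    logits: list of per-example score lists (one score per class).
    labels: list of integer class indices (the "sparse" target format).
    """
    n = len(logits)
    loss, grads = 0.0, []
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(z - m) for z in row))  # stable log-sum-exp
        loss += log_z - row[y]                                   # -log softmax(row)[y]
        probs = [math.exp(z - log_z) for z in row]
        # d(mean loss)/d(logit_k) = (softmax_k - 1[k == y]) / n
        grads.append([(p - (1.0 if k == y else 0.0)) / n for k, p in enumerate(probs)])
    return loss / n, grads


def adam_step(params, grads, state, lr=5e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place to a flat list of parameters."""
    state["t"] = state.get("t", 0) + 1
    m = state.setdefault("m", [0.0] * len(params))
    v = state.setdefault("v", [0.0] * len(params))
    t = state["t"]
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m[i] / (1 - beta1 ** t)              # bias corrections
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```

On the first step Adam moves each parameter by roughly the learning rate in the direction opposite to its gradient, regardless of the gradient's magnitude.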

2.3. Model Evaluation
The models are evaluated primarily by accuracy, i.e. the fraction of test texts whose predicted class
matches the gold label. This gives a straightforward measure of how well each model distinguishes
between genres and techniques of humor. Table 2 additionally reports precision, recall, F1-score and
support (the SD_ columns).

   This setup provides a simple, reproducible framework for humor classification with pre-trained
transformer models.


3. Results
Our study evaluated two transformer models, RoBERTa and BERT-uncased, on humor classification. Both
models classified most test texts correctly, although with clear limitations.
   Overall, RoBERTa slightly outperformed BERT-uncased, reaching an accuracy of 70% compared with 67%.
This suggests that both models capture useful cues for distinguishing humor genres and techniques,
although roughly three in ten test texts are still misclassified.

Table 2
Test results: overall accuracy, with precision, recall, F1-score and support for the SD class
               Model          Accuracy   Precision (SD)   Recall (SD)   F1-score (SD)   Support (SD)
               RoBERTa          0.70          0.37            0.66           0.47             38
               BERT-uncased     0.67          0.69            0.84           0.76             91
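The SD_ columns appear to be the per-class entries of a classification report for the class labelled SD. The sketch below shows how accuracy and per-class precision, recall, F1-score and support are computed from gold and predicted labels, and checks that the F1 values in Table 2 agree with the harmonic mean of the reported precision and recall. `per_class_report` and `f1_from` are hypothetical helper names.

```python
def per_class_report(gold, pred, label):
    """Overall accuracy plus precision, recall, F1 and support for one class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)  # correctly predicted as label
    fp = sum(g != label and p == label for g, p in pairs)  # wrongly predicted as label
    fn = sum(g == label and p != label for g, p in pairs)  # label instances that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "support": tp + fn}


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check against the rounded SD values in Table 2.
print(round(f1_from(0.37, 0.66), 2))  # 0.47 (RoBERTa)
print(round(f1_from(0.69, 0.84), 2))  # 0.76 (BERT-uncased)
```

Support counts the gold instances of the class, so it depends only on the test labels and not on the model's predictions.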

   Despite their overall effectiveness, both RoBERTa and BERT-uncased exhibited challenges when
confronted with certain types of humor. Exaggeration-style jokes and those heavily reliant on cultural
references proved to be particularly problematic for the models. These types of humor often require a
deep understanding of context and cultural nuances, which may have posed difficulties for the models’
classification abilities.
   To address these challenges and further enhance the performance of transformer models in humor
classification, several strategies could be explored. One approach could involve incorporating additional
contextual information, such as cultural context or background knowledge, into the models. This could
potentially help the models better understand and classify jokes that rely heavily on cultural references.
   Furthermore, fine-tuning strategies tailored to these specific types of humor could also prove beneficial.
By focusing on optimizing the models’ performance on exaggeration-style jokes and culturally nuanced
humor, we may be able to improve their overall classification accuracy.
   In summary, while transformer models like RoBERTa and BERT-uncased show promise for humor
classification, there is still room for improvement, especially when dealing with more nuanced forms
of humor. Future research in this area could focus on developing more sophisticated models and
fine-tuning strategies to better capture the complexities of humor in text.


4. Discussion
The experiments conducted demonstrate the potential of transformer-based models in humor classifi-
cation. Both BERT-uncased and RoBERTa showed considerable effectiveness, but several challenges
remain. These models struggled with jokes heavily dependent on cultural references or requiring an
understanding of specific exaggerations. This indicates a need for incorporating more diverse training
data and possibly integrating external knowledge sources to handle such cases better. Additionally, the
importance of model interpretability should not be overlooked, as understanding why a model classifies
a text as humorous is crucial for further development and trust in AI systems.


5. Future Work
Future work could involve exploring more sophisticated ensemble methods, incorporating additional
features such as sentiment and semantic similarity, and applying these models to larger and more diverse
datasets. Further investigation into interpretability and the handling of ambiguous or context-dependent
humor could also provide valuable insights for enhancing humor classification systems. Enhancing the
dataset with more varied and culturally rich humorous texts could also improve the models’ robustness
and applicability across different contexts and populations.


6. Conclusion
In this paper, we explored the use of transformer-based models, specifically BERT-uncased and RoBERTa,
for humor classification according to genre and technique. By fine-tuning these pre-trained models on
the provided dataset, we aimed to improve the accuracy and robustness of humor classification systems.
Our experiments show that both models are reasonably effective at capturing the nuances of humor, with
RoBERTa slightly outperforming BERT-uncased in accuracy (0.70 vs. 0.67).
   These observations suggest that transformer models, with their contextual representations and
extensive pre-training, are a promising basis for classifying humorous texts, and that fine-tuning and
hyperparameter selection matter for achieving good performance.


References
[1] L. Ermakova, T. Miller, A.-G. Bosser, V. M. P. Preciado, G. Sidorov, A. Jatowt, Overview of JOKER -
    CLEF-2024 track on automatic humor analysis, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab,
    L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.),
    Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth
    International Conference of the CLEF Association (CLEF 2024), 2024.
[2] V. M. P. Preciado, et al., Overview of the CLEF 2024 JOKER Task 2: Humour classification according to
    genre and technique, in: G. Faggioli, et al. (Eds.), Working Notes of the Conference and Labs of the
    Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, CEUR-WS.org, 2024.
[3] M. Geetha, D. K. Renuka, Improving the performance of aspect based sentiment analysis using
    fine-tuned BERT base uncased model, International Journal of Intelligent Networks 2 (2021) 64–69.
[4] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
    RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
    URL: https://doi.org/10.48550/arXiv.1907.11692.