<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">CLEF 2024 JOKER Task 2: Using BERT and Random Forest Classifier for Humor Classification According to Genre and Technique</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">M</forename><surname>Saipranav</surname></persName>
							<email>saipranav2310324@ssn.edu.in</email>
							<affiliation key="aff0">
								<orgName type="institution">Sri Sivasubramaniya Nadar College Of Engineering</orgName>
								<address>
									<settlement>Chennai</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jaswanth</forename><surname>Sridharan</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Sri Sivasubramaniya Nadar College Of Engineering</orgName>
								<address>
									<settlement>Chennai</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Gautham</forename><surname>Narayan</surname></persName>
							<email>gauthamnarayan2310332@ssn.edu.in</email>
							<affiliation key="aff0">
								<orgName type="institution">Sri Sivasubramaniya Nadar College Of Engineering</orgName>
								<address>
									<settlement>Chennai</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Angel</forename><surname>Deborah</surname></persName>
							<email>angeldeborahs@ssn.edu.in</email>
							<affiliation key="aff0">
								<orgName type="institution">Sri Sivasubramaniya Nadar College Of Engineering</orgName>
								<address>
									<settlement>Chennai</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Samyuktaa</forename><surname>Sivakumar</surname></persName>
							<email>samyuktaa2210189@ssn.edu.in</email>
							<affiliation key="aff0">
								<orgName type="institution">Sri Sivasubramaniya Nadar College Of Engineering</orgName>
								<address>
									<settlement>Chennai</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Evaluation Forum</orgName>
								<address>
									<addrLine>September 09-12</addrLine>
									<postCode>2024</postCode>
									<settlement>Grenoble</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">CLEF 2024 JOKER Task 2: Using BERT and Random Forest Classifier for Humor Classification According to Genre and Technique</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">A5C335067DDA3131E5C08D3152B110E1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:54+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Humor</term>
					<term>Genre Classification</term>
					<term>BERT</term>
					<term>TF-IDF Vectors</term>
					<term>Sentence Embedding</term>
					<term>Random Forest</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we present our work for the Automatic Humour Analysis (JOKER) Lab at CLEF 2024. The objective of the JOKER Lab is to research the automated processing of humour, covering tasks such as the retrieval, classification, and interpretation of various forms of humorous text. Our task involved the classification of humorous texts into different genres, for which we took two different approaches: fine-tuning BERT (a transformer architecture) and training a traditional machine learning model, a Random Forest classifier. Of the two models, BERT achieved the higher accuracy score of 0.6731, from which we conclude that BERT is better suited for most natural language processing tasks. We describe our experiments on the training data and present the results obtained on the provided test dataset.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Humor plays a crucial role in human communication and social interaction. However, it is multifaceted and elicits different responses from different audiences. Accurate classification of humor not only enhances our understanding of its various forms but also has practical applications in fields such as sentiment analysis, human-computer interaction, and social media content moderation.</p><p>Traditional humor classification techniques can be labor- and time-consuming. Automating this process through NLP and ML techniques can improve the efficiency and accuracy of humor classification, benefiting academic research. With the proliferation of digital media, humor is more pervasive and varied than ever, presenting a challenge even to state-of-the-art models in discerning the differences between various genres of humor.</p><p>The CLEF 2024 JOKER <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref> <ref type="bibr" target="#b2">[3]</ref> Track comprised three tasks: Task 1, humour-aware information retrieval <ref type="bibr" target="#b0">[1]</ref>; Task 2, humour classification according to genre and technique <ref type="bibr" target="#b0">[1]</ref>; and Task 3, translation of puns from English to French <ref type="bibr" target="#b0">[1]</ref>. We participated in Task 2.</p><p>By leveraging advanced natural language processing techniques and fine-tuning well-known pre-trained models, this study aims to develop a system capable of accurately classifying text into the following humor categories:</p><p>• IR -Irony relies on a gap between the literal meaning and the intended meaning, creating a humorous twist or reversal. 
• SC -Sarcasm involves using irony to mock, criticize, or convey contempt.</p><p>• EX -Exaggeration involves magnifying or overstating something beyond its normal or realistic proportions.</p><p>• AID -Incongruity refers to the unexpected or contradictory elements that are combined in a humorous way and Absurdity involves presenting situations, events, or ideas that are inherently illogical, irrational, or nonsensical. • SD -Self-deprecating humor involves making fun of oneself or highlighting one's own flaws, weaknesses, or embarrassing situations in a lighthearted manner. • WS -Wit refers to clever, quick, and intelligent humor and Surprise in humor involves introducing unexpected elements, twists, or punchlines that catch the audience off guard.</p><p>This automated approach significantly benefits various fields by providing deeper insights into the mechanics of humor and enhancing the way machines understand and respond to human emotions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Approach</head><p>We took two approaches to the humor classification task: multiclass classification using BERT-base-uncased, and classification using a Random Forest classifier. The data was preprocessed differently for each method.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Data Preparation</head><p>The provided dataset <ref type="bibr" target="#b2">[3]</ref> consisted of 1742 examples of text to be classified into the 7 genres of humor mentioned above. We partitioned the dataset into an 80% training set and a 20% validation set. The content of the dataset was of the following format (see Table <ref type="table" target="#tab_0">1</ref>); for example: "The winter drive-by shooting was a slay ride."</p></div>
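The 80/20 partition described above can be sketched with scikit-learn's train_test_split; the example rows and the label mapping below are illustrative stand-ins, not the actual dataset.

```python
from sklearn.model_selection import train_test_split

# Illustrative stand-ins for rows of the provided dataset (text, genre code),
# repeated so a stratified 80/20 split is exact
samples = [
    ("Honesty may be the best policy, but insanity is the best defense.", "SC"),
    ("no more instagram. we must all return to scrapbooks.", "IR"),
    ("All my imaginary friends tell me that I need therapy.", "SD"),
    ("The winter drive-by shooting was a slay ride.", "WS"),
    ("Knock knock. Who's there? Tank. Tank who? You're welcome.", "AID"),
] * 5

texts = [t for t, _ in samples]
labels = [c for _, c in samples]

# Map class identifiers to numerical values, as done before training
label2id = {code: i for i, code in enumerate(sorted(set(labels)))}
y = [label2id[c] for c in labels]

# 80% training / 20% validation partition
X_train, X_val, y_train, y_val = train_test_split(
    texts, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_val))
```

Stratifying on the labels keeps the class proportions of the full dataset in both partitions, which matters when some genres have far fewer examples than others.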
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Basic text preprocessing was applied to the provided dataset. Firstly, the class identifiers for each humorous text were mapped to numerical values. All texts were stripped of punctuation, stop words, and other special characters, and then lemmatized. This preprocessed dataset was used directly for BERT (see figure <ref type="figure" target="#fig_0">1</ref>).</p><p>For the approach involving the Random Forest classifier, the preprocessed text data were further prepared by combining SentenceTransformer, a pre-trained model, and TfidfVectorizer, a scikit-learn tool, to generate sentence embeddings and TF-IDF feature vectors, respectively (see figure <ref type="figure" target="#fig_1">2</ref>).</p><p>SentenceTransformer: this pre-trained model (multi-qa-mpnet-base-dot-v1) from the sentence-transformers library is utilized to generate sentence embeddings. It captures the semantic meaning of text at the sentence level, effectively embedding the contextual nuances and relationships between words within sentences. The target labels (classes) are extracted from the data frame to prepare the target variable for model training and evaluation. This extraction isolates the dependent variable, which the machine learning model will learn to predict based on the input feature set, a combination of the TF-IDF vectors and sentence embeddings.</p></div>
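The feature preparation can be sketched as follows. TfidfVectorizer is the actual scikit-learn tool; to keep the sketch self-contained, the sentence embeddings that multi-qa-mpnet-base-dot-v1 would produce (one 768-dimensional vector per sentence) are replaced here by random placeholder vectors.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "the winter drive-by shooting was a slay ride",
    "all my imaginary friends tell me that i need therapy",
    "no more instagram we must all return to scrapbooks",
]

# TF-IDF feature vectors (sparse matrix made dense for concatenation)
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(texts).toarray()

# Placeholder for SentenceTransformer("multi-qa-mpnet-base-dot-v1").encode(texts),
# which returns a 768-dimensional embedding per sentence
rng = np.random.default_rng(0)
X_emb = rng.normal(size=(len(texts), 768))

# Combined input feature set: TF-IDF vectors concatenated with sentence embeddings
X = np.hstack([X_tfidf, X_emb])
print(X.shape)
```

Concatenating the two representations lets the classifier see both surface-level term weights and sentence-level semantics in a single feature vector.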
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">BERT</head><p>BERT <ref type="bibr" target="#b3">[4]</ref> stands for Bidirectional Encoder Representations from Transformers. It is faster and better at capturing context than Long Short-Term Memory (LSTM) networks and other traditional models. BERT is pretrained on a large corpus of text using two unsupervised learning tasks, namely Masked Language Model (MLM) and Next Sentence Prediction (NSP). In MLM, a percentage of the input tokens are randomly masked, and the model is trained to predict the original tokens from the context of the surrounding words. This bidirectional context allows BERT to learn representations that capture deeper semantic meaning. For NSP, pairs of sentences are sampled from the corpus, and the model is trained to predict whether the second sentence follows the first. This task helps BERT understand relationships between sentences and improves its ability to handle tasks like question answering and natural language inference.</p><p>BERT <ref type="bibr" target="#b4">[5]</ref> consists of a stack of Transformer encoder layers; BERT Base Uncased has 12 such layers. Each layer contains self-attention mechanisms and feedforward neural networks.</p><p>At every layer, BERT calculates attention scores for each token in the input sequence, indicating the importance of the other tokens relative to it. This allows BERT to capture contextual information by attending to all tokens in the input sequence simultaneously, in both directions. After self-attention, the output is passed through a feedforward neural network, typically with a ReLU activation function. This network helps find complex patterns in the data and further improves the representations learned by the self-attention mechanism (see figure <ref type="figure" target="#fig_2">3</ref>).</p><p>Before inputting text into BERT, it undergoes tokenization into subword units using WordPiece tokenization. 
This allows BERT to handle out-of-vocabulary words effectively. Each input sequence is then represented as a combination of three types of embeddings, namely token, segment, and positional embeddings. Token embeddings represent the identity of each token in the input sequence; these are learned during pre-training and capture the semantic meaning of individual words. Segment embeddings indicate whether a token belongs to the first or the second sentence in a sentence pair, helping BERT understand the relationship between sentences, especially in tasks like question answering and natural language inference. Positional embeddings encode the position of each token in the input sequence, allowing BERT to capture sequential information and understand the order of words in a sentence.</p><p>After pre-training, BERT can be fine-tuned on specific tasks using task-specific labeled data <ref type="bibr" target="#b5">[6]</ref>. During fine-tuning, the pre-trained parameters are adjusted to optimize performance on the task, which enables BERT to achieve state-of-the-art results across various natural language processing tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2.">Random Forest</head><p>Random Forest <ref type="bibr" target="#b6">[7]</ref> is an ensemble classifier that contains several decision trees. Instead of using a single decision tree, this ensemble method leverages the decision-making ability of multiple decision trees, and the final output is predicted by majority vote. The prepared input feature set is passed to a Random Forest classifier comprising 1500 decision trees (see figure <ref type="figure" target="#fig_4">5</ref>). The use of out-of-bag samples is also enabled to estimate the generalization accuracy of the model, providing an internal cross-validation measure of model performance.</p><p>Decision trees are the most fundamental component of the Random Forest classifier. Each decision tree works to find the best split to divide the data into multiple subsets and is trained through the Classification and Regression Tree (CART) algorithm. Gini impurity, information gain, and mean square error are commonly used metrics to evaluate the quality of a split. A single decision tree can be prone to bias and over-fitting, hence an ensemble of multiple decision trees is used to improve the accuracy of the predictions. The Random Forest algorithm (see figure <ref type="figure" target="#fig_3">4</ref>) makes use of bagging and feature randomness to create an uncorrelated forest of decision trees. Each tree in the ensemble is built from a data sample drawn from the provided training set with replacement, with one-third of it set aside as the out-of-bag sample. The diversity of the dataset is increased and the correlation among the decision trees is reduced through feature bagging. For a classification task, such as the one performed here, the most frequent predicted class yields the final prediction. Finally, the out-of-bag sample is used for cross-validation.</p></div>
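The classifier configuration described above can be sketched with scikit-learn; the feature matrix below is a random placeholder standing in for the combined TF-IDF/embedding features, and the label values are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder feature matrix and labels standing in for the real
# TF-IDF + sentence-embedding inputs and the genre classes
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 7, size=200)

clf = RandomForestClassifier(
    n_estimators=1500,  # 1500 decision trees, as in our run
    oob_score=True,     # out-of-bag estimate of generalization accuracy
    random_state=42,
)
clf.fit(X, y)

# Internal cross-validation measure from the out-of-bag samples
print(f"OOB accuracy estimate: {clf.oob_score_:.3f}")
```

With oob_score enabled, each tree is evaluated on the roughly one-third of training samples it never saw during bagging, giving a generalization estimate without a separate hold-out set.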
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>The metrics of precision, recall, accuracy, and F1-score are reported for the two models used to complete the given task. Precision is calculated as the ratio of true positives to the sum of true and false positives. Accuracy is the ratio of the number of correct predictions to the total number of data points. Recall is the ratio of true positives to the sum of true positives and false negatives. The F1-score is calculated from precision and recall: mathematically, it is twice the ratio of the product of precision and recall to their sum.</p><p>Tables <ref type="table" target="#tab_1">2 and 3</ref> summarise the results of our runs, as returned by the JOKER lab, for the aforementioned approaches. These were carried out on the provided test dataset. Using a transformer architecture model such as BERT gave a higher accuracy of 0.6731 compared to a traditional machine learning model such as the Random Forest classifier, which gave an accuracy of 0.5235. </p></div>
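The metric definitions above can be checked with a short scikit-learn sketch; the label vectors here are illustrative, not our actual predictions.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative gold labels and predictions for a 3-class problem
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

# Accuracy: correct predictions over total data points (here 6 of 8)
acc = accuracy_score(y_true, y_pred)

# Per-class F1 = 2PR/(P+R); "macro" averages classes equally,
# "weighted" weights each class by its support
for avg in ("macro", "weighted"):
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=avg)
    print(f"{avg}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
print(f"accuracy={acc:.4f}")
```

The macro and weighted averaging schemes correspond to the two rows reported per model in Table 3.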
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>As mentioned before, two different approaches were used to solve the given task. The first involved a transformer architecture, BERT; the second, a traditional machine learning model, a Random Forest classifier. The higher accuracy of BERT (0.6731) suggests that transformer architectures like BERT prove more accurate for this classification task than the traditional, feature-dependent machine learning models commonly used for classification. Overall, it can be concluded that BERT's deep contextual and language understanding, together with its ability to leverage transfer learning, makes it better suited to the nuanced task of humor classification according to genre.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Data Preprocessing</figDesc><graphic coords="3,139.69,65.61,315.89,172.96" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Sentence Embedding/TFidf Vectorisation of Preprocessed Data</figDesc><graphic coords="3,128.41,277.76,338.47,89.24" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: BERT Classification Process</figDesc><graphic coords="4,128.41,65.61,338.46,72.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Random Forest Classification Process</figDesc><graphic coords="4,128.41,569.10,338.46,72.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: A Random Forest comprising three decision trees.</figDesc><graphic coords="5,128.41,65.61,338.46,367.95" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Different Classes of Humor from the given Train Dataset</figDesc><table><row><cell>Id</cell><cell>Text</cell><cell>Class</cell><cell>Number of texts available per class</cell></row><row><cell>1112</cell><cell>Honesty may be the best policy, but insanity is the best defense.</cell><cell>SC</cell><cell>356</cell></row><row><cell>782</cell><cell>no more instagram. we must all return to scrapbooks.</cell><cell>IR</cell><cell>212</cell></row><row><cell>484</cell><cell>The answer is going to a grocery store during a pandemic. That's what I'd do for a Klondike bar</cell><cell>EX</cell><cell>125</cell></row><row><cell>1613</cell><cell>Knock knock. Who's there? Tank. Tank who? You're welcome.</cell><cell>AID</cell><cell>232</cell></row><row><cell>167</cell><cell>All my imaginary friends tell me that I need therapy.</cell><cell>SD</cell><cell>169</cell></row><row><cell>2140</cell><cell>The winter drive-by shooting was a slay ride.</cell><cell>WS</cell><cell>537</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc></figDesc><table><row><cell>Accuracy Metrics</cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>Model</cell><cell></cell><cell cols="2">Accuracy</cell></row><row><cell></cell><cell>BERT</cell><cell></cell><cell>0.6731</cell></row><row><cell></cell><cell cols="2">Random Forest</cell><cell>0.5235</cell></row><row><cell>Table 3</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Precision, Recall and F1 scores</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Model</cell><cell>Type</cell><cell cols="3">Precision Recall</cell><cell>F1</cell></row><row><cell>BERT</cell><cell>macro</cell><cell></cell><cell>0.6024</cell><cell>0.6027 0.6006</cell></row><row><cell></cell><cell>weighted</cell><cell></cell><cell>0.6662</cell><cell>0.6731 0.6687</cell></row><row><cell>Random Forest</cell><cell>macro</cell><cell></cell><cell>0.5353</cell><cell>0.3736 0.3742</cell></row><row><cell></cell><cell>weighted</cell><cell></cell><cell>0.5278</cell><cell>0.5223 0.4583</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of JOKER @ CLEF-2024: Automatic humour analysis</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ermakova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-G</forename><surname>Bosser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">M</forename><surname>Palma Preciado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jatowt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mulhem</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Quénot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schwab</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">M D</forename><surname>Nunzio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Galuščáková</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">G S</forename><surname>De Herrera</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Faggioli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note>To appear</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The joker corpus: English-french parallel data for multilingual wordplay recognition</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ermakova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-G</forename><surname>Bosser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jatowt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Miller</surname></persName>
		</author>
		<idno type="DOI">10.1145/3539618.3591885</idno>
		<ptr target="https://doi.org/10.1145/3539618.3591885" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;23</title>
				<meeting>the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;23<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="2796" to="2806" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Ermakova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-G</forename><surname>Bosser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Thomas-Young</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Preciado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jatowt</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-56072-9_5</idno>
		<title level="m">CLEF 2024 JOKER Lab: Automatic Humour Analysis</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="36" to="43" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Bert: A review of applications in natural language processing and understanding</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">V</forename><surname>Koroteev</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2103.11943</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">Bert: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Prabhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mohamed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Misra</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2104.14289</idno>
		<title level="m">Multi-class text classification using bert-based active learning</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A random forest guided tour</title>
		<author>
			<persName><forename type="first">G</forename><surname>Biau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Scornet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Test</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="197" to="227" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
