JOKER Track @ CLEF 2024: the Jokesters’ Approaches for Retrieving, Classifying, and Translating Wordplay

Harouna Baguian1,†, Nina Ashley Huynh1
1 Ecole Nationale d’Ingénieurs de Brest, 945 Av. du Technopôle, 29280 Plouzané, France

CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
* Corresponding author. † These authors contributed equally.
baguian.harouna7231@gmail.com (H. Baguian); ninashley23@gmail.com (N. A. Huynh)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Humor-sensitive information retrieval presents unique challenges, particularly the need to understand wordplay and implicit cultural references. This paper presents our work on the JOKER 2024 track at CLEF. For the first task, we explore TF-IDF (Term Frequency-Inverse Document Frequency) weighting combined with logistic regression to retrieve relevant humorous texts efficiently. The second task focuses on classifying humorous texts by genre and humour technique; our architecture combines stacking and weighted voting, leveraging the strengths of different base models to improve classification accuracy. The third task is to translate texts from English into French, where preserving meaning while maintaining fluency and contextual appropriateness is particularly difficult. To address this, we employ an approach based on the MarianMTModel, a neural machine translation model designed to translate text between many language pairs.

Keywords
Logistic Regression, Machine learning, TF-IDF, SVC, DecisionTreeClassifier, RandomForestClassifier, GradientBoostingClassifier, Stacking, Voting, MarianMTModel, Fine-tuning

Introduction
The JOKER workshop aims to foster research on the computational processing of humour and wordplay [1]. The 2024 edition at CLEF proposes three tasks:
Task 1 of JOKER 2024 focuses on retrieving humorous texts in response to specific queries, which includes detecting and locating puns in a collection of documents. We describe an approach using TF-IDF weighting and logistic regression for this task.
Task 2 is about classifying humour. This is a complex task, given the diversity of genres and techniques involved. To deal with this complexity, we developed a hybrid architecture that combines several machine learning classifiers via stacking and voting. This architecture aims to improve overall performance by exploiting the strengths of each classifier on different classes.
Task 3 concerns translating texts, which is a complex challenge: it is difficult to preserve the meaning of humorous texts or texts containing puns, and many considerations are involved in producing accurate, fluent and contextually appropriate translations. To address this problem, we use an approach based on the machine translation model MarianMTModel.

1. Task 1: Humour-aware information retrieval

1.1. TF-IDF
TF-IDF is a commonly used weighting technique to evaluate the importance of a word in a document relative to a corpus. Term Frequency (TF) measures how often a term appears in a document, while Inverse Document Frequency (IDF) measures the importance of this term in the entire corpus. The TF-IDF formula is given by:

TF-IDF(t, d) = TF(t, d) × IDF(t)

where TF(t, d) is the frequency of term t in document d and IDF(t) is calculated as follows:

IDF(t) = log( N / |{d ∈ D : t ∈ d}| )

with N being the total number of documents in the corpus and |{d ∈ D : t ∈ d}| the number of documents containing term t.

1.2. Logistic Regression
Logistic regression is a classification model that can predict the probability that a document is relevant to a given query. The logistic function is defined by:

P(y = 1 | X) = 1 / (1 + e^(−X · β))

where X is the feature vector (here, the TF-IDF values of the terms) and β is the weight vector learned during model training.

1.3. Application to Task 1
For Task 1 of JOKER 2024, the goal is to retrieve humorous texts relevant to a specific query. The steps are as follows:

1.3.1. Step 1: Identifying Jokes
• Collect a corpus of humorous and non-humorous texts. Each text is labelled: 0 for non-joke and 1 for joke.
• Pre-processing: the data is cleaned by removing special characters, stop words, etc.
• Vectorization: we vectorize the documents using the TF-IDF technique (TfidfVectorizer).
• Training: the logistic regression model is trained to distinguish jokes from other texts.
• Once training is complete, we use the model to filter the documents and retain only the jokes.

1.3.2. Step 2: Retrieving Relevant Jokes
• TF-IDF Calculation: we apply TF-IDF vectorization to each identified joke to obtain a numerical representation based on term importance.
• Creating the TF-IDF Matrix: we construct a TF-IDF matrix where each row represents a joke and each column represents a term, with the TF-IDF values corresponding to the term weights in the jokes.
• Query Comparison and Retrieving Relevant Jokes: TF-IDF values are used to compute the cosine similarity between the queries and the jokes, and we retrieve the most relevant jokes by similarity.

The following figure shows the pipeline diagram for joke retrieval; a code sketch of this pipeline is given after the figure.

Figure 1: Pipeline diagram for joke retrieval using logistic regression and TF-IDF vectorization.
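To make the two steps concrete, the following is a minimal sketch of the joke identification and retrieval pipeline using scikit-learn. The file names, dataframe names and column names (train_labelled.json, corpus.json, text, label) are hypothetical placeholders rather than the exact artefacts used in our runs.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: train a joke / non-joke classifier on a labelled corpus
# (hypothetical file: a 'text' column and a 'label' column with 0 = non-joke, 1 = joke)
train_df = pd.read_json("train_labelled.json")
clf_vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
X_train = clf_vectorizer.fit_transform(train_df["text"])
clf = LogisticRegression(max_iter=1000).fit(X_train, train_df["label"])

# Filter the document collection, keeping only texts predicted to be jokes
docs_df = pd.read_json("corpus.json")                     # hypothetical file name
is_joke = clf.predict(clf_vectorizer.transform(docs_df["text"])) == 1
jokes = docs_df.loc[is_joke, "text"].tolist()

# Step 2: build a TF-IDF matrix over the retained jokes and rank them per query
retrieval_vectorizer = TfidfVectorizer(stop_words="english")
joke_matrix = retrieval_vectorizer.fit_transform(jokes)   # one row per joke

def retrieve(query: str, top_k: int = 10):
    """Return the top_k jokes most similar to the query by cosine similarity."""
    query_vec = retrieval_vectorizer.transform([query])
    scores = cosine_similarity(query_vec, joke_matrix).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [(jokes[i], float(scores[i])) for i in ranked]

print(retrieve("puns about cats", top_k=5))
```

The sketch uses separate vectorizers for classification and retrieval, mirroring the two-stage description above; a single shared vectorizer would also be possible.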
2. Task 2: Humour classification according to genre and technique

2.1. Model Architecture
The model architecture is illustrated in Figure 2. It consists of two main components: a Stacking Classifier [2] and a Voting Classifier [3].

Figure 2: Humor classification model architecture.

2.1.1. Stacking Classifier
The Stacking Classifier combines several base classifiers (decision trees, random forests and gradient boosting) and uses logistic regression as the final classifier. Each base classifier is trained on the input data and its predictions are then used as features to train the final classifier. This approach makes it possible to capture complex patterns by combining the strengths of each base classifier.
• DecisionTreeClassifier: used for its simplicity and ability to handle nonlinear interactions between features.
• RandomForestClassifier: combines multiple decision trees to improve robustness and accuracy.
• GradientBoostingClassifier: uses a boosting approach to correct the errors of previous classifiers and enhance overall performance.
• LogisticRegression: used as the final classifier to combine the predictions of the base classifiers.

2.1.2. Voting Classifier
The Voting Classifier combines the predictions of the Stacking Classifier and an SVC (Support Vector Classifier) after text vectorization by a TfidfVectorizer. The final predictions are obtained by weighted voting, where the Stacking Classifier and the SVC contribute to the final decisions according to their performance on different classes.
• TfidfVectorizer: used to transform texts into feature vectors based on term frequency.
• SVC: used for its high performance in text classification tasks.
A code sketch of this ensemble is given below.
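The following is a minimal sketch of this architecture with scikit-learn. The voting weights, hyperparameters and the training dataframe (train_df with hypothetical text and class columns) are illustrative assumptions, and soft voting with probability estimates is one possible way to realise the weighted voting described above.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Stacking Classifier: tree-based base learners, logistic regression as final estimator
stacking = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier()),
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("gb", GradientBoostingClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)

# Each branch of the Voting Classifier works on TF-IDF features of the raw text
stacking_branch = make_pipeline(TfidfVectorizer(), stacking)
svc_branch = make_pipeline(TfidfVectorizer(), SVC(probability=True))  # probability=True enables soft voting

# Weighted voting between the stacking ensemble and the SVC
# (the weights below are illustrative, not the tuned values)
model = VotingClassifier(
    estimators=[("stacking", stacking_branch), ("svc", svc_branch)],
    voting="soft",
    weights=[2, 1],
)

# Hypothetical training data: train_df["text"] holds the humorous texts,
# train_df["class"] the genre or technique label
# model.fit(train_df["text"], train_df["class"])
# predictions = model.predict(test_df["text"])
```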
2.2. Results
Model performance is assessed using the confusion matrix and classification metrics. Figure 3 shows the confusion matrix for the model’s predictions. Table 1 reports the main performance metrics, including precision, recall, and F1-score for each class.

Figure 3: Confusion matrix of the model’s predictions.

Table 1: Model performance metrics (precision, recall and F1-score per class).

3. Task 3: Translation of puns from English to French

3.1. MarianMTModel
MarianMTModel [4] is a model for the automatic translation of texts between different languages. It is based on the Transformer architecture, which comprises two main parts:
• Encoder: takes the source text and generates a contextual representation for each word in the text.
• Decoder: uses these contextual representations to generate the target text word by word.
MarianMT models are trained on large multilingual corpora. They are able to translate between a large number of language pairs thanks to training on high-quality aligned data. Since the models are pre-trained on generic data, we used fine-tuning to improve performance on our data [5].

3.2. Fine-tuning
Fine-tuning is a process in machine learning where a model pre-trained on a large corpus of data is re-trained on specific data for a given task. It allows the model to retain the general knowledge acquired during the initial training while learning the specifics of the new task. The fine-tuning procedure is as follows:
• Loading the pre-trained model.
• Preparation of task-specific data: tokenisation, alignment, etc.
• Re-training: training the model on the specific data using appropriate hyperparameters.
• Evaluation and adjustments: evaluate performance on a validation set and adjust hyperparameters if necessary.

3.3. Application to Task 3
For Task 3 of JOKER 2024, the aim is to translate texts from English into French while preserving the meaning. The steps are as follows (a code sketch of the procedure is given after the list):
Step 1: Library imports
• train_test_split from sklearn.model_selection
• pandas: for handling dataframes
• json: for data management
• datasets: for loading data sets
• transformers: for the translation model MarianMTModel
Step 2: Loading the data
The data is loaded and converted into dataframes.
Step 3: Preparing the training data
The data is pre-processed and divided into training and validation sets.
Step 4: Loading the model and tokenizer
The MarianMT model and tokenizer are loaded from Helsinki-NLP.
Step 5: Data pre-processing
Tokenisation of the texts.
Step 6: Model configuration and training
The training arguments are defined and the model is trained using Seq2SeqTrainer.
Step 7: Saving the model and tokenizer
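The following is a minimal sketch of these steps with the Hugging Face transformers and datasets libraries. The file name, column names (en, fr), output directory and hyperparameters are hypothetical placeholders; Helsinki-NLP/opus-mt-en-fr is the standard English-to-French MarianMT checkpoint, consistent with Step 4 but not necessarily the exact one we used. The sketch assumes a recent transformers version (the text_target argument of the tokenizer); older versions use the as_target_tokenizer context manager instead.

```python
import pandas as pd
from datasets import Dataset
from sklearn.model_selection import train_test_split
from transformers import (MarianMTModel, MarianTokenizer, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

# Steps 2-3: load the parallel data (hypothetical file with 'en' and 'fr' columns)
# and split it into training and validation sets
df = pd.read_json("puns_en_fr.json")
train_df, val_df = train_test_split(df, test_size=0.1, random_state=42)

# Step 4: load the pre-trained MarianMT model and tokenizer from Helsinki-NLP
checkpoint = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(checkpoint)
model = MarianMTModel.from_pretrained(checkpoint)

# Step 5: tokenise source (English) and target (French) texts
def preprocess(batch):
    return tokenizer(batch["en"], text_target=batch["fr"],
                     truncation=True, max_length=128)

train_ds = Dataset.from_pandas(train_df).map(preprocess, batched=True)
val_ds = Dataset.from_pandas(val_df).map(preprocess, batched=True)

# Step 6: define the training arguments and fine-tune with Seq2SeqTrainer
args = Seq2SeqTrainingArguments(
    output_dir="marian-en-fr-puns",      # hypothetical output directory
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()

# Step 7: save the fine-tuned model and tokenizer
trainer.save_model("marian-en-fr-puns")
tokenizer.save_pretrained("marian-en-fr-puns")
```

After training, French translations can be generated by calling model.generate on tokenised English inputs and decoding the output with the same tokenizer.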
Conclusion
For humour-aware information retrieval, we worked on a lightweight method using TF-IDF and logistic regression, which gives acceptable results for identifying and extracting humorous texts. Future work will focus on the integration of larger models to further improve the performance and accuracy of humorous text retrieval.
For humour classification based on genre and technique, we proposed an architecture that combines the advantages of stacking and voting approaches to improve the performance of humorous text classification. By exploiting the strengths of the different classifiers, this approach better accounts for the diversity of humorous texts, genres and techniques, resulting in more accurate and robust classification.
For the translation of puns from English into French (JOKER Task 3), we used the MarianMT model combined with fine-tuning, which offers state-of-the-art machine translation performance thanks to the Transformer architecture while minimising the time and resources required. It is easy to use and can be adapted to different languages. Although very powerful, the model can nevertheless encounter difficulties with very subtle nuances or complex cultural references.

References
[1] L. Ermakova, T. Miller, A. Bosser, V. M. Palma-Preciado, G. Sidorov, A. Jatowt, Overview of CLEF 2024 JOKER track on automatic humor analysis, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. M. Di Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Springer, 2024.
[2] scikit-learn developers, StackingRegressor — scikit-learn 1.5.0 documentation, 2024. URL: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingRegressor.html.
[3] scikit-learn developers, VotingClassifier — scikit-learn documentation, 2024. URL: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html.
[4] M. Junczys-Dowmunt, R. Grundkiewicz, T. Dwojak, H. Hoang, K. Heafield, T. Neckermann, F. Seide, U. Germann, A. F. Aji, N. Bogoychev, A. F. T. Martins, A. Birch, Marian: Fast neural machine translation in C++, 2018. arXiv:1804.00344.
[5] MarianMT — transformers 3.5.0 documentation. URL: https://huggingface.co/transformers/v3.5.1/model_doc/marian.html.