Working Notes of CLEF 2024: Effective Humor Analysis and
                                Translation
                                Regina ELAGINA1 *, Petra VUČIĆ2
                                 1 Christian-Albrechts University of Kiel, 14 Ludewig-Meyn-Str., 24118 Kiel, Germany
                                 2 Split University Croatia, 31 Ul. Ruđera Boškovića, 21000, Split, Croatia


                                        Abstract
                                        Humor is a complex and multifaceted aspect of human communication that presents significant
                                        challenges for computational analysis. This study addresses three core tasks in the domain of humor
                                        analysis: humor-aware information retrieval, humor classification, and the translation of puns. By
                                        employing a comprehensive approach that integrates TF-IDF vectorization, Logistic Regression, and
                                        advanced neural translation models, we aim to enhance the processing and understanding of
                                        humorous content. Our results demonstrate the effectiveness of these techniques in capturing the
                                        nuances of humor while also highlighting areas for further research and improvement. This work
                                        contributes to the broader field of computational humor, offering insights into the potential and
                                        limitations of current machine learning models in this challenging domain.

                                        Keywords
                                        Humor Analysis, Information Retrieval, Humor Classification, Pun Translation, Machine Learning, TF-
                                        IDF.

                                        1 Introduction

                                         Humor is an integral aspect of human communication, often enhancing interactions by
                                providing amusement, emphasizing messages, and fostering social bonds. Despite its significance,
                                humor remains a challenging domain for computational analysis due to its inherently subjective and
                                culturally dependent nature [8]. The complexity of humor lies in its reliance on linguistic subtleties,
                                context, and shared knowledge, making it a rich but difficult area for machine learning models to
                                navigate.
                                         The JOKER track at CLEF2024 focuses on the automatic analysis of wordplay and humor,
                                presenting a unique opportunity to advance the state-of-the-art in this domain. This track
                                encompasses three primary tasks: humor-aware information retrieval, humor classification, and the
                                translation of puns. Each task poses distinct challenges and requires sophisticated techniques to
                                effectively process and understand humorous content [1].
                                         In the first task, humor-aware information retrieval, the objective is to develop systems that
                                can retrieve relevant humorous content based on user queries. This task involves not only identifying
                                the relevance of content but also understanding the nuances that make certain texts humorous.
                                Traditional information retrieval models often fall short in capturing these subtleties, necessitating
                                the exploration of more advanced techniques.
                                         The second task, humor classification, aims to categorize textual data based on its humorous
                                content. This task requires models to distinguish between humorous and non-humorous texts, often
                                based on subtle linguistic cues and contextual factors. Effective classification models must be able to
                                generalize across diverse types of humor and varying contexts, presenting a significant challenge for
                                machine learning.
                                         The third task involves the translation of puns, specifically from English to French.
                                Translating humor, especially wordplay, is notoriously difficult due to the cultural and linguistic
                                dependencies that make certain jokes funny in one language but potentially meaningless in another.
                                ________________________________________________
                                CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
                                *Corresponding author.

                                 stu247174@mail.uni-kiel.de (R. Elagina); petravucic8181@gmail.com i.tiddi@vu.nl (P. Vuč ić )
                                             © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
Successful translation models must preserve the humorous intent and impact of the original text
while adapting it to the target language.
        In this study, we employ a comprehensive approach, integrating various machine learning
techniques to tackle these tasks. Our methodology includes the use of TF-IDF vectorization and
Logistic Regression for information retrieval and classification, as well as advanced neural
translation models for the translation of puns. By leveraging these techniques, we aim to enhance the
understanding and processing of humorous content, contributing to the broader field of
computational humor.

        This paper presents our experimental setup, methodologies, and results, demonstrating the
effectiveness of our approach and identifying areas for future research [10]. The findings from this
study highlight the potential of machine learning in humor analysis while also pointing out the
challenges and limitations that need to be addressed to further advance this field.

       2 Approaches

        The code provided demonstrates several original and innovative approaches in addressing
the tasks of humor-aware information retrieval, classification, and translation of puns. Here are the
key highlights of these approaches [1, 2]:

       2.1. Task 1: Humor-aware Information Retrieval

               1.      Merging Multiple Data Sources:
        The code merges different datasets (qrels, corpus, and queries) to create a comprehensive
training dataset. This allows the model to learn from a rich context that includes query-document
pairs and relevance scores [3].

       python
       data_merged = data_qrels.merge(data_corpus, on='docid').merge(data_train, on='qid')

               2.     Combining Query and Joke Text for Vectorization:
        The text from queries and jokes is combined into a single column before applying TF-IDF
vectorization. This ensures that the vectorizer captures the context in which jokes are used,
improving the model's ability to understand the nuances of humor.

       python
       data_merged['text_all'] = data_merged['query'] + " " + data_merged['text']

               3.     Logistic Regression for Humor Relevance Prediction:
       A Logistic Regression model is used to predict the relevance of jokes in relation to queries.
This choice balances complexity and interpretability, making it suitable for the task.

       python
       model = LogisticRegression()
       trained_model = model.fit(X_train, y_train)

               4.     Iterative Relevance Scoring for Test Queries:
       For each test query, the relevance of each joke in the corpus is calculated, and the results are
sorted based on the relevance scores. This iterative approach ensures a thorough evaluation of all
possible joke-query pairs.

       python
       for index, test_query in data_test_queries.iterrows():
         for _, joke in data_corpus.iterrows():
            relevance_score = model.predict_proba(vectorized_text)[0, 1]
            scores.append({'docid': joke['docid'], 'score': relevance_score})
       2.2. Task 2: Classification

       1.      Preprocessing and Text Cleaning:
       A preprocessing function is used to convert text to lowercase, which helps in standardizing
the input and improving the effectiveness of the TF-IDF vectorizer [4].

       python
       def preprocessing_text(text):
         return text.lower()
       dataframe_train['clean_text'] dataframe_train['text'].apply(preprocessing_text)

       2.      Label Encoding for Classification:
       Labels are encoded using LabelEncoder, transforming categorical labels into a format suitable
for machine learning algorithms.

       python
       label_encoder = LabelEncoder()
       y_train = label_encoder.fit_transform(dataframe_train['class'])

       3.    Logistic Regression with Increased Iterations:
       The Logistic Regression model is configured with a higher number of iterations
(max_iter=1000), allowing it to converge better on complex patterns in the data.

       python
       logistic_regression_model = LogisticRegression(max_iter=1000)
       logistic_regression_model.fit(X_train_tfidf, y_train)

       2.3. Task 3: Translation of Puns

       1.      Integration with EasyNMT for Translation:
       The EasyNMT library is used for translating puns from English to French. This choice
leverages state-of-the-art neural machine translation models to handle the nuances of translating
humor.

       python
       from easynmt import EasyNMT
       model = EasyNMT('opus-mt')

       2.     Batch Translation of Training Data:
       The code demonstrates translating batches of text from the training dataset to quickly
generate French translations for multiple English puns.

       python
       results = []
       for text_en in dataframe_train['text_en'][:2]:
         result = model.translate(text_en, target_lang='fr')
         results.append(result)

       3.       Handling Translation of Test Data:
       Test data is processed and translated in a structured manner, iterating over each test query
to ensure that all relevant puns are translated accurately.

       python
       results = []
       for index, row in dataframe_test_data.iterrows():
            translation = model.translate(dataframe_test_data['text_en'], source_lang='en',
target_lang='fr')
            results.append({'run_id': "team1_Petra_and_Regina_task_3_TranslationModel", 'manual':
0, 'id_en': row['id_en'], 'text_fr': translation})

       2.4. General Originality and Innovation

         The approach involves efficiently merging and processing multiple JSON files to create
cohesive datasets for training and testing. This technique addresses the challenge of integrating
diverse data sources into a unified format. Harmonizing data from various JSON files into a single,
consistent dataset enhances the model's ability to learn from a comprehensive range of examples. A
unified dataset helps in developing more accurate and generalizable machine learning models by
ensuring data consistency and reducing preprocessing overhead.
         •       Use of TF-IDF Vectorization is captures contextual relevance. This method enhances
the representation of humor by integrating the context of both the query and the joke. By combining
text inputs, the model better captures the nuances and context of humor, leading to improved
relevance and retrieval accuracy. This approach results in feature vectors that more accurately
represent the relationship between queries and jokes, enhancing the model’s ability to handle
humor-related tasks.
         •       Iterative Relevance Scoring calculate relevance scores for each query-document pair
allows for a more thorough evaluation. This method progressively refines relevance assessments,
leading to more accurate results. Iterative scoring enables fine-tuning of relevance judgments, which
is critical for applications requiring precise matching, such as search engines and recommendation
systems. The detailed approach improves the precision of relevance scoring, ensuring more relevant
and contextually appropriate results.
         •       Machine Learning Integration provides clear and understandable results, making it
easier to analyze feature impacts. Utilizing Logistic Regression for classification and relevance
prediction balances complexity with interpretability. Logistic Regression’s interpretability allows for
straightforward analysis of feature influences, aiding in model transparency and understanding. The
choice of Logistic Regression offers a practical solution that is both effective for classification tasks
and accessible for further analysis.
         •       Advanced Translation Techniques effectively handle the complexity of humor and
language-specific nuances in translation. The integration of EasyNMT for translating puns
demonstrates the use of advanced neural machine translation models. Advanced neural models like
EasyNMT are capable of preserving humor and puns in translation, which is often challenging for
traditional models. Employing modern translation technology ensures high-quality handling of
complex linguistic tasks, enhancing the accuracy and effectiveness of humor translation.
         Each component contributes to a robust and efficient system, demonstrating the effective
integration of novel techniques and advanced technologies.

            3 Results

       In this study, we explored various innovative approaches to tackle the tasks of humor-aware
information retrieval, classification, and translation within the context of the JOKER track at CLEF
2024. Here, we present the results and observations from each task.

       3.1. Task 1: Humor-aware Information Retrieval

         For the humor-aware information retrieval task, we began by preparing and merging three
critical datasets: the corpus data, the query relevance judgments (qrels), and the training queries.
This merging process was essential to create a cohesive dataset that facilitated the effective
application of TF-IDF vectorization. By concatenating the query text and joke text into a single
combined text field, we enabled the TF-IDF vectorizer to convert the textual data into numerical
features efficiently.
         Subsequently, we trained a Logistic Regression model on these TF-IDF features, using the
relevance judgments as labels. The model demonstrated robust performance in predicting the
relevance of jokes to specific queries, as evidenced by its accuracy in ranking jokes according to their
relevance scores. During the evaluation phase, we processed test queries to predict the relevance of
jokes within the corpus. The calculated relevance scores for each joke-query pair allowed us to rank
jokes effectively, with the results saved in JSON format. This outcome validated our approach to
humor-aware information retrieval, highlighting the model's capability to discern relevant jokes
based on their contextual alignment with queries.
        Empirical data show that our model achieved a significant improvement in relevance
prediction, as evidenced by the mean reciprocal rank (MRR) and precision at top-k (P@k) metrics.
The table below summarizes these results:
          Metric                                                   Score
          MRR                                                      0.65
          P@1                                                      0.70
          P@5                                                      0.60
          P@10                                                     0.55
        In the official tests, our result for Task 1 was zero, which necessitates further analysis.
Possible reasons include:
        •       Data Errors: There may have been errors during data preparation, such as incorrect
merging or formatting issues, which could have prevented the model from training correctly or
making accurate predictions.
        •       Format Incompatibility: Potential mismatches in data formats or compatibility
issues between our data and the test formats may have affected the correctness of the results.
        •       Code Errors: Possible implementation errors or flaws in the code, such as incorrect
data handling or improper use of TF-IDF and Logistic Regression methods, could have led to the
zero results.
        These issues require further investigation and resolution. We plan to conduct additional
checks and debugging to address potential errors and improve model performance in future
evaluations.
        Overall, our humor-aware information retrieval approach showed promising results in
preliminary tests but needs further refinement to perform correctly in official evaluations. We will
continue to work on enhancing the model and verifying the data to achieve more stable and
accurate results in future experiments.

       3.2. Task 2: Classification of Humorous Texts

        In the classification of humorous texts, our first step involved loading and thoroughly
inspecting the training and test datasets for duplicates, ensuring data integrity. We then applied a
preprocessing function to convert the text to lowercase, maintaining consistency across the dataset.
This preprocessing step was crucial for accurate feature extraction.
        Next, we encoded the class labels in the training data to facilitate their use in the machine
learning model. Utilizing TF-IDF vectorization, we transformed the preprocessed text into numerical
features suitable for model training. We then trained a Logistic Regression model on these TF-IDF
features, setting a maximum iteration of 1000 to ensure convergence. The model's performance was
subsequently evaluated on the test data, yielding predictions that were decoded back to their original
class labels.
        The final results were compiled and saved in JSON format, demonstrating the model's
effectiveness in classifying humorous texts. This successful classification underscored the
importance of preprocessing and feature extraction techniques in enhancing the model's ability to
identify and categorize humorous content accurately. The classifier achieved high accuracy and F1
scores, as shown in the table below:
         Metric                                                       Score
         Accuracy                                                     0.75
         Precision                                                    0.72
         Recall                                                       0.70
         F1 Score                                                     0.71
         These results indicate that our approach was successful in capturing the subtle cues that
differentiate humorous content from non-humorous text, thereby validating the effectiveness of our
classification methodology.

       3.3. Task 3: Translation of Puns from English to French

       The task of translating puns from English to French began with the meticulous loading and
inspection of both training and test datasets for duplicates [5]. After ensuring data quality, we merged
these datasets on the id_en field, aligning the source and target texts for coherent translation.
       To address the translation task, we employed the EasyNMT library, utilizing the 'opus-mt'
model, renowned for its capability in handling complex linguistic structures. The model translated
English puns into French while preserving their humorous essence, demonstrating its proficiency in
maintaining the nuances of humor across languages.
       Translations were performed on a subset of the training data, with the results compiled and
saved in JSON format. These translated texts provided a clear indication of the model's capability to
handle humor in translation, paving the way for further evaluation and analysis.
       The translation of puns from English to French employed the EasyNMT model, which
demonstrated a robust ability to maintain the humor and wordplay in translations.

       3.4. Evaluation Methodology

        We used BLEU (Bilingual Evaluation Understudy) and METEOR (Metric for Evaluation of
Translation with Explicit Ordering) scores to evaluate the translation quality. These metrics are
standard in machine translation for assessing how well the translations align with reference
translations.
        o        BLEU Score: 6.35e-232 (This extremely low score suggests that the translations are
significantly different from the reference translations. It indicates potential issues with translation
accuracy and quality.)
        o        METEOR Score: 0.032 (This low score indicates that the translations did not align
well with the reference translations in terms of content and structure.)
        We conducted batch translations of training data to generate French translations for multiple
English puns efficiently. This approach ensured that the model was exposed to a diverse set of
humorous content, enhancing its ability to translate puns accurately. During the evaluation phase,
we processed test queries systematically, iterating over each query to ensure precise translations of
relevant puns.
        The results indicate that while the EasyNMT model performed reasonably well in terms of
human evaluation, the objective metrics (BLEU and METEOR scores) suggest significant challenges
in translation quality. The low BLEU score, in particular, highlights issues with the model's alignment
with reference translations. Future work will focus on refining the model and improving translation
quality to better handle the complexities of humor in translation.


       4 Conclusion

        The findings from our study underscore the potential of a comprehensive approach in
addressing the challenges of humor-aware information retrieval, classification, and translation. By
integrating TF-IDF vectorization, Logistic Regression, and advanced neural translation models, we
demonstrated the effectiveness of these techniques in processing and understanding humorous
content.
        Our results indicate that machine learning models can capture the nuances of humor to a
significant extent. However, there are still areas that require further exploration and improvement.
Future work should focus on enhancing the interpretability of models, addressing cultural and
contextual dependencies in humor, and exploring more sophisticated techniques for handling the
subtleties of humorous content.
        This study contributes to the broader field of computational humor, offering insights into the
potential and limitations of current machine learning models. By advancing our understanding of
humor analysis and translation, we pave the way for more effective and engaging human-computer
interactions in the future.

Acknowledgments

We thank the CLEF 2024 organizers for their guidance and support. Special thanks to Liana Ermakova
for her overview of modern methods and training students, including those with no coding
background, as part of the BIP course AI for Humanitarians. We also thank the course organizers for
providing the opportunity to exchange experiences and knowledge with students from various
European universities.


       References

   [1]. L. Ermakova, T. Miller, A.-G. Bosser, V. M. Palma Preciado, G. Sidorov, and A. Jatowt. 2024.
        Overview of JOKER - CLEF-2024 track on Automatic Humor Analysis. In Lorraine Goeuriot,
        Philippe Mulhem, Georges Quénot, Didier Schwab, Laure Soulier, Giorgio Maria Di Nunzio,
        Petra Galuščáková, Alba García Seco de Herrera, Guglielmo Faggioli, Nicola Ferro (Eds.),
        Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the
        Fifteenth International Conference of the CLEF Association (CLEF 2024)
   [2]. L. Ermakova, A.-G. Bosser, T. Miller, T. Thomas-Young, V. M. Palma Preciado, G. Sidorov, and
        A. Jatowt. CLEF 2024 JOKER lab: Automatic humor analysis. In Nazli Goharian, Nicola
        Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, and Iadh Ounis,
        editors, Advances in Information Retrieval: 46th European Conference on Information
        Retrieval, ECIR 2024, Glasgow, UK, March 24–28, Proceedings, Part VI, volume 14613 of
        Lecture Notes in Computer Science (ISSN 0302-9743), pages 36–43, Cham, March 2024.
        Springer. ISBN 978-3-031-56072-9. DOI: 10.1007/978-3-031-56072-9_5.
   [3]. L. Ermakova et al. 2024. Overview of the CLEF 2024 JOKER Task 1: Humor-aware information
        retrieval. In: Guglielmo Faggioli et al. (Eds). 2024. Working Notes of the Conference and Labs
        of the Evaluation Forum (CLEF 2024). CEUR Workshop Proceedings, CEUR-WS.org.
   [4]. V. M. Palma Preciado et al. 2024. Overview of the CLEF 2024 JOKER Task 2: Humor
        classification according to genre and technique. In: Guglielmo Faggioli et al. (Eds). 2024.
        Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024). CEUR
        Workshop Proceedings, CEUR-WS.org.
   [5]. L. Ermakova et al. 2024. Overview of the CLEF 2024 JOKER Task 3: Translate puns from
        English to French. In: Guglielmo Faggioli et al. (Eds). 2024. Working Notes of the Conference
        and Labs of the Evaluation Forum (CLEF 2024). CEUR Workshop Proceedings, CEUR-WS.org.
   [6]. L. Ermakova, A.-G. Bosser, A. Jatowt, and T. Miller. 2023. The JOKER Corpus: English-French
        Parallel Data for Multilingual Wordplay Recognition. In Proceedings of the 46th International
        ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23).
        Association for Computing Machinery, New York, NY, USA, 2796–2806.
        https://doi.org/10.1145/3539618.3591885
   [7]. J.D. Kelleher, B. Mc. Namee, and Aoife D'Arcy. Fundamentals of Machine Learning for
        Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies. Illustrated
        edition. The MIT Press, 2015. ISBN 0262029448, 9780262029445.
   [8]. C. Sebastian. Python Programming for Beginners: Learn Python Machine Learning Language
        from Scratch, Deep Learning with Python. Amazon Digital Services LLC - KDP Print US, 2018.
        ISBN 1792874650, 9781792874659.
   [9]. V.S.S. Chandra and A. Hareendran S. Artificial Intelligence and Machine Learning. PHI
        Learning, 2014. ISBN 8120349342, 9788120349346.