<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>University of Amsterdam at the CLEF 2025 JOKER Track</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alecsandru Kreeft-Libiu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Finley Helms</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cem Selçuk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Bakker</string-name>
          <email>j.bakker@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaap Kamps</string-name>
          <email>kamps@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper reports on the University of Amsterdam's participation in the CLEF 2025 JOKER track. Our overall goal is to investigate the non-literal use of language, such as in humor and wordplay, that remains challenging for current information retrieval and natural language processing technologies. Our specific focus is on a post hoc approach where we exploit wordplay or humorous text detection as a means to filter out humorous translations or search results. Our main findings are the following. First, we successfully developed effective humor detection classifiers for both English and French. Second, for humor-aware information retrieval, we could increase retrieval effectiveness by filtering for humorous content in search results ranked solely on topical relevance. Third, for wordplay translation, we generate multiple translation candidates and select the one with the highest pun score based on the detector. This approach performs well, with limited gain on the automatic evaluation measures, but qualitative analysis confirms that this is an encouraging strategy.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Storage and Retrieval</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Wordplay Translation</kwd>
        <kwd>Humor Retrieval</kwd>
        <kwd>Funny</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This paper is structured as follows. Section 2 details our experimental setup and the specific runs submitted. Section 3 discusses the results of our runs. We end in Section 4 by discussing our results and outlining the lessons learned.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Experimental Setup</title>
      <p>In this section, we will detail our approach for the three CLEF 2025 JOKER track tasks.</p>
      <p>
        For details of the exact task setup and results, we refer the reader to the detailed overview of the
track in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The basic ingredients of the track are:
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Overview of our submitted runs for Task 1 and Task 2.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Task</th>
              <th>Run</th>
              <th>Description</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>UAms_en_bm25</td><td>BM25 baseline (Anserini, stemming)</td></tr>
            <tr><td>1</td><td>UAms_en_rm3</td><td>RM3 baseline (Anserini, stemming)</td></tr>
            <tr><td>1</td><td>UAms_en_bm25_CE1K</td><td>BM25 + Crossencoder top 1,000</td></tr>
            <tr><td>1</td><td>UAms_en_rm3_CE1K</td><td>BM25/RM3 + Crossencoder top 1,000</td></tr>
            <tr><td>1</td><td>UAms_RM3</td><td>Okapi BM25/RM3</td></tr>
            <tr><td>1</td><td>UAms_RM3RoBERTa</td><td>BM25/RM3 + Filter on RoBERTa Pun classifier (keeps 90%)</td></tr>
            <tr><td>1</td><td>UAms_RM3RoBERTa_drop60</td><td>BM25/RM3 + Filter on RoBERTa Pun classifier (keeps 40%)</td></tr>
            <tr><td>1</td><td>UAms_pt_bm25</td><td>BM25 baseline (Anserini, stemming)</td></tr>
            <tr><td>1</td><td>UAms_pt_rm3</td><td>RM3 baseline (Anserini, stemming)</td></tr>
            <tr><td>1</td><td>UAms_pt_bm25_CE1K</td><td>BM25 + Crossencoder top 1,000</td></tr>
            <tr><td>1</td><td>UAms_pt_rm3_CE1K</td><td>BM25/RM3 + Crossencoder top 1,000</td></tr>
            <tr><td>2</td><td>UvA_finetunedmBARTcc25</td><td>mBART-large-cc25 Finetuned</td></tr>
            <tr><td>2</td><td>UvA_mBARTcc25&amp;finetunedroBERTa</td><td>mBART-large-cc25 + humour detector roBERTa-large filter</td></tr>
            <tr><td>2</td><td>UvA_finetunedT5-base</td><td>T5-base Finetuned</td></tr>
            <tr><td>2</td><td>UvA_T5-base&amp;finetunedroBERTa</td><td>T5-base + humour detector roBERTa-large filter</td></tr>
            <tr><td>2</td><td>UvA_finetunedNLLB-1.3B</td><td>NLLB-200-1.3B Finetuned</td></tr>
            <tr><td>2</td><td>UvA_finetunedNLLB-1.3B&amp;finetunedroBERTa</td><td>NLLB-200-1.3B + humour detector roBERTa-large filter</td></tr>
            <tr><td>2</td><td>UvA_finetunedMarianMT</td><td>MarianMT Finetuned</td></tr>
            <tr><td>2</td><td>UvA_finetunedMarianMT&amp;finetunedroBERTa</td><td>MarianMT + humour detector roBERTa-large filter</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-2-22">
        <p>Corpus For Task 1, there is a large corpus of 77,658 documents (usually a single sentence each) for the
retrieval task in English. There is an additional Portuguese corpus of 45,126 documents (including
a small fraction of AI-generated humorous texts).</p>
        <p>Train Data For Task 1, there are 12 English train queries with relevance judgments (between 4 and 364
relevant per query). There are 29 Portuguese train queries with relevance judgments (between 2
and 40 relevant per query).</p>
        <p>For Task 2, there are 1,405 English wordplays, with a total of 5,838 professional human translations
in French.</p>
        <p>For Task 3, there are 353 English onomastic wordplays, or informally “funny” names, with
professional human translations in French.</p>
        <p>Test Data For Task 1, there are 219 English test queries. These include the 12 train queries, resulting
in a total of 207 unseen queries on which the test evaluation is based. For these unseen queries,
there are between 1 and 233 relevant documents. There are also 98 Portuguese test queries, not
including any of the train queries, with between 1 and 79 relevant documents.</p>
        <p>For Task 2, there are 1,682 English wordplays, with 2,615 reference translations into French by
professional translators.</p>
        <p>For Task 3, there are 2,333 English onomastic wordplays, with French reference translations made by professional translators.</p>
        <p>We created runs for two of the tasks of the 2025 track, which we will discuss in order.</p>
        <p>2.1. Task 1: Humor-aware Information Retrieval</p>
        <p>This task asks to retrieve short humorous texts for a query. We submitted twelve runs in total, shown in Table 1.</p>
        <p>Baseline Rankers We first submitted four baseline runs focusing on regular information retrieval effectiveness. Two are vanilla baseline runs on an Anserini index, using either BM25 or BM25+RM3 with default settings [8].1 The other two runs are neural cross-encoder rerankings of these runs, based on zero-shot application of an MSMARCO-trained ranker, reranking the top 1,000 of either the BM25 or the BM25+RM3 baseline run.2 We submitted four runs for both the English and the Portuguese data. To understand the effectiveness of standard retrieval systems optimized for topical relevance, our submissions used default, non-optimized settings and privileged recall over precision.</p>
        <p>RoBERTa All three submitted runs were based on Okapi BM25 as the retrieval model, combined with RM3 relevance feedback. The first run, UAms_RM3, applied no filtering and served as a BM25+RM3 baseline. The second run, UAms_RM3RoBERTa, added a filtering step using a pre-trained RoBERTa-based pun classifier.3 Documents with a predicted pun probability above a tuned threshold were retained, resulting in approximately 90% of documents being kept. The third run, UAms_RM3RoBERTa_drop60, used the same classifier, but instead of threshold tuning, it applied a fixed-ratio filter that retained the top 40% of documents with the highest predicted pun probabilities.</p>
        <p>2.2. Task 2: Wordplay Translation</p>
        <p>This task asks to translate English punning jokes into French. We submitted eight runs, as shown in Table 1.</p>
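        <p>Returning briefly to Task 1: the fixed-ratio pun filtering used in the UAms_RM3RoBERTa_drop60 run can be sketched as below. This is a minimal plain-Python sketch; the function name and the toy probabilities are ours, and in the actual run the probabilities come from the RoBERTa pun classifier.</p>
        <preformat>
```python
def fixed_ratio_filter(ranked_docs, pun_probs, keep_ratio=0.4):
    # Keep the top keep_ratio fraction of documents by predicted pun
    # probability, preserving the original retrieval (BM25+RM3) order.
    n_keep = max(1, int(len(ranked_docs) * keep_ratio))
    by_prob = sorted(ranked_docs, key=lambda d: pun_probs[d], reverse=True)
    kept = set(by_prob[:n_keep])
    return [d for d in ranked_docs if d in kept]

# Hypothetical ranking and classifier probabilities.
ranking = ["d1", "d2", "d3", "d4", "d5"]
probs = {"d1": 0.1, "d2": 0.9, "d3": 0.4, "d4": 0.8, "d5": 0.2}
print(fixed_ratio_filter(ranking, probs))  # ['d2', 'd4']
```
        </preformat>
        <p>The threshold-tuned UAms_RM3RoBERTa run differs only in how the kept set is chosen: by a fixed probability cutoff rather than a fixed fraction.</p>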
        <p>MarianMT MarianMT is a sequence-to-sequence (Seq2Seq) model based on the Marian framework. Marian, first introduced in 2017, is written entirely in C++, which supports fast training and translation. MarianMT provides pre-trained models that are smaller than most other translation models, about 298 MB on disk, compared to other transformer-based translation models that exceed 1 GB. The size of MarianMT makes the model useful for fine-tuning on custom datasets for specific tasks. For the translation task, the MarianMTModel and MarianTokenizer were loaded from the transformers library, using the model checkpoint “Helsinki-NLP/opus-mt-en-fr”. Before training, the data was preprocessed by merging the input English qrels files on “id_en” to create a single CSV file. The columns were renamed to “English” and “French”; no prefix was needed. The preprocessed data was divided with train_test_split into a 90/10 split of training and test data, after which the training data was further divided into training and validation sets using an 80/20 split.</p>
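        <p>The two-stage splitting described above can be sketched as follows, without the train_test_split dependency; make_splits is a hypothetical helper name of ours, and the sentence pairs are illustrative.</p>
        <preformat>
```python
import random

def make_splits(pairs, seed=42):
    # 90/10 split into train and test, then the train portion is
    # further split 80/20 into train and validation, as described.
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(0.10 * len(shuffled)))
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = max(1, round(0.20 * len(rest)))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# Hypothetical (English, French) sentence pairs.
pairs = [("en sentence %d" % i, "fr sentence %d" % i) for i in range(100)]
train, val, test = make_splits(pairs)
print(len(train), len(val), len(test))  # 72 18 10
```
        </preformat>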
        <p>T5-base A T5 model can be used for different NLP tasks. It is suitable for machine translation due to its ability to understand natural language and generate contextually relevant output. The ‘T5ForConditionalGeneration’ class and the model name ‘t5-base’ were used to load the T5-base model for English-to-French translation. The preprocessing of the data was done similarly to the preprocessing for the MarianMT model. The split of the test, validation and train sets was also done in the same manner. The ‘T5Tokenizer’ was used to tokenize the data before training. Training was done with the same number of epochs, batch size and evaluation metric as used for MarianMT.</p>
        <p>NLLB-200-1.3B The NLLB-200-1.3B model (No Language Left Behind), developed by Meta AI, is a state-of-the-art multilingual translation model supporting 200 languages. It uses an optimized Transformer-based encoder-decoder architecture to produce high-quality translations, especially for low-resource languages. For the translation task, the model “facebook/nllb-200-1.3B” was loaded from the transformers library, along with the corresponding NllbTokenizer. Since NLLB requires explicit language codes, the input sentence was prefixed with the target language ID, in this case fra for French.</p>
        <p>1: https://github.com/castorini/pyserini. 2: https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2. 3: https://huggingface.co/frostymelonade/roberta-small-pun-detector-v2.</p>
        <p>mBART-large-cc25 The mBART (Multilingual BART) model, introduced by Facebook AI, is an encoder-decoder model that combines multilingual pretraining and fine-tuning. The mbart-large-cc25 variant is pretrained on 25 languages using a denoising autoencoding objective, allowing it to generalize well for translation tasks. For this translation task, the model facebook/mbart-large-cc25 was used from the transformers library, along with the MBartTokenizer. Like NLLB, mBART requires explicit language codes. For English-to-French translation, en_XX was used as the source language code and fr_XX as the target language code. These tokens were added to the input during encoding and decoding respectively.</p>
        <p>2.2.1. Humour Detection &amp; Translation</p>
        <p>Building on the foundation laid by the UvA’s participation in the 2024 CLEF JOKER Track [9], we note that while literal translations can often be handled effectively by LLMs with sufficient training, the translation of puns and other forms of wordplay presents a considerably greater challenge due to their inherently ambiguous and context-dependent nature. To explore new possibilities, we employed the same RoBERTa-based model as described in the previous section and fine-tuned it on the binary pun detection task presented in the 2023 CLEF JOKER Track [10]. This task focuses on distinguishing sentences that contain puns from those that do not, providing a useful benchmark for evaluating wordplay sensitivity in NLP systems. For detailed information regarding the dataset and task formulation, we refer the reader to the official workshop notes of the Pun Detection Task [10].</p>
        <p>Subsequently, we integrate this pun detector with each of the MT models under investigation. This setup allows us to systematically assess how pun-aware post-processing could influence the performance and behavior of MT systems. Following the methodology of the CLEF 2024 track approach [9], we generate three candidate translations for each pun-containing sentence using beam search in MarianMT. Each of these translations is then evaluated using our fine-tuned pun classifier, which assigns both binary class labels (“pun” or “non-pun”) and associated class probabilities. The candidate translation with the highest pun probability score is selected as the final output.</p>
        <p>2.3. Task 3: Onomastic Wordplay Translation</p>
        <p>This task asks to translate funny names from English into French. We made no official submissions for this task other than trial runs with the Task 2 approach applied to the names and the descriptions of the onomastic wordplay.</p>
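        <p>The generate-and-select strategy for Task 2 can be sketched as follows. The toy generator and scorer below are stand-ins of ours for beam search in the MT model (three returned sequences) and for the pun-class probability of the fine-tuned RoBERTa classifier; only the selection logic is meant literally.</p>
        <preformat>
```python
def pick_best_translation(sentence, generate, score_pun, k=3):
    # Generate k candidate translations and keep the one the pun
    # classifier scores highest.
    candidates = generate(sentence, k)
    return max(candidates, key=score_pun)

# Toy stand-ins for the MT model and the pun classifier.
def toy_generate(sentence, k):
    return ["%s (candidate %d)" % (sentence, i) for i in range(k)]

def toy_score(candidate):
    return 0.9 if "candidate 2" in candidate else 0.1

print(pick_best_translation("a pun", toy_generate, toy_score))
# a pun (candidate 2)
```
        </preformat>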
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <p>In this section, we will present the results of our experiments in self-contained subsections following the CLEF 2025 JOKER Track tasks.</p>
      <p>3.1. Task 1: Humor-aware Information Retrieval</p>
      <p>We discuss our results for Task 1, asking to retrieve short humorous texts for a query.</p>
      <p>Table 2 shows the performance of the Task 1 submissions on the test data. None of our approaches was trained or informed by the train data, nor was pooling used to locate relevant documents (the corpus’s recall base is known to be complete). As a consequence, the results on the train and test data are comparable. First, we observe that standard lexical ranking approaches perform reasonably, albeit not very well, and do not yield a significant performance gain when using blind feedback. In particular, for Portuguese, the scores are low, possibly also due to the synthetic nature of the corpus construction. We also see that the neural rankers may attract topically relevant but non-humorous content. As a result, zero-shot rankers lead to a decrease in performance due to the large fraction of non-relevant documents. Second, the filter based on a pun classifier is effective, leading to a notable increase in both precision and recall (MRR, NDCG, and MAP). We observe that the more aggressive filter, which retains 40% of results, outperforms the moderate filter, which retains 90% of results.</p>
      <sec id="sec-3-10">
        <title>3.2. Task 2: Wordplay Translation</title>
        <p>We continue with Task 2, asking to translate English punning jokes into French. We submitted eight runs, using four stand-alone finetuned models and four stand-alone models combined with a pun detector.</p>
        <p>Official Evaluation Results Table 3 shows the results of the CLEF 2025 JOKER track’s Task 2 on the test data. We make a number of observations.</p>
        <p>First, the general translation quality varies across systems, with BLEU scores ranging from 16.55% to 42.55% and BERTScore F1 ranging from 79.59% to 87.42%. These results demonstrate that some models are capable of generating fluent and accurate translations, although preserving the pun remains a challenge. Second, we observe that the inclusion of the pun detector has a small effect on BERTScore F1, slightly improving results for mBARTcc25: its combination with the fine-tuned RoBERTa achieves a BERTScore F1 of 80.00%, compared to 79.59% without the detector. This trend, however, does not hold for all models, and performance seems to decrease slightly on these metrics. Third, examining the location of the pun word and whether the output exactly contains the pun word from the reference translation, we observe an increasing trend, albeit with a small gain and a draw for the best-performing MarianMT model.</p>
        <p>Automatic evaluation metrics assess the entire translated sentence as a whole. They are a necessary but not sufficient condition for successful pun translation. The ground truth consists of professional translations that preserve the wordplay across languages. Therefore, while the results are acceptable, they also underline the importance of further qualitative analysis of the translations.</p>
        <p>Qualitative Evaluation Table 4 shows an example from the train data set. The top half of the table shows the English pun and the six French translations made by professional translators. The bottom half of the table shows the translations generated by our systems. We make a number of observations:</p>
        <p>First, most model outputs match the style and wordplay of the professional references quite well, especially when the pun detector is applied. Every model takes something from the original; what changes is the way the verb “Sauve” is conjugated or the way Tom expresses himself. Notably, the UAms_Task2_NLLB-200-1.3B_roBERTa-large_filter run chose Toto as the translation of Tom, even though it appears only once in the references. Toto jokes are a common type of joke in French,4 similar to Tom (also Tom Swifty) jokes in English.5</p>
        <p>Second, other outputs (e.g., from T5-base) are fluent and grammatically correct, but they fail to
preserve the pun. These outputs often resemble literal translations that miss the wordplay entirely,
which can occur when only the top-ranked translation is used.</p>
        <p>Our analysis revealed both the quality of current machine translation and the complexity of preserving the wordplay in a literally correct translation. We also observed that the models are able to generate creative translations preserving the wordplay, but that the most likely translation or the first one generated by the model may not be a pun. This observation supports our general idea to generate multiple translations with the model and use an effective pun detector to choose the translation that is most likely to preserve the wordplay.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Conclusions</title>
      <p>This paper detailed the University of Amsterdam’s participation in the CLEF 2025 JOKER track. We conducted a range of experiments for each of the three tasks of the track. Our primary objective was to develop a post hoc approach that leverages wordplay or humorous text detection to filter out humorous translations or search results. Our main findings are the following. First, we successfully developed effective humor detection classifiers for both English and French. Second, for humor-aware information retrieval, we could increase retrieval effectiveness by filtering for humorous content in search results ranked solely on topical relevance. Third, for wordplay translation, we generated multiple translation candidates and selected the one with the highest pun score, as determined by the detector. This approach performed well, with limited gain on the automatic evaluation measures, and qualitative analysis confirmed that this is an encouraging strategy.</p>
      <sec id="sec-4-1">
        <p>4: https://fr.wikipedia.org/wiki/Blague_de_Toto. 5: https://en.wikipedia.org/wiki/Tom_Swifty.</p>
        <p>Our qualitative analysis also demonstrated the importance and potential of models that capture deep cultural references, such as the Wellerism of Tom Swifty jokes that matches the French Blague de Toto wordplay. At the same time, it also highlights the complexity of evaluating output against these references: this particular cultural reference hinges on only a single word in one of the reference translations. The importance of generating high-quality professional data sets from human translators is paramount.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was conducted as part of the final research projects of the Bachelor in Artificial Intelligence at the University of Amsterdam. We thank the coordinator, Dr. Sander van Splunter, for his support and flexibility to work around the CLEF deadlines. We also thank the track and task organizers for their amazing service and effort in making realistic benchmarks available for analyzing and processing humorous text.</p>
      <p>Jan Bakker is partly funded by the Netherlands Organization for Scientific Research (NWO NWA # 1518.22.105).
Jaap Kamps is supported by the Netherlands Organization for Scientific Research (NWO CI # CISC.CC.016, NWO
NWA # 1518.22.105), the University of Amsterdam (AI4FinTech program), and ICAI (AI for Open Government
Lab). Views expressed in this paper are not necessarily shared or endorsed by those funding the research.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly in order to: Grammar and spelling check, and Paraphrase and reword. After using these tools/services, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
      <p>[5] L. Ermakova, A.-G. Bosser, T. Miller, R. Campos, Overview of the CLEF 2025 Joker Task 1: Humour-Aware Information Retrieval, in: [11], 2025.</p>
      <p>[6] L. Ermakova, A.-G. Bosser, T. Miller, R. Campos, Overview of the CLEF 2025 Joker Task 2: Wordplay Translation, in: [11], 2025.</p>
      <p>[7] L. Ermakova, A.-G. Bosser, T. Miller, R. Campos, Overview of the CLEF 2025 Joker Task 3: Onomastic Wordplay Translation, in: [11], 2025.</p>
      <p>[8] J. Lin, X. Ma, S. Lin, J. Yang, R. Pradeep, R. F. Nogueira, Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations, in: F. Diaz, C. Shah, T. Suel, P. Castells, R. Jones, T. Sakai (Eds.), SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, ACM, 2021, pp. 2356-2362. URL: https://doi.org/10.1145/3404835.3463238. doi:10.1145/3404835.3463238.</p>
      <p>[9] E. Schuurman, M. Cazemier, L. Buijs, J. Kamps, University of Amsterdam at the CLEF 2024 JOKER track, in: G. Faggioli, N. Ferro, P. Galuscáková, A. G. S. de Herrera (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024, volume 3740 of CEUR Workshop Proceedings, CEUR-WS.org, 2024, pp. 1909-1922. URL: https://ceur-ws.org/Vol-3740/paper-181.pdf.</p>
      <p>[10] L. Ermakova, T. Miller, A. Bosser, V. M. Palma-Preciado, G. Sidorov, A. Jatowt, Overview of JOKER 2023 automatic wordplay analysis task 1 - pun detection, in: M. Aliannejadi, G. Faggioli, N. Ferro, M. Vlachos (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023), Thessaloniki, Greece, September 18th to 21st, 2023, volume 3497 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp. 1785-1803. URL: https://ceur-ws.org/Vol-3497/paper-149.pdf.</p>
      <p>[11] G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025: Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org, 2025.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] L. Ermakova, A. Bosser, T. Miller, V. M. Palma-Preciado, G. Sidorov, A. Jatowt, Overview of the CLEF 2024 JOKER track - automatic humour analysis, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, G. M. D. Nunzio, L. Soulier, P. Galuscáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction - 15th International Conference of the CLEF Association, CLEF 2024, Grenoble, France, September 9-12, 2024, Proceedings, Part II, volume 14959 of Lecture Notes in Computer Science, Springer, 2024, pp. 165-182. URL: https://doi.org/10.1007/978-3-031-71908-0_8. doi:10.1007/978-3-031-71908-0_8.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] L. Ermakova, A. Bosser, T. Miller, A. Jatowt, Overview of the CLEF 2024 JOKER task 1: Humour-aware information retrieval, in: G. Faggioli, N. Ferro, P. Galuscáková, A. G. S. de Herrera (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024, volume 3740 of CEUR Workshop Proceedings, CEUR-WS.org, 2024, pp. 1775-1785. URL: https://ceur-ws.org/Vol-3740/paper-165.pdf.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] L. Ermakova, A. Bosser, T. Miller, A. Jatowt, Overview of the CLEF 2024 JOKER task 3: Translate puns from English to French, in: G. Faggioli, N. Ferro, P. Galuscáková, A. G. S. de Herrera (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, 9-12 September, 2024, volume 3740 of CEUR Workshop Proceedings, CEUR-WS.org, 2024, pp. 1800-1810. URL: https://ceur-ws.org/Vol-3740/paper-167.pdf.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] L. Ermakova, A.-G. Bosser, T. Miller, R. Campos, Overview of the CLEF 2025 Joker track: Humour in the machine, in: J. Carrillo de Albornoz, J. Gonzalo, L. Plaza, A. García Seco de Herrera, J. Mothe, F. Piroi, P. Rosso, D. Spina, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025), Lecture Notes in Computer Science, Springer, 2025.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>