<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VerbaNexAI at CLEF 2025 JOKER Task 3: Multi-Model LLM Approach for Onomastic Wordplay Translation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maria Paz Ramirez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeison D. Jimenez</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deyson Gómez Sánchez</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jairo E. Serrano</string-name>
          <email>jserrano@utb.edu.co</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan C. Martinez-Santos</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edwin Puertas</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Tecnologica de Bolivar, School of Engineering</institution>
          ,
          <addr-line>Architecture, and Design; Cartagena de Indias 130013</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Our approach achieved first place in the CLEF 2025 JOKER Task 3 competition, outperforming all other participating teams and establishing new benchmarks for LLM-based creative translation systems. We tested five different models using advanced prompting strategies; our methodology involved systematic prompt engineering with Chain-of-Thought reasoning and universe-specific translation patterns. ChatGPT-4o achieved the best performance with 29.5% exact matches and 30.6% accent-tolerant matches, for an overall 60.1% success rate, demonstrating the potential of LLM-based approaches for creative multilingual wordplay translation.</p>
      </abstract>
      <kwd-group>
        <kwd>onomastic wordplay</kwd>
        <kwd>machine translation</kwd>
        <kwd>large language models</kwd>
        <kwd>chain-of-thought prompting</kwd>
        <kwd>model comparison</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The translation of onomastic wordplay represents one of the most challenging tasks in computational
linguistics, requiring the preservation of both semantic meaning and ludic form across different linguistic
and cultural contexts. Such wordplay is prevalent in fictional universes like Asterix comics, Harry Potter
series, and modern video games, where character names often contain deliberate puns that contribute
to humor and character development.</p>
      <p>
        The CLEF 2025 JOKER Lab addresses these challenges through three tasks focused on humor in
machine translation and information retrieval [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Task 3 specifically targets onomastic wordplay
translation from English to French, using a parallel corpus of approximately 2,000 named entities from
various sources including video games, literature, and advertising slogans [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Our approach leverages the recent advances in Large Language Models (LLMs) and their
demonstrated capabilities in understanding context, linguistic creativity, and cross-lingual reasoning. We
systematically evaluated multiple state-of-the-art models using carefully designed prompting strategies
that incorporate Chain-of-Thought reasoning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and universe-specific translation patterns.
      </p>
      <p>
        The JOKER corpus [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provides a comprehensive resource for multilingual wordplay recognition,
offering English-French parallel data that enables systematic evaluation of translation approaches. This
work contributes to the understanding of LLM capabilities in creative language tasks and provides
insights into effective prompting strategies for specialized translation domains.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. State of the Art</title>
      <p>
        Wordplay translation has been approached from various perspectives in computational linguistics
research. The theoretical foundations of this field were established by Delabastita’s seminal work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
which demonstrated that wordplay is not inherently untranslatable but requires sophisticated
understanding of linguistic structures and cultural contexts. His framework for wordplay translation
strategies [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] remains influential, identifying key challenges such as the tension between preserving
humor and maintaining semantic fidelity. Low [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] extended this work by proposing practical strategies
for joke and pun translation, emphasizing the importance of cultural adaptation over literal preservation.
      </p>
      <p>
        Traditional rule-based and statistical machine translation systems have struggled with the creative
and cultural aspects of puns. Neural machine translation brought improvements in handling ambiguous
meanings and context-dependent translations, but fundamental limitations persist. Troiano et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
demonstrated that even advanced NMT systems fail to preserve non-propositional elements like
emotions and humor, losing crucial communicative aspects in back-translation scenarios. The emergence of
large-scale multilingual models like NLLB [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has improved general translation quality, but specialized
challenges in wordplay translation remain largely unaddressed.
      </p>
      <p>
        The emergence of Large Language Models has opened new possibilities for creative translation tasks.
Recent comparative studies have shown that LLMs can outperform traditional NMT in humor retention
tasks. Pituxcoosuvarn and Murakami [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] found that GPT-based models with explanation-enhanced
prompting achieved significantly higher joke retention rates (62.94%) compared to neural machine
translation systems, particularly excelling in tasks requiring cultural and linguistic creativity.
      </p>
      <p>
        Chain-of-Thought prompting has emerged as a crucial technique for enhancing LLM reasoning
capabilities. Wei et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] demonstrated that CoT prompting enables models to break down complex
reasoning into intermediate steps, achieving significant improvements on arithmetic, commonsense,
and symbolic reasoning tasks. Zhang et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] further developed automatic CoT generation methods,
reducing manual effort while maintaining reasoning quality. This approach appears particularly
promising for wordplay translation, which requires multi-step cultural and linguistic reasoning.
      </p>
      <p>
        Contemporary evaluation frameworks have been established through shared tasks and competitions.
Miller and Hempelmann [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] created standardized benchmarks for pun detection and interpretation,
while Miller’s comprehensive survey [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] outlined computational approaches to puns, establishing
fundamental methodologies for automated humor analysis. Zhou et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] specifically examined
humor evaluation in neural language models, providing insights into how modern architectures handle
comedic content.
      </p>
      <p>
        Partington [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] provided crucial linguistic insights into wordplay structure, identifying how puns
exploit organizational expectations in language through relexicalization and reworking processes.
This theoretical framework has informed subsequent computational approaches to understanding and
generating wordplay.
      </p>
      <p>
        The JOKER Lab at CLEF has systematically addressed humor and wordplay in computational systems
since 2022. The track has evolved from initial wordplay classification tasks [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to comprehensive
humor analysis including pun detection, location, and interpretation [17]. The development of the
JOKER corpus [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] established the first substantial English-French parallel dataset for humorous content,
creating standardized benchmarks for computational humor research.
      </p>
      <p>Recent specialized approaches have begun to address specific aspects of creative translation. Dhanani
et al. [18] developed transformer-based methods specifically for English-French pun translation,
demonstrating the potential of focused models over general-purpose systems. However, their single-model
approach limited comprehensive evaluation across different types of wordplay. Pilyarchuk [19] explored
multimodal wordplay translation in audiovisual contexts, revealing additional complexities when humor
spans multiple communication channels.</p>
      <p>Despite these advances, several gaps remain in the literature. Most previous work focuses on
single-model approaches rather than systematic multi-model evaluation. The application of advanced
prompting techniques like Chain-of-Thought reasoning to creative translation tasks remains
underexplored, particularly in the context of the transformer architecture [20] that underlies modern LLMs.
Additionally, universe-specific translation patterns—such as the distinct naming conventions in fictional
worlds like Asterix or Harry Potter—have received limited attention in computational approaches. This
gap motivates our exploration of multi-model LLM evaluation with specialized prompting strategies for
onomastic wordplay translation.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This section provides a detailed description of the approach designed to tackle onomastic wordplay
translation from English to French within fictional universe contexts. Our methodology centers on
Chain-of-Thought prompting strategies that guide Large Language Models through systematic
reasoning processes, enabling effective preservation of humor mechanisms while adapting cultural and
linguistic elements to the target language. To address this challenge, we developed a comprehensive
framework combining advanced prompt engineering with comparative evaluation across five diverse
LLM architectures, thereby optimizing translation accuracy through structured cognitive guidance.
Below, we comprehensively explain the general approach, including model selection rationale, prompt
engineering framework, experimental design, and implementation details, highlighting how each
component contributes to the overall objective.</p>
      <sec id="sec-3-1">
        <title>3.1. General Approach</title>
        <p>Our work aims to establish effective methodologies for creative translation tasks that require
simultaneous optimization across multiple linguistic dimensions. This task demands capturing complex
semantic relationships, cultural contexts, and humor mechanisms while maintaining adherence to
fictional universe conventions. To achieve this, we designed a systematic approach that leverages the
reasoning capabilities of transformer models to generate contextually appropriate translations through
structured cognitive processes that mirror human translator expertise.</p>
        <p>We adopted a Chain-of-Thought prompting approach rather than fine-tuning or simpler prompting
strategies for several interconnected methodological reasons. Prompting-based approaches enable rapid
experimentation across multiple models without computational overhead while preserving pre-trained
multilingual capabilities, making systematic comparison feasible across our diverse model selection.
However, preliminary experiments with simple translation prompts revealed frequent failures due to
models attempting direct translation without understanding underlying humor mechanisms, while
few-shot examples alone proved insufficient given the vast diversity of wordplay types in our dataset.</p>
        <p>This led us to adopt Chain-of-Thought prompting specifically, as wordplay translation requires
sequential reasoning through multiple linguistic layers: identifying original wordplay mechanisms,
understanding cultural contexts, adapting to target language constraints, and maintaining creative
intent—cognitive processes that mirror human translator reasoning. Our four-step CoT framework
explicitly guides models through the cognitive processes that human translators naturally employ,
while providing interpretability that allows us to identify at which reasoning step models fail, enabling
targeted analysis of model limitations in creative language tasks.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Selection and Evaluation Strategy</title>
        <p>The selection of our five models was strategically designed to represent different architectural approaches
and specializations, enabling identification of which model characteristics are most critical for creative
translation tasks. ChatGPT-4o [21] represents the current state-of-the-art in commercial language
modeling with enhanced reasoning capabilities, establishing the performance ceiling for available
systems. DeepSeek [22] was included as a reasoning-specialized model to test specifically whether
mathematical and logical reasoning capabilities transfer effectively to creative linguistic domains,
providing insights into the relationship between analytical and creative reasoning.</p>
        <p>Llama3-70b [23] represents the community standard for open-source large-scale models, allowing
evaluation of whether open-source alternatives can compete with proprietary systems in specialized
creative tasks. Llama-4-scout-17b [24] was selected as an optimized implementation that balances
performance with computational efficiency, relevant for practical applications with resource constraints.
Finally, Mistral-saba-24b [25] contributes multilingual specialization with specific optimization for
European languages, theoretically providing advantages for English-French translation scenarios.</p>
        <p>This diverse selection allows identification of whether superiority in creative translation stems from
parameter scale, architectural specialization, multilingual optimization, or general reasoning capabilities,
providing valuable insights for future development of creative translation systems.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Chain-of-Thought Framework Design</title>
        <p>Our core methodology centered on advanced prompt engineering structured around a four-stage
cognitive process that decomposes complex creative translation into manageable reasoning steps. The
framework was developed through iterative analysis of successful human translations in the training
dataset, identifying consistent patterns in professional translator decision-making processes.</p>
        <p>The first stage, wordplay deconstruction, requires models to analyze hidden words or concepts
embedded in English names, identify the type of wordplay mechanism employed (phonetic, semantic,
or morphological), and recognize cultural references that inform the humor structure. This stage is
crucial because many translation failures stem from models not recognizing the underlying humor
mechanism in the source term, leading to inappropriate direct translation attempts.</p>
        <p>The second stage implements universe identification, applying context-specific constraints based on
fictional world conventions. For Asterix universe characters, we distinguished between Gallic characters
requiring "-ix" suffixes with French professional vocabulary integration, and Roman characters
employing Latin-style suffixes with classical structural elements. For Harry Potter universe elements, the
focus centers on magical compound words that prioritize functional description over literal wordplay
preservation. This distinction is fundamental because each fictional universe has established naming
conventions that must be respected to maintain narrative coherence.</p>
        <p>The third stage develops French adaptation strategies through identification of core conceptual
meaning behind the original wordplay, generation of equivalent French vocabulary and cultural
concepts, construction following universe-specific patterns, and ensuring phonetic naturalness for French
pronunciation. This stage requires balancing humor preservation with linguistic appropriateness in the
target language. The final stage implements candidate generation and evaluation, creating multiple
translation alternatives, assessing wordplay preservation and cultural appropriateness, and selecting
optimal solutions based on multiple simultaneous criteria.</p>
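The four-stage framework can be rendered as a structured prompt. The sketch below is a simplified illustration of this idea: the exact wording of our prompts varied by universe, and the `build_messages` helper and template text are ours, not a verbatim reproduction of the competition prompts.

```python
# Simplified sketch of the four-stage Chain-of-Thought prompt; the exact
# wording used in our runs varied per universe and is not reproduced here.
COT_SYSTEM_PROMPT = """You are an expert literary translator specializing in wordplay.
Translate the given English name into French, reasoning step by step:
1. Wordplay deconstruction: identify the hidden words or concepts in the name
   and the mechanism used (phonetic, semantic, or morphological).
2. Universe identification: apply the naming conventions of the fictional
   universe (e.g. "-ix" suffixes for Gallic Asterix characters, functional
   magical compounds for Harry Potter terms).
3. French adaptation: find French vocabulary and cultural equivalents that
   preserve the core concept and sound natural in French.
4. Candidate generation: propose alternatives, evaluate wordplay preservation
   and cultural fit, and select the single best translation.
End your answer with the line: Final translation: NAME"""

def build_messages(source_name, universe):
    # System message carries the CoT framework; the user message carries
    # the specific item to translate together with its universe.
    return [
        {"role": "system", "content": COT_SYSTEM_PROMPT},
        {"role": "user", "content": f"Universe: {universe}\nName: {source_name}"},
    ]

print(build_messages("Getafix", "Asterix")[1]["content"])
```

Separating the reasoning framework (system message) from the per-item query (user message) lets the same template be reused across all five evaluated models.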
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Universe-Specific Adaptation Patterns</title>
        <p>Analysis of the training dataset revealed distinct translation patterns that informed our adaptation
strategies for different fictional contexts. For Asterix characters, we identified that French translations
frequently abandon literal English wordplay in favor of French professional vocabulary that maintains
character function while adapting to French cultural preferences. Examples include medical terminology
integration in "Panoramix" for the druid Getafix, administrative vocabulary in "Assurancetourix" for the
bard Cacofonix, and technical language adaptation in "Ordralfabétix" for the fishmonger Unhygienix.
This pattern reflects French cultural preferences for professional specificity and linguistic sophistication
in character development.</p>
        <p>For Harry Potter elements, our analysis focused on magical compound words such as
"Chocogrenouille" and "bièreaubeurre," functional translations that prioritize magical purpose over
literal meaning preservation, and French phonetic adaptation ensuring natural pronunciation
patterns. These patterns emphasize magical functionality over linguistic cleverness, reflecting translation
philosophy focused on narrative consistency and world-building coherence.</p>
        <p>We constructed specialized knowledge bases from these training examples, extracting paradigmatic
transformations that exemplify successful translation strategies. These knowledge bases informed
our prompt engineering with established patterns of effective adaptation, enabling models to leverage
proven translation approaches while maintaining creative flexibility for novel cases.</p>
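A minimal sketch of such a knowledge base is shown below. The English-French pairs are the paper's own examples; the dictionary layout and the `few_shot_block` helper are illustrative assumptions about how the extracted patterns can be surfaced in prompts.

```python
# Universe-specific knowledge base of paradigmatic transformations. The pairs
# are the paper's own examples; the structure and helper are illustrative.
KNOWLEDGE_BASE = {
    "Asterix": [
        ("Getafix", "Panoramix"),          # druid: medical/panoramic vocabulary
        ("Cacofonix", "Assurancetourix"),  # bard: insurance-vocabulary pun
        ("Unhygienix", "Ordralfabétix"),   # fishmonger: alphabetical-order pun
    ],
    "Harry Potter": [
        ("Chocolate Frog", "Chocogrenouille"),  # functional magical compound
        ("Butterbeer", "bièreaubeurre"),
    ],
}

def few_shot_block(universe, k=3):
    # Render up to k paradigmatic transformations as prompt example lines
    pairs = KNOWLEDGE_BASE.get(universe, [])[:k]
    return "\n".join(f"{en} -> {fr}" for en, fr in pairs)

print(few_shot_block("Asterix"))
```

Keeping the examples per universe means a prompt only ever shows patterns consistent with the naming conventions of the item being translated.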
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Experimental Design and Implementation</title>
        <p>The experimental implementation employed standardized configuration across all evaluated models
to ensure fair comparison and reproducible results. We used temperature settings between 0.1-0.2 to
prioritize consistency over uncontrolled creativity, maximum token limits of 30-50 to ensure concise
responses, structured prompt formatting combining system messages with user queries, and automated
response extraction with multiple fallback patterns to handle variation in output formats.</p>
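A sketch of this standardized setup is given below. The temperature and token values come from the paper; the specific regex patterns and the `extract_translation` helper are our illustrative reconstruction of the fallback extraction, not the exact implementation.

```python
import re

# Standardized settings applied to every model (values from the paper;
# the configuration shape is vendor-neutral and illustrative).
GENERATION_CONFIG = {"temperature": 0.1, "max_tokens": 50}

# Fallback patterns (illustrative) for pulling the final translation
# out of differently formatted model outputs
EXTRACTION_PATTERNS = [
    r"final translation\s*[:\-]\s*[\"«]?([^\"»\n]+?)[\"»]?\s*$",
    r"[\"«]([^\"»\n]+)[\"»]\s*$",  # a quoted name ending the reply
]

def extract_translation(response):
    # Scan lines from the end of the reply, trying patterns in priority order
    for line in reversed(response.strip().splitlines()):
        for pattern in EXTRACTION_PATTERNS:
            match = re.search(pattern, line.strip(), flags=re.IGNORECASE)
            if match:
                return match.group(1).strip()
    return None

print(extract_translation("Step 1: pun on 'get a fix'...\nFinal translation: «Panoramix»"))
```

Scanning from the last line first matters because Chain-of-Thought replies place the answer after the reasoning trace.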
        <p>All models were evaluated on the complete CLEF 2025 JOKER Task 3 dataset comprising 353
English-French wordplay translation pairs from Asterix and Harry Potter universes. Each translation attempt
was logged with complete input prompts, model responses, extracted translations, and evaluation
outcomes. The evaluation employed exact string matching, accent-tolerant matching allowing for
French diacritical variations, and combined success rates incorporating both matching types.</p>
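The accent-tolerant comparison can be sketched as follows. This is a minimal illustration using Unicode NFD decomposition; the function names are ours, and the case-folding step is an assumption rather than a documented detail of the official scorer.

```python
import unicodedata

def strip_accents(text):
    # Decompose characters (NFD) so "é" becomes "e" plus a combining mark,
    # then drop the combining marks (Unicode category "Mn")
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def evaluate_pair(prediction, reference):
    # Case folding is our assumption; exact and accent-tolerant matching
    # mirror the two criteria described in the text
    pred, ref = prediction.strip().lower(), reference.strip().lower()
    exact = pred == ref
    accent_tolerant = strip_accents(pred) == strip_accents(ref)
    return {"exact": exact, "accent_tolerant": accent_tolerant,
            "success": exact or accent_tolerant}

# "Asterix" differs from "Astérix" only by the acute accent
print(evaluate_pair("Asterix", "Astérix"))
```

The combined success rate reported in the results counts a pair as successful if either criterion holds.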
      <p>This standardized approach ensures that observed performance differences reflect genuine model
capabilities in creative translation rather than experimental variation or implementation inconsistencies,
enabling reliable conclusions about architectural advantages for creative language tasks.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Data Description</title>
      <p>The training dataset consists of 353 entries in JSON format.
Our analysis revealed two primary universe categories:
• Asterix universe: 191 entries (54.1%) featuring Gallic and Roman characters
• Harry Potter universe: 162 entries (45.9%) containing magical terminology and character names</p>
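Computing such a universe distribution from the JSON data is straightforward. In the sketch below, the field names ("en", "fr", "universe") and the two sample entries are our assumptions for illustration, not the official JOKER schema.

```python
import json
from collections import Counter

# Two illustrative entries; the field names are our assumption, not the
# official JOKER corpus schema.
RAW = """[
  {"en": "Getafix", "fr": "Panoramix", "universe": "Asterix"},
  {"en": "Remembrall", "fr": "Rapeltout", "universe": "Harry Potter"}
]"""

def universe_distribution(entries):
    # Count entries per universe and report (count, percentage) pairs
    counts = Counter(e["universe"] for e in entries)
    total = len(entries)
    return {u: (n, round(100.0 * n / total, 1)) for u, n in counts.items()}

entries = json.loads(RAW)
print(universe_distribution(entries))
```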
      <sec id="sec-4-1">
        <title>4.1. Dataset Characteristics and Complexity Distribution</title>
        <p>The dataset exhibits significant variation in translation complexity, which we categorized into three
levels based on the cognitive and linguistic processing required:
• Simple Phonetic Adaptations (28% of dataset): Direct phonetic modifications requiring
minimal cultural adaptation (e.g., "Asterix" → "Astérix")
• Cultural Localization (35% of dataset): Terms requiring understanding of cultural references
and appropriate French cultural context adaptation
• Creative Reconstruction (37% of dataset): Complete reimagining of wordplay mechanisms
while preserving original humorous or functional intent</p>
        <p>This distribution provides an ideal testbed for evaluating LLM capabilities across different levels of
linguistic creativity, from straightforward adaptations to complex creative tasks requiring deep cultural
and linguistic understanding.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <title>5.1. Competition Performance and Comparative Results</title>
        <p>Our VerbaNex system achieved first place in the official CLEF 2025 JOKER Task 3 competition,
demonstrating the effectiveness of our multi-model evaluation approach and Chain-of-Thought prompting
strategy. Table ?? shows our performance compared to other participating teams.</p>
        <p>Our system’s superior performance validates the effectiveness of our methodological choices:
systematic model comparison, structured Chain-of-Thought prompting, and the selection of ChatGPT-4o as
the optimal model for creative translation tasks. The significant performance gap between our approach
and competing systems demonstrates the importance of both model selection and prompting strategy
design in specialized creative language tasks.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Overall Performance Comparison</title>
        <p>Our comprehensive evaluation across all five models reveals significant performance disparities that
highlight the varying capabilities of current LLMs in creative translation tasks. As shown in
Figure 2 and detailed in Table 1, ChatGPT-4o achieved superior performance across all evaluation
criteria: the highest exact match rate (29.5%), accent-tolerant accuracy (30.6%), and overall
success rate (60.1%). This represents an improvement of 25.8 percentage points over the
second-best performing model, DeepSeek, which achieved a 34.3% success rate.</p>
        <p>Table 1 shows the detailed performance comparison across all evaluated models.</p>
        <p>The performance gap between models reveals interesting insights into the capabilities required
for creative language tasks. DeepSeek, despite its specialization in reasoning tasks, achieved only
16.4% exact matches, suggesting that mathematical reasoning capabilities do not directly translate
to creative linguistic processing. The open-source models (Llama3-70b, Llama-4-scout-17b) showed
modest performance, with Llama3-70b achieving 11.6% exact matches, indicating that model scale alone
is insufficient for this specialized task.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Universe-Specific Analysis</title>
        <sec id="sec-5-3-1">
          <title>Asterix Universe Performance (191 entries):</title>
          <p>• ChatGPT-4o: Approximately 56 exact matches (29.3%)
• Success examples: Asterix→Astérix, Getafix→Panoramix, Vitalstatistix→Abraracourcix</p>
        </sec>
        <sec id="sec-5-3-2">
          <title>Harry Potter Universe Performance (162 entries):</title>
          <p>• ChatGPT-4o: Approximately 48 exact matches (29.6%)
• Success examples: remembrall→rapeltout, sneakoscope→scrutoscope, Parseltongue→Fourchelang</p>
          <p>The model performed consistently across both universes, showing no significant bias toward either
functional magical terminology or creative character naming conventions.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Error Analysis and Performance Insights</title>
      <sec id="sec-6-1">
        <title>6.1. Common Failure Patterns</title>
        <p>Our detailed analysis of translation failures revealed four primary error categories across all evaluated
models:</p>
        <p>Literal Translation Bias (32% of errors): Models frequently attempted direct word-for-word
translation without understanding wordplay mechanisms. For example, translating "Unhygienix" as
"Malproprix" instead of the correct "Ordralfabétix," missing the alphabetical ordering concept that
characterizes this fishmonger character.</p>
        <p>Morphological Pattern Violations (28% of errors): Inconsistent application of universe-specific
naming conventions, such as failing to maintain the mandatory "-ix" suffix in Asterix characters or
creating phonetically unnatural French constructions that violate French phonological rules.</p>
        <p>Cultural Context Misunderstanding (23% of errors): Failure to recognize or appropriately
adapt cultural references, particularly evident in Asterix character names where French professional
vocabulary is essential for maintaining humor.</p>
        <p>Semantic Drift (17% of errors): Loss of original functional or humorous intent while preserving
surface linguistic structure, resulting in technically plausible but contextually inappropriate translations.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Success Factors Analysis</title>
        <p>Analysis of successful translations across all models revealed three key success patterns:
• Functional Preservation: Most successful translations prioritized maintaining the functional or
descriptive purpose over literal linguistic elements
• Cultural Adaptation: Effective translations substituted English cultural references with
appropriate French equivalents while preserving humor mechanisms
• Phonetic Optimization: Successful candidates demonstrated natural French pronunciation
patterns while maintaining morphological consistency</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>This research, which achieved first place in the CLEF 2025 JOKER Task 3 competition, presents the
first systematic comparative evaluation of Large Language Models for onomastic wordplay translation,
demonstrating that ChatGPT-4o significantly outperforms other state-of-the-art models with 29.5%
exact match accuracy and 60.1% overall success rate—a substantial 25.8 percentage point improvement
over the second-best model, DeepSeek. Our findings reveal that advanced reasoning architectures
provide greater advantages than parameter scaling alone, as evidenced by the clear performance
hierarchy across all evaluated models, while our structured Chain-of-Thought prompting methodology
proved essential for systematic handling of complex wordplay mechanisms through four-step reasoning
(wordplay deconstruction → universe identification → French adaptation → candidate generation).
The substantial performance gaps between models highlight that creative translation capabilities
require specialized architectural features rather than general multilingual training, though even the
best-performing model achieves only about 30% exact accuracy, indicating the continued need for human
oversight in practical applications. This work establishes benchmarks and methodologies for future
research in computational creativity and specialized translation domains, providing a foundation for
developing more effective approaches to creative language tasks through systematic model evaluation
and structured prompting strategies.</p>
      <p>Acknowledgments: The authors would like to acknowledge the support provided by the
master’s degree scholarship program in engineering at the Universidad Tecnologica de Bolivar (UTB)
in Cartagena, Colombia.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Claude Sonnet 4 for grammar, spelling, and
translation assistance. After using this tool, the author(s) reviewed and edited the content as needed
and take full responsibility for the publication’s content.</p>
      <p>[17] L. Ermakova, A.-G. Bosser, T. Miller, T. Thomas, V. M. Palma Preciado, G. Sidorov, A. Jatowt,
CLEF 2024 JOKER lab: Automatic humour analysis, in: Advances in Information Retrieval, volume
14610, Springer, 2024, pp. 82–95. doi:10.1007/978-3-031-56072-9_5.</p>
      <p>[18] F. Dhanani, M. Rafi, M. A. Tahir, Tickling translations: Small but mighty open-sourced transformers
bring English pun-ny entities to life in French!, Computer Speech &amp; Language 90 (2024) 101739.
doi:10.1016/j.csl.2024.101739.</p>
      <p>[19] K. Pilyarchuk, Wordplay-based humor: to leave it or to translate it, that is the question, The
European Journal of Humour Research 12 (2024) 120–144. doi:10.7592/EJHR.2024.12.2.915.</p>
      <p>[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin,
Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.</p>
      <p>[21] OpenAI, GPT-4o: Omni-modal AI model, 2024. URL: https://openai.com/index/hello-gpt-4o/.
OpenAI blog post.</p>
      <p>[22] DeepSeek-AI, DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning,
2025. URL: https://arxiv.org/abs/2501.12948. arXiv:2501.12948.</p>
      <p>[23] AI@Meta, Llama 3 model card, 2024. URL: https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md.</p>
      <p>[24] AI@Meta, Llama 4 Scout 17B, 2025. URL: https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E.
Accessed: 2025-07-07.</p>
      <p>[25] Mistral AI, Mistral-saba-24b, 2025. URL: https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501.
Accessed: 2025-07-07.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] L. Ermakova, R. Campos, A.-G. Bosser, T. Miller, Overview of the CLEF 2025 JOKER lab: Humour in machine, in: J. Carrillo-de Albornoz, J. Gonzalo, L. Plaza, A. García Seco de Herrera, J. Mothe, F. Piroi, P. Rosso, D. Spina, G. Faggioli, N. Ferro (Eds.), <source>Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025)</source>, <year>2025</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] L. Ermakova, et al., Overview of the CLEF 2025 JOKER Task 3: Onomastic wordplay translation, in: <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025)</source>, <source>CEUR Workshop Proceedings</source>, <year>2025</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, D. Zhou, Chain-of-thought prompting elicits reasoning in large language models, <year>2022</year>. URL: https://arxiv.org/abs/2201.11903. arXiv:2201.11903.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] L. Ermakova, A.-G. Bosser, A. Jatowt, T. Miller, The JOKER corpus: English-French parallel data for multilingual wordplay recognition, in: <source>Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23)</source>, ACM, New York, NY, USA, <year>2023</year>, pp. 2796–2806. URL: https://doi.org/10.1145/3539618.3591885. doi:10.1145/3539618.3591885.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Delabastita</surname>
          </string-name>
          ,
          <article-title>There's a Double Tongue: An Investigation into the Translation of Shakespeare's Wordplay, with Special Reference to Hamlet</article-title>
          , Rodopi, Amsterdam,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] D. Delabastita, Focus on the pun: Wordplay as a special problem in translation studies, <source>Target</source> 6 (<year>1994</year>) 223–243. URL: https://doi.org/10.1075/target.6.2.07del. doi:10.1075/target.6.2.07del.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Low</surname>
          </string-name>
          ,
          <article-title>Translating jokes and puns</article-title>
          ,
          <source>Perspectives</source>
          <volume>19</volume>
          (
          <year>2011</year>
          )
          <fpage>59</fpage>
          -
          <lpage>70</lpage>
          . URL: https://doi.org/10.1080/ 0907676X.
          <year>2010</year>
          .
          <volume>485688</volume>
          . doi:
          <volume>10</volume>
          .1080/0907676X.
          <year>2010</year>
          .
          <volume>485688</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Troiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Klinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Padó</surname>
          </string-name>
          ,
          <article-title>Lost in back-translation: Emotion preservation in neural machine translation</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4340</fpage>
          -
          <lpage>4354</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .coling-main.
          <volume>384</volume>
          /.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] M. R. Costa-jussà, J. Cross, O. Çelebi, M. Elbayad, K. Heafield, K. Heffernan, E. Kalbassi, J. Lam, D. Licht, J. Maillard, A. Sun, S. Wang, G. Wenzek, A. Youngblood, B. Akula, L. Barrault, G. Mejia Gonzalez, P. Hansanti, J. Hoffman, S. Jarrett, K. R. Sadagopan, D. Rowe, S. Spruit, C. Tran, P. Andrews, N. F. Ayan, S. Bhosale, S. Edunov, A. Fan, C. Gao, V. Goswami, F. Guzmán, P. Koehn, A. Mourachko, C. Ropers, S. Saleem, H. Schwenk, J. Wang, No language left behind: Scaling human-centered machine translation, <source>Transactions of the Association for Computational Linguistics</source> 10 (<year>2022</year>) 1–52. URL: https://doi.org/10.1162/tacl_a_00447. doi:10.1162/tacl_a_00447.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] M. Pituxcoosuvarn, Y. Murakami, Jokes or gibberish? Humor retention in translation with neural machine translation vs. large language model, <year>2024</year>. URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5148455, available at SSRN.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] Z. Zhang, A. Zhang, M. Li, A. Smola, Automatic chain of thought prompting in large language models, <year>2022</year>. URL: https://arxiv.org/abs/2210.03493. arXiv:2210.03493.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] T. Miller, C. F. Hempelmann, I. Gurevych, SemEval-2017 Task 7: Detection and interpretation of English puns, in: <source>Proceedings of the 11th International Workshop on Semantic Evaluation</source>, <year>2017</year>, pp. 58–68. URL: https://doi.org/10.18653/v1/S17-2005. doi:10.18653/v1/S17-2005.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Computational approaches to puns: A survey and proposal</article-title>
          , in: Workshop on Computational Approaches to Linguistic Code-Switching,
          <year>2017</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Evaluating humor in neural language models</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>8210</fpage>
          -
          <lpage>8220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] A. S. Partington, A linguistic account of wordplay: The lexical grammar of punning, <source>Language Sciences</source> 31 (<year>2009</year>) 642–657. URL: https://doi.org/10.1016/j.langsci.2008.09.002. doi:10.1016/j.langsci.2008.09.002.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] L. Ermakova, T. Miller, A.-G. Bosser, V. M. Palma Preciado, G. Sidorov, A. Jatowt, Overview of JOKER – CLEF-2023 track on automatic wordplay analysis, in: <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>, volume 14163, Springer, <year>2023</year>, pp. 397–415. URL: https://doi.org/10.1007/978-3-031-42448-9_26. doi:10.1007/978-3-031-42448-9_26.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>