<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>J. D. Jimenez);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>UTBNLP at CLEF JOKER 2025 Task 2: mBART-50 Fine-Tuning with Dictionary-Guided Forced Decoding and Phoneme-Based Techniques for English-French Pun Translation⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Duvan Andres Marrugo-Tobon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeison D. Jimenez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jairo E. Serrano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan C. Martinez-Santos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edwin Puertas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Tecnológica de Bolívar, School of Digital Transformation</institution>
          ,
          <addr-line>Cartagena de Indias 130010</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>This paper presents a pun-focused translation system developed for the JOKER-2025 Task 2 competition on English-to-French pun translation. The system combines mBART-50 fine-tuning, LoRA adapter training, and a novel phoneme-aware forced-decoding strategy guided by a specialized pun dictionary. The end-to-end pipeline encompasses robust pun detection and tagging, oversampling-based data augmentation, phoneme transcription, and sentiment feature enrichment, enabling comprehensive capture of the layered meanings and playful ambiguity inherent in puns. Multiple model configurations and decoding schemes were evaluated, with the best configuration achieving a BLEU score of 32.82 and ranking 22nd overall. These findings underscore the efectiveness of incorporating phonetic cues and lexical constraints to enhance both faithfulness and humor preservation in pun translation across languages.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Pun translation</kwd>
        <kwd>mBART-50</kwd>
        <kwd>phoneme transcription</kwd>
        <kwd>sentiment analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The automatic translation of humor remains one of the most persistent challenges in natural language
processing (NLP), especially when it comes to wordplay such as puns. Recent studies [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] highlight
that humorous content demands systems capable of navigating a delicate balance between semantic
ifdelity and creative linguistic adaptation. Unlike general text, humor heavily relies on implicit
cultural knowledge, phonetic ambiguity, and contextual cues that do not always transfer directly across
languages.
      </p>
      <p>
        Among humorous forms, puns stand out for their intricate layering of meanings through phonological
resemblance, lexical ambiguity, or syntactic manipulation. This dual (or multiple) reading efect often
exploits language-specific structures, making literal translation inefective or outright impossible [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
The translation of puns thus poses challenges not only for automatic systems but also for human
translators, requiring a blend of linguistic creativity and cultural sensitivity.
      </p>
      <p>
        In response to these challenges, shared tasks like JOKER [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ] have provided a standardized
benchmark for the computational humor community. They promote the development of new datasets,
evaluation metrics, and innovative translation strategies specifically tailored to preserve wordplay
across languages. These tasks have revealed that traditional neural machine translation (NMT)
models, optimized for literal semantic transfer, tend to underperform when the goal shifts from factual
correctness to stylistic and pragmatic equivalence.
      </p>
      <p>Recent developments in multilingual sequence-to-sequence modeling, most notably the mBART-50
architecture combined with parameter-eficient adaptation techniques such as LoRA adapters [ 7, 8] have
unlocked powerful domain-specific translation capabilities. Complementing these modeling advances,
work on lexically constrained decoding [9] and full phoneme-level pipelines [10] has demonstrated that
explicit control over the generation process can help retain critical structural elements in challenging
tasks, such as pun translation.</p>
      <p>A variety of specialized methods for handling wordplay have since emerged:
• Dual-attention architectures that localize and interpret puns via gloss pairs [11].
• GPT-3–based generators exploiting contextual ambiguity for creative pun synthesis [12].
• Enriched corpora annotated with keywords and explanatory notes to boost classification and
generation quality [13].
• Unified frameworks addressing both homophonic and homographic puns [14].
• Grapheme-to-phoneme augmentation techniques preserving acoustic wordplay signals [15].
• Large-model evaluations exposing “lazy” pun reproduction and motivating new metrics [16].
While these approaches advance semantic disambiguation, data balancing, and lexical enforcement, few
integrate phonetic features directly into the decoding mechanism itself.</p>
      <p>This paper proposes an end-to-end pipeline for English-to-French pun translation that unites
mBART50 fine-tuning (with optional LoRA adapters), systematic pun detection and markup, explicit G2P-based
phoneme transcription, and a phoneme-aware forced-decoding strategy guided by a curated pun
dictionary. By enriching inputs with sentiment and phonetic embeddings, experiments show substantial
gains in preserving both the humorous intent and the phonological wordplay of source puns. This
paper is structured as follows: Section 2 describes the methodology, including dataset description, pun
detection and tagging, data augmentation, feature enrichment, and model training. Section 3 presents
performance results and qualitative analysis of the pipeline variants. Finally, Section 4 ofers conclusions
and outlines directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>This section describes the architecture of the English-to-French pun translation system developed for
Task 2 of the JOKER-2025 competition. As shown in Figure 1, the workflow comprises five main stages:
• Data Preprocessing: Automated detection of puns in the source text and insertion of explicit
markup to guide translation.
• Feature Engineering: Enrichment of input data with sentiment labels and phonemic
transcriptions to capture both emotional tone and sound-based wordplay.
• Model Training: Comparison of two complete fine-tuning training strategies versus the LoRA
adaptation that is eficient in parameterizing applied to the multilingual model mBART-50.
• Inference: Application of a custom forced decoding algorithm, guided by a pun-specific lexicon,
to ensure the preservation of key wordplay elements during translation.
• Evaluation: Performance assessment using both automatic metrics and manual evaluation
criteria defined by the JOKER track, with an emphasis on maintaining the intended humorous
efect.</p>
      <sec id="sec-2-1">
        <title>2.1. Dataset Description</title>
        <p>
          The dataset used in this work was released as part of the JOKER 2025 shared task on wordplay translation
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. It consists of English-French translation pairs curated to evaluate pun translation quality, with
Input
Data Raw Data
        </p>
        <p>Pun Detection &amp; Tagging</p>
        <p>Data Augmentation
(3x oversampling)</p>
        <p>Cleaning</p>
        <p>Sentiment
Analysis
Phoneme
Transcription</p>
        <p>Training and Model</p>
        <p>Selection
mBART-50
Fine-Tuning
LoRA Adapter</p>
        <p>Trainig
Tokenizer</p>
        <p>Evaluation
each entry formatted as a JSON object containing: id_en (unique identifier), en (source English text
containing wordplay), and fr (French translation).</p>
        <p>A distinctive characteristic of the training data is the presence of multiple French translations for each
English pun, reflecting diferent approaches to wordplay preservation. The dataset contains training
examples similar to those provided in previous editions, ofering diverse examples of pun translation
strategies.</p>
        <p>The test dataset follows the same format but contains only the original English text (en) and identifier
(id_en), requiring systems to generate French translations that preserve both semantic meaning and
humorous efect. The evaluation prioritizes wordplay preservation over traditional translation metrics,
following Delabastita’s pun translation strategy. Table 1 shows the composition of the dataset provided
by the competition organizers.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Preprocessing</title>
        <p>The preprocessing pipeline implements a systematic approach to prepare the pun translation dataset for
model training. As illustrated in Figure 1, the preprocessing stage consists of two sequential operations
designed to enhance wordplay structure representation and address dataset imbalances inherent in pun
translation tasks.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Pun Detection and Tagging</title>
        <p>English puns are automatically identified by matching the input text against a curated lexicon, and each
matched occurrence is wrapped with special delimiters to guide the translator, as shown in Algorithm 1.</p>
        <sec id="sec-2-3-1">
          <title>Key points</title>
          <p>• Each input sentence is iterated over and scanned against the English to French pun dictionary.
• On the first case-insensitive match, &lt;PUN_EN&gt; is injected before and after the English pun.
• During training the French side can receive &lt;PUN_FR&gt; around the mapped French entry.
• This explicit markup focuses the downstream model on the wordplay span without disturbing
the surrounding context.</p>
          <p>Algorithm 1 Pun Detection and Tagging
Require: List of sentences {}, pun lexicon ℒ = {(,  )} where  is English pun,  its French
equivalent
Ensure: Tagged sentences with ⟨PUN_EN⟩, ⟨PUN_FR⟩
1: for each sentence  in input do
2: tagged ← 
3: for each (,  ) in ℒ do
4: if lower() occurs in lower() then
5: Mark first match: tagged ← replaceOnce( tagged, , ⟨PUN_EN⟩  ⟨PUN_EN⟩)
6: break ◁ Only tag one pun per sentence
7: end if
8: end for
9: Output tagged
10: end for</p>
          <p>Tagging Example
Original:
A skier retired because he was going downhill.</p>
          <p>Tagged:</p>
          <p>A skier retired because he was going &lt;PUN_EN&gt; downhill &lt;PUN_EN&gt;.</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>Data Augmentation</title>
        <p>Targeted oversampling is applied to mitigate the imbalance between punny and non-punny examples.
Each pun-containing instance is replicated three times in the training set. This ensures a balanced
representation of wordplay cases during learning, following established minority-class techniques to
improve the model’s ability to handle scarce patterns.</p>
        <p>As illustrated in Figure 2, the augmentation process operates systematically through four sequential
stages.</p>
        <p>• Pun Identification: Identify all examples containing at least one detected pun.
• Data Augmentation: Create three additional copies of each example containing puns.
• Dataset Consolidation: Combine original and augmented examples into a unified training set.
• Dataset Shufling : Shufle the dataset using a fixed random seed for reproducibility.</p>
        <p>Train
Dataset</p>
        <p>Pun
Identification</p>
        <p>Data
Augmentation</p>
        <p>Dataset
Consolidation</p>
        <p>Dataset
Shuffling</p>
        <p>The augmentation process creates a more balanced training distribution, allowing the fine-tuned
model to develop robust representations for both standard translation patterns and specialized wordplay
mechanisms. Without this rebalancing, the model would predominantly optimize for literal translation
patterns, potentially failing to preserve wordplay in test scenarios.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.3. Features Enrichment</title>
        <p>A comprehensive feature engineering approach is implemented to incorporate linguistic characteristics
relevant to wordplay translation. This stage enriches the preprocessed dataset with semantic and
phonetic features designed to capture emotional and acoustic patterns that may influence preservation
strategies for puns.</p>
        <sec id="sec-2-5-1">
          <title>2.3.1. Sentiment Analysis</title>
          <p>Each training instance is augmented with sentiment classification to capture the emotional context of
humorous content. Sentiment analysis is performed by a pre-trained transformer model
(cardifnlp/twitterroberta-base-sentiment-latest) fine-tuned on social media data [ 17], providing robust performance for
informal and creative text typical in pun datasets.</p>
          <p>The sentiment classifier assigns one of three labels to each original English sentence: POSITIVE,
NEGATIVE, or NEUTRAL. This categorical sentiment information is stored in a new field in the dataset,
providing the translation model with explicit emotional context that can guide the preservation of the
appropriate tone in the target language.</p>
        </sec>
        <sec id="sec-2-5-2">
          <title>2.3.2. Phonemic Transcription</title>
          <p>Automatic phonemic transcription is implemented using a grapheme-to-phoneme (G2P) conversion
system [15] to capture phonetic patterns relevant to sound-based wordplay. The G2P model converts
English text to ARPAbet phonetic notation, providing explicit acoustic representations that may inform
the model about sound-based pun mechanisms. The phonemic transcription is stored in a new field,
enabling the model to access pronunciation patterns that are often crucial for wordplay preservation,
particularly for puns based on homophones or rhymes. Each training record contains a unique identifier,
the English pun, and one or more human-verified French translations. For feature enrichment, each
sentence is paired with its phonemic transcription in ARPAbet notation to highlight sound-based
wordplay structures. Algorithm 2 summarizes the phoneme extraction process.</p>
          <p>Algorithm 2 Detailed Feature Enrichment Process
Require: Dataset of pun sentences with unique IDs and English text
Ensure: The same dataset with added sentiment and phoneme annotations
1: for each record in the dataset do
2: Extract the English text
3: Preprocess the text:
• Lowercase normalization
• Removal of extra whitespace or unwanted characters
4:
5:</p>
          <p>Predict the sentiment label (POSITIVE, NEGATIVE, NEUTRAL) using a pre-trained sentiment
classification model</p>
          <p>Generate the phoneme transcription:
• Tokenize the text into words
• For each word, obtain its ARPAbet phoneme sequence
• Reconstruct the full sentence phoneme string, preserving punctuation
6: Add the sentiment label and phoneme string to the record
7: Leave the translation field empty for future prediction
8: end for
9: return Enriched dataset in JSON format with fields: id_en, en, sent, phon, fr
Each input pun undergoes careful preprocessing, sentiment classification, and phoneme conversion to
enrich its representation for training. This step enables the translation model to capture the emotional
tone and sound-based ambiguities inherent in puns. Algorithm 2 describes this process in detail, and an
example output is presented below.</p>
          <p>Example of a Fully Enriched Record</p>
          <p>id_en: en_1
en: Save the whales, spouted Tom.
sent: POSITIVE
phon: S EY1 V DH AH0 W EY1 L Z ,
fr: Sauvez les baleines, souffla Tom.</p>
          <p>S P AW1 T AH0 D</p>
        </sec>
      </sec>
      <sec id="sec-2-6">
        <title>2.4. Model Training</title>
        <p>Two distinct fine-tuning strategies for pun translation are evaluated, comparing complete parameter
optimization against parameter-eficient adaptation techniques. Both approaches utilize mBART-50
(facebook/mbart-large-50-many-to-many-mmt) as the base multilingual transformer [18], configured
for English-to-French translation with source language token en_XX and target language token fr_XX.</p>
        <p>Algorithm 3 details the training procedure for both full fine-tuning and LoRA adaptation.
Algorithm 3 Model Training for Pun Translation
Require: Enriched training data, pre-trained mBART-50 model
Ensure: the Fine-tuned model is ready for pun translation
1: Initialize the mBART-50 model and tokenizer
2: Define the objective: maximize translation accuracy while preserving wordplay
3: Tokenize sentences and truncate to maximum length
4: Create mini-batches and set optimizer and learning rate scheduler
5: if Full fine-tuning then
6: Update all model weights
7: else
8: Freeze base weights and train only low-rank adapters (LoRA)
9: end if
10: for each training epoch do
11: Compute loss on each batch
12: Backpropagate gradients and update weights
13: Periodically evaluate the validation set
14: end for
15: Select best checkpoint based on validation performance
16: return Trained model for inference</p>
        <sec id="sec-2-6-1">
          <title>2.4.1. Full Parameter Fine-tuning</title>
          <p>The first approach implements complete model fine-tuning, updating all parameters of the
mBART50 architecture during training. This method provides maximum model expressivity by allowing
unrestricted parameter updates across all transformer layers, attention mechanisms, and feed-forward
networks.</p>
          <p>The training process employs the Seq2SeqTrainer framework with the hyperparameter configuration
shown in Table 2.</p>
          <p>The complete fine-tuning approach enables comprehensive adaptation to the pun translation domain,
allowing the model to learn specialized linguistic patterns for wordplay preservation across all network
parameters. This method provides maximum learning capacity by updating the entire transformer
architecture.</p>
        </sec>
      </sec>
      <sec id="sec-2-7">
        <title>2.5. LoRA Adapter Training</title>
        <p>The second approach implements Low-Rank Adaptation (LoRA) [7], a parameter-eficient fine-tuning
technique that freezes the pre-trained model weights and trains only low-rank decomposition matrices
inserted into attention layers. This method significantly reduces the number of trainable parameters
while maintaining competitive performance.</p>
        <p>The LoRA configuration targets attention projection matrices with the settings detailed in Table 3.</p>
        <p>The adapter training maintains the same core hyperparameters as full fine-tuning but operates on a
drastically reduced parameter space. The training configuration turns of intermediate checkpointing
and external logging for eficiency, thereby concentrating computational resources on the adapter
optimization process.</p>
        <p>Both training strategies incorporate the enhanced dataset with pun tagging and linguistic features,
enabling comparison between maximum parameter flexibility and eficient domain adaptation for
specialized translation tasks. The following boxed example illustrates how the trained model processes
an enriched input step by step. It demonstrates how tokenization, phoneme cues, sentiment guidance,
and forced decoding work together to generate a pun-preserving French translation.</p>
        <p>Step-by-Step Example: From Enriched Input to Final Translation
Input Record:
id_en: en_1
en: Save the whales, spouted Tom.
sent: POSITIVE
phon: S EY1 V DH AH0</p>
        <p>W EY1 L Z ,
Model Process:</p>
        <p>S P AW1 T AH0 D
1. Tokenization: The English sentence is tokenized with special language tokens (e.g.,
en_XX, fr_XX).
2. Encoding: The tokenized input is converted into embeddings using mBART-50’s encoder
layers.
3. Forced Decoding: During generation, the decoder applies lexical constraints guided by
the pun dictionary and phoneme cues.
4. Sentiment Guidance: The model conditions the output tone to match the predicted
sentiment (here: POSITIVE).</p>
        <p>5. Output: The decoder generates a French pun translation that preserves the wordplay.
Final Translation:
fr: Sauvez les baleines, souffla Tom.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Performance Results</title>
      <p>The JOKER-2025 Task 2 competition for English-to-French pun translation was addressed using a
progressive pipeline comprising three configurations: a fully fine-tuned system and two
parametereficient LoRA variants. Table 4 summarizes the oficial BLEU scores assigned during the automatic
evaluation phase.</p>
      <p>The results indicate that the Full Fine-tuning model substantially outperformed both LoRA-based
variants, achieving approximately 4-5 points higher BLEU. It suggests that, for pun translation, extensive
parameter updates allow the model to learn better the subtle interplay of lexical form and multiple
senses typical of wordplay.</p>
      <p>However, as established in the JOKER evaluation framework, BLEU alone cannot reliably measure the
success of pun translation, which requires preserving both semantic coherence and creative linguistic
distortions. Therefore, all submitted systems are additionally evaluated by human experts using
criteria including lexical field preservation, pun form retention, appropriateness of style shift, humor
maintenance, and the detection of syntactic or lexical inaccuracies. Ultimately, the final ranking
prioritizes the number of translations that successfully transfer the original wordplay mechanisms.</p>
      <p>Example of Successful Pun Preservation (Full Fine-tuning)
English: “A skier retired because he was going downhill.”
Phonemes: AH0 S K IY1 ER0 R IH0 T AY1 ER0 D B IH0 K AO1 Z IY1 W AA1 Z G OW1
IH0 NG D AW1 N HH IH1 L
French (Full Fine-tuning): “Le skieur a pris sa retraite car il n’arrivait plus à remonter la pente.”
Analysis: The model correctly preserves the double sense of “going downhill” (literal skiing +
ifgurative decline) by using the idiomatic French expression “remonter la pente”, demonstrating
strong handling of layered meaning and phonetic nuance.</p>
      <p>Example of Failed Pun Preservation (Baseline LoRA)
English: “Someone once accused me of stealing an old, rare, valuable stamp, and I philately denied
it.”
Phonemes: S AH1 M W AH1 N AH1 N S AH0 K Y UW1 Z D M IY1 AH0 V S T IY1 L IH0
NG AE1 N OW1 L D R EH1 R V AE1 L Y UW0 B AH0 L S T AE1 M P AE1 N D AY1 F
IH0 L AE1 T L IY0 D IH0 N AY1 D IH0 T
French (Baseline LoRA): “Quelqu’un m’a accusé d’avoir volé un timbre ancien et rare, et je l’ai
nié sans hésiter.”
Analysis: Here, the pun on “philately” (which plays on “flatly denied”) is entirely lost. The baseline
LoRA output conveys a literal denial, missing both the wordplay and the sound-based twist. It
shows LoRA’s limits without explicit phoneme constraints.</p>
      <p>Combining automatic and qualitative results, the team achieved 5th place overall, with the highest
BLEU of 32.82 for the Full Fine-tuning system. It confirms that extensive parameter updates yield
more robust pun preservation. In contrast, the Baseline and Refined LoRA models, despite being
computationally lighter, often fall back on literal phrasing, primarily when the pun hinges on phonetic
similarity or double meanings. The contrast demonstrates that creative phenomena, such as puns,
demand models to learn deeper associations across syntax, semantics, and phonology, which lightweight
adapters alone may fail to capture fully.</p>
      <p>Therefore, while BLEU provides an initial signal, manual expert evaluation remains crucial for fair
ranking, as it verifies whether both the form and sense of the original wordplay survive translation.
These insights support the adoption of hybrid approaches, which combine eficient adaptation with
advanced decoding and post-editing to bridge the gap between automatic translation and human-like
linguistic creativity.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and Future Work</title>
      <p>Our system Full Fine-tuning achieved the highest BLEU (32.82, 22nd place) by fully updating the
mBART50 weights, which allowed it to capture both the semantic and phonetic layers of puns (e.g. ’going
downhill’remonter la pente). However, this approach incurs significant training and inference costs
and still occasionally falters in out-of-vocabulary or heavily constrained puns. In contrast, the two
LoRA variants (BLEU 28.17 and 27.57) required far fewer resources but consistently defaulted to literal
renderings when phonetic or lexical cues were subtle (e.g., ’philately denied’plain denial), exposing
their limited capacity to model creative wordplay.</p>
      <p>These results underscore two key lessons: First, BLEU, while useful, only partially reflects the success
of humor translation, as high overlap does not guarantee the preservation of wordplay form or tone shift.
Second, manual expert evaluation remains indispensable to assess lexical field preservation, stylistic
shifts, and the maintenance of humorous efect. Only human judgments can reliably rank systems by
their ability to transfer both the form and the sense of puns across languages.</p>
      <p>Future directions include:
• Dynamic Phoneme Constraints: integrate on-the-fly phoneme matching to handle unseen
puns.
• Pun-Aware Metrics: develop semi-automatic measures that correlate with human judgments of
wordplay.
• Adapter Enhancements: augment lightweight LoRA with explicit phoneme–grapheme
alignment layers to narrow the quality gap.
• Cross-Domain Extension: apply and evaluate the pipeline on other language pairs and humor
forms (idioms, jokes).</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The authors would like to acknowledge the support provided by the master’s degree scholarship program
in engineering at the Universidad Tecnologica de Bolivar (UTB) in Cartagena, Colombia.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Grammarly in order to: Grammar and spelling
check. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s)
full responsibility for the publication’s content.
[7] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, Lora: Low-rank
adaptation of large language models, 2021. URL: https://arxiv.org/abs/2106.09685. arXiv:2106.09685.
[8] T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, Qlora: Eficient finetuning of quantized
llms, 2023. URL: https://arxiv.org/abs/2305.14314. arXiv:2305.14314.
[9] S. Braun, O. Vasilyev, N. Iskender, J. Bohannon, Does summary evaluation survive translation
to other languages?, in: M. Carpuat, M.-C. de Marnefe, I. V. Meza Ruiz (Eds.), Proceedings
of the 2022 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle,
United States, 2022, pp. 2425–2435. URL: https://aclanthology.org/2022.naacl-main.173/. doi:10.
18653/v1/2022.naacl-main.173.
[10] X. Zhao, E. Durmus, D.-Y. Yeung, Towards reference-free text simplification evaluation with a BERT
Siamese network architecture, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Findings of the
Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics,
Toronto, Canada, 2023, pp. 13250–13264. URL: https://aclanthology.org/2023.findings-acl.838/.
doi:10.18653/v1/2023.findings-acl.838.
[11] S. Liu, M. Ma, H. Yuan, J. Zhu, Y. Wu, M. Al-Ajarmah, A dual-attention neural network for pun
location and using pun-gloss pairs for interpretation, 2021. doi:10.48550/arXiv.2110.07209.
[12] A. Mittal, Y. Tian, N. Peng, AmbiPun: Generating humorous puns with ambiguous context
(2022) 1053–1062. URL: https://aclanthology.org/2022.naacl-main.77/. doi:10.18653/v1/2022.
naacl-main.77.
[13] J. Sun, A. Narayan-Chen, S. Oraby, A. Cervone, T. Chung, J. Huang, Y. Liu, N. Peng, ExPUNations:
Augmenting puns with keywords and explanations (2022) 4590–4605. URL: https://aclanthology.
org/2022.emnlp-main.304/. doi:10.18653/v1/2022.emnlp-main.304.
[14] Y. Tian, D. Sheth, N. Peng, A unified framework for pun generation with humor principles
(2022) 3253–3261. URL: https://aclanthology.org/2022.findings-emnlp.237/. doi: 10.18653/v1/
2022.findings-emnlp.237.
[15] A. Pine, P. William Littell, E. Joanis, D. Huggins-Daines, C. Cox, F. Davis, E. Antonio Santos,
S. Srikanth, D. Torkornoo, S. Yu, G2P rule-based, index-preserving grapheme-to-phoneme
transformations, in: S. Moeller, A. Anastasopoulos, A. Arppe, A. Chaudhary, A. Harrigan, J. Holden,
J. Lachler, A. Palmer, S. Rijhwani, L. Schwartz (Eds.), Proceedings of the Fifth Workshop on the Use
of Computational Methods in the Study of Endangered Languages, Association for Computational
Linguistics, Dublin, Ireland, 2022, pp. 52–60. URL: https://aclanthology.org/2022.computel-1.7/.
doi:10.18653/v1/2022.computel-1.7.
[16] Z. Xu, S. Yuan, L. Chen, D. Yang, “a good pun is its own reword”: Can large language models
understand puns?, in: Y. Al-Onaizan, M. Bansal, Y.-N. Chen (Eds.), Proceedings of the 2024
Conference on Empirical Methods in Natural Language Processing, Association for Computational
Linguistics, Miami, Florida, USA, 2024, pp. 11766–11782. URL: https://aclanthology.org/2024.
emnlp-main.657/. doi:10.18653/v1/2024.emnlp-main.657.
[17] D. Loureiro, F. Barbieri, L. Neves, L. Espinosa Anke, J. Camacho-collados, TimeLMs: Diachronic
language models from Twitter, in: Proceedings of the 60th Annual Meeting of the Association for
Computational Linguistics: System Demonstrations, Association for Computational Linguistics,
Dublin, Ireland, 2022, pp. 251–260. URL: https://aclanthology.org/2022.acl-demo.25. doi:10.18653/
v1/2022.acl-demo.25.
[18] Y. Tang, C. Tran, X. Li, P.-J. Chen, N. Goyal, V. Chaudhary, J. Gu, A. Fan, Multilingual translation
with extensible multilingual pretraining and finetuning, arXiv preprint arXiv:2008.00401 (2020).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Aamir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Biznárová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scigliuzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Osman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lozano</surname>
          </string-name>
          , I. Strandberg,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gasparinetti</surname>
          </string-name>
          ,
          <article-title>Engineering symmetry-selective couplings of a superconducting artificial molecule to microwave waveguides</article-title>
          ,
          <source>Physical Review Letters</source>
          <volume>129</volume>
          (
          <year>2022</year>
          ). URL: http: //dx.doi.org/10.1103/PhysRevLett.129.123604. doi:
          <volume>10</volume>
          .1103/physrevlett.129.123604.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dev</surname>
          </string-name>
          , S. Singh,
          <string-name>
            <surname>MISGENDERED:</surname>
          </string-name>
          <article-title>Limits of large language models in understanding pronouns</article-title>
          , in: A.
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Boyd-Graber</surname>
          </string-name>
          , N. Okazaki (Eds.),
          <source>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume</source>
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <source>Association for Computational Linguistics</source>
          , Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>5352</fpage>
          -
          <lpage>5367</lpage>
          . URL: https://aclanthology. org/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>293</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>293</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Khalid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alikhani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <article-title>Combining cognitive modeling and reinforcement learning for clarification in dialogue</article-title>
          , in: D.
          <string-name>
            <surname>Scott</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Bel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          Zong (Eds.),
          <source>Proceedings of the 28th International Conference on Computational Linguistics</source>
          ,
          <source>International Committee on Computational Linguistics</source>
          , Barcelona,
          <source>Spain (Online)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4417</fpage>
          -
          <lpage>4428</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .coling-main.
          <volume>391</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          .coling-main.
          <volume>391</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 JOKER lab: Humour in machine</article-title>
          , in: J.
          <string-name>
            <surname>C. de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ),
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2025 JOKER task 1: Humour-aware information retrieval</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2025</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          , et al.,
          <article-title>Overview of the CLEF 2025 JOKER task 2: Wordplay translation from english into french</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF</source>
          <year>2025</year>
          ), CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>