<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Igor Kuzmin</string-name>
          <email>igor.kuzmin@upf.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Pompeu Fabra</institution>
          <institution>Barcelona Supercomputing Center</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Humor processing remains a challenging problem for NLP due to linguistic ambiguity, language-specific nuances, and intricate wordplay. The CLEF JOKER 2025 lab tackles this with two tasks we participated in: humour-aware information retrieval in Portuguese and English (Task 1), and pun translation from English to French (Task 2). For Task 1 we developed a hybrid pipeline combining BM25, dense retrieval with multilingual-e5-small, and a cross-encoder reranker, achieving MAP 0.050 and NDCG@100 0.172 in English, and MAP 0.074 and NDCG@100 0.184 in Portuguese. For Task 2 we fine-tuned Lucie-7B-Instruct and CroissantLLMChat-v0.1 using supervised fine-tuning (SFT) and Adaptive Rejection Preference Optimization (ARPO), obtaining a best BLEU of 42.40 (Lucie + SFT) and observing a modest overlap trade-off (41.32 BLEU) when integrating ARPO, while the CroissantLLM variants scored 35.17 and 35.28 BLEU. Our experiments show that the baseline IR setup underperforms more advanced systems, while the LLM-based pun translation achieves the best results, confirming the promise of LLMs for cross-lingual wordplay transfer.</p>
      </abstract>
      <kwd-group>
        <kwd>Humor Analysis</kwd>
        <kwd>Humor Retrieval</kwd>
        <kwd>Humor Translation</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>LLM</kwd>
        <kwd>Machine Translation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>While humour plays an essential role in human interaction, it remains a complex challenge for advanced
natural-language-processing (NLP) systems—even for the latest large language models (LLMs). Cultural
differences, implicit meanings, intricate wordplay, and the inherently subjective nature of humour all
blur the clear indicators models rely on, making computational detection, translation and transfer of
humour far from trivial.</p>
      <p>
        The JOKER Lab at CLEF 2025 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] introduces multiple tasks aimed at addressing the complexities of automatic
humour analysis. We participated in two of them: humour-aware information retrieval in Portuguese
and English (Task 1), and pun translation from English into French (Task 2). For Task 2 in particular
we ask: to what extent can the latest LLMs succeed at translating English wordplay into French while
retaining humour and intent?
      </p>
      <p>Our contributions are threefold:</p>
      <p>• A systematic evaluation of hybrid retrieval with reranking for humour-aware IR in English and
Portuguese.</p>
      <p>• An LLM-based pun translation pipeline that contrasts Supervised Fine-Tuning (SFT) with Adaptive
Rejection Preference Optimization (ARPO).</p>
      <p>• A quantitative evaluation of BLEU performance for SFT-only versus ARPO-augmented models.</p>
      <p>The remainder of this report is organized as follows. Section 2 details our approaches, Section 3
presents experimental results, Section 4 reports a post-competition analysis, and Section 5 draws
conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Approach</title>
      <sec id="sec-2-1">
        <title>2.1. Task 1: Humour-Aware Information Retrieval</title>
        <p>
          We follow the standard pipeline of (i) dense retrieval, (ii) lexical retrieval, and (iii) cross-encoder
reranking.
2.1.1. Data
The official English and Portuguese corpora and query sets provided by the organizers [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] constitute
our primary data. To enlarge training resources, we sampled 10% of the documents and generated
synthetic queries with a gpt-4o-mini wordplay classifier, prompting it both to label each passage and,
for those marked as wordplay, generate a concise search-style query (see Listing 1).
        </p>
        <sec id="sec-2-1-1">
          <title>Listing 1: System prompt for the wordplay classifier</title>
          <p>You a r e an a s s i s t a n t t h a t c l a s s i f i e s s h o r t d o c u m e n t s a s w o r d p l a y (
j o k e s ) o r no t , and − i f i t i s w o r d p l a y − g e n e r a t e s a c o n c i s e
s e a r c h − s t y l e q u e r y t h a t would r e t r i e v e t h i s j o k e .</p>
          <p>F o r e x a m p l e :
− T e x t : "Why d i d t h e s c a r e c r o w win an award ? B e c a u s e he was
o u t s t a n d i n g i n h i s f i e l d . "
i s _ w o r d p l a y : T r u e
g e n e r a t e d _ q u e r y : " s c a r e c r o w award "
− T e x t : " The m i t o c h o n d r i a i s t h e p o w e r h o u s e o f t h e c e l l . "
i s _ w o r d p l a y : F a l s e
g e n e r a t e d _ q u e r y : " "</p>
          <p>We combined the original (query, reference) pairs with these synthetic pairs to form our
positive training set. Next, we performed hard-negative mining using a pretrained SentenceTransformer
(all-MiniLM-L12-v2) with the following configuration:
• Score range: retrieve candidates ranked 8–100 by cosine similarity
• Maximum similarity: 0.8 (to avoid too-easy negatives).
• Relative margin: 0.05 (filter out near-duplicates).</p>
          <p>• Negatives per positive: 5, sampled at random.</p>
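          <p>For concreteness, the mining configuration above can be sketched in plain Python over precomputed cosine similarities; this is an illustrative reimplementation with hypothetical helper and parameter names, not our actual script, which used the pretrained SentenceTransformer directly:</p>

```python
import random

def mine_hard_negatives(sim_row, pos_idx, lo=8, hi=100, max_sim=0.8, margin=0.05, k=5):
    # sim_row: cosine similarities between one query and every corpus document;
    # pos_idx: index of the known-relevant (positive) document
    order = sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)
    cands = []
    for rank, j in enumerate(order, start=1):
        if rank > hi:
            break                                    # keep only ranks lo..hi
        if rank >= lo and j != pos_idx:
            too_similar = sim_row[j] > max_sim       # similarity cap from our config
            near_dup = margin > sim_row[pos_idx] - sim_row[j]  # near-duplicate of the positive
            if not (too_similar or near_dup):
                cands.append(j)
    return random.sample(cands, min(k, len(cands)))  # k negatives per positive
```

          <p>Each (query, positive) pair then contributes one (query, positive, negatives) triplet.</p>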
          <p>This yields triplets of the form (query, positive, negatives). Finally, we split the mined
triplets into train (90 %), validation (5 %), and test (5 %) sets by first holding out 10 % for
evaluation and then splitting that hold-out equally into validation and test.</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>2.1.2. Models</title>
          <p>
            Next we fine-tune intfloat/multilingual-e5-small [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] for one epoch using 16-sentence batches
and a warm-up ratio of 0.1 on query–document pairs to enhance humour-aware semantic search. We
compare two contrastive objectives: the popular Multiple Negatives Ranking loss
(MNRL) and our Adaptive Margin loss, inspired by SigLIP [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] and MNRL.
          </p>
          <p>
            Humour often hinges on very subtle semantic shifts (puns, wordplay) where positives and hard
negatives lie close in embedding space. MNRL, given in Equation (1), forces the model to pull
true humour examples away from all negatives, while the Adaptive Margin loss introduces a temperature t
and bias b (Equation (2)); this dynamic penalty preserves the learning signal for near-tie cases, helping
the retriever tease apart genuinely funny hits from near misses. Preliminary experiments on the all-nli
dataset [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] indicated similar cosine distributions, with Adaptive Margin converging more stably.
          </p>
          <p>ℒ_MultipleNegativesRanking = (1/N) ∑_{i=1}^{N} [ log ∑_{j=1}^{N} exp(⟨q_i, p_j⟩) − ⟨q_i, p_i⟩ ], (1)</p>
          <p>ℒ_AdaptiveMargin = (1/N) ∑_{i=1}^{N} [ log ∑_{j=1}^{N} exp(t ⟨q_i, p_j⟩ + b) − (t ⟨q_i, p_i⟩ + b) ], (2)</p>
          <p>where q_i and p_j are the query and passage embeddings, so the inner sums cover the in-batch negative similarities and ⟨q_i, p_i⟩ is the positive similarity; t = e^{t′} is a learnable temperature and b ∈ ℝ is a learnable bias.</p>
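          <p>To make the objectives concrete, the Adaptive Margin loss of Equation (2) can be sketched in plain Python over a batch similarity matrix; this is an illustrative reimplementation (in training, the equivalent runs on batched GPU tensors):</p>

```python
import math

def adaptive_margin_loss(sims, t, b):
    # sims[i][j]: cosine similarity between query i and passage j;
    # the diagonal sims[i][i] holds the positive pair
    n = len(sims)
    total = 0.0
    for i in range(n):
        logits = [t * s + b for s in sims[i]]               # scaled, shifted similarities
        lse = math.log(sum(math.exp(z) for z in logits))    # log-sum-exp over candidates
        total += lse - logits[i]                            # subtract the positive logit
    return total / n
```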
        </sec>
        <sec id="sec-2-1-4">
          <title>As a lexical retriever we use a BM25 index [8] built with Anserini, while dense vectors are stored in Qdrant [9].</title>
        </sec>
        <sec id="sec-2-1-5">
          <title>Finally, we train cross-encoder/ms-marco-MiniLM-L12-v2 [10] for two epochs on the mined triplets, using batch size 16 and warm-up ratio 0.1.</title>
        </sec>
        <sec id="sec-2-1-6">
          <title>For each query we retrieve the top-1000 documents from both dense and BM25 indices, merge by reciprocal rank fusion, and rerank the top-100 with the cross-encoder.</title>
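          <p>The fusion step can be sketched as follows; this is a generic reciprocal-rank-fusion helper (the smoothing constant k = 60 is the common default from the RRF literature, not a value reported here):</p>

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: one ranked list of doc ids per retriever (e.g. BM25 and dense)
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            # each retriever contributes 1 / (k + rank) for every doc it returns
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

          <p>The top-100 of the fused list is then passed to the cross-encoder for reranking.</p>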
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Task 2: Wordplay Translation</title>
        <p>
          Our translation system follows a two-stage strategy: supervised fine -tuning (SFT) and ARPO preference
optimization.
2.2.1. Data
We merge the JOKER Task 2 corpus [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] with parallel EN–FR sentences from X-ALMA1 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. After
formatting prompts with the template in Listing 2, the data are split into train/val/test as
96 : 1.5 : 2.5 for SFT and 90 : 2.5 : 7.5 for ARPO preference tuning.
        </p>
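          <p>As a sketch, such a three-way split can be implemented as follows (an illustrative helper; the seed 3407 is the one we fixed for training and inference):</p>

```python
import random

def three_way_split(pairs, ratios=(0.96, 0.015, 0.025), seed=3407):
    # shuffle once, then cut into train / validation / test by the given fractions
    rng = random.Random(seed)
    data = list(pairs)
    rng.shuffle(data)
    n_train = int(ratios[0] * len(data))
    n_val = int(ratios[1] * len(data))
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```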
        <sec id="sec-2-2-1">
          <title>Listing 2: Translation prompt template</title>
          <p>We applied the following prompt:</p>
          <p>Translate the following text from English into French.</p>
          <p>English: {source}
French: {target}</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>2.2.2. Models</title>
          <p>
            (Footnote 1: https://huggingface.co/datasets/haoranxu/X-ALMA-Parallel-Data.)
We experiment with croissantllm/CroissantLLMChat-v0.1 [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] and
OpenLLM-France/Lucie-7B-Instruct-v1.1 [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] due to their bilingual capabilities. Both
models are 8-bit quantized and fine-tuned with LoRA [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. At inference we use 3-beam search,
temperature 0.3, top-p 0.9, and repetition penalty 1.3.
          </p>
        </sec>
        <sec id="sec-2-2-5">
          <title>For both training and inference we used a random seed of 3407.</title>
          <p>
            2.2.3. Supervised Fine-Tuning
The supervised fine-tuning (SFT) script uses a completions-only data collator that masks out prompt
tokens and computes the loss solely over the generated responses; given our instruction-style
data, this forces the model to learn to produce high-quality completions rather than to memorize
the prompts. Fine-tuning is done with the trl library [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], using an inverse-sqrt scheduler, a peak
learning rate of 5 × 10^−5, batch size 32, and gradient accumulation over 4 steps.
2.2.4. ARPO Optimization
          </p>
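          <p>The completions-only masking can be sketched as follows; this is a simplified illustration of the collator's labelling rule (the helper name and the explicit response boundary are ours):</p>

```python
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss

def mask_prompt_labels(input_ids, response_start):
    # copy the token ids as labels, but blank out every prompt position so the
    # loss is computed only over the generated (response) tokens
    return [IGNORE_INDEX] * response_start + list(input_ids[response_start:])
```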
          <p>
            To enhance humour retention, we apply ARPO2 [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] after the SFT stage; ARPO combines behavior-cloning
and preference losses. Since reinforcement learning from human feedback (RLHF) is known for
out-of-domain improvement and for better generalization from small amounts of training data compared
to supervised fine-tuning alone, we were particularly interested in bringing the latest state-of-the-art
methods in Neural Machine Translation (NMT) to humour-preservation tasks.
          </p>
        </sec>
        <sec id="sec-2-2-6">
          <title>The ARPO loss has two components: a behavior cloning (BC) term to prevent the model from drifting</title>
          <p>too far from its original distribution, and an adaptive preference term that rejects low-quality candidates.</p>
        </sec>
        <sec id="sec-2-2-7">
          <title>Formally, the core ARPO loss is defined as:</title>
          <p>ℒ_ARPO = − E_{(x, y_w, y_l) ∼ D} [ log σ( β log π_θ(y_w|x) − α(y_w, y_l) β log π_θ(y_l|x) ) + log π_θ(y_w|x) ]. (3)</p>
          <p>Here, the first term inside the expectation is the preference loss (with a temperature β), and the second
term is the BC regularization.</p>
          <p>
            The adaptive penalty weight α(y_w, y_l) ∈ [0, 1] modulates how strongly we down-weight the
likelihood of the worse translation y_l, based on its similarity to the preferred output y_w:
          </p>
          <p>α(y_w, y_l) = min( s · σ( γ · d(y_w, y_l)^{−1} ), 1 ), (4)</p>
          <p>where s is a scale, γ is a hyperparameter controlling the penalty sensitivity, and d(y_w, y_l) measures the
distance between the two responses via their average log-likelihoods:</p>
          <p>d(y_w, y_l) = | (1/|y_w|) log π_θ(y_w|x) − (1/|y_l|) log π_θ(y_l|x) |. (5)</p>
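          <p>Under this (partly reconstructed) reading of Equations (4) and (5), the penalty weight can be sketched as follows; the inverse-distance form and the default values are our interpretation of the garbled source, so treat this as illustrative only:</p>

```python
import math

def adaptive_penalty(logp_w, logp_l, len_w, len_l, s=0.4, gamma=0.9):
    # d(y_w, y_l): absolute difference of length-normalized log-likelihoods, Eq. (5)
    d = abs(logp_w / len_w - logp_l / len_l)
    d = max(d, 1e-8)                          # guard against a zero distance
    # Eq. (4): sigmoid of gamma / d, scaled by s and capped at 1
    sig = 1.0 / (1.0 + math.exp(-gamma / d))
    return min(s * sig, 1.0)
```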
        </sec>
        <sec id="sec-2-2-8">
          <title>Preference pairs for the ARPO stage are obtained by sampling negative (rejected) translations with an 8-bit version of the X-ALMA model 3.</title>
          <p>We used the library’s defaults for the penalty-sensitivity hyperparameter (specified by relax_coefficient_2)
and for the scale of α(y_w, y_l) (specified by relax_coefficient_1); the default values are 0.4
and 0.9, respectively.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Humour-Aware Information Retrieval</title>
        <sec id="sec-3-1-1">
          <title>2: Source code: https://github.com/fe1ixxu/ALMA/tree/xalma</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>3: https://huggingface.co/mradermacher/X-ALMA-13B-Group4-GGUF</title>
          <p>[Table 2 excerpt, retrieved passages: “A família (do latim: familia) é um agrupamento humano formado por duas ou mais pessoas com ligações biológicas, ancestrais, legais ou afetivas que, geralmente, vivem ou viveram na mesma casa.” (English: “The family (from Latin: familia) is a human grouping formed by two or more people with biological, ancestral, legal, or affective ties who generally live or have lived in the same house.”) and “Vision is the ability to think about or plan the future with imagination and wisdom.”]</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>Although the pipeline returns high-confidence matches by retrieving lexically related definitions, as shown in Table 2, these passages lack any humorous content. This illustrates why our overall retrieval metrics remain low: the system fails to surface genuinely funny or wordplay-bearing passages.</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Wordplay Translation</title>
        <sec id="sec-3-2-1">
          <title>The official evaluation metric used by the organizers for Task 2 is BLEU. Table 3 lists the four runs we submitted.</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>Surprisingly, the BLEU evaluation reveals that our straightforward SFT model still holds a slight</title>
          <p>edge over its ARPO-enhanced counterparts: the SFT baseline scored 42.40 BLEU, compared to 41.32
for the Lucie-7B-Instruct-v1.1 + ARPO variant. This suggests that, although ARPO’s adaptive
preference loss can improve qualitative aspects—such as preserving humour or other nuanced translation
properties—it may do so at the cost of n-gram overlap as measured by BLEU.</p>
          <p>Table 4 provides a concrete example (source en_83) where ARPO better preserves the pun: the
SFT-only translation “poule en vitesse” is a literal but awkward translation, whereas the ARPO output
“poule pressée” more naturally mirrors the play on “pullet” and “pressé.” In practical terms, if maximizing
standard BLEU is the primary objective, the pure SFT approach remains the stronger choice. However,
if downstream qualities that BLEU cannot fully capture—such as humour preservation—are important,
integrating ARPO may still be worthwhile despite the slight BLEU trade-off.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Post-Competition Analysis</title>
      <sec id="sec-4-1">
        <title>After the official competition deadline, we conducted a detailed post-competition analysis of Task 2. Our goal was to combine multiple datasets, explore alternative hyperparameters, and evaluate different model variants to improve upon our initial training setup. In particular, we extended our experiments with the Lucie-Instruct model, motivated by some skepticism regarding our original ARPO loss results.</title>
        <sec id="sec-4-1-1">
          <title>4.1. Extended SFT Experiments</title>
          <p>We updated the SFT configuration by merging the JOKER and X-ALMA translation pairs and compared
two training regimes:</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>1. Training exclusively on the JOKER Task 2 dataset.</title>
      </sec>
      <sec id="sec-4-3">
        <title>2. Training on the combined JOKER Task 2 and X-ALMA parallel datasets to increase the diversity of translation examples.</title>
        <p>First, we split the JOKER parallel dataset into training and validation sets (train size = 0.97, validation
size = 0.03) and mixed the SFT training split with the X-ALMA parallel dataset. A grid search was
performed over a range of hyperparameters (see Appendix A).</p>
      </sec>
      <sec id="sec-4-4">
        <title>After evaluating SFT, we selected the two best models according to the COMET-22 [16] metric, since it correlates highly with human judgment: one trained on the JOKER-only dataset (v4) and one on the combined dataset (v8) (see Appendix B).</title>
        <sec id="sec-4-4-1">
          <title>4.2. Extended ARPO Experiments</title>
          <p>For ARPO, negative samples were generated for the JOKER Task 2 dataset using the X-ALMA model 3
with the same train/validation split. We again compared two setups:</p>
        </sec>
      </sec>
      <sec id="sec-4-5">
        <title>1. Training solely on the obtained JOKER preference dataset.</title>
      </sec>
      <sec id="sec-4-6">
        <title>2. Training on a mixture of the JOKER preference dataset and the X-ALMA EN–FR preference-pairs dataset (footnote 4).</title>
        <p>The ARPO hyperparameter grid is also detailed in Appendix A. Finally, we selected the best ARPO
models in terms of COMET-22 for each dataset combination and each SFT model (see Appendix B).</p>
        <sec id="sec-4-6-1">
          <title>4.3. Extended Results</title>
          <p>As shown in Table 5, the SFT-only models achieve higher BLEU scores, consistent with the findings in
Section 3. The ARPO-enhanced variants, however, fail to produce any significant gains on this metric.
Notably, the configurations in Appendix B show that optimal performance required different values of the
adaptive-penalty hyperparameter: 0.4 for the JOKER-only dataset versus 1.0 for the combined dataset. This suggests that smaller,
        </sec>
      </sec>
      <sec id="sec-4-7">
        <title>4: https://huggingface.co/datasets/haoranxu/X-ALMA-Preference</title>
        <p>less diverse datasets benefit from weaker penalties that preserve more translation variants, while larger,
more diverse datasets require stronger adaptive penalties to effectively filter translation quality while
maintaining optimization stability.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we presented two baseline systems for the CLEF 2025 JOKER Lab: a hybrid retrieval pipeline
combining BM25, dense retrieval with multilingual-e5-small, and cross-encoder reranking for
Task 1; and an LLM-based pun translation framework that combines SFT with ARPO preference
optimization for Task 2. Our retrieval system achieved mid-tier performance (MAP 0.050/0.074, NDCG@100
0.172/0.184), revealing that purely lexical or semantic matches often miss true wordplay. In our
translation experiments, SFT maximizes BLEU (42.40) but tends to produce overly literal translations, whereas
ARPO trades a small drop in BLEU (41.32) for more idiomatic, pun-preserving outputs.</p>
      <sec id="sec-5-2">
        <title>Future work will explore more sophisticated retrieval methods. We also plan to explore larger bilingual LLMs and more diverse training corpora to improve humour translation in multilingual settings.</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>Using the activity taxonomy in https://ceur-ws.org/genai-tax.html:</p>
      <sec id="sec-6-1">
        <title>During the preparation of this work, the author(s) used OpenAI ChatGPT (GPT-4) in order to:</title>
        <p>• Formatting assistance: ensuring adherence to the formatting guidelines required by journals or
institutions.
• Peer review simulation: simulating peer review by providing feedback on the strengths and
weaknesses of the manuscript.</p>
        <p>• Coherence enhancement: improving the overall clarity and logical flow of the text.</p>
        <p>After using this tool/service, the author(s) reviewed and edited the generated content as needed and
take(s) full responsibility for the publication’s content.</p>
        <p>[Appendix A: the SFT and ARPO hyperparameter grid tables (parameters such as the learning rate and their candidate values) were garbled during extraction and are not reproduced here.]</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>B. Training Configurations</title>
      <p>SFT configurations: dataset, learning rate, and epochs.</p>
      <p>[The configurations table (rows such as “v1: JOKER + X-ALMA preference”) was garbled during extraction and is not fully reproduced here.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] L. Ermakova, A.-G. Bosser, T. Miller, R. Campos, CLEF 2025 JOKER lab: Humour in the machine, in: Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V, Springer-Verlag, Berlin, Heidelberg, 2025, pp. 389-397. URL: https://doi.org/10.1007/978-3-031-88720-8_59. doi:10.1007/978-3-031-88720-8_59.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] L. Ermakova, R. Campos, A.-G. Bosser, T. Miller, Overview of the CLEF 2025 JOKER task 1: Humour-aware information retrieval, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025), CEUR Workshop Proceedings, CEUR-WS.org, 2025.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] L. Ermakova, R. Campos, A.-G. Bosser, T. Miller, Overview of the CLEF 2025 JOKER task 2: Wordplay translation from English into French, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025), CEUR Workshop Proceedings, CEUR-WS.org, 2025.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] L. Ermakova, R. Campos, A.-G. Bosser, T. Miller, Overview of the CLEF 2025 JOKER task 3: Onomastic wordplay translation, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025), CEUR Workshop Proceedings, CEUR-WS.org, 2025.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] L. Wang, N. Yang, X. Huang, L. Yang, R. Majumder, F. Wei, Multilingual E5 text embeddings: A technical report, 2024. URL: https://arxiv.org/abs/2402.05672. arXiv:2402.05672.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] X. Zhai, B. Mustafa, A. Kolesnikov, L. Beyer, Sigmoid loss for language image pre-training, 2023. URL: https://arxiv.org/abs/2303.15343. arXiv:2303.15343.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019. URL: https://arxiv.org/abs/1908.10084.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Robertson, H. Zaragoza, The probabilistic relevance framework: BM25 and beyond, Found. Trends Inf. Retr. 3 (2009) 333-389. URL: https://doi.org/10.1561/1500000019. doi:10.1561/1500000019.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Qdrant Team, Qdrant: Vector search engine for the next generation of AI, https://qdrant.tech/, 2025. Accessed: June 1, 2025.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, M. Zhou, MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers, in: Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Curran Associates Inc., Red Hook, NY, USA, 2020.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koehn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eriguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Khayrallah</surname>
          </string-name>
          ,
          <article-title>X-ALMA: Plug &amp; play modules and adaptive rejection for quality translation at scale</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2410.03115. arXiv:2410.03115.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Faysse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fernandes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Guerreiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Loison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Alves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Corro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Boizard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Casademunt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yvon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F. T.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Viaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hudelot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colombo</surname>
          </string-name>
          ,
          <article-title>CroissantLLM: A truly bilingual French-English language model</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2402.00786. arXiv:2402.00786.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O.</given-names>
            <surname>Gouvert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Louradour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cerisara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Dufraisse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rivière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Lorré</surname>
          </string-name>
          ,
          <string-name>
            <surname>OpenLLM-France community</surname>
          </string-name>
          ,
          <article-title>The Lucie-7B LLM and the Lucie training dataset: Open resources for multilingual language generation</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2503.12294. arXiv:2503.12294.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mangrulkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Belkada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bossan</surname>
          </string-name>
          ,
          <article-title>PEFT: State-of-the-art parameter-efficient fine-tuning methods</article-title>
          , https://github.com/huggingface/peft,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>von Werra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Belkada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tunstall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Beeching</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Thrush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lambert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Gallouédec</surname>
          </string-name>
          ,
          <article-title>TRL: Transformer reinforcement learning</article-title>
          , https://github.com/huggingface/trl,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G. C.</given-names>
            <surname>de Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zerva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Farinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Glushkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Coheur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F. T.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <article-title>COMET-22: Unbabel-IST 2022 submission for the metrics shared task</article-title>
          , in:
          <string-name>
            <given-names>P.</given-names>
            <surname>Koehn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bojar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bougares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Costa-jussà</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Federmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fishel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fraser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Graham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Grundkiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Guzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Haddow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Jimeno</given-names>
            <surname>Yepes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kocmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Morishita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Monz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nagata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nakazawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Negri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Névéol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Popel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Turchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Seventh Conference on Machine Translation (WMT)</source>
          , Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid),
          <year>2022</year>
          , pp.
          <fpage>578</fpage>-<lpage>585</lpage>
          . URL: https://aclanthology.org/2022.wmt-1.52/.
        </mixed-citation>
      </ref>
      <!-- Appendix A: Hyperparameter Grids (table lost in extraction; only scattered values survived, e.g. inverse_sqrt schedule, grid {0.4, 1.0, 1.5}) -->
    </ref-list>
  </back>
</article>