<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>REC_Cryptix at JOKER CLEF 2025: Teaching Machines to Laugh: Multilingual Humor Detection and Translation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sarath Kumar P</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Beulah A</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sushmitha M</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thanalaxmi S</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Artificial Intelligence and Data Science, Rajalakshmi Engineering College</institution>
          ,
          <addr-line>Chennai, Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Humor is a cognitively complex and culturally specific linguistic phenomenon, posing significant challenges for computational modeling. It often transcends formal grammar, exploits phonetic and semantic ambiguities, and varies widely across languages and cultures. The JOKER CLEF 2025 challenge addresses this problem through three distinct sub-tasks: Humour-Aware Information Retrieval, Wordplay Translation, and Onomastic Wordplay Translation. This paper presents the system developed by Team Cryptix, which participated in all three tasks. We employed a combination of state-of-the-art models (SBERT, MarianMT, and T5), chosen for their capabilities in semantic representation, multilingual processing, and text generation. Our approach includes task-specific fine-tuning, targeted feature engineering, and the integration of human-in-the-loop evaluations to refine output quality. We also describe the datasets, preprocessing steps, and evaluation strategies used for each sub-task. Empirical results show that our system demonstrates consistent performance in both retrieval and translation of humorous content, maintaining the intent, nuance, and cultural relevance of the original text. Our findings underscore the importance of combining semantic-aware models with culturally informed design when handling humor in multilingual settings.</p>
      </abstract>
      <kwd-group>
        <kwd>Humor Detection</kwd>
        <kwd>Information retrieval</kwd>
        <kwd>Machine Translation</kwd>
        <kwd>Multilingual translation</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Onomastics</kwd>
        <kwd>Wordplay</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Humor is a complex and culturally dependent aspect of human language, making it particularly
challenging for computational systems to process. The CLEF 2025 Joker [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] shared task aims to push
the boundaries of humor-aware natural language processing through three subtasks: Humour-aware
Information Retrieval, which focuses on retrieving humor-relevant documents for a given query;
Wordplay Translation, involving the translation of puns and linguistic humor; and Onomastic Wordplay
Translation, which targets humor derived from names and culturally specific references. These tasks
are particularly challenging due to the semantic ambiguity of humor, shifts in cultural context, phonetic
differences across languages, and the scarcity of explicitly annotated humorous datasets.
      </p>
      <p>
        The exploration of deep learning techniques for multilingual natural language processing (NLP)
tasks has gained significant momentum, particularly in areas such as intent classification, hate speech
detection, and offensive language identification across diverse languages [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These studies
demonstrate the effectiveness of transfer learning models, especially multilingual transformers like
mBERT and XLM-RoBERTa, in handling low-resource languages and code-mixed data, highlighting the
importance of multilingual representations in cross-lingual tasks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        In the context of data augmentation, techniques such as back translation and paraphrasing have
been employed to enhance model performance in hate speech detection, illustrating the utility of
synthetic data generation in multilingual settings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Similarly, the use of image translation models like
CycleGAN for data augmentation in road scene analysis underscores the potential of deep generative
models to address domain-specific data scarcity issues [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These approaches emphasize the broader
applicability of deep learning-based translation and augmentation methods across modalities and tasks.
      </p>
      <p>
        Specifically related to multilingual translation, hybrid deep learning models incorporating Statistical
Machine Translation (SMT) and neural architectures have been proposed for multilingual machine
translation, with implementations focusing on Asian languages [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Such models facilitate cross-lingual
transfer and improve translation quality in multilingual contexts, which is crucial for tasks requiring
accurate language conversion.
      </p>
      <p>
        While humor detection remains an emerging area within AI, recent frameworks aim to improve
contextual understanding of humorous expressions through advanced modeling techniques, including
pseudo-labeling and post-smoothing strategies [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Although not directly focused on multilingual
humor detection, these developments suggest avenues for enhancing humor recognition systems by
leveraging deep learning’s capacity for nuanced language understanding.
      </p>
      <p>
        In the realm of multilingual humor detection and translation, the integration of deep learning
models—particularly transformer-based architectures—can be instrumental in capturing the subtleties
of humor across languages. The use of translation-based data augmentation, as demonstrated in hate
speech detection, could be adapted to generate multilingual humor datasets, thereby addressing data
scarcity and enabling more robust humor detection systems [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Moreover, the success of transfer
learning models in multilingual settings indicates their potential to facilitate humor classification and
translation tasks simultaneously, fostering more effective cross-lingual humor understanding.
      </p>
      <p>
        Overall, the convergence of deep learning, transfer learning, and generative models offers promising
pathways for advancing multilingual humor detection and translation. These approaches can help
overcome linguistic and cultural barriers, enabling AI systems to better interpret and generate humor
across diverse languages and contexts [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Our approach overcomes these hurdles by leveraging cutting-edge models and techniques: SBERT
for its semantic embeddings in IR, MarianMT for parallel language humor preservation, and T5 for
context-aware, creative name transformations. These models were carefully selected and fine-tuned
using domain-specific corpora and humor-annotated data, resulting in significantly improved retention
of humor, even in low-resource language settings.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset Description</title>
      <p>
        The JOKER Corpus [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] was used to support the three sub-tasks in the Joker CLEF 2025 challenge.
Each task was backed by a uniquely structured dataset designed to target specific aspects of humor
understanding in multilingual NLP. These datasets differed in language coverage, annotation granularity,
and humor types, thereby shaping the preprocessing strategies and model design approaches.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Humour-aware Information Retrieval</title>
        <p>This task used a JSON-formatted dataset comprising 231 queries and 352 English documents, annotated
with query_id, query_text, doc_id, doc_text, humor_label, relevance_score. Documents
carried a binary humor label and a graded relevance score from 0 to 3. The challenge lay in detecting
nuanced humor—ranging from overt jokes to subtle sarcasm—while distinguishing humorous documents
from non-humorous but contextually relevant ones. This required deeper semantic matching beyond
simple lexical overlaps.</p>
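Loading such a record set needs only standard tooling. The sketch below is a minimal loader assuming one JSON array of objects with the fields listed above; the function name and the coercion to int are illustrative assumptions, not part of the released corpus tooling.

```python
import json

def load_pairs(path):
    """Load query-document pairs annotated for humor and relevance."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    # Keep only the fields the retrieval pipeline consumes.
    return [
        {
            "query_id": r["query_id"],
            "query_text": r["query_text"],
            "doc_id": r["doc_id"],
            "doc_text": r["doc_text"],
            "humor_label": int(r["humor_label"]),          # binary: 0 or 1
            "relevance_score": int(r["relevance_score"]),  # graded: 0-3
        }
        for r in records
    ]
```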
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Wordplay Translation</title>
        <p>The goal was to translate humor, particularly wordplay, between English and French. The JSON dataset
consisted of 709 annotated sentence pairs, with metadata including source_text, target_text,
language_pair, humor_type, pun_span, humor_score. Each entry was richly labeled, identifying
the type of humor and the exact span of the pun. The main difficulty stemmed from the lack of
one-to-one cultural and linguistic equivalents between languages, pushing the system to produce culturally
sensitive and creatively translated outputs.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Onomastic Wordplay Translation</title>
        <p>This dataset focused on humor embedded in names and included 3,049 examples in JSON format
with fields such as original_name, source_context, target_language, translated_name,
pun_type, phonetic_mapping, cultural_equivalence. Targeting English-to-French
translations, this dataset captured jokes like “Justin Time” becoming “Jean Juste,” requiring preservation of
humor, phonetics, and cultural nuance. Additional annotations like NER tags and reuse labels were
used to aid modeling.</p>
        <p>To streamline processing, all datasets were standardized for downstream tasks. Preprocessing included
custom IOB tagging and tokenizer adaptation (e.g., for T5 and MarianMT) to maintain structural and
linguistic fidelity. Quality assurance was ensured via inter-annotator agreement checks and
heuristic-based validations, enabling smooth integration into PyTorch training workflows across all tasks.</p>
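The custom IOB tagging can be sketched as follows, assuming the pun span is given as character offsets into the space-joined sentence; the tag names (B-PUN/I-PUN) and the overlap rule are illustrative assumptions rather than the exact annotation scheme.

```python
def iob_tags(tokens, pun_span):
    """Tag tokens with B/I/O labels given the (start, end) character
    offsets of the pun in the detokenized (space-joined) sentence."""
    tags, pos, begun = [], 0, False
    start, end = pun_span
    for tok in tokens:
        tok_start, tok_end = pos, pos + len(tok)
        if tok_end > start and tok_start < end:  # token overlaps the pun span
            tags.append("I-PUN" if begun else "B-PUN")
            begun = True
        else:
            tags.append("O")
        pos = tok_end + 1                        # +1 for the joining space
    return tags
```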
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>A modular system architecture was developed to meet the linguistic and cognitive challenges of
humor-aware tasks across the three subtasks. Each pipeline was specifically structured to align with the
input-output requirements and complexity of its corresponding task. The model workflows
incorporated tailored training strategies, optimization techniques, and novel components for effective humor
understanding and translation. The entire process is illustrated in Figure 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Semantic Retrieval via SBERT Embeddings</title>
        <p>
          For the Humour-aware Information Retrieval subtask, documents and queries were semantically encoded
using the pre-trained multilingual model distiluse-base-multilingual-cased-v2 from Sentence-BERT
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This approach facilitated cross-lingual embedding of input texts into a shared semantic space. A
fine-tuning step optimized the cosine similarity loss between relevant query-document pairs, effectively
enhancing the model’s ability to discriminate between humorous and non-humorous content. To further
refine decision boundaries, hard negative mining was employed by incorporating semantically close yet
non-humorous documents during training. The resulting high-dimensional embeddings were indexed
using the Facebook AI Similarity Search (FAISS) library to enable fast and scalable similarity-based
document retrieval during inference.
        </p>
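The retrieval step can be sketched as follows. In the actual system the vectors come from the fine-tuned SBERT model and nearest-neighbor search is delegated to a FAISS index; this dependency-free sketch substitutes an exhaustive cosine-similarity scan over precomputed embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_vec, doc_vecs, k=5):
    """Rank documents by cosine similarity to the query embedding.
    doc_vecs maps doc_id -> embedding; returns the top-k (doc_id, score)."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```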
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Neural Machine Translation with Humor Preservation</title>
        <p>
          For the Wordplay Translation task, the MarianMT model [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] was adapted to better handle bilingual
humorous content, particularly puns and idioms. The model was fine-tuned on parallel corpora
annotated for humor using IOB-style labels to mark humorous segments, thereby directing the model’s
attention to critical wordplay regions. Custom attention weighting mechanisms prioritized these
segments during training. Additionally, back-translation was used for data augmentation, improving
the robustness and generalizability of the model. A dual-objective loss function was introduced,
combining BLEU-based translation quality metrics with a humor preservation score derived from a
rule-based pun identification module.
        </p>
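A minimal sketch of the dual-objective idea, assuming the humor preservation score is normalized to [0, 1] and using an illustrative mixing weight; the paper does not report the exact formulation.

```python
def dual_objective_loss(translation_loss, humor_score, alpha=0.7):
    """Combine a standard translation loss with a humor-preservation
    penalty. humor_score in [0, 1] comes from the rule-based pun
    identifier (1.0 = pun fully preserved); alpha is a hypothetical
    balance weight, not the paper's tuned setting."""
    humor_penalty = 1.0 - humor_score
    return alpha * translation_loss + (1.0 - alpha) * humor_penalty
```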
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Generative Translation of Onomastic Wordplay with T5</title>
        <p>
          The Onomastic Wordplay Translation task utilized the T5-base model [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], guided by prompt
engineering strategies such as: "Translate this name with humor preserved". Input preparation involved a
hybrid dataset construction process that merged named entity recognition outputs with curated lists
of humorous names and pun constructs. During inference, beam search was configured to prioritize
phonetic alignment between source and target names, aiding the preservation of punning effects. The
outputs were further refined using a humor-aware scoring mechanism that reranked translations based
on creativity, phonetic fidelity, and cultural relevance.
        </p>
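The humor-aware reranking can be sketched as a weighted combination of the three criteria; the candidate tuple layout and the weights below are illustrative assumptions, not the tuned configuration.

```python
def rerank(candidates, weights=(0.4, 0.4, 0.2)):
    """Rerank beam-search candidates by a weighted humor-aware score.
    Each candidate is (text, creativity, phonetic_fidelity,
    cultural_relevance) with component scores in [0, 1]."""
    w_c, w_p, w_r = weights

    def score(cand):
        _, creativity, phonetic, cultural = cand
        return w_c * creativity + w_p * phonetic + w_r * cultural

    return sorted(candidates, key=score, reverse=True)
```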
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>We evaluated each task’s model performance using both standard NLP metrics and task-specific human
evaluation scores. The results affirm our models’ ability to capture and retain humor-related elements
across retrieval and translation tasks. Below is a detailed breakdown:</p>
      <sec id="sec-4-1">
        <title>4.1. Task 1: Humor-aware Information Retrieval</title>
        <p>To assess the performance of our humor-aware information retrieval system, we utilized Precision
and Mean Average Precision (MAP) as the main evaluation metrics. The Sentence-BERT (SBERT)
model achieved a MAP score of 0.1507, demonstrating its effectiveness in capturing nuanced semantic
relationships in humor-centric queries. Compared to conventional models like TF-IDF and BM25,
SBERT consistently outperformed them in identifying and ranking humor-relevant documents. In
addition to the quantitative analysis, we carried out a human evaluation where participants reviewed
the top 10 retrieved results. Feedback indicated that SBERT’s outputs were perceived to be around
30% more humorous and contextually suitable than those from BM25. This reinforces SBERT’s
capability to align closely with human interpretations of humor. Furthermore, as illustrated in Figure 2,
the distribution of similarity scores reveals a concentration near zero, suggesting that only a limited
number of document-query pairs exhibit high relevance. This underscores the inherent challenge in
retrieving humor-aligned content in Task 1.</p>
        <p>The SBERT model used in Task 1 effectively captured semantic relationships relevant to humor,
achieving a MAP score of 0.1507. It outperformed traditional retrieval approaches by producing more
contextually relevant and amusing results, as supported by precision scores at multiple cutoffs.
Explanation of Task 1 Metrics:
• MAP (Mean Average Precision): Measures how well the top-ranked documents match the
relevant ones across all queries. A score of 0.1507 indicates moderately good ranking quality.
• GM-MAP (Geometric Mean MAP): Penalizes poor-performing queries more severely. The low
score (5.52%) reflects difficulty in some humor-based queries.
• R-Precision: Precision at the number of relevant documents for a query. At 19.44%, it shows the
accuracy over full recall.
• Reciprocal Rank: Measures how early the first relevant document appears. A high score of
56.94% suggests most queries had a relevant result early.
• P@k (Precision@k): The proportion of relevant documents in the top-k results. For example,
P@5 = 28.70% means about 1.4 relevant docs appear in the top 5.
• NDCG@5 (Normalized Discounted Cumulative Gain): Evaluates ranked retrieval by
considering the position of relevant documents. A score of 33.46% indicates a decent alignment with
relevance ordering.</p>
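The metrics above can be computed from a ranked relevance list with the standard formulas; this is a textbook sketch, not the official trec_eval implementation used by the organizers.

```python
import math

def precision_at_k(rels, k):
    """rels: binary relevance of the ranked list, best first."""
    return sum(rels[:k]) / k

def average_precision(rels):
    """Mean of precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def reciprocal_rank(rels):
    """1 / rank of the first relevant document (0 if none)."""
    for i, r in enumerate(rels, start=1):
        if r:
            return 1.0 / i
    return 0.0

def ndcg_at_k(gains, k):
    """gains: graded relevance (e.g. 0-3) in ranked order."""
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg else 0.0
```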
      </sec>
      <sec id="sec-4-2">
        <title>Official Task 1 Results (English Test Set)</title>
        <p>Our system Cryptix_SBERT was officially evaluated on the English test set for Task 1 by the JOKER CLEF 2025 organizers. The system retrieved 207,000
documents across 207 queries, with a total of 5,995 relevant documents. It achieved a Mean Average
Precision (MAP) of 0.1507 and a reciprocal rank of 0.5693. Precision at different cutoffs and the NDCG@5
score are summarized in Table 2.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Task 1 Results (English Test Set - Without Duplicates)</title>
        <p>The run Cryptix_SBERT was also evaluated on the deduplicated English test set for Task 1. The system retrieved 207,000 documents with
1,914 relevant documents retrieved. Performance was consistent with the full test set, achieving a MAP
of 15.07%, R-Precision of 19.44%, and a reciprocal rank of 56.94%. Detailed evaluation metrics are shown
in Table 3.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.2. Task 2: Wordplay Translation</title>
        <p>For Task 2, we evaluated the performance of our wordplay translation system using BLEU score, Humor
Retention Rate (HRR), and qualitative feedback from human annotators. The MarianMT model achieved
a BLEU score of 39.37, reflecting strong fluency in translation. However, since humor is context-sensitive
and culturally nuanced, BLEU alone was insufficient. HRR, validated using bilingual annotators on a
sample of 500 translations, provided a more reliable measure of humor preservation. Despite its strong
performance, MarianMT occasionally faltered on nested puns and culturally embedded jokes. These
issues were addressed through IOB tagging and refinements in attention mechanisms. As shown in
Figure 3, most translated English sentences had word counts concentrated between 10 and 20, indicating
manageable sentence complexity. A summary of model performance metrics is provided in Table 4.
Explanation of Task 2 BLEU Metrics:
• BLEU Score: Evaluates n-gram overlap between system and reference translations. A score of
39.50 reflects strong fluency.
• BLEU Precision (1–4): Measures the percentage of 1- to 4-word sequences that match. Higher
values mean better phrase-level alignment.
• Brevity Penalty (BP): Rewards length closeness to reference. A score of 1 indicates perfect
length match.
• Length Ratio: Ratio of system output length to reference. A value of 1.007 confirms the
translations were neither too short nor too long.</p>
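The relationship between these reported components can be reproduced with the standard corpus-BLEU formula (geometric mean of the four n-gram precisions times the brevity penalty); the sketch below is illustrative and not the official evaluation script.

```python
import math

def bleu(precisions, sys_len, ref_len):
    """Corpus BLEU from modified n-gram precisions (n = 1..4, given in
    percent) and the system/reference token counts."""
    if min(precisions) == 0:
        return 0.0  # any zero n-gram precision zeroes the geometric mean
    log_avg = sum(math.log(p / 100.0) for p in precisions) / len(precisions)
    # Brevity penalty: 1 when output is at least reference length,
    # exponential penalty when shorter.
    bp = 1.0 if sys_len >= ref_len else math.exp(1.0 - ref_len / sys_len)
    return 100.0 * bp * math.exp(log_avg)
```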
        <p>Task 2 Results (Test Set) For Task 2, our system Cryptix was evaluated on the official test set
released by the JOKER CLEF 2025 organizers. The model achieved a BLEU score of 39.50, indicating
high fluency and alignment with reference translations. BLEU precision scores across n-grams showed
consistent retention of meaning, with perfect brevity penalty (BP = 1), suggesting a strong length match
between system and reference outputs. Detailed BLEU components are summarized in Table 5.
Task 2 Results (Test Set – BERTScore) In addition to BLEU evaluation, our submission Cryptix
was assessed using BERTScore to better capture the semantic similarity of the translations. The model
achieved a precision of 87.34%, recall of 86.95%, and an F1 score of 87.12%, reflecting a high degree
of semantic alignment between the generated and reference texts. These results are summarized in
Table 6.
Explanation of Task 2 BERTScore Metrics:
• Precision: Measures how much of the generated content semantically overlaps with the reference.
87.34% indicates strong alignment.
• Recall: Captures how much of the reference meaning is present in the output. 86.95% shows
broad coverage.
• F1 Score: Harmonic mean of precision and recall. An 87.12% F1 indicates well-balanced semantic
accuracy.</p>
        <p>Task 2 Results (Test Set – Updated with Pun Location Evaluation) Following the final evaluation
release by the JOKER CLEF 2025 organizers, our system Cryptix was assessed on an updated test
set containing 1,682 instances. A new metric — pun location accuracy — was introduced to measure
whether the translated text retained or reflected the position of the pun from the source. Our model
successfully aligned puns in 113 cases, achieving a pun location match rate of 6.72%. Additionally, slight
improvements were observed in previously reported BLEU and BERTScore values due to corrections in
the reference set. Table 7 summarizes the updated results.
Explanation of Updated Task 2 Metrics:
• Pun Location Accuracy: Percentage of translations where the pun appeared in the same relative
position as the source sentence. A score of 6.72% shows that this aspect remains a challenging
area for machine translation systems.
• BLEU / BERTScore (Updated): Slight improvements were recorded due to the removal of some
faulty references. These metrics now better reflect actual translation quality.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.3. Task 3: Onomastic Wordplay Translation</title>
        <p>
          For Task 3, we assessed the effectiveness of our onomastic wordplay translation system using BERTScore
and Human Pun Recognition (HPR). The T5 model [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] achieved a BERTScore of 0.1419, reflecting its
ability to preserve semantic similarity during name translation. Thanks to its generative capacity, T5
was able to produce culturally adaptive and humorous variants of names, often aligning with local
linguistic patterns. To validate this, human judges were asked to determine whether the translated
names retained their intended pun or humor in context. The results confirmed that T5 captured a
substantial portion of the intended humor. As illustrated in Figure 4, most English inputs in this task
were very short—typically one or two words—emphasizing the difficulty of preserving humor within
minimal lexical content.
        </p>
        <p>Task 3 involved translating names containing puns, requiring creative language generation. The T5
model produced culturally meaningful outputs with a BERTScore of 0.1419 and was positively rated for
pun retention by 70% of human judges.
Task 3 Results (Test Set) The final evaluation for Task 3 included both automatic and
manual assessments across 204 English source instances containing onomastic wordplay. Our system
Cryptix_task_3_flanT5 achieved an exact match score of 14.49% after normalization. Additionally,
38.15% of the outputs included a direct copy of the English source, and 13.43% of the translations were
judged as humor-preserving by human annotators. These results reflect the difficulty of cultural pun
adaptation and are summarized in Table 9.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The JOKER CLEF 2025 tasks each posed distinct challenges related to humor comprehension, cultural
nuance, and linguistic inventiveness. To address these, we employed SBERT, MarianMT, and T5 models,
fine-tuned specifically for their respective objectives. This approach allowed us to develop systems that
not only achieved strong quantitative performance but also preserved the humor and contextual intent
of the inputs. The results demonstrate the capability of modern transformer-based architectures to
manage culturally sensitive and semantically complex NLP tasks. Moreover, their ability to generalize
across varied inputs makes them robust baselines for advancing humor-aware language systems.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Declaration on Generative AI Use</title>
      <p>During the preparation of this work, the authors utilized generative AI tools, namely ChatGPT and
Grammarly, exclusively for grammar correction and language enhancement. All scientific content,
analyses, interpretations, and conclusions were independently developed by the authors, who assume
full responsibility for the originality and integrity of this publication, in accordance with the CEUR-WS
policy on the use of generative AI technologies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          , A.-G. Bosser,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <article-title>Clef 2025 joker lab: Humour in the machine</article-title>
          , in: C.
          <string-name>
            <surname>Hauf</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Jannach</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Kazai</surname>
            ,
            <given-names>F. M.</given-names>
          </string-name>
          <string-name>
            <surname>Nardini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Pinelli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Silvestri</surname>
          </string-name>
          , N. Tonellotto (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>389</fpage>
          -
          <lpage>397</lpage>
          . doi:https://doi.org/10.1007/978-3-031-88720-8_59.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Toraman</surname>
          </string-name>
          ,
          <article-title>Intent classification based on deep learning language model in turkish dialog systems</article-title>
          ,
          <source>in: 2021 29th Signal Processing and Communications Applications Conference (SIU)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Beddiar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Jahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oussalah</surname>
          </string-name>
          ,
          <article-title>Data expansion using back translation and paraphrasing for hate speech detection</article-title>
          ,
          <source>Online Social Networks and Media</source>
          <volume>24</volume>
          (
          <year>2021</year>
          )
          <fpage>100153</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Vasantharajan</surname>
          </string-name>
          , U. Thayasivam,
          <article-title>Towards ofensive language identification for tamil code-mixed youtube comments and posts</article-title>
          ,
          <source>SN Computer Science</source>
          <volume>3</volume>
          (
          <year>2022</year>
          )
          <fpage>94</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Campos</surname>
          </string-name>
          , A.-G. Bosser,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF 2025 JOKER lab: Humour in machine</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>C. de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G. S.</given-names>
            <surname>de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF 2025)</source>
          , Lecture Notes in Computer Science, Springer, Cham, Switzerland,
          <year>2025</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rufino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Blin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ainouz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gasso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hérault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Meriaudeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Canu</surname>
          </string-name>
          ,
          <article-title>Physically-admissible polarimetric data augmentation for road-scene analysis</article-title>
          ,
          <source>Computer Vision and Image Understanding</source>
          <volume>222</volume>
          (
          <year>2022</year>
          )
          <fpage>103495</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <article-title>A novel approach to multilingual machine translation using hybrid deep learning</article-title>
          , in:
          <source>Second International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2023)</source>
          , volume
          <volume>12642</volume>
          , SPIE,
          <year>2023</year>
          , pp.
          <fpage>662</fpage>
          -
          <lpage>670</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Humor detection system for MuSe 2023: contextual modeling, pseudo labelling, and post-smoothing</article-title>
          ,
          <source>in: Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ameer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Sharif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Usman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Muzamil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hamza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jalal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Batyrshin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <article-title>Multilingual hope speech detection from tweets using transfer learning models</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>15</volume>
          (
          <year>2025</year>
          )
          <fpage>9005</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Bosser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>The JOKER corpus: English-French parallel data for multilingual wordplay recognition</article-title>
          ,
          <source>in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>2796</fpage>
          -
          <lpage>2806</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          , in:
          <string-name>
            <given-names>K.</given-names>
            <surname>Inui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          , Association for Computational Linguistics, Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          . URL: https://aclanthology.org/D19-1410/. doi:10.18653/v1/D19-1410.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rohit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gandheesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Sannala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Pati</surname>
          </string-name>
          ,
          <article-title>Comparative study on synthetic and natural error analysis with BART &amp; MarianMT</article-title>
          , in:
          <source>2024 IEEE 9th International Conference for Convergence in Technology (I2CT)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. Hernandez</given-names>
            <surname>Abrego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          , J. Ma,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models</article-title>
          , in:
          <string-name>
            <given-names>S.</given-names>
            <surname>Muresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Villavicencio</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2022</source>
          , Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>1864</fpage>
          -
          <lpage>1874</lpage>
          . URL: https://aclanthology.org/2022.findings-acl.146/. doi:10.18653/v1/2022.findings-acl.146.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>