<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparative Evaluation of Humour Translation from English to Spanish: A Study with BLOOM and Googletrans</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olga Popova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Cadiz</institution>
          ,
          <addr-line>9 Paseo Carlos III St., Cadiz, 11003</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>The problem of accurately translating wordplay in automatic humour analysis remains challenging, as highlighted by the CLEF 2024 JOKER Track. This problem is particularly interesting because humour is deeply cultural and context-dependent, making it difficult for language models to handle. Our study compares last year's and this year's results to determine whether there have been improvements in the automatic translation of wordplay using BLOOM and Googletrans. The findings indicate that while numerical metrics show high precision, recall, and F1 scores, the actual translation and conveyance of jokes' meanings still rely heavily on coincidence, underscoring the need for further training and enhancement of language models.</p>
      </abstract>
      <kwd-group>
        <kwd>Translation</kwd>
        <kwd>pun translation</kwd>
        <kwd>automatic translation</kwd>
        <kwd>CLEF2024</kwd>
        <kwd>JOKER</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Experimental Setup</title>
      <p>In this section, we give a brief description of the data provided for Task 3, the methods used
to solve it, and the evaluation methods.</p>
      <sec id="sec-2-1">
        <title>2.1. Data description</title>
        <p>The task provides an updated test set of English punning jokes; for French, training data is also
available. However, since the training data for Spanish is not needed for this experiment, we use only
the test data, which is provided in JSON format.</p>
        <p>[Table: test data (jokes) in English: 5,727 entries]</p>
        <p>As the table shows, we were provided with 5,727 jokes in English to translate. Each entry contains
an identifying number for the joke and, in the text_en column, a one- or two-sentence joke that
includes a pun.</p>
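        <p>Reading such a JSON test set takes only a few lines of Python. The sketch below parses a single sample entry; text_en is the field from the task data, while the id field name shown here is only an assumption.</p>

```python
import json

# One entry shaped like the task's test data; "text_en" holds the pun,
# while the "id_en" field name here is only illustrative.
sample = '[{"id_en": 1, "text_en": "Old electricians never die, they just keep plugging away."}]'

jokes = json.loads(sample)
texts = [entry["text_en"] for entry in jokes]
print(len(texts))  # the real test file holds 5,727 such entries
```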
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Method description</title>
        <p>As mentioned in the introduction, to carry out Task 3, we used BLOOM with two different prompts, as well as
Google Translate.</p>
        <p><bold>2.2.1. BLOOM</bold></p>
        <p>
          BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) is a
transformer-based language model [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Because the number of tokens BLOOM can process is limited, each run was performed
with only 100 puns.
        </p>
        <p>The prompts used are provided below.</p>
        <p>• Prompt 1:</p>
        <p>"Original: Diabetics should not be allowed to have sweet dreams.\n\
Translation: Los diabéticos no deberían tener dulces sueños.\n\
\n\
Original: I’m going to the guillotine at dawn and my wife has already collected my severance
pay.\n\
Translation: Al amanecer me van a pasar por la guillotina y mi mujer ya ha firmado la
separación.\n\
\n\
Original: After 5 years with the same chiropractor, I moved and had to change doctors. It
was quite an adjustment.\n\
Translation: Me mudé y tuve que buscar otros médicos después de estar cinco años con el
mismo quiropráctico. Fue un mero ajuste.\n\
\n\
Original: A scientist doing a large experiment with liquid chemicals was trying to solve a
problem when he fell in and became part of the solution.\n\
Translation: Un científico que hacía un gran experimento con productos químicos líquidos
estaba intentando solucionar un problema cuando cayó en que él se convertiría en parte
de la solución.\n\
\n\
Original: Old electricians never die, they just keep plugging away.\n\</p>
        <p>Translation:"</p>
        <p>• Prompt 2:</p>
        <p>
"Request: Translate from English to Spanish: Diabetics should not be allowed to have sweet
dreams.\n\
Answer: Los diabéticos no deberían permitirse soñar dulce (expresión figurativa para decir
que no deberían permitirse deseos excesivos o indulgencias).\n\
\n\
Request: I’m going to the guillotine at dawn and my wife has already collected my severance
pay.\n\
Answer: Me voy a la guillotina al amanecer y mi esposa ya ha recogido mi indemnización (
expresión figurativa para decir que me estoy preparando para un evento desafortunado y
mi esposa ya ha preparado los asuntos financieros para el futuro sin mí).\n\
\n\
Request: After 5 years with the same chiropractor, I moved and had to change doctors. It was
quite an adjustment.\n\</p>
        <p>Answer:"</p>
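        <p>Few-shot prompts in the layout above can be assembled programmatically. The sketch below rebuilds the Prompt 1 format from (source, translation) example pairs; only one pair is included, and the actual call to BLOOM (e.g. through a hosted inference endpoint) is outside this sketch.</p>

```python
# Assemble a BLOOM few-shot prompt in the "Original:/Translation:" layout of
# Prompt 1. Only one example pair is shown; querying BLOOM itself is omitted.
EXAMPLES = [
    ("Diabetics should not be allowed to have sweet dreams.",
     "Los diabéticos no deberían tener dulces sueños."),
]

def build_prompt(joke, examples=EXAMPLES):
    shots = "\n\n".join(
        f"Original: {src}\nTranslation: {tgt}" for src, tgt in examples
    )
    # The trailing "Translation:" cue asks the model to continue in Spanish.
    return f"{shots}\n\nOriginal: {joke}\nTranslation:"

prompt = build_prompt("Old electricians never die, they just keep plugging away.")
```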
        <p>It is curious that when we created the second prompt, BLOOM itself, along with the Spanish translation
in brackets, provided an explanation of the wordplay:
• ’expresión figurativa para decir que no deberían permitirse deseos excesivos o indulgencias’ =
’figurative expression to say that excessive desires or indulgences should not be allowed’
• ’expresión figurativa para decir que me estoy preparando para un evento desafortunado y mi
esposa ya ha preparado los asuntos financieros para el futuro sin mí’ = ’figurative expression to
say that I am preparing for an unfortunate event and my wife has already prepared the financial
affairs for the future without me’</p>
        <p>Thus, it appears that BLOOM attempted to apply a sort of translator’s explanatory note.</p>
        <p><bold>2.2.2. Googletrans</bold></p>
        <p>
          Googletrans is a free and unlimited Python library that implements the Google Translate API [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Since we used Google Colab for code execution, it was impossible to process the more than
5,000 examples in a single run. We were therefore compelled to divide the test data into four parts.
        </p>
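        <p>The four-part split described above can be sketched as follows; the chunking helper is illustrative, and the translation call uses the Translator class documented by the googletrans package.</p>

```python
def split_into_parts(items, n_parts=4):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size = -(-len(items) // n_parts)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def translate_part(jokes, src="en", dest="es"):
    """Translate one chunk; requires the googletrans package (deferred import)."""
    from googletrans import Translator
    translator = Translator()
    return [translator.translate(j, src=src, dest=dest).text for j in jokes]

parts = split_into_parts(list(range(5727)))  # four runs of roughly 1,432 jokes each
```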
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Evaluation methods description</title>
        <p>
          The task organizers used two methods for evaluating translations: BLEU and BERT Score [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          BLEU (BiLingual Evaluation Understudy) measures the lexical similarity between a candidate
translation and a reference translation [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The organizers utilized the sacreBLEU implementation with the
default tokenizer 13a, which emulates the mteval-v13a script from Moses [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Their report includes the
overall BLEU score (a geometric mean of n-gram precisions) and the individual n-gram precisions.
        </p>
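        <p>For intuition about what the BLEU score aggregates, the sketch below computes clipped n-gram precisions and their geometric mean in pure Python; tokenization and the brevity penalty, which sacreBLEU handles in practice, are omitted.</p>

```python
import math
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a tokenized candidate vs. one reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

def bleu_core(candidate, reference, max_n=4):
    """Geometric mean of the 1..max_n clipped precisions (0 if any is 0)."""
    ps = [ngram_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(ps) == 0.0:
        return 0.0
    return math.exp(sum(math.log(p) for p in ps) / max_n)
```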
        <p>
          BERT Score, obtained from the Python bert-score package [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], presents mean values of precision,
recall, and F1 scores.
        </p>
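        <p>To illustrate how BERT Score turns token-level similarities into the reported precision, recall, and F1, here is a small sketch over a made-up similarity matrix; the real package computes these similarities from BERT embeddings.</p>

```python
def bertscore_from_sims(sim):
    """Aggregate a candidate-by-reference token similarity matrix into
    BERTScore-style precision, recall, and F1 via greedy (max) matching."""
    n_cand, n_ref = len(sim), len(sim[0])
    precision = sum(max(row) for row in sim) / n_cand
    recall = sum(max(sim[i][j] for i in range(n_cand)) for j in range(n_ref)) / n_ref
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy 2x2 similarity matrix (hypothetical values, not real embeddings).
p, r, f1 = bertscore_from_sims([[1.0, 0.2], [0.1, 0.9]])
```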
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <p>We divided the results section into two parts: numerical results and linguistic results. In the first
subsection, we examine the evaluation metrics provided by the organizers, while in the second, we
discuss some examples of translations produced using each method.</p>
      <sec id="sec-3-1">
        <title>3.1. Numerical results</title>
        <p>First of all, we retrieved the table of results from CLEF 2023 for Task 3 with BLOOM and Googletrans.</p>
        <p>The results for BLOOM using prompt 1 remain the same for both BLEU and BERT score in CLEF 2023
and CLEF 2024. Googletrans shows superior metrics, but this advantage should be viewed with caution
due to the larger number of evaluations. BLOOM_2 performs slightly better than BLOOM_1 in the few
evaluations conducted, but both methods need more evaluations to allow a more reliable analysis.
For a fairer and more accurate comparison, the number of evaluated examples should be equalized
across all methods. With a larger amount of data, we might see a reduction in the performance gap
between Googletrans and the BLOOM methods. Similarly, we observe a slight change in the BLEU
metrics for Googletrans between CLEF 2023 and CLEF 2024; this change should be analyzed
qualitatively by comparing the translations.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Linguistic results</title>
        <p>Next, we will analyze some examples of the translations produced by BLOOM and Googletrans in 2023
and 2024.</p>
        <p>If we compare the translations produced by Googletrans in 2023 and 2024, we can observe that the
vast majority of jokes are translated identically. In Example 1 of Table 7, the only difference is in the use
of capitalization. We can infer that the differences in BLEU metrics are due to the fact that this year’s
test data does not include all the same jokes as last year’s.</p>
        <p>Regarding the translations done by BLOOM, the outputs from 2023 and this year using prompt 1 are
identical. In some cases, the joke is translated understandably, while in others, it is not. Concerning the
translations with prompt 2, we observe that the target texts are more literal than those from prompt
1. While BLOOM_1 attempts to adapt the vocabulary to make it more natural, BLOOM_2 performs a
highly literal translation.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Conclusions</title>
      <p>The numerical results, which are quite high in terms of precision, recall, and F1, suggest that the
translations of the jokes are well-executed. However, when we analyze the results from a translational
and linguistic perspective, we observe that successfully translating and conveying a joke’s meaning
is largely a matter of coincidence. Although jokes are a cultural matter and usually require adaptation
in translation, some jokes can be understood when translated literally, which explains some of the good
translations in this task. By comparing the numerical and linguistic results, we can conclude that the
more literal the translation, the higher the metrics. Therefore, language models still require
significant training and improvement to translate jokes accurately in a "conscious" manner.</p>
      <p>Future work could involve analyzing other language models, such as GPT, as it continues to improve
and might offer different results. Additionally, exploring the translation of other types of jokes and
wordplay, and attempting to train models on high-quality translations of jokes, could be beneficial.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgments</title>
      <p>I would like to thank the organizers of CLEF 2024 in general and the organizers of JOKER in particular
for providing us with the opportunity to continually improve our research, learn, and develop. Above
all, I would like to express my gratitude to Professor Liana Ermakova for her invaluable advice and
great support throughout the execution of this task.</p>
      <p>This project has received a government grant managed by the National Research Agency under the
program “Investissements d’avenir” integrated into France 2030, with the Reference ANR-19-GURE-0001.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Bosser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. M. P.</given-names>
            <surname>Preciado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          ,
          <article-title>Overview of CLEF 2024 JOKER track on automatic humor analysis</article-title>
          , in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. M. D. Nunzio, P. Galuščáková, A. G. S. de Herrera, G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), LNCS, Springer-Verlag,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Popova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dadic</surname>
          </string-name>
          ,
          <article-title>Does AI have a sense of humor? CLEF 2023 JOKER tasks 1, 2 and 3: Using BLOOM, GPT, SimpleT5, and more for pun detection, location, interpretation and translation</article-title>
          ,
          <source>in: CLEF (Working Notes)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1888</fpage>
          -
          <lpage>1908</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>BigScience Workshop</surname>
          </string-name>
          ,
          <source>Bloom (revision 4ab0472)</source>
          ,
          <year>2022</year>
          . URL: https://huggingface.co/bigscience/bloom. doi:10.57967/hf/0003.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <source>Googletrans 3.0.0</source>
          . URL: https://pypi.org/project/googletrans/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Papineni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roukos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ward</surname>
          </string-name>
          , W. Zhu,
          <article-title>BLEU: a method for automatic evaluation of machine translation</article-title>
          ,
          <source>in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          . URL: https://www.aclweb.org/anthology/P02-1040. doi:10.3115/1073083.1073135.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Post</surname>
          </string-name>
          ,
          <article-title>A call for clarity in reporting BLEU scores</article-title>
          ,
          <source>in: Proceedings of the Third Conference on Machine Translation: Research Papers</source>
          , Association for Computational Linguistics, Belgium, Brussels,
          <year>2018</year>
          , pp.
          <fpage>186</fpage>
          -
          <lpage>191</lpage>
          . URL: https://www.aclweb.org/anthology/W18-6319.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kishore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Artzi</surname>
          </string-name>
          ,
          <article-title>BERTScore: Evaluating text generation with BERT</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2020</year>
          . URL: https://openreview.net/forum?id=SkeHuCVFDr.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>