Overview of the CLEF 2022 JOKER Task 2: Translate Wordplay in Named Entities

Liana Ermakova¹٫², Tristan Miller³, Julien Boccou¹, Albin Digue¹, Aurianne Damoy¹ and Paul Campen¹
¹ Université de Bretagne Occidentale, HCTI, 29200 Brest, France
² Maison des sciences de l'homme en Bretagne, Rennes, France
³ Austrian Research Institute for Artificial Intelligence, Vienna, Austria

Abstract
Onomastic wordplay has been widely used as a rhetorical device by novelists, poets, and playwrights, from character names in Shakespeare and other classic literature to named entities in Pokémon, Harry Potter, Asterix, and video games. The translation of such wordplay is problematic both for humans and algorithms due to its ambiguity and unorthodox morphology. In this paper, we present an overview of Pilot Task 2 of the JOKER@CLEF 2022 track, where participants had to translate wordplay in named entities from English into French. For this, we constructed a parallel corpus of wordplay in named entities from movies, video games, advertising slogans, literature, etc. Five teams participated in the task. The methods employed by participants were based on state-of-the-art transformer models, which have the advantage of subword tokenisation. The participants' models were pre-trained on large corpora and fine-tuned on the JOKER training set. We observed that in many cases the models provided the exact official translations, suggesting that they were pre-trained on corpora containing the source texts used in the JOKER corpus. Those translations that differed from the official ones only rarely contained wordplay.

Keywords
wordplay, computational humour, named entities, neologisms, machine translation, deep learning, transformers

CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
E-mail: liana.ermakova@univ-brest.fr (L. Ermakova); tristan.miller@ofai.at (T. Miller)
Web: https://www.joker-project.com/ (L. Ermakova); https://logological.org/ (T. Miller)
ORCID: 0000-0002-7598-7474 (L. Ermakova); 0000-0002-0749-1100 (T. Miller)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Wordplay is often used for its attention-getting or mnemonic qualities in headlines, toponyms, company names, and advertising. Onomastic (i.e., name-related) wordplay has been widely used as a rhetorical device by novelists, poets, and playwrights. It is widespread in classic literature [1], such as in the names of Shakespeare's characters [2], but also in names found in modern-day works such as Pokémon, Harry Potter, Asterix, and video games. Proper nouns with an extra semantic load are used as a meaningful element in literary texts and can be considered as wordplay [3]. The translation of such literary names is problematic, raising the questions of whether the transposition of such names into a given target language is technically possible and, if so, what method might be appropriate for producing them [3]. Generally speaking, these playful designations in fictional universes can be viewed as named entities. Named entities are objects, abstract or physical (such as a person, location, organisation, product, etc.), that can be denoted with a proper name. Traditional approaches to translating named entities include transliteration [4, 5] or keeping them unchanged in the target text.
However, these approaches do not typically preserve wordplay (or at least, not in a way which may be meaningful for the target audience), even though such wordplay can be crucial for understanding the pragmatics of the text.

Meaningful proper names in fictional universes are often neologisms. Neologisms – that is, newly coined words – are among the most common forms of linguistic creativity, and are a particular focus of the JOKER project. Subcultures and rapidly advancing technical fields are common sources of neologisms, some of which may eventually come into regular or even widespread use (in processes known as lexicalisation, institutionalisation, and entrenchment). But neologisms can also be ephemeral, being invented for use in a specific discourse and not intended for general adoption. Such neologisms, known as nonce words or occasionalisms, are a recurrent feature of literature, advertisements, and journalism, where they are often used for playful or humorous purposes [6, 7, 8]. Due to their highly idiosyncratic nature, neologisms – particularly humorous ones – are challenging for both humans and machines to translate [7]. Machine translation systems in particular generally fail to recognise the deliberate ambiguity or unorthodox morphology of neologisms, leaving such terms untranslated or else translating them in ways that lose the humorous aspect [9].

The goal of the JOKER@CLEF 2022 workshop was to bring together translators, linguists, and computer scientists in order to bring us a step closer to the automation of multilingual wordplay analysis and translation. In this paper, we present the workshop's Pilot Task 2, where participants had to translate wordplay in named entities from English into French. Details on other related shared tasks on wordplay translation are covered in our workshop overview paper [10].

2. Related work

Fortunately, many of the basic processes through which neologisms are formed and employed are well understood in linguistics [11, 8]. These include semantic shifting (imbuing an existing word with a new meaning), morphological derivation (adding a prefix or suffix to an existing word), compounding (combining whole stems), blending (combining fragments of words), clipping (truncating an existing word), analogy (formation based on a prototype word or schema), and creation ex nihilo (extramorphological invention of entirely new roots). Recent linguistic scholarship has sought to deepen this understanding: there have been studies, for example, on the discourse cues used to signal nonce words [12]; on the linguistic, extra-linguistic, and contextual knowledge required for interpreting neologisms [13]; on the sublexical features preferentially modified in nonce formations [14]; and on the specific morphological and analogical word formation processes strongly associated with humour [14, 8]. Word formation processes are also known in the field of translation studies, which has documented strategies for analysing and translating neologisms [7] and made case studies of translated neologisms as a source of humour [15]. Despite this, working translators evince a marked reluctance to neologise, overwhelmingly opting to replace source-language coinages with lexically conventional options, or else eliminating them altogether [16].
The reasons for this remain unexplored; on the one hand, this could be an unconscious tendency of translators to normalise language, but it could also be down to a lack of awareness of finer-grained neological processes, or to a lack of the time and effort required to creatively apply them.

A common approach to lightening the translator's workload is to employ language technology, often in the form of machine translation. However, nearly all MT systems rely on lexical resources such as dictionaries and parallel text corpora, which cannot be expected to contain completely novel words; this essentially makes translating recent neologisms out of scope for conventional machine translation. Surprisingly, this problem has been directly addressed only rarely in the MT literature, and even then only for highly restricted classes of morphologically derived neologisms [17, 18]. However, there does exist some research involving natural language processing (NLP) and neologisms, though not necessarily carried out with MT in mind, which we hypothesise could nonetheless be adapted for or inform the design of a future machine or machine-assisted translation system. This work includes methods for automatically identifying the source words of blends [19], for generating orthographically plausible cognates in a target language [20], and for predicting the location of neologisms in a word embedding space [21]. There is even a small measure of tangentially relevant work in computational humour: [22] attempt to detect "marketing blunders", where a serious neologism in one language inadvertently resembles a humorous or otherwise inappropriate term in another language; [23] show that ex nihilo neologisms can induce consistent semantic intuitions, including humour; [24] present a technique for generating humorous neologisms conditioned on user-specified properties; and our own past work [9] focuses on computer-mediated translation of puns, which sometimes employs blending and other mechanisms also used in neologisms.

Some prior work does tackle named entity recognition in fiction [25, 26] or its optimisation for machine translation [27]. The Named Entity Transliteration Shared Task was held at the 7th Named Entities Workshop in 2018 [4, 5]. In [28], a method is proposed that fuses bilingual named entity class translation based on a chunk symmetry strategy with a machine learning–based English–Chinese transliteration model. A combination of standard entity masking techniques and a semantic equivalent placeholder was proposed in [29]. Conventional neural machine translation models struggle to translate words with multiple meanings, because of their high ambiguity, as well as compound words, because of their morphology [30]. To deal with these problems, the use of named entity tags together with a chunk-level LSTM layer over a word-level LSTM layer was proposed [30]. However, this approach is not suitable for humorous neologisms, as it does not aim to reproduce wordplay in the target language.

Transfer learning has become a widely applied technique in many NLP tasks. The term describes models that are first pre-trained on a large corpus and then fine-tuned on a downstream task. The state-of-the-art large pre-trained models, such as T5 [31] or GPT-3 [32], convert all text-based language problems into a text-to-text format. The original transformer architecture was designed to eliminate recurrence and convolutions in favour of attention mechanisms alone [33].
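To make the text-to-text framing concrete, the following minimal sketch shows how translation can be expressed as plain conditional text generation with a task prefix, using the Hugging Face transformers API and the publicly available t5-small checkpoint. This is an illustration of the paradigm under those assumptions, not the configuration of any particular participant.

# Minimal sketch of the text-to-text paradigm: the task is encoded in the input
# string itself, and the model simply generates target-language text.
# Assumes the public "t5-small" checkpoint; the source term is an example
# taken from the JOKER training data.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

source = "translate English to French: Ambipom"  # task prefix + source term
inputs = tokenizer(source, return_tensors="pt")

# No translation-specific architecture is involved: generation conditioned on
# the prefixed input is the whole mechanism.
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Fine-tuning such a model on the JOKER pairs (as most participants did) amounts to continuing training on inputs and outputs formatted exactly like the strings above.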
The state-of-the-art transformer models make use of subword tokenisers, such as Byte-Pair Encoding (BPE) [34] and WordPiece [35]. BPE relies on a pre-tokeniser that splits the training data into words and is used in the GPT-2 [36] and RoBERTa [37] models. WordPiece is a similar subword tokenisation algorithm used in BERT [33], DistilBERT [38], and ELECTRA [39]. Although these methods are comparatively shallow, they have shown promise for the related use case of languages with large vocabularies and many rare words [40, 34]. However, they too are not specifically equipped for wordplay, ignoring ambiguity and humorous intention.

3. Data

We constructed a parallel corpus of wordplay in named entities in English and French. We collected 1 398 distinct named entities in English containing wordplay from video games, advertising slogans, literature, and other sources [41, 10], along with 1 450 translations into French. The vast majority of the translations were the official, published ones. Some alternative translations were provided by Master's students in translation, all native speakers of French.

Statistics on wordplay in the named entities are given in Table 1. The vast majority of the wordplay instances are portmanteau words – i.e., words formed by merging the sounds and meanings of two different words. The table employs a traditional classification of wordplay as, even though its usage is problematic,¹ it is better known and thus may give a clearer idea of the data we have.

¹ For example, the category neologism tends to overlap with other categories.

Table 1
Statistics of wordplay in named entities

Category                 English  French
portmanteau                  909     984
pun/homophone                298     291
no manipulation              104     108
neologism                     53      42
assonance/alliteration        24      17
anagram                        8       6

3.1. Training data

The training dataset contains 1 161 instances of wordplay in the form of translated pairs. The data was provided to participants as a JSON file (or a CSV file for manual runs) with fields denoting the instance's unique ID (id), the source text in English (en), and the target text in French (fr). For example:

[
  {
    "id": "noun_1",
    "en": "Ambipom",
    "fr": "Capidextre"
  }
]

3.2. Test data

We used a further 284 wordplay instances for the test data. The data format was identical to that of the training data, except that the field for the target text was omitted. Example:

[
  {
    "id": "noun_1185",
    "en": "Fungun"
  }
]

The expected output format was identical to that of the training data, but with the addition of the fields RUN_ID and MANUAL. The RUN_ID field value uniquely identifies a given run and is formed of the team ID (as registered on the CLEF website), followed by the task ID (in this pilot task, always task_2), followed by the run number. The MANUAL field value can be either 1 (indicating a manual translation run) or 0 (indicating a machine translation run). Example:

[
  {
    "RUN_ID": "OFFICIAL_task_2_run1",
    "MANUAL": 1,
    "id": "noun_1",
    "en": "Ambipom",
    "fr": "Capidextre"
  }
]

4. Evaluation metrics

For the wordplay translation tasks (Tasks 2 and 3), there do not yet exist any accepted metrics of translation quality [41, 10]. MT quality is traditionally measured with the BLEU (Bilingual Evaluation Understudy) metric, which calculates vocabulary overlap between the candidate translation and a reference translation [42]. However, this metric is clearly inappropriate for single-term wordplay translation evaluation, as overlap measures operate only on larger text spans and not on individual words, the morphological analysis of which can be crucial for neologisms [41, 10].
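To illustrate the point, the snippet below (not part of the official evaluation) scores single-term candidates with NLTK's sentence-level BLEU restricted to unigram overlap: on one-word outputs the score collapses to an exact-match check, so a creative but non-identical translation receives no credit at all. The non-matching candidate is a hypothetical example; the reference is the official translation from the training data.

# Illustration only: overlap-based scoring degenerates to exact string matching
# when the "text" is a single term.
from nltk.translate.bleu_score import sentence_bleu

reference = [["Capidextre"]]  # official French translation of "Ambipom"

# Exact match with the official translation: score 1.0
print(sentence_bleu(reference, ["Capidextre"], weights=(1.0,)))

# Hypothetical alternative rendering: zero overlap, so score 0.0,
# regardless of whether the wordplay was preserved.
print(sentence_bleu(reference, ["Ambidextre"], weights=(1.0,)))

This is why we instead rely on the exact-match-style counts and the manual criteria described next.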
We hypothesised that the majority of proper nouns would not be translated automatically, so we compared the target translation with the source wordplay (metric not translated). As our dataset for Task 2 contains "official" translations of wordplay instances coming from various published sources (e.g., Pokémon names), we also tried filtering out these official translations (metric official).

We manually evaluated the non-official translations according to the following metrics:

• lexical field preservation: A value of true is assigned to translations that preserve the lexical field of the source wordplay (i.e., the translation is close to a literal one).
• sense preservation: A value of true is assigned to translations that preserve the meaning of the source wordplay.
• comprehensible terms: A value of true is assigned to translations that do not rely on specialised terminology.
• wordplay form: A value of true is assigned to translations that employ (as opposed to omit) wordplay.

5. Methods used by the participants

Five teams participated in Pilot Task 2: FAST_MT [43], TEAM_JOKER [44], Cecilia [45], Agnieszka, and Lea_T5 [46]. Four of these teams submitted a total of four runs. Lea_T5 worked on the Task 2 data and submitted a paper, but no run for the test set. TEAM_JOKER submitted a run and wrote a blog post, but did not contribute an article to the CLEF Working Notes proceedings. Agnieszka submitted a run without a paper, though the team did notify the JOKER organisers of the method they used.

Three of the five teams (Cecilia, Agnieszka, and Lea_T5) used the SimpleT5 library² for Google's T5 (Text-To-Text Transfer Transformer) model, which is based on transfer learning with a unified text-to-text transformer [31]. TEAM_JOKER also fine-tuned Google's T5 on the available examples, for 18 epochs, with 80% of the data in the training set and 20% in the validation set. The only preprocessing they used was to prefix each data point in English with "translate English to French:". The teams that used the SimpleT5 library trained their models for far fewer epochs, as they observed overfitting after three epochs.

² https://github.com/Shivanandroy/simpleT5

The team FAST_MT also applied transformers [43]. They mapped the task of translating single terms containing wordplay to the problem of question answering on texts extracted from the open-source parallel corpus OPUS [47]. They generated the context for each English–French noun pair by pulling from OPUS English–French parallel sentence pairs that contain at least one English noun in the English version. Then a transformer-based model [48] from Hugging Face³ was trained to predict the location of the corresponding French translation in the related contexts, given an English noun as a query.

³ https://huggingface.co/
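To give a flavour of this question-answering framing, the sketch below uses the Hugging Face question-answering pipeline to extract a candidate French equivalent from a small context. The checkpoint and the two-sentence context are placeholders chosen purely for illustration; they are not the model or the OPUS contexts that FAST_MT actually used, and a model fine-tuned on English–French parallel contexts would be needed for useful output.

# Sketch of the extractive-QA framing: the English term acts as the query, the
# context is assembled from parallel sentences, and the QA model predicts the
# span that holds the French equivalent. Model and context are placeholders.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")  # generic QA checkpoint

# Hypothetical English-French sentence pair mentioning the source term and its
# official translation (Wimpod / Sovkipou, from the JOKER data).
context = ("Wimpod flees as soon as it senses danger. "
           "Sovkipou s'enfuit dès qu'il sent le danger.")

result = qa(question="Wimpod", context=context)  # English noun used as the query
print(result["answer"], result["score"])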
6. Results

Our initial guess was that the majority of proper nouns would not be translated by machine translation. However, as our dataset contained officially translated named entities (e.g., from Pokémon) that may have been discoverable by participants and large pre-trained models, all participants translated all wordplay instances. The results from Table 2 suggest that the majority of the translated named entities were indeed the official translations. TEAM_JOKER [44], however, provided very interesting results, with almost half of its translations being non-official. Among these, twelve translations were judged as being wordplay. We can also see that, among the non-official translations, less than 10% are successful in terms of preserving wordplay.

Table 2
Scores of participants' runs for Pilot Task 2

                            FAST_MT  TEAM_JOKER  Cecilia  Agnieszka
total                           284         284      284        242
not translated                    0           0        0          0
official                        250         159      216        230
non-official                     34         125       68         12
lexical field preservation       16          13        5          0
sense preservation               13          11        5          0
comprehensible terms             26          59       16          2
wordplay form                     3          12        3          1

As is evident from Table 3, the majority of non-official translations containing wordplay are accidental, although we observe the francisation of English terms. We provide interpretations in the language of the corresponding wordplay. (For details of the interpretation annotation, refer to our workshop overview paper [10].) Almost all wordplay in this list are portmanteaux – i.e., words formed by merging the sounds and meanings of two different words.

7. Conclusion

While wordplay in named entities has been widely used as a literary device, its translation is problematic both for humans and algorithms due to its high ambiguity and unorthodox morphology. As a rule, standard named entities are transliterated from one language into another, as in the case of personal names, or kept unchanged to preserve trademarks. However, these approaches do not preserve wordplay, which can be crucial for understanding the pragmatics of a creative text.

In this paper, we presented an overview of Pilot Task 2 of the JOKER@CLEF 2022 track, where participants had to translate wordplay in named entities from English into French. We constructed a parallel corpus of wordplay in named entities from movies, video games, advertising slogans, literature, etc. Five teams participated in the task. The methods employed by participants were based on state-of-the-art transformer models, which exploit subword tokenisation. The participants' models were pre-trained on large text collections and fine-tuned on the JOKER training set. We observed that, in many cases, the models provided the exact official translations. This suggests that they were pre-trained on a corpus containing the same texts used in the JOKER corpus. Where systems did suggest translations different from the official ones, they rarely employed wordplay.

Acknowledgments

This work has been funded in part by the National Research Agency under the program Investissements d'avenir (Reference ANR-19-GURE-0001) and by the Austrian Science Fund under project M 2625-N31. JOKER is supported by La Maison des sciences de l'homme en Bretagne. We thank Orlane Puchalski, Adrien Couaillet, and Ludivine Grégoire for their participation in data collection, annotation, and the adjustment of annotation guidelines. We also thank Elise Mathurin for co-supervising the interns Orlane Puchalski, Adrien Couaillet, and Ludivine Grégoire, as well as Alain Kerhervé, who supported the project. We would also like to thank the other JOKER organisers: Anne-Gwenn Bosser, Claudine Borg, Fabio Regattin, Gaelle Le Corre, Elise Mathurin, Silvia Araujo, Monika Bokiniec, Ġorġ Mallia, Gordan Matas, Mohamed Saki, Benoît Jeanjean, Radia Hannachi, Danica Škara, and the other PC members: Grigori Sidorov, Victor Manuel Palma Preciado, and Fabrice Antoine.

References
[1] J. J. O'Hara, True names: Vergil and the Alexandrian tradition of etymological wordplay, University of Michigan Press, 2017.
[2] R. C. Beshere, "What's in a name?": Theorizing an etymological dictionary of Shakespearean characters, The University of North Carolina at Greensboro, 2009.
[3] L. Manini, Meaningful literary names: Their forms and functions, and their translation, The Translator 2 (1996) 161–178.
[4] N. Chen, R. E. Banchs, M. Zhang, X. Duan, H. Li, Report of NEWS 2018 named entity transliteration shared task, in: Proceedings of the Seventh Named Entities Workshop, Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 55–73. URL: https://aclanthology.org/W18-2409. doi:10.18653/v1/W18-2409.
[5] N. Chen, X. Duan, M. Zhang, R. E. Banchs, H. Li, NEWS 2018 whitepaper, in: Proceedings of the Seventh Named Entities Workshop, Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 47–54. URL: https://aclanthology.org/W18-2408. doi:10.18653/v1/W18-2408.
[6] D. D. Oaks, Structural Ambiguity in English: an Applied Grammatical Inventory, Bloomsbury, 2010.
[7] B. J. Epstein, What Nonsense: Translating Neologisms, Peter Lang, 2012, pp. 29–66. doi:10.3726/978-3-0353-0271-4.
[8] E. Mattiello, Analogy in Word-formation: A Study of English Neologisms and Occasionalisms, De Gruyter Mouton, Berlin, Boston, 2017. URL: https://www.degruyter.com/view/title/529914. doi:10.1515/9783110551419.
[9] T. Miller, The punster's amanuensis: The proper place of humans and machines in the translation of wordplay, in: Proceedings of the Second Workshop on Human-Informed Translation and Interpreting Technology, 2019, pp. 57–64. doi:10.26615/issn.2683-0078.2019_007.
[10] L. Ermakova, T. Miller, F. Regattin, A.-G. Bosser, É. Mathurin, G. L. Corre, S. Araújo, J. Boccou, A. Digue, A. Damoy, B. Jeanjean, Overview of JOKER@CLEF 2022: Automatic Wordplay and Humour Translation workshop, in: A. Barrón-Cedeño, G. Da San Martino, M. Degli Esposti, F. Sebastiani, C. Macdonald, G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Thirteenth International Conference of the CLEF Association (CLEF 2022), volume 13390 of LNCS, 2022.
[11] P. Štekauer, R. Lieber (Eds.), Handbook of Word-formation, number 64 in Studies in Natural Language and Linguistic Theory, Springer, Dordrecht, the Netherlands, 2005. doi:10.1007/1-4020-3596-9.
[12] G. Dal, F. Namer, Playful nonce-formations in French: Creativity and productivity, in: [49], 2018, pp. 203–228. doi:10.1515/9783110501933-205.
[13] T. Rabrenović, "You just Fredo-kiss me?" The (non-)lexicalizability of nonce word-formation, in: B. Čubrović (Ed.), BELLS90 Proceedings: International Conference to Mark the 90th Anniversary of the English Department, Faculty of Philology, University of Belgrade, volume 1 of Belgrade English Language and Literature Studies, Faculty of Philology, University of Belgrade, Belgrade, 2020, pp. 153–164. URL: http://doi.fil.bg.ac.rs/pdf/eb_ser/bells90/2020-1/bells90-2020-1-ch9.pdf. doi:10.18485/bells90.2020.1.ch9.
[14] A. Braun, Approaching wordplay from the angle of phonology and phonetics – examples from German, in: [49], 2018, pp. 173–202. doi:10.1515/9783110501933-175.
[15] W. Kolb, Translation as a source of humor: Jonathan Safran Foer's Everything is Illuminated/Alles ist erleuchtet, in: K. Kaindl, K. Spitzl (Eds.), Transfiction: Research into the Realities of Translation Fiction, number 110 in Benjamins Translation Library, John Benjamins, 2014, pp. 299–314. doi:10.1075/btl.110.21kol.
[16] M. G. Prinzl, Death to neologisms: Domestication in the English retranslations of Thomas Mann's Der Tod in Venedig, International Journal of Literary Linguistics 5 (2016). doi:10.15462/ijll.v5i3.73.
[17] B. Cartoni, Lexical resources for automatic translation of constructed neologisms: the case study of relational adjectives, in: Proceedings of the 6th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), 2008, pp. 976–979. URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/247_paper.pdf.
[18] B. Cartoni, Lexical morphology in machine translation: a feasibility study, in: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2009, pp. 130–138. URL: https://www.aclweb.org/anthology/E09-1016.
[19] P. Cook, S. Stevenson, Automatically identifying the source words of lexical blends in English, Computational Linguistics 36 (2010) 129–149. URL: https://www.aclweb.org/anthology/J10-1005. doi:10.1162/coli.2010.36.1.36104.
[20] L. Beinborn, T. Zesch, I. Gurevych, Cognate production using character-based machine translation, in: Proceedings of the 6th International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, 2013, pp. 883–891. URL: https://www.aclweb.org/anthology/I13-1112.
[21] M. Ryskina, E. Rabinovich, T. Berg-Kirkpatrick, D. Mortensen, Y. Tsvetkov, Where new words are born: Distributional semantic analysis of neologisms and their semantic neighborhoods, in: Proceedings of the Society for Computation in Linguistics 2020, Association for Computational Linguistics, 2020, pp. 313–322. URL: https://www.aclweb.org/anthology/2020.scil-1.43.
[22] C. M. Meyer, J. Eckle-Kohler, I. Gurevych, Semi-automatic detection of cross-lingual marketing blunders based on pragmatic label propagation in Wiktionary, in: Proceedings of the 26th International Conference on Computational Linguistics, 2016, pp. 2071–2081. URL: https://www.aclweb.org/anthology/C16-1195.
[23] C. Westbury, C. Shaoul, G. Moroschan, M. Ramscar, Telling the world's least funny jokes: on the quantification of humor as entropy, Journal of Memory and Language 86 (2016) 141–156. URL: http://www.sciencedirect.com/science/article/pii/S0749596X15001023. doi:10.1016/j.jml.2015.09.001.
[24] G. Özbal, C. Strapparava, A computational approach to the automation of creative naming, in: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, volume 1, Association for Computational Linguistics, 2012, pp. 703–711. URL: https://www.aclweb.org/anthology/P12-1074.
[25] J. Brooke, A. Hammond, T. Baldwin, Bootstrapped text-level named entity recognition for literature, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Berlin, Germany, 2016, pp. 344–350. URL: https://aclanthology.org/P16-2056. doi:10.18653/v1/P16-2056.
[26] K. van Dalen-Oskam, J. de Does, M. Marx, I. Sijaranamual, K. Depuydt, B. Verheij, V. Geirnaert, Named entity recognition and resolution for literary studies, Computational Linguistics in the Netherlands Journal 4 (2014) 121–136.
[27] N. K. Aulakh, Y. Kaur, Optimized name entity recognition of machine translation, International Journal for Research in Applied Science and Engineering Technology (IJRASET) 2 (2014) 24–30.
[28] P. Li, M. Wang, J. Wang, Named entity translation method based on machine translation lexicon, Neural Computing and Applications 33 (2021) 3977–3985. URL: https://doi.org/10.1007/s00521-020-05509-y. doi:10.1007/s00521-020-05509-y.
[29] P. Mota, V. Cabarrão, E. Farah, Fast-paced improvements to named entity handling for neural machine translation, in: Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, European Association for Machine Translation, Ghent, Belgium, 2022, pp. 141–149. URL: https://aclanthology.org/2022.eamt-1.17.
[30] A. Ugawa, A. Tamura, T. Ninomiya, H. Takamura, M. Okumura, Neural machine translation incorporating named entity, in: Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, Santa Fe, New Mexico, USA, 2018, pp. 3240–3250. URL: https://aclanthology.org/C18-1274.
[31] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning Research 21 (2020) 1–67. URL: http://jmlr.org/papers/v21/20-074.html.
[32] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, 2020. arXiv:2005.14165.
[33] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv:1706.03762 [cs] (2017). URL: http://arxiv.org/abs/1706.03762.
[34] R. Sennrich, B. Haddow, A. Birch, Neural machine translation of rare words with subword units, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, volume 1, Association for Computational Linguistics, 2016, pp. 1715–1725. URL: http://aclweb.org/anthology/P16-1162. doi:10.18653/v1/P16-1162.
[35] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. Viégas, M. Wattenberg, G. Corrado, et al., Google's multilingual neural machine translation system: Enabling zero-shot translation, Transactions of the Association for Computational Linguistics 5 (2017) 339–351.
[36] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language Models Are Unsupervised Multitask Learners, Technical report, OpenAI, 2019. URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[37] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[38] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, arXiv preprint arXiv:1910.01108 (2019).
[39] K. Clark, M.-T. Luong, Q. V. Le, C. D. Manning, ELECTRA: Pre-training text encoders as discriminators rather than generators, arXiv preprint arXiv:2003.10555 (2020).
[40] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (2017) 135–146. URL: https://doi.org/10.1162/tacl_a_00051. doi:10.1162/tacl_a_00051.
[41] L. Ermakova, T. Miller, O. Puchalski, F. Regattin, É. Mathurin, S. Araújo, A.-G. Bosser, C. Borg, M. Bokiniec, G. L. Corre, B. Jeanjean, R. Hannachi, Ġ. Mallia, G. Matas, M. Saki, CLEF Workshop JOKER: Automatic Wordplay and Humour Translation, in: M. Hagen, S. Verberne, C. Macdonald, C. Seifert, K. Balog, K. Nørvåg, V. Setty (Eds.), Advances in Information Retrieval, volume 13186 of Lecture Notes in Computer Science, Springer International Publishing, Cham, 2022, pp. 355–363. doi:10.1007/978-3-030-99739-7_45.
[42] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, BLEU: A method for automatic evaluation of machine translation, in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002, pp. 311–318. URL: https://www.aclweb.org/anthology/P02-1040. doi:10.3115/1073083.1073135.
[43] F. Dhanani, M. Rafi, M. A. Tahir, FAST_MT participation for the JOKER CLEF-2022 automatic pun and human translation tasks, in: Proceedings of the Working Notes of CLEF 2022 – Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th to 8th, 2022, CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy, 2022, p. 14.
[44] B. Senel, Can (A)I or can't (A)I? Translation of wordplays, 2022. URL: https://uazhlt-ms-program.github.io/ling-582-course-blog/senel/shared-task.
[45] L. Glemarec, Use of SimpleT5 for the CLEF workshop JokeR: Automatic Pun and Humor Translation, in: Proceedings of the Working Notes of CLEF 2022 – Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th to 8th, 2022, CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy, 2022, p. 11.
[46] L. Talec-Bernard, How good can an automatic translation of Pokémon names be?, in: Proceedings of the Working Notes of CLEF 2022 – Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th to 8th, 2022, CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy, 2022, p. 2.
[47] J. Tiedemann, Parallel data, tools and interfaces in OPUS, in: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), European Language Resources Association (ELRA), Istanbul, Turkey, 2012, pp. 2214–2218. URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf.
[48] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.
[49] E. Winter-Froemel (Ed.), The Dynamics of Wordplay, number 5 in Expanding the Lexicon, De Gruyter, 2018.
Table 3
List of non-official translations with wordplay

EN | EN interpretation | Official FR | Official FR interpretation | Non-official FR | Non-official FR interpretation | Comment
Orbeetle | orb + beetle | Astronelle | astronef + coccinelle | Orbétain | orbe + étain | —
Ribombee | ribbon + bombyliidae + bee | Rubombelle | ruban + bombyliidae + belle + ribambelle | Ribombe | — | term francisation
Celesteela | celestial + steel | Bamboiselle | bambou + demoiselle | Célésteela | céleste + [latin] stella | ling. coincidence
Primarina | prima donna + ballerina | Oratoria | oratorio + aria | Primarin | prima donna + marin | ling. coincidence
Wimpod | wimp + isopod | Sovkipou | sauve qui peut + pou | Pompode | pompote + isopode | —
Incineroar | incinerate + roar | Félinferno | félin + [anglais] inferno (fournaise) | Incinéroar | incinérer + roar (cri de lion) | ling. coincidence
Incineroar | incinerate + roar | Félinferno | félin + [anglais] inferno (fournaise) | Incinéroque | incinérer + roque (coup aux échecs) | —
Toxtricity | toxic + electricity | Salarsen | salamandre + arsenic + larsen (effet Larsen) | Toxtricité | toxique + électricité | ling. coincidence
Pyroar | pyre + roar | Némélios | lion de Némée + [grec] hélios (mythologie grecque) | Pyroque | pyro + roque | —
Metallurgix | metallurgy | Amérix | Amérique | Métalurgix | métallurgie + -ix | ling. coincidence
Wifix | wifi | Rézowifix | "réseau wifi" | Ouifix | wifi + -ix | term francisation
legilimency | [latin] legere (to read) + [latin] mens (mind) | legilimancie | [latin] legere (lire) + [latin] mens (esprit) | légilimence | [latin] legere (lire) + [latin] mens (esprit) | term francisation
butterbeer | butter + beer | bièreaubeurre | bière + au + beurre | bourreau-bourre | bourreau + bourre | non-sens
Drifblim | drift + blimp | Grodrive | gros + dérive | Grodrive | gros + dérive | —
Mismagius | mischief + magus | Magirêve | magie + rêve | Virgilus | vigile + Virgile (personnage antique) + -us | —
Dwebble | dwell + pebble | Crabicoque | crabe + bicoque | Débébé | débébé / des bébés | —
Terrible Terror | [0] | Terreur Terrible | [0] | Terreur terrifiante | terreur terrifiante | ling. coincidence (répétition en synonymes pour amplifier le sens)
Gold Ammolet | ammo + amulet [anglais] | Ammolette en or | ammo [anglais] + amulette | Ammolette d'or | ammolette (amulette + ammo) | —