=Paper=
{{Paper
|id=Vol-3180/paper-128
|storemode=property
|title=Overview of the CLEF 2022 JOKER Task 3: Pun Translation from English into French
|pdfUrl=https://ceur-ws.org/Vol-3180/paper-128.pdf
|volume=Vol-3180
|authors=Liana Ermakova,Fabio Regattin,Tristan Miller,Anne-Gwenn Bosser,Claudine Borg,Benoît Jeanjean,Élise Mathurin,Gaëlle Le Corre,Radia Hannachi,Silvia Araújo,Julien Boccou,Albin Digue,Aurianne Damoy
|dblpUrl=https://dblp.org/rec/conf/clef/ErmakovaRMBBJMC22
}}
==Overview of the CLEF 2022 JOKER Task 3: Pun Translation from English into French==
Liana Ermakova1, Fabio Regattin3, Tristan Miller4, Anne-Gwenn Bosser5, Claudine Borg6, Benoît Jeanjean1, Élise Mathurin1, Gaëlle Le Corre7, Radia Hannachi8, Sílvia Araújo9, Julien Boccou1, Albin Digue1 and Aurianne Damoy1

1 Université de Bretagne Occidentale, HCTI, 29200 Brest, France
2 Maison des sciences de l’homme en Bretagne, 35043 Rennes, France
3 Dipartimento DILL, Università degli Studi di Udine, 33100 Udine, Italy
4 Austrian Research Institute for Artificial Intelligence, Vienna, Austria
5 École Nationale d’Ingénieurs de Brest, Lab-STICC CNRS UMR 6285, Brest, France
6 University of Malta, Msida MSD 2020, Malta
7 Université de Bretagne Occidentale, CRBC, 29200 Brest, France
8 Université de Bretagne Sud, HCTI, 56321 Lorient, France
9 University of Minho, Portugal

Abstract: The translation of puns is one of the most challenging issues for translators and has therefore become an intensively studied phenomenon in the field of translation studies. Translation technology aims to partially or even totally automate the translation process, but relatively little attention has been paid to the use of computers for the translation of wordplay. The CLEF 2022 JOKER track aims to build a multilingual corpus of wordplay and evaluation metrics in order to advance the automation of creative-language translation. This paper provides an overview of the track’s Pilot Task 3, where the goal is to translate entire phrases containing wordplay (particularly puns). We describe the data collection, the task setup, the evaluation procedure, and the participants’ results. We also cover a side product of our project: a homogeneous monolingual corpus for wordplay detection in French.
Keywords: wordplay, computational humour, pun, machine translation, deep learning

CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
liana.ermakova@univ-brest.fr (L. Ermakova); https://www.joker-project.com/
ORCID: 0000-0002-7598-7474 (L. Ermakova); 0000-0003-3000-3360 (F. Regattin); 0000-0002-0749-1100 (T. Miller); 0000-0003-3858-5502 (C. Borg); 0000-0001-5157-1899 (B. Jeanjean); 0000-0002-7598-7474 (G. Le Corre); 0000-0003-4321-4511 (S. Araújo)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

1. Introduction

Wordplay is ubiquitous in both speech and writing as a means to evoke humour. It can occur on, or intersect with, virtually any level of language, including the phonological, orthographical, morphological, lexical, syntactic, or textual [1]. Punning is a particular form of wordplay in which a word or phrase suggests two or more meanings by exploiting polysemy, homonymy, or phonological similarity to another word or phrase [2, 3]. Despite being a popular subject of research in translatology [4, 5], the translation of puns has received little attention in the fields of natural language processing (NLP) and machine translation (MT) [6]. With increasing global communication, the demand for translation grows ever faster, which has spurred the rapid development of MT technology [7]. Recent developments in machine learning and artificial intelligence have greatly improved the quality of MT, but puns are often held to be untranslatable, particularly by statistical or neural MT [8, 9], which cannot robustly deal with texts that deliberately disregard or subvert linguistic conventions [6].
Among the main challenges in translating puns are linguistic and cultural differences [10, 11, 12], which can affect the target audience’s comprehension of the joke and must therefore inform the translator’s choice of strategy. In 2022, the JOKER workshop at CLEF proposed three pilot tasks [13]: (1) classify and explain instances of wordplay, (2) translate single terms containing wordplay, and (3) translate entire phrases containing wordplay (puns) from English into French. This paper describes and discusses the third of these tasks, including the participating systems and their results. The goal of the workshop was to bring together translators and computer scientists to work on an evaluation framework for wordplay, including data and metric development, and to foster work on automatic methods for wordplay translation.

2. Related work

2.1. Wordplay translation strategies

Over the past few decades, the field of translation studies has devoted increasing interest to wordplay [14]. Various strategies for wordplay translation have been conceived and described over time, and, accordingly, some typologies have been produced. Two of them stand out for their quality and their universalist purpose. The first of these is the fourfold typology of Henry [15, pp. 176–192]:

1. traduction isomorphe (isomorphic translation)
2. traduction homomorphe (homomorphic translation)
3. traduction hétéromorphe (heteromorphic translation)
4. traduction libre (free translation)

The isomorphic strategy consists of translating a source-text (ST) wordplay with an identical wordplay (except for formal differences) in the target language (TL). This is what happens, for example, when the German portmanteau adjective famillionär (amalgamating Familie + Millionär) is translated into English or French as famillionaire. As in this case, the isomorphic strategy is a borderline situation, which only happens due to fortuitous (or historical) similarities between languages.
The homomorphic strategy consists of translating an ST wordplay with a wordplay of the same typology, based on different linguistic material. This is what happens when we translate an anagram with an anagram, or a pun with a different pun (i.e., in the great majority of cases, where we cannot lean on the isomorphic strategy). The heteromorphic strategy involves translating an ST wordplay with a wordplay of a different typology in the TL. For instance, we could translate an anagram with a pun, or a portmanteau with assonance. Free translation takes place when the ST wordplay is translated into something other than wordplay.

Despite its allure (as well as its elegant terminological uniformity), Henry’s taxonomy has a serious flaw: the fourth category, free translation, is potentially very broad, as it brings together many different strategies. The second wordplay translation typology, developed by Delabastita [16], dissects this fourth category in a much more precise way. This is why we rely on a combination of both typologies in the rest of this paper. While Henry’s typology is mostly based on the author’s experience as a translator, Delabastita’s was developed on the basis of parallel corpus analysis and therefore reflects the real techniques used by human translators in their work. And while the typology was developed specifically for puns (a type of wordplay that exploits multiple meanings of a term or of similar-sounding words), many of the strategies it describes can be successfully applied to other types of wordplay that are not based on ambiguity. Delabastita lists the following options:

1. pun→pun: The ST pun is translated by a TL pun. This category can be further partitioned into three subtypes, using Henry’s typology:
   • isomorphic translation
   • homomorphic translation
   • heteromorphic translation

Strategies 2 to 8 below can all be related to Henry’s fourth category, free translation:

2. pun→non-pun: The pun is translated by a non-punning phrase, which may reproduce all senses of the wordplay or just one of them, without trying to do so in an equally ambiguous way.
3. pun→related rhetorical device: The pun is replaced by some other, rhetorically charged, utterance (involving repetition, alliteration, rhyme, irony, paradox, etc.).
4. pun→zero: The portion of text containing the pun is omitted altogether.
5. pun ST=pun TT: The punning text, and sometimes its immediate environment, is reproduced in the SL in the target text (TT), without attempting a TL rendering.
6. non-pun→pun: A pun is introduced in the TT where no wordplay was present in the ST.
7. zero→pun: New textual material involving wordplay is added to the TT, with no correspondence whatsoever in the ST.
8. editorial techniques: All the paratextual strategies involved in explaining, or presenting alternative renderings for, the pun of the ST (footnotes, prefaces, translator’s notes, etc.).

Delabastita insists on one further point: these eight strategies are by no means mutually exclusive. A translator could, for instance, suppress a pun somewhere in their TT (locally leading to a pun→non-pun solution), explain it in a footnote (editorial techniques), and finally try to compensate for the loss by adding another pun somewhere else in the text (non-pun→pun or zero→pun). The very typology of translation strategies drawn up by Delabastita directly points to the main reason why conceiving a working machine translation system for puns is difficult: how can we automate the omission of a pun, the introduction of wordplay somewhere else in a text, or the reproduction of an SL textual segment in the TT? One could argue that Henry’s typology is more useful here, because it (usually) only accounts for translations of a wordplay in the ST with a wordplay in the TT. Unfortunately, it cannot be stressed enough that this goes against most human translators’ practice.
Very often, the strategies used by human translators completely break any kind of textual relationship between the ST and the TT. This is why wordplay translation is seen by many practitioners and theoreticians alike as something “other” than translation – say, as adaptation or re-creation – and why we believe that only Delabastita’s typology should be the long-term goal for a useful wordplay machine translation engine.

2.2. Computational humour

To date, there have been few studies on the MT of wordplay. Farwell and Helmreich [17] proposed a pragmatics-based approach to MT that accounts for the author’s locutionary, illocutionary, and perlocutionary intents (that is, the “how”, “what”, and “why” of the text), and discussed how it might be applied to puns. However, no working system appears to have been implemented. Miller [18] proposed an interactive method for the computer-assisted translation of puns, an implementation (PunCAT) and evaluation of which was described by Kolb and Miller [19]. Their study was limited to a single language pair (English to German) and translation strategy (namely, the pun→pun strategy described previously). Furthermore, the tool’s functionality is limited to facilitating exploration of the semantic fields corresponding to the two meanings of the pun; actually detecting and interpreting the ST pun, and devising a complete TL punning joke, is left to the user. Numerous studies have been conducted on the related tasks of humour generation and detection. Pun generation systems have often been based on template approaches. Valitutti, Toivonen, Doucet, and Toivanen [20] used lexical constraints to generate adult humour by substituting one word in a pre-existing text. Hong and Ong [21] trained a system to automatically extract humorous templates, which were then used for pun generation.
Some current efforts to tackle this difficult problem more generally using neural approaches have been hindered by the lack of a sizable pun corpus [22]. Recent work [23] has tackled the generation of humorous puns in English based on the data provided at SemEval-2017 [2]. Meanwhile, the recent rise of conversational agents and the need to process large volumes of social media content point to the necessity of automatic humour recognition [24]. Humour and irony studies are now crucial when it comes to social listening [25, 26, 27, 28], dialogue systems (chatbots), recommender systems, reputation monitoring, and the detection of fake news [29] and hate speech [30]. However, the automatic detection, location, and interpretation of humorous wordplay in particular has so far been limited to punning. And while even the earliest such systems achieved decent performance on the detection and location tasks [31], methods for actually interpreting the double meaning of the pun – a prerequisite for translation – have not been as intensively researched. Miller, Hempelmann, and Gurevych [31] report accuracies of 16.0% and 7.7% for homographic and heterographic puns, respectively, and this baseline does not seem to have been improved upon in more recent work [32]. Again, indications point to the lack of sufficient training data as a stumbling block to further progress, especially for languages other than English. A few monolingual humour corpora do exist, including the datasets created for shared tasks of the International Workshop on Semantic Evaluation (SemEval): #HashtagWars: Learning a Sense of Humor [33], Detection and Interpretation of English Puns [31], Assessing Humor in Edited News Headlines [34], and HaHackathon: Detecting and Rating Humor and Offense [35].
Mihalcea and Strapparava [36] collected 16 000 humorous sentences and an equal number of negative samples from news titles, proverbs, the British National Corpus, and the Open Mind Common Sense dataset, while another dataset contains 2400 puns and non-puns from news sources, Yahoo! Answers, and proverbs [37, 38]. Most datasets are in English, with some notable exceptions for Italian [39], Russian [40, 41], and Spanish [42]. To the best of our knowledge, no such corpus exists for French, and the only parallel corpus of wordplay is the one introduced in our own research [13, 43]. We manually collected over a thousand translated examples of wordplay, in English and French, from video games, advertising slogans, literature, and other sources [13, 43]. Each example has been manually classified according to a multi-label inventory of wordplay types and structures, and annotated according to its lexical-semantic or morphosemantic components. However, the majority of the collected wordplay consisted of single-term proper nouns or portmanteau-based neologisms, the like of which are common in the Asterix and Harry Potter universes. Large pre-trained AI models, like Jurassic-1 [44], mT5 [45], BERT [46], and GPT [47, 48], have outperformed other state-of-the-art models on several NLP tasks, including MT [49]. The performance of such supervised MT systems depends on the quality and quantity of training data [50]. However, as mentioned above, there exist no large-scale, broad-coverage parallel corpora of wordplay, and such a corpus is a key prerequisite for the training and evaluation of MT models. Humorous wordplay often exploits the confrontation of similar forms with different meanings, evoking incongruity between expected and presented stimuli. This makes it particularly important in NLP to study the strategies that human translators use for dealing with wordplay [51, 52].
This is partly because MT is generally ignorant of pragmatics and assumes that words in the source text are formed and used in a conventional manner. MT systems fail to recognise the deliberate ambiguity of puns or the unorthodox morphology of neologisms, leaving such terms untranslated or else translating them in ways that lose the humorous aspect [18].

3. Data

Our English corpus of puns is mainly based on that of the SemEval-2017 shared task on pun identification [31]. The original annotated dataset contains 3387 standalone English-language punning jokes, between 2 and 69 words in length, sourced from offline and online joke collections. Roughly half of the puns in the collection are “weakly” homographic (meaning that the lexical units corresponding to the two senses of the pun, disregarding inflections and particles, are spelled identically) while the other half are heterographic (that is, with lemmas spelled differently). The original annotation scheme is rather simple, indicating only the pun’s location within the joke, whether it is homographic or heterographic, and the two meanings of the pun (with reference to senses in WordNet [53]). In order to translate this subcorpus from English into French, we applied a gamification strategy. More precisely, we organised a translation contest.1 The contest was open to students, but we also received multiple translations outside the official ranking from professional translators and academics in translation studies. The results were submitted via Google Forms. Forty-seven participants submitted 3950 translations of 500 puns from the SemEval-2017 dataset. We took 250 puns in English from each of the homographic and heterographic subsets. In the form, the homographic and heterographic puns were alternated. Each page of the form contained 100 puns. Unfortunately, Google Forms does not allow questions to be shuffled for each participant.
Thus, we observed a drastic drop in the number of translations per pun starting from the second page (see Figure 1). As two participants translated almost all puns (see Figure 3), there is a conspicuous peak in the number of translations per query (Figure 2). However, this histogram does not give a clear idea of the translation difficulty of puns, as the vast majority of participants translated only the first page of the form. Figure 4, the number of translations per query on the first page only, perhaps better reflects the distribution of translation difficulty.

1 https://www.joker-project.com/pun-translation-contest/

Figure 1: Number of translations per query
Figure 2: Histogram of the number of translations per query (all)
Figure 3: Number of translations per participant
Figure 4: Histogram of the number of translations per query (first page)

Besides this SemEval-derived data, we sourced further translation pairs from published literature and from puns translated by Master’s students in translation. We annotated the dataset according to the classification used for Pilot Task 1 of our workshop [54].

3.1. Training data

In total, the final annotated training set in English contained 1772 instances. The French collection contained 4753 annotated instances. The data was provided to participants as a JSON file (or a CSV file for manual runs) with fields denoting the instance’s unique ID (id), the source text in English (en), and a target text in French (fr). For example:

[
  {
    "id": "pun_724_1",
    "en": "My name is Wade and I’m in swimming pool maintenance.",
    "fr": "Je m’appelle Jacques Ouzy, je m’occupe de l’entretien des piscines."
  }
]

3.1.1.
Test data

The test set contains 2378 instances in English from the SemEval-2017 pun task [31]. The data format was identical to that of the training data, except that the field for the target text was omitted. Example:

[
  {
    "id": "het_713",
    "en": "Ever since my mineral extraction facility was converted to parking, I’ve had a lot on my mine."
  }
]

The expected output format was identical to that of the training data, but with the addition of the fields RUN_ID and MANUAL. The RUN_ID field value uniquely identifies a given run and is formed of the team ID (as registered on the CLEF website), followed by the task ID (in this pilot task, always task_3), followed by the run number. The MANUAL field value is either 1 (indicating a manual translation run) or 0 (indicating a machine translation run). Example:

[
  {
    "RUN_ID": "JCM_task_3_run1",
    "MANUAL": 1,
    "id": "pun_724_1",
    "en": "My name is Wade and I’m in swimming pool maintenance.",
    "fr": "Je m’appelle Jacques Ouzy, je m’occupe de l’entretien des piscines."
  }
]

4. Evaluation metrics

As we have previously argued [13], the prevailing BLEU metric for machine translation is clearly inappropriate for use with wordplay, where a wide variety of translation strategies (and solutions implementing those strategies) are permissible. Many of these strategies require metalexical awareness and the preservation of features such as lexical ambiguity and phonetic similarity. For our evaluation, participants’ runs were pooled together. We filtered out all translations that did not match the regular expression .+[?.!"]\s*$, as we considered these translations to be truncated. Indeed, in some runs (e.g., Cecilia’s run 3), the majority of generated translations were too short with regard to the source wordplay and truncated in the middle of the sentence. We hereafter refer to the retained translations as valid.
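The truncation filter above can be reproduced with Python’s re module; the pattern is the one quoted in the text, while the helper name and the sample strings are our own illustration:

```python
import re

# Translations not ending in sentence-final punctuation (optionally followed
# by trailing whitespace) are considered truncated and filtered out.
TRUNCATION_RE = re.compile(r'.+[?.!"]\s*$')

def is_complete(translation: str) -> bool:
    """Return True if the translation ends in ?, ., !, or a closing quote."""
    return TRUNCATION_RE.match(translation) is not None

complete = is_complete("Je m’appelle Jacques Ouzy, je m’occupe de l’entretien des piscines.")
truncated = is_complete("Une traduction coupée au milieu de la")  # no final punctuation
```

Here `complete` is True and `truncated` is False, mirroring the runs (such as Cecilia’s run 3) whose outputs stopped mid-sentence.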
We then filtered out French translations identical to the original wordplay in English, as we considered these wordplay instances not to have been translated. The pool of valid distinct translations into French contains 9513 instances. Three Master’s students in translation, all native speakers of French, manually evaluated each valid translation as follows. We evaluated the following errors:

• nonsense: True when the translation contains a nonsensical passage.
• syntax problem: True when the translation contains a passage with errors in syntax.
• lexical problem: True when the translation contains a passage with errors in word choice/use.

An instance was not evaluated for subsequent metrics if one of the above errors was identified. For translations without these errors, we evaluated:

• lexical field preservation, sense preservation, comprehensible terms, wordplay form: These four metrics are evaluated as in Task 2.
• identifiable wordplay: True for translations that are wordplay and are understandable for a general audience. For example, the wordplay “Je n’abandonnerai jamais mes chiens!” dit Tom cyniquement (meaning “ ‘I’ll never abandon my dogs!’ Tom said cynically”) requires etymological knowledge that is beyond most readers.
• over-translation: True for translations that contain multiple, superfluous instances of wordplay when the source text has just one.
• style shift: True for translations exhibiting a shift in style (e.g., where a vulgarism is present either in the source text or in the translation, but not in both).
• humorousness shift: True for translations judged to be much funnier or much less funny than the source wordplay.

Note that the categories over-translation, style shift, and humorousness shift are necessarily subjective.
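The gating logic of this annotation scheme – error flags first, quality metrics only for error-free translations – can be sketched as follows. The record layout and function names are our own illustration, not the annotation interface actually used:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Judgement:
    # Error flags: if any of these is True, the remaining metrics
    # are not evaluated for this translation.
    nonsense: bool = False
    syntax_problem: bool = False
    lexical_problem: bool = False
    # Quality metrics, assigned only for error-free translations
    # (None = not evaluated).
    lexical_field_preservation: Optional[bool] = None
    sense_preservation: Optional[bool] = None
    comprehensible_terms: Optional[bool] = None
    wordplay_form: Optional[bool] = None
    identifiable_wordplay: Optional[bool] = None
    # Necessarily subjective categories.
    over_translation: Optional[bool] = None
    style_shift: Optional[bool] = None
    humorousness_shift: Optional[bool] = None

def evaluable(j: Judgement) -> bool:
    """Subsequent metrics are only assigned when no error flag is set."""
    return not (j.nonsense or j.syntax_problem or j.lexical_problem)
```

For instance, `evaluable(Judgement(nonsense=True))` is False, so such a translation would never receive a sense-preservation or humorousness judgement.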
Table 1: Scores of participants’ runs for Pilot Task 3

                            LJGG    FAST_MT  LJGG   Cecilia  Humorless  Cecilia
                            DeepL            auto   run 1               run 3
total                       2378    2378     2378   2378     2378       2378
valid                       2324    2120     2264   2343     384        7
not translated              39      103      206    49       22         2
nonsense                    59      220      349    51       297        3
syntax problem              17      58       46     41       6          0
lexical problem             25      79       78     52       10         0
lexical field preservation  2184    1739     1595   2155     118        6
sense preservation          1938    1453     1327   1803     100        6
comprehensible terms        1188    867      827    744      56         5
wordplay form               373     345      261    251      19         1
identifiable wordplay       342     318      240    243      16         1
over-translation            3       1        9      13       0          0
style shift                 9       12       4      4        0          0
humorousness shift          930     765      838    1427     68         4

5. Methods used by the participants

Four teams participated in Pilot Task 3: FAST_MT [55], Cecilia [56], Humorless (no paper submitted), and LJGG [57]. Cecilia updated their run, and LJGG submitted two runs, one of which was produced with DeepL.2 LJGG’s other run, and that of Cecilia, were generated using the SimpleT5 library3 for the Google T5 (Text-To-Text Transfer Transformer) model, which is based on transfer learning with a unified text-to-text transformer [58]. FAST_MT also applied transformers but decided not to fine-tune; more precisely, the team used the Helsinki-NLP/opus-mt-en-fr model [59] from the Hugging Face4 repository.

6. Results

Table 1 presents the results of the submitted runs for Task 3. We observe that in many cases the successful translations are due to the existence of the same lexical ambiguity (homonymy) in both languages:

Example 6.1. A train load of paint derailed. Nearby businesses were put in the red. / Un train de peinture a déraillé. Les entreprises voisines ont été mises dans le rouge.

Example 6.2. An undertaker can be one of your best friends, he is always the last one to let you down. / Un entrepreneur peut être l’un de vos meilleurs amis, il est toujours le dernier à vous laisser tomber.

2 https://www.deepl.com/
3 https://github.com/Shivanandroy/simpleT5
4 https://huggingface.co/
We also noticed some surprisingly successful translations:5

Example 6.3. Success comes in cans, failure comes in cant’s. / Le succès c’est dans les canons, le pétrin c’est dans les canettes.

Example 6.4. Wal-Mart Is Not the Only Saving Place. Come On In. / Le clerc n’est pas le seul à faire des économies.

Notably, a few successful translations used anglicisms:

Example 6.5. I used to be addicted to soap, but I’m clean now. / Avant, j’étais accro au savon, mais je suis clean maintenant.

Example 6.6. When the beekeeper moved into town he created quite a buzz. / Lorsque l’apiculteur s’est installé en ville, il a créé un véritable buzz.

Of the over 1155 translations containing wordplay, only 311 were translations of heterographic puns. This suggests that state-of-the-art machine translation is still unsuitable for translating wordplay, even with a manually annotated training set. The successful machine translations are seemingly accidental, owing to the existence of the same word ambiguity in both languages. In total, only 13% of automatically translated plays on words were successful, compared to the 90% success rate for instances translated by the human participants of our contest.

7. French corpus for wordplay detection

A side product of our project is the creation of a homogeneous monolingual corpus for wordplay detection in French. As stated previously, our parallel wordplay corpus was primarily constructed by translating the corpus of English puns introduced at SemEval-2017 Task 7: Detection and Interpretation of English Puns [2]. This corpus contains 2250 homographic and 1780 heterographic puns. All puns were translated during the translation contest described in §3, and 90% of these translations were successful. These facts provide evidence that pun translation is possible. On the other hand, machine translations succeeded in only 13% of cases. We manually annotated all 9513 machine translations submitted by our participants.
Note that the translations of the same sentence are close to each other in terms of length and lexical field.

5 On closer inspection, we determined that Example 6.4 was very close to an example from the training set.

Table 2: Confusion matrix of T5 on all SemEval-2017 Task 7 data (with average sentence lengths)

                     Pun (ground truth)                 Not pun (ground truth)
total                1607 homographic (11.64 avg len)   643 homographic (8.7 avg len)
                     1271 heterographic (11.6 avg len)  509 heterographic (8.6 avg len)
Pun (predicted)      1564 homographic (11.7 avg len)    25 homographic (9.1 avg len)
                     1238 heterographic (11.7 avg len)  18 heterographic (9.5 avg len)
Not pun (predicted)  43 homographic (10.7 avg len)      618 homographic (8.7 avg len)
                     33 heterographic (7.7 avg len)     491 heterographic (8.5 avg len)

Given successful and unsuccessful human and machine translations, we obtained a homogeneous corpus in French containing wordplay and non-wordplay with similar characteristics. This similarity in terms of length and lexicon is crucial for building a corpus for wordplay detection, as the vast majority of state-of-the-art NLP approaches are neural [60], and such models might learn differences in lexicon or sentence length instead of the ambiguity in a pun. Indeed, when we tested the Google T5 model [58] via the SimpleT5 library on the shuffled SemEval-2017 data, we obtained 92.8% accuracy on the test set (403 shuffled instances). The split was 70% train, 20% validation, and 10% test. However, a closer look at the confusion matrix (see Table 2) provides evidence that the non-puns in this corpus are, on average, much shorter than the puns, and that the model fails when this is not the case. Thus, the homogeneity of a corpus for wordplay detection is important. To the best of our knowledge, this is the first corpus for wordplay detection in French. It has already been used for a five-step wordplay generation procedure aiming to transform a non-humorous text into wordplay [61]; the source corpus without wordplay thus has the potential to be transformed into a corpus of wordplay.
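The length confound discussed above can be illustrated with a toy experiment: if puns are systematically longer than non-puns, a trivial length threshold already separates the two classes, which is exactly what a neural classifier may silently learn instead of pun ambiguity. The sentences below are invented examples, not corpus data:

```python
# Toy illustration of the length confound: when the two classes differ
# systematically in length, a length-only baseline "detects" puns perfectly.
puns = [
    "I used to be a banker but I lost interest in the job .".split(),
    "The undertaker is always the last one to let you down .".split(),
]
non_puns = [
    "The meeting starts at noon .".split(),
    "She bought some bread .".split(),
]

def avg_len(sentences):
    """Average sentence length in tokens."""
    return sum(len(s) for s in sentences) / len(sentences)

# Midpoint between the two class means serves as the decision threshold.
threshold = (avg_len(puns) + avg_len(non_puns)) / 2

def length_baseline(sentence):
    """Predict 'pun' purely from sentence length, ignoring all content."""
    return len(sentence) > threshold

correct = (sum(length_baseline(s) for s in puns)
           + sum(not length_baseline(s) for s in non_puns))
accuracy = correct / (len(puns) + len(non_puns))  # 1.0 on this toy data
```

On this deliberately skewed toy data the content-blind baseline reaches 100% accuracy, which is why matching the length and lexical distributions of the wordplay and non-wordplay halves of the corpus matters.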
Only the machine translations that were annotated as not containing wordplay were used for this generation (6780 texts in total).

8. Conclusion

The goal of the JOKER project is to advance the automation of creative-language translation by developing the requisite parallel data and evaluation metrics for translating wordplay. To this end, we organised the JOKER track at CLEF 2022, consisting of a workshop and associated pilot tasks on automatic wordplay analysis and translation. We collected a unique English–French parallel wordplay corpus. Successful translations of puns in Pilot Task 3 are usually accidental, in that they exploit an ambiguity shared by the literal translation of the wordplay term in both English and French. However, some translations are successful due to the correct use of anglicisms in French. A side product of our project is the creation of a homogeneous monolingual corpus for wordplay detection in French – to the best of our knowledge, the first such corpus. Further details on the other pilot tasks and the submitted runs can be found in the CLEF CEUR proceedings [62]. An overview of the entire JOKER track can be found in the LNCS proceedings [43]. Additional information on the track is available on the JOKER website: http://www.joker-project.com/

9. Authors’ contribution

The general framework was proposed by L. Ermakova with the participation of T. Miller and A.-G. Bosser. L. Ermakova, F. Regattin, S. Araújo, B. Jeanjean, C. Borg, G. Le Corre, É. Mathurin, R. Hannachi, and T. Miller worked on the organisation of the translation contest. J. Boccou, A. Digue, and A. Damoy participated in data creation and worked on the result evaluation under the supervision of L. Ermakova. F. Regattin wrote the state of the art on wordplay translation strategies.
Acknowledgments

This work has been funded in part by the National Research Agency under the program Investissements d’avenir (Reference ANR-19-GURE-0001) and by the Austrian Science Fund under project M 2625-N31. JOKER is supported by La Maison des sciences de l’homme en Bretagne. We thank the other members of the jury of the pun translation contest: Caroline Comacle, Mohamed Saki, Helen McCombie, and Catherine Davis, as well as the translation contest participants. We thank Alain Kerhervé for the financial support of the translation contest and Eric Sanjuan, who provided resources for data management. We would also like to thank the other JOKER organisers: Monika Bokiniec, Ġorġ Mallia, Gordan Matas, Mohamed Saki, Benoît Jeanjean, Radia Hannachi, Danica Škara, and the other PC members: Grigori Sidorov, Victor Manuel Palma Preciado, and Fabrice Antoine.

References

[1] S. Laviosa, Wordplay in advertising: Form, meaning and function, Scripta Manent 1 (2015) 25–34.
[2] T. Miller, C. F. Hempelmann, I. Gurevych, SemEval-2017 Task 7: Detection and interpretation of English puns, in: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 58–68. doi:10.18653/v1/S17-2005.
[3] L. Bobchynets, Lexico-semantic means of creation of pun in Spanish and Portuguese jokes, Nova fìlologìâ (2020). doi:10.26661/2414-1135-2020-80-1-11.
[4] E. S. Rudenko, R. I. Bachieva, Wordplay as a translation problem, Bulletin of the Moscow State Regional University (Linguistics) (2020). doi:10.18384/2310-712x-2020-2-78-85.
[5] R. Tuzzikriah, H. Ardi, Students’ perception on the problem in translating humor text, Proceedings of the Eighth International Conference on English Language and Teaching (ICOELT-8 2020) (2021). doi:10.2991/assehr.k.210914.061.
[6] T.
Miller, The punster's amanuensis: The proper place of humans and machines in the translation of wordplay, in: Proceedings of the Second Workshop on Human-Informed Translation and Interpreting Technology (HiT-IT 2019), 2019, pp. 57–64. doi:10.26615/issn.2683-0078.2019_007.
[7] Y. He, Challenges and countermeasures of translation teaching in the era of artificial intelligence, Journal of Physics: Conference Series 1881 (2021). doi:10.1088/1742-6596/1881/2/022086.
[8] H. Ardi, M. A. Hafizh, I. Rezqi, R. Tuzzikriah, Can machine translations translate humorous texts?, Humanus (2022). doi:10.24036/humanus.v21i1.115698.
[9] F. Regattin, Traduction automatique et jeux de mots : l'incursion (ludique) d'un inculte, 2021. URL: https://motsmachines.github.io/2021/en/submissions/Mots-Machines-2021_paper_5.pdf.
[10] F. R. B. Kembaren, The challenges and solutions of translating puns and jokes from English to Indonesian, VISION (2020). doi:10.30829/vis.v16i2.807.
[11] O. G. Hniedkova, Z. O. Karpenko, Peculiarities of pun formation and translation of pun as a type of wordplay, Scientific Notes of V. I. Vernadsky Taurida National University, Series: Philology. Journalism 2 (2021) 254–261. doi:10.32838/2710-4656/2021.1-2/44.
[12] G. Kovács, Translating humour – a didactic perspective, Acta Universitatis Sapientiae, Philologica 12 (2020) 68–83.
[13] L. Ermakova, T. Miller, O. Puchalski, F. Regattin, É. Mathurin, S. Araújo, A.-G. Bosser, C. Borg, M. Bokiniec, G. L. Corre, B. Jeanjean, R. Hannachi, Ġ. Mallia, G. Matas, M. Saki, CLEF Workshop JOKER: Automatic Wordplay and Humour Translation, in: M. Hagen, S. Verberne, C. Macdonald, C. Seifert, K. Balog, K. Nørvåg, V. Setty (Eds.), Advances in Information Retrieval, volume 13186 of Lecture Notes in Computer Science, Springer International Publishing, Cham, 2022, pp. 355–363. doi:10.1007/978-3-030-99739-7_45.
[14] F. Regattin, Traduire les jeux de mots : une approche intégrée, Atelier de traduction (2015) 129–151.
URL: http://www.diacronia.ro/ro/indexing/details/A19521/pdf.
[15] J. Henry, La Traduction Des Jeux De Mots, Presses de la Sorbonne Nouvelle, Paris, 2003.
[16] D. Delabastita, Wordplay as a translation problem: a linguistic perspective, in: Ein internationales Handbuch zur Übersetzungsforschung, volume 1, De Gruyter Mouton, 2008, pp. 600–606. doi:10.1515/9783110137088.1.6.600.
[17] D. Farwell, S. Helmreich, Pragmatics-based MT and the translation of puns, in: Proceedings of the 11th Annual Conference of the European Association for Machine Translation, 2006, pp. 187–194. URL: http://www.mt-archive.info/EAMT-2006-Farwell.pdf.
[18] T. Miller, The punster's amanuensis: The proper place of humans and machines in the translation of wordplay, in: Proceedings of the Second Workshop on Human-Informed Translation and Interpreting Technology, 2019, pp. 57–64. doi:10.26615/issn.2683-0078.2019_007.
[19] W. Kolb, T. Miller, Human–computer interaction in pun translation, in: J. Hadley, K. Taivalkoski-Shilov, C. S. C. Teixeira, A. Toral (Eds.), Using Technologies for Creative-Text Translation, Routledge, 2022. To appear.
[20] A. Valitutti, H. Toivonen, A. Doucet, J. M. Toivanen, "Let everything turn well in your wife": Generation of adult humor using lexical constraints, in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, volume 2, Association for Computational Linguistics, 2013, pp. 243–248. URL: https://aclanthology.org/P13-2044.
[21] B. A. Hong, E. Ong, Automatically extracting word relationships as templates for pun generation, in: Proceedings of the Workshop on Computational Approaches to Linguistic Creativity, Association for Computational Linguistics, Boulder, Colorado, 2009, pp. 24–31. URL: https://aclanthology.org/W09-2004.
[22] Z. Yu, J. Tan, X.
Wan, A neural approach to pun generation, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, volume 1, Association for Computational Linguistics, 2018, pp. 1650–1660. URL: https://aclanthology.org/P18-1153. doi:10.18653/v1/P18-1153.
[23] H. He, N. Peng, P. Liang, Pun generation with surprise, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 1734–1744. URL: https://aclanthology.org/N19-1172. doi:10.18653/v1/N19-1172.
[24] A. Nijholt, A. Niculescu, A. Valitutti, R. E. Banchs, Humor in human-computer interaction: a short survey, in: A. Joshi, D. K. Balkrishan, G. Dalvi, M. Winckler (Eds.), Adjunct Proceedings: INTERACT 2017 Mumbai, Industrial Design Centre, Indian Institute of Technology Bombay, 2017, pp. 199–220. URL: https://www.interact2017.org/downloads/INTERACT_2017_Adjunct_v4_final_24jan.pdf.
[25] B. Ghanem, J. Karoui, F. Benamara, V. Moriceau, P. Rosso, IDAT@FIRE2019: Overview of the track on irony detection in Arabic tweets, in: Proceedings of the 11th Forum for Information Retrieval Evaluation, Association for Computing Machinery, 2019, pp. 10–13. doi:10.1145/3368567.3368585.
[26] J. Karoui, F. Benamara, V. Moriceau, V. Patti, C. Bosco, N. Aussenac-Gilles, Exploring the impact of pragmatic phenomena on irony detection in tweets: a multilingual corpus study, in: 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 1, Association for Computational Linguistics, 2017, pp. 262–272. URL: https://oatao.univ-toulouse.fr/18921/.
[27] J. Karoui, B. Farah, V. Moriceau, N. Aussenac-Gilles, L.
Hadrich-Belguith, Towards a contextual pragmatic model to detect irony in tweets, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, volume 2, Association for Computational Linguistics, 2015, pp. 644–650. URL: http://aclweb.org/anthology/P15-2106. doi:10.3115/v1/P15-2106.
[28] A. Reyes, P. Rosso, D. Buscaldi, From humor recognition to irony detection: the figurative language of social media, Data & Knowledge Engineering 74 (2012) 1–12. doi:10.1016/j.datak.2012.02.005.
[29] G. Guibon, L. Ermakova, H. Seffih, A. Firsov, G. Le Noé-Bienvenu, Multilingual Fake News Detection with Satire, in: CICLing: International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, 2019. URL: https://halshs.archives-ouvertes.fr/halshs-02391141.
[30] C. Francesconi, C. Bosco, F. Poletto, M. Sanguinetti, Error Analysis in a Hate Speech Detection Task: the Case of HaSpeeDe-TW at EVALITA 2018, in: R. Bernardi, R. Navigli, G. Semeraro (Eds.), Proceedings of the 6th Italian Conference on Computational Linguistics, 2018. URL: http://ceur-ws.org/Vol-2481/paper32.pdf.
[31] T. Miller, C. F. Hempelmann, I. Gurevych, SemEval-2017 Task 7: Detection and interpretation of English puns, in: Proceedings of the 11th International Workshop on Semantic Evaluation, 2017, pp. 58–68. doi:10.18653/v1/S17-2005.
[32] A. Jain, P. Yadav, H. Javed, Equivoque: Detection and interpretation of English puns, in: Proceedings of the 8th International Conference on System Modeling and Advancement in Research Trends, 2019, pp. 262–265. doi:10.1109/SMART46866.2019.9117433.
[33] P. Potash, A. Romanov, A. Rumshisky, SemEval-2017 Task 6: #HashtagWars: Learning a sense of humor, in: Proceedings of the 11th International Workshop on Semantic Evaluation, Association for Computational Linguistics, 2017, pp. 49–57. doi:10.18653/v1/S17-2004.
[34] N. Hossain, J. Krumm, M.
Gamon, H. Kautz, SemEval-2020 Task 7: Assessing humor in edited news headlines, in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, International Committee for Computational Linguistics, Barcelona (online), 2020, pp. 746–758. URL: https://aclanthology.org/2020.semeval-1.98. doi:10.18653/v1/2020.semeval-1.98.
[35] J. A. Meaney, S. Wilson, L. Chiruzzo, A. Lopez, W. Magdy, SemEval-2021 Task 7: HaHackathon, detecting and rating humor and offense, in: Proceedings of the 15th International Workshop on Semantic Evaluation, Association for Computational Linguistics, 2021, pp. 105–119. URL: https://aclanthology.org/2021.semeval-1.9. doi:10.18653/v1/2021.semeval-1.9.
[36] R. Mihalcea, C. Strapparava, Making computers laugh: Investigations in automatic humor recognition, in: Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing: Proceedings of the Conference, Association for Computational Linguistics, Stroudsburg, PA, 2005, pp. 531–538. URL: http://www.aclweb.org/anthology/H/H05/H05-1067. doi:10.3115/1220575.1220642.
[37] A. Cattle, X. Ma, Recognizing humour using word associations and humour anchor extraction, in: Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, Santa Fe, New Mexico, USA, 2018, pp. 1849–1858. URL: https://www.aclweb.org/anthology/C18-1157.
[38] D. Yang, A. Lavie, C. Dyer, E. Hovy, Humor recognition and humor anchor extraction, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2015, pp. 2367–2376. URL: https://www.aclweb.org/anthology/D15-1284. doi:10.18653/v1/D15-1284.
[39] A. Reyes, D. Buscaldi, P. Rosso, An analysis of the impact of ambiguity on automatic humour recognition, in: V. Matoušek, P. Mautner (Eds.), Text, Speech and Dialogue, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2009, pp. 162–169.
doi:10.1007/978-3-642-04208-9_25.
[40] V. Blinov, V. Bolotova-Baranova, P. Braslavski, Large dataset and language model fun-tuning for humor recognition, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2019, pp. 4027–4032. doi:10.18653/v1/P19-1394.
[41] A. Ermilov, N. Murashkina, V. Goryacheva, P. Braslavski, Stierlitz meets SVM: Humor detection in Russian, in: D. Ustalov, A. Filchenkov, L. Pivovarova, J. Žižka (Eds.), Artificial Intelligence and Natural Language: 7th International Conference, AINL 2018, volume 930 of Communications in Computer and Information Science, Springer, Cham, Switzerland, 2018, pp. 178–184. doi:10.1007/978-3-030-01204-5_17.
[42] S. Castro, L. Chiruzzo, A. Rosá, D. Garat, G. Moncecchi, A crowd-annotated Spanish corpus for humor analysis, in: Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media, Association for Computational Linguistics, 2018, pp. 7–11. URL: https://www.aclweb.org/anthology/W18-3502. doi:10.18653/v1/W18-3502.
[43] L. Ermakova, T. Miller, F. Regattin, A.-G. Bosser, E. Mathurin, G. L. Corre, S. Araújo, J. Boccou, A. Digue, A. Damoy, B. Jeanjean, Overview of JOKER@CLEF 2022: Automatic Wordplay and Humour Translation workshop, in: A. Barrón-Cedeño, G. Da San Martino, M. Degli Esposti, F. Sebastiani, C. Macdonald, G. Pasi, A. Hanbury, M. Potthast, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Thirteenth International Conference of the CLEF Association (CLEF 2022), volume 13390 of LNCS, 2022.
[44] O. Lieber, O. Sharir, B. Lentz, Y. Shoham, Jurassic-1: Technical Details and Evaluation, White paper, AI21 Labs, 2021. URL: https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf.
[45] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, C.
Raffel, mT5: A massively multilingual pre-trained text-to-text transformer, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2021, pp. 483–498. URL: https://aclanthology.org/2021.naacl-main.41. doi:10.18653/v1/2021.naacl-main.41.
[46] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, volume 1, Association for Computational Linguistics, 2019, pp. 4171–4186. URL: https://doi.org/10.18653/v1/n19-1423. doi:10.18653/v1/n19-1423.
[47] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, 2020. arXiv:2005.14165.
[48] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language Models Are Unsupervised Multitask Learners, Technical report, OpenAI, 2019. URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[49] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv:1706.03762 [cs] (2017). URL: http://arxiv.org/abs/1706.03762.
[50] C. Jiang, M. Maddela, W. Lan, Y. Zhong, W. Xu, Neural CRF model for sentence alignment in text simplification, arXiv:2005.02324 [cs] (2020). URL: http://arxiv.org/abs/2005.02324.
[51] D. Delabastita, There's a Double Tongue: An Investigation into the Translation of Shakespeare's Wordplay, with Special Reference to Hamlet, Rodopi, Amsterdam, 1993.
[52] P. Vrticka, J.
M. Black, A. L. Reiss, The neural basis of humour processing, Nature Reviews Neuroscience 14 (2013) 860–868. doi:10.1038/nrn3566.
[53] C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA, 1998.
[54] L. Ermakova, F. Regattin, T. Miller, A.-G. Bosser, S. Araújo, C. Borg, G. L. Corre, J. Boccou, A. Digue, A. Damoy, P. Campen, O. Puchalski, Overview of the CLEF 2022 JOKER Task 1: Classify and explain instances of wordplay, in: G. Faggioli, N. Ferro, A. Hanbury, M. Potthast (Eds.), Proceedings of the Working Notes of CLEF 2022: Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, 2022.
[55] F. Dhanani, M. Rafi, M. A. Tahir, FAST_MT participation for the JOKER CLEF-2022 automatic pun and human translation tasks, in: Proceedings of the Working Notes of CLEF 2022 – Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th to 8th, 2022, CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy, 2022, p. 14.
[56] L. Glemarec, Use of SimpleT5 for the CLEF workshop JokeR: Automatic Pun and Humor Translation, in: Proceedings of the Working Notes of CLEF 2022 – Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th to 8th, 2022, CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy, 2022, p. 11.
[57] L. J. G. Galeano, LJGG @ CLEF JOKER Task 3: An improved solution joining with dataset from task, in: Proceedings of the Working Notes of CLEF 2022 – Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th to 8th, 2022, CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy, 2022, p. 7.
[58] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning Research 21 (2020) 1–67. URL: http://jmlr.org/papers/v21/20-074.html.
[59] J. Tiedemann, S.
Thottingal, OPUS-MT – building open translation services for the world, in: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, European Association for Machine Translation, 2020.
[60] S. Zhao, R. Meng, D. He, A. Saptono, B. Parmanto, Integrating Transformer and Paraphrase Rules for Sentence Simplification, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 3164–3173. URL: https://www.aclweb.org/anthology/D18-1355.
[61] L. Glémarec, A.-G. Bosser, L. Ermakova, Generating Humourous Puns in French, in: Proceedings of the Working Notes of CLEF 2022 – Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th to 8th, 2022, CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy, 2022, p. 8.
[62] G. Faggioli, N. Ferro, A. Hanbury, M. Potthast (Eds.), Proceedings of the Working Notes of CLEF 2022: Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, 2022.