Applying MIPVU Metaphor Identification Procedure on Czech

Dalibor Pavlas, Ondřej Vrabeľ, Jiří Kozmér
Palacký University Olomouc
Křížkovského 511/8, 771 47 Olomouc
dalibor.pavlas@gmail.com, ondra.vrabel@seznam.cz, jiri.kozmer@gmail.com

Abstract
This paper presents the current state of a research project aimed at modifying the MIPVU protocol for metaphor annotation for use on Czech-language texts. Three annotators were trained in the metaphor identification procedure MIPVU and annotated two short text excerpts of about 600 tokens each. The reliability of the annotation was then measured using Fleiss’ kappa. The resulting inter-annotator agreement of 0.70 was below the kappa values reported by the annotators of the VU Amsterdam Metaphor Corpus (Steen et al., 2010). It was very similar to the agreement that Badryzlova et al. (2013) obtained in their first reliability test with the unmodified MIPVU procedure applied to Russian texts. We propose several modifications of the annotation procedure to make it more suitable for Czech. The modifications are based on the observations made by the annotators during error analysis and by the authors of similar projects that transferred the MIPVU procedure to Slavic and other inflected languages. The refined procedure has yet to be validated in a second reliability test.

Keywords: Metaphor, MIPVU, MIP, annotation, Metaphor Identification Procedure, inter-annotator agreement, Fleiss’ kappa, Czech language

1. Introduction
This paper presents the current state of a research project aimed at modifying the MIPVU protocol for metaphor annotation for use on Czech-language texts. It is the initial stage in the creation of a Czech metaphor corpus. Such a corpus could be a very valuable resource for several fields of linguistic research, such as computational, cognitive and corpus linguistics. This initial stage includes:
1) Modification of the MIPVU protocol for reliable linguistic metaphor identification in Czech.
2) Introduction of an alternative tag (located in parallel to the original MIPVU tags) which, if needed, will allow us to filter out highly conventionalized cases of metaphor.
The process of modifying the MIPVU procedure is described in the following parts of this work.

The alternative tag for highly conventionalized metaphors is motivated by the desire to use the resulting corpus for training systems for automatic metaphor identification. Lexicalized metaphors can be successfully interpreted using standard word sense disambiguation techniques (Shutova, 2015). Labelling them as metaphorical in the training data may therefore make a metaphor identification system less effective. Our goal is twofold. We want to keep the data for metaphor usage statistics directly comparable with the same statistics available for English. At the same time, we want to make the resulting corpus more suitable for computational approaches to metaphor.

2. Related work
2.1 MIP and MIPVU
Since the early 1980s, when conceptual metaphor theory (CMT; Lakoff and Johnson, 1980) was introduced, there has been great interest in metaphor research. At the same time, metaphor is a very complex phenomenon, even if we consider only its manifestation in language. It ranges from novel and very creative expressions to extremely lexicalized ones whose metaphoricity is almost unnoticeable. This created a need for clearly defined guidelines for identifying metaphor in text. Because the task is so complex, however, it was not until 2007 that such a procedure was established, by a group of researchers who called themselves the Pragglejaz Group.

Their method, called MIP (Metaphor Identification Procedure; Pragglejaz Group, 2007), was later refined in several ways and applied to data from the British National Corpus. The upgraded procedure is called MIPVU, and the resulting annotated resource is the VU Amsterdam Metaphor Corpus (VUAMC; Steen et al., 2010). It consists of approximately 200,000 words taken from BNC Baby and is divided into four genres: academic, news, fiction, and conversation.

In MIPVU, lexical units (words) whose contextual meanings contrast with their basic meanings are considered metaphor-related words (MRWs). Annotators use a dictionary to establish the basic and the contextual meaning of each word in the corpus. If the basic meaning of a word, compared with its contextual meaning, is:
a) more concrete (what it evokes is easier to imagine, see, hear, feel, smell and taste);
b) related to bodily action;
c) more precise (as opposed to vague);
then the word is marked as an MRW. The history of a lexical unit is usually not taken into account, which is one of the differences between MIP and MIPVU.

2.2 Applications of MIPVU to different languages
Yulia Badryzlova and her colleagues (2013) modified the MIPVU protocol for Russian-language texts. They also attempted to extend the annotation to the level of conceptual mappings (“deep annotation”). They measured inter-annotator agreement on texts annotated with both the original MIPVU and their modified version. They then compared these results with the same tests that Steen and his colleagues (2010) conducted while establishing the MIPVU procedure. In the second test, their inter-annotator agreement outperformed the agreement reported for VUAMC. The project was then discontinued, but Badryzlova and Lyashevskaya (2017) recently renewed the effort to create a Russian metaphor corpus. They used an annotation procedure based on MIPVU but modified in several ways.
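All of the projects above build on the core MIPVU decision described in Section 2.1. As an illustration only, that decision can be written down as a small Python function. The sketch makes three assumptions. First, the annotator has already chosen a basic and a contextual sense from a dictionary. Second, senses are compared as plain labels, so equal labels mean the same sense. Third, the three comparisons from Section 2.1 have been recorded as yes/no judgments. The `SenseJudgment` record and the `mipvu_tag` function are hypothetical names, not part of any published MIPVU tooling. MIPVU proper also requires the two senses to be sufficiently distinct and the contextual sense to be understandable by comparison with the basic one. Here, those judgments are assumed to be folded into the annotator’s choice of senses.

```python
from dataclasses import dataclass


@dataclass
class SenseJudgment:
    """One annotator's record for a lexical unit (hypothetical format, not from MIPVU itself)."""
    token: str
    basic_sense: str       # sense chosen as basic from the dictionary (e.g. SSJČ or SSČ)
    contextual_sense: str  # sense that fits the context of use
    more_concrete: bool    # basic sense is more concrete than the contextual one
    bodily_action: bool    # basic sense is related to bodily action
    more_precise: bool     # basic sense is more precise (less vague) than the contextual one


def mipvu_tag(j: SenseJudgment) -> int:
    """Return 1 (MRW) or 0 (non-MRW) following the simplified rule of Section 2.1."""
    if j.basic_sense == j.contextual_sense:
        # The word is used in its basic sense: no contrast, so it is not an MRW.
        return 0
    # The senses differ; the word is metaphor-related if the basic sense is
    # more concrete, bodily, or precise than the contextual one.
    return int(j.more_concrete or j.bodily_action or j.more_precise)


if __name__ == "__main__":
    # Illustrative judgment for "před" in "před třemi lety" ("three years ago"):
    # spatial basic sense versus temporal contextual sense.
    pred = SenseJudgment("před", "in front of (space)", "before (time)",
                         more_concrete=True, bodily_action=False, more_precise=False)
    print(mipvu_tag(pred))  # 1
```

With these illustrative judgments, “před” comes out as an MRW, which is consistent with its tag in Table 4. The hard part, choosing the senses, stays with the annotator.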
In Badryzlova and Lyashevskaya’s project, linguistic metaphor annotation is added as a new layer to SynTagRus, the Russian syntactic dependency treebank.

Justina Urbonaitė (2015) examined metaphors of law-related concepts in English and Lithuanian, using the MIPVU procedure for annotation. As the only annotator, she could not report inter-annotator agreement. Her work nevertheless offered very useful remarks on applying MIPVU to an inflected language.

For the current stage of our project, we use a model similar to that of Badryzlova et al. (2013). We also try to apply the findings and observations from all three sources mentioned above.

3. Reliability test
We annotated two text excerpts of about 600 tokens each. The first excerpt (598 tokens) belongs to the fiction genre and was taken from the short story “Zasraný vánoce” by Michal Viewegh. The second (611 tokens) was taken from a transcript of the proceedings of the European Parliament. These transcripts are available from the parallel corpus InterCorp (Rosen et al., 2017), which is part of the Czech National Corpus project.

Basic meanings were established using two dictionaries: the Dictionary of Standard Czech Language (Vácha et al., 1971; commonly abbreviated SSJČ) and the Dictionary of Standard Czech (Kroupová et al., 2005; SSČ).

Two of the three annotators were Ph.D. students and the third was a Master’s student. All of them study linguistics and had prior experience in conceptual metaphor studies.

The reliability of the annotation was measured using Fleiss’ kappa, a statistical measure of inter-annotator agreement that corrects for chance agreement between annotators (Artstein and Poesio, 2008).

In this first reliability test, the annotators were trained in the MIPVU protocol and instructed to follow it. The annotation was performed in a manner similar to the reliability tests conducted while building VUAMC. The annotators worked only with plain text and marked each lexical unit with either 1 (MRW) or 0 (non-MRW). A Python program written specifically for this task calculated Fleiss’ kappa and identified the cases of disagreement.

The results can be seen in Tab. 1.

Text | Tokens | Unanimous non-MRW (%) | Unanimous MRW (%) | Unanimous total (%) | Fleiss’ κ
Viewegh | 598 | 87.46 | 4.85 | 92.31 | 0.65
Europarl | 611 | 76.76 | 10.97 | 87.73 | 0.72
Total Fleiss’ κ: 0.70
Table 1: Resultant inter-annotator agreement

The minimum accepted thresholds for Fleiss’ kappa are commonly stated to be 0.67, 0.7 or 0.8 (Artstein and Poesio, 2008; Badryzlova et al., 2013). More important, however, is how the resulting agreement compares with the agreement observed on VUAMC and in the work of Badryzlova et al. (2013). See the comparison in Tab. 2.

Project | Annotators, tokens | Source | Test | Fleiss’ κ
Applying MIPVU on Czech | 3 annotators, 1209 tokens | this study | Reliability test 1 | 0.70
Russian corpus of conceptual metaphor | 3 annotators, approx. 2000 tokens | Badryzlova et al. (2013) | Reliability test 1 | 0.68
Russian corpus of conceptual metaphor | 3 annotators, approx. 2000 tokens | Badryzlova et al. (2013) | Reliability test 2 | 0.90
VU Amsterdam Metaphor Corpus | 4 annotators, 1921 tokens | Steen et al. (2010) | Reliability test 6 | 0.85
Table 2: Comparison of inter-annotator agreement in other MIPVU projects

Our kappa is still below the desired values. It is very similar to the agreement that Badryzlova and her colleagues obtained in their first reliability test with the unmodified MIPVU procedure.

4. Error analysis and proposed modifications
4.1 Cases of disagreement
Table 3 shows the disagreement counts for both annotated texts, in total and by part of speech. In both excerpts, verbs accounted for most of the disagreement. They were followed by prepositions in the fiction text by Michal Viewegh and by nouns in the European Parliament proceedings.

POS | Viewegh | Europarl | Total
Nouns | 6 | 18 | 24
Verbs | 18 | 30 | 48
Adjectives | 6 | 6 | 12
Adverbs | 5 | 4 | 9
Prepositions | 11 | 16 | 27
Conjunctions | 0 | 1 | 1
All POS | 46 | 75 | 121
Table 3: Disagreement count

The excerpt of the European Parliament proceedings shows more disagreements, yet it has the higher inter-annotator agreement (see Tab. 1). This is because it contains more than twice as many unanimously identified metaphors as the other excerpt (10.97% vs. 4.85% of tokens). Fleiss’ kappa corrects for chance agreement, and chance agreement is lower when MRWs make up a larger share of the tags. A text with more metaphors can therefore reach a higher kappa despite more raw disagreements. The difference also matches the finding of Steen and his colleagues (2010) about the four registers (academic, news, fiction, and conversation): only conversation had a lower frequency of MRWs than fiction.

Part of the disagreement in verb annotation seems to stem from the bias of individual annotators rather than from a systematic pattern in the annotation protocol. In the European Parliament proceedings, one annotator did not mark several metaphorically used lexical units as MRWs. For some verbs, this annotator overlooked the personifying connection between the verb and its subject when the subject was highly abstract (e.g. luck, possibility, right or freedom). The annotator realized the omission immediately after finishing the annotation round.

The approach we have chosen for dealing with disagreements in preposition annotation is described in Section 4.2.
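The Python program used for the Fleiss’ kappa calculation and the disagreement analysis is not published with this paper. The following is a minimal sketch of that kind of computation, not the authors’ program. It assumes that each excerpt is stored as a list of per-token tag lists (one 0/1 tag per annotator, always in the same order) plus a parallel list of part-of-speech labels. The function names are hypothetical. Fleiss’ kappa follows the standard definition (see Artstein and Poesio, 2008). Observed agreement is the share of agreeing annotator pairs per token. Chance agreement is the sum of the squared category proportions.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[int]]) -> float:
    """Fleiss' kappa for binary tags; ratings[i] holds every annotator's tag (0/1) for token i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("Fleiss' kappa needs at least two annotators")
    # How many annotators gave each token the tag 0 (non-MRW) and 1 (MRW).
    counts = [(r.count(0), r.count(1)) for r in ratings]
    # Observed agreement per token: proportion of annotator pairs that agree.
    pair_total = n_raters * (n_raters - 1)
    p_i = [(c0 * (c0 - 1) + c1 * (c1 - 1)) / pair_total for c0, c1 in counts]
    p_bar = sum(p_i) / n_items
    # Chance agreement from the overall proportion of MRW tags.
    p_mrw = sum(c1 for _, c1 in counts) / (n_items * n_raters)
    p_e = p_mrw ** 2 + (1 - p_mrw) ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when all tags fall into one category")
    return (p_bar - p_e) / (1 - p_e)


def unanimity(ratings: list[list[int]]) -> dict[str, float]:
    """Percentages of tokens on which all annotators agree, split as in Tab. 1."""
    n = len(ratings)
    all_non = sum(1 for r in ratings if all(t == 0 for t in r))
    all_mrw = sum(1 for r in ratings if all(t == 1 for t in r))
    return {
        "not_mrw": 100 * all_non / n,
        "mrw": 100 * all_mrw / n,
        "total": 100 * (all_non + all_mrw) / n,
    }


def disagreements_by_pos(pos_tags: list[str], ratings: list[list[int]]) -> Counter:
    """Count tokens whose tags are not unanimous, per part of speech, as in Tab. 3."""
    return Counter(pos for pos, r in zip(pos_tags, ratings) if len(set(r)) > 1)


if __name__ == "__main__":
    # Toy data invented for illustration: 4 tokens x 3 annotators.
    toy = [[0, 0, 0], [1, 1, 1], [1, 1, 0], [0, 0, 0]]
    pos = ["CONJ", "PREP", "VERB", "NOUN"]
    print(round(fleiss_kappa(toy), 3))     # 0.657
    print(unanimity(toy))                  # {'not_mrw': 50.0, 'mrw': 25.0, 'total': 75.0}
    print(disagreements_by_pos(pos, toy))  # Counter({'VERB': 1})
```

The toy data are invented and unrelated to the study’s excerpts. The formula also shows why a text with more MRWs can reach a higher kappa. As the share of MRW tags moves from near 0 toward 0.5, the chance-agreement term p_e shrinks, so the same observed agreement yields a larger kappa.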
4.2 Prepositions
In English, and presumably in many other languages, prepositions are the most metaphor-rich part of speech. They are reported to account for 38.5–46.9% of metaphor-related words in VUAMC (Steen et al., 2010). Czech prepositions are more homonymous than English ones, and there was substantial disagreement between the annotators.

Following Badryzlova and her colleagues (2013), we made a list of the basic meanings of the major prepositions. We followed the Czech linguistic tradition, in which the meanings of prepositions are distinguished by grammatical case (Veselková et al., 1986; Štícha et al., 2013). This helped to filter out homonymy and made it possible to choose a single basic meaning for each preposition and case.

Take, for example, the following expressions containing the preposition “za”. In sentences 3) and 4), “za” is clearly an MRW. In 1) and 2), however, the two meanings are clearly distinct but equally concrete and bodily related.
1) Petr stojí za mnou; Petr stands behind me
2) Chytil jsem ho za nohu; I caught him by the leg
3) Za 2 roky to bude hotové; It will be done in 2 years
4) Vyměnil jsem kolo za auto; I traded the bike for the car
We can therefore distinguish between “za” with the instrumental (expression 1)) and “za” with the accusative (expression 2)) and establish a basic meaning for each. The accusative “za” then supplies the basic meaning against which sentences 3) and 4), both MRWs, are compared.

4.3 Reflexive pronouns “se/si” and auxiliary verbs
The reflexive pronouns “se/si” are used in two ways. They mark cases where the subject and the object of the sentence are identical, as in 5). They can also be an integral part of a reflexive verb, whose lexical meaning they often determine. The presence of “se/si” can result in a complete shift of meaning, as illustrated in 6).
5) umyji se; I will wash myself
6) rozvést / rozvést se; to develop (an idea) / to divorce
As expected, the original MIPVU procedure does not account for this phenomenon. Table 4 shows its effect on an actual annotated sentence, “Když se před třemi lety rozvedl [...]” (“When he got divorced three years ago [...]”).

Annotated sentence | Když | se | před | třemi | lety | rozvedl [...]
Original MIPVU | 0 | 0 | 1 | 0 | 0 | 1
Modified MIPVU | 0 | 0 | 1 | 0 | 0 | 0
Table 4: Annotation of a sentence where a reflexive pronoun causes a shift of meaning

Suppose “se” and “rozvedl” are treated as separate lexical units. The basic meaning of “rozvedl” is then “he developed/expanded (something)”. Its contextual meaning, “he got divorced”, differs from that, so the word should be marked as an MRW. Now suppose “se” + “rozvedl” is counted as one lexical unit, distinct from “rozvedl”. Its basic meaning then equals its contextual meaning, so it is annotated as non-MRW. This better matches the general sense of the sentence.¹

Similarly, Czech auxiliary verbs such as “bych” are considered integral parts of the conjugated forms of the full verb. For “se/si” and for auxiliary verbs, we therefore applied the policy that the VUAMC annotators used for English phrasal verbs: they count as one lexical unit together with the full verb. Conversely, meanings commonly expressed by phrasal verbs in English tend to be expressed in Czech by prefixes, which are already part of the word, as seen in 7).
7) zesílit; turn up

¹ In the first round of annotation, the reflexive pronoun (or auxiliary verb) is linked to its verb in a simple way: it always receives the same metaphoricity value as the verb. This naive method is justifiable because, at this stage, the project only serves to refine the annotation manual. It is not suitable for actual corpus building, as it would distort the metaphor usage statistics.

4.4 Set expressions
For set expressions, we followed the recent remarks on MIPVU by the main author of VUAMC (Steen, 2017): each word of a set expression is treated as a lexical unit in its own right. This leaves the boundary between metaphor and idiom unclear. Identifying set expressions with dictionaries, as Badryzlova et al. (2013) did, seemed problematic, however. Unlike the dictionaries used in the original MIPVU procedure, the dictionaries available for Czech are neither corpus-based nor contemporary.

5. Summary
So far, we have applied MIPVU to Czech texts and tested inter-annotator agreement. As we expected, transferring the MIPVU procedure directly to Czech turned out to be problematic. The same complications were reported by researchers applying the procedure to Russian (Badryzlova et al., 2013) and Lithuanian (Urbonaitė, 2015).

Based on the error analysis, we have proposed several minor modifications of the guidelines to make them more suitable for Czech. We plan to conduct a second reliability test as soon as possible.

Once MIPVU has been successfully adapted for Czech texts, the next step will be to annotate the data with an additional tag for highly lexicalized metaphors. This tag will not ask whether the contextual meaning differs from the basic one. Instead, it will ask whether there is a literal word in use that can express the given contextual meaning. If there is not, the word is probably a highly conventionalized metaphor. Several questions about this approach remain unanswered, however. The most important is whether annotators will agree sufficiently on such cases.

6. Acknowledgements
This work was funded by the Ministry of Education, Youth and Sports of the Czech Republic as part of the project “Počáteční fáze tvorby korpusu metafory v češtině” (“Initial stage of creating a metaphor corpus in Czech”; grant number IGA_FF_2018_026).

Bibliographical References
Artstein, R., Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4): 554–596.
Badryzlova, Y., Lyashevskaya, O. (2017). Metaphor Shifts in Constructions: the Russian Metaphor Corpus. In AAAI Spring Symposium Series, pp. 127–130.
Badryzlova, Y., Shekhtman, N., Isaeva, Y., Kerimov, R. (2013). Annotating a Russian corpus of conceptual metaphor: a bottom-up approach. In Proceedings of the First Workshop on Metaphor in NLP. Atlanta, GA: Association for Computational Linguistics, pp. 77–86.
Kroupová, L., et al. (2005). Slovník spisovné češtiny pro školu a veřejnost: s Dodatkem Ministerstva školství, mládeže a tělovýchovy České republiky. Praha: Academia.
Lakoff, G., Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.
Pragglejaz Group (2007). MIP: A method for identifying metaphorically used words in discourse. Metaphor and Symbol, 22(1): 1–39.
Shutova, E. (2015). Design and Evaluation of Metaphor Processing Systems. Computational Linguistics, 41(4): 579–623.
Steen, G. (2017). Identifying metaphors in language. In Semino, E., Demjén, Z. (Eds.) The Routledge Handbook of Metaphor and Language. London: Routledge, chapt. 5.
Steen, G. J., Dorst, A. G., Herrmann, J. B., Kaal, A. A., Krennmayr, T., Pasma, T. (2010). A method for linguistic metaphor identification: From MIP to MIPVU. Amsterdam: John Benjamins.
Štícha, F., et al. (2013). Akademická gramatika spisovné češtiny. Praha: Academia.
Urbonaitė, J. (2015). Metaphor identification procedure MIPVU: an attempt to apply it to Lithuanian. Taikomoji kalbotyra, (7): 1–26.
Vácha, J., et al. (Eds.) (1971). Slovník spisovného jazyka českého. Praha: Academia.
Veselková, J., et al. (1986). Mluvnice češtiny. 2, Tvarosloví. Praha: Academia.

Language Resource References
Rosen, A., Vavřín, M., Zasina, A. J. (2017). InterCorp, 10.0. Institute of the Czech National Corpus, Charles University, Prague. Available from: http://www.korpus.cz