Tracing metaphors in time through self-distance in vector spaces

Marco Del Tredici, ILLC, Univ. of Amsterdam, Amsterdam, The Netherlands, marcodeltredici@gmail.com
Malvina Nissim, CLCG, Univ. of Groningen, Groningen, The Netherlands, m.nissim@rug.nl
Andrea Zaninello, Zanichelli editore, Bologna, Italy, azaninello@zanichelli.it

Abstract

English. From a diachronic corpus of Italian, we build consecutive vector spaces in time and use them to compare a term's cosine similarity to itself in different time spans. We assume that a drop in similarity might be related to the emergence of a metaphorical sense at a given time. Similarity-based observations are matched to the actual year when a figurative meaning was documented in a reference dictionary and through manual inspection of corpus occurrences.

Italiano (English translation). In this experiment we build progressive vector spaces over time on a diachronic corpus of Italian and compute the distance of a number of terms to themselves in different periods. The hypothesis is that a drop in similarity may indicate the acquisition of a metaphorical meaning. This hypothesis is evaluated against an external lexicographic resource and through manual annotation of the terms' contexts in the corpus.

1 Introduction

It is widely acknowledged that metaphors are pervasive in language use, and that their detection and interpretation are crucial to language processing (Pragglejaz Group, 2007; Turney et al., 2011; Shutova, 2015).

One tricky aspect of metaphors is their dynamic nature: new metaphors are created all the time. For example, in recent years the Italian term "talebano" ('Taliban'), previously used only to refer to the Islamic fundamentalist political movement founded in Afghanistan in the 1990s (Example 1), has come to denote more generally someone who holds extreme positions, for example regarding food, the use of medicines, and the like (Example 2).¹

(1) (lit.) l'operazione [...] ha permesso di arrestare un talebano esperto in esplosivi
('the operation [...] led to the arrest of a Taliban explosives expert')
(2) (fig.) [...] senza l'atteso top player, e di un allenatore talebano della tattica
('[...] without the long-awaited top player, and of a coach who is a "Taliban" of tactics')

If the metaphorical meaning becomes commonly used, it may also get recorded in reference dictionaries. Indeed, for "talebano" the Italian dictionary Zingarelli (Zingarelli, 1993–2017) recorded the metaphorical extension ("che (o chi) è dogmatico, integralista", 'who is dogmatic, a fundamentalist') in 2009; until then only the literal meaning was included.

Most computational work on metaphors has focused on their identification and interpretation using a variety of techniques and models, such as clustering (Shutova and Sun, 2013), LDA topic modeling (Heintz et al., 2013) and tree kernels (Hovy et al., 2013), but always from a purely synchronic perspective.² How metaphors develop across time, and whether the shift of a word's literal meaning to a figurative one can be automatically detected and modelled, remains little investigated.

As a contribution in this direction, we build on the basic observation that if a term acquires a metaphorical meaning at a certain point in time, the context of use of that term will, at least partially, change. In this paper we offer a proof of concept of this assumption, based on a selection of terms. (Dis)similarity of contexts is measured with the distributional semantics approach, and thus on the terms' vector representations, while the existence of a metaphorical shift is derived from the Zingarelli dictionary of Italian.

¹ All of the examples in this paper are from the newspaper la Repubblica, see Section 4.2.
² For a detailed survey of current NLP systems for metaphor modeling see Shutova (2015).
2 Approach

According to the principle of distributional semantics, the meaning of a word is represented by a vector that encodes the contextual information of that word in a corpus (Turney and Pantel, 2010). All word vectors belong to a distributional semantic space in which similar words are represented by vectors that are close to each other, while dissimilar words are distant.

We rely on the intuition that if a term develops a metaphorical sense, its contexts of occurrence will start to differ, at least partially, from those observed for the very same term before the metaphorical meaning emerged. Detecting a distance in space across time could therefore be indicative of a meaning shift. Hence, instead of comparing different terms synchronically, we focus on each term's self-distance across time, thus tracing its diachronic evolution of meaning.

In practice, we train vector representations of words in consecutive time spans and compare these representations to one another for a set of pilot terms. By default, a term is expected to have a vector representation roughly similar to itself across time. If we observe a drop in similarity between its vectors in consecutive spaces, we can hypothesise the emergence of a new sense for this term, potentially a metaphorical one.

Using the information recorded for the selected terms in a reference dictionary of Italian, we check whether the observed similarity drop, if present, corresponds to the time a figurative sense was included. Finally, for each time span, we manually inspect the occurrences of our target terms to see whether changes of use can be observed.

We are aware that changes in the distance of a word to itself across time might be triggered by phenomena other than the rise of a metaphorical shift. Especially for polysemous words, extra-linguistic factors could cause one sense to dominate the others at a given time. A larger-scale, bottom-up approach to detecting metaphorical shifts would need to account for this properly. In the context of this proof of concept, we control for this factor by choosing words that are not polysemous, or only minimally so (see Section 4.1).
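The self-distance measure itself is simple to state in code. The sketch below is a minimal illustration, not the authors' implementation: it assumes that a term's vector for each time span is already available as a plain list of floats and that the spaces have been made comparable (Section 4.3 describes how). The names `cosine` and `consecutive_self_similarity` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # a time span, e.g. (2004, 2005)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def consecutive_self_similarity(
    vectors_by_span: Dict[Span, Sequence[float]],
) -> List[Tuple[Span, Span, float]]:
    """Similarity of one term to itself between each pair of consecutive spans.

    Spans are sorted chronologically; a low value for (earlier, later) is the
    kind of drop that, under this approach, hints at a possible new sense.
    """
    spans = sorted(vectors_by_span)
    return [
        (earlier, later, cosine(vectors_by_span[earlier], vectors_by_span[later]))
        for earlier, later in zip(spans, spans[1:])
    ]
```

With toy two-dimensional vectors, `consecutive_self_similarity({(1984, 1985): [1.0, 0.0], (1986, 1987): [2.0, 0.0], (1988, 1989): [0.0, 3.0]})` yields a similarity of 1.0 for the first transition and 0.0 for the second, i.e. a maximal drop. Real 200-dimensional embeddings would of course give intermediate values.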
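3 Related Work

The automatic modelling of diachronic meaning shift has been investigated with several different techniques. Among the most recent are Latent Semantic Analysis (Sagi et al., 2011; Jatowt and Duh, 2014), topic clustering (Wijaya and Yeniterzi, 2011) and dynamic topic modeling (Frermann and Lapata, 2016). Vector representations for diachronic meaning shift were used by Gulordava and Baroni (2011), with a simple co-occurrence matrix of target words and context terms. Jatowt and Duh (2014) and Xu and Kemp (2015) experimented both with a bag-of-words approach and with a more linguistically motivated representation that also captures the position of lexical items relative to the target word.

Recently, word embeddings (Mikolov et al., 2013; see also Section 4.3) have been used to investigate diachronic meaning shifts: vectors are usually created independently for each time span and then mapped from one span to another via a transformation matrix, leveraging the stability of the relative positions of vectors across spaces (Kulkarni et al., 2015; Zhang et al., 2015; Hamilton et al., 2016).

An alternative approach, which we also adopt with a slight change, is introduced by Kim et al. (2014), who propose a simple but effective method to make vectors trained on different corpora directly comparable: the embeddings created for year y are used to initialise the vectors for year y+1, and the process is applied progressively to all time spans.

4 Experiment

Following the approach described in Section 2, we selected a small set of pilot terms from a lexicographic reference and observed how their vectors develop across time on a diachronic corpus of Italian that we collected for this purpose. Since there are no datasets in which words are annotated for meaning change, a qualitative analysis of a set of hand-selected words, like the one we propose, has become a common evaluation method in previous work on diachronic meaning change (Frermann and Lapata, 2016).

4.1 Lexicographic reference and term selection

The Zingarelli dictionary is a reference dictionary for the Italian language, updated and published every year in both digital and print versions. The dictionary is traditionally dated one year ahead of the year it is published: hence the Zingarelli 2017 is published in June 2016, and it reflects decisions about new words and new meanings (including metaphorical ones) made up until December 2015.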
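The dating convention implies a fixed two-year offset between a decision and the edition that records it, which is exactly the gap between the d-date and i-date columns of Table 1. A minimal sketch, assuming that a decision taken in year d always enters the first edition whose cut-off is December of year d; the function names are hypothetical.

```python
def edition_for_decision(decision_year: int) -> int:
    """Edition (cover year) that records a decision taken in `decision_year`.

    Assumption: edition N is published in June of year N-1 and covers
    decisions made up until December of year N-2, so N = d + 2.
    """
    return decision_year + 2


def publication_year(edition: int) -> int:
    """Calendar year in which a given edition actually appears (one year early)."""
    return edition - 1


# (d-date, i-date) pairs copied from Table 1.
TABLE_1 = {
    "implosione": (2013, 2015),
    "kamikaze": (2007, 2009),
    "rottamatore": (2012, 2014),
    "talebano": (2007, 2009),
    "tsunami": (2008, 2010),
}

if __name__ == "__main__":
    for term, (d_date, i_date) in TABLE_1.items():
        edition = edition_for_decision(d_date)
        print(f"{term}: decided {d_date} -> edition {edition} "
              f"(published {publication_year(edition)}); Table 1 i-date {i_date}")
```

For example, decisions up to December 2015 land in the 2017 edition, which is published in 2016; all five rows of Table 1 follow the same d + 2 pattern.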
We analysed the behaviour of a small set of terms extracted from the dictionary. We searched the 2017 edition for nouns that record a figurative meaning, limiting our search to words whose first attestation dates to the 20th or 21st century. Newly coined words (including borrowings) are more likely to show a meaning shift within the period covered by our search (1984–2015) than older words, especially those derived directly from Latin, for which the figurative meaning was often already available originally and therefore probably arose earlier.

Out of a total of 447 hits, five target words were chosen for this pilot study. They are reported in Table 1 together with relevant information.

In order to minimise (at least in the context of this experiment) the influence of polysemy on the observed similarity across years, we verified that the selected terms are not polysemous, or only minimally so. For "rottamatore", "talebano" and "tsunami", the Zingarelli records only one sense. For "implosione" it records three senses in total, but two of them belong to technical language, in the fields of linguistics (phonology) and psychology, and we assume they are rarely used in newswire. For "kamikaze" the Zingarelli records a single meaning (Japanese pilot), with which the extended sense of someone who kills himself in a terrorist attack is associated; in our corpus the extended meaning is clearly the primary one, and the figurative sense we consider derives from it (see also Section 4.4).

Table 1: Selected terms. a-date = first attested; d-date = decision date for the extended meaning to be included in the dictionary; i-date = actual inclusion date of the extended meaning in the Zingarelli.

term | literal | figurative (Zingarelli definition, English gloss) | a-date | d-date | i-date
implosione | implosion | cedimento, tracollo improvviso (collapse) | 1932 | 2013 | 2015
kamikaze | kamikaze | chi compie un'impresa rischiosa o destinata al fallimento (daredevil, reckless) | 1944 | 2007 | 2009
rottamatore | dismantler | nel linguaggio giornalistico e della politica, chi si propone di allontanare e sostituire un gruppo dirigente considerato antiquato (new broom) | 1990 | 2012 | 2014
talebano | Taliban | che (o chi) è dogmatico, integralista (hard-liner, extremist) | 1995 | 2007 | 2009
tsunami | tsunami | evento che determina lo sconvolgimento di un assetto costituito (devastation, havoc) | 1907 | 2008 | 2010

4.2 Corpus

We created a diachronic corpus of approximately 60 million tokens by collecting articles from the Italian newspaper la Repubblica from 1984 (the first year for which data is available digitally) to 2015. All texts were tokenised and lowercased. Because we are interested in how a term's context changes over time, we had to divide the corpus into time spans; we settled on two-year blocks, for a total of 16 time spans, the first being 1984–1985 and the last 2014–2015. These subcorpora are used to train consecutive vector space models.
4.3 Model

We implemented vector representations using the skip-gram architecture introduced by Mikolov et al. (2013). Such representations (word embeddings) are low-dimensional, dense, real-valued vectors that have been shown to preserve syntactic and semantic information in several NLP tasks (Baroni et al., 2014).

Vectors created on different corpora cannot be compared directly, since each semantic space is only determined up to an arbitrary orthogonal transformation, and hence there is no direct correspondence between word vectors in different semantic spaces (Zhang et al., 2015). This also holds for our data, since we create a different corpus for each time span. Therefore, in order to obtain comparable vector representations for each word in every time span, we adopt the method introduced by Kim et al. (2014) (see Section 3), with a slight modification. While Kim et al. (2014) use the vectors of span y to initialise the vectors of span y+1, we do the opposite: we start with 2014–15 and use those vectors to initialise the 2012–13 time span, and so on backwards until 1984–85.

This methodological choice is motivated by the fact that most of the words we considered for this experiment (including the selected target words, see Section 4.1) have few or no occurrences in the first time spans of the corpus: for example, "rottamatore" and "talebano" occur for the first time in 96/97. Indeed, with the original approach of Kim et al. (2014), which we implemented in a preliminary experiment, the vectors for these words were initialised correctly but were basically random vectors carrying no meaningful information. Conversely, our reverse setting offers the same opportunity to trace shifts of meaning across time, while allowing us to initialise all target words on a time span (14/15) in which they occur often enough to obtain a more stable, meaningful representation.

Using the gensim library (Řehůřek and Sojka, 2010), we trained the models with the following parameters: window size of 5, learning rate of 0.01 and dimensionality of 200. We filtered out words occurring fewer than 5 times. The vocabulary was initialised over the whole dataset.
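The reverse initialisation order can be made explicit with a short sketch. It only computes the training schedule over the 16 two-year spans of Section 4.2 and collects the reported hyperparameters; the warm-started training itself is not shown. The keyword names in `SKIPGRAM_PARAMS` are an assumed mapping onto gensim 4.x's `Word2Vec` constructor (`sg=1` selects skip-gram, `alpha` is the initial learning rate, and `vector_size` was called `size` before gensim 4.0); the paper does not report other settings such as the number of epochs. `two_year_spans` and `backward_schedule` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]

# Reported settings, mapped to gensim 4.x Word2Vec keyword names (assumed mapping).
SKIPGRAM_PARAMS = {
    "sg": 1,             # skip-gram architecture
    "window": 5,         # context window size
    "alpha": 0.01,       # initial learning rate
    "vector_size": 200,  # dimensionality ("size" in gensim < 4.0)
    "min_count": 5,      # ignore words with fewer than 5 occurrences
}


def two_year_spans(first_year: int = 1984, last_year: int = 2015) -> List[Span]:
    """Consecutive, non-overlapping two-year blocks: (1984, 1985) ... (2014, 2015)."""
    return [(y, y + 1) for y in range(first_year, last_year, 2)]


def backward_schedule(spans: List[Span]) -> List[Tuple[Span, Optional[Span]]]:
    """Training order for the reverse variant of Kim et al. (2014).

    Each entry is (span_to_train, span_whose_vectors_initialise_it).
    The most recent span is trained first from random initialisation (None);
    every earlier span is then initialised from the span right after it.
    """
    newest_first = sorted(spans, reverse=True)
    schedule: List[Tuple[Span, Optional[Span]]] = [(newest_first[0], None)]
    for later, earlier in zip(newest_first, newest_first[1:]):
        schedule.append((earlier, later))
    return schedule


if __name__ == "__main__":
    for target, init_from in backward_schedule(two_year_spans()):
        source = "random initialisation" if init_from is None else f"vectors of {init_from}"
        print(f"train {target} starting from {source}")
```

Because the vocabulary is built once over the whole dataset, every span model shares the same word index. In gensim, one plausible way (not verified against the authors' code) to apply a schedule entry is therefore to build the vocabulary, copy the previous model's `wv.vectors` array into the new model, and then call `train` on that span's sentences.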
4.4 Results and discussion

Figure 1 shows the cosine similarity of each target term's vectors from one time span to the next (dotted line), together with the average similarity of a randomly selected subset of 5,000 nouns (solid line). While we cannot draw statistically significant conclusions from so little data, we aim to observe patterns of meaning shift, reflected in changes of vector representations, that could be used to develop predictive metrics of metaphorical shifts in time.

Figure 1: Cosine similarity values across time spans for target words (dotted line), average similarity of nouns (solid line) and date of insertion of the metaphorical meaning in the Zingarelli dictionary (red dot).

We interpret the results of our models according to (i) the information in the Zingarelli dictionary and (ii) a manual inspection of the contexts of use of our target words in the corpus.

For (i), we check whether, for a given term, there is an observable correlation between changes in its vector representations and the insertion of a figurative sense in the dictionary. Such a correlation is visible for "talebano", "rottamatore" and "tsunami": for these words, a drop in cosine similarity can be observed between three and five years before the figurative meaning was inserted in the dictionary. This fits well with the time it usually takes for new meanings to be recorded in lexicographic resources (see Section 4.1). The nouns "kamikaze" and "implosione", instead, show a more stable evolution of meaning in time, with no clear drop in cosine similarity, and thus no evident correlation between changes in vector representations and the insertion of a figurative meaning in the dictionary.
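Reading a drop off Figure 1 amounts to comparing a term's curve with the noun baseline. The sketch below turns that comparison into candidate drop points; it is illustrative only. The margin of 0.1 is an arbitrary, hypothetical threshold (the analysis here is qualitative and uses no threshold), the helpers `span_label` and `flag_drops` are hypothetical, and the similarity values in the usage example are invented, not read off Figure 1.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]


def span_label(span: Span) -> str:
    """Format (2004, 2005) as '04/05', the notation used in the text."""
    return f"{span[0] % 100:02d}/{span[1] % 100:02d}"


def flag_drops(
    spans: Sequence[Span],
    term_sims: Sequence[float],
    baseline_sims: Sequence[float],
    margin: float = 0.1,
) -> List[str]:
    """Label transitions where a term's self-similarity falls more than
    `margin` below the noun baseline for the same pair of spans.

    term_sims[i] and baseline_sims[i] refer to the transition spans[i] -> spans[i + 1].
    """
    n_transitions = len(spans) - 1
    if len(term_sims) != n_transitions or len(baseline_sims) != n_transitions:
        raise ValueError("need one similarity value per pair of consecutive spans")
    return [
        f"{span_label(spans[i])} -> {span_label(spans[i + 1])}"
        for i, (term, base) in enumerate(zip(term_sims, baseline_sims))
        if term < base - margin
    ]


if __name__ == "__main__":
    toy_spans = [(2000, 2001), (2002, 2003), (2004, 2005), (2006, 2007)]
    toy_term = [0.80, 0.78, 0.45]      # invented values
    toy_baseline = [0.75, 0.76, 0.74]  # invented values
    print(flag_drops(toy_spans, toy_term, toy_baseline))  # ['04/05 -> 06/07']
```

Comparing against the baseline rather than against a fixed value guards against span-wide fluctuations that affect all nouns, such as differences in subcorpus size.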
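For (ii), we manually inspected the contexts in which the target terms occur in the corpus, classifying them as literal or metaphorical, to check whether a relevant change in usage could be observed in correspondence with the drops in cosine similarity between time spans.

"Tsunami" occurs 27 times between 84/85 and 02/03: in 88.9% of the cases the word is used literally, with only 3 metaphorical uses in 98/99 (mirrored in a slight drop in cosine similarity). Of the 930 occurrences from 04/05 to 14/15, only 59.1% are literal. In Figure 1 we can observe a major drop in cosine similarity exactly between 04/05 and 06/07.

"Rottamatore" occurs 4 times between 84/85 and 08/09, always used literally. From 10/11 on, there are 156 occurrences, all metaphorical. Thus, here too the drop corresponds to a change in usage.

"Talebano" occurs 12 times between 84/85 and 02/03, with 83.3% literal usage. Once again, the drop in cosine similarity coincides with the time span in which the term started to be used metaphorically: between 02/03 and 08/09, 40% of the occurrences of "talebano" are metaphorical. Another relevant drop is observed between 08/09 and 10/11, due to the sudden return of the literal usage of this word (86.1%), which continues in the following years.

As already noted, "kamikaze" and "implosione" do not seem to undergo a clear shift. For the former, the analysis of its contexts of use reveals that it is indeed not possible to identify clearly, in our corpus, when the term started to be used metaphorically: of the 25 occurrences of "kamikaze" in 84/85, 32% are already metaphorical. This ratio remains fairly constant, which explains why the vector representation of "kamikaze", which from the very beginning conflates literal and metaphorical usages, is stable in time. The only relevant change starts in 10/11: from this period onwards the metaphorical use decreases, and almost all occurrences are literal.³ Accordingly, this almost exclusive return to the literal meaning corresponds to a slight increase in cosine similarity between the last two time spans.

"Implosione" occurs 433 times overall and is used metaphorically in 92.4% of them, but only in a few specific contexts. A specific metaphorical sense of "implosione" is thus the main sense of this term in our corpus, which is why we observe, on average, a high similarity across time spans. There is only a small drop between 10/11 and 12/13, when the word started to be used in the context of the economic crisis ("l'implosione dell'euro", 'the implosion of the euro').

To sum up, both "kamikaze" and "implosione" show a similarly stable behaviour in time, with only small drops. However, while for "kamikaze" this stability is due to a relatively constant ratio between literal and metaphorical meanings, for "implosione" it is due to the constant predominance of the metaphorical sense across all time spans.

³ Interestingly, this increase in literal usage is observed in the same years also for "talebano", a term semantically related to "kamikaze". This observation would require further investigation in connection with the socio-political events of those time spans.

5 Conclusion and future work

This work was meant as an exploration of the assumption that the emergence of the metaphorical use of a term might be mirrored in changes in the cosine similarity of the term to itself across time. This assumption has been partially confirmed by the comparison with the Zingarelli dictionary, while we found more robust evidence when inspecting the terms' contexts of use manually.

Future work will build on the methodology and observations discussed here. Specifically, we plan to investigate several aspects of this initial work further, including the relation between changes in cosine similarity and a word's frequency of use: to what extent does a change in the former relate to an increase in the latter? Above all, we plan to run experiments on larger sets of words, with the aim of consolidating the mainly qualitative observations reported here and then exploiting them towards the development of reliable predictive metrics that can detect the emergence of shifts automatically, in a completely bottom-up fashion.

Acknowledgments

Malvina Nissim would like to thank the ILC-CNR ItaliaNLP Lab for their hospitality while working on this project. We are also grateful to the anonymous reviewers, whose insightful comments doubtlessly contributed to improving this paper.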
References

Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland, June. Association for Computational Linguistics.

Lea Frermann and Mirella Lapata. 2016. A Bayesian model of diachronic meaning change. Transactions of the Association for Computational Linguistics, 4:31–45.

Kristina Gulordava and Marco Baroni. 2011. A distributional similarity approach to the detection of semantic change in the Google Books Ngram corpus. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, pages 67–71. Association for Computational Linguistics.

William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016. Diachronic word embeddings reveal statistical laws of semantic change. arXiv preprint arXiv:1605.09096.

Ilana Heintz, Ryan Gabbard, Mahesh Srinivasan, David Barner, Donald S. Black, Marjorie Freedman, and Ralph Weischedel. 2013. Automatic extraction of linguistic metaphor with LDA topic modeling. In Proceedings of the First Workshop on Metaphor in NLP, pages 58–66.

Dirk Hovy, Shashank Srivastava, Sujay Kumar Jauhar, Mrinmaya Sachan, Kartik Goyal, Huiying Li, Whitney Sanders, and Eduard Hovy. 2013. Identifying metaphorical word use with tree kernels. In Proceedings of the First Workshop on Metaphor in NLP, pages 52–57.

Adam Jatowt and Kevin Duh. 2014. A framework for analyzing semantic change of words across time. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 229–238. IEEE Press.

Yoon Kim, Yi-I Chiu, Kentaro Hanaki, Darshan Hegde, and Slav Petrov. 2014. Temporal analysis of language through neural language models. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, pages 61–65. Association for Computational Linguistics.

Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. 2015. Statistically significant detection of linguistic change. In Proceedings of the 24th International Conference on World Wide Web, pages 625–635. ACM.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.

Pragglejaz Group. 2007. MIP: A method for identifying metaphorically used words in discourse. Metaphor and Symbol, 22(1):1–39.

Radim Řehůřek and Petr Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May. ELRA.

Eyal Sagi, Stefan Kaufmann, and Brady Clark. 2011. Tracing semantic change with latent semantic analysis. Current Methods in Historical Semantics, pages 161–183.

Ekaterina Shutova and Lin Sun. 2013. Unsupervised metaphor identification using hierarchical graph factorization clustering. In HLT-NAACL, pages 978–988.

Ekaterina Shutova. 2015. Design and evaluation of metaphor processing systems. Computational Linguistics, 41(4):579–623.

Peter D. Turney and Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.

Peter D. Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. 2011. Literal and metaphorical sense identification through concrete and abstract context. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 680–690.

Derry Tanti Wijaya and Reyyan Yeniterzi. 2011. Understanding semantic change of words over centuries. In Proceedings of the 2011 International Workshop on DETecting and Exploiting Cultural diversiTy on the Social Web, pages 35–40. ACM.

Y. Xu and C. Kemp. 2015. A computational evaluation of two laws of semantic change. In Proceedings of the 37th Annual Conference of the Cognitive Science Society.

Yating Zhang, Adam Jatowt, Sourav S. Bhowmick, and Katsumi Tanaka. 2015. Omnia mutantur, nihil interit: Connecting past with present by finding corresponding terms across time. In Proceedings of ACL, pages 645–655.

N. Zingarelli. 1993–2017. Lo Zingarelli - Vocabolario della lingua italiana. Zanichelli editore, Bologna.