=Paper=
{{Paper
|id=Vol-1749/paper20
|storemode=property
|title=Tracing Metaphors in Time through Self–Distance in Vector Spaces
|pdfUrl=https://ceur-ws.org/Vol-1749/paper20.pdf
|volume=Vol-1749
|authors=Marco Del Tredici,Malvina Nissim,Andrea Zaninello
|dblpUrl=https://dblp.org/rec/conf/clic-it/TrediciNZ16
}}
==Tracing Metaphors in Time through Self–Distance in Vector Spaces==
<pdf width="1500px">https://ceur-ws.org/Vol-1749/paper20.pdf</pdf>
<pre>
        Tracing metaphors in time through self-distance in vector spaces

       Marco Del Tredici           Malvina Nissim        Andrea Zaninello
    ILLC, Univ. of Amsterdam  CLCG, Univ. of Groningen   Zanichelli editore
   Amsterdam, The Netherlands Groningen, The Netherlands  Bologna, Italy
marcodeltredici@gmail.com m.nissim@rug.nl azaninello@zanichelli.it


                      Abstract

     English. From a diachronic corpus of Italian, we build consecutive vector
     spaces in time and use them to compare a term’s cosine similarity to
     itself in different time spans. We assume that a drop in similarity might
     be related to the emergence of a metaphorical sense at a given time.
     Similarity-based observations are matched to the actual year when a
     figurative meaning was documented in a reference dictionary, and checked
     through manual inspection of corpus occurrences.

     Italiano. In this experiment, we build vector spaces that progress in time
     over a diachronic corpus of Italian and compute the distance of some terms
     from themselves in different periods. The hypothesis is that a drop in
     similarity may indicate the acquisition of a metaphorical meaning. This
     hypothesis is evaluated against an external lexicographic resource and
     through manual annotation of the terms’ contexts in the corpus.


1   Introduction

It is widely acknowledged that metaphors are pervasive in language use, and
that their detection and interpretation are crucial to language processing
(Pragglejaz Group, 2007; Turney et al., 2011; Shutova, 2015).
   One tricky aspect related to metaphors is their dynamic nature: new
metaphors are created all the time. For example, in recent years the Italian
term “talebano” (‘Taliban’), previously used only to refer to the Islamic
fundamentalist political movement founded in the Nineties in Afghanistan
(Example 1), has come to denote more generally someone who is extreme in his
or her positions, for example regarding food, the use of medicines, and the
like (Example 2).1

(1)   (lit.) l’operazione [...] ha permesso di arrestare un talebano esperto
      in esplosivi
      (‘the operation [...] led to the arrest of a Taliban explosives expert’)
(2)   (fig.) [...] senza l’atteso top player, e di un allenatore talebano
      della tattica
      (‘[...] without the awaited top player, and of a coach who is a Taliban
      about tactics’)

If the metaphorical meaning becomes commonly used, it might also get recorded
in reference dictionaries. Indeed, for “talebano” the Italian dictionary
Zingarelli (Zingarelli, 1993–2017) recorded the metaphorical extension (“che
(o chi) è dogmatico, integralista”, ‘who is dogmatic, a hard-liner’) in 2009,
while until then only the literal meaning was included.
   Most of the computational work on metaphors has focused on their
identification and interpretation using a variety of techniques and models,
such as clustering (Shutova and Sun, 2013), LDA topic modeling (Heintz et al.,
2013) and tree kernels (Hovy et al., 2013), but all from a purely synchronic
perspective.2 How metaphors develop across time, instead, and whether the
shift of a word’s literal meaning to a figurative one can be automatically
detected and modelled, is as of now a little-investigated aspect.
   As a contribution in this direction, we build on the basic observation that
if a term acquires a metaphorical meaning at a certain point in time, the
context of use of that term will change, at least partially. In this paper we
offer a proof of concept of this assumption, based on a selection of terms.
(Dis)similarity of contexts is measured with the distributional semantics
approach, and thus through the terms’ vector representations, while the
existence of a metaphorical shift is derived from the Zingarelli dictionary of
Italian.

 1 All of the examples in this paper are from the newspaper la Repubblica; see
   Section 4.2.
 2 For a detailed survey of current NLP systems for metaphor modeling, see
   Shutova (2015).
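
The self-distance measure at the core of this approach can be illustrated with a minimal sketch. It assumes that a word already has one vector per two-year span and that these vectors live in mutually comparable spaces (raw embeddings trained independently per span are not comparable; see Section 4.3). Each span is compared with the next by cosine similarity, and a transition is flagged when the word falls clearly below the average self-similarity of a set of reference nouns, in the spirit of the solid line in Figure 1. The function names, the toy vectors and the `margin` threshold are hypothetical: the paper inspects the curves qualitatively and does not define any threshold.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def self_similarity(vectors_by_span, spans):
    """Cosine similarity of one word to itself between consecutive spans.

    vectors_by_span maps a span label (e.g. "04/05") to the word's vector in
    that span; the spaces are assumed to be comparable already.
    Returns a list of ((earlier_span, later_span), similarity) pairs.
    """
    return [((a, b), cosine(vectors_by_span[a], vectors_by_span[b]))
            for a, b in zip(spans, spans[1:])]


def average_self_similarity(words_vectors, spans):
    """Mean self-similarity per transition over a list of words
    (a baseline analogous to the averaged nouns in Figure 1)."""
    per_word = [self_similarity(v, spans) for v in words_vectors]
    pairs = [pair for pair, _ in per_word[0]]
    return [(pair, float(np.mean([sims[i][1] for sims in per_word])))
            for i, pair in enumerate(pairs)]


def flag_drops(word_sims, baseline_sims, margin=0.1):
    """Transitions where the word's self-similarity is more than `margin`
    below the baseline. `margin` is a hypothetical threshold."""
    return [pair for (pair, s), (_, b) in zip(word_sims, baseline_sims)
            if s < b - margin]


# Toy example with made-up 2-D vectors (not data from the paper).
spans = ["00/01", "02/03", "04/05", "06/07"]
word = {"00/01": [1.0, 0.0], "02/03": [1.0, 0.05],
        "04/05": [1.0, 0.1], "06/07": [0.2, 1.0]}
reference_nouns = [{s: [0.0, 1.0] for s in spans},
                   {s: [1.0, 1.0] for s in spans}]

word_sims = self_similarity(word, spans)
baseline = average_self_similarity(reference_nouns, spans)
flagged = flag_drops(word_sims, baseline)
print(flagged)  # [('04/05', '06/07')]
```

In the toy data the word stays almost identical to itself until 04/05 and then moves sharply, so only the last transition is flagged; the reference nouns never move, so their baseline stays at 1.0.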
2   Approach

According to the principle of distributional semantics, the meaning of a word
is represented by vectors that encode the contextual information of that word
in a corpus (Turney and Pantel, 2010). All vectors representing words are
included in a distributional semantic space in which similar words are
represented by vectors that are close in that space, while different words are
distant.
   We rely on the intuition that if a term develops a metaphorical sense, its
contexts of occurrence will start to differ, at least partially, from those
observed for the very same term before the metaphorical meaning emerged. This
implies that detecting a distance in space across time could be indicative of
a meaning shift. Hence, instead of comparing different terms synchronically,
we focus on their self-distance across time, thus tracing the diachronic
evolution of their meaning.
   In practice, we train vector representations of words in consecutive time
spans and, for a set of pilot terms, compare these representations to one
another. By default, a term is expected to exhibit a vector representation
roughly similar to itself across time. If we observe a drop in similarity
between its vectors in consecutive spaces, we can hypothesise the emergence of
a new, potentially metaphorical, sense for this term.
   Using the information recorded for the selected terms in a reference
dictionary of Italian, we check whether there is some correspondence between
the observed similarity drop, if present, and the time of inclusion of a
figurative sense. Finally, for each time span, we manually inspect the
occurrences of our target terms in order to see whether changes of use can be
observed.
   We are aware that changes in the distance of a word to itself across time
might be triggered by phenomena other than the rise of a metaphorical shift.
Indeed, especially for polysemous words, extra-linguistic factors could cause
the dominance of one sense over the others at a given time. In a larger-scale,
bottom-up approach to detecting metaphorical shifts, this would need to be
properly accounted for. In this proof of concept, we control for this factor
by choosing words that are not polysemous, or only minimally so (see
Section 4.1).

3   Related Work

The automatic modelling of diachronic meaning shift has been investigated with
several different techniques. Among the most recent are Latent Semantic
Analysis (Sagi et al., 2011; Jatowt and Duh, 2014), topic clustering (Wijaya
and Yeniterzi, 2011) and dynamic topic modeling (Frermann and Lapata, 2016).
Vector representations for diachronic meaning shift have been used by
Gulordava and Baroni (2011), with a simple co-occurrence matrix of target words
and context terms. Jatowt and Duh (2014) and Xu and Kemp (2015) experimented
both with a bag-of-words approach and with a more linguistically motivated
representation that also captures the position of lexical items relative to
the target word.
   Recently, Word Embeddings (Mikolov et al., 2013; see also Section 4.3) have
been used to investigate diachronic meaning shifts: vectors are usually
created independently for each time span and then mapped from one year to
another via a transformation matrix, thus leveraging the stability of the
relative positions of vectors in different spaces (Kulkarni et al., 2015;
Zhang et al., 2015; Hamilton et al., 2016).
   An alternative approach, which we also adopt with a slight change, was
introduced by Kim et al. (2014), who propose a simple but effective
methodology to make vectors trained on different corpora directly comparable:
embeddings created for year y are used to initialise the vectors for year
y + 1. The process is progressively applied to all time spans.

4   Experiment

Following the approach described in Section 2, we selected a small set of
pilot terms from a lexicographic reference and observed how their vectors
develop across time, on a diachronic corpus of Italian that we collected for
this purpose. Given the absence of datasets in which words are annotated for
meaning change, a qualitative analysis of a set of hand-selected words like the
one we propose has established itself as a common evaluation method in
previous work on diachronic meaning change (Frermann and Lapata, 2016).

4.1   Lexicographic reference and term selection

The Zingarelli dictionary is a reference dictionary for the Italian language,
updated and published every year, in both digital and paper versions. The
dictionary is traditionally dated one year ahead of the year it is published,
hence the Zingarelli 2017
is published in June 2016, and it reflects decisions about new words and new
meanings (including metaphorical ones) made up until December 2015.
   We analysed the behaviour of a small set of terms extracted from the
dictionary. We searched the 2017 edition for nouns that record a figurative
meaning, limiting our search to words whose first occurrence is recorded in the
20th or 21st century. Newly coined words (including borrowings) are more likely
to show a meaning shift within the period considered in our study (1984–2015)
than older words, especially those derived directly from Latin, for which the
figurative meaning was often available from the start and so probably arose
earlier. Out of a total of 447 hits, five target words were chosen for this
pilot study. They are reported in Table 1 together with relevant information.
   In order to minimise (at least in the context of this experiment) the
influence of polysemy on the observed similarity distance across years, we
verified that the selected terms are not polysemous, or only minimally so. For
the words “rottamatore”, “talebano” and “tsunami”, the Zingarelli records one
sense only. For “implosione”, three senses are recorded in total, but two of
them belong to technical language, in the fields of linguistics (phonology)
and psychology, and we assume they are rarely used in newswire. For “kamikaze”
the Zingarelli records one meaning only (Japanese pilot), with which the
extended sense of someone who kills himself in a terrorist attack is
associated; in our corpus the extended meaning is clearly the primary one, and
the figurative sense that we consider is derived from it (see also
Section 4.4).

Table 1: Selected terms. a-date = first attested; d-date = date of the decision
to include the extended meaning in the dictionary; i-date = actual inclusion
date of the extended meaning in the Zingarelli.

 term         literal      figurative                                    a-date  d-date  i-date
 implosione   implosion    cedimento, tracollo improvviso (collapse)     1932    2013    2015
 kamikaze     kamikaze     chi compie un’impresa rischiosa o destinata   1944    2007    2009
                           al fallimento (daredevil, reckless)
 rottamatore  dismantler   nel linguaggio giornalistico e della          1990    2012    2014
                           politica, chi si propone di allontanare e
                           sostituire un gruppo dirigente considerato
                           antiquato (new broom)
 talebano     Taliban      che (o chi) è dogmatico, integralista         1995    2007    2009
                           (hard-liner, extremist)
 tsunami      tsunami      evento che determina lo sconvolgimento di     1907    2008    2010
                           un assetto costituito (devastation, havoc)

4.2   Corpus

We created a diachronic corpus of approximately 60 million tokens by
collecting articles from the Italian newspaper la Repubblica from 1984 (the
first year for which data is available digitally) to 2015. All texts were
tokenised and lowercased. Because we are interested in how a term’s context
changes over time, we had to split our corpus into time spans; we settled on
two-year blocks, for a total of 16 time spans, the first being 1984–1985 and
the last 2014–2015. These subcorpora are used to train consecutive vector
space models.

4.3   Model

We implemented vector representations using the skip-gram architecture
introduced by Mikolov et al. (2013). Such representations (Word Embeddings) are
low-dimensional, dense, real-valued vectors that have been shown to preserve
syntactic and semantic information in several NLP tasks (Baroni et al., 2014).
   Vectors created on different corpora cannot be directly compared, since each
semantic space is only determined up to an arbitrary orthogonal
transformation, and hence there is no direct correspondence between word
vectors in different semantic spaces (Zhang et al., 2015). This also holds for
our data, since we create a different corpus for each time span. Therefore, in
order to create comparable vector representations for each word in any time
span, we adopt the methodology introduced by Kim et al. (2014) (see Section 3),
slightly modifying it. While Kim et al. (2014) use the vectors of span y to
initialise the vectors for span y + 1, we do the opposite, i.e.
we start with 2014-15, and use those vectors to initialise the 2012-13 time
span, and so on backwards until 1984-85.
   This methodological choice is due to the fact that the majority of the words
in the set we considered for this experiment (including the selected target
words, see Section 4.1) have few or no occurrences in the first time spans of
the corpus: for example, “rottamatore” and “talebano” occur for the first time
in 96/97. Indeed, with the original approach of Kim et al. (2014), which we
implemented in a preliminary experiment, the vectors for these words were
initialised, but were basically random vectors with no meaningful
information. Conversely, our reverse setting, while still offering the same
opportunity to trace shifts of meaning across time, allows us to initialise all
target words on a time span (14/15) in which they occur often enough to create
a more stable, meaningful representation.
   Using the gensim library (Řehůřek and Sojka, 2010), we trained the models
with the following parameters: window size of 5, learning rate of 0.01 and
dimensionality of 200. We filtered out words with a frequency lower than 5
occurrences. The vocabulary was initialised over the whole dataset.

4.4   Results and discussion

Figure 1 shows the similarity values from one time span to the next (dotted
line), together with the average shift of meaning of a subset of 5000 randomly
selected nouns (solid line). While we cannot draw any statistically significant
conclusions from such little data, we aim at observing patterns of meaning
shift, through changes in vector representations, that could be used for
developing predictive metrics of metaphorical shifts in time.

Figure 1: Cosine similarity values across time spans for target words (dotted
line), average similarity of nouns (solid line) and date of insertion of the
metaphorical meaning in the Zingarelli dictionary (red dot).

   We interpret the results of our models according to (i) information in the
Zingarelli dictionary and (ii) a manual inspection of the contexts of use of
our target words in the corpus.
   For (i), we verify whether, for a given term, an observable correlation
exists between changes in its vector representations and the insertion of a
figurative sense in the dictionary. Results show that such a correlation exists
for “talebano”, “rottamatore” and “tsunami”. For these words a drop in cosine
similarity can be observed between three and five years before the insertion
of the figurative meaning in the dictionary. This fits well with the time it
takes for new meanings to be recorded in lexicographic resources (see
Section 4.1). The nouns “kamikaze” and “implosione”, instead, show a more
stable evolution of meaning in time, with no clear drop in cosine similarity,
and thus no evident correlation between changes in vector representations and
the insertion of a figurative meaning in the dictionary.
   For (ii), we manually inspected the contexts in which the target terms occur
in the corpus as literal or metaphorical, in order to check whether some
relevant change in word usage could be observed in correspondence with drops in
cosine similarity between time spans.
   “Tsunami” occurs 27 times between 84/85 and 02/03: in 88.9% of the cases the
word is used literally, with only 3 metaphorical uses in 98/99 (mirrored in a
slight drop in cosine similarity). Of the 930 occurrences from 04/05 to 14/15,
only 59.1% are literal. In Figure 1 we can observe a major drop in cosine
similarity exactly between 04/05 and 06/07.
   “Rottamatore” occurs 4 times between 84/85 and 08/09, always used literally.
From 10/11 on, there are 156 occurrences, all metaphorical. Thus, the drop
corresponds to a change in usage here too.
   “Talebano” occurs 12 times between 84/85 and 02/03, with 83.3% literal usage.
Once again, the drop in cosine similarity coincides with the time span in which
the term started to be used metaphorically: between 02/03 and 08/09, 40% of the
occurrences of “talebano” are metaphorical. Then, another relevant drop is
observed between 08/09 and 10/11, due to the sudden return of the literal usage
of this word (86.1%), which continues in the following years.
   As already noted, “kamikaze” and “implosione” do not seem to undergo a clear
shift. As for the former, the analysis of its contexts of use reveals that it
is indeed not possible to identify clearly, in our corpus, when exactly the
term started to be used metaphorically: of the 25 occurrences of “kamikaze” in
84/85, 32% are metaphorical. This trend is fairly constant, and it explains why
the vector representation of “kamikaze”, which from the very beginning
conflates literal and metaphorical usages, is stable in time. The only relevant
change starts from 10/11: from this period onwards, the metaphorical use
decreases, and almost all the occurrences are literal.3 Accordingly, this
almost exclusive return to the literal meaning corresponds to a slight increase
in cosine similarity between the last two time spans.
   “Implosione” occurs 433 times overall and in 92.4% of them is used
metaphorically, but in few and specific contexts. A metaphorical, quite
specific sense of “implosione” is thus the main sense of this term in our
corpus, and this is why we observe, on average, a high similarity across time
spans. There is only a small drop between 10/11 and 12/13, when the word
started to be used in the context of the economic crisis (“l’implosione
dell’euro”, ‘the implosion of the euro’).
   To sum up, both “kamikaze” and “implosione” show a similarly stable
behaviour in time, with only small drops. However, while for “kamikaze” such
stability is due to a relatively constant ratio between literal and
metaphorical meanings, in the case of “implosione” the observed stability is
given by the constant predominance of the metaphorical sense across all time
spans.

 3 Interestingly, this increase in literal usage is observed in the same years
   also for “talebano”, a term that is semantically related to “kamikaze”. This
   observation would require further investigation in connection with the
   socio-political events of those time spans.

5   Conclusion and future work

This work was meant as an exploration of the assumption that the emergence of
the metaphorical use of a term might be mirrored in changes in the cosine
similarity of the term to itself across time. This assumption has been
partially confirmed by the comparison with the Zingarelli dictionary, while we
found more robust evidence when inspecting the terms’ contexts of use manually.
   Future work will build on the methodology and observations discussed here.
Specifically, we plan to investigate further several aspects of this initial
work, including the relation between changes in cosine similarity and the
frequency of use of a word: to what extent does a change in the former relate
to an increase in the latter? Above all, though, we plan to run experiments on
larger sets of words, with the aim of consolidating and then further exploiting
the mainly qualitative observations reported here towards the development of
reliable predictive metrics that can detect the emergence of shifts
automatically, in a completely bottom-up fashion.

Acknowledgments

Malvina Nissim would like to thank the ILC-CNR ItaliaNLP Lab for their
hospitality while working on this project. We are also grateful to the
anonymous reviewers who provided insightful comments that doubtlessly
contributed to improving this paper.
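
The reverse initialisation scheme and the training settings reported in Section 4.3 (skip-gram, window 5, learning rate 0.01, 200 dimensions, minimum frequency 5, one vocabulary over the whole dataset, training from 2014-15 back to 1984-85) can be sketched with gensim as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes gensim 4.x parameter names (`vector_size`, not the older `size`), tokenised sentences grouped by two-year span, and the hypothetical helper names `two_year_spans`, `span_label` and `train_backwards`. The number of epochs and the final learning rate are left at gensim's defaults because the paper does not report them; note that in gensim 4 each `train` call decays the learning rate from `alpha` towards `min_alpha` again.

```python
import copy


def two_year_spans(first_year=1984, last_year=2015):
    """Consecutive two-year blocks: (1984, 1985), (1986, 1987), ..., (2014, 2015)."""
    return [(start, start + 1) for start in range(first_year, last_year, 2)]


def span_label(span):
    """Short label in the paper's notation, e.g. (1984, 1985) -> "84/85"."""
    start, end = span
    return f"{start % 100:02d}/{end % 100:02d}"


def train_backwards(sentences_by_span, spans, min_count=5, workers=1):
    """Reverse-initialisation sketch (hypothetical helper, not the authors' code).

    sentences_by_span maps each span tuple to a list of tokenised,
    lowercased sentences (lists of str). Training starts from the most
    recent span; each earlier span continues from the weights left by the
    span after it. Returns {span label: KeyedVectors snapshot}.
    """
    # Lazy import: the helpers above do not need gensim (assumes gensim 4.x).
    from gensim.models import Word2Vec

    # Skip-gram, window 5, initial learning rate 0.01, 200 dimensions,
    # minimum frequency 5 (Section 4.3); other settings are gensim defaults.
    model = Word2Vec(sg=1, vector_size=200, window=5, alpha=0.01,
                     min_count=min_count, workers=workers)

    # A single vocabulary built over the whole dataset, shared by all spans.
    all_sentences = [s for span in spans for s in sentences_by_span[span]]
    model.build_vocab(all_sentences)

    snapshots = {}
    for span in reversed(spans):  # 2014-15 first, 1984-85 last
        sentences = sentences_by_span[span]
        # Continue from the current weights, i.e. those of the later span.
        model.train(sentences, total_examples=len(sentences),
                    epochs=model.epochs)
        snapshots[span_label(span)] = copy.deepcopy(model.wv)
    return snapshots


spans = two_year_spans()
print(len(spans), span_label(spans[0]), span_label(spans[-1]))  # 16 84/85 14/15
```

Because every span is trained in the same model, with the same vocabulary, the snapshots stay in a shared coordinate system; comparing, for instance, `snapshots["04/05"]["tsunami"]` with `snapshots["06/07"]["tsunami"]` by cosine similarity then gives one point of a self-similarity curve like those in Figure 1.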
References

Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don’t count,
  predict! A systematic comparison of context-counting vs. context-predicting
  semantic vectors. In Proceedings of the 52nd Annual Meeting of the
  Association for Computational Linguistics (Volume 1: Long Papers), pages
  238–247, Baltimore, Maryland, June. Association for Computational
  Linguistics.

Lea Frermann and Mirella Lapata. 2016. A Bayesian model of diachronic meaning
  change. Transactions of the Association for Computational Linguistics,
  4:31–45.

Kristina Gulordava and Marco Baroni. 2011. A distributional similarity
  approach to the detection of semantic change in the Google Books Ngram
  corpus. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of
  Natural Language Semantics, pages 67–71. Association for Computational
  Linguistics.

William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016. Diachronic word
  embeddings reveal statistical laws of semantic change. arXiv preprint
  arXiv:1605.09096.

Ilana Heintz, Ryan Gabbard, Mahesh Srinivasan, David Barner, Donald S. Black,
  Marjorie Freedman, and Ralph Weischedel. 2013. Automatic extraction of
  linguistic metaphor with LDA topic modeling. In Proceedings of the First
  Workshop on Metaphor in NLP, pages 58–66.

Dirk Hovy, Shashank Srivastava, Sujay Kumar Jauhar, Mrinmaya Sachan, Kartik
  Goyal, Huiying Li, Whitney Sanders, and Eduard Hovy. 2013. Identifying
  metaphorical word use with tree kernels. In Proceedings of the First
  Workshop on Metaphor in NLP, pages 52–57.

Adam Jatowt and Kevin Duh. 2014. A framework for analyzing semantic change of
  words across time. In Proceedings of the 14th ACM/IEEE-CS Joint Conference
  on Digital Libraries, pages 229–238. IEEE Press.

Yoon Kim, Yi-I Chiu, Kentaro Hanaki, Darshan Hegde, and Slav Petrov. 2014.
  Temporal analysis of language through neural language models. In
  Proceedings of the ACL 2014 Workshop on Language Technologies and
  Computational Social Science, pages 61–65. Association for Computational
  Linguistics.

Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. 2015.
  Statistically significant detection of linguistic change. In Proceedings of
  the 24th International Conference on World Wide Web, pages 625–635. ACM.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013.
  Distributed representations of words and phrases and their
  compositionality. In Advances in Neural Information Processing Systems,
  pages 3111–3119.

Pragglejaz Group. 2007. MIP: A method for identifying metaphorically used
  words in discourse. Metaphor and Symbol, 22(1):1–39.

Radim Řehůřek and Petr Sojka. 2010. Software Framework for Topic Modelling
  with Large Corpora. In Proceedings of the LREC 2010 Workshop on New
  Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May. ELRA.

Eyal Sagi, Stefan Kaufmann, and Brady Clark. 2011. Tracing semantic change
  with latent semantic analysis. Current Methods in Historical Semantics,
  pages 161–183.

Ekaterina Shutova and Lin Sun. 2013. Unsupervised metaphor identification
  using hierarchical graph factorization clustering. In HLT-NAACL, pages
  978–988.

Ekaterina Shutova. 2015. Design and evaluation of metaphor processing systems.
  Computational Linguistics, 41(4):579–623.

Peter D. Turney and Patrick Pantel. 2010. From frequency to meaning: Vector
  space models of semantics. Journal of Artificial Intelligence Research,
  37(1):141–188.

Peter D. Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. 2011. Literal and
  metaphorical sense identification through concrete and abstract context. In
  Proceedings of the 2011 Conference on Empirical Methods in Natural Language
  Processing, pages 680–690.

Derry Tanti Wijaya and Reyyan Yeniterzi. 2011. Understanding semantic change
  of words over centuries. In Proceedings of the 2011 International Workshop on
  DETecting and Exploiting Cultural diversiTy on the Social Web, pages 35–40.
  ACM.

Y. Xu and C. Kemp. 2015. A computational evaluation of two laws of semantic
  change. In Proceedings of the 37th Annual Conference of the Cognitive Science
  Society.

Yating Zhang, Adam Jatowt, Sourav S. Bhowmick, and Katsumi Tanaka. 2015. Omnia
  mutantur, nihil interit: Connecting past with present by finding
  corresponding terms across time. In Proceedings of ACL, pages 645–655.

N. Zingarelli. 1993–2017. Lo Zingarelli - Vocabolario della lingua italiana.
  Zanichelli editore, Bologna.

</pre>