Panta rei: Tracking Semantic Change with Distributional Semantics in Ancient Greek

Martina A. Rodda
Scuola Normale Superiore
Piazza dei Cavalieri, 7
56126 Pisa – ITALY
martina.rodda@sns.it

Marco S.G. Senaldi
Scuola Normale Superiore
Piazza dei Cavalieri, 7
56126 Pisa – ITALY
marco.senaldi@sns.it

Alessandro Lenci
CoLing Lab
Università di Pisa
via S. Maria 36
alessandro.lenci@unipi.it


Abstract

English. We present a method to explore semantic change as a function of variation in distributional semantic spaces. In this paper we apply this approach to automatically identify the areas of semantic change in the lexicon of Ancient Greek between the pre-Christian and Christian era. Distributional Semantic Models are used to identify meaningful clusters and patterns of semantic shift within a set of target words, defined through a purely data-driven approach. The results emphasize the role played by the diffusion of Christianity and by technical languages in determining semantic change in Ancient Greek and show the potential of distributional models in diachronic semantics.

Italiano. Si presenta un metodo per indagare il cambiamento semantico come funzione della variazione all’interno di spazi semantici. Questo approccio è applicato per identificare automaticamente aree di cambiamento semantico nel lessico greco antico tra età pre-cristiana e cristiana. Modelli della Semantica Distribuzionale sono usati per identificare cluster e pattern di cambiamento semantico in una lista di parole target, definita con un approccio puramente data-driven. I risultati mostrano il ruolo della diffusione del Cristianesimo e dei linguaggi tecnici nel determinare cambiamenti semantici in greco antico, nonché le potenzialità dei modelli distribuzionali nella semantica diacronica.

1    Introduction and Related Work

Distributional Semantics is grounded on the assumption that the meaning of a word can be described as a function of its collocates in a corpus. This suggests that diachronic meaning shifts can be traced through changes in the distribution of these collocates over time (Sagi et al., 2011). While some studies focused on testing the explanatory power of this method over frequency- and syntax-based approaches (Wijaya and Yeniterzi, 2011; Kulkarni et al., 2015), more advanced contributions to the field explored how distributional models can be used to test competing hypotheses about semantic change (Xu and Kemp, 2015), or to investigate the productivity of constructions in diachrony (Perek, 2016). The results attest to the explanatory power of distributional methods in modeling diachronic shifts in meaning.

In this paper, we propose a method to identify semantic change through the Representational Similarity Analysis (RSA; Kriegeskorte and Kievit, 2013) of distributional vector spaces built from diachronic corpora. RSA is a method extensively used in neuroscience to test cognitive and computational models by comparing the geometry of their representation spaces (Edelman, 1998). Stimuli are represented with a representational dissimilarity matrix that contains a measure of the dissimilarity relations of the stimuli with each other. Different matrices are compared to evaluate the correspondence of the representational spaces built from different sources (e.g., behavioral and neuroimaging data). We argue that this method can be applied to compare distributional representations of the lexicon at different temporal stages. The hypothesis is that the elements in the lexical spaces showing larger geometrical variations in time correspond to the lexical areas that have undergone major semantic
changes. To the best of our knowledge, this is the first time RSA is used in diachronic distributional semantics.

Here we present a case study that applies RSA to track patterns of semantic change within the lexicon of Ancient Greek. We focus on the first few centuries AD, when the rise of Christianity caused a deep and widespread cultural shift within the Hellenic world. We predict that this shift will be reflected in the Greek lexicon of the time. In addition to past studies (Boschetti, 2009; O’Donnell, 2005 is a general introduction), we apply a bottom-up approach to the detection of semantic change, with no prior definition of a list of lemmas to be analyzed. The goal is to develop a quantitative “discovery procedure” to detect lexical semantic changes.

From a methodological standpoint, this study aims to show how Distributional Semantics can be applied fruitfully to such a small and literary corpus as the collection of Ancient Greek texts. The results will also highlight the ways in which Distributional Semantics can complement the intuition of the researcher in analyzing semantic change in Ancient Greek, providing a useful tool for future studies in Classics.

2    Materials and Methods

The corpus used for this study is based on the TLG-E (Thesaurus Linguae Graecae) collection of Ancient Greek literary texts. The database was divided into two sub-corpora, the first of which contains texts from the 7th to the 1st century BC (pre-Christian era), while the second one spans from the 1st to the 5th century AD (early Christian era). The pre-Christian sub-corpus contains 6,795,253 tokens, while the Christian sub-corpus totals 29,051,269 tokens.

The texts were lemmatized using Morpheus (Crane, 1991). Any issues with the lemmatization should not have a significant impact on the results unless otherwise stated (cf. Boschetti, 2009, page 60 for a discussion). After filtering for stopwords (mainly particles, pronouns and connectives) and lemmas occurring with a frequency below 100 tokens, the pre-Christian and Christian sub-corpora contain, respectively, 4,109 and 10,052 lemmas, which were used both as targets and dimensions in our vector spaces.

A vector space model was then built for each sub-corpus using the DISSECT toolkit (Dinu et al., 2013). Henceforth, we refer to the pre-Christian era model as the BC-Space, and to the Christian era model as the AD-Space. Co-occurrences were computed within a window of 11 words (5 content words to the right and to the left of each target word). Association scores were weighted using positive point-wise mutual information (PPMI) (Evert, 2008); the resulting matrices were reduced to 300 latent dimensions using Singular Value Decomposition (SVD).

2.1    RSA of the distributional vector spaces

We have adapted the RSA method to discover semantic changes between the two vector spaces:
1. we identified the words occurring in both sub-corpora with a frequency higher than 100 tokens, obtaining 3,977 lemmas;
2. we built a representational similarity matrix (RSM) from the BC-Space (RSMBC) and one from the AD-Space (RSMAD). Each RSM is a square matrix indexed horizontally and vertically by the 3,977 lemmas and containing in each cell the cosine similarity of a lemma with the other lemmas in a vector space (this is a minor variation with respect to the original RSA method, which instead uses dissimilarity matrices). An RSM is a global representation of the semantic space geometry in a given period: vectors represent lemmas in terms of their position relative to the other lemmas in the semantic space;
3. for each lemma, we computed the Pearson correlation coefficient between its vector in RSMBC and the corresponding vector in RSMAD.
The Pearson coefficient measures the degree of semantic shift across the two temporal slices. The lower the correlation, the more a word changed its meaning.

3     Discussion of Results

The following section focuses on the words that underwent the biggest changes, i.e. those for which the correlation scores are lowest. The primary goal will be to establish whether these words can be clustered into meaningful groups. This would allow us to pinpoint the areas within the lexicon of Ancient Greek that have undergone a significant semantic shift during the early centuries of Christianity.

3.1    Qualitative Analysis

The 50 lemmas with the lowest correlation coefficients were scrutinized in order to establish whether meaningful subgroups emerge. (This list of words is not reproduced here due to space constraints. They are a subset of the 200 words used to build the plot in section 3.3.) The findings in this section, while inevitably limited by
the intuition of the researcher, will provide the starting point for a more sophisticated analysis to be performed in the following sections.

The lemmas under consideration form a somewhat heterogeneous collection, including concrete nouns and relatively common verbs such as ζυγόν (zygón “yoke”) and ἕπομαι (hépomai “follow”), as well as some proper nouns. This notwithstanding, a promising subset of words emerges even at this preliminary stage. These are a number of nouns designating eminently Christian concepts, such as παραβολή (parabolé “parable”, previously “comparison”), λαός (laós, used for the Christian community as opposed to non-Christians, previously “people”), κτίσις (ktísis “creation”, previously “founding, settling”).

These findings are in line with the idea that the diffusion of Christianity played a substantial role in semantic change in the first centuries AD (cf. Boschetti, 2009). Other Christian terms, such as θεός (theós “God”), ἄγγελος (ángelos “angel”, previously “messenger”), πατήρ (patér “father”), υἱός (hyiós “son”), also occur among the 100 words with the lowest correlation coefficients.

Another group of lemmas comprises technical terms whose usage seems to have undergone a specialization or a shift from one domain of knowledge to another. These include words such as ὑπόστασις (hypóstasis “substance”, previously “sediment, foundation”), δύναμις (dýnamis “property (of beings)”, previously “power”), or ῥητός (rhetós “literal” as opposed to “allegorical”, previously “stated”).

3.2   Analysis of Nearest Neighbors

To corroborate the intuitions detailed above, the 10 nearest neighbors for each of the 50 words with the lowest correlation coefficients were retrieved using DISSECT. The process was repeated for each sub-corpus and the results compared in order to look for visible shifts, especially those involving different semantic domains. A few examples of the results should suffice to confirm the findings in the previous section.

For instance, among the nearest neighbors for πνεῦμα (pnêuma “spirit”, previously “breath”) in the AD-Space we find such words as θεάομαι (theáomai “contemplate”), ἀληθινός (alethinós “true”), κτίσις, υἱός, θεός and so forth, while in the BC-Space the strongest similarity is with terms pertaining to the domain of physics, such as ἀήρ (aér “air”), ὑγρός (hygrós “moist”), θερμός (thermós “hot”). Another clear-cut example is that of δύναμις, whose neighbors change from military terms such as πολιορκία (poliorkía “siege”) and στρατόπεδον (stratópedon “encampment, army”) to the physical and philosophical domain, with the closest term being ἐνέργεια (enérgeia “activity, actuality”, an antonym of δύναμις in its philosophical sense of “potentiality”). The case of δύναμις also shows how nearest neighbor analysis can reveal shifts in the usage of heavily polysemous words.

Not all changes observed through the analysis of nearest neighbors, however, are so easily predictable. Thus, for instance, the neighbors for μοῖρα (môira, another highly polysemous word with meanings spanning from “part” to “destiny”) in the AD-Space come exclusively from the domain of astronomy, showing a strong specialization towards a technical usage (“degree” or “division” of the Zodiac). Another remarkable result comes from a geographical adjective, Ποντικός (Pontikós “coming from Pontus”), whose nearest neighbors shift from proper names and philosophical terms in the pre-Christian age (an association due, without doubt, to the usage of “Ponticus” as an epithet for authors, e.g. Heraclides) to names of currency and trade wares, probably as a reflection of the integration of Pontus as a Roman province (with the obvious repercussions on trade) in the 1st century AD.

3.3   t-SNE Plot

As a final analysis, we embedded the RSMAD vectors for the 200 words with the lowest correlation coefficient with the corresponding RSMBC vectors in a two-dimensional space with t-SNE (Figure 1), a technique for dimensionality reduction and data visualization that overcomes some of the limitations of standard multidimensional scaling (van der Maaten and Hinton, 2008). This procedure allows for easy identification of clusters, thus revealing the semantic relation between the most recent meanings of the words that underwent the greatest semantic change.

A number of small clusters can be observed in the plot. Near the left periphery, the most relevant group is composed of terms pertaining to Christian theology (from κύριος kýrios “Lord”, λαός and θεός, to παρουσία parousía “Advent” and ποιμήν poimén “shepherd”). The position of ψῦχος (psŷkhos “cold”) nearby is due to the mislemmatization of some inflected forms of ψυχή (psyché “soul”) under this lemma, as revealed by nearest neighbor analysis. To the left of this group, a small cluster of terms pertaining to Christian exegesis (ῥητός, παραβολή, διασαφέω diasaphéo “illustrate”) can be recognized.
 Figure 1. Relative positions within the AD-Space of the 200 words with the lowest correlation scores.
 Dimensionality reduction was performed using t-SNE (van der Maaten and Hinton, 2008).
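
The words plotted in Figure 1 are selected through the per-lemma correlation between RSMBC and RSMAD described in Section 2.1. Purely as an illustration, that selection step can be sketched in plain Python as follows. The function names, the toy vectors and the decision to leave the self-similarity cell out of each row are assumptions made for this sketch; the study itself built its spaces with DISSECT from PPMI-weighted, SVD-reduced matrices.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_rows(space, lemmas):
    """One RSM row per lemma: its cosine with every *other* shared lemma.
    Dropping the self-similarity cell is an assumption of this sketch."""
    return {
        w: [cosine(space[w], space[x]) for x in lemmas if x != w]
        for w in lemmas
    }

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y) if var_x and var_y else 0.0

def rank_by_shift(bc_space, ad_space):
    """Return (lemma, correlation) pairs, lowest correlation first."""
    lemmas = sorted(set(bc_space) & set(ad_space))  # lemmas shared by both spaces
    rsm_bc = similarity_rows(bc_space, lemmas)
    rsm_ad = similarity_rows(ad_space, lemmas)
    scores = {w: pearson(rsm_bc[w], rsm_ad[w]) for w in lemmas}
    return sorted(scores.items(), key=lambda item: item[1])

# Invented toy vectors (not data from the study). The two spaces need not
# share dimensions, because only the similarity rows are compared.
bc_space = {
    "stable_a": [1.0, 0.0],
    "stable_b": [0.9, 0.1],
    "stable_c": [0.0, 1.0],
    "shifted":  [0.1, 0.9],   # close to stable_c in the earlier space
}
ad_space = {
    "stable_a": [1.0, 0.0, 0.0],
    "stable_b": [0.9, 0.1, 0.0],
    "stable_c": [0.0, 1.0, 0.0],
    "shifted":  [0.9, 0.2, 0.0],   # close to stable_a and stable_b later
}

ranking = rank_by_shift(bc_space, ad_space)
for lemma, r in ranking:
    print(f"{lemma}\t{r:.3f}")
```

In this toy setting the lemma whose neighborhood changes receives a negative correlation and therefore heads the ranking; taking the first 200 entries of such a ranking would correspond to the word set shown in Figure 1.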
The upper portion of the plot houses technical terms from the domains of medicine (the uppermost groups), astronomy and geometry, while philosophical terminology is found in the outer right area. Some smaller groups are also noticeable, such as μνᾶ (mnâ “mina”) and δραχμή (drakhmé “drachma”), both units of currency, on the left, and πρώτιστος (prótistos “the very first”) and Τίμαιος (the proper name Tímaios, Latin Timaeus), both connected to (Neo-)Platonic philosophy, on the right.

All in all, despite a certain amount of noise, the plot in Figure 1 supports the findings detailed so far. We can see how the main semantic changes in the Greek lexicon between the pre-Christian and Christian era affected the domains of religion (in a broader sense) and/or technical language. Within these domains, some more fine-grained relations between words that underwent significant semantic shifts can be observed.

4   Conclusion

This paper shows how Distributional Semantics can be used as an exploratory tool to detect semantic change. In this case study on Ancient Greek, the proposed method based on distributional RSA not only confirms the hypothesis that the diffusion of Christianity was a crucial cause of semantic change in the Greek lexicon, but also allows for the identification of unexpected patterns of evolution, such as the apparent specialization in the usage of technical terms. This last phenomenon could also be influenced by the fact
that the AD-corpus is richer in philosophical and technical treatises; however, a documented change in the proportion of different possible usages of a word is in itself a very informative result, especially in a field such as Classics, where the analysis of (literary) texts is paramount. Further research should undoubtedly examine the effect of corpus composition. A focus on shorter periods of time might be of interest, since, for instance, the rise of technical prose writing is a characteristic of the Hellenistic Age (cf. e.g. Gutzwiller 2007, pages 154-167).

From a methodological standpoint, the fact that the results obtained from such a small corpus of purely literary texts are both meaningful and informative is of great relevance. Furthermore, the choice to adopt a data-driven approach proved fruitful, in that it brought to light directions of change that were not expected a priori. For traditional research in Classics, a computational approach to the lexicon of Ancient Greek is compelling because it provides new information about a language for which the judgments of native speakers are unavailable (cf. Perek, 2016). The results of this study show how Distributional Semantics can complement the assertions of the philologist, as well as help discover patterns of lexical change that would otherwise be impossible to grasp beyond an intuitive level.

References

Boschetti, Federico. 2009. A Corpus-based Approach to Philological Issues. PhD Thesis, University of Trento, Trento.

Crane, Gregory. 1991. Generating and parsing Classical Greek. Literary and Linguistic Computing, 6(4):243–245.

Dinu, Georgiana, Nghia The Pham and Marco Baroni. 2013. DISSECT – DIStributional SEmantics Composition Toolkit. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 31–36, Sofia.

Edelman, Shimon. 1998. Representation is representation of similarities. Behavioral and Brain Sciences, 21:449–467.

Evert, Stefan. 2008. Corpora and collocations. In Anke Lüdeling and Merja Kytö, editors, Corpus Linguistics. An International Handbook, pages 1212–1248, Berlin.

Gutzwiller, Kathryn J. 2007. A guide to Hellenistic literature (Blackwell guides to Classical literature). Blackwell Publishing, Oxford.

Kriegeskorte, Nikolaus and Roger A. Kievit. 2013. Representational geometry: integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8):401–412.

Kulkarni, Vivek, Rami Al-Rfou, Bryan Perozzi and Steven Skiena. 2015. Statistically significant detection of linguistic change. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15), pages 625–635, Firenze.

Van der Maaten, Laurens and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605.

O’Donnell, Matthew Brook. 2005. Corpus Linguistics and the Greek of the New Testament (New Testament Monographs, 6). Sheffield Phoenix Press, Sheffield.

Perek, Florent. 2016. Using distributional semantics to study syntactic productivity in diachrony: A case study. Linguistics, 54(1):149–188.

Sagi, Eyal, Stefan Kaufmann and Brady Clark. 2011. Tracing semantic change with Latent Semantic Analysis. In Kathryn Allan and Justyna A. Robinson, editors, Current Methods in Historical Semantics, pages 161–183, Boston, MA.

Wijaya, Derry Tanti and Reyyan Yeniterzi. 2011. Understanding semantic change of words over centuries. In Proceedings of the 2011 International Workshop on DETecting and Exploiting Cultural diversiTy on the Social Web (DETECT ’11), pages 35–40, Glasgow.

Xu, Yang and Charles Kemp. 2015. A computational evaluation of two laws of semantic change. In Proceedings of the 37th Annual Meeting of the Cognitive Science Society (CogSci 2015), Pasadena, CA.