Panta rei: Tracking Semantic Change with Distributional Semantics in Ancient Greek

Martina A. Rodda
Scuola Normale Superiore
Piazza dei Cavalieri, 7
56126 Pisa – ITALY
martina.rodda@sns.it

Marco S.G. Senaldi
Scuola Normale Superiore
Piazza dei Cavalieri, 7
56126 Pisa – ITALY
marco.senaldi@sns.it

Alessandro Lenci
CoLing Lab
Università di Pisa
via S. Maria 36
alessandro.lenci@unipi.it

Abstract

English. We present a method to explore semantic change as a function of variation in distributional semantic spaces. In this paper we apply this approach to automatically identify the areas of semantic change in the lexicon of Ancient Greek between the pre-Christian and Christian era. Distributional Semantic Models are used to identify meaningful clusters and patterns of semantic shift within a set of target words, defined through a purely data-driven approach. The results emphasize the role played by the diffusion of Christianity and by technical languages in determining semantic change in Ancient Greek and show the potential of distributional models in diachronic semantics.

Italiano. Si presenta un metodo per indagare il cambiamento semantico come funzione della variazione all’interno di spazi semantici. Questo approccio è applicato per identificare automaticamente aree di cambiamento semantico nel lessico greco antico tra età pre-cristiana e cristiana. Modelli della Semantica Distribuzionale sono usati per identificare cluster e pattern di cambiamento semantico in una lista di parole target, definita con un approccio puramente data-driven. I risultati mostrano il ruolo della diffusione del Cristianesimo e dei linguaggi tecnici nel determinare cambiamenti semantici in greco antico, nonché le potenzialità dei modelli distribuzionali nella semantica diacronica.

1 Introduction and Related Work

Distributional Semantics is grounded in the assumption that the meaning of a word can be described as a function of its collocates in a corpus. This suggests that diachronic meaning shifts can be traced through changes in the distribution of these collocates over time (Sagi et al., 2011). While some studies have focused on testing the explanatory power of this method over frequency- and syntax-based approaches (Wijaya and Yeniterzi, 2011; Kulkarni et al., 2015), more advanced contributions to the field have explored how distributional models can be used to test competing hypotheses about semantic change (Xu and Kemp, 2015), or to investigate the productivity of constructions in diachrony (Perek, 2016). The results attest to the explanatory power of distributional methods in modeling diachronic shifts in meaning.

In this paper, we propose a method to identify semantic change through the Representational Similarity Analysis (RSA; Kriegeskorte and Kievit, 2013) of distributional vector spaces built from diachronic corpora. RSA is a method extensively used in neuroscience to test cognitive and computational models by comparing the geometry of their representation spaces (Edelman, 1998). Stimuli are represented with a representational dissimilarity matrix that contains a measure of the dissimilarity relations of the stimuli with each other. Different matrices are compared to evaluate the correspondence of the representational spaces built from different sources (e.g., behavioral and neuroimaging data). We argue that this method can be applied to compare distributional representations of the lexicon at different temporal stages. The hypothesis is that the elements in the lexical spaces showing larger geometrical variations in time correspond to the lexical areas that have undergone major semantic changes. To the best of our knowledge, this is the first time RSA is used in diachronic distributional semantics.

Here we present a case study that applies RSA to track patterns of semantic change within the lexicon of Ancient Greek. We focus on the first few centuries AD, when the rise of Christianity caused a deep and widespread cultural shift within the Hellenic world. We predict that this shift will be reflected in the Greek lexicon of the time. Compared with past studies (Boschetti, 2009; O’Donnell, 2005 is a general introduction), we apply a bottom-up approach to the detection of semantic change, with no prior definition of a list of lemmas to be analyzed. The goal is to develop a quantitative “discovery procedure” to detect lexical semantic changes.

From a methodological standpoint, this study aims to show how Distributional Semantics can be applied fruitfully to a corpus as small and as literary as the collection of Ancient Greek texts. The results will also highlight the ways in which Distributional Semantics can complement the intuition of the researcher in analyzing semantic change in Ancient Greek, providing a useful tool for future studies in Classics.

2 Materials and Methods

The corpus used for this study is based on the TLG-E (Thesaurus Linguae Graecae) collection of Ancient Greek literary texts. The database was divided into two sub-corpora, the first of which contains texts from the 7th to the 1st century BC (pre-Christian era), while the second one spans from the 1st to the 5th century AD (early Christian era). The pre-Christian sub-corpus contains 6,795,253 tokens, while the Christian sub-corpus totals 29,051,269 tokens.

The texts were lemmatized using Morpheus (Crane, 1991). Any issues with the lemmatization should not have a significant impact on the results unless otherwise stated (cf. Boschetti, 2009, page 60 for a discussion). After filtering for stopwords (mainly particles, pronouns and connectives) and lemmas occurring with a frequency below 100 tokens, the pre-Christian and Christian sub-corpora contain, respectively, 4,109 and 10,052 lemmas, which were used both as targets and dimensions in our vector spaces.

A vector space model was then built for each sub-corpus using the DISSECT toolkit (Dinu et al., 2013). Henceforth, we refer to the pre-Christian era model as the BC-Space, and to the Christian era model as the AD-Space. Co-occurrences were computed within a window of 11 words (5 content words to the right and to the left of each target word). Association scores were weighted using positive point-wise mutual information (PPMI) (Evert, 2008); the resulting matrices were reduced to 300 latent dimensions using Singular Value Decomposition (SVD).

2.1 RSA of the distributional vector spaces

We have adapted the RSA method to discover semantic changes between the two vector spaces:

1. we identified the words occurring in both sub-corpora with a frequency higher than 100 tokens, obtaining 3,977 lemmas;
2. we built a representational similarity matrix (RSM) from the BC-Space (RSM_BC) and one from the AD-Space (RSM_AD). Each RSM is a square matrix indexed horizontally and vertically by the 3,977 lemmas and containing in each cell the cosine similarity of a lemma with the other lemmas in a vector space (this is a minor variation with respect to the original RSA method, which instead uses dissimilarity matrices). An RSM is a global representation of the semantic space geometry in a given period: vectors represent lemmas in terms of their position relative to the other lemmas in the semantic space;
3. for each lemma, we computed the Pearson correlation coefficient between its vector in RSM_BC and the corresponding vector in RSM_AD.

The Pearson coefficient is used as a measure of semantic shift across the two temporal slices: the lower the correlation, the more a word has changed its meaning.

3 Discussion of Results

The following section focuses on the words that underwent the biggest changes, i.e. those for which the correlation scores are lowest. The primary goal will be to establish whether these words can be clustered into meaningful groups. This would allow us to pinpoint the areas within the lexicon of Ancient Greek that have undergone a significant semantic shift during the early centuries of Christianity.

3.1 Qualitative Analysis

The 50 lemmas with the lowest correlation coefficients were scrutinized in order to establish whether meaningful subgroups emerge. (This list of words is not reproduced here due to space constraints. They are a subset of the 200 words used to build the plot in Section 3.3.) The findings in this section, while inevitably limited by the intuition of the researcher, will provide the starting point for a more sophisticated analysis to be performed in the following sections.

The lemmas under consideration form a somewhat heterogeneous collection, including concrete nouns and relatively common verbs such as ζυγόν (zygón “yoke”) and ἕπομαι (hépomai “follow”), as well as some proper nouns. This notwithstanding, a promising subset of words emerges even at this preliminary stage. These are a number of nouns designating eminently Christian concepts, such as παραβολή (parabolé “parable”, previously “comparison”), λαός (laós, used for the Christian community as opposed to non-Christians, previously “people”), κτίσις (ktísis “creation”, previously “founding, settling”).

These findings are in line with the idea that the diffusion of Christianity played a substantial role in semantic change in the first centuries AD (cf. Boschetti, 2009). Other Christian terms, such as θεός (theós “God”), ἄγγελος (ángelos “angel”, previously “messenger”), πατήρ (patér “father”), υἱός (hyiós “son”), also occur among the 100 words with the lowest correlation coefficients.

Another group of lemmas comprises technical terms whose usage seems to have undergone a specialization or a shift from one domain of knowledge to another. These include words such as ὑπόστασις (hypóstasis “substance”, previously “sediment, foundation”), δύναμις (dýnamis “property (of beings)”, previously “power”), or ῥητός (rhetós “literal” as opposed to “allegorical”, previously “stated”).

3.2 Analysis of Nearest Neighbors

To corroborate the intuitions detailed above, the 10 nearest neighbors of each of the 50 words with the lowest correlation coefficients were retrieved using DISSECT. The process was repeated for each sub-corpus and the results compared in order to look for visible shifts, especially those involving different semantic domains. A few examples of the results should suffice to confirm the findings in the last section.

For instance, among the nearest neighbors for πνεῦμα (pnêuma “spirit”, previously “breath”) in the AD-Space we find such words as θεάομαι (theáomai “contemplate”), ἀληθινός (alethinós “true”), κτίσις, υἱός, θεός and so forth, while in the BC-Space the strongest similarity is with terms pertaining to the domain of physics, such as ἀήρ (aér “air”), ὑγρός (hygrós “moist”), θερμός (thermós “hot”). Another clear-cut example is that of δύναμις, whose neighbors change from military terms such as πολιορκία (poliorkía “siege”) and στρατόπεδον (stratópedon “encampment, army”) to the physical and philosophical domain, with the closest term being ἐνέργεια (enérgeia “activity, actuality”, an antonym of δύναμις in its philosophical sense of “potentiality”). The case of δύναμις also shows how nearest neighbor analysis can reveal shifts in the usage of heavily polysemous words.

Not all changes observed through the analysis of nearest neighbors, however, are so easily predictable. Thus, for instance, the neighbors for μοῖρα (môira, another highly polysemous word with meanings spanning from “part” to “destiny”) in the AD-Space come exclusively from the domain of astronomy, showing a strong specialization towards a technical usage (“degree” or “division” of the Zodiac). Another remarkable result comes from a geographical adjective, Ποντικός (Pontikós “coming from Pontus”), whose nearest neighbors shift from proper names and philosophical terms in the pre-Christian age (an association due, without doubt, to the usage of “Ponticus” as an epithet for authors, e.g. Heraclides) to names of currency and trade wares, probably as a reflection of the integration of Pontus as a Roman province (with the obvious repercussions on trade) in the 1st century AD.

3.3 t-SNE Plot

As a final analysis, we took the 200 words whose RSM_AD vectors have the lowest correlation coefficients with the corresponding RSM_BC vectors and embedded their RSM_AD vectors in a two-dimensional space with t-SNE (Figure 1), a technique for dimensionality reduction and data visualization that overcomes some of the limitations of standard multidimensional scaling (van der Maaten and Hinton, 2008). This procedure allows for easy identification of clusters, thus revealing the semantic relation between the most recent meanings of the words that underwent the greatest semantic change.

Figure 1. Relative positions within the AD-Space of the 200 words with the lowest correlation scores. Dimensionality reduction was performed using t-SNE (van der Maaten and Hinton, 2008).

A number of small clusters can be observed in the plot. Near the left periphery, the most relevant group is composed of terms pertaining to Christian theology (from κύριος kýrios “Lord”, λαός and θεός, to παρουσία parousía “Advent” and ποιμήν poimén “shepherd”). The position of ψῦχος (psŷkhos “cold”) nearby is due to the mislemmatization of some inflected forms of ψυχή (psyché “soul”) under this lemma, as revealed by nearest neighbor analysis. To the left of this group, a small cluster of terms pertaining to Christian exegesis (ῥητός, παραβολή, διασαφέω diasaphéo “illustrate”) can be recognized. The upper portion of the plot houses technical terms from the domains of medicine (the uppermost groups), astronomy and geometry, while philosophical terminology is found in the outer right area. Some smaller groups are also noticeable, such as μνᾶ (mnâ “mina”) and δραχμή (drakhmé “drachma”), both units of currency, on the left, and πρώτιστος (prótistos “the very first”) and Τίμαιος (the proper name Tímaios, Latin Timaeus), both connected to (Neo-)Platonic philosophy, on the right.

All in all, despite a certain amount of noise, the plot in Figure 1 supports the findings detailed so far. We can see how the main semantic changes in the Greek lexicon between the pre-Christian and Christian era affected the domains of religion (in a broader sense) and/or technical language. Within these domains, some more fine-grained relations between words that underwent significant semantic shifts can be observed.

4 Conclusion

This paper shows how Distributional Semantics can be used as an exploratory tool to detect semantic change. In this case study on Ancient Greek, the proposed method based on distributional RSA not only confirms the hypothesis that the diffusion of Christianity was a crucial cause of semantic change in the Greek lexicon, but also allows for the identification of unexpected patterns of evolution, such as the apparent specialization in the usage of technical terms. This last phenomenon could also be influenced by the fact that the AD-corpus is richer in philosophical and technical treatises; however, a documented change in the proportion of different possible usages of a word is in itself a very informative result, especially in a field such as Classics, where the analysis of (literary) texts is paramount. Further research should certainly examine the effect of corpus composition. A focus on shorter periods of time might be of interest, since, for instance, the rise of technical prose writing is a characteristic of the Hellenistic Age (cf. e.g. Gutzwiller, 2007, pages 154–167).

From a methodological standpoint, the fact that the results obtained from such a small corpus of purely literary texts are both meaningful and informative is of great relevance. Furthermore, the choice to adopt a data-driven approach proved fruitful, in that it brought to light directions of change that were not expected a priori. For traditional research in Classics, a computational approach to the lexicon of Ancient Greek is compelling because it provides new information about a language for which the judgments of native speakers are unavailable (cf. Perek, 2016). The results of this study show how Distributional Semantics can complement the assertions of the philologist, as well as help discover patterns of lexical change that would otherwise be impossible to grasp beyond an intuitive level.

References

Boschetti, Federico. 2009. A Corpus-based Approach to Philological Issues. PhD Thesis, University of Trento, Trento.

Crane, Gregory. 1991. Generating and parsing Classical Greek. Literary and Linguistic Computing, 6(4):243–245.

Dinu, Georgiana, Nghia The Pham and Marco Baroni. 2013. DISSECT – DIStributional SEmantics Composition Toolkit. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 31–36, Sofia.

Edelman, Shimon. 1998. Representation is representation of similarities. Behavioral and Brain Sciences, 21:449–467.

Evert, Stefan. 2008. Corpora and collocations. In Anke Lüdeling and Merja Kytö, editors, Corpus Linguistics. An International Handbook, pages 1212–1248, Berlin.

Gutzwiller, Kathryn J. 2007. A guide to Hellenistic literature (Blackwell guides to Classical literature). Blackwell Publishing, Oxford.

Kriegeskorte, Nikolaus and Roger A. Kievit. 2013. Representational geometry: integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8):401–412.

Kulkarni, Vivek, Rami Al-Rfou, Bryan Perozzi and Steven Skiena. 2015. Statistically significant detection of linguistic change. In Proceedings of the 24th International Conference on World Wide Web (WWW '15), pages 625–635, Firenze.

Van der Maaten, Laurens and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605.

O’Donnell, Matthew Brook. 2005. Corpus Linguistics and the Greek of the New Testament (New Testament Monographs, 6). Sheffield Phoenix Press, Sheffield.

Perek, Florent. 2016. Using distributional semantics to study syntactic productivity in diachrony: A case study. Linguistics, 54(1):149–188.

Sagi, Eyal, Stefan Kaufmann and Brady Clark. 2011. Tracing semantic change with Latent Semantic Analysis. In Kathryn Allan and Justyna A. Robinson, editors, Current Methods in Historical Semantics, pages 161–183, Boston, MA.

Wijaya, Derry Tanti and Reyyan Yeniterzi. 2011. Understanding semantic change of words over centuries. In Proceedings of the 2011 International Workshop on DETecting and Exploiting Cultural diversiTy on the Social Web (DETECT '11), pages 35–40, Glasgow.

Xu, Yang and Charles Kemp. 2015. A computational evaluation of two laws of semantic change. In Proceedings of the 37th Annual Meeting of the Cognitive Science Society (CogSci 2015), Pasadena, CA.
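
Illustrative sketch. The procedure described in Sections 2 and 2.1 (PPMI weighting, SVD reduction, construction of one RSM per period, and a per-lemma Pearson correlation between the two RSMs) can be approximated in a few lines of NumPy. This is not the code used in the study, which relied on the DISSECT toolkit; it is a minimal sketch under stated assumptions. The function names (`ppmi`, `svd_reduce`, `cosine_rsm`, `shift_scores`, `rsa_pipeline`) are hypothetical, the two count matrices are assumed to be already restricted to the same lemmas in the same row order, and leaving the diagonal out of the correlation is a choice of this sketch that the paper does not specify. The toy vectors at the end are invented for illustration and are not data from the paper.

```python
import numpy as np


def ppmi(counts):
    """Weight a (targets x contexts) co-occurrence count matrix with positive PMI."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs get no association
    return np.maximum(pmi, 0.0)


def svd_reduce(matrix, k):
    """Keep the first k latent dimensions of a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, s.size)
    return u[:, :k] * s[:k]


def cosine_rsm(vectors):
    """Square matrix of cosine similarities between all row vectors."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0.0, 1.0, norms)
    return unit @ unit.T


def shift_scores(rsm_bc, rsm_ad):
    """Pearson correlation between each lemma's row in the two RSMs.

    Assumption of this sketch: the diagonal (self-similarity) is left out.
    """
    n = rsm_bc.shape[0]
    scores = np.empty(n)
    for i in range(n):
        others = np.arange(n) != i
        scores[i] = np.corrcoef(rsm_bc[i, others], rsm_ad[i, others])[0, 1]
    return scores


def rsa_pipeline(counts_bc, counts_ad, k=300):
    """PPMI -> SVD -> RSM for each period, then per-lemma correlation.

    Rows of both matrices must refer to the same lemmas in the same order.
    """
    rsms = [cosine_rsm(svd_reduce(ppmi(c), k)) for c in (counts_bc, counts_ad)]
    return shift_scores(*rsms)


# Toy illustration on hand-made vectors (not data from the paper):
# lemma 2 sits with lemmas 0-1 in the first period and with 3-5 in the second.
toy_bc = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
toy_ad = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1]])
toy_scores = shift_scores(cosine_rsm(toy_bc), cosine_rsm(toy_ad))
print(np.argsort(toy_scores))  # lemmas ordered from most to least shifted
```

In the toy example the lemma that switches neighborhood receives the lowest correlation, which mirrors the reading given in Section 2.1: the lower the correlation, the larger the shift. Because each lemma is compared through its similarities to the other lemmas rather than through its raw coordinates, the two SVD spaces never need to be aligned with each other.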