<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Cagliari, Italy
$ pierluigi.cassotti@gu.se (P. Cassotti); nina.tahmasebi@gu.se
(N. Tahmasebi)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Hypothesis-Driven Framework for Detecting Lexical Semantic Change</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierluigi Cassotti</string-name>
          <email>pierluigi.cassotti@gu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nina Tahmasebi</string-name>
          <email>nina.tahmasebi@gu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Gothenburg, Department of Philosophy, Linguistics and Theory of Science</institution>
          ,
          <addr-line>Gothenburg</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper introduces a hypothesis-driven framework aimed at detecting lexical semantic change, addressing the limitations of current computational methods that struggle with the dynamic and contextually modulated nature of word meanings. Traditional approaches, such as Word Sense Disambiguation (WSD), fail to capture the fluidity of senses, whereas Word Sense Induction (WSI), while more flexible, lacks the precision necessary to align with predefined semantic structures. Our approach systematically combines expert-defined sense hypotheses with advanced computational techniques, including generative models, encoding and prototyping methods, and targeted semantic analysis. Using words historically significant in scientific contexts, such as theory, gene, and force, we demonstrate the effectiveness of our method in tracing fine-grained semantic changes and metaphorical extensions over time, highlighting its advantages over naive computational strategies.</p>
      </abstract>
      <kwd-group>
        <kwd>lexical semantic change</kwd>
        <kwd>lexical semantics</kwd>
        <kwd>diachronic</kwd>
        <kwd>historical linguistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Polysemy, the phenomenon where a single word carries multiple meanings, has long intrigued researchers. Often, words reach a polysemic state through a process of semantic change in which the (set of) senses of a word has been altered. Dictionaries serve as vital resources in this field, cataloging the various senses of words. However, they are not all-encompassing, and the granularity of the recorded senses varies across dictionaries, reflecting the approaches of lexicographers, who are often categorized as "lumpers" or "splitters." Lumpers favor broader, more encompassing definitions, while splitters distinguish senses with subtle nuances.</p>
      <p>This variability ties into contextual modulation [<xref ref-type="bibr" rid="ref1">1</xref>], where a word&#8217;s core meaning remains stable but shifts slightly depending on its context. Such shifts become more pronounced over time, as word meanings evolve in response to cultural and social changes. For instance, the Oxford English Dictionary [<xref ref-type="bibr" rid="ref2">2</xref>] defines "phone" simply as a &#8220;telephone apparatus,&#8221; a definition broad enough to encompass its evolution from landline phones to public telephone booths to modern smartphones.</p>
      <p>This dynamic nature of meaning poses significant challenges for computational modeling. Traditional approaches like Word Sense Disambiguation (WSD) [<xref ref-type="bibr" rid="ref3">3</xref>] struggle because they assume fixed meanings, ignoring the fluid continuity of senses. In contrast, Word Sense Induction (WSI) is better suited, as it derives sense structures directly from data. However, WSI&#8217;s open-ended nature makes it challenging to align derived senses with a predefined ground truth, especially when attempting to track meaning changes across centuries of a language&#8217;s history.</p>
      <p>Current computational models often fail to align with ground truth sense representations unless explicitly guided. One way to address this is by starting with predefined search hypotheses, which can simplify the modeling process and provide a clearer framework for tracking meaning shifts over time. By establishing research hypotheses, we can predefine the organization and structure of word senses, guiding computational models toward a predetermined ground truth. However, this remains challenging with standard technologies, which require models capable of adapting to meaning representations without relying on specific senses.</p>
      <p>In this paper, we present our hypothesis-driven theoretical framework for detecting meaning change (Section 3). We also demonstrate a practical implementation of this framework using recently developed computational models (Section 2). Furthermore, we provide a concrete example by comparing our approach to naive WSI methods (Section 4), highlighting the advantages of the hypothesis-driven approach.</p>
      <p>Detecting changes in word meaning typically involves two stages: first, representing the meaning of words in individual time periods, and second, verifying whether a change has taken place between those periods.</p>
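      <p>The two-stage process above can be made concrete with a small sketch. The following Python snippet is a toy illustration with random vectors standing in for encoded word uses (not the models used in this paper); it computes two of the standard comparison metrics described in Section 2.2, the Average Pairwise Distance (APD) and the Prototype Distance (PRT), between the use vectors of two time periods:</p>

```python
import numpy as np

def apd(V1, V2):
    """Average Pairwise Distance: mean cosine distance between all
    cross-period pairs of word-use vectors."""
    A = V1 / np.linalg.norm(V1, axis=1, keepdims=True)
    B = V2 / np.linalg.norm(V2, axis=1, keepdims=True)
    sims = A @ B.T  # cosine similarity matrix, shape (|V1|, |V2|)
    return float(np.mean(1.0 - sims))

def prt(V1, V2):
    """Prototype Distance: cosine distance between the centroids
    (prototypes) of the two periods."""
    p1, p2 = V1.mean(axis=0), V2.mean(axis=0)
    cos = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))
    return float(1.0 - cos)

# Toy data standing in for encoded word uses from two periods.
rng = np.random.default_rng(0)
V_1900, V_1950 = rng.normal(size=(20, 8)), rng.normal(size=(15, 8))
change_apd, change_prt = apd(V_1900, V_1950), prt(V_1900, V_1950)
```

      <p>Higher values of either metric indicate a larger shift in how the word is used between the two periods; a period compared with itself yields a PRT distance of (approximately) zero.</p>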
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Representation of Word Meanings</title>
        <p>Representing word meanings in historical texts poses unique challenges for computational models [<xref ref-type="bibr" rid="ref4">4</xref>]. These models must understand historical contexts, avoid reliance on lexicographic resources that may omit new or obsolete senses, and ideally capture subtle temporal shifts within a word&#8217;s meaning, rather than just the addition or removal of senses. For example, the word "horse" once referred to the primary mode of transportation but no longer holds that role in our daily lives today.</p>
        <p>To address these challenges, approaches to representing word meanings often use a greater degree of freedom that allows for nuanced representations. Models for word meaning representation can be viewed on a continuum. At one end, Word Sense Disambiguation (WSD) models assign all instances of a word&#8217;s meaning to a single sense, offering limited flexibility. At the other end, contextualized models [<xref ref-type="bibr" rid="ref5">5</xref>] treat each instance as a unique entity, providing greater freedom but often encoding extraneous information, such as syntactic or morphological variations, which may not be relevant for tracking meaning change. WSD-based models, while precise, are often too rigid to capture subtle variations within a sense.</p>
        <p>In recent years, research has focused on developing balanced solutions: models that are nearly as flexible as contextualized approaches but prioritize semantic characteristics over other linguistic aspects. This enables more effective modeling of contextual modulation.</p>
        <p>One such model is XL-LEXEME [<xref ref-type="bibr" rid="ref6">6</xref>], a bi-encoder based on SBERT [<xref ref-type="bibr" rid="ref7">7</xref>] with a Siamese architecture and an XLM-R [<xref ref-type="bibr" rid="ref8">8</xref>] backbone. XL-LEXEME has been trained on the Word-in-Context (WiC) [<xref ref-type="bibr" rid="ref9">9</xref>] task to predict whether a target word has the same meaning in two given sentences (1 for the same meaning, 0 for different meanings). This is done by generating two XL-LEXEME vector representations of the word&#8217;s meaning, one per sentence, by aggregating subword embeddings from the entire sentence. These vectors are compared using cosine similarity, and a contrastive loss function encourages higher similarity for matching meanings and lower similarity otherwise.</p>
        <p>However, XL-LEXEME&#8217;s output, cosine similarity scores between sentence pairs, lacks the interpretability needed to fully understand the processes underlying meaning change.</p>
        <p>Recently, we have seen novel methods for modeling meaning, namely definition generation, where for a given target word in context, the method generates a dictionary-like definition [<xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>]. Such definition generation models produce definitions that capture the intended word meaning but may deviate from ground-truth definitions for three main reasons. First, like humans, models may express the same concept using different words, requiring mappings to the underlying sense. Second, errors such as hallucinations can compromise performance. Third, a model may generate a definition that reflects contextual modulation. While this is not rewarded in the evaluation of the models (where generated definitions are evaluated against dictionary definitions), it is often a desirable outcome when we want to study meaning change.</p>
        <p>Another way to use the potential of large language models (LLMs) is by using them as computational annotators. This involves prompting instructed LLMs to interpret the meaning of a word (by solving the WiC task) in a zero-shot setting, without requiring task-specific training. For example, in [<xref ref-type="bibr" rid="ref12">12</xref>], we compared GPT-4 with contextualized models like BERT and XL-LEXEME on tasks such as Word-in-Context (WiC), Word Sense Induction (WSI), and Lexical Semantic Change Detection (LSCD). The results demonstrate that XL-LEXEME and zero-shot GPT-4 perform comparably across all tasks, despite GPT-4 having significantly more parameters (1,000 times larger) and higher computational costs.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Detection of changes</title>
        <p>The process for detecting changes in word meaning over time typically follows a standard pipeline, cf. [<xref ref-type="bibr" rid="ref13">13</xref>]:</p>
        <p>1. Collect the occurrences of a word <italic>w</italic> over time, denoted as <italic>U</italic><sub>1</sub>, <italic>U</italic><sub>2</sub>, ..., <italic>U</italic><sub><italic>T</italic></sub>, where <italic>U</italic><sub><italic>t</italic></sub> represents the instances in which the word <italic>w</italic> appears at time <italic>t</italic>.</p>
        <p>2. Encode the uses of the word into vectors, resulting in the sequence <italic>V</italic><sub>1</sub>, <italic>V</italic><sub>2</sub>, ..., <italic>V</italic><sub><italic>T</italic></sub>, where <italic>V</italic><sub><italic>t</italic></sub> represents the vectors encoding the uses of the word <italic>w</italic> at time <italic>t</italic>.</p>
        <p>3. Select a metric <italic>d</italic> for comparing the vectors, chosen from the following options [14]: the Average Pairwise Distance (APD), which computes and averages distances between all pairs of vectors from two time points; the Prototype Distance (PRT), which calculates the distance between centroids (prototypes) of two time points; and the cluster-based Jensen-Shannon Distance (JSD), which clusters data irrespective of time, computes the frequency of senses for each time period separately, treats them as probability distributions, and calculates the distance between two time points via the Jensen-Shannon distance of the probability distributions.</p>
        <p>4. Compare the vectors using the metric <italic>d</italic> according to a specific strategy, e.g.: a) comparison with the first period: <italic>d</italic>(<italic>V</italic><sub>1</sub>, <italic>V</italic><sub>2</sub>), <italic>d</italic>(<italic>V</italic><sub>1</sub>, <italic>V</italic><sub>3</sub>), ..., <italic>d</italic>(<italic>V</italic><sub>1</sub>, <italic>V</italic><sub><italic>T</italic></sub>); b) comparison with the last period: <italic>d</italic>(<italic>V</italic><sub>1</sub>, <italic>V</italic><sub><italic>T</italic></sub>), <italic>d</italic>(<italic>V</italic><sub>2</sub>, <italic>V</italic><sub><italic>T</italic></sub>), ..., <italic>d</italic>(<italic>V</italic><sub><italic>T</italic>-1</sub>, <italic>V</italic><sub><italic>T</italic></sub>); c) comparison with the previous period: <italic>d</italic>(<italic>V</italic><sub>1</sub>, <italic>V</italic><sub>2</sub>), <italic>d</italic>(<italic>V</italic><sub>2</sub>, <italic>V</italic><sub>3</sub>), ..., <italic>d</italic>(<italic>V</italic><sub><italic>T</italic>-1</sub>, <italic>V</italic><sub><italic>T</italic></sub>); d) comparison within a window of size <italic>k</italic>, where each time period is compared with the neighboring periods that fall inside the window.</p>
        <p>To tailor the pipeline to specific computational models, certain modifications can be introduced. For definition generation, an additional step can be inserted after step (1): first, generate definitions for each instance of word use; then, in step (2), encode these definitions into vectors instead of the word uses themselves. For large language models (LLMs) as computational annotators, LLMs provide a semantic distance value for pairs of word uses directly. In this case, steps (1) and (2) are bypassed, and the Average Pairwise Distance (APD) is used to compute the average distances between pairs of time points.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Historical Word Usage Generation</title>
        <p>The study of lexical semantic change requires large-scale, diachronic sense-annotated corpora, yet such resources are scarce due to the time, expertise, and cost involved in annotating historical texts. To overcome this barrier, Janus [15], a generative model fine-tuned on the Llama 3 8B architecture using 1,191,851 example sentences from the Oxford English Dictionary (OED), was developed. Janus generates historically accurate and sense-specific word usages for any given word, its sense definition, and a target year from 1700 onward. This capability enables the creation of extensive datasets for tasks such as word sense disambiguation and detecting semantic shifts over time.</p>
        <p>Janus produces sentences that reflect the intended meaning of a word in a specific historical context. Its performance was compared to baseline models, including GPT-3.5, GPT-4o, and Llama 3 Instruct variants, across three key metrics: (i) context variability, which measures the diversity of generated sentences to ensure varied expressions of the same sense; (ii) temporal accuracy, which assesses how well the language aligns with the specified historical period (e.g., avoiding "airplane" before 1903); and (iii) semantic accuracy, which evaluates how closely the generated sentences match the provided sense definition. Janus outperforms baselines in context variability and temporal accuracy, producing diverse sentences with a root mean squared error (RMSE) of 54.75 years for historical alignment (in line with the baseline). Qualitative analysis highlights Janus&#8217;s ability to emulate temporal linguistic shifts, such as the declining use of archaic pronouns like "thee" and the evolving meaning of "awful" from impressive to negative.</p>
      </sec>
    </sec>
    <sec id="sec-2b">
      <title>3. Hypothesis-Driven LSCD</title>
      <p>To investigate the historical evolution of word senses, we propose a hypothesis-driven methodology. For instance, a research hypothesis might posit that the word gene began to be used metaphorically shortly after its establishment in the biological sciences during the 1950s, reflecting its profound influence on modern thought. Our goal is to trace the evolution of gene across the 20th century and identify its earliest occurrences in various senses, a task traditionally performed by experts manually examining thousands of concordances.</p>
      <p>A conventional word sense disambiguation (WSD) system, often based on resources like WordNet [16], is limited in this context. WordNet, for example, provides only a single definition for gene: "(genetics) a segment of DNA that is involved in producing a polypeptide chain; it can include regions preceding and following the coding DNA as well as introns between the exons; it is considered a unit of heredity." Instead, the OED contains a second sense: "In figurative and extended use, esp. with reference to qualities regarded as deeply ingrained or (often humorously) as inherited. Often in plural."</p>
      <p>Such systems struggle with historical texts due to (i) their incompatibility with archaic language and (ii) their incomplete coverage of senses, particularly metaphorical or emerging uses. Large language model (LLM)-based models, on the other hand, offer improved sense identification but are computationally expensive and environmentally unsustainable for analyzing thousands of word occurrences in large historical corpora.</p>
      <sec id="sec-2b-1">
        <title>3.1. Our Approach</title>
        <p>We propose a scalable, hypothesis-driven framework comprising three components: an encoder <italic>E</italic>, a prototyper <italic>P</italic>, and a comparison function <italic>f</italic>. This framework systematically analyzes word sense evolution by combining expert-defined sense definitions with computational techniques.</p>
        <p>1. Definition of Senses: let <italic>S</italic> = {<italic>s</italic><sub>1</sub>, <italic>s</italic><sub>2</sub>, ..., <italic>s</italic><sub><italic>n</italic></sub>} represent a set of <italic>n</italic> sense definitions for the target word (e.g., gene), crafted to align with the research hypotheses. For each sense <italic>s</italic><sub><italic>i</italic></sub>, we use a generative model (e.g., Janus) to produce a collection of synthetic examples, <italic>X</italic><sub><italic>i</italic></sub> = {<italic>x</italic><sub>1</sub>, <italic>x</italic><sub>2</sub>, ..., <italic>x</italic><sub><italic>m</italic></sub>}, representing the word&#8217;s usage in that sense across the target time period.</p>
        <p>2. Encoding and Prototyping: each synthetic example in <italic>X</italic><sub><italic>i</italic></sub> is encoded with the encoder <italic>E</italic>, and the prototyper <italic>P</italic> aggregates the resulting vectors into a prototype vector <italic>p</italic><sub><italic>i</italic></sub> for the sense <italic>s</italic><sub><italic>i</italic></sub>.</p>
        <p>3. Corpus Analysis: let <italic>C</italic> = {<italic>c</italic><sub>1</sub>, <italic>c</italic><sub>2</sub>, ..., <italic>c</italic><sub><italic>k</italic></sub>} denote the set of actual occurrences of the target word in the historical corpus. Each occurrence <italic>c</italic><sub><italic>j</italic></sub> is encoded into a vector <italic>v</italic><sub><italic>j</italic></sub> = <italic>E</italic>(<italic>c</italic><sub><italic>j</italic></sub>). The comparison function <italic>f</italic> : R<sup><italic>d</italic></sup> &#215; R<sup><italic>d</italic></sup> &#8594; R measures the similarity between each corpus vector <italic>v</italic><sub><italic>j</italic></sub> and each prototype vector <italic>p</italic><sub><italic>i</italic></sub>. For each sense <italic>s</italic><sub><italic>i</italic></sub> and time period <italic>t</italic>, we identify the most relevant corpus occurrences by ranking <italic>f</italic>(<italic>v</italic><sub><italic>j</italic></sub>, <italic>p</italic><sub><italic>i</italic></sub>).</p>
        <p>4. Analysis and Interpretation: this approach enables experts to examine the highest-ranked sentences for each sense and time period, facilitating the identification of when a particular sense, such as a metaphorical use of gene, first emerged. By leveraging encoded representations and prototype-based comparisons, our method provides a scalable and systematic alternative to manual concordance analysis, while maintaining interpretability for domain experts.</p>
      </sec>
    </sec>
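      <p>The components of Section 3.1 can be sketched in a few lines of Python. The encoder below is a deliberately simple stand-in (a bag-of-words count over a tiny, made-up vocabulary) used only to make the example self-contained and runnable; in practice the encoder would be a model such as XL-LEXEME, and the synthetic sense examples would come from a generator such as Janus:</p>

```python
import numpy as np

def build_prototypes(encode, sense_examples):
    """Prototyper P: average the encoded synthetic examples of each
    sense into one prototype vector per sense."""
    return {sense: np.mean([encode(x) for x in examples], axis=0)
            for sense, examples in sense_examples.items()}

def rank_occurrences(encode, prototypes, occurrences, sense, top_n=5):
    """Comparison function f: rank corpus occurrences by cosine
    similarity to the prototype of the given sense."""
    p = prototypes[sense]
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = [(cos(encode(c), p), c) for c in occurrences]
    return sorted(scored, key=lambda t: -t[0])[:top_n]

# Toy encoder E: word counts over a fixed vocabulary (illustration only).
VOCAB = ["gene", "dna", "heredity", "family", "humour"]
def toy_encode(sentence):
    words = sentence.lower().split()
    return np.array([float(words.count(w)) for w in VOCAB]) + 1e-6

# Hypothetical sense definitions with synthetic usage examples.
sense_examples = {
    "biological": ["gene dna heredity", "dna gene"],
    "figurative": ["family gene humour", "humour gene"],
}
prototypes = build_prototypes(toy_encode, sense_examples)
corpus = ["dna dna gene", "family humour gene"]
ranking = rank_occurrences(toy_encode, prototypes, corpus, "biological")
```

      <p>Experts can then read the top-ranked sentences per sense and time period instead of scanning full concordances.</p>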
    <sec id="sec-3">
      <title>4. Use case</title>
      <p>In this section, we outline a comprehensive pipeline for analyzing semantic shifts in three words relevant to the history of science: theory, gene, and force. Our approach combines exploratory analysis using traditional Lexical Semantic Change Detection (LSCD) methods with a targeted, hypothesis-driven investigation. This pipeline and these results were presented first in a keynote for the workshop Large Language Models for the History, Philosophy, and Sociology of Science.</p>
      <sec id="sec-3-1">
        <title>4.1. LSCD Metrics</title>
        <p>To evaluate lexical semantic change, we employed three distinct metrics, APD, PRT, and JSD, to quantify shifts in the meanings of the words theory, gene, and force over time, as depicted in Figure 2. These metrics were applied to vector representations generated by XL-LEXEME. For each word, we calculated the three metrics with respect to the first time point (e.g., <italic>d</italic>(<italic>V</italic><sub>1</sub>, <italic>V</italic><sub><italic>t</italic></sub>)).</p>
        <p>APD: The APD metric computes the average cosine distance between all pairs of vectors representing word uses from two time periods. Figure 2(a&#8211;c) illustrates that APD values for theory show moderate fluctuations, indicating subtle shifts in usage, while gene exhibits a sharp increase in APD around the 1900s, reflecting the emergence of its biological sense. Similarly, force displays varying APD trends, with peaks corresponding to the 1950s.</p>
        <p>PRT: The PRT metric measures the cosine distance between centroid vectors (prototypes) of word uses at different time points. For each word, prototypes were generated by averaging the XL-LEXEME embeddings for all occurrences within a time period. Figure 2(a&#8211;c) shows that PRT distances for gene increase significantly post-1950, while for theory and force, PRT reveals more stable transitions.</p>
        <p>JSD: The JSD metric involves clustering word use embeddings (using agglomerative clustering with a distance threshold of 0.5) and treating the frequency of senses as probability distributions. JSD then quantifies semantic change by computing the distance between distributions of two time periods. Figure 2(a&#8211;c) indicates that JSD captures pronounced shifts for gene and force, while for theory values remain relatively low. This is because only one cluster is mainly present across all time points for theory, with two small clusters appearing only in the final two periods.</p>
      </sec>
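      <p>The cluster-based JSD used above can be sketched as follows. This minimal Python snippet assumes that cluster labels have already been assigned to each word use (e.g., by agglomerative clustering over XL-LEXEME embeddings with a 0.5 distance threshold, which is not reproduced here); it treats the per-period cluster frequencies as probability distributions and computes their Jensen-Shannon distance:</p>

```python
import numpy as np

def jsd(labels_t1, labels_t2, n_clusters):
    """Jensen-Shannon distance (base 2) between the cluster-frequency
    distributions of two time periods."""
    def freq(labels):
        counts = np.bincount(np.asarray(labels), minlength=n_clusters)
        counts = counts.astype(float)
        return counts / counts.sum()
    p, q = freq(labels_t1), freq(labels_t2)
    m = 0.5 * (p + q)
    def kl(a, b):  # Kullback-Leibler divergence, skipping zero entries
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))

# Identical sense distributions give 0; fully disjoint ones give 1.
stable = jsd([0, 0, 1], [0, 0, 1], n_clusters=2)
changed = jsd([0, 0, 0], [1, 1, 1], n_clusters=2)
```

      <p>A value near 0 (as for theory, dominated by a single cluster) indicates a stable sense distribution, while values near 1 signal a strong redistribution of senses between the two periods.</p>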
      <sec id="sec-3-3">
        <title>4.2. Labeling Clusters with Definitions</title>
        <p>We employed LLama-Dictionary to generate context-specific definitions for the words force, theory, and gene. For each word, sense clusters were induced as described in Section 4.1. A representative instance from each cluster was selected, and LLama-Dictionary generated a definition reflecting the word&#8217;s meaning in that context. These definitions, presented in Table 1, provide a structured representation of the senses for each word.</p>
        <p>For the word force, Table 1 lists distinct senses ranging from physical influences (e.g., an influence tending to change the motion of a body) to military contexts (e.g., a military unit engaged in a particular operation or mission) and coercive actions (e.g., to cause (something) to perform an action against its will or inclinations). These definitions highlight the word&#8217;s polysemy, capturing both concrete and abstract uses across historical contexts.</p>
        <p>The word theory has three identified senses in Table 1: a speculative belief (a belief that is based on speculation rather than adequate evidence), a fashion-related sense (a fashion theory, a style of fashion design), and a narrative account (a narrative account of a phenomenon, event or chain of events). These definitions reflect the word&#8217;s evolution from abstract intellectual constructs to more specific, domain-related meanings.</p>
        <p>For gene, Table 1 identifies its modern biological senses (e.g., a distinct sequence of nucleotides forming part of a chromosome) alongside clusters containing instances with OCR errors (e.g., to go or a set of generations).</p>
        <p>Table 1. Cluster definitions per word.</p>
        <p>Force: 0. A body of water or air moving under the influence of a force; 1. To cause (something) to perform an action against its will or inclination; 2. An influence tending to change the motion of a body or produce motion or stress in a stationary body; 3. To put out a runner by requiring him to run; 4. A military unit engaged in a particular operation or mission; 5. To advance or mature by natural or inevitable progression; 6. To cause (a result) by the exertion of force; 7. An army.</p>
        <p>Theory: 0. A belief that is based on speculation rather than adequate evidence as to its truth; 1. A fashion theory, a style of fashion design; 2. A narrative account of a phenomenon, event or chain of events.</p>
        <p>Gene: 0. To go; 1. A distinct sequence of nucleotides forming part of a chromosome, the order of which determines the order of monomers in a polypeptide or nucleic acid molecule which a cell (or virus) may synthesize; 2. A unit of heredity which is transferred from a parent to offspring and is held to determine some characteristic of the offspring; 3. A set of genetic instructions; 4. A set or class; 5. A name, especially a shortened name; 6. A set of people descended from a common ancestor; 7. A set of generations.</p>
      </sec>
      <sec id="sec-3-4">
        <title>4.3. Hypothesis-Driven Investigation</title>
        <p>In our hypothesis-driven investigation, we conducted an in-depth semantic analysis of the lexical items theory, force, and gene. In particular, we selected word sense definitions from the OED that do not appear to emerge through the traditional pipeline. For theory, which appeared to have only one dominant sense in previous analyses, we identified two sub-senses: one relating to the arts and another to mathematics. For force, we chose the specific sense associated with physics, while for gene, we focused on the metaphorical sense referring to inherited traits. Table 2 illustrates representative sentences from historical periods for each targeted sense, along with corresponding similarity scores.</p>
        <p>For theory, we identified clear semantic distinctions between its mathematical and arts-related conceptualizations. The mathematical sense consistently emphasizes structured systems of knowledge or deduction, notably stable across historical contexts with high similarity scores (ranging from 0.9632 in 1850 to 0.9835 in 1950). Conversely, the artistic sense of theory reflects broader cultural and philosophical applications, maintaining moderate similarity scores (around 0.96) but allowing variations tied to aesthetics and criticism.</p>
        <p>The physical sense of force remains remarkably stable and contextually consistent, as evidenced by similarity scores consistently exceeding 0.96 across time periods.</p>
        <p>Applying the same methodology to gene, specifically focusing on its metaphorical sense, clarified the earlier observed anomaly. Early instances from the 1800s were OCR errors (e.g., "genie rose," "genie really"). Genuine metaphorical usage of "gene" emerged gradually, with similarity values steadily increasing until the metaphorical sense became clearly established around the 2000s.</p>
        <p>The hypothesis-driven investigation provides significant precision and interpretability advantages over the traditional lexical semantic change detection pipeline. By explicitly defining and targeting specific subsenses, such as distinguishing between the mathematical and artistic senses of theory, identifying the metaphorical usage of gene, and isolating the physical meaning of force, our method captures semantic differences that previously remained hidden within broader senses. Moreover, by directly analyzing real corpus sentences from the CCOHA dataset, experts gain improved control over the interpretation and validation of results.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>In this work, we introduced a hypothesis-driven framework for detecting lexical semantic change. By integrating expert-defined sense definitions with SOTA computational models like XL-LEXEME and Janus, our framework systematically traces the evolution of word meanings across historical corpora. Starting with a word and its senses (or only the ones that we want to study), we utilize the strength of LLMs to allow for easy investigation into relevant corpus data. The method is not limited in terms of the data it can be applied to; the user can choose the data of interest and limit the analysis to the relevant senses. We envision that the researcher can also define senses of interest, rather than using those listed in dictionaries, for example by adding connotational information. This would allow for investigating when a word sense, for example, became more positive in meaning.</p>
      <p>The proposed hypothesis-driven framework offers a robust methodology for accurately detecting and analyzing lexical semantic changes in historical texts. By integrating predefined hypotheses, generative language models, and vector encoding techniques, our approach not only produces results that are interpretable for domain experts but also scales systematically to large historical corpora. The case studies on the words "theory," "gene," and "force" illustrate the framework&#8217;s capability to reveal significant shifts in meaning, particularly those reflective of cultural and scientific developments.</p>
      <sec id="sec-4-1">
        <title>Table 2. Most similar sentence per concept and year, with similarity scores</title>
        <p>Theory (Mathematics): The body of knowledge relating to the properties of a particular mathematical concept; a collection of theorems forming a connected system.</p>
        <p>1800: ...Fourier&#8217;s large work, entitled, Theory of Universal Unity. (0.9761)
1850: ...the real object of the law is the mental image, the theory of the thing. (0.9632)
1900: ...a strictly consistent deduction from the theory... (0.9714)
1950: ...to place the theory of abstraction in a perspective unchallenged... (0.9835)
2000: ..., 2000), is bio-informational theory (Lang, 1979, 1985). (0.9669)</p>
        <p>Theory (Arts): An approach to the study of literature, the arts, and culture that incorporates concepts from disciplines such as philosophy, psychoanalysis, and the social sciences.</p>
        <p>1800: ...to accommodate himself to his theory frequently involves him in a dialect... (0.9587)
1850: ...error of his theory of poetry, and is the source of his one conspicuous failure... (0.9665)
1900: ...a knowledge of aesthetic history and philosophy, theory and practice... (0.9663)
1950: ...grammar is a theory of language, and a works. (0.9597)
2000: ...snake oil of art criticism and elixir of theory. (0.9712)</p>
        <p>Force: Used in various senses developed from the older popular uses, and corresponding to modern scientific uses of Latin vis. The cause of any one of the classes of physical phenomena, e.g., of motion, heat, electricity, etc., conceived as consisting in principle or power inherent in, or coexisting with, matter.</p>
        <p>1800: ...the force d e, which it exerts upon D B. (0.9688)
1850: ...as a mechanical force, and as an agent in effecting chemical changes... (0.9828)
1900: ...It is the force of a body in motion. (0.9821)
1950: ...flowed a the force of gravity. (0.9823)
2000: ...the nuclear force is a short-range force, acting mainly over the distance... (0.9668)</p>
        <p>Gene: In figurative and extended use, esp. with reference to qualities regarded as deeply ingrained or (often humorously) as inherited. Often in plural.</p>
        <p>1800: ...evinced in a more familiar way, by the gene &#8217;. (0.8829)
1850: ...some people complained of a certain &#8217;gene&#8217; in him... (0.9280)
1900: ...started life with the very best of mental genes? (0.9335)
1950: Apparently Johnny got all the family&#8217;s genes for music... (0.9531)
2000: ...lack of the self-awareness gene, spearheads the awkwardness. (0.9665)</p>
      </sec>
      <sec id="sec-4-2">
        <title>Acknowledgments</title>
        <p>This work has in part been funded by the research program Change is Key! supported by Riksbankens Jubileumsfond (under reference number M21-0021). The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Declaration on Generative AI</title>
        <p>During the preparation of this work, the author(s) did not use any generative AI tools or services.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Armendariz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Purver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ulčar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pollak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ljubešić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Granroth-Wilding</surname>
          </string-name>
          ,
          <article-title>CoSimLex: A Resource for Evaluating Graded Word Similarity in Context</article-title>
          ,
          <source>in: Proc. of LREC</source>
          , ELRA, Marseille, France,
          <year>2020</year>
          , pp.
          <fpage>5878</fpage>
          -
          <lpage>5886</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>O. E. D. OED</surname>
          </string-name>
          , Oxford english dictionary, Simpson,
          <source>Ja &amp; Weiner, Esc</source>
          <volume>3</volume>
          (
          <year>1989</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          , Word Sense Disambiguation: A Survey,
          <source>ACM Comput. Surv</source>
          .
          <volume>41</volume>
          (
          <year>2009</year>
          ). URL: https://doi.org/ 10.1145/1459352.1459355. doi:
          <volume>10</volume>
          .1145/1459352. 1459355.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Borin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jatowt</surname>
          </string-name>
          , Survey of Computational Approaches to Lexical Semantic Change Detection, Language Science Press, Berlin,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>91</lpage>
          . doi:
          <volume>10</volume>
          .5281/zenodo.5040302.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , CoRR abs/
          <year>1810</year>
          .04805 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/
          <year>1810</year>
          .04805. arXiv:
          <year>1810</year>
          .04805.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cassotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Siciliani</surname>
          </string-name>
          , M. DeGemmis, G. Semeraro,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          , XL-LEXEME:
          <article-title>WiC pretrained model for cross-lingual LEXical sEMantic changE</article-title>
          ,
          <source>in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume</source>
          <volume>2</volume>
          :
          <string-name>
            <surname>Short</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <source>Association for Computational Linguistics</source>
          , Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>1577</fpage>
          -
          <lpage>1585</lpage>
          . URL: https://aclanthology.org/
          <year>2023</year>
          .acl-short.
          <volume>135</volume>
          . doi:
          <volume>10</volume>
          . 18653/v1/
          <year>2023</year>
          .acl-short.
          <volume>135</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          , Sentence-BERT:
          <article-title>Sentence Embeddings using Siamese BERT-Networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          . URL: https://aclanthology.org/ induction, in: N.
          <string-name>
            <surname>Tahmasebi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Montariol</surname>
          </string-name>
          , A.
          <fpage>KuD19</fpage>
          -
          <lpage>1410</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D19</fpage>
          -1410. tuzov,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alfter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Periti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cassotti</surname>
          </string-name>
          , N. Hueb-
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          , V. Chaud- scher (Eds.),
          <source>Proceedings of the 5th Workshop</source>
          hary, G. Wenzek,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guzmán</surname>
          </string-name>
          , E. Grave, M. Ott, on Computational Approaches to Historical LanL. Zettlemoyer,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , Unsupervised Cross- guage
          <string-name>
            <surname>Change</surname>
          </string-name>
          ,
          <article-title>Association for Computational Linlingual Representation Learning at Scale</article-title>
          , in: guistics, Bangkok, Thailand,
          <year>2024</year>
          , pp.
          <fpage>108</fpage>
          -
          <lpage>119</lpage>
          . D.
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Chai</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Schluter</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          <string-name>
            <surname>Tetreault</surname>
            <given-names>URL</given-names>
          </string-name>
          : https://aclanthology.org/
          <year>2024</year>
          .lchange-
          <volume>1</volume>
          .10/. (Eds.),
          <source>Proceedings of the 58th Annual Meeting doi:10</source>
          .18653/v1/
          <year>2024</year>
          .lchange-
          <volume>1</volume>
          .10.
          <article-title>of the Association for Computational Linguistics</article-title>
          , [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Giulianelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Del Tredici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <string-name>
            <surname>ACL</surname>
          </string-name>
          <year>2020</year>
          , Online, July 5-
          <issue>10</issue>
          ,
          <year>2020</year>
          ,
          <article-title>Association for Analysing lexical semantic change with contexComputational Linguistics</article-title>
          ,
          <year>2020</year>
          , pp.
          <fpage>8440</fpage>
          -
          <lpage>8451</lpage>
          .
          <article-title>tualised word representations</article-title>
          , in: ProceedURL: https://doi.org/10.18653/v1/
          <year>2020</year>
          .acl-main.
          <source>747. ings of the 58th Annual Meeting of the Asdoi:10</source>
          .18653/v1/
          <year>2020</year>
          .
          <article-title>acl-main.747. sociation for Computational Linguistics</article-title>
          , Asso-
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Pilehvar</surname>
          </string-name>
          , J. Camacho-Collados,
          <article-title>WiC: the ciation for Computational Linguistics, Online, Word-in-Context Dataset for Evaluating Context-</article-title>
          <year>2020</year>
          , pp.
          <fpage>3960</fpage>
          -
          <lpage>3973</lpage>
          . URL: https://www.aclweb.org/ Sensitive Meaning Representations, in: J. Burstein, anthology/
          <year>2020</year>
          .acl-main.
          <volume>365</volume>
          . doi:
          <volume>10</volume>
          .18653/v1/ C. Doran, T. Solorio (Eds.),
          <source>Proceedings of the 2020.acl-main.365</source>
          . 2019 Conference of the North American Chap- [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cassotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <article-title>Sense-specific historter of the Association for Computational Linguis- ical word usage generation, Transactions of the tics: Human Language Technologies, NAACL-HLT Association for Computational Linguistics (</article-title>
          <year>2025</year>
          ).
          <year>2019</year>
          , Minneapolis, MN, USA, June 2-7,
          <year>2019</year>
          , Vol- [16]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <surname>WORDNET:</surname>
          </string-name>
          <article-title>a lexical database for ume 1 (Long and Short Papers), Association for english</article-title>
          ,
          <source>in: Speech and Natural Language: ProceedComputational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1267</fpage>
          -
          <lpage>1273</lpage>
          . ings of a Workshop Held at Harriman, New York, URL: https://doi.org/10.18653/v1/n19-
          <fpage>1128</fpage>
          . doi:10. USA, February
          <volume>23</volume>
          -
          <issue>26</issue>
          ,
          <year>1992</year>
          , Morgan Kaufmann, 18653/v1/n19-
          <fpage>1128</fpage>
          .
          <year>1992</year>
          . URL: https://aclanthology.org/H92-1116/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fedorova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kutuzov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Scherrer</surname>
          </string-name>
          , Definition [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alatrash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schlechtweg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. S.</surname>
          </string-name>
          <article-title>im Walde, generation for lexical semantic change detection</article-title>
          , in: CCOHA: Clean Corpus of Historical American L.
          <string-name>
            <surname>-W. Ku</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Martins</surname>
          </string-name>
          , V. Srikumar (Eds.), Findings English, in: N.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Béchet</surname>
          </string-name>
          , P. Blache,
          <article-title>of the Association for Computational Linguistics</article-title>
          K. Choukri,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Declerck</surname>
          </string-name>
          , S. Goggi,
          <string-name>
            <surname>ACL</surname>
          </string-name>
          <year>2024</year>
          ,
          <article-title>Association for Computational Linguis- H.</article-title>
          <string-name>
            <surname>Isahara</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mariani</surname>
          </string-name>
          , H. Mazo, tics, Bangkok, Thailand and virtual meeting,
          <year>2024</year>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Odijk</surname>
          </string-name>
          , S. Piperidis (Eds.), Proceedpp.
          <fpage>5712</fpage>
          -
          <lpage>5724</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .
          <article-title>ings of The 12th Language Resources and Evalifndings-acl</article-title>
          .
          <volume>339</volume>
          . uation Conference,
          <source>LREC</source>
          <year>2020</year>
          , European Lan-
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Periti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alfter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          , Automatically guage Resources Association, Marseille, France,
          <source>generated definitions and their utility for modeling</source>
          <year>2020</year>
          , pp.
          <fpage>6958</fpage>
          -
          <lpage>6966</lpage>
          . URL: https://www.aclweb.org/ word meaning, in: Y.
          <string-name>
            <surname>Al-Onaizan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bansal</surname>
          </string-name>
          , Y.-N. anthology/
          <year>2020</year>
          .lrec-
          <volume>1</volume>
          .859/. Chen (Eds.),
          <source>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Miami, Florida, USA,
          <year>2024</year>
          , pp.
          <fpage>14008</fpage>
          -
          <lpage>14026</lpage>
          . URL: https: //aclanthology.org/
          <year>2024</year>
          .emnlp-main.
          <volume>776</volume>
          /. doi:
          <volume>10</volume>
          . 18653/v1/
          <year>2024</year>
          .emnlp-main.
          <volume>776</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Periti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <article-title>A systematic comparison of contextualized word embeddings for lexical semantic change</article-title>
          , in: K. Duh,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , S. Bethard (Eds.),
          <source>Proceedings of the</source>
          <year>2024</year>
          <article-title>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Association for Computational Linguistics</article-title>
          , Mexico City, Mexico,
          <year>2024</year>
          , pp.
          <fpage>4262</fpage>
          -
          <lpage>4282</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .
          <article-title>naacl-long</article-title>
          .
          <volume>240</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Periti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tahmasebi</surname>
          </string-name>
          ,
          <article-title>Towards a complete solution to lexical semantic change: an extension to multiple time periods and diachronic word sense</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>