=Paper=
{{Paper
|id=Vol-3878/22_main_long
|storemode=property
|title=DWUGs-IT: Extending and Standardizing Lexical Semantic Change Detection for Italian
|pdfUrl=https://ceur-ws.org/Vol-3878/22_main_long.pdf
|volume=Vol-3878
|authors=Pierluigi Cassotti,Pierpaolo Basile,Nina Tahmasebi
|dblpUrl=https://dblp.org/rec/conf/clic-it/CassottiBT24
}}
==DWUGs-IT: Extending and Standardizing Lexical Semantic Change Detection for Italian==
Pierluigi Cassotti¹,*, Pierpaolo Basile² and Nina Tahmasebi¹
¹ University of Gothenburg, Department of Philosophy, Linguistics and Theory of Science, Gothenburg, Sweden
² University of Bari Aldo Moro, Department of Computer Science, via E. Orabona, 70125, Bari, Italy
Abstract
Lexical Semantic Change Detection (LSCD) is the task of determining whether a word has undergone a change in meaning over time. Interest in this task has grown markedly, accompanied by a corresponding growth in the scientific community developing computational approaches to semantic change. In recent years, resources for evaluating Lexical Semantic Change (LSC) models have been released for several languages, including English, Swedish, German, Latin, Russian and Chinese. DIACR-ITA is the only existing LSCD resource for Italian; however, it uses a format that differs from the one adopted for other languages. In this paper, we present DWUGs-IT, which extends the DIACR-ITA dataset with additional target words and usage-sense pair annotations and adapts it to the DURel format, including the first implementation of a graded LSCD task for Italian.
Keywords
Lexical Semantic Change, Sense-annotated corpora, Italian, Historical Linguistics
CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
(*) Corresponding author.
pierluigi.cassotti@gu.se (P. Cassotti); pierpaolo.basile@uniba.it (P. Basile); nina.tahmasebi@gu.se (N. Tahmasebi)
ORCID: 0000-0001-7824-7167 (P. Cassotti); 0000-0002-0545-1105 (P. Basile); 0000-0003-1688-1845 (N. Tahmasebi)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

As is the case with society and culture, language is subject to change over time. Two key factors drive such change. First, there are purely evolutionary and linguistic considerations driven by the need for more efficient communication [1]. One example is the use of abbreviations and acronyms, such as LOL (Laughing Out Loud), which have become commonplace on social media platforms. Second, changes in society and culture lead to changes in language. This can be seen, for example, in the adoption of more inclusive language in grammatically gendered languages such as Italian, where @ has been introduced to replace masculine and feminine endings [2].

Language may change at various levels, including the morphological, syntactic, and semantic levels. Semantic change concerns the alteration of the meaning of words over time. The study of semantic change is a prominent area of research in Historical Linguistics, which aims to investigate the linguistic mechanisms that characterize change and the causes that trigger it. For instance, Blank [3] provides a broad study on the characterization of semantic change, identifying several types of change, including metaphor, metonymy, generalization, specialization, co-hyponym transfer and auto-antonymy. The English word bad, for example, has acquired an auto-antonym meaning, i.e. a meaning that is the opposite of its original meaning: in addition to its original connotation of poor quality or negative, it has also acquired the opposite connotation of good or cool. The word meat, in turn, has undergone specialization: it has shifted from referring to any kind of food in general to denoting exclusively the flesh of animals consumed as food.

While traditional linguistic methods are informative, they are often based on small, carefully curated samples. In contrast, linguistic analyses using computational models not only accelerate our understanding of language change but also provide broader and more detailed insights, facilitating the study of vast corpora across a wider range of genres and time periods [4, 5].

From a computational perspective, two key challenges emerge in the study of semantic change: modelling word meanings over time and detecting change [6, 7]. At the synchronic level, i.e. ignoring the temporal dimension and focusing on modern corpora, the Natural Language Processing community has made significant strides in modelling word meanings, with approaches such as Word Sense Disambiguation (WSD) [8] playing a pivotal role. Computational modelling of semantic change adds considerable complexity, as it requires handling meanings that are either extinct or novel with respect to existing lexicographic resources, such as WordNet, as well as meaning representations that change dynamically.

In recent years, great efforts have been made to advance computational methods for Lexical Semantic Change Detection. With initiatives such as the Workshop on Computational Approaches to Historical Language Change [9] promoting research in this
field, or shared tasks such as SemEval 2020 Task 1 [10], RuShiftEval [11], DIACR-ITA [12], or LSCDiscovery [13] leading to the development of the first evaluation resources. DIACR-ITA, hosted at EVALITA 2020 [14], is the first shared task specifically created for evaluating models of Lexical Semantic Change in Italian. Most evaluation resources follow a two-task approach: (1) a binary task, which requires assigning each word either the changed or the stable label, depending on whether or not the word has undergone a change in meaning; and (2) a graded (ranking) task, which requires sorting words by the extent of their change over time. These labels are assigned on the basis of human-annotated data, typically in the form of a graded word-in-context task.

DIACR-ITA, however, diverges from the evaluation process employed in SemEval 2020 Task 1, RuShiftEval and several other datasets that emerged subsequently, which results in a distinct configuration of the task and of the released data. For example, DIACR-ITA only has a binary task and does not include a graded task. Moreover, only the target words and their gold labels were made available for the shared task, while the remaining data produced during the annotation process were not. In this paper,
1. we release DWUGs-IT (available on Zenodo: https://zenodo.org/records/13941618), a new dataset for Lexical Semantic Change Detection for Italian, which:
• extends the original DIACR-ITA with eight new words;
• provides sense-annotated usages with the respective sense labels;
• standardizes DIACR-ITA by providing the data in the DURel format [15, 16, 17];
• introduces the first graded LSC task for Italian;
2. we evaluate DWUGs-IT using XL-LEXEME [18], the state-of-the-art model for Lexical Semantic Change Detection [19].

2. Related Work

DURel [15] is a framework for the annotation of Lexical Semantic Change across a pair of time periods or corpora. The annotation involves human labelling of pairs of sentences containing the target word. The sentences can be contemporary, i.e. originating from the same time period, or diachronic, i.e. originating from the two different periods under consideration. An annotator has to decide whether the meaning expressed by the word in the two sentences is Unrelated (1), Distantly Related (2), Closely Related (3) or Identical (4). The scale of semantic relatedness is derived from the cognitive model proposed by Blank [20], and its values correspond to Homonymy (1), Polysemy (2), Context Variance (3) and Identity (4).

The annotations are then represented as a graph, specifically a Word Usage Graph (WUG), or a Diachronic Word Usage Graph (DWUG) [21] when the usages originate from different time periods. In these graphs, the nodes correspond to word usages and each edge is weighted by the median of the annotations for that pair of usages. The diachronic graph is then clustered in order to identify the senses. Before clustering, a new graph is created by binarizing the edges: an edge between two uses is kept as a positive (same-sense) edge if the original edge weight, i.e. the median annotation for this pair of uses, is at least 2.5. Since the graph is typically very sparse, which limits the applicability of conventional clustering algorithms, a variant of the correlation clustering algorithm [22] is typically used, as it can handle this type of sparsely connected graph.

Once the (diachronic) clusters have been obtained, they can be considered to represent the senses. The distribution of the usages from the different time periods in each cluster (sense) is then analyzed to obtain a change score. For instance, a graded change score can be computed as the Jensen-Shannon Distance (JSD) between the probability distributions of senses in the two time periods:

<math>\mathrm{JSD}(P \parallel Q) = \sqrt{\frac{D(P \parallel M) + D(Q \parallel M)}{2}}</math>

where <math>P</math> and <math>Q</math> are the probability distributions of clusters in the two historical periods, <math>D</math> denotes the Kullback-Leibler divergence, and <math>M = \frac{P + Q}{2}</math> [23, 24].

Furthermore, a binary label can be obtained: words that have undergone a change in meaning over time (i.e. words that have gained or lost a sense) are labelled changed, while words that have retained their meaning are labelled stable. The label is typically assigned by comparing the frequency of senses in the different time periods and applying thresholds to distinguish stable from changed words.

Datasets based on DURel. SemEval 2020 Task 1 [10] was the first initiative to standardize the evaluation of computational approaches to semantic change. SemEval 2020 Task 1 covers English, German, Swedish and Latin and proposes a common evaluation framework with two tasks: classifying target words as those whose meaning has changed or remained stable, and ranking words according to their degree of change. Special attention is given to Latin due to the lack of native speakers. Therefore, in the annotation of the Latin dataset, usage-sense
pairs are considered rather than usage-usage pairs, and the annotator is asked to decide how related the considered usage is to a particular sense, using the DURel scale from Unrelated to Identical. RuShiftEval [11] aimed to detect semantic shifts in Russian across the pre-Soviet, Soviet, and post-Soviet periods. The dataset included 111 Russian nouns, which participants had to rank by their degree of change (using the COMPARE measure [15], an approximation of the JSD). The task focused on ranking changes, with evaluation based on Spearman rank correlation. LSCDiscovery [13] focused on detecting and discovering semantic changes in Spanish. It is divided into Graded Change Discovery and Binary Change Detection. The task required predictions for all vocabulary words in the corpus, covering the periods 1810-1906 and 1994-2020. NorDiaChange [25] studied diachronic semantic change in Norwegian. The dataset included 80 nouns reflecting significant historical periods, such as pre- and post-war events and technological advances. ZhShiftEval [26, 27] assessed semantic change in Chinese over 50 years, focusing on the period around the Reform and Opening Up. The dataset used texts from the People's Daily and included 20 words chosen for their frequency and noted changes.

3. DIACR-ITA

The DIACR-ITA annotation was conducted on word usages extracted from the L'Unità corpus [28], a collection of Italian texts from the newspaper L'Unità. To evaluate semantic change, the corpus was divided into two sub-corpora, covering the periods 1948-1970 and 1990-2014, respectively. The 20-year gap between the sub-corpora ensures sufficient distance between the two periods, making it possible to track potentially more pronounced semantic changes. The sub-corpora statistics are presented in Table 1.

The selection of target words was based on the information provided in the Sabatini-Coletti dictionary of the Italian language, which records the year of the first occurrence of each sense of a word. The first step was to extract from Sabatini-Coletti a list of words for which the dictionary reports a semantic change, i.e. the introduction of at least one new sense after 1970. The list was then checked to ensure that the sampled words appeared at least 10 times in each of the two periods and that their occurrences were not significantly affected by OCR errors. This resulted in 26 target words. For each target word, up to 50 occurrences were extracted from each of the two sub-corpora.

The senses of each word were classified into two groups: the senses recorded in the Sabatini-Coletti dictionary for the period 1948-1970 (Group 1) and the new senses introduced after 1970 (Group 2). The annotators were required to determine whether the sense of each word usage belonged to Group 1, Group 2, or to another category if it did not align with either group (Other). Additionally, annotators could select Cannot decide for usages about which they were uncertain. Five annotators fluent in Italian annotated DIACR-ITA, and each sentence was annotated by two of them. Disagreements were resolved by the two annotators involved, who analyzed each case and agreed on an unambiguous label.

Each target word was labelled as stable or changed. A word was considered changed if there was at least one Group 2 instance among the usages extracted from the 1990-2014 period and no Group 2 instances among the usages extracted from the 1948-1970 period. The final dataset consisted of 18 words, of which 6 were changed and 12 were stable.

{| class="wikitable"
|+ Table 1: Sub-corpora statistics.
! Corpus !! Period !! #Tokens
|-
| L'Unità || 1948-1970 || 52,287,734
|-
| L'Unità || 1990-2014 || 196,539,403
|}

4. DWUGs-IT

DWUGs-IT builds on the DIACR-ITA dataset, adapting it to the DURel format and adding eight new words, which are summarized in Table 2. It also provides the usage-sense annotated pairs that were not initially released. For each target word, we format the annotated usages in the WUG style, including the time period of each usage and the position of the word in the sentence. We also format and release the annotated sense labels in the same way as DWUG LA [29].

Unlike the traditional WUG approach, where sense preference is not explicitly marked, in DIACR-ITA annotators explicitly indicate their preference for one sense over the others. For example, for the word api (Italian for bees) in the sentence “Dalle api un dolce dono” (“From bees, a sweet gift”), the annotators choose the sense insect and discard the alternative sense means of transport. Each use-sense pair not selected by the annotators is assigned a rating of 1 (Unrelated), while matched pairs receive a rating of 4 (Identical), in line with the DURel scale.

Since human annotators already provide the sense labels, we do not cluster usages automatically (as is typically done in the WUG approach), but directly assign the annotated meanings. All subsequent calculations, such as
change scores and related statistics, follow the standard WUG methodology.

{| class="wikitable"
|+ Table 2: Newly introduced words together with the senses of Group 1 (1948-1970), Group 2 (senses introduced after 1970), and an indication (✔) of the presence of other senses not listed in Group 1 and Group 2.
! Lemma !! Group 1 !! Group 2 !! Other
|-
| ultima || Che viene dopo tutti gli altri in una serie numerica, in una classifica, in una graduatoria o in una successione spaziale o temporale || Nel l. fam., l'ultima cosa; la novità, la notizia più recente: la sai l'ultima? ||
|-
| emulare || Prendere qlcu. a modello, imitarne meriti e virtù: e. i genitori, le imprese di uno scalatore || ambito informatico ||
|-
| affido || || Affidamento di un minore || ✔
|-
| bombetta || S1. Cappello maschile di feltro rigido a cupola con tese corte leggermente rialzate ai lati || S2. Fialetta puzzolente che i ragazzi lanciano per divertimento per strada o in ambienti chiusi || ✔
|-
| cantieristica || maschile - Di cantiere, relativo ai cantieri: il settore c. oppure con riferimento al cantiere || Attività di costruzione, riparazione navale ||
|-
| fondista || Giornalista che scrive l'articolo di fondo su un quotidiano - Atleta || Nel gergo della finanza, sottoscrittore di fondi di investimento || ✔
|-
| portatile || Che può essere trasportato agevolmente da una persona: televisore p. || Piccolo computer facilmente trasportabile, funzionante anche a batteria e quindi utilizzabile in viaggio - telefono portatile ||
|-
| impegnativa || agg. che richiede impegno || Dichiarazione con cui si assume un impegno; in partic. nel l. burocr., documento con cui un ente mutualistico si impegna a coprire, nella misura prevista dalla legge, le spese sanitarie di un suo iscritto: fare l'i. per le analisi ||
|}

5. Evaluation

XL-LEXEME has been tested on several languages but has never been evaluated on Italian. In this section, we evaluate XL-LEXEME on the new DWUGs-IT dataset using the traditional evaluation pipeline for DWUGs [19, 30]. We assess its ability to derive a reliable change score (Graded Change Detection) and evaluate whether the XL-LEXEME vectors can be clustered to automatically induce target word senses, which are then compared to the DWUGs-IT annotations via the Adjusted Rand Index and the Purity measure.

5.1. Model

XL-LEXEME, built on XLM-RoBERTa large [31], is trained on the Word-in-Context (WiC) task [32], i.e. determining whether a word has the same meaning in two sentences. Using a Siamese architecture [33], it produces contextualized vectors for the target word. Its loss function is based on cosine distance: it pulls together the vectors of usages with the same meaning and pushes apart those with different meanings. To calculate the change score, a classic approach is to use the Average Pairwise Distance (APD) between the vectors computed in the two periods:

<math>\mathrm{LSC}\left(s^{t_0}_{w}, s^{t_1}_{w}\right) = \frac{1}{N \cdot M} \sum_{i=1}^{N} \sum_{j=1}^{M} \delta\left(s^{t_0}_{w,i}, s^{t_1}_{w,j}\right)</math> (1)

where <math>\delta</math> is the cosine distance, <math>s^{t}_{w}</math> is the set of sentences containing the word <math>w</math> at time <math>t</math>, and <math>N</math> and <math>M</math> are the numbers of such sentences at times <math>t_0</math> and <math>t_1</math>, respectively. For the Word Sense Induction step, we cluster the vectors into senses using Agglomerative Clustering² with a cosine distance threshold of 0.5 and average linkage, which merges clusters whose average cosine similarity is greater than 0.5.

² https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering

5.2. Metrics

We test the ability of XL-LEXEME to rank words according to their change scores (Graded Change Detection) using Spearman correlation. Cluster quality is assessed using the Adjusted Rand Index (ARI) [34], which is defined as follows:

<math>\mathrm{ARI} = \frac{\mathit{RI} - \mathit{ExpectedRI}}{\max(\mathit{RI}) - \mathit{ExpectedRI}}</math>

RI stands for the Rand Index, which measures the number of pair agreements between the induced clustering and the gold annotation, that is, pairs of instances that are placed in the same cluster in both or in different clusters in both. ExpectedRI is the expected number of such agreements by chance, calculated from the cluster distributions, while max(RI) is the maximum possible value of RI, which occurs when all pairs are classified correctly. We use Purity in addition to ARI to capture cluster homogeneity and to provide clearer insight into how mixed the clusters are in terms of class labels:

<math>\mathrm{Purity} = \frac{1}{N} \sum_{k} \max_{j} \left| c_k \cap t_j \right|</math>

where <math>N</math> is the total number of instances, <math>c_k</math> denotes cluster <math>k</math>, and <math>t_j</math> represents class <math>j</math>.
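As an illustration of Equation (1) in Section 5.1, the following minimal Python sketch computes the Average Pairwise Distance between two sets of usage vectors. It assumes the vectors have already been produced by a contextualized encoder such as XL-LEXEME and stacked into NumPy arrays with one row per usage; the function name `average_pairwise_distance` and the toy vectors are hypothetical and are not taken from the XL-LEXEME code.

```python
import numpy as np


def average_pairwise_distance(x0: np.ndarray, x1: np.ndarray) -> float:
    """APD of Equation (1): mean cosine distance over all N x M cross-period pairs.

    x0: (N, d) array of usage vectors from period t0 (hypothetical input).
    x1: (M, d) array of usage vectors from period t1 (hypothetical input).
    """
    # L2-normalise each row so that a dot product equals cosine similarity.
    u0 = x0 / np.linalg.norm(x0, axis=1, keepdims=True)
    u1 = x1 / np.linalg.norm(x1, axis=1, keepdims=True)
    # (N, M) matrix of cosine distances delta(s_i^{t0}, s_j^{t1}) = 1 - cos.
    distances = 1.0 - u0 @ u1.T
    # Summing over all i, j and dividing by N * M is simply the mean.
    return float(distances.mean())


if __name__ == "__main__":
    # Toy 2-d vectors, invented purely for illustration.
    early = np.array([[1.0, 0.0], [1.0, 0.1]])
    late = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(round(average_pairwise_distance(early, late), 3))
```

Because each term is one minus a cosine similarity, the score is 0 when all cross-period usages point in the same direction and grows as usages from the two periods diverge. The full N x M distance matrix is kept in memory, which is unproblematic for the at most 50 usages per period sampled in DIACR-ITA.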
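In the WUG methodology, graded gold scores are computed from sense labels rather than from vectors. Assuming, as in the JSD formulation of Section 2, that each target word's gold score is the Jensen-Shannon distance between its sense distributions in the two periods, a minimal Python sketch is shown below. The base-2 logarithm (which bounds the distance between 0 and 1), the helper names and the labels `G1`/`G2` are illustrative assumptions, not details taken from the DWUGs-IT release.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def sense_distribution(labels: Sequence[Hashable], senses: List[Hashable]) -> List[float]:
    """Relative frequency of each sense among the usages of one period."""
    counts = Counter(labels)
    return [counts[s] / len(labels) for s in senses]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """D(p || q) in bits; terms with p_i = 0 contribute 0 by convention."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(labels_t0: Sequence[Hashable], labels_t1: Sequence[Hashable]) -> float:
    """sqrt((D(P||M) + D(Q||M)) / 2) with M = (P + Q) / 2."""
    # Shared sense inventory, in order of first appearance.
    senses = list(dict.fromkeys([*labels_t0, *labels_t1]))
    p = sense_distribution(labels_t0, senses)
    q = sense_distribution(labels_t1, senses)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return math.sqrt((kl_divergence(p, m) + kl_divergence(q, m)) / 2)


if __name__ == "__main__":
    # Hypothetical labels: only the old sense G1 early, G1 and G2 later.
    early = ["G1", "G1", "G1", "G1"]
    late = ["G1", "G1", "G2", "G2"]
    print(round(jensen_shannon_distance(early, late), 3))
```

Under this assumption, Graded Change Detection is the Spearman correlation, across target words, between such gold scores and the model's APD scores; `scipy.stats.spearmanr` is one readily available implementation of that correlation.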

Figure 1: t-SNE visualization of XL-LEXEME embeddings with respect to the annotated clusters for the changed words palmare, rampante, and pilotato. Panels: (a) palmare; (b) rampante; (c) pilotato.

5.3. Results

We begin by discussing the qualitative results. Figure 1 illustrates the t-SNE visualization of XL-LEXEME embeddings for the usages of the words palmare, rampante, and pilotato. For palmare (Figure 1a), the senses are well separated, except for some instances of the sense relating to the palm and to clear, evident, which are placed closer to the PDA (handheld device) meaning, for example:

sono state n levate di le impronte palmari che saranno inviate al1’ archivio generale segnaletico di Roma. (en. The palm prints have been removed and will be sent to the general sign archive of Rome.)

For the word rampante (Figure 1b), the annotators identified an additional meaning (Other) that refers to a named entity, i.e. Il barone rampante, written by Italo Calvino. The instances of Il barone rampante fall in the middle of the cluster of the rearing and ambitious meanings. Interestingly, the only instance annotated as Cannot decide falls in the rearing cluster:

Uno rampante » non ci aia ancora nulla da fare, comunque i tecnici....supremazia di le Ferrari. (en. A rampant » there is still nothing to be done, in any case the technicians.... supremacy of the Ferrari.)

This instance is ambiguous, since the subject of rampante is missing from the sentence. Interestingly, however, XL-LEXEME assigns it the rearing meaning, probably due to the presence of the word Ferrari, whose logo is the cavallino rampante (prancing horse). Figure 1c shows that the embeddings of the usages of pilotato are clearly split according to the sense labels. However, one instance of the driven meaning falls in the cluster of the manipulated instances; this instance can be considered ambiguous and open to interpretation:

Twingo Easy offre la grande comodità di un cambio con frizione pilotata, ovvero: non c’ è più il pedale della frizione. (en. Twingo Easy offers the great convenience of a gearbox with a piloted clutch, that is: there is no longer a clutch pedal.)

The quantitative results of XL-LEXEME are reported in Table 3. Compared with LSCD benchmarks in other languages, XL-LEXEME obtains a Graded Change Detection (GCD) score of 0.51, slightly below the range observed for other languages (from 0.567 in NO to 0.851 in RU), and an ARI of 0.28, which falls within the corresponding range (from 0.249 in SV to 0.400 in ES). It performs slightly better in terms of purity (0.89, against a range from 0.766 in SV to 0.836 in ZH). These results likely stem from the properties of the dataset, which includes several monosemous words, but also from the process used to build DWUGs-IT, in which senses are modelled explicitly. Purity measures the extent to which clusters contain a single class; with many monosemous words, achieving high purity is easier, since all usages of such words belong to one sense group. ARI, on the other hand, evaluates the similarity between the clustering results and the ground truth, accounting for both clustering quality and the number of clusters. In DWUGs-IT, most sense groups contain just one meaning. Sometimes, however, a sense group contains several meanings, and how often each meaning is used can change over time. For example, the word palmare has three meanings in its Group 1: i) related to the palm of the hand, ii) something that fits in the hand, and iii) something that is obvious or clear. Over time, some of these meanings might be used more or less often. However, because all three meanings are grouped together, DWUGs-IT does not capture how the use of each of them changes over time. This broad categorization of senses
can impact the performance of XL-LEXEME, which analyzes meanings at a more fine-grained level. Moreover, this is the first evaluation of XL-LEXEME on Italian. Finally, DWUGs-IT models senses explicitly, whereas previous datasets inferred senses automatically by comparing pairs of usages. This automatic inference process is similar to the approach used by XL-LEXEME, which may make the model better suited to datasets without explicit sense modelling.

{| class="wikitable"
|+ Table 3: XL-LEXEME results.
! Metric !! Score
|-
| Graded Change Detection (Spearman correlation) || 0.51
|-
| Adjusted Rand Index (ARI) || 0.28
|-
| Purity || 0.89
|}

6. Conclusion

This paper presents DWUGs-IT, an extension and standardization of the Lexical Semantic Change Detection (LSCD) task for Italian, based on the existing DIACR-ITA dataset. The dataset is expanded with additional target words, and its format is aligned with that of the resources used for other languages, which involves introducing the first graded task for Italian. The standardized dataset and the evaluation framework we provide can serve as a foundation for future LSCD research on Italian. By aligning the Italian dataset with those of other languages, we facilitate cross-linguistic comparisons and contribute to a broader understanding of semantic change mechanisms. In addition, we provide a first evaluation of the state-of-the-art LSCD model, XL-LEXEME, on Italian, which both shows its effectiveness and sets a baseline for future work.

Acknowledgments

This work has in part been funded by the research program Change is Key! supported by Riksbankens Jubileumsfond (under reference number M21-0021). The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725.

We acknowledge the support of the PNRR project FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI (CUP H97G22000210007) under the NRRP MUR program funded by NextGenerationEU.

We would also like to thank Tommaso Caselli, Annalina Caputo and Rossella Varvara, who contributed to the development of the DIACR-ITA resource, and Dominik Schlechtweg for valuable feedback on a preliminary draft of this work.

References

[1] J. R. Firth, A synopsis of linguistic theory 1930-55., Studies in linguistic analysis 1952-59 (1957) 1–32.
[2] P. Cassotti, A. Iovine, P. Basile, M. de Gemmis, G. Semeraro, Emerging trends in gender-specific occupational titles in italian newspapers, in: E. Fersini, M. Passarotti, V. Patti (Eds.), Proceedings of the Eighth Italian Conference on Computational Linguistics, CLiC-it 2021, Milan, Italy, January 26-28, 2022, volume 3033 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: https://ceur-ws.org/Vol-3033/paper52.pdf.
[3] A. Blank, Prinzipien des lexikalischen Bedeutungswandels am Beispiel der romanischen Sprachen, volume 285, Walter de Gruyter, 2012.
[4] P. Cassotti, S. D. Pascale, N. Tahmasebi, Using synchronic definitions and semantic relations to classify semantic change types, in: L. Ku, A. Martins, V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Association for Computational Linguistics, 2024, pp. 4539–4553. URL: https://doi.org/10.18653/v1/2024.acl-long.249. doi:10.18653/V1/2024.ACL-LONG.249.
[5] F. Periti, P. Cassotti, H. Dubossarsky, N. Tahmasebi, Analyzing semantic change through lexical replacements, in: L. Ku, A. Martins, V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Association for Computational Linguistics, 2024, pp. 4495–4510. URL: https://doi.org/10.18653/v1/2024.acl-long.246. doi:10.18653/V1/2024.ACL-LONG.246.
[6] N. Tahmasebi, L. Borin, A. Jatowt, Survey of computational approaches to lexical semantic change detection, Computational approaches to semantic change 6 (2021).
[7] S. Montanelli, F. Periti, A Survey on Contextualised Semantic Shift Detection, arXiv preprint arXiv:2304.01666 (2023).
[8] R. Navigli, Word Sense Disambiguation: A Survey, ACM Comput. Surv. 41 (2009). URL: https://doi.org/10.1145/1459352.1459355. doi:10.1145/1459352.1459355.
[9] N. Tahmasebi, S. Montariol, H. Dubossarsky, A. Kutuzov, S. Hengchen, D. Alfter, F. Periti, P. Cassotti (Eds.), Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, Association for Computational Linguistics, Singapore, 2023. URL: https://aclanthology.org/2023.lchange-1.0.
[10] D. Schlechtweg, B. McGillivray, S. Hengchen, H. Dubossarsky, N. Tahmasebi, SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection, in: A. Herbelot, X. Zhu, A. Palmer, N. Schneider, J. May, E. Shutova (Eds.), Proceedings of the Fourteenth Workshop on Semantic Evaluation, SemEval@COLING2020, International Committee for Computational Linguistics, Barcelona (online), 2020, pp. 1–23. URL: https://www.aclweb.org/anthology/2020.semeval-1.1/.
[11] A. Kutuzov, L. Pivovarova, RuShiftEval: A Shared Task on Semantic Shift Detection for Russian, in: Proc. of the International Conference on Computational Linguistics and Intellectual Technologies (Dialogue), 20, Redkollegija sbornika, (online), 2021.
[12] P. Basile, A. Caputo, T. Caselli, P. Cassotti, R. Varvara, Diacr-ita @ EVALITA2020: overview of the EVALITA2020 diachronic lexical semantics (diacr-ita) task, in: V. Basile, D. Croce, M. D. Maro, L. C. Passaro (Eds.), Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online event, December 17th, 2020, volume 2765 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2765/paper158.pdf.
[13] F. D. Zamora-Reina, F. Bravo-Marquez, D. Schlechtweg, LSCDiscovery: A Shared Task on Semantic Change Discovery and Detection in Spanish, in: Proc. of the Workshop on Computational Approaches to Historical Language Change (LChange), Association for Computational Linguistics (ACL), Dublin, Ireland, 2022, pp. 149–164.
[14] V. Basile, D. Croce, M. Di Maro, L. C. Passaro, Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian, in: V. Basile, D. Croce, M. Di Maro, L. C. Passaro (Eds.), Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), CEUR.org, Online, 2020.
[15] D. Schlechtweg, S. S. im Walde, S. Eckmann, Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change, in: M. A. Walker, H. Ji, A. Stent (Eds.), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, Volume 2 (Short Papers), Association for Computational Linguistics, New Orleans, Louisiana, USA, 2018, pp. 169–174. URL: https://doi.org/10.18653/v1/n18-2027. doi:10.18653/v1/n18-2027.
[16] D. Schlechtweg, S. M. Virk, P. Sander, E. Sköldberg, L. T. Linke, T. Zhang, N. Tahmasebi, J. Kuhn, S. S. im Walde, The durel annotation tool: Human and computational measurement of semantic proximity, sense clusters and semantic change, in: N. Aletras, O. D. Clercq (Eds.), Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - System Demonstrations, St. Julians, Malta, March 17-22, 2024, Association for Computational Linguistics, 2024, pp. 137–149. URL: https://aclanthology.org/2024.eacl-demo.15.
[17] P. Sander, S. Hengchen, W. Zhao, X. Ma, E. Sköldberg, S. Virk, D. Schlechtweg, The durel annotation tool, in: Book of Abstracts of the Workshop Large Language Models and Lexicography, 8 October 2024 Cavtat, Croatia (ed. Simon Krek), 2024.
[18] P. Cassotti, L. Siciliani, M. DeGemmis, G. Semeraro, P. Basile, XL-LEXEME: WiC pretrained model for cross-lingual LEXical sEMantic changE, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 1577–1585. URL: https://aclanthology.org/2023.acl-short.135. doi:10.18653/v1/2023.acl-short.135.
[19] F. Periti, N. Tahmasebi, A systematic comparison of contextualized word embeddings for lexical semantic change, in: K. Duh, H. Gomez, S. Bethard (Eds.), Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Association for Computational Linguistics, Mexico City, Mexico, 2024, pp. 4262–4282. URL: https://aclanthology.org/2024.naacl-long.240.
[20] A. Blank, Why do new meanings occur? A cognitive typology of the motivations for lexical semantic change, Historical semantics and cognition (1999).
[21] D. Schlechtweg, N. Tahmasebi, S. Hengchen, H. Dubossarsky, B. McGillivray, DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages, in: Annual Conference of the North American Chapter of the Association for Computational Linguistics, (NAACL-HLT 2021), Association for Computational Linguistics, Mexico City, Mexico, 2021.
[22] N. Bansal, A. Blum, S. Chawla, Correlation clustering, Machine Learning 56 (2004) 89–113. doi:10.1023/B:MACH.0000033116.57574.95.
[23] J. Lin, Divergence measures based on the shannon entropy, IEEE Transactions on Information Theory 37 (1991) 145–151.
[24] G. Donoso, D. Sanchez, Dialectometric analysis of language variation in twitter, in: Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, Valencia, Spain, 2017, pp. 16–25.
[25] A. Kutuzov, S. Touileb, P. Mæhlum, T. R. Enstad, A. Wittemann, NorDiaChange: Diachronic semantic change dataset for Norwegian, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, S. Piperidis (Eds.), Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, Marseille, France, 20-25 June 2022, European Language Resources Association, 2022, pp. 2563–2572. URL: https://aclanthology.org/2022.lrec-1.274.
[26] J. Chen, E. Chersoni, C.-R. Huang, Lexicon of changes: Towards the evaluation of diachronic semantic shift in Chinese, in: N. Tahmasebi, S. Montariol, A. Kutuzov, S. Hengchen, H. Dubossarsky, L. Borin (Eds.), Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 113–118. URL: https://aclanthology.org/2022.lchange-1.11. doi:10.18653/v1/2022.lchange-1.11.
[27] J. Chen, E. Chersoni, D. Schlechtweg, J. Prokic, C.-R. Huang, ChiWUG: Diachronic word usage graphs for Chinese (2023). URL: https://doi.org/10.5281/zenodo.10023263. doi:10.5281/zenodo.10023263.
[28] P. Basile, A. Caputo, T. Caselli, P. Cassotti, R. Varvara, A diachronic Italian corpus based on "L’Unità", in: J. Monti, F. Dell’Orletta, F. Tamburini (Eds.), Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, March 1-3, 2021, volume 2769 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2769/paper_44.pdf.
[29] B. McGillivray, D. Schlechtweg, H. Dubossarsky, N. Tahmasebi, S. Hengchen, DWUG LA: Diachronic word usage graphs for Latin (2021). URL: https://doi.org/10.5281/zenodo.5255228. doi:10.5281/zenodo.5255228.
[30] D. Schlechtweg, F. D. Zamora-Reina, F. Bravo-Marquez, N. Arefyev, Sense through time: diachronic word sense annotations for word sense induction and lexical semantic change detection, Language Resources and Evaluation (2024). URL: http://dx.doi.org/10.1007/s10579-024-09771-7. doi:10.1007/s10579-024-09771-7.
[31] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised Cross-lingual Representation Learning at Scale, in: D. Jurafsky, J. Chai, N. Schluter, J. R. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Association for Computational Linguistics, 2020, pp. 8440–8451. URL: https://doi.org/10.18653/v1/2020.acl-main.747. doi:10.18653/v1/2020.acl-main.747.
[32] M. T. Pilehvar, J. Camacho-Collados, WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics, 2019, pp. 1267–1273. URL: https://doi.org/10.18653/v1/n19-1128. doi:10.18653/v1/n19-1128.
[33] N. Reimers, I. Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 3982–3992. URL: https://aclanthology.org/D19-1410. doi:10.18653/v1/D19-1410.
[34] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, A. Bouras, A survey of clustering algorithms for big data: Taxonomy and empirical analysis, IEEE Transactions on Emerging Topics in Computing 2 (2014) 267–279. URL: https://ieeexplore.ieee.org/iel7/6245516/6939750/06832486.pdf.