=Paper=
{{Paper
|id=Vol-3878/22_main_long
|storemode=property
|title=DWUGs-IT: Extending and Standardizing Lexical Semantic Change Detection for Italian
|pdfUrl=https://ceur-ws.org/Vol-3878/22_main_long.pdf
|volume=Vol-3878
|authors=Pierluigi Cassotti,Pierpaolo Basile,Nina Tahmasebi
|dblpUrl=https://dblp.org/rec/conf/clic-it/CassottiBT24
}}
==DWUGs-IT: Extending and Standardizing Lexical Semantic Change Detection for Italian==
<pdf width="1500px">https://ceur-ws.org/Vol-3878/22_main_long.pdf</pdf>
<pre>
DWUGs-IT: Extending and Standardizing Lexical Semantic Change Detection for Italian

Pierluigi Cassotti¹,*, Pierpaolo Basile² and Nina Tahmasebi¹

¹ University of Gothenburg, Department of Philosophy, Linguistics and Theory of Science, Gothenburg, Sweden
² University of Bari Aldo Moro, Department of Computer Science, via E. Orabona, 70125, Bari, Italy


Abstract
Lexical Semantic Change Detection (LSCD) is the task of determining whether a word has undergone a
change in meaning over time. There has been a marked increase in interest in this task, accompanied
by a corresponding growth in the scientific community involved in developing computational
approaches to semantic change. In recent years, a number of resources have been made available for
the evaluation of LSC models in a number of languages, including English, Swedish, German, Latin,
Russian and Chinese. DIACR-ITA is the only existing resource for LSCD in Italian. However, DIACR-ITA
has a different format from that used for other languages. In this paper, we present DWUGs-IT, which
extends the DIACR-ITA dataset with additional target words and usage-sense pair annotations and
adapts it to the DURel format, including the first implementation of an LSCD graded task for Italian.

Keywords
Lexical Semantic Change, Sense-annotated corpora, Italian, Historical Linguistics


CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy
* Corresponding author.
Email: pierluigi.cassotti@gu.se (P. Cassotti); pierpaolo.basile@uniba.it (P. Basile); nina.tahmasebi@gu.se (N. Tahmasebi)
ORCID: 0000-0001-7824-7167 (P. Cassotti); 0000-0002-0545-1105 (P. Basile); 0000-0003-1688-1845 (N. Tahmasebi)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

As is the case with both society and culture, language is subject to change over time. Two key
factors cause such linguistic change. Firstly, there are purely evolutionary and linguistic
considerations driven by the need for more efficient communication [1]. One example of this is the
use of abbreviations and acronyms, such as LOL (Laughing Out Loud), which have become commonplace on
social media platforms. Secondly, changes in society and culture lead to changes in language. This
can be seen, for example, in the adoption of a more inclusive language, as evidenced by
grammatically gendered languages, including Italian and the introduction of @ to replace masculine
and feminine endings [2].

Language may undergo alteration at various levels, including morphological, syntactic, and semantic.
Semantic change concerns the alteration of the meaning of words over time. The study of semantic
change is a prominent area of research in Historical Linguistics, with the aim of investigating the
linguistic mechanisms that characterize the change and the causes that trigger it. For instance,
Blank [3] provides a broad study on the characterization of semantic change, identifying a number of
different types of change, including metaphor, metonymy, generalization, specialization, co-hyponym
transfer and auto-antonym. The English word bad, for example, has acquired an auto-antonym meaning,
i.e. a meaning that is the opposite of its original meaning. In addition to its original connotation
of poor quality or negative, it has also acquired the opposite connotation of good or cool. The term
meat has undergone a process of specialization in its meaning, whereby it has shifted from referring
to any kind of food in general to exclusively denoting the meat of animals consumed as food.

While traditional linguistic methods are informative, they are often based on small, carefully
curated samples. In contrast, linguistic analyses using computational models not only accelerate our
understanding of language change but also provide broader and more detailed insights, thereby
facilitating the study of vast corpora across a wider range of genres and time [4, 5].

From a computational perspective, two key challenges emerge in the study of semantic change: the
modelling of word meanings over time and the detection of change [6, 7]. At the synchronic level,
ignoring the temporal dimension with a focus on modern corpora, the Natural Language Processing
community has made significant strides in modelling word meanings, with approaches such as Word
Sense Disambiguation (WSD) [8] playing a pivotal role. Computational modelling of semantic change
introduces a significant level of complexity, as it necessitates the handling of meanings that are
either extinct or novel in comparison to existing lexicographic resources, such as WordNet, as well
as dynamically changing meaning representations.

In recent years, great efforts have been made to advance the field of computational methods for
Lexical Semantic Change Detection. With initiatives such as the Workshop on Computational Approaches
to Historical Language Change [9] promoting research in this
field or shared tasks such as SemEval 2020 Task 1 [10], RuShiftEval [11], DIACR-ITA [12], or LSCD
Discovery [13] leading to the development of the first evaluation resources. DIACR-Ita, hosted in
EVALITA 2020 [14], is the first shared task specifically created for the evaluation of models for
Lexical Semantic Change in Italian. The majority of the evaluation resources follow a two-task
approach: (1) a binary task, which requires the assignment of a word to either the changed or stable
label, based on whether the word has undergone a change in meaning or not; and (2) a graded
(ranking) task, which requires the sorting of words based on the extent of their change (over time).
These labels are assigned on the basis of human-annotated data, typically in the form of a graded
word-in-context task.

DIACR-Ita, however, diverges from the evaluation process employed in SemEval 2020 Task 1,
RuShiftEval and several other datasets that emerged subsequently. This results in a distinct
configuration of the task and the released data. For example, DIACR-Ita only has a binary task but
does not include a graded task. Moreover, only the target words with their gold truth labels were
made available for the shared task, while the remaining data produced during the annotation process
were not. In this paper,

      1. we release DWUGs-IT¹, a new dataset for Lexical Semantic Change Detection for Italian, which:
             • extends the original DIACR-ITA with 12 new words;
             • provides sense-annotated usages with the respective sense labels;
             • standardizes DIACR-ITA, providing the data in the DURel format [15, 16, 17];
             • introduces the first LSC graded task for Italian;
      2. we evaluate DWUGs-IT using XL-LEXEME [18], the state-of-the-art model for Lexical Semantic
         Change Detection [19].

¹ DWUGs-IT is available on Zenodo: https://zenodo.org/records/13941618.

2. Related Work

DURel [15] is a framework for the annotation of Lexical Semantic Change across a pair of time
periods or corpora. The annotation involves human labelling of pairs of sentences containing the
target word. The sentences can be contemporary, i.e. originating from the same time period, or
diachronic, denoting a divergence in time between the two periods under consideration. An annotator
has to decide whether the meaning expressed by the word in the two sentences is Unrelated (1),
Distantly Related (2), Closely Related (3) or Identical (4). The scale of semantic relatedness is
derived from the cognitive model proposed by Blank [20] and corresponds to the values of Homonymy
(1), Polysemy (2), Context Variance (3) and Identity (4).

The annotations are then presented in the form of a graph, specifically a Word Usage Graph (WUG) or
a Diachronic Word Usage Graph (DWUG) [21] in cases where the usages originate from different time
periods. In these graphs, the nodes correspond to the word uses and the edges correspond to the
median of the annotations. The diachronic graph is then subjected to clustering in order to identify
the senses. Before clustering, a new graph is created by binarizing the edges, where an edge between
two uses is established if the score of the original edge weight is less than 2.5, or in other words
if the average annotation for this pair of uses is less than 2.5. Since the graph typically exhibits
considerable sparsity, which limits the applicability of conventional clustering algorithms, a
variation of the correlation clustering algorithm [22] is typically used, as it is able to model
this type of sparsely connected graph.

Once the (diachronic) clusters have been obtained, they can be considered to represent the senses.
The distribution of the usages from different time periods in each cluster (sense) is then analyzed
to obtain a change score. For instance, one can determine a graded change score by computing the
Jensen-Shannon Distance (JSD) on the probability distributions of senses across various time
periods. This is expressed as

\[
\mathrm{JSD}(P, Q) = \sqrt{\frac{D(P \,\|\, M) + D(Q \,\|\, M)}{2}}
\]

where 𝑃 and 𝑄 represent the probability distributions of clusters from different historical periods,
𝐷 denotes the Kullback-Leibler divergence, and 𝑀 = (𝑃 + 𝑄)/2 [23, 24].

Furthermore, a binary label can be obtained, whereby words that have undergone a change in meaning
over time are assigned a changed label (words that have gained/lost a sense), while words that have
retained their meaning are labelled stable. The label is typically assigned by evaluating the
frequency of senses in different time periods and establishing thresholds to distinguish stable and
changed words.

Datasets based on DURel. SemEval 2020 Task 1 [10] is the first initiative to standardize the
evaluation of computational approaches to semantic change. SemEval 2020 Task 1 focuses on English,
German, Swedish and Latin and proposes a common evaluation framework with two tasks: classifying
target words as those whose meaning has changed or remained stable, and ranking words according to
their degree of change. Special attention is given to Latin due to the lack of native speakers.
Therefore, in the annotation of the Latin dataset, usage-sense
pairs are considered rather than usage-usage pairs, and the annotator is asked to decide how related
the considered usage is to a particular sense, using the DURel scale from Unrelated to Identical.
RuShiftEval [11] aimed to detect semantic shifts in Russian across pre-Soviet, Soviet, and
post-Soviet periods. The dataset included 111 Russian nouns, with participants ranking them by their
degree of change (using the COMPARE measure [15], an approximation of the JSD). The task focused on
ranking changes, with evaluations based on Spearman rank correlations. LSC Discovery [13] focused on
detecting and discovering semantic changes in Spanish. It is divided into Graded Change Discovery
and Binary Change Detection. The task required evaluations for all vocabulary words in the corpus,
covering periods from 1810-1906 and 1994-2020. NorDiaChange [25] studied diachronic semantic change
in Norwegian. The dataset included 80 nouns reflecting significant historical periods, such as pre-
and post-war events and technological advances. ZhShiftEval [26, 27] assessed semantic change in
Chinese over 50 years, focusing on the period around Reform and Opening Up. The dataset used texts
from the People’s Daily and included 20 words chosen for their frequency and noted changes.

3. DIACR-ITA

The DIACR-ITA annotation was conducted on word usages extracted from the L’Unità corpus [28]. The
L’Unità corpus comprises a collection of Italian texts extracted from the newspaper L’Unità. In
order to evaluate semantic change, the corpus has been divided into two sub-corpora, covering the
period from 1948 to 1970 and the period from 1990 to 2014, respectively. A time window of 20 years
between the sub-corpora ensures sufficient distance between the two periods, allowing for the
tracking of potentially more pronounced semantic changes. The sub-corpora statistics are presented
in Table 1.

The selection of target words was based on the information provided in the Sabatini-Coletti
dictionary of the Italian language, which records the year of the first occurrence of a word’s
sense. The initial step involved the extraction of a list of words from Sabatini-Coletti for which
the dictionary reported a semantic change, i.e. the introduction of at least one new sense after
1970. Moreover, an examination of the set of words was conducted to ensure that the sampled words
appeared at least 10 times in each of the two periods and that the occurrences of these words were
not significantly affected by OCR errors. Consequently, 26 target words were identified. For each
target word, up to 50 occurrences from each of the two sub-corpora were extracted.

The senses of each word were classified into two groups: the senses recorded in the Sabatini-Coletti
dictionary for the period 1948-1970 (Group 1) and the new senses introduced after 1970 (Group 2).
The annotators were required to determine whether the sense of each word usage belonged to Group 1,
Group 2, or to another category if the word sense did not align with either group (Other).
Additionally, the annotator may indicate a preference of Cannot decide for the uses in which they
were uncertain. Five annotators fluent in Italian annotated DIACR-ITA. Each sentence was annotated
by two annotators. The disagreement cases were resolved by the two annotators involved, analyzing
the disagreement and deciding on an unambiguous label.

Each target word was labelled as stable or changed. A word was considered changed if there was at
least one instance of Group 2 among the extracted usages from the period between 1990 and 2014 and
no instances of Group 2 among the extracted usages from the period between 1948 and 1970. The final
dataset consisted of 18 words, of which 6 were changed and 12 were stable.

 Corpus     Period          #Tokens
 L’Unità    1948-1970    52,287,734
 L’Unità    1990-2014   196,539,403

Table 1
Sub-corpora statistics.

4. DWUGs-IT

DWUGs-IT builds on the DIACR-ITA dataset, adapting it to the DURel format and adding eight new
words. It also provides the usage-sense annotated pairs that were not initially released, as
summarized in Table 2. For each target word, we format the annotated usages following the WUG style,
including the time period of the usage and the word’s position in the sentence. Similarly, we format
and release the annotated sense labels in a way similar to DWUG LA [29].

Unlike the traditional WUG approach, where sense preference is not explicitly marked, in DIACR-ITA,
annotators clearly indicate their preference for one sense over others. For example, in the usage of
the word api (Italian for bees), in the sentence “Dalle api un dolce dono” (“From bees, a sweet
gift”), the annotators choose the sense insect while discarding the alternative sense means of
transport. For each use-sense pair not selected by annotators, a rating of 1 (Unrelated) is
assigned, while matched pairs receive a rating of 4 (Identical), in line with the DURel scale.

Since human annotators already provide the sense labels, we do not cluster usages automatically (as
is typically done in the WUG approach), but directly assign the annotated meanings. All subsequent
calculations, such as
 Lemma           Group 1                                                                    Group 2                                                             Other
 ultima          Che viene dopo tutti gli altri in una serie numerica, in una               Nel l. fam., l’ultima cosa; la novità, la notizia più recente: la
                 classifica, in una graduatoria o in una successione spaziale o             sai l’ultima?
                 temporale
 emulare         Prendere qlcu. a modello, imitarne meriti e virtù: e. i genitori,          ambito informatico
                 le imprese di uno scalatore
 affido                                                                                     Affidamento di un minore                                             ✔
 bombetta        S1. Cappello maschile di feltro rigido a cupola con tese corte             S2. Fialetta puzzolente che i ragazzi lanciano per diverti-          ✔
                 leggermente rialzate ai lati                                               mento per strada o in ambienti chiusi
 cantieristica   maschile - Di cantiere, relativo ai cantieri: il settore c. oppure         Attività di costruzione, riparazione navale
                 con riferimento al cantiere
 fondista        Giornalista che scrive l’articolo di fondo su un quotidiano -              Nel gergo della finanza, sottoscrittore di fondi di investi-         ✔
                 Atleta                                                                     mento
 portatile       Che può essere trasportato agevolmente da una persona:                     Piccolo computer facilmente trasportabile, funzionante an-
                 televisore p.                                                              che a batteria e quindi utilizzabile in viaggio - telefono por-
                                                                                            tatile
 impegnativa     agg. che richiede impegno                                                  Dichiarazione con cui si assume un impegno; in partic. nel l.
                                                                                            burocr., documento con cui un ente mutualistico si impegna
                                                                                            a coprire, nella misura prevista dalla legge, le spese sanitarie
                                                                                            di un suo iscritto: fare l’i. per le analisi

Table 2
Newly introduced words together with the senses of Group 1 (1948-1970), Group 2 which involves senses introduced after 1970,
and an indication of the presence of other senses not listed in Group 1 and Group 2.


change scores and related statistics, follow the standard WUG methodology.

5. Evaluation

XL-LEXEME has been tested on different languages before but has never been evaluated on Italian. In
this section, we evaluate XL-LEXEME on the new DWUGs-IT dataset using the traditional evaluation
pipeline for the DWUGs [19, 30]. We assess the ability to derive a reliable change score (Graded
Change Detection) and evaluate the possibility of clustering the XL-LEXEME vectors to automatically
induce target word senses, which are then compared to the DWUGs-IT annotations via the Adjusted Rand
Index and the Purity measure.

5.1. Model

XL-LEXEME, built on XLM-RoBERTa large [31], is trained for the Word-in-Context (WiC) task [32],
which determines if a word has the same meaning in two sentences. Using a Siamese architecture [33],
it creates word vectors. The loss function adjusts weights via cosine distance, aligning vectors for
the same meanings and separating them for different meanings. To calculate the change score, a
classic approach is to use the Average Pairwise Distance between the vectors computed over the two
different periods:

\[
\mathrm{LSC}(s_w^{t_0}, s_w^{t_1}) = \frac{1}{N \cdot M} \sum_{i=0}^{N} \sum_{j=0}^{M} \delta(s_{w,i}^{t_0}, s_{w,j}^{t_1}) \qquad (1)
\]

where 𝛿 is the cosine distance and s_w^t is the set of sentences containing the word 𝑤 at time 𝑡.
For the Word Sense Induction step, we cluster the vectors into senses using Agglomerative
Clustering² with a cosine threshold of 0.5 and Average Linkage, which merges clusters with a
similarity greater than 0.5.

² https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering

5.2. Metrics

We test the ability of XL-LEXEME to rank words according to their change scores (Graded Change
Detection) using Spearman Correlation. Cluster quality is assessed using the Adjusted Rand Index
(ARI) [34], which is defined as follows:

\[
\mathrm{ARI} = \frac{\mathrm{RI} - \mathrm{ExpectedRI}}{\max(\mathrm{RI}) - \mathrm{ExpectedRI}}
\]

RI stands for the Rand Index, which measures the number of pair agreements within the data, that is,
pairs of instances that are correctly placed in the same cluster or correctly placed in different
clusters. ExpectedRI is the expected number of such agreements by chance, calculated based on the
distribution of the clusters, while max(RI) is the maximum possible value of RI, which occurs when
all pairs are classified perfectly. We use Purity in addition to ARI to capture cluster homogeneity
and provide clearer insight into how mixed the clusters are in terms of class labels, i.e.

\[
\mathrm{Purity} = \frac{1}{N} \sum_{k} \max_{j} |c_k \cap t_j|
\]

where 𝑁 is the total number of instances, c_k denotes cluster 𝑘, and t_j represents class 𝑗.
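
To make the graded evaluation concrete, the following is a minimal, dependency-free sketch of how a predicted change score (the Average Pairwise Distance of Equation 1) can be compared with a gold score (the Jensen-Shannon Distance over annotated sense labels) through Spearman correlation. It is an illustration under stated assumptions, not the evaluation code used for DWUGs-IT: the toy vectors, the sense names, the word names and all function names are hypothetical, 𝑁 and 𝑀 are taken to be the number of usage vectors in each period, and base-2 logarithms are assumed so that the JSD lies in [0, 1].

```python
import math
from collections import Counter
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_pairwise_distance(vectors_t0, vectors_t1):
    """Equation (1): mean cosine distance over all cross-period usage pairs."""
    pairs = list(product(vectors_t0, vectors_t1))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def kl_divergence(p, m):
    """Kullback-Leibler divergence D(p || m) in bits; zero-probability terms are skipped."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def jensen_shannon_distance(labels_t0, labels_t1):
    """JSD between the sense distributions of two periods, given per-usage sense labels."""
    senses = sorted(set(labels_t0) | set(labels_t1))
    c0, c1 = Counter(labels_t0), Counter(labels_t1)
    p = [c0[s] / len(labels_t0) for s in senses]
    q = [c1[s] / len(labels_t1) for s in senses]
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = (kl_divergence(p, m) + kl_divergence(q, m)) / 2
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding errors


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (inputs must not be constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy data: 2-d usage vectors and sense labels per period.
toy = {
    "stable_word": {
        "vectors": ([(1.0, 0.0), (0.9, 0.1)], [(1.0, 0.1), (0.8, 0.2)]),
        "senses": (["s1", "s1"], ["s1", "s1"]),
    },
    "changed_word": {
        "vectors": ([(1.0, 0.0), (0.9, 0.1)], [(0.0, 1.0), (0.1, 0.9)]),
        "senses": (["s1", "s1"], ["s2", "s2"]),
    },
    "partly_changed_word": {
        "vectors": ([(1.0, 0.0), (0.9, 0.1)], [(1.0, 0.0), (0.0, 1.0)]),
        "senses": (["s1", "s1"], ["s1", "s2"]),
    },
}
predicted = [average_pairwise_distance(*w["vectors"]) for w in toy.values()]
gold = [jensen_shannon_distance(*w["senses"]) for w in toy.values()]
gcd_score = spearman(predicted, gold)
```

In this toy setting the predicted and gold scores order the three words identically, so the correlation is at its maximum; on real data the two rankings generally differ, and the correlation quantifies how closely the model ranking follows the annotated one.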
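
The Word Sense Induction step and the two cluster-quality metrics can likewise be sketched in a few lines. The paper relies on scikit-learn's AgglomerativeClustering; the pure-Python loop below is only a stand-in that assumes the same idea (average linkage over cosine distance, stopping once the closest pair of clusters is at distance 0.5 or more). The ARI is written in its usual pair-counting form over the contingency table, which is assumed here to correspond to the definition given in Section 5.2. All function names, vectors and labels are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_average_linkage(vectors, distance_threshold=0.5):
    """Repeatedly merge the two closest clusters until none is closer than the threshold."""
    clusters = [[i] for i in range(len(vectors))]
    dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best >= distance_threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels


def n_pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def adjusted_rand_index(gold, predicted):
    """Pair-counting ARI; assumes at least two instances."""
    sum_cells = sum(n_pairs(c) for c in Counter(zip(gold, predicted)).values())
    sum_gold = sum(n_pairs(c) for c in Counter(gold).values())
    sum_pred = sum(n_pairs(c) for c in Counter(predicted).values())
    expected = sum_gold * sum_pred / n_pairs(len(gold))
    maximum = (sum_gold + sum_pred) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def purity(gold, predicted):
    """Fraction of instances that belong to the majority class of their cluster."""
    by_cluster = {}
    for g, c in zip(gold, predicted):
        by_cluster.setdefault(c, Counter())[g] += 1
    return sum(max(counts.values()) for counts in by_cluster.values()) / len(gold)


# Hypothetical usage vectors for one target word with two annotated senses.
vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
gold_senses = ["s1", "s1", "s2", "s2"]
induced = agglomerative_average_linkage(vectors, distance_threshold=0.5)
ari = adjusted_rand_index(gold_senses, induced)
pur = purity(gold_senses, induced)
```

A perfect clustering yields ARI and Purity of 1. The two measures diverge on imperfect clusterings: for gold labels [0, 0, 0, 1, 1, 1] and induced labels [0, 0, 1, 1, 1, 1], Purity is 5/6 while ARI is only about 0.32, because ARI corrects for agreements expected by chance.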
                                                              fied an additional meaning (Other) that refers to a named
                                                              entity, i.e., Il barone rampante written by Italo Calvino.
                                                              The instances of Il barone rampante fall in the middle of
                                                              the cluster of the rearing and ambitious meanings. Inter-
                                                              estingly, the only instance annotated as Cannot decide
                                                              falls in the rearing cluster:
                                                                  Uno rampante » non ci aia ancora nulla da fare,
                                                                comunque i tecnici....supremazia di le Ferrari. (en. A
                                                              rampant » there is still nothing to be done, in any case the
                                                                     technicians.... supremacy of the Ferrari.)
                        (a) palmare
                                                              This instance is ambiguous since the subject of rampante
                                                              is missing in the sentence. However, interestingly, XL-
                                                              LEXEME assumes it to have the rearing meaning, proba-
                                                              bly due to the presence of the word Ferrari, referring to
                                                              the Ferrari logo. Figure 1c shows how the embeddings
                                                              of the usages of pilotato are perfectly split according to
                                                              the sense labels. However, one instance of the meaning
                                                              driven falls in the cluster of the manipulated instances,
                                                              which can be considered ambiguous and open to inter-
                                                              pretation:

                       (b) rampante                            Twingo Easy offre la grande comodità di un cambio con
                                                                  frizione pilotata, ovvero: non c’ è più il pedale della
                                                              frizione. (en. Twingo Easy offers the great convenience of a
                                                              gearbox with a piloted clutch, that is: there is no longer a
                                                                                       clutch pedal.)
                                                                 The quantitative results of XL-LEXEME are reported
                                                              in Table 3. Compared to LSCD benchmarks in other lan-
                                                              guages, XL-LEXEME shows similar results for the GCD
                                                              score (ranging from 0.567 in NO to 0.851 in RU) and the
                                                              ARI score (ranging from 0.249 in SV to 0.400 in ES). It also
                                                              performs slightly better using the purity measure (rang-
                          (c) pilotato
                                                              ing from 0.766 in SV to 0.836 in ZH). These results likely
                                                              stem from the properties of the dataset that includes sev-
Figure 1: t-SNE visualization of XL-LEXEME embeddings with eral monosemous words, but also from the process that
respect to the annotated clusters for changed words palmare, has been used for DWUGs-IT where senses are modeled
rampante, and pilotato.                                       explicitly. Purity measures the extent to which clusters
                                                              contain a single class. With many monosemous words,
                                                              achieving high purity is easier since these words inher-
5.3. Results                                                  ently belong to one sense group. ARI, on the other hand,
                                                              evaluates the similarity between the clustering results
We begin with the qualitative results. Figure 1 illustrates   and the ground truth, accounting for both the clustering
the t-SNE visualization of XL-LEXEME embeddings for           quality and the number of clusters. In DWUGs-IT, most
the usages of the words palmare, rampante, and pilotato.      sense groups contain a single meaning. Some groups,
For palmare (Figure 1a), the senses are well separated        however, cover several meanings, and the frequency of
except for some instances of the sense relating to the        each meaning can change over time. For example, Group 1
palm, clear, evident that are placed closer to the PDA        of the word palmare covers three meanings: i) related to
device meaning, for example:                                  the palm of the hand, ii) something that fits in the
  sono state n levate di le impronte palmari che saranno      hand, and iii) something that is obvious or clear. Over
  inviate al1’ archivio generale segnaletico di Roma. (en.    time, some of these meanings might be used more or less
The palm prints have been removed and will be sent to the     often. However, because all three meanings are grouped
               general sign archive of Rome.)                 together, DWUGs-IT does not capture how the use of each
                                                              individual meaning changes over time.
For the word rampante (Figure 1b), the annotators identi-     This broad categorization of senses
 Measure                                            Score
 Graded Change Detection (Spearman Correlation)     0.51
 Adjusted Rand Index (ARI)                          0.28
 Purity                                             0.89
                                                             References
                                                              [1] J. R. Firth, A synopsis of linguistic theory 1930-55.,
Table 3                                                           Studies in linguistic analysis 1952-59 (1957) 1–32.
XL-LEXEME Results                                             [2] P. Cassotti, A. Iovine, P. Basile, M. de Gemmis, G. Se-
                                                                  meraro, Emerging trends in gender-specific occu-
                                                                  pational titles in italian newspapers, in: E. Fersini,
can impact the performance of XL-LEXEME, which an-                M. Passarotti, V. Patti (Eds.), Proceedings of the
alyzes meanings at a more detailed level. Additionally,           Eighth Italian Conference on Computational Lin-
XL-LEXEME has been tested on different languages be-              guistics, CLiC-it 2021, Milan, Italy, January 26-
fore but has never been evaluated on Italian. DWUGs-IT            28, 2022, volume 3033 of CEUR Workshop Proceed-
models senses explicitly, whereas previous datasets in-           ings, CEUR-WS.org, 2021. URL: https://ceur-ws.org/
ferred senses automatically by comparing pairs of usages.         Vol-3033/paper52.pdf.
This automatic inference process is similar to the ap-        [3] A. Blank, Prinzipien des lexikalischen Bedeu-
proach XL-LEXEME uses, potentially making it better               tungswandels am Beispiel der romanischen
suited for datasets without explicit sense modelling.             Sprachen, volume 285, Walter de Gruyter, 2012.
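
The contrast between purity and ARI discussed above can be illustrated with a minimal sketch. It is not the evaluation code used for Table 3: the function names purity and adjusted_rand_index and the toy labels are assumptions made for illustration, and the ARI follows the standard pair-counting definition. The sketch shows that, for a monosemous word, any split of its usages into several clusters still reaches purity 1.0, whereas ARI drops to 0.

```python
from collections import Counter
from math import comb


def purity(gold, pred):
    """Fraction of usages that belong to the majority gold sense of their cluster."""
    clusters = {}
    for g, p in zip(gold, pred):
        clusters.setdefault(p, Counter())[g] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(gold)


def adjusted_rand_index(gold, pred):
    """Pair-counting ARI: agreement between two partitions, corrected for chance."""
    total_pairs = comb(len(gold), 2)
    if total_pairs == 0:
        return 1.0  # zero or one usage: the partitions trivially agree
    # Pairs of usages placed together in both partitions.
    index = sum(comb(c, 2) for c in Counter(zip(gold, pred)).values())
    same_gold = sum(comb(c, 2) for c in Counter(gold).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_gold * same_pred / total_pairs
    max_index = (same_gold + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (index - expected) / (max_index - expected)


# Hypothetical monosemous word: one gold sense, but the model splits the usages.
gold_mono = ["s1", "s1", "s1", "s1"]
pred_split = [0, 0, 1, 1]
print(purity(gold_mono, pred_split), adjusted_rand_index(gold_mono, pred_split))

# Hypothetical polysemous word whose two senses are recovered exactly.
gold_poly = ["s1", "s1", "s2", "s2"]
pred_exact = [0, 0, 1, 1]
print(purity(gold_poly, pred_exact), adjusted_rand_index(gold_poly, pred_exact))
```

Under these assumptions, over-splitting a monosemous word is not penalized by purity but is penalized by ARI, which is consistent with the high purity and comparatively low ARI reported in Table 3.
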
                                                              [4] P. Cassotti, S. D. Pascale, N. Tahmasebi, Using syn-
                                                                  chronic definitions and semantic relations to clas-
6. Conclusion                                                     sify semantic change types, in: L. Ku, A. Martins,
                                                                  V. Srikumar (Eds.), Proceedings of the 62nd An-
This paper presents DWUGs-IT, an extension and stan-
                                                                  nual Meeting of the Association for Computational
dardization of the Lexical Semantic Change Detection
                                                                  Linguistics (Volume 1: Long Papers), ACL 2024,
(LSCD) task for Italian, based on the existing DIACR-ITA
                                                                  Bangkok, Thailand, August 11-16, 2024, Association
dataset. The dataset is expanded with additional target
                                                                  for Computational Linguistics, 2024, pp. 4539–4553.
words and its format is aligned with that of the resources
                                                                  URL: https://doi.org/10.18653/v1/2024.acl-long.249.
used for other languages. This involves the introduc-
                                                                  doi:10.18653/V1/2024.ACL-LONG.249.
tion of the first graded task for Italian. The standard-
                                                              [5] F. Periti, P. Cassotti, H. Dubossarsky, N. Tahmasebi,
ized dataset and the evaluation framework we provide
                                                                  Analyzing semantic change through lexical replace-
can serve as a foundation for future research in LSCD
                                                                  ments, in: L. Ku, A. Martins, V. Srikumar (Eds.),
for Italian. By aligning the Italian dataset with those of
                                                                  Proceedings of the 62nd Annual Meeting of the As-
other languages, we facilitate cross-linguistic compar-
                                                                  sociation for Computational Linguistics (Volume 1:
isons and contribute to the broader understanding of
                                                                  Long Papers), ACL 2024, Bangkok, Thailand, Au-
semantic change mechanisms. In addition, we provide a
                                                                  gust 11-16, 2024, Association for Computational
first evaluation of the state-of-the-art LSCD model, XL-
                                                                  Linguistics, 2024, pp. 4495–4510. URL: https://doi.
LEXEME, for Italian, which both shows its effectiveness
                                                                  org/10.18653/v1/2024.acl-long.246. doi:10.18653/
and sets a baseline for future work.
                                                                  V1/2024.ACL-LONG.246.
                                                              [6] N. Tahmasebi, L. Borin, A. Jatowt, Survey of com-
Acknowledgments                                                   putational approaches to lexical semantic change
                                                                  detection, Computational approaches to semantic
This work has in part been funded by the research pro-            change 6 (2021).
gram Change is Key! supported by Riksbankens Ju-              [7] S. Montanelli, F. Periti, A Survey on Contextu-
bileumsfond (under reference number M21-0021). The                alised Semantic Shift Detection, arXiv preprint
computational resources were provided by the National             arXiv:2304.01666 (2023).
Academic Infrastructure for Supercomputing in Sweden          [8] R. Navigli, Word Sense Disambiguation: A Survey,
(NAISS), partially funded by the Swedish Research Coun-           ACM Comput. Surv. 41 (2009). URL: https://doi.org/
cil through grant agreement no. 2022-06725.                       10.1145/1459352.1459355. doi:10.1145/1459352.
We acknowledge the support of the PNRR project FAIR -             1459355.
Future AI Research (PE00000013), Spoke 6 - Symbiotic AI       [9] N. Tahmasebi, S. Montariol, H. Dubossarsky, A. Ku-
(CUP H97G22000210007) under the NRRP MUR program                  tuzov, S. Hengchen, D. Alfter, F. Periti, P. Cas-
funded by the NextGenerationEU.                                   sotti (Eds.), Proceedings of the 4th Workshop on
We would also like to thank Tommaso Caselli, Annalina             Computational Approaches to Historical Language
Caputo and Rossella Varvara, who contributed to the               Change, Association for Computational Linguis-
development of the DIACR-ITA resource, and Dominik                tics, Singapore, 2023. URL: https://aclanthology.org/
Schlechtweg for valuable feedback on a preliminary draft          2023.lchange-1.0.
of this work.                                                [10] D. Schlechtweg, B. McGillivray, S. Hengchen, H. Du-
     bossarsky, N. Tahmasebi, SemEval-2020 Task 1:                  man and computational measurement of seman-
     Unsupervised Lexical Semantic Change Detection,                tic proximity, sense clusters and semantic change,
     in: A. Herbelot, X. Zhu, A. Palmer, N. Schnei-                 in: N. Aletras, O. D. Clercq (Eds.), Proceedings
     der, J. May, E. Shutova (Eds.), Proceedings of                 of the 18th Conference of the European Chap-
     the Fourteenth Workshop on Semantic Evalua-                    ter of the Association for Computational Linguis-
     tion, SemEval@COLING2020, International Com-                   tics, EACL 2024 - System Demonstrations, St. Ju-
     mittee for Computational Linguistics, Barcelona                lians, Malta, March 17-22, 2024, Association for
     (online), 2020, pp. 1–23. URL: https://www.aclweb.             Computational Linguistics, 2024, pp. 137–149. URL:
     org/anthology/2020.semeval-1.1/.                               https://aclanthology.org/2024.eacl-demo.15.
[11] A. Kutuzov, L. Pivovarova, RuShiftEval: A Shared          [17] P. Sander, S. Hengchen, W. Zhao, X. Ma, E. Sköld-
     Task on Semantic Shift Detection for Russian, in:              berg, S. Virk, D. Schlechtweg, The durel annotation
     Proc. of the International Conference on Compu-                tool, in: Book of Abstracts of the Workshop Large
     tational Linguistics and Intellectual Technologies             Language Models and Lexicography, 8 October 2024
     (Dialogue), 20, Redkollegija sbornika, (online), 2021.         Cavtat, Croatia (ed. Simon Krek), 2024.
[12] P. Basile, A. Caputo, T. Caselli, P. Cassotti, R. Var-    [18] P. Cassotti, L. Siciliani, M. DeGemmis, G. Semeraro,
     vara, Diacr-ita @ EVALITA2020: overview of the                 P. Basile, XL-LEXEME: WiC pretrained model for
     EVALITA2020 diachronic lexical semantics (diacr-               cross-lingual LEXical sEMantic changE, in: Pro-
     ita) task, in: V. Basile, D. Croce, M. D. Maro, L. C.          ceedings of the 61st Annual Meeting of the Associa-
     Passaro (Eds.), Proceedings of the Seventh Evalua-             tion for Computational Linguistics (Volume 2: Short
     tion Campaign of Natural Language Processing and               Papers), Association for Computational Linguis-
     Speech Tools for Italian. Final Workshop (EVALITA              tics, Toronto, Canada, 2023, pp. 1577–1585. URL:
     2020), Online event, December 17th, 2020, volume               https://aclanthology.org/2023.acl-short.135. doi:10.
     2765 of CEUR Workshop Proceedings, CEUR-WS.org,                18653/v1/2023.acl-short.135.
     2020. URL: http://ceur-ws.org/Vol-2765/paper158.          [19] F. Periti, N. Tahmasebi, A systematic comparison of
     pdf.                                                           contextualized word embeddings for lexical seman-
[13] F. D. Zamora-Reina,               F. Bravo-Marquez,            tic change, in: K. Duh, H. Gomez, S. Bethard (Eds.),
     D. Schlechtweg,           LSCDiscovery: A Shared               Proceedings of the 2024 Conference of the North
     Task on Semantic Change Discovery and Detec-                   American Chapter of the Association for Compu-
     tion in Spanish, in: Proc. of the Workshop on                  tational Linguistics: Human Language Technolo-
     Computational Approaches to Historical Language                gies (Volume 1: Long Papers), Association for Com-
     Change (LChange), Association for Computational                putational Linguistics, Mexico City, Mexico, 2024,
     Linguistics (ACL), Dublin, Ireland, 2022, pp.                  pp. 4262–4282. URL: https://aclanthology.org/2024.
     149–164.                                                       naacl-long.240.
[14] V. Basile, D. Croce, M. Di Maro, L. C. Passaro,           [20] A. Blank, Why do new meanings occur? A cogni-
     Evalita 2020: Overview of the 7th evaluation cam-              tive typology of the motivations for lexical semantic
     paign of natural language processing and speech                change, Historical semantics and cognition (1999).
     tools for italian, in: V. Basile, D. Croce, M. Di Maro,   [21] D. Schlechtweg, N. Tahmasebi, S. Hengchen, H. Du-
     L. C. Passaro (Eds.), Proceedings of Seventh Evalua-           bossarsky, B. McGillivray, DWUG: A large Re-
     tion Campaign of Natural Language Processing and               source of Diachronic Word Usage Graphs in Four
     Speech Tools for Italian. Final Workshop (EVALITA              Languages, in: Annual Conference of the North
     2020), CEUR.org, Online, 2020.                                 American Chapter of the Association for Computa-
[15] D. Schlechtweg, S. S. im Walde, S. Eckmann, Di-                tional Linguistics, (NAACL-HLT 2021), Association
     achronic Usage Relatedness (DURel): A Framework                for Computational Linguistics, Mexico City, Mexico,
     for the Annotation of Lexical Semantic Change, in:             2021.
     M. A. Walker, H. Ji, A. Stent (Eds.), Proceedings of      [22] N. Bansal, A. Blum, S. Chawla, Correlation clus-
     the 2018 Conference of the North American Chap-                tering, Machine Learning 56 (2004) 89–113. doi:10.
     ter of the Association for Computational Linguis-              1023/B:MACH.0000033116.57574.95.
     tics: Human Language Technologies, NAACL-HLT,             [23] J. Lin, Divergence measures based on the shannon
     Volume 2 (Short Papers), Association for Compu-                entropy, IEEE Transactions on Information Theory
     tational Linguistics, New Orleans, Louisiana, USA,             37 (1991) 145–151.
     2018, pp. 169–174. URL: https://doi.org/10.18653/         [24] G. Donoso, D. Sanchez, Dialectometric analysis of
     v1/n18-2027. doi:10.18653/v1/n18-2027.                         language variation in twitter, in: Proceedings of the
[16] D. Schlechtweg, S. M. Virk, P. Sander, E. Sköld-               Fourth Workshop on NLP for Similar Languages,
     berg, L. T. Linke, T. Zhang, N. Tahmasebi, J. Kuhn,            Varieties and Dialects, Valencia, Spain, 2017, pp.
     S. S. im Walde, The durel annotation tool: Hu-                 16–25.
[25] A. Kutuzov, S. Touileb, P. Mæhlum, T. R. Enstad,        doi:10.18653/v1/2020.acl-main.747.
     A. Wittemann, Nordiachange: Diachronic seman- [32] M. T. Pilehvar, J. Camacho-Collados, WiC: the
     tic change dataset for norwegian, in: N. Calzolari,     Word-in-Context Dataset for Evaluating Context-
     F. Béchet, P. Blache, K. Choukri, C. Cieri, T. De-      Sensitive Meaning Representations, in: J. Burstein,
     clerck, S. Goggi, H. Isahara, B. Maegaard, J. Mar-      C. Doran, T. Solorio (Eds.), Proceedings of the
     iani, H. Mazo, J. Odijk, S. Piperidis (Eds.), Pro-      2019 Conference of the North American Chap-
     ceedings of the Thirteenth Language Resources           ter of the Association for Computational Linguis-
     and Evaluation Conference, LREC 2022, Marseille,        tics: Human Language Technologies, NAACL-HLT
     France, 20-25 June 2022, European Language Re-          2019, Minneapolis, MN, USA, June 2-7, 2019, Vol-
     sources Association, 2022, pp. 2563–2572. URL:          ume 1 (Long and Short Papers), Association for
     https://aclanthology.org/2022.lrec-1.274.               Computational Linguistics, 2019, pp. 1267–1273.
[26] J. Chen, E. Chersoni, C.-r. Huang, Lexicon of           URL: https://doi.org/10.18653/v1/n19-1128. doi:10.
     changes: Towards the evaluation of diachronic se-       18653/v1/n19-1128.
     mantic shift in Chinese, in: N. Tahmasebi, S. Mon- [33] N. Reimers, I. Gurevych, Sentence-BERT: Sentence
     tariol, A. Kutuzov, S. Hengchen, H. Dubossarsky,        Embeddings using Siamese BERT-Networks, in:
     L. Borin (Eds.), Proceedings of the 3rd Workshop        Proceedings of the 2019 Conference on Empirical
     on Computational Approaches to Historical Lan-          Methods in Natural Language Processing and the
     guage Change, Association for Computational Lin-        9th International Joint Conference on Natural Lan-
     guistics, Dublin, Ireland, 2022, pp. 113–118. URL:      guage Processing (EMNLP-IJCNLP), Association
     https://aclanthology.org/2022.lchange-1.11. doi:10.     for Computational Linguistics, Hong Kong, China,
     18653/v1/2022.lchange-1.11.                             2019, pp. 3982–3992. URL: https://aclanthology.org/
[27] J. Chen, E. Chersoni, D. Schlechtweg, J. Prokic, C.-R.  D19-1410. doi:10.18653/v1/D19-1410.
     Huang, Chiwug: Diachronic word usage graphs for [34] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil,
     chinese (2023). URL: https://doi.org/10.5281/zenodo.    A. Y. Zomaya, S. Foufou, A. Bouras, A sur-
     10023263. doi:10.5281/zenodo.10023263.                  vey of clustering algorithms for big data: Taxon-
[28] P. Basile, A. Caputo, T. Caselli, P. Cassotti, R. Var-  omy and empirical analysis, IEEE transactions
     vara, A diachronic italian corpus based on "l’unità",   on emerging topics in computing 2 (2014) 267–
     in: J. Monti, F. Dell’Orletta, F. Tamburini (Eds.),     279. URL: https://ieeexplore.ieee.org/iel7/6245516/
     Proceedings of the Seventh Italian Conference on        6939750/06832486.pdf.
     Computational Linguistics, CLiC-it 2020, Bologna,
     Italy, March 1-3, 2021, volume 2769 of CEUR Work-
     shop Proceedings, CEUR-WS.org, 2020. URL: http:
     //ceur-ws.org/Vol-2769/paper_44.pdf.
[29] B. McGillivray, D. Schlechtweg, H. Dubossarsky,
     N. Tahmasebi, S. Hengchen, Dwug la: Diachronic
     word usage graphs for latin (2021). URL: https:
     //doi.org/10.5281/zenodo.5255228. doi:10.5281/
     zenodo.5255228.
[30] D. Schlechtweg, F. D. Zamora-Reina, F. Bravo-
     Marquez, N. Arefyev, Sense through time: di-
     achronic word sense annotations for word sense
     induction and lexical semantic change detec-
     tion, Language Resources and Evaluation (2024).
     URL: http://dx.doi.org/10.1007/s10579-024-09771-7.
     doi:10.1007/s10579-024-09771-7.
[31] A. Conneau, K. Khandelwal, N. Goyal, V. Chaud-
     hary, G. Wenzek, F. Guzmán, E. Grave, M. Ott,
     L. Zettlemoyer, V. Stoyanov, Unsupervised Cross-
     lingual Representation Learning at Scale, in:
     D. Jurafsky, J. Chai, N. Schluter, J. R. Tetreault
     (Eds.), Proceedings of the 58th Annual Meeting
     of the Association for Computational Linguistics,
     ACL 2020, Online, July 5-10, 2020, Association for
     Computational Linguistics, 2020, pp. 8440–8451.
     URL: https://doi.org/10.18653/v1/2020.acl-main.747.

</pre>